Sunday, March 11, 2012

How to change default Maven/Ant/derby/junit/.. version on Mac OS X?

I am taking Apache Ant as the example; the instructions for other packages are similar.
By default, Mac OS X 10.7.3 ships with Apache Ant 1.8.2. I will show how to upgrade this to version 1.8.3 (the latest at the time of writing).
First,
cd /usr/share
ls -al ant
lrwxr-xr-x    1 root          wheel           14 Mar  9 16:08 ant -> java/ant-1.8.2

Take note of the last part of the result, i.e. ant -> java/ant-1.8.2

This in turn tells us that the folder 'ant' is a symlink to the java/ant-1.8.2 folder. If you navigate to the '/usr/share/java' folder, you will see all the Java-related installations that ship with Mac OS X. Now,
- Download Apache Ant 1.8.3.
- Unzip it to '/usr/share/java' (the folder is owned by root, so you may need to prefix the commands below with sudo).
unzip ~sumedha/tools/apache-ant-1.8.3-bin.zip  -d /usr/share/java

Update the symbolic link to point to the new Ant version.
ln -s -n -f /usr/share/java/apache-ant-1.8.3/ /usr/share/ant

Now it should look like,
lrwxr-xr-x    1 root          wheel           32 Mar 10 23:15 ant -> /usr/share/java/apache-ant-1.8.3

Check the Ant version now. 1.8.3 should be shown as the version.
ant -version
Apache Ant(TM) version 1.8.3 compiled on February 26 2012

To switch back, point the symbolic link to the older version & check the version again.
ln -s -n -f /usr/share/java/ant-1.8.2/ /usr/share/ant
ant -version
Apache Ant(TM) version 1.8.2 compiled on June 3 2011

A similar procedure has to be followed when upgrading other distributions under the '/usr/share/java' folder.
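
If you would rather rehearse the link switch before touching '/usr/share', here is a minimal sketch that does the same thing inside a throwaway directory. The sandbox layout is only an assumption that mirrors the folder names used above; nothing in it touches the real Ant installation.

```shell
#!/bin/sh
# Rehearse the symlink switch in a temporary directory instead of /usr/share.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/java/ant-1.8.2" "$sandbox/java/apache-ant-1.8.3"

# Initial state: 'ant' points at the version that shipped with the OS.
ln -s "$sandbox/java/ant-1.8.2" "$sandbox/ant"

# Upgrade: -f replaces the existing link, -n stops ln from following the
# old link and creating the new one inside the old directory.
ln -s -n -f "$sandbox/java/apache-ant-1.8.3" "$sandbox/ant"
upgraded=$(readlink "$sandbox/ant")

# Roll back the same way.
ln -s -n -f "$sandbox/java/ant-1.8.2" "$sandbox/ant"
rolled_back=$(readlink "$sandbox/ant")

echo "after upgrade:  $upgraded"
echo "after rollback: $rolled_back"

rm -rf "$sandbox"
```

Without -n, ln would follow the existing 'ant' link and drop the new link inside the old version's folder, which is why the flag matters here.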

Saturday, January 07, 2012

How to use Apache Pig to analyze a log file?

Apache Pig is a platform for analyzing large amounts of data. It comes with a built-in language called 'Pig Latin' for writing data analysis logic. When the functionality provided by Pig Latin is not enough, Pig allows you to write your own UDFs (User Defined Functions) & make use of them in data analysis scripts. UDFs can be written in several languages, including Java and JavaScript.

The simple tutorial below demonstrates how I used the basic functionality provided by Pig Latin to do a simple analysis of an Apache access log. I have not used any UDFs here, to keep the examples very simple. Let's walk through the samples starting from step 0.
  • Download Apache Pig from http://pig.apache.org/
  • Unzip it to a directory on your machine (eg: I used pig-0.9.1)
  • Start Pig (inside the pig-0.9.1 directory) 
java -Xms512m -Xmx1024m -cp pig-0.9.1.jar org.apache.pig.Main -x local
  • You need to have an Apache HTTP Server access log for the samples to run. I used a similar file generated from WSO2 Application Server. You can find it attached.
  • Once Apache Pig is started, it will take you to the Grunt shell (the grunt> prompt).
  • Now, let's use a few simple scripts written in Pig Latin to analyze the log file.
The contents of the log file are similar to the following.

0:0:0:0:0:0:0:1%0 - - [07/Jul/2011:09:17:35 +0530] "GET /carbon/admin/jsp/session-validate.jsp HTTP/1.1" 200 3197 "https://localhost:9443/carbon/admin/index.jsp?loginStatus=true" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_3) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.41 Safari/535.1"
127.0.0.1 - - [07/Jul/2011:09:17:35 +0530] "POST /services/ServerAdmin HTTP/1.1" 200 2165 "-" "Axis2"


Just paste the following script segment at the Grunt prompt & hit enter.

Scenario: viewing the list of IPs from which access is made
Script
A = load 'http_access_2011-07-07.log' using PigStorage('-') as (f0,f1,f2,f3,f4);
B = foreach A generate f0;
C = distinct B;
dump C;


You will be presented with an output similar to the following.
Output
(127.0.0.1 )
(0:0:0:0:0:0:0:1%0 )



More examples:

Browser Agents
Script
A = load 'http_access_2011-07-07.log' using PigStorage('"') as (f0,f1,f2,f3,f4,f5);
B = foreach A generate f5;
C = distinct B;
dump C;


Output
(Axis2)
(Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_3) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.41 Safari/535.1)


URI Accessed
Script
A = load 'http_access_2011-07-07.log' using PigStorage('"') as (f0,f1,f2,f3,f4);
B = foreach A generate f1;
C = distinct B;
dump C;

Output
(GET / HTTP/1.1)
(GET /carbon HTTP/1.1)
(POST /services/ServerAdmin HTTP/1.1)
(GET /carbon/admin/index.jsp HTTP/1.1)
(POST /carbon/admin/login_action.jsp HTTP/1.1)
(POST /services/RegistryAdminService HTTP/1.1)
(GET /carbon/yui/build/yahoo-dom-event/yahoo-dom-event.js HTTP/1.1)
(GET /carbon/data_service/images/data-services-uploadWSDL.gif HTTP/1.1)
(GET /carbon/viewflows/extensions/core/images/handler_flow.gif HTTP/1.1)


Access time & source
Script
A = load 'http_access_2011-07-07.log' using PigStorage('"') as (f0,f1,f2,f3,f4);
B = foreach A generate f0;
C = distinct B;
dump C;


Output
(127.0.0.1 - - [07/Jul/2011:09:20:17 +0530] )
(127.0.0.1 - - [07/Jul/2011:09:20:23 +0530] )
(0:0:0:0:0:0:0:1%0 - - [07/Jul/2011:09:17:09 +0530] )
(0:0:0:0:0:0:0:1%0 - - [07/Jul/2011:09:17:17 +0530] )
(0:0:0:0:0:0:0:1%0 - - [07/Jul/2011:09:17:18 +0530] )
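
To see why f1 is the request line and f5 is the browser agent, it helps to split a couple of log lines on the double quote outside Pig. The sketch below does that with awk on the two sample lines shown earlier. It is only a cross-check of the field positions, not a replacement for the Pig scripts, and the temporary file is just for illustration.

```shell
#!/bin/sh
# Two sample lines in the same layout as the access log shown earlier.
log=$(mktemp)
cat > "$log" <<'EOF'
0:0:0:0:0:0:0:1%0 - - [07/Jul/2011:09:17:35 +0530] "GET /carbon/admin/jsp/session-validate.jsp HTTP/1.1" 200 3197 "https://localhost:9443/carbon/admin/index.jsp?loginStatus=true" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_3) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.41 Safari/535.1"
127.0.0.1 - - [07/Jul/2011:09:17:35 +0530] "POST /services/ServerAdmin HTTP/1.1" 200 2165 "-" "Axis2"
EOF

# awk numbers fields from 1 and Pig from f0, so Pig's fN is awk's $(N+1).
sources=$(awk -F'"' '{print $1}' "$log" | sort -u)   # f0: source and access time
requests=$(awk -F'"' '{print $2}' "$log" | sort -u)  # f1: request line
agents=$(awk -F'"' '{print $6}' "$log" | sort -u)    # f5: browser agent

printf 'requests:\n%s\n\nagents:\n%s\n' "$requests" "$agents"
rm -f "$log"
```

The fields in between (status and size, referrer, and the blank between the last two quoted strings) are f2, f3 and f4, which is why the browser agent script has to declare six fields.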

Solving Apache Pig : java.lang.OutOfMemoryError

If you start Apache Pig using the following,

java  -cp pig-0.9.1.jar org.apache.pig.Main -x local


you will encounter an OutOfMemoryError even for relatively moderate loads. The complete error is similar to,

java.lang.OutOfMemoryError: Java heap space
 at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:781)
 at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:524)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)

As the solution, start Pig with sufficient heap memory for the process, as follows.

java -Xms512m -Xmx1024m -cp pig-0.9.1.jar org.apache.pig.Main -x local

Thursday, January 05, 2012

Don't promote leopard rush in Yala, Sri Lanka

Found this attachment in a mail. Original author is unknown.
"Yala is full of wild life. Leopards are only one of them. Enjoy your safari. Drive slowly"

Sunday, January 01, 2012

protobuf: Ant BuildException....Cannot run program "../src/protoc"

If you're building protobuf-2.4.1 (for Java) in a non-Windows environment, chances are high that you will encounter the following error.

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.3:run (generate-sources) on project protobuf-java: An Ant BuildException has occured: Execute failed: java.io.IOException: Cannot run program "../src/protoc": error=2, No such file or directory -> [Help 1]

The reason is an entry inside protobuf-2.4.1/java/pom.xml that runs protoc through the relative path ../src/protoc. As a workaround, open up the above pom.xml and do the following replacement.

In vim - :%s/\.\.\/src\/protoc/protoc/g

For this to work, protoc needs to be installed successfully & it should be available on your path.

eg:
~$ protoc --version
libprotoc 2.4.1
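
If you prefer not to open vim, the same replacement can be sketched with sed. The example below runs against a stand-in file, and the single exec line in it is only an assumed sample of what the entry looks like, not a copy of the real pom.xml.

```shell
#!/bin/sh
# Stand-in for protobuf-2.4.1/java/pom.xml; the <exec> line is an assumed sample.
pom=$(mktemp)
cat > "$pom" <<'EOF'
<exec executable="../src/protoc">
EOF

# Same replacement as the vim command. The dots are escaped so they only match
# literal dots, and '#' is the delimiter so the slashes need no escaping.
sed 's#\.\./src/protoc#protoc#g' "$pom" > "$pom.new" && mv "$pom.new" "$pom"

patched=$(cat "$pom")
echo "$patched"
rm -f "$pom"
```

Writing to a new file and moving it over the original avoids the differences between the GNU and BSD versions of 'sed -i'. To apply it for real, point pom at protobuf-2.4.1/java/pom.xml and drop the heredoc and the final rm.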

Sunday, December 25, 2011

Monitoring : replace (top,htop)

Tried using htop to monitor running processes.
Installation is simple.
  • Linux: sudo apt-get install htop
  • Mac : sudo port install htop
Just type 'htop' to start it.
Some of the commands I found useful:
  • Sorting by output column – Press F6 or >
  • Tree View – Press F5 or t
  • Processes per user – Press u
  • lsof Output inside htop – Press l
  • Trace a Process from htop – Press s
  • Follow a Process – Press F
Update: htop for Snow Leopard - http://jeetworks.org/node/60

Saturday, December 24, 2011

Maven tips: Fixing OutOfMemoryError

Refer: https://cwiki.apache.org/confluence/display/MAVEN/OutOfMemoryError
It is as simple as setting the MAVEN_OPTS variable in your environment.

MAVEN_OPTS="-Xmx512m -XX:MaxPermSize=128m"
 
A widely known problem & solution. I am just adding a reference to the Maven documentation for my own future reference & anyone else's.

What I did not know before is the following part.
"When using the Java compiler in embedded mode, each compiled class will consume heap memory and depending on the JDK being used this memory is not subject to garbage collection, i.e. the memory allocated for the compiled classes will not be freed"
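
One detail that is easy to miss: the variable has to be exported, otherwise the mvn process never sees it. A minimal sketch, using a child shell as a stand-in for mvn:

```shell
#!/bin/sh
# A plain assignment stays local to the current shell; export passes it on to
# child processes such as mvn.
export MAVEN_OPTS="-Xmx512m -XX:MaxPermSize=128m"

# A child shell stands in for mvn here and reads the variable back.
seen_by_child=$(sh -c 'printf "%s" "$MAVEN_OPTS"')
echo "child sees: $seen_by_child"
```

Put the export line in your shell profile to make it permanent. Java 8 removed the permanent generation, so the -XX:MaxPermSize part only applies to older JVMs.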

If you're using 'maven-compiler-plugin', this can be set in the pom.xml of your project as well.
Refer: http://maven.apache.org/plugins/maven-compiler-plugin/examples/compile-with-memory-enhancements.html

Monday, December 19, 2011