Creating and Running your Java (JDK6) Code on Hadoop 1.2.1

1. Import hadoop-core-1.2.1.jar into your project as an external JAR
2. Optionally, import commons-logging-1.1.3.jar and any other dependencies
3. Write the MapReduce job code and compile it to class files (a minimal sketch is shown below)
4. Export the project as a JAR file, e.g. filename.jar
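If you do not yet have a job to package, the skeleton below sketches what step 3 might produce, using a simple word-count job as a stand-in. The names WordCount, TokenMapper and SumReducer are illustrative, not the MaxTemperature code used in the examples further down; it uses the org.apache.hadoop.mapreduce API and compiles against hadoop-core-1.2.1.jar on JDK6.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: emits (word, 1) for every whitespace-separated token in a line
    public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (token.length() > 0) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    // Reducer: sums the counts for each word; also reused as a combiner below
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);          // locate this class's JAR on the cluster
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class);      // combine map output locally before the shuffle
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // HDFS input dir
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // HDFS output dir (must not exist yet)
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}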
5. Start the Hadoop cluster:

hduser@neo:/usr/local/hadoop$ bin/start-all.sh
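You can verify that the daemons came up with jps; on a single-node Hadoop 1.x setup you should see NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker listed (plus Jps itself):

hduser@neo:/usr/local/hadoop$ jps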

6. Copy the input data to HDFS:
hduser@neo:/usr/local/hadoop$ bin/hadoop dfs -copyFromLocal <local-dir> <hdfs-input-dir>
Example:

hduser@neo:/usr/local/hadoop$ bin/hadoop dfs -copyFromLocal /tmp/ncdc /user/hduser/ncdc
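You can confirm the copy with a directory listing on HDFS:

hduser@neo:/usr/local/hadoop$ bin/hadoop dfs -ls /user/hduser/ncdc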

7. Run the MapReduce job:
hduser@neo:/usr/local/hadoop$ bin/hadoop jar filename.jar <classname-containing-main-function> <hdfs-input-dir> <hdfs-output-dir>
Example:

hduser@neo:/usr/local/hadoop$ bin/hadoop jar /home/naved/workspace/MaxTemperature/bin/MaxTemperature.jar MaxTemperatureWithCombiner /user/hduser/ncdc/ /user/hduser/ncdc-output
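Note that Hadoop refuses to start a job whose output directory already exists, so remove any leftovers from a previous run first:

hduser@neo:/usr/local/hadoop$ bin/hadoop dfs -rmr /user/hduser/ncdc-output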

8. Retrieve the job result from HDFS:
hduser@neo:/usr/local/hadoop$ bin/hadoop dfs -getmerge <hdfs-output-dir> <local-file>
Example:

hduser@neo:/usr/local/hadoop$ bin/hadoop dfs -getmerge /user/hduser/ncdc-output /tmp/ncdc-output
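-getmerge concatenates all the part files from the HDFS output directory into a single local file, so the result can be inspected directly:

hduser@neo:/usr/local/hadoop$ head -5 /tmp/ncdc-output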

Installing and Running Pig 0.11.1 with Hadoop 1.2.1

1. Download Pig from pig.apache.org
2. Copy it to the Hadoop user's home directory, /home/hduser/
3. Extract it, make sure the extracted directory keeps the same ownership (hduser:hadoop), and add its bin directory to $PATH, as shown below
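For example, assuming the tarball was extracted to /home/hduser/pig-0.11.1 (adjust to your actual directory):
>export PATH=/home/hduser/pig-0.11.1/bin:$PATH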
4. Start the Hadoop cluster (bin/start-all.sh) and check the daemons with jps
5. Copy the data to HDFS:
>hadoop dfs -copyFromLocal <local-file> /user/hduser/<file-name>
6. Run the script using Pig:
>pig <script-name>
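By default Pig runs in MapReduce mode against the running Hadoop cluster; the -x flag makes the execution mode explicit, and local mode is convenient for testing a script without the cluster:
>pig -x mapreduce <script-name>
>pig -x local <script-name>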
7. Check that the output was generated:
>hadoop dfs -ls /user/hduser/<file-name>/
(Note that the output directory is created automatically and has the same name as the input file.)
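A successful run typically leaves a _SUCCESS marker file in the output directory alongside the part-* files, one per reducer.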
8. Check the output file:
>hadoop dfs -cat /user/hduser/<output-dir>/part-r-00000 | head -5