ud617

All Hadoop commands from the course videos

Lesson 2

Part 4, HDFS Demo (link)

  • hadoop fs -ls list a directory
  • hadoop fs -put purchases.txt upload a local file to HDFS
  • these are analogous to the traditional UNIX commands
    • hadoop fs -tail purchases.txt print last few lines of a file
    • hadoop fs -cat purchases.txt print the whole content of a file
    • hadoop fs -mv purchases.txt newname.txt rename (move) a file
    • hadoop fs -rm newname.txt delete a file
  • hadoop fs -mkdir myinput create a directory
  • hadoop fs -put purchases.txt myinput upload a file to a directory

Part 13, Running a Job (link)

  • hadoop fs -ls myinput list files in directory myinput

  • hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.0.0-mr1-cdh4.1.1.jar -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py -input myinput -output joboutput run a Python mapper and reducer using the Hadoop Streaming feature
  • hadoop fs -cat joboutput/part-00000 | less print a file in HDFS and paginate the output
  • hadoop fs -get joboutput/part-00000 mylocalfile.txt download a file to local computer
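The mapper.py and reducer.py handed to Hadoop Streaming are ordinary programs that read lines on standard input and write key/TAB/value lines on standard output. A minimal sketch of such a pair, assuming a hypothetical tab-separated purchases.txt layout (date, time, store, category, cost, payment) and totaling sales per store:

```python
def mapper(lines):
    """Map phase: emit (store, cost) for every well-formed record.
    The field layout is an assumption for illustration:
    date, time, store, category, cost, payment (tab-separated)."""
    for line in lines:
        fields = line.strip().split("\t")
        if len(fields) == 6:
            yield fields[2], fields[4]

def reducer(pairs):
    """Reduce phase: sum costs per store. A streaming reducer keeps one
    running total, so it relies on input arriving sorted by key, which
    Hadoop's shuffle (or `sort` in a local test) guarantees."""
    current, total = None, 0.0
    for store, cost in pairs:
        if store != current:
            if current is not None:
                yield current, total
            current, total = store, 0.0
        total += float(cost)
    if current is not None:
        yield current, total

# The real mapper.py and reducer.py are two separate executable scripts,
# each reading sys.stdin line by line and printing "key\tvalue" lines;
# here a small in-memory sample stands in for the HDFS input.
sample = [
    "2012-01-01\t12:00\tMiami\tMusic\t12.00\tVisa",
    "2012-01-01\t12:01\tAustin\tBooks\t8.50\tCash",
    "2012-01-01\t12:02\tMiami\tToys\t3.00\tCash",
]
totals = dict(reducer(sorted(mapper(sample))))
print(totals)  # {'Austin': 8.5, 'Miami': 15.0}
```

Streaming imposes no structure beyond the line protocol, which is why the same two scripts can be tested with UNIX pipes before being submitted as a job.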

Part 14, Simplifying Things (link)

  • hs mapper.py reducer.py myinput joboutput convenience wrapper script for running jobs

Lesson 3

Part 7, Putting it All Together (link)

  • ./mapper.py run the script
  • Ctrl+D send an End-Of-File character to the terminal, ending the script's input
  • head -50 ../data/purchases.txt > testfile take the first 50 lines of purchases.txt and write them to testfile
  • cat testfile | ./mapper.py print the file and redirect the output into the mapper. Useful for testing.
  • cat testfile | ./mapper.py | sort | ./reducer.py simulate the MapReduce pipeline locally on a small dataset. For testing.
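The `sort` stage in that local pipeline is not optional: a streaming reducer flushes its running total whenever the key changes, so equal keys must arrive adjacent. A small sketch of why (the store names and amounts are made up):

```python
def sum_per_key(pairs):
    """Streaming-style reducer: keeps one running total and flushes it
    on every key change, so it assumes pairs are grouped by key."""
    out, current, total = [], None, 0.0
    for key, value in pairs:
        if key != current:
            if current is not None:
                out.append((current, total))
            current, total = key, 0.0
        total += value
    if current is not None:
        out.append((current, total))
    return out

pairs = [("Miami", 12.0), ("Austin", 8.5), ("Miami", 3.0)]

# Unsorted input splits Miami into two partial totals...
print(sum_per_key(pairs))          # [('Miami', 12.0), ('Austin', 8.5), ('Miami', 3.0)]
# ...sorting first (the job `sort` does in the pipeline) fixes it.
print(sum_per_key(sorted(pairs)))  # [('Austin', 8.5), ('Miami', 15.0)]
```

In a real job, Hadoop's shuffle phase performs this grouping between the map and reduce tasks; the local `sort` command stands in for it during testing.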