
MapReduce_N-Gram_Count

Hadoop MapReduce code to compute n-gram counts.

The submission was written in Python and tested on the NYU Dataproc Hadoop cluster.
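The README does not include mapper.py itself. The script below is a rough sketch of how a Hadoop Streaming mapper for n-gram counting can work. It reads lines from stdin and emits one `ngram<TAB>1` pair per n-gram. These details are assumptions and are not taken from the original code: the n-gram size (N = 2), the tokenizer regex, and the rule that n-grams do not cross line boundaries.

```python
#!/usr/bin/env python3
"""Hypothetical n-gram mapper for Hadoop Streaming (sketch only).

Assumptions (not from the original repo): N = 2 (bigrams), tokens are
lowercase runs of letters, digits and apostrophes, and n-grams never
span two input lines.
"""
import re
import sys

N = 2  # hypothetical n-gram size; the README does not state n
TOKEN_RE = re.compile(r"[a-z0-9']+")


def ngrams(line, n=N):
    """Return the space-joined n-grams found in one line of text."""
    tokens = TOKEN_RE.findall(line.lower())
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def main():
    # Hadoop Streaming feeds input lines on stdin; we emit "key\tvalue".
    for line in sys.stdin:
        for gram in ngrams(line):
            print(f"{gram}\t1")


if __name__ == "__main__":
    main()
```

Spaces inside an n-gram are safe here because Hadoop Streaming splits key and value on the first tab by default.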

To run the first job:

mapred streaming -input hw1.txt -output -mapper "python mapper.py" -reducer "python reducer.py" -file mapper.py -file reducer.py
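The reducer.py script is also not shown. The following is a minimal sketch of a matching Streaming reducer, assuming the mapper emits `ngram<TAB>count` lines. It depends on Hadoop sorting mapper output by key, so all lines for one n-gram arrive next to each other and can be summed with `itertools.groupby`.

```python
#!/usr/bin/env python3
"""Hypothetical n-gram reducer for Hadoop Streaming (sketch only).

Assumes input lines look like "ngram\tcount" and are already sorted by
key, which the Hadoop shuffle phase guarantees.
"""
import sys
from itertools import groupby


def parse(line):
    """Split one "key\tcount" line into (key, int count)."""
    key, _, count = line.rstrip("\n").rpartition("\t")
    return key, int(count)


def reduce_counts(lines):
    """Yield (ngram, total) for each run of consecutive equal keys."""
    pairs = (parse(line) for line in lines if line.strip())
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)


def main():
    for key, total in reduce_counts(sys.stdin):
        print(f"{key}\t{total}")


if __name__ == "__main__":
    main()
```

Because `groupby` only merges adjacent keys, unsorted input would produce duplicate entries. For a quick local check without the cluster, `cat hw1.txt | python mapper.py | sort | python reducer.py` imitates the map, shuffle-sort and reduce steps.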

This runs the first MapReduce job and stores its output in <outputfile>.

Use this output as the input to the second job and run:

mapred streaming -input -output -mapper "python mapper2.py" -reducer "python reducer2.py" -file mapper2.py -file reducer2.py

The final output will be stored as a .txt file, which we can parse to check the results.

We print it using the command:

hdfs dfs -cat .txt/par*
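To inspect the results beyond a raw dump, the output of the `hdfs dfs -cat` command can be piped into a small script. The helper below is hypothetical, including its name `top_ngrams.py` and its `k` argument. It assumes each output line has the form `ngram<TAB>count`, which is Hadoop Streaming's default key/value layout. It prints the most frequent n-grams.

```python
#!/usr/bin/env python3
"""Hypothetical helper: list the most frequent n-grams from job output.

Assumes each line is "ngram\tcount". Usage sketch (placeholder path):
    hdfs dfs -cat <output-dir>/part* | python top_ngrams.py 10
"""
import sys


def top_ngrams(lines, k=10):
    """Return the k (ngram, count) pairs with the highest counts."""
    counts = []
    for line in lines:
        key, sep, value = line.rstrip("\n").rpartition("\t")
        if sep:  # skip lines that have no tab separator
            counts.append((key, int(value)))
    # Highest count first; ties broken alphabetically for stable output.
    counts.sort(key=lambda kv: (-kv[1], kv[0]))
    return counts[:k]


def main():
    k = int(sys.argv[1]) if len(sys.argv) > 1 else 10
    for key, count in top_ngrams(sys.stdin, k):
        print(f"{key}\t{count}")


if __name__ == "__main__":
    main()
```

Sorting the whole list is fine for modest outputs. For very large result sets, `heapq.nlargest` would avoid a full sort.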