This repository pools code and information for a project using Hadoop MapReduce in a graduate Big Data course.
The instructions for the project are in Assignment2.pdf. The project uses Cloudera Hadoop running on VMware; the Eclipse IDE included with the Cloudera Quickstart VM was used for development.
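For context, a Hadoop MapReduce driver class for a job like this typically looks roughly as follows. This is a minimal sketch only: the mapper/reducer names are placeholders, and the real AssignmentDriver is defined by the assignment (see Assignment2.pdf) and may configure multiple jobs.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical skeleton of a MapReduce driver; not the actual assignment code.
public class AssignmentDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "assignment2");
        job.setJarByClass(AssignmentDriver.class);
        // Placeholder mapper/reducer classes -- substitute the real ones:
        // job.setMapperClass(MyMapper.class);
        // job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Input and output paths come from the command line,
        // e.g. /user/cloudera/input/ and /user/cloudera/output/
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Note that the output path must not already exist in HDFS; Hadoop refuses to overwrite an existing output directory.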
To run this as a JAR file in Cloudera, follow these steps:
- Export the project as a JAR file (deselect 20-newsgroups from the export list) into the cloudera folder.
- Make sure a copy of the 20-newsgroups dataset is on the Desktop. If you haven't made this copy yet, do so now.
- Ensure that the HDFS and YARN services are running, and that the NameNode has left safe mode.
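You can check these preconditions from the terminal. The service names below are the standard ones on the Cloudera Quickstart VM; adjust them if your setup differs.

```shell
# Check that the HDFS and YARN daemons are running
sudo service hadoop-hdfs-namenode status
sudo service hadoop-yarn-resourcemanager status

# Confirm the NameNode has left safe mode (should print "Safe mode is OFF")
hdfs dfsadmin -safemode get
```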
- In the terminal, run the following command to make a directory to house the input:
hdfs dfs -mkdir /user/cloudera/input/
- Next, load the data from 20-newsgroups into the input directory:
hdfs dfs -copyFromLocal ~/Desktop/20-newsgroups/ /user/cloudera/input/
The terminal will likely print a stream of warnings during the copy (InterruptedException warnings, in my experience), but don't be alarmed: the copy is proceeding normally. Let it finish; it may take a while.
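Once the copy finishes, you can sanity-check that the data landed in HDFS (the paths below match the commands above):

```shell
# List the newsgroup folders that were copied into HDFS
hdfs dfs -ls /user/cloudera/input/20-newsgroups/

# Report directory, file, and byte counts to compare against the local copy
hdfs dfs -count /user/cloudera/input/20-newsgroups/
```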
- To run the JAR file, enter the following in the terminal:
hadoop jar Assignment2.jar AssignmentDriver /user/cloudera/input/ /user/cloudera/output/
If all goes well, this will produce the desired outputs in /user/cloudera/output/. Check the results with the hdfs dfs -ls command.
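For example, to list and inspect the job's output (part-r-* is the standard name Hadoop gives reducer output files; the exact file names depend on the job):

```shell
# List the output directory; a _SUCCESS marker indicates the job completed
hdfs dfs -ls /user/cloudera/output/

# Print the contents of the first reducer's output file
hdfs dfs -cat /user/cloudera/output/part-r-00000
```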