Group 06 (Aaqib Saeed, Muhammad Arif Wicaksana, Alexandru Serban)
In this project, we characterize Dutch job advertisement tweets from the period 2014-2015. The questions we try to answer:
- What kind of jobs are posted on Twitter?
- How do job postings trend over the period?
- Which area offers the most jobs on Twitter?
Our screencast:
##Files:
- geo-tagged.pig is for the geo-tagged tweets generated by twitter-geotagged.jar.
- nongeo-tagged.pig samples the tweets generated by the MapReduce job in twitter-nongeotagged.jar.
- word-freq.pig generates word counts/frequencies; its output is used by the R script.
- The folder bigdata-0.2 contains the MapReduce code.
- The folder parsed_job_tweets contains the job tweets parsed from the datasets.
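
To illustrate the word-frequency step, here is a minimal sketch in plain Java of the kind of counting a script like word-freq.pig performs. It is not the code from this repository: the class name `WordFrequencySketch`, the whitespace tokenization, and the sample tweets are all assumptions made for the example. The output is one `word<TAB>count` line per word, which is the two-column layout the R snippet in the Usage section reads as `V1` and `V2`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; the project's real counting is done in Pig Latin.
public class WordFrequencySketch {

    // Count lower-cased, whitespace-separated tokens across all tweets.
    public static Map<String, Integer> countWords(List<String> tweets) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted for stable output
        for (String tweet : tweets) {
            for (String token : tweet.toLowerCase().trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Render one "word<TAB>count" line per word (tab-separated, no header).
    public static String toTsv(Map<String, Integer> counts) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            out.append(entry.getKey()).append('\t').append(entry.getValue()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Made-up sample tweets, not taken from the dataset.
        List<String> tweets = Arrays.asList(
                "Vacature developer Enschede",
                "vacature verpleegkundige");
        System.out.print(toTsv(countWords(tweets)));
    }
}
```

Lower-casing inside the counter merges variants such as "Vacature" and "vacature" before they are counted, whereas lower-casing only at plotting time would leave them as separate rows.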
##Usage:
###MapReduce
hadoop jar twitter.jar nl.utwente.bigdata.TwitterR <INPUT DIRECTORY> <OUTPUT DIRECTORY>
###Pig Latin
pig -x mapreduce wordfrequency.pig
pig -x mapreduce general.pig
###R
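library(wordcloud)     # provides wordcloud()
library(RColorBrewer)  # provides brewer.pal()
dataset <- read.delim("[FILE PATH]", header=FALSE, quote="", stringsAsFactors=FALSE)
dataset$V1 <- tolower(dataset$V1)
wordcloud(words = dataset$V1, freq = dataset$V2, colors = brewer.pal(8, "Dark2"))  # Dark2 has at most 8 colors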