A text analysis project on a collection of script dialogue between characters from Episodes IV, V, and VI of Star Wars
Star Wars is a popular film franchise that takes place in a galaxy far, far away. This is a collection of script dialogue between characters for the first three movies released (Episodes IV-VI). Since it's a holiday (and just because Star Wars is awesome), this data should serve as a fun way to practice text mining and linguistic analysis.
The source files are listed below:
SW_EpisodeIV.txt - Script of Episode IV: A New Hope, with columns character and dialogue.
SW_EpisodeV.txt - Script of Episode V: The Empire Strikes Back, with columns character and dialogue.
SW_EpisodeVI.txt - Script of Episode VI: Return of the Jedi, with columns character and dialogue.
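As a rough illustration of reading one of these files into (character, dialogue) pairs, here is a minimal Python sketch. It is not the project's own R code. It assumes, since the layout is not stated above, that each file is space-delimited with double-quoted fields and a header row, and that character and dialogue are the last two fields on each line. The function name read_script and the sample rows are hypothetical placeholders.

```python
import csv
import io

def read_script(handle):
    """Return a list of (character, dialogue) pairs from an open text file.

    Assumption: space-delimited, double-quoted fields with one header row.
    Only the last two fields of each row are used, so a leading row-index
    column (if the file has one) is ignored.
    """
    reader = csv.reader(handle, delimiter=" ", quotechar='"')
    next(reader, None)  # skip the header row
    rows = []
    for fields in reader:
        if len(fields) >= 2:  # ignore blank or malformed lines
            rows.append((fields[-2], fields[-1]))
    return rows

# In-memory stand-in for a script file; these rows are placeholders,
# not real lines from the scripts.
sample = io.StringIO(
    '"character" "dialogue"\n'
    '"1" "SPEAKER_A" "Hello there, friend."\n'
    '"2" "SPEAKER_B" "Hello."\n'
)
pairs = read_script(sample)
```

With a real file, the same function could be called as read_script(open("SW_EpisodeIV.txt", newline="")). If the delimiter turns out to be different, only the csv.reader arguments should need to change.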
I used R's tidytext, tm, and wordcloud packages for the text analysis. For cleaning the data and plotting the results I used R's dplyr and ggplot2 packages.
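The core step in this kind of analysis is counting words per character after dropping stop words. The sketch below shows that idea in plain Python with only the standard library. It is an illustration of the technique, not the R code used in this project, and the stop-word list, the function name word_counts, and the example lines are all made up for the illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real analyses use
# much larger lists such as those shipped with text-mining packages).
STOP_WORDS = {"the", "a", "an", "is", "to", "and", "of", "i", "you", "it"}

def word_counts(pairs):
    """Count non-stop-words per character from (character, dialogue) pairs."""
    counts = {}
    for character, dialogue in pairs:
        # Lowercase, then keep runs of letters and apostrophes as tokens.
        tokens = re.findall(r"[a-z']+", dialogue.lower())
        kept = [t for t in tokens if t not in STOP_WORDS]
        counts.setdefault(character, Counter()).update(kept)
    return counts

# Placeholder lines, not taken from the scripts.
example = [
    ("SPEAKER_A", "The ship is ready to go."),
    ("SPEAKER_B", "Is the ship ready?"),
    ("SPEAKER_A", "Ready and waiting."),
]
result = word_counts(example)
```

Calling result["SPEAKER_A"].most_common(5) then gives that speaker's most frequent words, which is the kind of table a word cloud or bar chart is typically built from.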
Finally, I have created an IPython notebook containing the text analysis and have also checked in an R Markdown file.
Sridhar Varanasi