NLP/IE workshop for the Tucson Data Science meetup (6/30/2016)
Please fork this repository and follow along.
If changes are made to this repository after you fork it, you'll want to sync your fork.
If you've cloned your fork locally, here's how to keep that clone up to date:
```bash
# add the original repository as a remote named "upstream" (only needed once)
git remote add upstream https://github.com/myedibleenso/nlp-for-the-easily-bored
# check for updates in myedibleenso/nlp...bored
git fetch upstream
# checkout your own local master branch
git checkout master
# pull in latest changes from myedibleenso/nlp...bored to your local master
git merge upstream/master
```
NOTE: This is a work in progress. Check back later for updates.
NOTE: When viewing the slides, it's easiest to advance using fn + Down Arrow.
- NLP/Information Extraction for the easily bored
  - slides / notebook
  - How do we get useful things out of a sea of text?
  - Learn about finding people, places, organizations, etc.
- Introduction to py-processors
  - slides / notebook
  - An overview of the natural language processing (NLP) library we'll be using in the examples
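To build intuition for "finding people, places, organizations", here is the crudest possible approach: look up names in a hand-built table (a gazetteer). This is a minimal sketch for illustration only and is not how py-processors finds entities; the `GAZETTEER` entries and the `find_entities` name are hypothetical.

```python
import re

# Hypothetical, hand-built gazetteer: surface string -> entity label.
GAZETTEER = {
    "Barack Obama": "PERSON",
    "Tucson": "LOCATION",
    "The Guardian": "ORGANIZATION",
}


def find_entities(text, gazetteer=GAZETTEER):
    """Return (start, end, surface, label) for every gazetteer match, sorted by position."""
    matches = []
    for surface, label in gazetteer.items():
        # \b on both sides keeps "Tucson" from matching inside "Tucsonian"
        pattern = r"\b" + re.escape(surface) + r"\b"
        for m in re.finditer(pattern, text):
            matches.append((m.start(), m.end(), surface, label))
    return sorted(matches)


if __name__ == "__main__":
    text = "Barack Obama spoke in Tucson, according to The Guardian."
    for start, end, surface, label in find_entities(text):
        print(f"{label:<13} {surface!r} [{start}:{end}]")
```

A lookup table misses every name it doesn't list and can't tell which sense of an ambiguous name (e.g. "Washington" the person vs. the place) is meant, which is why the examples rely on a statistical NLP library instead.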
Here you'll find a few use cases illustrating the concepts covered in the intros.
- Who, what, when, and where? Making sense of web-based news
  - slides / notebook
  - Go from HTML -> people, places, etc.
  - Learn how to do basic IE on an article you may have read from The Guardian
  - Challenge: How do we disambiguate organizations and people?
- Getting structured information out of Wikipedia pages
  - slides / notebook
  - You now know a little about how to find named entities (people, places, organizations, etc.) in text, but how do they interact with one another?
  - Challenge: Try to populate a Wikipedia infobox for Barack Obama.
- Movie reviews
  - slides / notebook
  - Is it a positive or negative review? If we don't have a score, can we identify the sentiment and assign a score based on the review text?
  - NOTE: To really get into this example, you'll need a Rotten Tomatoes developer key.
  - Challenge: Predict critics' consensus scores based only on the review text.
    - Use whatever method you want
      - feature-based classifier, latent feature model, etc.
    - What works and why?
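One simple baseline for the movie-review challenge is a lexicon-based scorer: add up per-word polarity weights and flip a word's sign when it directly follows a negator. This is a minimal sketch, not the approach used in the notebook; the tiny `LEXICON` and `NEGATORS` sets and the function names are hypothetical.

```python
import re

# Hypothetical toy lexicon; a real system would learn weights from labeled reviews.
LEXICON = {
    "brilliant": 2, "great": 2, "good": 1, "fun": 1,
    "dull": -1, "bad": -1, "boring": -2, "awful": -2,
}
NEGATORS = {"not", "never", "no"}


def tokenize(text):
    # Lowercase word tokens; punctuation is discarded, so a negator can
    # "leak" across a sentence boundary (a known weakness of this sketch).
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Sum lexicon weights, flipping the sign of a word preceded by a negator."""
    tokens = tokenize(text)
    total = 0
    for i, tok in enumerate(tokens):
        weight = LEXICON.get(tok, 0)
        if i > 0 and tokens[i - 1] in NEGATORS:
            weight = -weight
        total += weight
    return total


def label(text):
    """Map the raw score to a coarse polarity label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for review in ["A brilliant, fun film.", "Not good. Boring and dull."]:
        print(label(review), sentiment_score(review), review)
```

A feature-based classifier trained on labeled reviews would replace the hand-set weights with learned ones, which makes for a useful comparison when asking what works and why.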
There are a couple of things you'll need to run the notebooks in this repository:
- Java 8
- 2 to 3 GB of free RAM for running the NLP server
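Before starting the NLP server, it can help to confirm which Java is on your PATH. This is a minimal sketch assuming Python 3.7+ (for `capture_output`); `java_major_version` and `installed_java_major` are hypothetical helper names. It relies on `java -version` writing to stderr and handles both the legacy `"1.8.0_92"` and the newer `"11.0.2"` version formats.

```python
import re
import shutil
import subprocess


def java_major_version(version_output):
    """Parse the major version from `java -version` output, or return None.

    Java 8 and earlier report e.g. "1.8.0_92"; Java 9+ report e.g. "11.0.2".
    """
    m = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if m is None:
        return None
    major = int(m.group(1))
    if major == 1 and m.group(2) is not None:
        major = int(m.group(2))  # legacy "1.x" scheme: 1.8 -> 8
    return major


def installed_java_major():
    """Return the major version of the `java` on PATH, or None if unavailable."""
    if shutil.which("java") is None:
        return None
    # `java -version` prints its banner to stderr, not stdout
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return java_major_version(result.stderr)


if __name__ == "__main__":
    major = installed_java_major()
    if major is None:
        print("Could not find (or parse the version of) java on PATH")
    elif major != 8:
        print(f"Found Java {major}; this workshop lists Java 8")
    else:
        print("Found Java 8")
```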
Python dependencies via conda
```bash
conda create -n bored python=3
source activate bored
# assuming you're in the "nlp-for-the-easily-bored" directory
pip install -r requirements.txt
```
The notebooks are all under /notebooks.
If you want to run or alter them locally after installing the project dependencies, simply run this command:
```bash
jupyter notebook
```
See resources.md for links to NLP datasets, free courses, etc.
Have a question? See the FAQ. It may have already been asked/answered.
Want to contribute? Take a look at contributing.md. Thanks for the help!