NLP/IE workshop for the Tucson Data Science meetup (6/30/2016)

myedibleenso/nlp-for-the-easily-bored

NLP Information Extraction for the easily bored

Please fork this repository and follow along.

If this repository changes after you fork it, you'll want to sync your fork.

If you clone your forked repo locally, here's how to keep your forked clone up-to-date:

# register the original repository as a remote named "upstream"
git remote add upstream https://github.com/myedibleenso/nlp-for-the-easily-bored
# fetch updates from myedibleenso/nlp...bored
git fetch upstream
# check out your own local master branch
git checkout master
# merge the latest changes from myedibleenso/nlp...bored into your local master
git merge upstream/master

NOTE: this is a work in progress. Check back later for updates...

Table of Contents

NOTE: When viewing the slides, it's easiest to advance using fn + Down Arrow.

  1. NLP Information Extraction for the easily bored
  • slides / notebook
  • How do we get useful things out of a sea of text?
  • Learn about finding people, places, organizations, etc.
  2. Introduction to py-processors
  • slides / notebook
  • An overview of the natural language processing (NLP) library we'll be using in the examples
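
To make "finding people, places, organizations, etc." concrete, here is a minimal sketch of the simplest possible approach: looking up known names in a hand-made list (a gazetteer). This is not how py-processors works and does not use its API. The `GAZETTEER` entries, the `find_entities` helper, and the example sentence are all assumptions made up for illustration.

```python
import re

# Hypothetical, hand-made gazetteer. A real NLP library predicts these
# labels with trained models instead of looking them up in a fixed list.
GAZETTEER = {
    "Barack Obama": "PERSON",
    "Tucson": "LOCATION",
    "Wikipedia": "ORGANIZATION",
}

def find_entities(text, gazetteer=GAZETTEER):
    """Return (start, mention, label) tuples for gazetteer entries in text,
    ordered by where they appear."""
    found = []
    for mention, label in gazetteer.items():
        # \b keeps "Tucson" from matching inside a longer word
        pattern = r"\b" + re.escape(mention) + r"\b"
        for match in re.finditer(pattern, text):
            found.append((match.start(), mention, label))
    return sorted(found)

# Made-up example sentence
text = "The Tucson meetup looked at the Wikipedia page for Barack Obama."
entities = find_entities(text)
for start, mention, label in entities:
    print(start, mention, label)
```

A lookup like this misses any name that is not in the list and cannot tell apart ambiguous mentions, which is one reason to reach for a trained model instead.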

Examples

Here you'll find a few use cases illustrating the concepts covered in the intros.

  1. Who, what, when, and where? Making sense of web-based news
  2. Getting structured information out of Wikipedia pages
  • slides / notebook
  • You now know a little about how to find named entities (people, places, organizations, etc.) in text, but how do these entities interact with one another?
  • Challenge: Try to populate a Wikipedia infobox for Barack Obama.
  3. Movie reviews
  • slides / notebook
  • Is it a positive or negative review? If we don't have a score, can we identify the sentiment and assign a score based on the review text?
  • NOTE: To really get into this example, you'll need a Rotten Tomatoes developer key
  • Challenge: Predict critics' consensus scores based only on the review text
    • Use whatever method you want
      • feature-based classifier, latent feature model, etc.
    • What works and why?
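
As a starting point for the movie review challenge, the feature-based classifier option could be sketched as a tiny bag-of-words naive Bayes model written with only the standard library. This is one possible baseline, not the method used in the notebook. The training sentences in `TRAIN` are invented for illustration and are not real reviews, and the `train` and `predict` helpers are hypothetical names.

```python
import math
from collections import Counter

# Toy training data invented for illustration; these are NOT real reviews.
TRAIN = [
    ("a wonderful moving film with great acting", "pos"),
    ("great fun and a wonderful cast", "pos"),
    ("a dull boring plot and terrible acting", "neg"),
    ("terrible pacing and a boring script", "neg"),
]

def train(examples):
    """Count words per label for a multinomial naive Bayes model."""
    word_counts = {}          # label -> Counter of word frequencies
    label_counts = Counter()  # label -> number of training reviews
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab

def predict(text, model):
    """Return the label with the highest log-probability under the model."""
    word_counts, label_counts, vocab = model
    total_reviews = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        # start from the log prior of the label
        score = math.log(label_counts[label] / total_reviews)
        total_words = sum(word_counts[label].values())
        for word in text.split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # add-one (Laplace) smoothing avoids log(0) for unseen pairs
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        scores[label] = score
    return max(scores, key=scores.get)

model = train(TRAIN)
print(predict("a wonderful cast and great acting", model))
print(predict("boring and terrible", model))
```

Predicting a numeric consensus score rather than a positive/negative label would need a regression model or finer-grained classes, and far more data than this toy set.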

Installation

There are a couple of things you'll need to run the notebooks in this repository:

Requirements

  • Java 8
  • 2-3 GB of RAM available for running the NLP server
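
If you're not sure which Java you have, a quick check from Python might look like the sketch below. It is not part of this repository; `parse_java_major` and `installed_java_major` are hypothetical helper names, and the parsing assumes the usual `version "..."` line that `java -version` prints.

```python
import re
import shutil
import subprocess

def parse_java_major(version_output):
    """Extract the major Java version from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Java 8 and earlier report themselves as 1.x (e.g. "1.8.0_91")
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major

def installed_java_major():
    """Return the major version of the `java` on PATH, or None if absent."""
    if shutil.which("java") is None:
        return None
    # `java -version` writes its report to stderr, not stdout
    result = subprocess.run(
        ["java", "-version"], capture_output=True, text=True
    )
    return parse_java_major(result.stderr + result.stdout)

# Prints the detected major version, or None if no usable java was found
print(installed_java_major())
```

Note that `capture_output` requires Python 3.7 or later.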

Python dependencies via conda

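# create a Python 3 environment named "bored"
conda create -n bored python=3
# activate it (on conda 4.4 or later, use "conda activate bored" instead)
source activate bored
# assuming you're in the "nlp-for-the-easily-bored" directory
pip install -r requirements.txt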

Running the notebooks

The notebooks are all under /notebooks.

To run or modify them locally after installing the project dependencies, run this command:

jupyter notebook

Resources

See resources.md for links to NLP datasets, free courses, etc.

Questions

Have a question? See the FAQ. It may have already been asked/answered.

Contributing

Thanks for the help! Take a look at contributing.md.
