notes.txt

- LabeledLDA on Freebase to utilize unlabeled dataset
- POS Tagger using Twitter tagged POS using Penn Bank tags
- SVM for learning whether tweet informative capitalized or not
	* features based on capitalization will improve accuracy
//capitalization 	
	To
model unlabeled entities and their possible types, we
apply LabeledLDA (Ramage et al., 2009), constraining
each entity’s distribution over topics based on
its set of possible types according to Freebase

 Additionally we
have shown the benefits of features generated from
T-POS and T-CHUNK in segmenting Named Entities.

POSTagger
Best results - 83% cross validation
Use Alan's data, with maxent

$ opennlp POSTaggerTrainer -type maxent -model twitter-en-pos-maxent.bin -lang en -data pos_tweets.txt -encoding UTF-8

Chunker - 85%
Use Alan's data with maxent

opennlp ChunkerTrainerME -model twitter-en-chunker.bin -lang en -data pos_chunk_tweets.txt -encoding UTF-8

get features from chunker and postagger
run maxent classifier
use viterbi to find best tags


Second attempt:

Use http://www.cs.cmu.edu/~ark/TweetNLP/#pos to train the POS Tagger, find twitter chunker? 
Use Tweebo dependency parser to get features http://www.cs.cmu.edu/~ark/TweetNLP/#tweeboparser_tweebank

Possible Features
MWE - and a ^ or N that depends on V or something else
N vs ^ that depends on a V
Tags of dependency heads
Proper noun ^ that depends on determiner D