- LabeledLDA with Freebase types to make use of the unlabeled data
- POS tagger trained on tweets annotated with Penn Treebank tags
- SVM to learn whether a tweet's capitalization is informative or not
* features based on capitalization will improve accuracy
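A minimal sketch of tweet-level capitalization features that such an SVM could be trained on. The feature names, the whitespace tokenization, and the regex are hypothetical choices, not taken from these notes; the SVM training itself is not shown (any linear SVM over these vectors would do).

```python
import re
from typing import Dict

# Alphabetic tokens only; skips @user, #tag, URLs, numbers (crude, whitespace-based).
WORD = re.compile(r"[A-Za-z][A-Za-z'-]*")


def capitalization_features(tweet: str) -> Dict[str, float]:
    """Hypothetical tweet-level features for a 'capitalization is informative?' SVM."""
    words = [t for t in tweet.split() if WORD.fullmatch(t)]
    n = len(words)
    if n == 0:
        return {"frac_capitalized": 0.0, "frac_all_caps": 0.0,
                "frac_lowercase": 0.0, "first_word_capitalized": 0.0}
    return {
        # Share of words starting with an upper-case letter.
        "frac_capitalized": sum(w[0].isupper() for w in words) / n,
        # Share of multi-letter words written entirely in upper case (e.g. shouting).
        "frac_all_caps": sum(len(w) > 1 and w.isupper() for w in words) / n,
        # Share of fully lower-case words.
        "frac_lowercase": sum(w.islower() for w in words) / n,
        "first_word_capitalized": float(words[0][0].isupper()),
    }
```

A tweet where every word is capitalized ("I Love NEW York") yields `frac_capitalized == 1.0`, which is the kind of signal that suggests capitalization is *not* informative for that tweet.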
//capitalization
To model unlabeled entities and their possible types, we apply LabeledLDA (Ramage et al., 2009), constraining each entity's distribution over topics based on its set of possible types according to Freebase.
Additionally, we have shown the benefits of features generated from T-POS and T-CHUNK in segmenting Named Entities.
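The LabeledLDA constraint quoted above can be sketched as a collapsed Gibbs sampler in which each entity is a "document" of its context words and may only draw topics (types) from its allowed type set. This is an illustrative sketch, not the paper's implementation; the data layout, hyperparameter values, and function names are assumptions, and every entity is assumed to have at least one allowed type.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def labeled_lda_gibbs(
    docs: Dict[str, List[str]],         # entity -> its context words
    allowed: Dict[str, Sequence[str]],  # entity -> possible types (e.g. from Freebase); non-empty
    alpha: float = 0.1,
    beta: float = 0.01,
    iterations: int = 50,
    seed: int = 0,
) -> Dict[str, List[str]]:
    """Collapsed Gibbs sampling where each word of an entity's document may only
    be assigned a topic (type) from that entity's allowed set."""
    rng = random.Random(seed)
    vocab_size = len({w for words in docs.values() for w in words})
    n_et: Dict[tuple, int] = defaultdict(int)  # (entity, type) counts
    n_tw: Dict[tuple, int] = defaultdict(int)  # (type, word) counts
    n_t: Dict[str, int] = defaultdict(int)     # total words assigned to each type
    z: Dict[str, List[str]] = {}

    def update(e: str, t: str, w: str, delta: int) -> None:
        n_et[e, t] += delta
        n_tw[t, w] += delta
        n_t[t] += delta

    # Initialise assignments uniformly at random within each allowed set.
    for e, words in docs.items():
        z[e] = [rng.choice(list(allowed[e])) for _ in words]
        for w, t in zip(words, z[e]):
            update(e, t, w, +1)

    for _ in range(iterations):
        for e, words in docs.items():
            types = list(allowed[e])
            for i, w in enumerate(words):
                update(e, z[e][i], w, -1)  # remove the current assignment
                # p(t) proportional to (n_et + alpha) * (n_tw + beta) / (n_t + V*beta),
                # evaluated only for t in allowed[e]: this is the LabeledLDA constraint.
                weights = [
                    (n_et[e, t] + alpha) * (n_tw[t, w] + beta) / (n_t[t] + vocab_size * beta)
                    for t in types
                ]
                z[e][i] = rng.choices(types, weights=weights)[0]
                update(e, z[e][i], w, +1)
    return z


def type_distribution(z: Dict[str, List[str]], allowed: Dict[str, Sequence[str]],
                      alpha: float = 0.1) -> Dict[str, Dict[str, float]]:
    """Smoothed estimate of each entity's distribution over its allowed types."""
    dist = {}
    for e, assigned in z.items():
        types = list(allowed[e])
        denom = len(assigned) + alpha * len(types)
        dist[e] = {t: (assigned.count(t) + alpha) / denom for t in types}
    return dist
```

An entity with a single allowed type (say `{"Obama": ["PERSON"]}`) is pinned to that type; ambiguous entities split their mass among their allowed types only.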
POSTagger
Best result: 83% (cross-validation)
Use Alan's data with maxent
$ opennlp POSTaggerTrainer -type maxent -model twitter-en-pos-maxent.bin -lang en -data pos_tweets.txt -encoding UTF-8
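OpenNLP's native POS training data is one sentence per line, with whitespace-separated `word_tag` tokens. A small sketch for producing `pos_tweets.txt` in that format, assuming the tagged tweets are available as lists of `(word, tag)` pairs (the layout of Alan's data is not described here, and the function names are hypothetical):

```python
from typing import Iterable, List, Tuple


def to_opennlp_pos_line(tagged: Iterable[Tuple[str, str]]) -> str:
    """Format one tagged tweet as a line of OpenNLP POS training data: word_tag word_tag ..."""
    parts: List[str] = []
    for word, tag in tagged:
        # Tokens are whitespace-separated in this format, so a token must not contain spaces.
        if not word or any(ch.isspace() for ch in word):
            raise ValueError(f"bad token: {word!r}")
        parts.append(f"{word}_{tag}")
    return " ".join(parts)


def write_pos_training_file(tweets: Iterable[Iterable[Tuple[str, str]]], path: str) -> None:
    """Write one tweet per line, UTF-8 encoded (to match -encoding UTF-8)."""
    with open(path, "w", encoding="utf-8") as f:
        for tagged in tweets:
            f.write(to_opennlp_pos_line(tagged) + "\n")
```

For example, `write_pos_training_file(tweets, "pos_tweets.txt")` before running the trainer command.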
Chunker - 85%
Use Alan's data with maxent
$ opennlp ChunkerTrainerME -model twitter-en-chunker.bin -lang en -data pos_chunk_tweets.txt -encoding UTF-8
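The OpenNLP chunker trains on CoNLL-2000-style data: one `word POS chunk` line per token and a blank line between sentences. A sketch for producing `pos_chunk_tweets.txt`, assuming `(word, pos, chunk)` triples per tweet; the BIO check is an optional, hypothetical sanity step, not something OpenNLP requires to be run separately.

```python
from typing import Iterable, List, Sequence, Tuple


def check_bio(chunk_tags: Sequence[str]) -> None:
    """Raise if an I-X tag does not continue a B-X or I-X chunk (simple sanity check)."""
    prev = "O"
    for i, tag in enumerate(chunk_tags):
        if tag.startswith("I-") and prev[2:] != tag[2:]:
            raise ValueError(f"{tag} at position {i} follows {prev}")
        prev = tag


def to_conll2000(sentences: Iterable[List[Tuple[str, str, str]]]) -> str:
    """One 'word POS chunk' line per token, blank line after each sentence."""
    lines: List[str] = []
    for sent in sentences:
        check_bio([chunk for _, _, chunk in sent])
        lines.extend(f"{word} {pos} {chunk}" for word, pos, chunk in sent)
        lines.append("")  # sentence separator
    return "\n".join(lines) + "\n"
```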
get features from the chunker and the POS tagger
run a maxent classifier
use Viterbi to find the best tag sequence
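The last step can be sketched as MEMM-style Viterbi decoding: the maxent classifier is assumed to supply `log P(tag | previous tag, token features)` at each position, and Viterbi picks the sequence with the highest total score. Whether the classifier here conditions on the previous tag is not stated in these notes, so that is an assumption; `log_prob` is a hypothetical stand-in for the classifier.

```python
import math
from typing import Callable, Dict, List, Sequence

START = "<S>"  # pseudo-tag before the first token


def viterbi(
    n_tokens: int,
    tags: Sequence[str],
    log_prob: Callable[[int, str, str], float],
) -> List[str]:
    """Return the highest-scoring tag sequence.

    log_prob(i, prev_tag, tag) stands in for the maxent classifier:
    log P(tag at position i | prev_tag, features of token i).
    """
    if n_tokens == 0:
        return []
    best: List[Dict[str, float]] = [{t: log_prob(0, START, t) for t in tags}]
    back: List[Dict[str, str]] = [{}]
    for i in range(1, n_tokens):
        best.append({})
        back.append({})
        for t in tags:
            # Best previous tag for reaching tag t at position i.
            prev = max(tags, key=lambda p: best[i - 1][p] + log_prob(i, p, t))
            best[i][t] = best[i - 1][prev] + log_prob(i, prev, t)
            back[i][t] = prev
    # Trace back from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(n_tokens - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]
```

With a toy table where the first token prefers N (0.6 vs 0.4) but V→V is very likely (0.9), Viterbi returns `["V", "V"]` (score 0.36) where a greedy left-to-right tagger would commit to N first (best continuation 0.30).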
Second attempt:
Use TweetNLP (http://www.cs.cmu.edu/~ark/TweetNLP/#pos) to train the POS tagger; find a Twitter chunker?
Use the TweeboParser dependency parser to get features (http://www.cs.cmu.edu/~ark/TweetNLP/#tweeboparser_tweebank)
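Possible Features (ARK tags: ^ = proper noun, N = common noun, V = verb, D = determiner)
MWEs (multiword expressions), and a ^ or N that depends on a V or on some other head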
N vs ^ that depends on a V
Tags of dependency heads
Proper noun ^ that depends on determiner D
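The feature ideas above can be sketched as per-token boolean features over a parsed tweet. This assumes tokens carry an ARK POS tag, a 1-based head index (0 = root, -1 = not attached to the tree), and a dependency label where "MWE" marks multiword expressions; those conventions, the `Token` type, and the feature names are assumptions for illustration, not confirmed TweeboParser output details.

```python
from typing import Dict, List, NamedTuple


class Token(NamedTuple):
    form: str
    tag: str           # ARK Twitter POS tag, e.g. "^", "N", "V", "D"
    head: int          # 1-based index of the head; 0 = root; -1 = not attached (assumed)
    deprel: str = "_"  # dependency label; "MWE" assumed to mark multiword expressions


def head_tag(tokens: List[Token], i: int) -> str:
    h = tokens[i].head
    if h == 0:
        return "ROOT"
    if h < 0:
        return "NONE"
    return tokens[h - 1].tag


def dependency_features(tokens: List[Token]) -> List[Dict[str, bool]]:
    """One feature dict per token, mirroring the 'Possible Features' list."""
    feats = []
    for i, tok in enumerate(tokens):
        ht = head_tag(tokens, i)
        feats.append({
            f"head_tag={ht}": True,                                              # tag of dependency head
            "noun_or_propn_in_mwe": tok.tag in ("^", "N") and tok.deprel == "MWE",
            "propn_head_V": tok.tag == "^" and ht == "V",                        # ^ depends on a V
            "noun_head_V": tok.tag == "N" and ht == "V",                         # N depends on a V
            "propn_head_D": tok.tag == "^" and ht == "D",                        # ^ depends on a D
        })
    return feats
```

These dicts could then be merged with the POS and chunk features fed to the maxent classifier.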