Commit

Updating readme.
tedunderwood committed May 12, 2016
1 parent d96d231 commit 38d238c
Showing 1 changed file with 4 additions and 1 deletion.
5 changes: 4 additions & 1 deletion readme.md
@@ -7,7 +7,7 @@ The data model here assumes that genre designations are situated and perspectiva

In short, every work can carry any number of genre tags, from zero upward. The compatibility of different definitions becomes an empirical question. Do different observers actually agree? Can a model trained on one observer's claims about detective fiction also predict the boundaries of 'crime fiction', as defined by someone else?

-We use predictive modeling to test these questions. If you want to replicate the results here you'll need Python 3 and a copy of this repository. Running code/replicate.py will give you a range of modeling options keyed to particular sections of the article. The script will draw on metadata in meta/finalmeta.csv, and wordcount files in the newdata directory.
+We use predictive modeling to test these questions. If you want to replicate the results here you'll need Python 3 and a copy of this repository. Running code/replicate.py will give you a range of modeling options keyed to particular sections of the article. The script will draw on metadata in meta/finalmeta.csv, wordcount files in the newdata directory, and the provided lexicon. Note that the selection of volumes in the negative contrast set can be stochastic, if more are available than needed to match the positive volumes. (For that matter, the positive set can at times be a random subset too.) So please don't expect replication to exactly match every figure down to the decimal point.
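
The stochastic selection described in the added sentence can be illustrated with a minimal sketch. This is not the logic in code/replicate.py; the function name `match_negative_set`, its parameters, and the volume ids are all hypothetical, and it only assumes that the negative set is capped at the size of the positive set.

```python
import random


def match_negative_set(positive_ids, candidate_negative_ids, seed=None):
    """Pick a negative contrast set no larger than the positive set.

    Hypothetical helper for illustration only; not taken from replicate.py.
    """
    rng = random.Random(seed)  # a private generator, so a seed makes the draw repeatable
    candidates = sorted(candidate_negative_ids)  # fixed order before sampling
    needed = len(positive_ids)
    if len(candidates) <= needed:
        # Nothing to choose between: every available volume is used.
        return candidates
    # More candidates than needed: draw a random subset without replacement.
    return sorted(rng.sample(candidates, needed))
```

Without a seed, two runs can draw different negative volumes, which is one reason replicated figures may differ slightly from run to run.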

Because many of the books here are under copyright or otherwise encumbered with intellectual property agreements, I have to share wordcounts rather than original texts. If you want to consult texts in HathiTrust before 1922, it's usually possible to find them by pasting the Hathi volume id into a link of this form:

@@ -35,3 +35,6 @@ plot
----
(Mostly R) scripts for plotting in the sense of "dataviz." Has nothing to do with fictional plots.

+lexicon
+-------
+The set of features that was used to produce the article; the top 10,000 words by document frequency in the whole corpus.
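
As a rough illustration of "top 10,000 words by document frequency", the sketch below counts how many documents each word appears in and keeps the most widespread ones. It is an assumption-laden example, not the script that built the provided lexicon: the function name `top_words_by_document_frequency` is hypothetical, and each document is assumed to be a dict mapping words to counts.

```python
from collections import Counter


def top_words_by_document_frequency(documents, n=10000):
    """Return the n words that occur in the largest number of documents.

    Hypothetical helper: `documents` is an iterable of {word: count} dicts.
    """
    doc_freq = Counter()
    for word_counts in documents:
        # Count each word once per document, however often it occurs there.
        doc_freq.update({word for word, count in word_counts.items() if count > 0})
    # Sort by descending document frequency; break ties alphabetically.
    ranked = sorted(doc_freq.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:n]]
```

Document frequency differs from raw term frequency: a word used a thousand times in a single volume still counts only once.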
