This repository has been archived by the owner on Jan 14, 2021. It is now read-only.

PR parameters

Carmen-digitalPebble edited this page Jul 12, 2012 · 2 revisions

TrainingCorpusCreator PR

Init params

  • directory : directory where the vector and lexicon files will be generated
  • reinitCorpus : delete existing files in the directory when reinitialising the PR

Runtime params

  • inputAnnotationSet : annotation set from which the label and attribute annotations are taken
  • labelAnnotationType : annotation type to use as a training unit, e.g. sentence or paragraph
  • labelAnnotationValue : label to use for the training, e.g. language
  • attributeAnnotationType : annotation type to use for generating attributes e.g. Token
  • attributeAnnotationValue : feature to use for generating attributes e.g. string

Note : the TrainingCorpusCreator takes a single value for each of the parameters above, so some preprocessing may be needed to combine different annotation types (e.g. Token.string + Token.pos) into a single annotation. The TrainingCorpusCreator also does not generate the model itself: the model must be built separately with a number of manual commands, described at https://github.com/DigitalPebble/TextClassification/wiki/HOWTO.
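The preprocessing mentioned in the note can be pictured with a small sketch in plain Java. It only illustrates merging two feature values (such as Token.string and Token.pos) into one value that could then be stored on a single annotation. The Token class, the combine helpers and the "_" separator are hypothetical and are not part of GATE or of this plugin.

```java
import java.util.ArrayList;
import java.util.List;

public class CombineTokenFeatures {

    // Hypothetical stand-in for a Token annotation with two features.
    public static final class Token {
        public final String string;
        public final String pos;

        public Token(String string, String pos) {
            this.string = string;
            this.pos = pos;
        }
    }

    // Joins the two feature values into one, e.g. "cat" + "NN" gives "cat_NN".
    public static String combine(Token token, String separator) {
        return token.string + separator + token.pos;
    }

    // Applies combine() to every token, preserving order.
    public static List<String> combineAll(List<Token> tokens, String separator) {
        List<String> combined = new ArrayList<>();
        for (Token token : tokens) {
            combined.add(combine(token, separator));
        }
        return combined;
    }

    public static void main(String[] args) {
        List<Token> tokens = new ArrayList<>();
        tokens.add(new Token("The", "DT"));
        tokens.add(new Token("cat", "NN"));
        tokens.add(new Token("sleeps", "VBZ"));
        // Each combined value could be written to one feature of a new annotation,
        // which is then named in attributeAnnotationType / attributeAnnotationValue.
        System.out.println(combineAll(tokens, "_"));
    }
}
```

In a real pipeline the combined value would be written to a feature of a new or existing annotation by an earlier processing step, and that annotation type and feature would be the ones given to the TrainingCorpusCreator.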

Classifier PR

Init params

  • modelDir : directory containing the model and lexicon files

Runtime params

  • inputAnnotationSet : annotation set from which the label and attribute annotations are taken
  • labelAnnotationType : annotation type to use as a unit for classification, e.g. sentence or paragraph
  • labelAnnotationValue : label to use for the classification, e.g. language. This will override any preexisting feature with the same name
  • attributeAnnotationType : annotation type to use for generating attributes e.g. Token
  • attributeAnnotationValue : feature to use for generating attributes e.g. string
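
As a rough illustration, the Classifier PR parameters listed above can be collected as name/value pairs. The sketch below uses plain Java maps only. The values are examples taken from the descriptions ("Sentence", "language", "Token", "string"), while the annotation set name, the model path and the class itself are hypothetical. How the maps are handed to the PR (for instance through a GATE FeatureMap) is not shown and would depend on the setup.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ClassifierParamsSketch {

    // Init parameter: where the model and lexicon files live.
    public static Map<String, Object> initParams(String modelDir) {
        Map<String, Object> params = new LinkedHashMap<>();
        params.put("modelDir", modelDir);
        return params;
    }

    // Runtime parameters, one value each (illustrative values only).
    public static Map<String, Object> runtimeParams() {
        Map<String, Object> params = new LinkedHashMap<>();
        params.put("inputAnnotationSet", "MyAnnotations");   // hypothetical set name
        params.put("labelAnnotationType", "Sentence");       // unit to classify
        params.put("labelAnnotationValue", "language");      // feature receiving the label
        params.put("attributeAnnotationType", "Token");      // source of attributes
        params.put("attributeAnnotationValue", "string");    // feature used as attribute
        return params;
    }

    public static void main(String[] args) {
        // "/path/to/model" is a placeholder, not a real location.
        System.out.println(initParams("/path/to/model"));
        System.out.println(runtimeParams());
    }
}
```

The attribute settings used at classification time would normally match the ones used when the training corpus was created, since the lexicon is built from those attributes.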