We created a dataset of postpartum depression (PPD) forum threads with biomedical term annotations.
This corpus includes 10,548 pre-processed BabyCenter.com forum threads (posts and comments) and their MetaMap Lite (MMLite) annotations.
The "PPD-NER-Corpus-1.0" folder contains pre-processed text files for each PPD forum thread (.txt files), annotation files (.ann files), and the MMLite configuration file (annotation.conf).
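The following is a minimal sketch of how a thread and its annotations might be loaded in Python. It assumes the .ann files use the brat standoff layout (tab-separated lines such as `T1<TAB>Label start end<TAB>covered text`), which the .ann/annotation.conf naming suggests but which should be checked against the actual files. The names `Entity`, `parse_ann`, and `load_thread` are hypothetical helpers, not part of the corpus, and UTF-8 encoding is also an assumption.

```python
from pathlib import Path
from typing import NamedTuple


class Entity(NamedTuple):
    """One text-bound annotation (hypothetical container, not part of the corpus)."""
    ann_id: str
    label: str
    start: int
    end: int
    text: str


def parse_ann(ann_text: str) -> list[Entity]:
    """Parse text-bound ("T") lines, assuming brat standoff format."""
    entities = []
    for line in ann_text.splitlines():
        # Skip notes, relations, and any other non text-bound lines.
        if not line.startswith("T"):
            continue
        parts = line.split("\t")
        if len(parts) < 3:
            continue
        ann_id, type_and_span, text = parts[0], parts[1], parts[2]
        label, _, span = type_and_span.partition(" ")
        # Discontinuous spans look like "0 5;8 12"; keep the outermost offsets.
        offsets = [int(tok) for frag in span.split(";") for tok in frag.split()]
        if len(offsets) < 2:
            continue
        entities.append(Entity(ann_id, label, offsets[0], offsets[-1], text))
    return entities


def load_thread(txt_path: Path) -> tuple[str, list[Entity]]:
    """Read a thread's .txt file and the .ann file that shares its base name."""
    text = txt_path.read_text(encoding="utf-8")  # encoding is an assumption
    ann_path = txt_path.with_suffix(".ann")
    return text, parse_ann(ann_path.read_text(encoding="utf-8"))
```

Under these assumptions, `load_thread(Path("PPD-NER-Corpus-1.0") / "<thread>.txt")` returns the thread text together with its list of annotated terms, where `<thread>` stands for any file name in the folder.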
MMLite [1] was used to annotate biomedical terms in threads from the following public BabyCenter.com online health community discussion boards:
- Postpartum Depression, Anxiety and Related Topics [Internet]. BabyCenter Community. (cited 2018 May 9)
- Postpartum Depression and Postpartum Anxiety Support Group [Internet]. BabyCenter Community. (cited 2018 May 9)
- POSTPARTUM ANXIETY SUPPORT GROUP [Internet]. BabyCenter Community. (cited 2018 May 9)
[1] Demner-Fushman D, Rogers WJ, Aronson AR. MetaMap Lite: an evaluation of a new Java implementation of MetaMap. Journal of the American Medical Informatics Association. 2017 Jan 27;24(4):841-4.
The data can be used as-is under the MIT License attached to the repository. Please cite the following article if you use this dataset:
Chowdhuri S, McCrea S, Fushman DD, Taylor CO. Extracting Biomedical Terms from Postpartum Depression Online Health Communities. AMIA Summits on Translational Science Proceedings. 2019;2019:592.