Social media sites and forums such as Twitter, Reddit, and Mumsnet can be an important source of information for health research. They provide an archive of the thoughts, feelings, and concerns of large parts of the population on a wide range of topics. These archives can be used to explore public sentiment towards a topic, track changes over time, and reveal new areas of concern that traditional research methods may miss.

We can scrape data from websites such as Mumsnet or use the Twitter Academic API to search for tweets relevant to our research question. Natural language processing methods such as Latent Dirichlet Allocation (LDA) can then be used to organise the collections of tweets or posts into topics. Qualitative methods can also be used to interpret the topics found, or applied independently to build an understanding of small numbers of posts or tweets.

Steps:
- Identify sources of interest, e.g. the forum or social media you wish to search.
- Formulate a search strategy: search terms and a method for applying them.
- Collate your posts or tweets.
- Clean and pre-process your posts or tweets.
- Fit an LDA topic model.
- Interpret the topics found.
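
The steps above can be sketched end to end in a few lines of Python. This is a minimal illustration, not a recommended pipeline: the example posts, search terms, stop-word list, function names, and hyperparameters (`alpha`, `beta`, `n_iter`) are all hypothetical, and the model is fitted with a small collapsed Gibbs sampler written with the standard library only so that the example runs without extra dependencies. In practice an established implementation, for example in scikit-learn or gensim, would usually be a better choice.

```python
import random
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is", "it",
              "my", "i", "for", "on", "at", "with", "this", "that", "has", "had"}


def matches_search(text, search_terms):
    """Return True if any search term occurs as a whole word (case-insensitive)."""
    return any(re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE)
               for term in search_terms)


def clean(text):
    """Lower-case, strip URLs and @mentions, keep alphabetic tokens, drop stop words."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    tokens = re.findall(r"[a-z]+", text)
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 2]


def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return one word-count Counter per topic."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    n_dk = [[0] * n_topics for _ in docs]        # topic counts per document
    n_kw = [Counter() for _ in range(n_topics)]  # word counts per topic
    n_k = [0] * n_topics                         # total tokens per topic
    z = []                                       # topic assignment of each token
    for d, doc in enumerate(docs):               # random initial assignment
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                      # remove the token's current assignment
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Conditional probability of each topic for this token (unnormalised).
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                           / (n_k[t] + beta * vocab_size)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k                      # record the resampled assignment
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    return n_kw


# Made-up posts standing in for collated tweets or forum posts.
posts = [
    "My baby has a fever and a cough, is it flu?",
    "Flu vaccine appointment booked at the clinic https://example.org",
    "@friend the vaccine clinic had a long queue",
    "Lovely weather for a walk in the park today",
    "Cough and fever again this week, flu season",
]
search_terms = ["flu", "vaccine", "fever"]

selected = [p for p in posts if matches_search(p, search_terms)]  # apply search strategy
docs = [clean(p) for p in selected]                               # clean and pre-process
topics = fit_lda(docs, n_topics=2)                                # fit the topic model

for k, counts in enumerate(topics):                               # inspect top words
    print(k, [w for w, c in counts.most_common(5) if c > 0])
```

The top words printed for each topic are the starting point for interpretation. With a corpus this small the topics carry no real meaning; the number of topics and the priors would need to be chosen and checked against real data.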