Web Scraping and Text Analysis

This repository shares R and Python code that extracts text from the web.

JPE_Scraping_Visualizing.R scrapes the titles of articles published in the Journal of Political Economy into a data frame and visualizes the most frequently used words.
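The word-frequency step can be sketched in a few lines. This is a minimal Python illustration of the idea, not the R script's actual method: the titles are made-up placeholders rather than scraped data, and the `word_frequencies` helper and the stopword list are assumptions introduced for this example.

```python
import re
from collections import Counter

# Placeholder titles for illustration only; these are not scraped data.
titles = [
    "Taxes and Growth in Open Economies",
    "Growth, Trade, and Taxes",
    "A Theory of Trade",
]

# A tiny, hypothetical stopword list; a real analysis would use a fuller one.
STOPWORDS = {"a", "and", "in", "of", "the"}


def word_frequencies(titles):
    """Count lowercase words across all titles, skipping stopwords."""
    words = []
    for title in titles:
        tokens = re.findall(r"[a-z]+", title.lower())
        words.extend(t for t in tokens if t not in STOPWORDS)
    return Counter(words)


freq = word_frequencies(titles)
print(freq.most_common(3))
```

The resulting counts are the kind of input a word cloud or bar chart would be drawn from.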

oldbaileyonline.py contains a function that randomly selects and downloads 1000 court trials out of the 202790 trials digitized and published at www.oldbaileyonline.org. It also visualizes how the time between an incident and its trial changes over the years.
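The random selection can be sketched as follows. This is a minimal sketch, assuming the trials can be addressed by an index from 1 to 202790; the `sample_trial_indices` name and the `seed` parameter are hypothetical and not taken from oldbaileyonline.py, and the download step is left out because it depends on how the site is queried.

```python
import random

TOTAL_TRIALS = 202790  # number of trials stated in this README
SAMPLE_SIZE = 1000     # number of trials to download


def sample_trial_indices(total=TOTAL_TRIALS, k=SAMPLE_SIZE, seed=None):
    """Draw k distinct trial indices from 1..total without replacement.

    Assumption: trials are numbered consecutively starting at 1.
    """
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    return sorted(rng.sample(range(1, total + 1), k))


indices = sample_trial_indices(seed=42)
print(len(indices), indices[0], indices[-1])
```

Sampling without replacement guarantees that no trial is downloaded twice, and passing a seed lets the same sample be drawn again later.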

Install the following packages to run the R code: rvest, xml2, quanteda, readtext, devtools, tm, wordcloud, magrittr.