This repository contains various Jupyter notebooks pertaining to the use of stylometric analysis for Chinese literature. These notebooks were originally created for my DHAsia workshop at Stanford in February 2016.
Feel free to use them and modify them for your own research, but please cite this repository.
The Python Basics notebook contains a quick rundown of the basic knowledge necessary to understand the code used in the other notebooks.
The Stylometry notebook contains a detailed explanation of how to conduct stylometric analysis. This includes both hierarchical cluster analysis and principal component analysis.
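Both hierarchical cluster analysis and principal component analysis start from the same input: a table of how often each text uses the most common characters. The sketch below shows one way to build such a table using only the standard library. The toy strings, the names `texts`, `N_FEATURES`, `relative_frequencies`, and `euclidean`, and the choice of single characters as features are assumptions for illustration, not necessarily what the notebooks do.

```python
from collections import Counter
import math

# Toy corpus: made-up strings standing in for full texts (not real data).
texts = {
    "text_a": "之乎者也之乎者也之之乎",
    "text_b": "之乎者也之乎之者也之之",
    "text_c": "的了是的了是的的了是我",
}

N_FEATURES = 5  # how many of the corpus-wide most common characters to keep


def relative_frequencies(text):
    """Return each character's share of the text, ignoring whitespace."""
    counts = Counter(ch for ch in text if not ch.isspace())
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}


# Pick the feature set from the corpus as a whole.
corpus_counts = Counter()
for text in texts.values():
    corpus_counts.update(ch for ch in text if not ch.isspace())
features = [ch for ch, _ in corpus_counts.most_common(N_FEATURES)]

# One row per text, one column per feature character.
table = {}
for name, text in texts.items():
    freqs = relative_frequencies(text)
    table[name] = [freqs.get(ch, 0.0) for ch in features]


def euclidean(u, v):
    """Plain Euclidean distance between two frequency rows."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


for name, row in table.items():
    print(name, [round(x, 3) for x in row])
```

Rows that lie close together under `euclidean` use the common characters in similar proportions, which is the signal that clustering and PCA then summarize.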
The two streamlined files contain just the code needed to run PCA and HCA. Variables that the user should adjust are highlighted with comments.
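For orientation, a minimal sketch of what a streamlined PCA and HCA run can look like is given here, with the adjustable variables grouped at the top. It assumes NumPy and SciPy are installed, uses a small made-up frequency matrix, computes PCA through a singular value decomposition rather than a dedicated PCA class, and z-scores the columns first. All variable names and values are hypothetical and may differ from those in the actual files.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# --- Variables to adjust (hypothetical names) ---
N_COMPONENTS = 2         # number of principal components to keep
LINKAGE_METHOD = "ward"  # linkage criterion for the cluster analysis
N_CLUSTERS = 2           # number of flat clusters to cut the tree into

labels = ["text_a", "text_b", "text_c", "text_d"]

# Made-up relative frequencies: one row per text, one column per character.
freqs = np.array([
    [0.36, 0.27, 0.18, 0.02],
    [0.40, 0.22, 0.17, 0.03],
    [0.05, 0.04, 0.30, 0.33],
    [0.06, 0.02, 0.28, 0.36],
])

# Standardize each column so frequent characters do not dominate.
z = (freqs - freqs.mean(axis=0)) / freqs.std(axis=0)

# PCA via SVD of the standardized (hence centered) matrix.
u, s, vt = np.linalg.svd(z, full_matrices=False)
scores = u[:, :N_COMPONENTS] * s[:N_COMPONENTS]  # text coordinates
explained = s ** 2 / np.sum(s ** 2)              # variance share per component

# HCA: build the merge tree, then cut it into flat clusters.
tree = linkage(z, method=LINKAGE_METHOD)
clusters = fcluster(tree, t=N_CLUSTERS, criterion="maxclust")

for label, xy, c in zip(labels, scores, clusters):
    print(label, np.round(xy, 2), "cluster", c)
```

The `scores` array can be passed to a scatter plot, and `tree` to `scipy.cluster.hierarchy.dendrogram`, to get the usual PCA and dendrogram figures.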
The general shape of my approach has been influenced by many folks, but the methods described in this workshop were strongly influenced by stylometric work done by Christof Schöch (http://dragonfly.hypotheses.org/), the Computational Stylistics Group's "stylo" package (https://sites.google.com/site/computationalstylistics/), as well as:
J.F. Burrows and D.H. Craig, “Lyrical Drama and the ‘Turbid Mountebanks’: Styles of Dialogue in Romantic and Renaissance Tragedy,” Computers and the Humanities 28 (1994): 63-86.
JNG Binongo and MWA Smith, “The Application of Principal Component Analysis to Stylometry,” Literary and Linguistic Computing 14 (1999): 445-466.