Word embedding models and text data from charter school websites for workshop and hackathon of TextXD 2018 at BIDS, UC Berkeley.

TextXD/charters4textxd2018

 
 



Word Embeddings: Workshop and Exploration of Charter Schools

This repository contains a workshop (described below) that introduces word embedding models, along with hack session starter code for loading and exploring word embedding models with charter school data. Some data are contained in the repo; the rest will be linked into the Jupyter instance we'll set up to start the workshop. The charter school data come from author Jaren Haber's web-scraping of charter school websites, and the embeddings were created with the word2vec implementation in gensim. The repository was prepared for TextXD 2018 (http://www.textxd.org/) at the Berkeley Institute for Data Science (BIDS), UC Berkeley.

Introduction to word embeddings (workshop)

Overview

This one-hour workshop introduces word embeddings in Python and explores the features produced by the word2vec model. We'll mainly use the Akkadian ORACC corpus, compiled by Professor Niek Veldhuis, UC Berkeley Near Eastern Studies. We'll also look briefly at a word2vec model trained on the ECCO-TCP corpus of 2,350 eighteenth-century literary texts, made available by Ryan Heuser.

Learning Goals

  • Learn the intuition behind word embedding models (WEMs)
  • Learn how to implement a WEM using the gensim implementation of word2vec
  • Explore a corpus you've probably never seen before
  • Think through how visualization of WEMs might help you explore your corpus
  • Implement text analysis on a non-English language
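To make the intuition behind the first goal concrete, here is a small sketch that is not taken from the workshop notebooks. In a word embedding model each word is a vector, and words used in similar contexts end up with vectors pointing in similar directions, which is commonly measured with cosine similarity. The three-dimensional vectors below are invented by hand for illustration; real models learn vectors with many more dimensions from a corpus.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hand-made toy vectors (hypothetical values, not learned from data).
toy_vectors = {
    "school": [0.9, 0.1, 0.3],
    "academy": [0.8, 0.2, 0.4],
    "river": [0.1, 0.9, 0.2],
}


def most_similar(word, vectors):
    """Rank every other word by cosine similarity to `word`, highest first."""
    query = vectors[word]
    scores = [
        (other, cosine_similarity(query, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


for other, score in most_similar("school", toy_vectors):
    print(f"{other}: {score:.3f}")
```

Here "academy" ranks above "river" for the query "school" because its vector points in nearly the same direction. A nearest-neighbor query on a trained model follows the same idea, just over learned vectors.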

Prerequisites

All are welcome! You don't need to know how neural nets work or be a Python expert to benefit from this workshop. We'll focus on the concepts behind word embeddings more than the specific syntax. This workshop will be most useful to people who have some familiarity with Python but have never done word embeddings before.

Contributing

If you notice a problem with these materials, please open an issue describing the problem. Collaboration and transparency are worth everyone's time!

Workshop leader

  • Jaren Haber

Acknowledgments

