Word Embeddings: Workshop and Exploration of Charter Schools

This repository includes a workshop (more info below) introducing word embedding models, as well as hack-session starter code for loading and exploring word embedding models with charter school data. Some data are contained in the repo; others will be linked into the Jupyter instance we'll set up at the start of the workshop. The charter school data come from author Jaren Haber's web-scraping of charter school websites, and the embeddings were created with the word2vec implementation in gensim. The repository was prepared for TextXD 2018 (http://www.textxd.org/) at the Berkeley Institute for Data Science (BIDS), UC Berkeley.

Introduction to word embeddings (workshop)

Overview

This one-hour workshop introduces word embeddings in Python and explores the features produced by the word2vec model. We'll mainly use the Akkadian ORACC corpus, put together by Professor Niek Veldhuis of UC Berkeley Near Eastern Studies. We'll also look briefly at a word2vec model trained on the ECCO-TCP corpus of 2,350 eighteenth-century literary texts made available by Ryan Heuser.

Learning Goals

Learn the intuition behind word embedding models (WEMs)
Learn how to implement a WEM using the gensim implementation of word2vec
Explore a corpus you've probably never seen before
Think through how visualization of WEMs might help you explore your corpus
Apply text analysis to a non-English language
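
As a rough illustration of the intuition behind the first learning goal (words that appear in similar contexts end up with similar vectors), here is a minimal sketch using only the Python standard library. It is not word2vec and not the workshop's code: it simply counts co-occurrences and compares them with cosine similarity. The toy corpus, the window size, and the helper names (`cooccurrence_vectors`, `cosine`, `most_similar`) are all assumptions made up for this example.

```python
import math
from collections import Counter, defaultdict

# Toy corpus invented for this illustration (not workshop data).
sentences = [
    "the king rules the city".split(),
    "the queen rules the city".split(),
    "the farmer plows the field".split(),
    "the ox plows the field".split(),
]


def cooccurrence_vectors(sentences, window=2):
    """Map each word to counts of the words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][tokens[j]] += 1
    # Return a plain dict so that unknown words raise KeyError.
    return dict(vectors)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(word, vectors, topn=3):
    """Rank the other words by cosine similarity to `word`."""
    target = vectors[word]
    scores = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]


vectors = cooccurrence_vectors(sentences)
print(most_similar("king", vectors))
```

In this toy corpus "king" and "queen" occur in identical contexts, so they rank as each other's nearest neighbor. A word2vec model learns dense vectors by training a small neural network instead of counting, but the nearest-neighbor style of exploration is similar in spirit.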

Prerequisites

All are welcome! You don't need to know how neural nets work or be a Python expert to benefit from this workshop. We'll focus on the concepts behind word embeddings more than the specific syntax. This workshop will be most useful to people who have some familiarity with Python but have never worked with word embeddings before.

Contributing

If you notice a problem with these materials, please open an issue describing the problem. Collaboration and transparency are worth everyone's time!

Workshop leader

Jaren Haber

Acknowledgments

Laura Nelson and the D-Lab
