This is Elena's term project: an analysis of how non-native English speakers acquire English articles.
Title: English as a Second Language (ESL) Article Acquisition
By: Elena Cimino (e.cimino@pitt.edu)
Date: 2019.04.26
English article acquisition is notoriously difficult for non-native English learners. The four articles in English (definite *the*, indefinite *a* and *an*, and the zero article ∅) map to approximately 90 different functions (Zhao & MacWhinney, 2018). It's no wonder that articles can be hard to acquire!
This project investigates the acquisition of English articles by learners from three different first-language (L1) backgrounds: Spanish (a [+art] language with definite, indefinite, and zero articles), Korean (a [-art] language, meaning it lacks any overt article system), and Arabic (a [+art] language with definite and zero articles but no indefinite article).
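At its core, this means identifying which article (definite, indefinite, or zero) introduces each noun phrase in a learner's text. As a rough illustration only (this is not code from the project; the spaCy model and the noun-chunk heuristic are my own assumptions), such a count might look like:

```python
# Minimal sketch of counting article use per noun phrase with spaCy.
# Setup assumed: pip install spacy && python -m spacy download en_core_web_sm
from collections import Counter

import spacy

nlp = spacy.load("en_core_web_sm")

def count_articles(text):
    """Tally definite, indefinite, and zero articles across noun chunks."""
    doc = nlp(text)
    counts = Counter()
    for chunk in doc.noun_chunks:
        # Heuristic: look for an article token anywhere in the base NP.
        articles = [t for t in chunk if t.lower_ in ("the", "a", "an")]
        if not articles:
            counts["zero"] += 1
        elif articles[0].lower_ == "the":
            counts["definite"] += 1
        else:
            counts["indefinite"] += 1
    return counts

print(count_articles("The cat chased a mouse into an alley full of boxes."))
# -> definite: 1, indefinite: 2, zero: 1 ("boxes" has no article)
```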
The corpora used for this project are the BuiD Arabic Learner Corpus (BALC) and the Pitt ELI Corpus (PELIC).
- Project Plan: a description of the dataset and my initial project plan
- Progress Report: updates on my project throughout the term
- Guestbook: a guestbook for comments, questions, and suggestions
- Project Presentation: a PDF version of my final project presentation
- Final Report: an in-depth explanation of my project, with background information
- Quantitative Analysis || nbviewer version: the notebook where I carried out my quantitative analysis
- exploratory-analysis: some initial exploration and processing that I undertook
- images: images and graphs from my Jupyter notebooks
- data_samples: six sample CEPA texts from BALC, one from each level (1-6)
- data: CSV files of samples pulled from BALC for analysis (see the loading sketch after this list)
- analysis: a hub for the qualitative and quantitative analysis for this project
- archive: a hub for old code and files that I did not end up using or that were only for testing
- exploring_balc.ipynb || nbviewer version
  - Exploring the BALC corpus
- balc_clean.ipynb || nbviewer version
  - Processing and reformatting files from BALC for my needs
- exploring_spaCy.ipynb || nbviewer version
  - Exploring the Python library spaCy and comparing its tokenizer and lemmatizer to NLTK's (see the comparison sketch after this list)
- pelic_data.ipynb || nbviewer version
  - Exploring the PELIC data and targeting the three L1s I need (Arabic, Spanish, Korean; see the filtering sketch after this list)
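To give a flavor of these pieces, here are a few illustrative sketches. First, the CSV samples in data could be loaded and sanity-checked with pandas along these lines (the filename and the 'level' column are placeholders, not the project's actual schema):

```python
# Hypothetical example of reading one of the sampled BALC CSVs with
# pandas; the filename and column names below are placeholders, not
# this project's actual schema.
import pandas as pd

df = pd.read_csv("data/balc_sample.csv")

# Quick sanity checks: size, columns, and texts per CEPA level (1-6).
print(df.shape)
print(df.columns.tolist())
print(df["level"].value_counts().sort_index())
```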
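Next, the tokenizer/lemmatizer comparison in exploring_spaCy.ipynb boils down to something like this side-by-side (illustrative only; the notebook's actual code may differ):

```python
# Side-by-side tokenization and lemmatization with spaCy vs. NLTK.
# Setup assumed: pip install spacy nltk && python -m spacy download en_core_web_sm
import nltk
import spacy
from nltk.stem import WordNetLemmatizer

# NLTK data packages need a one-time download (newer NLTK versions
# may also require "punkt_tab" and "omw-1.4").
nltk.download("punkt", quiet=True)
nltk.download("wordnet", quiet=True)

sentence = "The students were given an exam."

# spaCy: tokens and lemmas come from a single pipeline pass.
nlp = spacy.load("en_core_web_sm")
print([(t.text, t.lemma_) for t in nlp(sentence)])
# [('The', 'the'), ('students', 'student'), ('were', 'be'), ...]

# NLTK: tokenize first, then lemmatize token by token. Without POS
# tags, WordNetLemmatizer defaults to nouns, so 'were' and 'given'
# stay unchanged while spaCy reduces them to 'be' and 'give'.
lemmatizer = WordNetLemmatizer()
print([(w, lemmatizer.lemmatize(w)) for w in nltk.word_tokenize(sentence)])
```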
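Finally, the L1 targeting in pelic_data.ipynb amounts to a subset operation like the following (the filename and the 'L1' column are assumptions, not PELIC's documented schema):

```python
# Illustrative sketch of narrowing PELIC to the three target L1s;
# 'pelic.csv' and the 'L1' column name are assumptions for the example.
import pandas as pd

pelic = pd.read_csv("pelic.csv")
targets = ["Arabic", "Spanish", "Korean"]

# Keep only rows written by learners with one of the target L1s.
subset = pelic[pelic["L1"].isin(targets)]
print(subset["L1"].value_counts())
```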
This project is licensed under the Creative Commons Attribution-ShareAlike 4.0 International license; see the LICENSE file for rights and limitations.