Independent final project for the UC Berkeley Natural Language Processing with Deep Learning graduate course.

emilyarobles/simplifying_science

Simplifying Science: Utilizing BERT and SciBERT for Scientific and Plain Language Text Classification

Independent final project for the UC Berkeley Natural Language Processing with Deep Learning graduate course.

Scientific jargon is a significant barrier to the accessibility of scientific literature, yet researchers often find it difficult to recognize in their own writing. This study explores how well natural language processing (NLP) models distinguish scientific text from plain language text, using the Plain Language Adaptation of Biomedical Abstracts (PLABA) dataset. I compared BERT (Bidirectional Encoder Representations from Transformers) with SciBERT, a BERT variant pre-trained on scientific corpora, on the task of classifying a text as either scientific or plain language. My methodology involved preprocessing the texts, implementing a simple neural network as a baseline, and then fine-tuning both the BERT and SciBERT models. The baseline model, built with Word2Vec and NLTK, achieved modest accuracy, as expected. BERT improved on it substantially, reaching a test accuracy of 97.01% with high F1 scores and recall, which indicates its strength in contextual understanding. SciBERT slightly outperformed BERT, highlighting the advantage of domain-specific models in NLP tasks. This research offers insights into optimizing NLP models for scientific text identification, which could lead to better plain language tools to aid scientific communication.
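The abstract reports accuracy, F1, and recall for a two-class problem. As a minimal sketch of how those metrics relate to one another, the snippet below computes them from a list of true and predicted labels. The label encoding (1 = scientific, 0 = plain language), the `binary_metrics` helper, and the toy labels are assumptions made for illustration; they are not taken from the project code or its results.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> BinaryMetrics:
    """Compute accuracy, precision, recall, and F1 for a binary classifier.

    Assumed (hypothetical) encoding: 1 = scientific text, 0 = plain language text.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one example is required")

    # Count the four confusion-matrix cells relative to the positive class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = len(y_true) - tp - fp - fn

    accuracy = (tp + tn) / len(y_true)
    # Fall back to 0.0 when a denominator is zero (no positive predictions or labels).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return BinaryMetrics(accuracy, precision, recall, f1)


# Toy labels for illustration only; these are not PLABA data or model outputs.
toy_true = [1, 1, 1, 0, 0, 0, 1, 0]
toy_pred = [1, 1, 0, 0, 0, 1, 1, 0]
toy_metrics = binary_metrics(toy_true, toy_pred)
print(toy_metrics)
```

Because recall counts only missed scientific texts and precision counts only plain texts wrongly flagged as scientific, reporting F1 alongside accuracy guards against a classifier that looks accurate simply because one class dominates the test set.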