Part of Speech Tagger using Hidden Markov Model (HMM)

This repository contains a Jupyter notebook that implements a Part of Speech (POS) tagger using a Hidden Markov Model (HMM).

Overview

The notebook builds the HMM-based POS tagger in the following steps:

Data Preparation: Reading and processing input data to extract words and their corresponding tags.
Model Initialization: Creating an instance of the HMM model.
State Creation: Defining states with emission probabilities.
Adding Transitions: Setting up transitions between states based on observed data.
Finalizing the Model: Baking the model to make it ready for use.
Evaluation: Evaluating the tagger on a test dataset.

Initialization:
- Import necessary libraries and modules.
Data Preparation:
- Set up data streams to extract words and tags.
- Generate tag and word lists from the data stream.
- Calculate emission counts for the words given tags.
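
As a rough illustration of the emission-count step, the sketch below counts how often each word occurs with each tag. The function name `pair_counts` and the toy word and tag lists are assumptions made for this example; the notebook builds its lists from the data stream.

```python
from collections import Counter, defaultdict


def pair_counts(tags, words):
    """Count how often each word occurs with each tag: counts[tag][word]."""
    counts = defaultdict(Counter)
    for tag, word in zip(tags, words):
        counts[tag][word] += 1
    return counts


# Toy data (hypothetical), standing in for the lists read from the corpus.
words = ["the", "dog", "runs", "the", "run"]
tags = ["DET", "NOUN", "VERB", "DET", "NOUN"]

emission_counts = pair_counts(tags, words)
print(emission_counts["DET"]["the"])  # number of times "the" was tagged DET
```
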
Most Frequent Class Tagger (MFCTagger):
- Implement a simple baseline tagger that assigns the most frequent tag for each word.
- Compare the performance of this baseline with the HMM tagger.
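
A minimal sketch of the baseline idea, assuming training data is available as lists of (word, tag) pairs. The helper names `most_frequent_tags` and `mfc_tag` and the `<UNK>` placeholder for unseen words are hypothetical and do not reproduce the notebook's MFCTagger class.

```python
from collections import Counter, defaultdict


def most_frequent_tags(tagged_sentences):
    """Map each word to the tag it was seen with most often."""
    word_tag_counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            word_tag_counts[word][tag] += 1
    return {
        word: counts.most_common(1)[0][0]
        for word, counts in word_tag_counts.items()
    }


def mfc_tag(words, table, unknown="<UNK>"):
    """Tag each word with its most frequent tag; unseen words get `unknown`."""
    return [table.get(word, unknown) for word in words]


# Toy training data (hypothetical).
train = [
    [("the", "DET"), ("run", "NOUN")],
    [("they", "PRON"), ("run", "VERB")],
    [("we", "PRON"), ("run", "VERB")],
]

table = most_frequent_tags(train)
predicted = mfc_tag(["they", "run", "fast"], table)
print(predicted)
```

Because the baseline looks at each word in isolation, it cannot use the neighboring tags to resolve ambiguous words such as "run"; that context is what the HMM's transitions add.
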
HMM Model Initialization:
- Create an instance of the HMM model.
State Creation:
- For each tag, calculate the emission probabilities and create states using these probabilities.
- Add these states to the HMM model.
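
One common way to obtain emission probabilities is to normalize the word counts for a tag so that they sum to 1. The sketch below shows that normalization in plain Python; the function name `emission_probabilities` and the counts are assumptions, and the creation of the actual model states is left to the notebook's HMM library.

```python
def emission_probabilities(word_counts):
    """Turn the word counts for a single tag into P(word | tag)."""
    total = sum(word_counts.values())
    return {word: count / total for word, count in word_counts.items()}


# Hypothetical counts for the NOUN tag.
noun_counts = {"dog": 3, "run": 1}
noun_emissions = emission_probabilities(noun_counts)
print(noun_emissions)
```
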
Adding Transitions:
- Add transitions from the start state to each tag state.
- Add transitions from each tag state to the end state.
- Add transitions between tag states based on bigram counts.
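
The three kinds of transitions above can be estimated from counts over the training tag sequences. The sketch below uses one common maximum-likelihood estimate, which is an assumption about the notebook's exact formulas: start probabilities are divided by the number of sentences, while end and tag-to-tag probabilities are divided by the number of occurrences of the source tag. The function name `transition_probabilities` is hypothetical.

```python
from collections import Counter


def transition_probabilities(tag_sequences):
    """Estimate start, tag-to-tag, and end transition probabilities."""
    unigrams, bigrams = Counter(), Counter()
    starts, ends = Counter(), Counter()
    for seq in tag_sequences:
        if not seq:
            continue  # skip empty sentences
        unigrams.update(seq)
        bigrams.update(zip(seq, seq[1:]))  # adjacent tag pairs
        starts[seq[0]] += 1
        ends[seq[-1]] += 1

    n_sentences = sum(starts.values())
    start_p = {tag: c / n_sentences for tag, c in starts.items()}
    end_p = {tag: c / unigrams[tag] for tag, c in ends.items()}
    trans_p = {(a, b): c / unigrams[a] for (a, b), c in bigrams.items()}
    return start_p, trans_p, end_p


# Toy tag sequences (hypothetical).
sequences = [["DET", "NOUN", "VERB"], ["DET", "NOUN"]]
start_p, trans_p, end_p = transition_probabilities(sequences)
print(start_p, trans_p, end_p)
```

With this estimate, the outgoing probabilities of each tag (to other tags plus to the end state) sum to 1, since every occurrence of a tag is either followed by another tag or ends a sentence.
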
Finalizing the Model:
- Finalize the HMM model by baking it.
Evaluation:
- Evaluate the HMM tagger on a test dataset.
- Compare the accuracy of the HMM tagger and the MFCTagger.
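
Accuracy here can be read as the fraction of words whose predicted tag matches the reference tag. The sketch below computes it for any tagger's output, assuming predictions and reference tags are parallel lists of tag sequences; the function name `accuracy` and the sample data are hypothetical.

```python
def accuracy(predicted_sequences, gold_sequences):
    """Fraction of tags that match the reference tags, over all sentences."""
    correct = 0
    total = 0
    for predicted, gold in zip(predicted_sequences, gold_sequences):
        for p, g in zip(predicted, gold):
            correct += int(p == g)
            total += 1
    return correct / total if total else 0.0


# Toy reference tags and predictions (hypothetical).
gold = [["DET", "NOUN", "VERB"], ["PRON", "VERB"]]
pred = [["DET", "NOUN", "NOUN"], ["PRON", "VERB"]]

score = accuracy(pred, gold)
print(f"accuracy: {score:.2%}")
```

Running the same function on the output of both taggers gives directly comparable numbers.
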

To use this notebook, follow these steps:

Clone the Repository:

git clone <repository_url>
cd <repository_directory>

Set Up the Environment:
- Ensure you have Conda installed.
- Create and activate the Conda environment using the provided hmm-tagger.yaml file:
```
conda env create -f hmm-tagger.yaml
conda activate hmm-tagger
```
Run the Notebook:
- Launch Jupyter Notebook:
```
jupyter notebook
```
- Open HMM Tagger.ipynb and run the cells sequentially to execute the POS tagging process.

Most Frequent Class Tagger (MFCTagger): A simple baseline tagger that assigns each word its most frequent tag. The MFCTagger class mimics the interface of the HMM model so that the two can be used interchangeably.
Evaluation: Both taggers are evaluated on a test dataset, and the accuracy of the HMM tagger is compared with that of the MFCTagger.

For more information, refer to the README provided by Udacity (README_Udacity.md).
