I wanted to create a 'prototype' spell checker by fine-tuning different LLMs

Lidiasaes/Spellchecker_LLM_01

Spell_checker_LLM01

I wanted to create a 'prototype' spell checker by fine-tuning different LLMs. I prepared some data myself and also downloaded a public spelling dataset from Kaggle:

    https://www.kaggle.com/datasets/bittlingmayer/spelling?resource=download&select=aspell.txt

You will find:

· archive.zip : the dataset I downloaded from https://www.kaggle.com/datasets/bittlingmayer/spelling?resource=download&select=aspell.txt

· .csv file: a sample of misspelled sentences paired with their correctly spelled versions.

· .ipynb : Python code to fine-tune an LLM and to evaluate it. I also used ChatGPT to improve and develop the code.
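The notebook's own evaluation code is not reproduced here. As a rough sketch of how corrected output might be scored against the correctly spelled sentences, the following computes a sentence-level exact-match rate and a position-wise word accuracy. The function names and the sample sentences are hypothetical and chosen only for illustration; they are not taken from the notebook or the .csv file.

```python
def exact_match_rate(predictions, references):
    """Fraction of predictions identical to their reference (ignoring outer whitespace)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)


def word_accuracy(predictions, references):
    """Fraction of reference words reproduced at the same position in the prediction."""
    total = 0
    correct = 0
    for pred, ref in zip(predictions, references):
        pred_words = pred.split()
        ref_words = ref.split()
        total += len(ref_words)
        # zip stops at the shorter sequence, so missing words count as errors.
        correct += sum(p == r for p, r in zip(pred_words, ref_words))
    return correct / total if total else 0.0


# Hypothetical model outputs and gold sentences (assumption, not real results).
predictions = ["I have a cat", "She recieved the letter"]
references = ["I have a cat", "She received the letter"]

print(exact_match_rate(predictions, references))
print(word_accuracy(predictions, references))
```

Exact match is strict and easy to interpret, while the word-level score gives partial credit when only one word in a sentence is still wrong. Position-wise comparison is a simplification: an inserted or dropped word shifts every later word, so an alignment-based metric may be preferable for real evaluation.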

I have also explored another prototype, which you can find in my repo Spellchecker_LLM02. It takes a more fine-grained approach that trains on more sentences rather than on a mix of single words and sentences. That mix may have caused problems, because the samples differed widely in length during training.
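One way to see why mixing single words and full sentences can be awkward is to measure how much of a padded batch is filler. The sketch below is only an illustration under simplifying assumptions: it counts whitespace-separated words instead of the subword tokens a real LLM tokenizer would produce, and the function name and sample strings are hypothetical rather than taken from either repository.

```python
def padding_fraction(samples):
    """Share of slots that are padding when every sample is padded to the longest one."""
    lengths = [len(s.split()) for s in samples]  # whitespace words, not real tokens
    if not lengths or max(lengths) == 0:
        return 0.0
    longest = max(lengths)
    padded_slots = sum(longest - n for n in lengths)
    return padded_slots / (longest * len(lengths))


# Hypothetical batches (assumption): single words mixed with a sentence,
# versus sentences of similar length.
mixed_batch = [
    "recieve",
    "teh",
    "I beleive she recieved the letter yesterday",
]
sentence_batch = [
    "I beleive she recieved the letter",
    "We definately need a new calender",
    "Thier adress is on the envelope",
]

print(padding_fraction(mixed_batch))
print(padding_fraction(sentence_batch))
```

In the mixed batch more than half of the slots are padding, while the sentence-only batch needs none. Whether this actually hurt training in the first prototype is not established here; it only shows the length imbalance the paragraph above refers to.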



Some other interesting and more advanced projects:

NeuSpell:

https://github.com/neuspell/neuspell#Datasets

Spelling corrector:

https://www.kaggle.com/datasets/bittlingmayer/spelling?resource=download&select=aspell.txt
