Spell_checker_LLM01

I wanted to create a prototype spell checker by fine-tuning different LLMs. I prepared some data myself and also downloaded public data from Kaggle:

    https://www.kaggle.com/datasets/bittlingmayer/spelling?resource=download&select=aspell.txt

In this repo you will find:

· archive.zip: the dataset I downloaded from the Kaggle link above.

· sample_manual_annotated_sentences.csv: a sample of misspelled sentences paired with their correctly spelled versions.

· .ipynb: Python code to fine-tune an LLM and to evaluate it. I also used ChatGPT to help develop and improve the code.
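As a rough illustration of the data preparation and evaluation steps, here is a minimal sketch. It assumes the text files inside archive.zip (such as aspell.txt) use one `correct: misspelling1 misspelling2` entry per line; that layout is an assumption, so check the file before relying on it. The function names `parse_pairs` and `exact_match_accuracy` are hypothetical and are not taken from the notebook.

```python
from typing import Iterable, List, Tuple


def parse_pairs(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Turn 'correct: wrong1 wrong2' lines into (misspelled, correct) pairs.

    The line layout is an assumption about the dataset, not a confirmed spec.
    """
    pairs: List[Tuple[str, str]] = []
    for line in lines:
        if ":" not in line:
            continue  # skip blank or malformed lines
        correct, wrong = line.split(":", 1)
        correct = correct.strip()
        for misspelled in wrong.split():
            pairs.append((misspelled, correct))
    return pairs


def exact_match_accuracy(predictions: List[str], references: List[str]) -> float:
    """Fraction of predictions identical to their reference after trimming."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)


# Tiny in-memory sample standing in for lines read from the dataset file.
sample = ["receive: recieve receve", "separate: seperate"]
pairs = parse_pairs(sample)
```

Each pair can then serve as an (input, target) example for fine-tuning, and `exact_match_accuracy` gives a simple, strict score for the model's corrections on held-out pairs.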

I have also explored another prototype, which you can find in my repo Spellchecker_LLM02. It takes a more fine-grained approach: it is trained on more sentences rather than on a mix of single words and sentences. That mix may have caused problems during training, because the samples differ widely in length.
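One simple way to look at the length problem is to separate single-word samples from sentence samples before training. The sketch below only illustrates that idea; the helper `split_by_length` and its whitespace-based token count are assumptions, not the method used in either repo.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (misspelled text, corrected text)


def split_by_length(pairs: List[Pair], max_word_tokens: int = 1) -> Dict[str, List[Pair]]:
    """Group pairs into 'words' and 'sentences' by whitespace token count."""
    groups: Dict[str, List[Pair]] = {"words": [], "sentences": []}
    for source, target in pairs:
        n_tokens = len(source.split())
        key = "words" if n_tokens <= max_word_tokens else "sentences"
        groups[key].append((source, target))
    return groups


# Hypothetical mixed sample: one single word and one full sentence.
mixed = [
    ("recieve", "receive"),
    ("I recieved teh letter", "I received the letter"),
]
groups = split_by_length(mixed)
```

With the groups separated, each one can be trained on or inspected on its own, so very short and very long samples do not end up in the same batches.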

Some other interesting and more advanced projects:

NeuSpell:

https://github.com/neuspell/neuspell#Datasets

Spelling corrector:

https://www.kaggle.com/datasets/bittlingmayer/spelling?resource=download&select=aspell.txt

Repository files:

· LICENSE

· README.md

· archive.zip

· sample_manual_annotated_sentences.csv

· spell_checker02.ipynb


Lidiasaes/Spellchecker_LLM_01
