sanazkhalili/llm_nlp

Clean Function for Dataset Folder

We use several libraries to clean and preprocess the Persian poem dataset (link to dataset).

The tools are:

  1. ParsiNorm (in haraai_clean.py)
  2. the hazm normalizer
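To give a rough idea of the kind of character-level normalization these tools perform, the sketch below applies a few common Persian rules using only the Python standard library. It is an illustration under assumptions, not ParsiNorm's or hazm's API: the function name normalize_persian and the exact rule set (Arabic yeh/kaf to Persian forms, Arabic-Indic digits to Persian digits, diacritic removal, space collapsing) are hypothetical choices.

```python
import re

# Hypothetical character map: Arabic code points -> Persian equivalents.
_CHAR_MAP = {
    "\u064a": "\u06cc",  # Arabic yeh -> Persian yeh
    "\u0649": "\u06cc",  # Arabic alef maksura -> Persian yeh
    "\u0643": "\u06a9",  # Arabic kaf -> Persian keheh
}
# Arabic-Indic digits (U+0660..U+0669) -> Persian digits (U+06F0..U+06F9).
_CHAR_MAP.update({chr(0x0660 + i): chr(0x06F0 + i) for i in range(10)})
_TRANSLATION = str.maketrans(_CHAR_MAP)

# Arabic diacritics, fathatan (U+064B) through sukun (U+0652).
_DIACRITICS = re.compile("[\u064b-\u0652]")
# Runs of spaces or tabs.
_SPACES = re.compile(r"[ \t]+")


def normalize_persian(text: str) -> str:
    """Apply a few illustrative Persian normalization rules to one string."""
    text = text.translate(_TRANSLATION)  # unify letters and digits
    text = _DIACRITICS.sub("", text)     # drop short-vowel marks
    text = _SPACES.sub(" ", text)        # collapse repeated spaces
    return text.strip()


if __name__ == "__main__":
    # Arabic kaf in a word plus Arabic-Indic digits separated by two spaces.
    print(normalize_persian("\u0643\u062a\u0627\u0628  \u0661\u0662"))
```

In practice a library normalizer covers many more cases (half-spaces, punctuation, affixes), so a sketch like this is only useful for understanding the idea.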

To clean raw text, import the clean function from main_clean and pass it a list of lines:

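from main_clean import clean

# Read the raw poems as UTF-8 (Persian text); the path is a Colab path.
with open('/content/anvari.txt', encoding='utf-8') as fp:
    texts = fp.readlines()

# Clean the first 15 lines and print the result.
list_clean = clean(texts[:15])
print(list_clean)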
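The internals of main_clean.clean are not shown here. As a rough sketch of how such a line-cleaning pipeline can be organized, the hypothetical clean_lines function below applies a list of small per-line steps in order and drops lines that end up empty. The step functions strip_line, remove_tatweel and tidy_zwnj are illustrative assumptions, not the repository's code.

```python
import re
from typing import Callable, Iterable, List

ZWNJ = "\u200c"  # zero-width non-joiner, the Persian "half-space"


def strip_line(line: str) -> str:
    # readlines() keeps the trailing newline; remove it and edge spaces.
    return line.strip()


def remove_tatweel(line: str) -> str:
    # Tatweel/kashida (U+0640) only stretches letters visually.
    return line.replace("\u0640", "")


def tidy_zwnj(line: str) -> str:
    # Drop spaces around a ZWNJ and collapse repeated ZWNJs into one.
    return re.sub(r" *\u200c[ \u200c]*", ZWNJ, line)


# Hypothetical default pipeline, applied left to right.
DEFAULT_STEPS: List[Callable[[str], str]] = [strip_line, remove_tatweel, tidy_zwnj]


def clean_lines(lines: Iterable[str],
                steps: List[Callable[[str], str]] = DEFAULT_STEPS) -> List[str]:
    """Run every step on every line and keep only non-empty results."""
    cleaned = []
    for line in lines:
        for step in steps:
            line = step(line)
        if line:  # skip lines that became empty (e.g. blank verses)
            cleaned.append(line)
    return cleaned


if __name__ == "__main__":
    sample = ["\u0628\u0640\u0640\u0627\n", "\n", "a \u200c\u200c b\n"]
    print(clean_lines(sample))
```

Keeping each rule as a separate function makes it easy to reorder steps or plug in a library normalizer, for example by passing a hazm Normalizer's normalize method as one of the steps.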

Persian Model Tools Folder

We have collected several Persian and multilingual (Persian-supporting) models for tokenization and text classification.

Persian Poem Dataset Folder

These data sources were collected from
