bug: Warning: Duplicate word in word2vec file #887

Open
bact opened this issue Dec 11, 2023 · 0 comments
Labels
bug bugs in the library
bact commented Dec 11, 2023

Description

Hundreds of warnings like the following are emitted while the unit tests run:

2023-12-11:03:40:47 WARNING  [gensim.models.keyedvectors:1909] duplicate word 'ต่าง' in word2vec file, ignoring all but first

Expected results

No warning.

Current results

(partial)

2023-12-11:03:40:47 WARNING  [gensim.models.keyedvectors:1909] duplicate word 'ต่าง' in word2vec file, ignoring all but first
2023-12-11:03:40:47 WARNING  [gensim.models.keyedvectors:1909] duplicate word '	' in word2vec file, ignoring all but first
...
2023-12-11:03:40:57 WARNING  [gensim.models.keyedvectors:1909] duplicate word '' in word2vec file, ignoring all but first
2023-12-11:03:40:58 WARNING  [gensim.models.keyedvectors:1909] duplicate word 'หยับ' in word2vec file, ignoring all but first
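
To see which entries are affected and how often, one option is to collect the warnings from the logger instead of reading the test output by eye. The following is a minimal sketch, not part of PyThaiNLP or gensim: the logger name and message text are taken from the log lines above, while `DuplicateWordCollector` and `attach_collector` are hypothetical names introduced here for illustration.

```python
import logging
import re
from collections import Counter

# Logger name as it appears in the warning lines above.
GENSIM_LOGGER = "gensim.models.keyedvectors"

# Pattern based on the message text shown above; DOTALL so that
# whitespace "words" such as a tab or newline are still captured.
_DUPLICATE_RE = re.compile(r"duplicate word '(.*)' in word2vec file", re.DOTALL)


class DuplicateWordCollector(logging.Handler):
    """Hypothetical helper: counts words named in duplicate-word warnings."""

    def __init__(self):
        super().__init__(level=logging.WARNING)
        self.words = Counter()

    def emit(self, record):
        # getMessage() returns the fully formatted log message.
        match = _DUPLICATE_RE.search(record.getMessage())
        if match:
            self.words[match.group(1)] += 1


def attach_collector():
    """Attach a collector to the gensim logger and return it."""
    collector = DuplicateWordCollector()
    logging.getLogger(GENSIM_LOGGER).addHandler(collector)
    return collector
```

Attaching the collector before the word vectors are loaded, then printing `collector.words.most_common()` afterwards, would give a list of the duplicated entries (shown with `repr()` if whitespace or control characters need to be made visible). This only observes the warnings; it does not change which vector is kept.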

Steps to reproduce

Run the unit tests.

PyThaiNLP version

dev

Python version

3.8

Operating system and version

n/a

More info

No response

Possible solution

No response

Files

No response

@bact bact added the bug bugs in the library label Dec 11, 2023
@bact bact added this to the 4.0 milestone Dec 11, 2023
@bact bact changed the title bug: Duplicate word in word2vec file bug: Warning: Duplicate word in word2vec file Dec 11, 2023
@github-project-automation github-project-automation bot moved this to To do in PyThaiNLP Aug 29, 2024