The two-character Chinese word “成都” is not recognized #132

Open
GuoPL opened this issue Jun 8, 2022 · 2 comments


GuoPL commented Jun 8, 2022

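from flashtext import KeywordProcessor

# Longer sample text, kept commented out as in the original report.
# text = "@苍月轶 再次核实:骆然5月8日持24小时核酸从宜昌回蓉,到成都24小时内核酸一次,9号回泸定,24小时内又做一次核酸,均阴性,健康码绿码。宜昌不是AB区域。"
text = "成都到北京高铁3小时,郑州到成都2小时"

print(text)
kp = KeywordProcessor()
# Each keyword maps to a (clean name, tag) tuple.
kp.add_keyword("到成都", ("成都", "ab"))
kp.add_keyword("宜昌", ("宜昌", "ab"))

print(len(kp))  # number of registered keywords
print(kp)
# span_info=True returns (clean name, start, end) for every match.
word_index = kp.extract_keywords(text, span_info=True)
print(word_index)
for item in word_index:
    # Print the matched substring of the original text.
    print(text[item[1]:item[2]])

print('finished')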

@githublyff

from flashtext import KeywordProcessor

text = "成都到北京高铁3小时,郑州到成都2小时"
kp = KeywordProcessor()
kp.add_keyword("到成都", ("成都", "ab"))
kp.add_keyword("宜昌", ("宜昌", "ab"))

print(len(kp))
keywords_found = kp.extract_keywords(text, span_info=True)
for item in keywords_found:
    print(item)

2
(('成都', 'ab'), 13, 15)

Reference: https://blog.csdn.net/chen10314/article/details/122048726

@zhangbo2008

This is still not a good solution, because many special characters can appear in our keywords, such as (), [], and so on.
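
For context on the report at the top of the thread, here is a minimal sketch of the kind of trailing word-boundary check that appears to be involved. It is a simplified stand-in written with the standard library only, not flashtext's actual code. The function name `find_keyword` and the constant `WORD_CHARS` are hypothetical, and the assumption that the default "word" characters are ASCII letters, digits, and the underscore is based on my reading of the library's defaults.

```python
import string

# Assumption: models flashtext's default set of "word" characters
# (ASCII letters, digits, underscore). CJK characters are not in it.
WORD_CHARS = set(string.ascii_letters + string.digits + "_")


def find_keyword(text, keyword, word_chars=WORD_CHARS):
    """Return (start, end) spans where `keyword` occurs and is not
    immediately followed by a character from `word_chars`.

    Hypothetical helper: a simplified model of a trailing
    word-boundary check, not flashtext's implementation.
    """
    spans = []
    start = text.find(keyword)
    while start != -1:
        end = start + len(keyword)
        # Accept the occurrence only if it ends at the end of the text
        # or right before a character that is not a "word" character.
        if end == len(text) or text[end] not in word_chars:
            spans.append((start, end))
        start = text.find(keyword, start + 1)
    return spans


text = "成都到北京高铁3小时,郑州到成都2小时"

# "到成都" is directly followed by the digit "2", so the default rule
# treats the keyword as the start of a longer word and rejects it.
default_spans = find_keyword(text, "到成都")

# With digits removed from the word characters, the same occurrence passes.
relaxed_spans = find_keyword(text, "到成都", WORD_CHARS - set(string.digits))

print(default_spans)
print(relaxed_spans)
```

In flashtext itself, the corresponding setting seems to be the `non_word_boundaries` attribute of `KeywordProcessor`. Whether shrinking that set is acceptable depends on the keywords: it would also let ASCII keywords match inside longer words, which may be part of the concern raised in the last comment.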
