Skip to content

Commit

Permalink
Fix tokenization of Korean chars fix: #1877
Browse files Browse the repository at this point in the history
  • Loading branch information
hankcs committed Feb 24, 2024
1 parent beab770 commit 785bed5
Show file tree
Hide file tree
Showing 2 changed files with 2 additions and 2 deletions.
2 changes: 1 addition & 1 deletion hanlp/components/tokenizers/transformer.py
Original file line number Diff line number Diff line change
Expand Up @@ -147,7 +147,7 @@ def tag_to_span(self, batch_tags, batch: dict):
tags[i - 1] = 'B'
elif prev_tag == 'E':
tags[i - 1] = 'M'
tags[i] = 'M'
tags[i] = tag = 'M'
offset = e
prev_tag = tag
for tags in batch_tags:
Expand Down
2 changes: 1 addition & 1 deletion hanlp/version.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
# Author: hankcs
# Date: 2019-12-28 19:26

__version__ = '2.1.0-beta.55'
__version__ = '2.1.0-beta.56'
"""HanLP version"""


Expand Down

0 comments on commit 785bed5

Please # to comment.