AssertionError on empty input text #142

Closed
lifeiteng opened this issue Nov 25, 2024 · 2 comments
Labels: bug (Something isn't working)

Comments

@lifeiteng

from wtpsplit import SaT

sat = SaT("sat-3l")
# optionally run on GPU for better performance
# also supports TPUs via e.g. sat.to("xla:0"), in that case pass `pad_last_batch=True` to sat.split
sat.half().to("cuda")

[v for v in sat.split(["This is a test This is another test.", ""])]

wtpsplit/wtpsplit/extract.py", line 194, in extract
    assert current_chunk == num_chunks
AssertionError
markus583 added the bug label Nov 27, 2024

pf-crypto12 commented Nov 29, 2024

Similar issue here. It can also happen when the input text consists only of newlines (e.g. \r\n\r\n\r\n\r\n\r\n\r\n), which is tokenized into an empty string and ultimately yields the same assertion error.
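For anyone stuck on an affected version, one possible workaround is to keep blank inputs away from the splitter entirely. The sketch below is only an illustration: `split_skipping_blank` is a hypothetical helper, not part of wtpsplit, and it assumes that empty or whitespace-only strings are the only inputs that trigger the assertion and that the splitter yields one list of sentences per input text, as in the snippet above.

```python
from typing import Callable, Iterable, List


def split_skipping_blank(
    split_fn: Callable[[List[str]], Iterable[List[str]]],
    texts: List[str],
) -> List[List[str]]:
    """Call split_fn only on texts that contain non-whitespace characters.

    Hypothetical helper (not a wtpsplit API). Blank inputs, i.e. empty or
    whitespace-only strings, get an empty sentence list, and the output
    order matches the input order.
    """
    # Indices of texts with at least one non-whitespace character.
    keep = [i for i, text in enumerate(texts) if text.strip()]

    # Start with an empty result for every input, then fill in the real ones.
    results: List[List[str]] = [[] for _ in texts]
    if keep:
        outputs = split_fn([texts[i] for i in keep])
        for i, sentences in zip(keep, outputs):
            results[i] = list(sentences)
    return results
```

With the model from the original report this would be called as `split_skipping_blank(sat.split, ["This is a test This is another test.", ""])`, so the empty string never reaches `sat.split`.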

@markus583
Collaborator

Hi, thanks for raising this. Both cases are now handled in the current version (2.1.2), which I just released. Please let me know if this fixes your issue.
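To confirm that an environment already has the release mentioned above, a quick check along these lines may help. It is a rough sketch: `parse_release`, `has_fix`, and `installed_wtpsplit_version` are hypothetical helper names, the distribution name `wtpsplit` is assumed to match the installed package, and the comparison looks only at the leading numeric components, so pre-release suffixes are ignored.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def parse_release(version_string: str) -> Tuple[int, ...]:
    """Return the leading numeric components, e.g. "2.1.2" -> (2, 1, 2)."""
    parts = []
    for piece in version_string.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component, e.g. "post1"
        parts.append(int(match.group()))
    return tuple(parts)


def has_fix(installed: Optional[str], fixed_in: str = "2.1.2") -> bool:
    """True if the installed version is at least the release named in this thread."""
    if installed is None:
        return False
    return parse_release(installed) >= parse_release(fixed_in)


def installed_wtpsplit_version() -> Optional[str]:
    """Look up the installed version; assumes the distribution is named "wtpsplit"."""
    try:
        return version("wtpsplit")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    current = installed_wtpsplit_version()
    print(f"wtpsplit {current}: fix present = {has_fix(current)}")
```

If the check reports that the fix is missing, upgrading the package is the simpler route than working around the assertion.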
