Run tarask -nc -nv < latest_be_by.txt > be_tarask_by_2.txt to regenerate the output file. The regenerated be_tarask_by_2.txt should be identical to the supplied output, be_tarask_by.txt.
Inspect the output files. Lines 698, 4398 and 5155 should contain the invalid characters ��: in each case, one valid Cyrillic character is replaced with two replacement characters (U+FFFD).
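To check both conditions in these steps without eyeballing the files, a small helper can compare the two outputs byte for byte and list the lines that contain U+FFFD (the character displayed as �). This is a hypothetical Python sketch, not part of tarask; the function names are made up for illustration.

```python
import filecmp

REPLACEMENT_CHAR = "\ufffd"  # U+FFFD, usually displayed as a diamond with "?"


def files_identical(path_a: str, path_b: str) -> bool:
    """True if the two files contain exactly the same bytes."""
    # shallow=False compares contents instead of trusting os.stat() metadata.
    return filecmp.cmp(path_a, path_b, shallow=False)


def lines_with_replacement_chars(path: str) -> list[int]:
    """Return the 1-based numbers of lines that contain U+FFFD."""
    hits = []
    # Decode strictly: a U+FFFD found here is stored in the file itself
    # (as the bytes EF BF BD), not introduced while reading it.
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if REPLACEMENT_CHAR in line:
                hits.append(lineno)
    return hits
```

For example, per this report, lines_with_replacement_chars("be_tarask_by_2.txt") would be expected to include 698, 4398 and 5155 on the reporter's system, although the positions may differ elsewhere.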
I suspect the issue may be related to some peculiarity of asynchronous reading, so the corruption might occur at other positions on a different system.
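If, as the follow-up comment in this thread concludes, each chunk read from stdin is decoded on its own, then the corrupted positions depend only on where the chunk boundaries fall, and chunk sizes are not guaranteed to be the same across systems. The Python sketch below (hypothetical function names; the tool's real read loop is not shown in this thread) decodes the same Belarusian sample with different chunk sizes and reports where U+FFFD appears.

```python
REPLACEMENT_CHAR = "\ufffd"


def decode_per_chunk(data: bytes, chunk_size: int) -> str:
    """Assumed buggy pattern: decode each chunk on its own and append it."""
    text = ""
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        # A chunk that ends or starts inside a multi-byte character is not valid
        # UTF-8 by itself; errors="replace" turns each broken piece into U+FFFD.
        text += chunk.decode("utf-8", errors="replace")
    return text


def corrupted_positions(text: str) -> list[int]:
    """Character indices of every U+FFFD in the decoded text."""
    return [i for i, ch in enumerate(text) if ch == REPLACEMENT_CHAR]


if __name__ == "__main__":
    # Every Cyrillic letter here is 2 bytes in UTF-8; the space is 1 byte.
    sample = "Беларуская мова".encode("utf-8")
    for size in (3, 4, 5):
        print(size, corrupted_positions(decode_per_chunk(sample, size)))
```

The three chunk sizes corrupt different letters of the same input, and each split letter becomes exactly two U+FFFD, matching the symptom in the reproduction steps.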
It's known that the commits of April 13th, 2024 did not cause this kind of corruption. However, I have not re-tested this particular input file on those commits.
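For posterity: the issue was that bytes read from stdin in chunks were immediately converted and appended to an existing UTF-8 string, but not every chunk was valid UTF-8 on its own. When a chunk boundary fell inside a two-byte Cyrillic character, each half was converted separately and became a replacement character, which is why one letter turned into ��.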
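This thread does not include the actual fix. A minimal Python sketch of two standard remedies, assuming the input arrives as a sequence of byte chunks: feed the chunks through an incremental UTF-8 decoder, which carries an incomplete multi-byte sequence over to the next chunk, or collect all bytes first and decode once. The function names and the 64 KiB chunk size are hypothetical.

```python
import codecs
import sys
from typing import Iterable, Iterator


def decode_stream(chunks: Iterable[bytes]) -> str:
    """Decode UTF-8 that arrives in arbitrary chunks without splitting characters."""
    # The incremental decoder buffers an incomplete multi-byte sequence at the
    # end of one chunk and completes it with the first bytes of the next chunk.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    parts = [decoder.decode(chunk) for chunk in chunks]
    # final=True flushes the buffer: a sequence still incomplete at EOF is
    # replaced with U+FFFD only now, not at every chunk boundary.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


def decode_all(chunks: Iterable[bytes]) -> str:
    """Simpler alternative: concatenate all bytes, then decode once."""
    return b"".join(chunks).decode("utf-8", errors="replace")


def stdin_chunks(size: int = 65536) -> Iterator[bytes]:
    """Yield raw byte chunks from stdin until EOF."""
    # iter(callable, sentinel) keeps calling read() until it returns b"" (EOF).
    return iter(lambda: sys.stdin.buffer.read(size), b"")
```

Usage would be text = decode_stream(stdin_chunks()). The incremental decoder lets text be processed as it arrives; decode_all is simpler when the whole input is needed before conversion anyway.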
Writing to a file via shell redirection (>) misbehaves.
Attached files: latest_be_by.txt (input) and be_tarask_by.txt (supplied output).