Invalid characters when redirecting stdout to a file #5

Closed
coinvariant opened this issue May 10, 2024 · 1 comment
Labels
bug Something isn't working

Comments

coinvariant commented May 10, 2024

Redirecting stdout to a file via > produces invalid characters in the output.

General information

  • Version: 6.1.0
  • OS: Manjaro Linux

Steps to reproduce

  1. Get the input and output files.
  2. Run tarask -nc -nv < latest_be_by.txt > be_tarask_by_2.txt to regenerate the output file. The _2 file should be identical to the supplied output.
  3. Inspect the output files. Lines 698, 4398, 5155 should contain invalid characters ��. In each case, one valid Cyrillic character is replaced with two replacement characters.
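
For step 3, a small helper can list the affected lines instead of searching by eye. This is only a sketch in Python (chosen for illustration, not taken from the project); the function name is hypothetical, and the file is assumed to be read as UTF-8.

```python
REPLACEMENT_CHAR = "\ufffd"  # U+FFFD, rendered as the "invalid character" glyph


def lines_with_replacement_chars(text):
    """Return the 1-based numbers of lines that contain U+FFFD."""
    # Split on "\n" only, so the numbering matches what a text editor shows.
    return [
        number
        for number, line in enumerate(text.split("\n"), start=1)
        if REPLACEMENT_CHAR in line
    ]


# Tiny in-memory example: the second line has one character replaced by two U+FFFD.
sample = "добры\nдз\ufffd\ufffdнь\nсвет\n"
affected = lines_with_replacement_chars(sample)
```

To check a real file, read it with open(path, encoding="utf-8") and pass the contents to the function.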

I suspect the issue is related to how the input is read asynchronously. If so, the corruption might occur at different positions on a different system.

latest_be_by.txt
be_tarask_by.txt

The commits from April 13, 2024 are known not to cause this kind of corruption. However, I have not re-tested this particular input file against those commits.

@coinvariant
Author

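For posterity: the issue was that each chunk of bytes read from stdin was immediately converted and appended to an existing UTF-8 string. Not every chunk is a valid UTF-8 string on its own: a chunk boundary can fall in the middle of a multi-byte character, and each half then decodes to a replacement character.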
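
A minimal sketch of this failure mode and one common way around it, written in Python for illustration. It is not the project's actual code or fix; the function names and the sample word are made up for the example. The first function decodes every chunk on its own, the second keeps an incremental decoder that buffers an incomplete trailing byte sequence until the next chunk arrives.

```python
import codecs


def decode_naive(chunks):
    """Decode each chunk independently (reproduces the corruption)."""
    text = ""
    for chunk in chunks:
        # A chunk that starts or ends mid-character yields U+FFFD here.
        text += chunk.decode("utf-8", errors="replace")
    return text


def decode_incremental(chunks):
    """Decode with a stateful decoder that carries partial characters over."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    text = ""
    for chunk in chunks:
        text += decoder.decode(chunk)
    # Flush: reports an error only if the stream itself ends mid-character.
    text += decoder.decode(b"", final=True)
    return text


# Each Cyrillic letter takes two bytes in UTF-8, so cutting after byte 3
# splits the second letter across two chunks.
data = "мова".encode("utf-8")
chunks = [data[:3], data[3:]]

naive = decode_naive(chunks)        # second letter becomes two U+FFFD
fixed = decode_incremental(chunks)  # original text is preserved
```

Collecting all bytes first and decoding once at the end would also avoid the problem, at the cost of holding the whole input in memory.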

GooseOb added the bug (Something isn't working) label on May 13, 2024