Handle invalid UTF8 start bytes #69

Open
jrgfogh opened this issue Apr 25, 2024 · 1 comment
jrgfogh commented Apr 25, 2024

My build fails because my source code contains invalid UTF-8 start bytes.
I don't know how the files got corrupted, but this seems like the kind of thing you would want a lint tool to fix, since my editors have no trouble reading the files.

Here is an example error message:

Processing 5 files: ./sw/lazy_init.h, ./sw/propagate_const.h, ./tests/lazy_init_tests.cpp, ./tests/propagate_const_tests.cpp, ./tests/gtest_unwarn.h
run-clang-format.py: error: ./tests/propagate_const_tests.cpp: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 69: invalid start byte
Traceback (most recent call last):
  File "/run-clang-format.py", line 122, in run_clang_format_diff_wrapper
    ret = run_clang_format_diff(args, file)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/run-clang-format.py", line 188, in run_clang_format_diff
    errs = list(proc_stderr.readlines())
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 69: invalid start byte

The failing code can be found here:
https://github.com/jrgfogh/small_wrappers/tree/909a477e92cf955ba24bd2b062f45e32c67f9644

@SteffenL
Contributor

I don't know what's going on with your files, but I cloned your repository, checked out commit 909a477e92cf955ba24bd2b062f45e32c67f9644, and inspected each header and source file with a hex editor. I couldn't find any 0xff bytes in ./tests/propagate_const_tests.cpp or in any of the other header or source files. Without inspecting them any further, I would say the files are most likely valid, which would explain why your editors have no trouble with them.
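
For anyone who wants to repeat that check without a hex editor, here is a minimal sketch that reports where a given byte value occurs in a set of files. The helper names `find_byte_offsets` and `scan_files` are made up for this illustration and are not part of run-clang-format.py.

```python
from pathlib import Path


def find_byte_offsets(data: bytes, value: int = 0xFF) -> list[int]:
    """Return every offset at which the byte `value` occurs in `data`."""
    return [offset for offset, byte in enumerate(data) if byte == value]


def scan_files(paths, value: int = 0xFF) -> dict[str, list[int]]:
    """Map each path to the offsets of `value`; files without hits are omitted."""
    hits = {}
    for path in paths:
        # Read raw bytes so no decoding step can fail or hide anything.
        offsets = find_byte_offsets(Path(path).read_bytes(), value)
        if offsets:
            hits[str(path)] = offsets
    return hits
```

Calling `scan_files(["./tests/propagate_const_tests.cpp"])` from the repository root should return an empty dict if the file contains no 0xff bytes.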

While invalid UTF-8 is detectable, it isn't really something that can be corrected automatically, and you do want your source code to be interpreted correctly. A fatal error is therefore the desired behavior.
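
As a rough illustration of the "detectable" part, the sketch below uses Python's strict UTF-8 decoder to locate the first invalid byte. The function name `first_invalid_utf8` is hypothetical. The last lines show one possible reason automatic correction is hard, which is my own assumption rather than something stated above: the same stray byte decodes to different characters under different legacy encodings, so a tool would have to guess.

```python
def first_invalid_utf8(data: bytes):
    """Return (offset, byte, reason) for the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")  # error handling is "strict" by default
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first byte the decoder rejected.
        return exc.start, data[exc.start], exc.reason
    return None


# Detection is unambiguous: the decoder reports where it gave up.
print(first_invalid_utf8(b"int x;\xff"))

# Correction is not: the same byte means different things in other encodings.
print(repr(b"\x80".decode("latin-1")))  # a C1 control character
print(repr(b"\x80".decode("cp1252")))   # the euro sign
```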

Unfortunately this issue isn't actionable from my point of view.
