Handle invalid UTF8 start bytes #69

jrgfogh · 2024-04-25T17:51:50Z

My build fails because my source code contains invalid Unicode start bytes.
I don't know how the files got corrupted, but it seems like the kind of thing you would want a lint tool to fix, since my editors have no trouble reading the files.

Here is an example error message:

Processing 5 files: ./sw/lazy_init.h, ./sw/propagate_const.h, ./tests/lazy_init_tests.cpp, ./tests/propagate_const_tests.cpp, ./tests/gtest_unwarn.h
run-clang-format.py: error: ./tests/propagate_const_tests.cpp: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 69: invalid start byte
Traceback (most recent call last):
  File "/run-clang-format.py", line 122, in run_clang_format_diff_wrapper
    ret = run_clang_format_diff(args, file)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/run-clang-format.py", line 188, in run_clang_format_diff
    errs = list(proc_stderr.readlines())
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 69: invalid start byte

The failing code can be found here:
https://github.com/jrgfogh/small_wrappers/tree/909a477e92cf955ba24bd2b062f45e32c67f9644

SteffenL · 2024-08-13T18:26:47Z

I don't know what's going on with your files. I cloned your repository, checked out commit 909a477e92cf955ba24bd2b062f45e32c67f9644, and inspected each header/source file with a hex editor, but I couldn't find any 0xff bytes in ./tests/propagate_const_tests.cpp or in any of the other header/source files. Without inspecting them any further, I would say the files are most likely valid, which is why your editors have no trouble with them.
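
For anyone who wants to repeat that check without a hex editor, here is a minimal Python sketch. The helper name find_invalid_utf8 is hypothetical and not part of run-clang-format.py; it only assumes that strict UTF-8 decoding is an acceptable stand-in for a manual byte-by-byte inspection.

```python
from typing import Optional, Tuple


def find_invalid_utf8(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (offset, byte value) of the first invalid UTF-8 byte, or None."""
    try:
        data.decode("utf-8")  # strict decoding raises on the first bad sequence
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first byte that could not be decoded
        return exc.start, data[exc.start]
    return None


# Small in-memory demonstration with made-up content.
print(find_invalid_utf8(b"int x;\n"))       # valid input
print(find_invalid_utf8(b"int \xff x;\n"))  # 0xff can never start a UTF-8 sequence
```

To check a real file, read it in binary mode and pass the bytes in, for example find_invalid_utf8(open(path, "rb").read()).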

While invalid UTF-8 is detectable, it isn't really something that can be corrected automatically, and you do want your source code to be interpreted correctly. A fatal error is therefore the desired behavior.
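
To illustrate why automatic repair is risky, here is a small Python sketch (not taken from run-clang-format.py; the sample line is made up). Strict decoding rejects the bad byte, while errors="replace" silently swaps it for U+FFFD, so the original byte is lost and the file no longer round-trips.

```python
# Hypothetical source line containing a stray 0xff byte.
raw = b'const char *s = "caf\xff";\n'

# Strict decoding (Python's default) refuses the input.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Lossy decoding succeeds, but the bad byte becomes U+FFFD.
lossy = raw.decode("utf-8", errors="replace")
round_trip = lossy.encode("utf-8")

print(strict_ok)          # False
print("\ufffd" in lossy)  # True
print(round_trip == raw)  # False: the original byte was not preserved
```

A tool cannot know which character the author intended, so any automatic fix of this kind is a guess.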

Unfortunately, this issue isn't actionable from my point of view.
