Cuckoo utf-8 encoding error on filenames with Umlauts #136
TL;DR: We're running into something very similar, but not identical, to cuckoosandbox/cuckoo/issues/2473. We're submitting the filename correctly encoded as utf-8. It arrives at the Cuckoo API in utf-8 and is stored in the database correctly. In the database, and on the wire back from the database, it is still utf-8. But by the time the database module hands it to SQLAlchemy, it has become latin1. Adding …
Reproducer:
Output:
The default client encoding of libmysqlclient (the C library) seems to be latin1, even on an otherwise Unicode-only system:
Python 3 seems to handle it correctly even when the default latin1 encoding is used:
Python 2 breaks again when writing to a pipe:
On the wire it seems to be utf8, but in Python it somehow becomes latin1 in both cases, at least transiently (otherwise the ascii codec would talk of …
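One way to check the "at least transiently latin1" suspicion: when the ascii codec fails, the error names the first character it cannot encode. A correctly decoded filename fails on the umlaut itself, while a utf-8 byte string that was decoded as latin1 fails on `Ã` (`\xc3`) instead. This is a minimal Python 3 sketch; the helper `first_unencodable` and the filename `Tür.exe` are hypothetical examples, not code from PeekabooAV or Cuckoo.

```python
def first_unencodable(text):
    """Return the first character the ascii codec rejects, or None if all are ASCII."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        # exc.start is the index of the first character that could not be encoded
        return text[exc.start]
    return None


good = "Tür.exe"
# Simulate the suspected bug: utf-8 bytes from the wire decoded as latin1
bad = good.encode("utf-8").decode("latin-1")

print(repr(first_unencodable(good)))  # 'ü'
print(repr(first_unencodable(bad)))   # 'Ã'
```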
Problems like scVENUS/PeekabooAV#136 seem to be caused by a (still somewhat mysterious) latin1 encoding default in the mysqlclient package, or rather in the libmysqlclient C library it uses. This seems to be mitigated in Python 3. As a workaround for Python 2, we add ?charset=utf8 to the connect string. Closes scVENUS/PeekabooAV#136.
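The workaround amounts to appending a query parameter to the database URL. A small sketch of doing that programmatically with the standard library, assuming the connect string is an SQLAlchemy-style URL (the helper `with_utf8_charset` and the example credentials are hypothetical):

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_utf8_charset(url):
    """Return url with charset=utf8 added to its query string.

    Other query parameters are kept; an explicitly configured charset is left
    alone so a deliberate setting is not silently overridden.
    """
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query.setdefault("charset", "utf8")
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_utf8_charset("mysql://cuckoo:secret@localhost/cuckoo"))
# mysql://cuckoo:secret@localhost/cuckoo?charset=utf8
```

SQLAlchemy passes such query parameters through to the DBAPI's `connect()` call, which is how `charset` reaches mysqlclient.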
When a sample whose declared_name contains umlauts (or other non-ASCII characters) is submitted, Cuckoo REST API requests for the task status fail with a utf-8 encoding error. The filename displayed in the backtrace looks suspiciously latin1 encoded.
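To test whether a filename from a backtrace is really utf-8 that was mis-decoded as latin1, one can try to reverse that step: re-encode as latin1 and decode as utf-8. If this succeeds, the suspicion is very likely right. The helper below is a hypothetical diagnostic sketch, not part of PeekabooAV or Cuckoo. It is only a heuristic: a genuine latin1 string that happens to form valid utf-8 would be "repaired" wrongly.

```python
def undo_latin1_mojibake(text):
    """Reverse a utf-8 -> latin1 mis-decode; return text unchanged if that fails."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either text contains characters outside latin1, or its latin1 bytes
        # are not valid utf-8: it was probably not mis-decoded this way.
        return text


print(undo_latin1_mojibake("TÃ¼r.exe"))  # Tür.exe
print(undo_latin1_mojibake("Tür.exe"))   # Tür.exe (unchanged)
```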
Actual error message to follow.
We need to find out whether we are submitting the filename parameter in the wrong encoding or whether it somehow reverts to latin1 on its round trip to and from the analysis on Windows.
Related to our addition of submitting the original filename as per #81.