Default encoding is UTF-8? #64
Comments
Thanks for your comments! I'm glad you found it easy to override the encoding. I cannot find the encoding documented anywhere. The default was set to ISO-8859-1 a long time ago, probably due to an observation like yours. It may have evolved since then. The fact that your Windows machine seems to be recording in UTF-8 seems to be a good reason to change the assumed default to UTF-8.
Thanks @InvncibiltyCloak for bringing this up. Changing the default encoding to UTF-8 seems reasonable. One consideration, though, would be to give users the option to explicitly set the encoding, to maintain backwards compatibility with other encodings (e.g. ISO-8859-1) in older files and with older DEWE stacks.
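One way to keep older files readable while preferring UTF-8 is to try encodings in order. This is only a sketch of that idea in plain Python: `decode_with_fallback` is a hypothetical helper, not part of dwdatareader, and the byte strings are made up for illustration.

```python
def decode_with_fallback(raw, encodings=("utf-8", "iso-8859-1")):
    """Decode raw bytes with the first encoding that succeeds (hypothetical helper)."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Not reached with the defaults, since ISO-8859-1 maps every byte value,
    # but kept so that a custom encoding list still returns something.
    return raw.decode(encodings[-1], errors="replace")


utf8_unit = "Ω".encode("utf-8")   # as a newer file might store it
latin1_unit = b"\xb0C"            # "°C" as ISO-8859-1 bytes, not valid UTF-8

print(decode_with_fallback(utf8_unit))    # Ω
print(decode_with_fallback(latin1_unit))  # °C
```

The order matters: ISO-8859-1 accepts any byte sequence, so it has to come last, and a legacy string that happens to be valid UTF-8 would still be read as UTF-8.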
I never had a good example to test the encoding, so it is intentionally very easy for the user to specify:
import dewesoft as dw
dw.encoding='utf-8'
Unfortunately, the Dewesoft library sometimes appends junk characters to the end of strings, which cause UTF-8 decoding errors in Python and fail the tests. If we change the default to UTF-8, then we need to either ask Dewesoft to fix their library or have Python ignore these decoding errors.
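For reference, "having Python ignore these decoding errors" could look like the sketch below. It uses only the standard `errors` argument of `bytes.decode`; the trailing junk bytes are invented for illustration and are not taken from a real Dewesoft file.

```python
# Hypothetical example: a valid UTF-8 unit string followed by junk bytes.
raw = "°C".encode("utf-8") + b"\xff\xfe"

try:
    raw.decode("utf-8")  # strict decoding is the default
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False    # this is the failure mode described above

ignored = raw.decode("utf-8", errors="ignore")    # silently drops invalid bytes
replaced = raw.decode("utf-8", errors="replace")  # marks them with U+FFFD

print(strict_ok)  # False
print(ignored)    # °C
```

`errors="ignore"` keeps the tests quiet but also hides genuinely corrupt data, while `errors="replace"` leaves a visible marker; which one is appropriate is a judgment call for the maintainers.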
Ah, I should have been more specific. I saw this global option, but wondered whether all of the 10 or so usages of it should use the same encoding, e.g. opening the file in dwdatareader/dwdatareader/__init__.py Line 388 in e579a23
vs. decoding text values, e.g. in dwdatareader/dwdatareader/__init__.py Line 88 in e579a23
But that was only guessing on my part, without any evidence of different encodings actually occurring.
That sounds annoying. I would guess that the junk characters are a result of the C lib interpreting parts of the memory as strings when it should not, i.e. a string length mismatch at that level?
First off, thanks for the great Dewesoft reader library.
I was recently using it for my datafiles which are DXD and are created on a Windows x64, en-US machine.
The units had some Unicode characters for the degree symbol and ohms. When I imported it with this library it had the classic Å symbol, which is the giveaway of reading UTF-8 binary data but assuming it should be decoded according to a Windows code page (looks like you have ISO-8859-1 chosen).
A quick peek into the python code and I saw this is extremely easy to fix in this library - just call
dwdatareader.encoding = 'utf-8'
and it gives the correctly decoded strings. I just wanted to file an issue to bring up the fact that it appears that DewesoftX is encoding strings in UTF-8, and perhaps this library should change the default encoding to match?
Unfortunately I am only a sample size of one and have not tested other locales or versions of Dewesoft, so I am not sure if this default encoding applies everywhere. Thanks for your time!
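For anyone wanting to reproduce the symptom without a DXD file, the mismatch described above can be sketched in plain Python. The unit strings are illustrative stand-ins, not values read by dwdatareader.

```python
# Unit strings as a UTF-8 writer would store them (illustrative values).
degree_bytes = "°C".encode("utf-8")
ohm_bytes = "Ω".encode("utf-8")

# Decoding UTF-8 bytes as ISO-8859-1 turns each non-ASCII character
# into two unrelated characters.
wrong_degree = degree_bytes.decode("iso-8859-1")
wrong_ohm = ohm_bytes.decode("iso-8859-1")

# Decoding with the matching encoding restores the original text.
right_degree = degree_bytes.decode("utf-8")
right_ohm = ohm_bytes.decode("utf-8")

print(wrong_degree, wrong_ohm)  # Â°C Î©
print(right_degree, right_ohm)  # °C Ω
```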