Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Strings are one character short with version 3.2.5 through 4.0.0 #116

Closed
afh opened this issue Dec 7, 2023 · 4 comments
Closed

Strings are one character short with version 3.2.5 through 4.0.0 #116

afh opened this issue Dec 7, 2023 · 4 comments

Comments

@afh
Copy link

afh commented Dec 7, 2023

When compiling ledger with utfcpp version 3.2.5 until 4.0.0 (4.0.1 fixes the issue) there is an issue where a string passed from ledger's Python module to its C++ native code is shortened by one character (see ledger/ledger#2302 for details).

In ledger's test suite this results in data files not being found as their filepath is truncated, e.g. test/regress/4D9288AE.dat becomes test/regress/4D9288AE.da.

Currently the ledger project is trying to figure out the best way forward and any insight folks more familiar with utfcpp can provide would be greatly appreciated.

A few thoughts / questions:

  • Has a similar or related issue been raised with utfcpp in the past?
  • What information would be helpful to you in order to get a better understanding of how utfcpp is (mis?)used in ledger?
@nemtrif
Copy link
Owner

nemtrif commented Dec 7, 2023

See #111 Basically a regression was introduced when fixing another issue.

Your options are:

  1. upgrade to 4.0.3
  2. Instead of using utf8::unchecked::utf16to8(), use utf8::utf16to8(). It is safer anyway and just a bit slower.

@afh
Copy link
Author

afh commented Dec 7, 2023

Thanks for the quick response, @nemtrif, very much appreciated 🙏
I'll talk with the other devs about (dis)advantages to using utf8::utf16to8() or maybe we can find a way to switch to it only for the affected releases of utfcpp.

I'd like to suggest to add a notice or warning to each the affected releases on its GitHub release page. Maybe something along the lines of:

This release introduces a breaking regression, please use at least 4.0.1. For details see #111.

@afh afh closed this as completed Dec 7, 2023
@nemtrif
Copy link
Owner

nemtrif commented Dec 7, 2023

No problem. My assumption they are using unchecked namespace to avoid exceptions. If so, you should better upgrade to 4.0.3. Or, just stay on a pre-3.2.5 release. 3.2.4 is OK.

@afh
Copy link
Author

afh commented Dec 7, 2023

Ledger will update to utfcpp 4.0.3 (ledger/ledger#2315 is pending review), yet some distributions, like Debian, prefer programs to link to libutfcpp rather than embed it. In the case where a distro ships libutfcpp 3.2.5 a sensible path forward is to apply a patch replacing uses of utf8::unchecked::utf16to8() with utf8::utf16to8() before compiling the program for the distribution package.

Your comments have been very helpful! 💯🙏

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants