to_json(std::filesystem::path) can create invalid UTF-8 chars on windows #4271
Comments
I can also work around this problem by adding a manifest XML that sets my app's code page to UTF-8. In CMake I wrapped this in a function:
which is used like this (probably want to wrap in a platform check):
with <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<assembly manifestVersion="1.0" xmlns="urn:schemas-microsoft-com:asm.v1">
<application>
<windowsSettings>
<activeCodePage xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">UTF-8</activeCodePage>
</windowsSettings>
</application>
</assembly>
This solves the problem if the app is running on at least Windows version 1903. It is still a bug, but I wanted to share this workaround because it's useful for many libraries that have the same issue.
Proposed diff to do the conversion to UTF-8 when targeting Windows:
diff --git a/include/nlohmann/detail/conversions/to_json.hpp b/include/nlohmann/detail/conversions/to_json.hpp
index 562089c3..a8b74688 100644
--- a/include/nlohmann/detail/conversions/to_json.hpp
+++ b/include/nlohmann/detail/conversions/to_json.hpp
@@ -413,10 +413,20 @@ inline void to_json(BasicJsonType& j, const T& t)
}
#if JSON_HAS_FILESYSTEM || JSON_HAS_EXPERIMENTAL_FILESYSTEM
+#if defined(_WIN32)
+#include <windows.h>
+#endif
template<typename BasicJsonType>
inline void to_json(BasicJsonType& j, const std_fs::path& p)
{
+#if defined(_WIN32)
+ int len = ::WideCharToMultiByte(CP_UTF8, 0, &p.native()[0], p.native().size(), nullptr, 0, nullptr, nullptr);
+ std::string as_utf8(len, 0);
+ ::WideCharToMultiByte(CP_UTF8, 0, &p.native()[0], p.native().size(), &as_utf8[0], len, nullptr, nullptr);
+ j = std::move(as_utf8);
+#else
j = p.string();
+#endif
}
#endif
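For readers who want to try the conversion outside the library, the idea in the diff can be sketched as a standalone function. This is only a sketch under stated assumptions: `path_to_utf8` is a hypothetical name that is not part of nlohmann/json, a failed conversion is reported by returning an empty string, and on non-Windows platforms the native narrow string is assumed to be usable as-is.

```cpp
#include <cstddef>
#include <filesystem>
#include <string>

#if defined(_WIN32)
#include <windows.h>
#endif

// Hypothetical helper (not part of nlohmann/json): return the path as UTF-8 bytes.
inline std::string path_to_utf8(const std::filesystem::path& p)
{
#if defined(_WIN32)
    const std::wstring& wide = p.native();
    if (wide.empty())
    {
        return {};  // WideCharToMultiByte fails on zero-length input
    }
    const int wide_len = static_cast<int>(wide.size());
    // First call: ask how many bytes the UTF-8 form needs.
    const int len = ::WideCharToMultiByte(CP_UTF8, 0, wide.data(), wide_len,
                                          nullptr, 0, nullptr, nullptr);
    if (len <= 0)
    {
        return {};  // Assumption: real code would report this error instead
    }
    std::string utf8(static_cast<std::size_t>(len), '\0');
    // Second call: write the UTF-8 bytes into the buffer.
    ::WideCharToMultiByte(CP_UTF8, 0, wide.data(), wide_len,
                          &utf8[0], len, nullptr, nullptr);
    return utf8;
#else
    // Assumption: the native narrow string is passed through unchanged here.
    return p.string();
#endif
}
```

Compared with the diff, the sketch writes into the same buffer it allocated, casts the length to `int` explicitly, and handles the empty path before calling the API.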
A path may be represented in several ways (native/generic_string/string/u8string, etc.), so I think the client should decide how to store it before putting it into a json object. Just
I am not sure how to proceed here as I am not a Windows user. Any idea?
I'm happy to make a PR, up to you what solution to use though:
With (1) and (2) I'm not sure of the effect on custom
When is it not supported? I don't see any indication that it's optional. I see that on Windows it can throw if the string can't be represented in UTF-8, so that's a consideration.
Sorry, read the reference page wrong! In that case, @zel1b08a's solution seems best by far. Replace with Small chance of changing the behavior of already-broken code I guess (e.g. someone calling |
Awesome. Looking forward to a pull request :)
The OS "favored" way to process Unicode on Windows is to use In Since the Windows "native" way is to use the already mentioned
If it helps: the library contains code to convert UTF-16 or UTF-32 to UTF-8 (see |
I was thinking more about using all the power of the standard lib to avoid calling I just cleaned it up a bit and made a few cosmetic changes: removed external types and dependencies, and replaced The cleaned-up code for an explicit conversion between
#if defined(_MSVC_LANG) && !defined(__clang__)
#define NLOHMANN_JSON_CPP_STD _MSVC_LANG
#else
#define NLOHMANN_JSON_CPP_STD __cplusplus
#endif
inline auto path_from_u8(const std::string& path) -> std::filesystem::path
{
#if NLOHMANN_JSON_CPP_STD < 202002 // `<` b/c `u8path` is deprecated in C++20; ⇨ warnings.
return std::filesystem::u8path(path);
#else
return std::filesystem::path(std::u8string_view(reinterpret_cast<const char8_t*>(path.data()), path.size()));
#endif
}
inline auto to_u8_string(const std::filesystem::path& p) -> std::string
{
#if NLOHMANN_JSON_CPP_STD < 202002
return p.u8string(); // Returns a `std::string` in C++17.
#else
const std::u8string s = p.u8string();
return std::string(s.begin(), s.end()); // Needless copy except for C++20 nonsense.
#endif
}
I see a certain beauty in this solution: the same code path works on any platform, regardless of its locale, and it does not need to use BTW: Are the "other" platforms (e.g. Linux and macOS) only allowing
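A round trip through the standard library conversions can be sketched as follows. The names `utf8_from_path` and `path_from_utf8` are hypothetical stand-ins for the helpers above, and the sketch assumes the `__cpp_char8_t` feature-test macro is a good enough switch between the C++17 and C++20 spellings.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper: UTF-8 bytes of a path as a plain std::string.
inline std::string utf8_from_path(const std::filesystem::path& p)
{
    // u8string() returns std::string in C++17 and std::u8string in C++20;
    // copying through iterators works for both.
    const auto s = p.u8string();
    return std::string(s.begin(), s.end());
}

// Hypothetical helper: build a path from UTF-8 bytes held in a std::string.
inline std::filesystem::path path_from_utf8(const std::string& s)
{
#if defined(__cpp_char8_t)
    // With char8_t available, the path constructor treats char8_t input as UTF-8.
    return std::filesystem::path(std::u8string(s.begin(), s.end()));
#else
    // Pre-C++20 spelling; deprecated once char8_t exists.
    return std::filesystem::u8path(s);
#endif
}

// True if the UTF-8 bytes survive the conversion to a path and back.
inline bool round_trips(const std::string& utf8)
{
    return utf8_from_path(path_from_utf8(utf8)) == utf8;
}
```

For a file name containing U+2019, such as `"quote\xE2\x80\x99.txt"`, the bytes should come back unchanged regardless of the active code page, since neither direction goes through the locale-dependent narrow encoding.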
@nlohmann |
Sure, please go ahead.
Description
This conversion function:
https://github.com/nlohmann/json/blob/7efe875495a3ed7d805ddbb01af0c7725f50c88b/include/nlohmann/detail/conversions/to_json.hpp#L416C1-L420C2
uses p.string(), which does not give a UTF-8-encoded string on Windows (in some cases, maybe?). Trying to dump() the resultant JSON throws an "invalid UTF-8 byte" exception.
Reproduction steps
Convert a std::filesystem::path, which contains a Unicode "Right Single Quotation Mark" character (U+2019), to a json implicitly or with to_json.
Inspect the new json (string_t)'s bytes, either by dump()ing or converting to BSON.
Expected vs. actual results
Expected: "Strings are stored in UTF-8 encoding." per https://json.nlohmann.me/api/basic_json/string_t/
Actual: The string gets converted by std::filesystem::path::string(), which appears to convert it to Windows-1252 encoding. Its bytes end up as \x92 rather than \xe2\x80\x99.
Minimal code example
Workaround I'm using is to use WideCharToMultiByte + .native() to get the string in UTF-8 before passing to nlohmann:
Error messages
[json.exception.type_error.316] invalid UTF-8 byte at index 0: 0x92
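To see why 0x92 is rejected at index 0, it may help to compute the UTF-8 form of U+2019 by hand. The sketch below uses two hypothetical helpers, `encode_utf8` and `is_continuation_byte`, which are not taken from the library. It shows that U+2019 encodes to the three bytes E2 80 99, while a lone 0x92 has the bit pattern 10xxxxxx, which can only continue a sequence and never start one.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode one Unicode scalar value as UTF-8.
inline std::string encode_utf8(char32_t cp)
{
    assert(cp <= 0x10FFFF);  // outside this range there is no valid encoding
    std::string out;
    if (cp < 0x80)
    {
        out += static_cast<char>(cp);                          // 0xxxxxxx
    }
    else if (cp < 0x800)
    {
        out += static_cast<char>(0xC0 | (cp >> 6));            // 110xxxxx
        out += static_cast<char>(0x80 | (cp & 0x3F));          // 10xxxxxx
    }
    else if (cp < 0x10000)
    {
        out += static_cast<char>(0xE0 | (cp >> 12));           // 1110xxxx
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));   // 10xxxxxx
        out += static_cast<char>(0x80 | (cp & 0x3F));          // 10xxxxxx
    }
    else
    {
        out += static_cast<char>(0xF0 | (cp >> 18));           // 11110xxx
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));  // 10xxxxxx
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));   // 10xxxxxx
        out += static_cast<char>(0x80 | (cp & 0x3F));          // 10xxxxxx
    }
    return out;
}

// A byte of the form 10xxxxxx may only follow a lead byte.
inline bool is_continuation_byte(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}
```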
Compiler and operating system
MSVC 2022 Professional, C++ 20
Library version
develop - a259ecc
Validation
develop branch is used.