Message ID | 4491be7d-2cea-c437-b991-b1cc43e344ee@honermann.net |
---|---|
Headers | show |
Series | : C++ P1423R3 char8_t remediation implementation | expand |
On 15/09/19 15:39 -0400, Tom Honermann wrote: >This series of patches provides an implementation of the changes for >C++ proposal P1423R3 [1]. > >These changes do not impact default libstdc++ behavior for C++17 and >earlier; they are only active for C++2a or when the -fchar8_t option >is specified. > >Tested x86_64-linux. > >Patch 1: Decouple constraints for u8path from path constructors. >Patch 2: Update __cpp_lib_char8_t feature test macro value, add >deleted operators, update u8path. >Patch 3: Updates to existing tests. >Patch 4: New tests. > >Tom. > >[1]: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1423r3.html It took a while, but I've committed these four patches, with just some minor whitespace changes and changelog tweaks. I'm also following it up with this patch, which corrects some pre-existing problems that got worse with the new deleted operator<< overloads. Tested powerpc64le-linux, committed to trunk.
On 29/11/19 17:45 +0000, Jonathan Wakely wrote: >On 15/09/19 15:39 -0400, Tom Honermann wrote: >>This series of patches provides an implementation of the changes for >>C++ proposal P1423R3 [1]. >> >>These changes do not impact default libstdc++ behavior for C++17 and >>earlier; they are only active for C++2a or when the -fchar8_t option >>is specified. >> >>Tested x86_64-linux. >> >>Patch 1: Decouple constraints for u8path from path constructors. >>Patch 2: Update __cpp_lib_char8_t feature test macro value, add >>deleted operators, update u8path. >>Patch 3: Updates to existing tests. >>Patch 4: New tests. >> >>Tom. >> >>[1]: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1423r3.html > >It took a while, but I've committed these four patches, with just some >minor whitespace changes and changelog tweaks. Running the new tests revealed a latent bug on Windows, where experimental::filesystem::u8path(const Source&) assumed the input was an iterator over a NTCTS. That worked for a const char* but not a std::string or experimental::string_view. The attached patch fixes that (and simplifies the #if and if-constexpr conditions for Windows) but there's a remaining bug. Constructing a experimental::filesystem::path from a char8_t string doesn't do the right thing on Windows, so these cases fails: fs::path p(u8"\xf0\x9d\x84\x9e"); VERIFY( p.u8string() == u8"\U0001D11E" ); p = fs::u8path(u8"\xf0\x9d\x84\x9e"); VERIFY( p.u8string() == u8"\U0001D11E" ); It works correctly for std::filesystem::path, just not the TS version. I plan to commit the attached patch after some more testing. I'll backport something like it too.
On 29/11/19 19:48 +0000, Jonathan Wakely wrote: >On 29/11/19 17:45 +0000, Jonathan Wakely wrote: >>On 15/09/19 15:39 -0400, Tom Honermann wrote: >>>This series of patches provides an implementation of the changes >>>for C++ proposal P1423R3 [1]. >>> >>>These changes do not impact default libstdc++ behavior for C++17 >>>and earlier; they are only active for C++2a or when the -fchar8_t >>>option is specified. >>> >>>Tested x86_64-linux. >>> >>>Patch 1: Decouple constraints for u8path from path constructors. >>>Patch 2: Update __cpp_lib_char8_t feature test macro value, add >>>deleted operators, update u8path. >>>Patch 3: Updates to existing tests. >>>Patch 4: New tests. >>> >>>Tom. >>> >>>[1]: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1423r3.html >> >>It took a while, but I've committed these four patches, with just some >>minor whitespace changes and changelog tweaks. > >Running the new tests revealed a latent bug on Windows, where >experimental::filesystem::u8path(const Source&) assumed the input >was an iterator over a NTCTS. That worked for a const char* but not a >std::string or experimental::string_view. > >The attached patch fixes that (and simplifies the #if and if-constexpr >conditions for Windows) but there's a remaining bug. Constructing a >experimental::filesystem::path from a char8_t string doesn't do the >right thing on Windows, so these cases fails: > > fs::path p(u8"\xf0\x9d\x84\x9e"); > VERIFY( p.u8string() == u8"\U0001D11E" ); > > p = fs::u8path(u8"\xf0\x9d\x84\x9e"); > VERIFY( p.u8string() == u8"\U0001D11E" ); > >It works correctly for std::filesystem::path, just not the TS version. I think this is the fix needed for the TS code: --- a/libstdc++-v3/include/experimental/bits/fs_path.h +++ b/libstdc++-v3/include/experimental/bits/fs_path.h @@ -765,7 +765,14 @@ namespace __detail { #ifdef _GLIBCXX_USE_CHAR8_T if constexpr (is_same<_CharT, char8_t>::value) - return _S_wconvert((const char*)__f, (const char*)__l, true_type()); + { + const char* __f2 = (const char*)__f; + const char* __l2 = (const char*)__l; + std::wstring __wstr; + std::codecvt_utf8_utf16<wchar_t> __wcvt; + if (__str_codecvt_in_all(__f2, __l2, __wstr, __wcvt)) + return __wstr; + } else #endif { The current code uses std::codecvt<wchar_t, char, mbstate_t> but when we know the input is UTF-8 encoded we should use codecvt_utf8_utf16 (which is what the C++17 code already does for char8_t input). I'll add that the patch I'm testing.
On 29/11/19 21:45 +0000, Jonathan Wakely wrote: >On 29/11/19 19:48 +0000, Jonathan Wakely wrote: >>On 29/11/19 17:45 +0000, Jonathan Wakely wrote: >>>On 15/09/19 15:39 -0400, Tom Honermann wrote: >>>>This series of patches provides an implementation of the changes >>>>for C++ proposal P1423R3 [1]. >>>> >>>>These changes do not impact default libstdc++ behavior for C++17 >>>>and earlier; they are only active for C++2a or when the >>>>-fchar8_t option is specified. >>>> >>>>Tested x86_64-linux. >>>> >>>>Patch 1: Decouple constraints for u8path from path constructors. >>>>Patch 2: Update __cpp_lib_char8_t feature test macro value, add >>>>deleted operators, update u8path. >>>>Patch 3: Updates to existing tests. >>>>Patch 4: New tests. >>>> >>>>Tom. >>>> >>>>[1]: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1423r3.html >>> >>>It took a while, but I've committed these four patches, with just some >>>minor whitespace changes and changelog tweaks. >> >>Running the new tests revealed a latent bug on Windows, where >>experimental::filesystem::u8path(const Source&) assumed the input >>was an iterator over a NTCTS. That worked for a const char* but not a >>std::string or experimental::string_view. >> >>The attached patch fixes that (and simplifies the #if and if-constexpr >>conditions for Windows) but there's a remaining bug. Constructing a >>experimental::filesystem::path from a char8_t string doesn't do the >>right thing on Windows, so these cases fails: >> >>fs::path p(u8"\xf0\x9d\x84\x9e"); >>VERIFY( p.u8string() == u8"\U0001D11E" ); >> >>p = fs::u8path(u8"\xf0\x9d\x84\x9e"); >>VERIFY( p.u8string() == u8"\U0001D11E" ); >> >>It works correctly for std::filesystem::path, just not the TS version. > >I think this is the fix needed for the TS code: > >--- a/libstdc++-v3/include/experimental/bits/fs_path.h >+++ b/libstdc++-v3/include/experimental/bits/fs_path.h >@@ -765,7 +765,14 @@ namespace __detail > { >#ifdef _GLIBCXX_USE_CHAR8_T > if constexpr (is_same<_CharT, char8_t>::value) >- return _S_wconvert((const char*)__f, (const char*)__l, true_type()); >+ { >+ const char* __f2 = (const char*)__f; >+ const char* __l2 = (const char*)__l; >+ std::wstring __wstr; >+ std::codecvt_utf8_utf16<wchar_t> __wcvt; >+ if (__str_codecvt_in_all(__f2, __l2, __wstr, __wcvt)) >+ return __wstr; >+ } > else >#endif > { > >The current code uses std::codecvt<wchar_t, char, mbstate_t> but when >we know the input is UTF-8 encoded we should use codecvt_utf8_utf16 >(which is what the C++17 code already does for char8_t input). > >I'll add that the patch I'm testing. Here's the final patch I'm committing. Tested powerpc64le-linux and x86_64-w64-mingw, committed to trunk.