Message ID | 770e6d4f-2dae-e2f9-c6c8-bb3d458ef796@honermann.net |
---|---|
Series | : C N2653 char8_t implementation |
On Sun, 6 Jun 2021, Tom Honermann via Gcc-patches wrote:

> These changes do not impact default gcc behavior. The existing -fchar8_t
> option is extended to C compilation to enable the N2653 changes, and
> -fno-char8_t is extended to explicitly disable them. N2653 has not yet been
> accepted by WG14, so no changes are made to handling of the C2X language
> dialect.

Why is that option needed? Normally I'd expect features to be enabled or disabled based on the selected language version, rather than having separate options to adjust the configuration for one very specific feature in a language version. Adding extra language dialects not corresponding to any standard version but to some peculiar mix of versions (such as C17 with a changed type for u8"", or C2X with a changed type for u8'') needs a strong reason for those language dialects to be useful (for example, the -fgnu89-inline option was justified by widespread use of GNU-style extern inline in headers).

I think the whole patch series would best wait until after the proposal has been considered by a WG14 meeting, in addition to not increasing the number of language dialects supported.
On 6/7/21 5:03 PM, Joseph Myers wrote:

> On Sun, 6 Jun 2021, Tom Honermann via Gcc-patches wrote:
>
>> These changes do not impact default gcc behavior. The existing -fchar8_t
>> option is extended to C compilation to enable the N2653 changes, and
>> -fno-char8_t is extended to explicitly disable them. N2653 has not yet been
>> accepted by WG14, so no changes are made to handling of the C2X language
>> dialect.
> Why is that option needed? Normally I'd expect features to be enabled or
> disabled based on the selected language version, rather than having
> separate options to adjust the configuration for one very specific feature
> in a language version. Adding extra language dialects not corresponding
> to any standard version but to some peculiar mix of versions (such as C17
> with a changed type for u8"", or C2X with a changed type for u8'') needs a
> strong reason for those language dialects to be useful (for example, the
> -fgnu89-inline option was justified by widespread use of GNU-style extern
> inline in headers).

The option is needed because it impacts core language backward compatibility (for both C and C++, the type of u8 string literals; for C++, the type of u8 character literals and the new char8_t fundamental type). The ability to opt-in or opt-out of the feature eases migration by enabling source code compatibility.

C and C++ standards are not published at the same cadence. A project that targets C++20 and C17 may therefore have a need to either opt-out of char8_t support on the C++ side (already possible via -fno-char8_t), or to opt-in to char8_t support on the C side until such time as the targets change to C++20(+) and C23(+); assuming WG14 approval at some point.

>
> I think the whole patch series would best wait until after the proposal
> has been considered by a WG14 meeting, in addition to not increasing the
> number of language dialects supported.

As an opt-in feature, this is useful to gain implementation and deployment experience for WG14. It would be appropriate to document this as an experimental feature pending WG14 approval. If WG14 declines it or approves it with different behavior, the feature can then be removed or changed.

The option could also be introduced as -fexperimental-char8_t if that eases concerns, though I do not favor that approach due to misalignment with the existing option for C++.

Tom.
On Fri, 11 Jun 2021, Tom Honermann via Gcc-patches wrote:

> The option is needed because it impacts core language backward compatibility
> (for both C and C++, the type of u8 string literals; for C++, the type of u8
> character literals and the new char8_t fundamental type).

Lots of new features in new standard versions can affect backward compatibility. We generally bundle all of those up into a single -std option rather than having an explosion of different language variants with different features enabled or disabled. I don't think this feature, for C, reaches the threshold that would justify having a separate option to control it, especially given that people can use -Wno-pointer-sign or pointer casts or their own local char8_t typedef as an intermediate step if they want code using u8"" strings to work for both old and new standard versions.

I don't think u8"" strings are widely used in C library headers in a way where the choice of type matters. (Use of a feature in library headers is a key thing that can justify options such as -fgnu89-inline, because it means the choice of language version is no longer fully under control of a single project.)

The only feature proposed for C2x that I think is likely to have significant compatibility implications in practice for a lot of code is making bool, true and false into keywords. I still don't think a separate option makes sense there. (If that feature is accepted for C2x, what would be useful is for people to do distribution rebuilds with -std=gnu2x as the default to find and fix code that breaks, in advance of the default actually changing in GCC. But the workaround for not-yet-fixed code would be -std=gnu11, not a separate option for that one feature.)

> > I think the whole patch series would best wait until after the proposal
> > has been considered by a WG14 meeting, in addition to not increasing the
> > number of language dialects supported.
>
> As an opt-in feature, this is useful to gain implementation and deployment
> experience for WG14.

I think this feature is one of the cases where experience in C++ is sufficiently relevant for C (although there are certainly cases of other language features where the languages are sufficiently different that using C++ experience like that can be problematic).

E.g. we didn't need -fdigit-separators for C before digit separators were added to C2x, and we don't need -fno-digit-separators now they are in C2x (the feature is just enabled or disabled based on the language version), although that's one of many features that do affect compatibility in corner cases.
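For illustration only, the intermediate step mentioned earlier in this message (pointer casts or a local char8_t typedef) might look like the minimal sketch below. The names `my_char8_t`, `MY_U8`, and `my_u8_len` are hypothetical and do not come from the patch series or from glibc; the sketch assumes only that u8"" elements are either `char` (older standard versions) or `unsigned char` (the N2653 behavior).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical local typedef standing in for char8_t; an assumption made
   for illustration, not something defined by the patches in this thread. */
typedef unsigned char my_char8_t;

/* The explicit pointer cast is valid whether the u8 literal's elements are
   char (old behavior) or unsigned char (N2653 behavior), so no
   pointer-sign diagnostic is triggered in either mode. */
#define MY_U8(lit) ((const my_char8_t *)(u8"" lit))

/* Length in code units of a NUL-terminated UTF-8 string. */
static size_t my_u8_len(const my_char8_t *s)
{
    return strlen((const char *)s);
}
```

A caller would write `my_u8_len(MY_U8("abc"))` and get the same result under either language version, without needing a separate compiler option.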
On 6/11/21 1:27 PM, Joseph Myers wrote:

> On Fri, 11 Jun 2021, Tom Honermann via Gcc-patches wrote:
>
>> The option is needed because it impacts core language backward compatibility
>> (for both C and C++, the type of u8 string literals; for C++, the type of u8
>> character literals and the new char8_t fundamental type).
> Lots of new features in new standard versions can affect backward
> compatibility. We generally bundle all of those up into a single -std
> option rather than having an explosion of different language variants with
> different features enabled or disabled. I don't think this feature, for
> C, reaches the threshold that would justify having a separate option to
> control it, especially given that people can use -Wno-pointer-sign or
> pointer casts or their own local char8_t typedef as an intermediate step
> if they want code using u8"" strings to work for both old and new standard
> versions.

Ok, I'm happy to defer to your experience. My perspective is likely biased by the C++20 changes being more disruptive for that language.

>
> I don't think u8"" strings are widely used in C library headers in a way
> where the choice of type matters. (Use of a feature in library headers is
> a key thing that can justify options such as -fgnu89-inline, because it
> means the choice of language version is no longer fully under control of a
> single project.)

That aligns with my expectations.

>
> The only feature proposed for C2x that I think is likely to have
> significant compatibility implications in practice for a lot of code is
> making bool, true and false into keywords. I still don't think a separate
> option makes sense there. (If that feature is accepted for C2x, what
> would be useful is for people to do distribution rebuilds with -std=gnu2x
> as the default to find and fix code that breaks, in advance of the default
> actually changing in GCC. But the workaround for not-yet-fixed code would
> be -std=gnu11, not a separate option for that one feature.)

Ok, that comparison is helpful.

>
>>> I think the whole patch series would best wait until after the proposal
>>> has been considered by a WG14 meeting, in addition to not increasing the
>>> number of language dialects supported.
>> As an opt-in feature, this is useful to gain implementation and deployment
>> experience for WG14.
> I think this feature is one of the cases where experience in C++ is
> sufficiently relevant for C (although there are certainly cases of other
> language features where the languages are sufficiently different that
> using C++ experience like that can be problematic).
>
> E.g. we didn't need -fdigit-separators for C before digit separators were
> added to C2x, and we don't need -fno-digit-separators now they are in C2x
> (the feature is just enabled or disabled based on the language version),
> although that's one of many features that do affect compatibility in
> corner cases.

Got it, thanks again, that comparison is helpful.

Per this and prior messages, I'll revise the gcc patch series as follows (I'll likewise revise the glibc changes, but will detail that in the corresponding glibc mailing list thread).

1. Remove the proposed use of -fchar8_t and -fno-char8_t for C code.
2. Remove the updated documentation for the -fchar8_t option since it won't be applicable to C code.
3. Remove the _CHAR8_T_SOURCE macro.
4. Enable the change of u8 string literal type based on -std=[gnu|c]2x (by setting flag_char8_t if flag_isoc2x is set).
5. Condition the declarations of atomic_char8_t and __GCC_ATOMIC_CHAR8_T_LOCK_FREE on _GNU_SOURCE or _ISOC2X_SOURCE.
6. Remove the char8 data member from cpp_options that I had added and forgot to remove.
7. Revise the tests and rename them for consistency with other C2x tests.

If I've forgotten anything, please let me know. Thank you for the thorough review!

Tom.
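As a rough illustration of what item 4 above would make observable from user code, the sketch below probes the element type of a u8 string literal at compile time with `_Generic`. It is not one of the tests from the patch series; the names `U8_LITERAL_IS_UNSIGNED_CHAR` and `u8_literal_element_type` are hypothetical. It assumes only what N2653 proposes: that u8"" elements become `unsigned char` (via char8_t) in the new language mode and remain `char` otherwise.

```c
#include <stdio.h>

/* Hypothetical probe: evaluates to 1 when the elements of a u8 string
   literal are unsigned char (the N2653 behavior) and to 0 when they are
   plain char. &u8""[0] has type "pointer to element", and _Generic does
   not evaluate its controlling expression. */
#define U8_LITERAL_IS_UNSIGNED_CHAR \
    _Generic(&u8""[0], unsigned char *: 1, default: 0)

/* Name of the element type selected by the current language mode. */
static const char *u8_literal_element_type(void)
{
    return U8_LITERAL_IS_UNSIGNED_CHAR ? "unsigned char" : "char";
}

/* Prints the result; which line appears depends on the -std setting and
   on whether the compiler implements N2653 for that setting. */
static void print_u8_literal_element_type(void)
{
    printf("u8 string literal element type: %s\n", u8_literal_element_type());
}
```

Compiling the same file once with an older -std setting and once with the C2x setting would be one way to see the literal type follow the language version rather than a separate option.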