Message ID | 26530bce-6ed5-082f-f5e5-cebe827719a2@suse.cz |
---|---|
State | New |
Headers | show |
Series | expr: build string_constant only for a char type | expand |
On Mon, Jul 27, 2020 at 02:32:15PM +0200, Martin Liška wrote: > As mentioned in the PR, we should not create a string constant for a type > that is different from char_type_node. Looking at expr.c, I was inspired > and used 'TYPE_MAIN_VARIANT (chartype) == char_type_node' to verify that underlying > type is a character type. That doesn't look correct, there is char, signed char, unsigned char, or say std::byte, and all of them are perfectly fine. So, rather than requiring it is char and nothing else, you should instead check that it is an INTEGRAL_TYPE_P (maybe better other than BOOLEAN_TYPE?), which is complete and has the same TYPE_PRECISION as char_type_node. Jakub
On 7/27/20 3:16 PM, Jakub Jelinek wrote: > On Mon, Jul 27, 2020 at 02:32:15PM +0200, Martin Liška wrote: >> As mentioned in the PR, we should not create a string constant for a type >> that is different from char_type_node. Looking at expr.c, I was inspired >> and used 'TYPE_MAIN_VARIANT (chartype) == char_type_node' to verify that underlying >> type is a character type. > > That doesn't look correct, there is char, signed char, unsigned char, > or say std::byte, and all of them are perfectly fine. > So, rather than requiring it is char and nothing else, you should instead > check that it is an INTEGRAL_TYPE_P (maybe better other than BOOLEAN_TYPE?), > which is complete and has the same TYPE_PRECISION as char_type_node. All right, the following survives tests and bootstraps. Ready to be installed? Thanks, Martin > > Jakub >
On Mon, Jul 27, 2020 at 04:12:09PM +0200, Martin Liška wrote: > On 7/27/20 3:16 PM, Jakub Jelinek wrote: > > On Mon, Jul 27, 2020 at 02:32:15PM +0200, Martin Liška wrote: > > > As mentioned in the PR, we should not create a string constant for a type > > > that is different from char_type_node. Looking at expr.c, I was inspired > > > and used 'TYPE_MAIN_VARIANT (chartype) == char_type_node' to verify that underlying > > > type is a character type. > > > > That doesn't look correct, there is char, signed char, unsigned char, > > or say std::byte, and all of them are perfectly fine. > > So, rather than requiring it is char and nothing else, you should instead > > check that it is an INTEGRAL_TYPE_P (maybe better other than BOOLEAN_TYPE?), > > which is complete and has the same TYPE_PRECISION as char_type_node. > > All right, the following survives tests and bootstraps. LGTM. Jakub
On 7/27/20 6:32 AM, Martin Liška wrote: > Hey. > > As mentioned in the PR, we should not create a string constant for a type > that is different from char_type_node. Looking at expr.c, I was inspired > and used 'TYPE_MAIN_VARIANT (chartype) == char_type_node' to verify that > underlying > type is a character type. > > Patch can bootstrap on x86_64-linux-gnu and survives regression tests. > And it fixes chromium > build with gcc-10 branch with the patch applied. > > Ready to be installed? > Thanks, > Martin > > gcc/ChangeLog: > > PR tree-optimization/96058 > * expr.c (string_constant): Build string_constant only > for a type that is main variant of char_type_node. > --- > gcc/expr.c | 22 +++++++++++++--------- > 1 file changed, 13 insertions(+), 9 deletions(-) > > diff --git a/gcc/expr.c b/gcc/expr.c > index 5db0a7a8565..c3fdd82b319 100644 > --- a/gcc/expr.c > +++ b/gcc/expr.c > @@ -11828,17 +11828,21 @@ string_constant (tree arg, tree *ptr_offset, > tree *mem_size, tree *decl) > chartype = TREE_TYPE (chartype); > while (TREE_CODE (chartype) == ARRAY_TYPE) > chartype = TREE_TYPE (chartype); > - /* Convert a char array to an empty STRING_CST having an array > - of the expected type and size. */ > - if (!initsize) > - initsize = integer_zero_node; > > - unsigned HOST_WIDE_INT size = tree_to_uhwi (initsize); > - init = build_string_literal (size, NULL, chartype, size); > - init = TREE_OPERAND (init, 0); > - init = TREE_OPERAND (init, 0); > + if (TYPE_MAIN_VARIANT (chartype) == char_type_node) The change to c_getstr I recently committed made it clear that the function can: Return a pointer P to a NUL-terminated string containing the sequence of bytes corresponding to the representation of the object referred to by SRC (or a subsequence of such bytes within it if SRC is a reference to an initialized constant array plus some constant offset). I.e., c_getstr returns a STRING_CST for arbitrary non-string constants. This enables optimizations like the by-pieces expansion of calls to raw memory functions like memcpy, or the folding of other raw memory calls like memchr with non- string arguments. c_getstr relies on string_constant for that. Restricting the latter function to just character types prevents these optimizations for zero-initialized constants of other types. A test case that shows the difference to the by-pieces expansion goes something like this: const struct { char a[64]; } x = { 0 }; void f (void *d) { __builtin_memcpy (d, &x, sizeof x - 1); } A test case for the memchr folding is similar: const struct { char a[64]; } x = { 0 }; int f (void *d) { return __builtin_memchr (&x, 0, sizeof x) != 0; } The tests I committed with the change didn't exercise any of this so that's my bad. I'm still not sure I understand how the problem with the incomplete type comes up (I haven't had a chance to look into the recent updates on the bug yet) but to retain the optimization (and keep the comments in sync with the code) I think a better solution than restricting the function to integers is to limit it to complete types. Beyond that, extending the function to also constant arrays or nonzero aggregates will also enable the optimization for those. Martin > + { > + /* Convert a char array to an empty STRING_CST having an array > + of the expected type and size. */ > + if (!initsize) > + initsize = integer_zero_node; > + > + unsigned HOST_WIDE_INT size = tree_to_uhwi (initsize); > + init = build_string_literal (size, NULL, chartype, size); > + init = TREE_OPERAND (init, 0); > + init = TREE_OPERAND (init, 0); > > - *ptr_offset = integer_zero_node; > + *ptr_offset = integer_zero_node; > + } > } > > if (decl)
On 7/27/20 5:53 PM, Martin Sebor wrote: > The tests I committed with the change didn't exercise any of > this so that's my bad. I'm still not sure I understand how > the problem with the incomplete type comes up (I haven't had > a chance to look into the recent updates on the bug yet) but > to retain the optimization (and keep the comments in sync > with the code) I think a better solution than restricting > the function to integers is to limit it to complete types. > Beyond that, extending the function to also constant arrays > or nonzero aggregates will also enable the optimization for > those. Hello. I must admit that I'm not super-familiar with that code I modified. Can you please assign the PR and propose a proper fix? I can then test it on chromium and I'm also deferring backport of the patch I installed to master. Thanks, Martin
On 7/27/20 12:54 PM, Martin Liška wrote: > On 7/27/20 5:53 PM, Martin Sebor wrote: >> The tests I committed with the change didn't exercise any of >> this so that's my bad. I'm still not sure I understand how >> the problem with the incomplete type comes up (I haven't had >> a chance to look into the recent updates on the bug yet) but >> to retain the optimization (and keep the comments in sync >> with the code) I think a better solution than restricting >> the function to integers is to limit it to complete types. >> Beyond that, extending the function to also constant arrays >> or nonzero aggregates will also enable the optimization for >> those. > > Hello. > > I must admit that I'm not super-familiar with that code I modified. > Can you please assign the PR and propose a proper fix? I can then > test it on chromium and I'm also deferring backport of the patch I > installed > to master. Sure. I've been trying to wrap something up and it's been taking longer than I expected. I'll look into this as soon as I'm done, hopefully tomorrow. Martin
On Mon, Jul 27, 2020 at 09:53:31AM -0600, Martin Sebor via Gcc-patches wrote: > Return a pointer P to a NUL-terminated string containing > the sequence of bytes corresponding to the representation > of the object referred to by SRC (or a subsequence of such > bytes within it if SRC is a reference to an initialized > constant array plus some constant offset). > > I.e., c_getstr returns a STRING_CST for arbitrary non-string > constants. This enables optimizations like the by-pieces > expansion of calls to raw memory functions like memcpy, or > the folding of other raw memory calls like memchr with non- > string arguments. > > c_getstr relies on string_constant for that. Restricting > the latter function to just character types prevents these > optimizations for zero-initialized constants of other types. > A test case that shows the difference to the by-pieces > expansion goes something like this: Having STRING_CST in the compiler with arbitrary array type is IMHO a very bad idea, so if you want something like that, you should come up with a different representation for that, not STRING_CSTs. Because most of the compiler assumes STRING_CSTs are what it says, string literals where elements are some kind of characters, don't have to be narrow, but better should be integral. Maybe returning a CONSTRUCTOR with no elements with the right type is a better idea for that, that in the compiler stands for zero initialized aggregate. Jakub
On Mon, Jul 27, 2020 at 10:49 PM Jakub Jelinek via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > On Mon, Jul 27, 2020 at 09:53:31AM -0600, Martin Sebor via Gcc-patches wrote: > > Return a pointer P to a NUL-terminated string containing > > the sequence of bytes corresponding to the representation > > of the object referred to by SRC (or a subsequence of such > > bytes within it if SRC is a reference to an initialized > > constant array plus some constant offset). > > > > I.e., c_getstr returns a STRING_CST for arbitrary non-string > > constants. This enables optimizations like the by-pieces > > expansion of calls to raw memory functions like memcpy, or > > the folding of other raw memory calls like memchr with non- > > string arguments. > > > > c_getstr relies on string_constant for that. Restricting > > the latter function to just character types prevents these > > optimizations for zero-initialized constants of other types. > > A test case that shows the difference to the by-pieces > > expansion goes something like this: > > Having STRING_CST in the compiler with arbitrary array type is IMHO a very > bad idea, so if you want something like that, you should come up with a > different representation for that, not STRING_CSTs. > Because most of the compiler assumes STRING_CSTs are what it says, string > literals where elements are some kind of characters, don't have to be > narrow, but better should be integral. > Maybe returning a CONSTRUCTOR with no elements with the right type is a > better idea for that, that in the compiler stands for zero initialized > aggregate. It's also a much shorter representation (that also works for strings, btw) if it is all about all-zero "constants". Richard. > Jakub >
On 7/28/20 9:00 AM, Richard Biener via Gcc-patches wrote: > On Mon, Jul 27, 2020 at 10:49 PM Jakub Jelinek via Gcc-patches > <gcc-patches@gcc.gnu.org> wrote: >> >> On Mon, Jul 27, 2020 at 09:53:31AM -0600, Martin Sebor via Gcc-patches wrote: >>> Return a pointer P to a NUL-terminated string containing >>> the sequence of bytes corresponding to the representation >>> of the object referred to by SRC (or a subsequence of such >>> bytes within it if SRC is a reference to an initialized >>> constant array plus some constant offset). >>> >>> I.e., c_getstr returns a STRING_CST for arbitrary non-string >>> constants. This enables optimizations like the by-pieces >>> expansion of calls to raw memory functions like memcpy, or >>> the folding of other raw memory calls like memchr with non- >>> string arguments. >>> >>> c_getstr relies on string_constant for that. Restricting >>> the latter function to just character types prevents these >>> optimizations for zero-initialized constants of other types. >>> A test case that shows the difference to the by-pieces >>> expansion goes something like this: >> >> Having STRING_CST in the compiler with arbitrary array type is IMHO a very >> bad idea, so if you want something like that, you should come up with a >> different representation for that, not STRING_CSTs. >> Because most of the compiler assumes STRING_CSTs are what it says, string >> literals where elements are some kind of characters, don't have to be >> narrow, but better should be integral. >> Maybe returning a CONSTRUCTOR with no elements with the right type is a >> better idea for that, that in the compiler stands for zero initialized >> aggregate. > > It's also a much shorter representation (that also works for strings, btw) > if it is all about all-zero "constants". > > Richard. > >> Jakub >> Based on the upcoming discussion, I decided to backport my current fix to releases/gcc-10 branch in order to fix the problematic ICE in chromium. I'm ready to backport whatever Martin comes up with. Martin
diff --git a/gcc/expr.c b/gcc/expr.c index 5db0a7a8565..c3fdd82b319 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -11828,17 +11828,21 @@ string_constant (tree arg, tree *ptr_offset, tree *mem_size, tree *decl) chartype = TREE_TYPE (chartype); while (TREE_CODE (chartype) == ARRAY_TYPE) chartype = TREE_TYPE (chartype); - /* Convert a char array to an empty STRING_CST having an array - of the expected type and size. */ - if (!initsize) - initsize = integer_zero_node; - unsigned HOST_WIDE_INT size = tree_to_uhwi (initsize); - init = build_string_literal (size, NULL, chartype, size); - init = TREE_OPERAND (init, 0); - init = TREE_OPERAND (init, 0); + if (TYPE_MAIN_VARIANT (chartype) == char_type_node) + { + /* Convert a char array to an empty STRING_CST having an array + of the expected type and size. */ + if (!initsize) + initsize = integer_zero_node; + + unsigned HOST_WIDE_INT size = tree_to_uhwi (initsize); + init = build_string_literal (size, NULL, chartype, size); + init = TREE_OPERAND (init, 0); + init = TREE_OPERAND (init, 0); - *ptr_offset = integer_zero_node; + *ptr_offset = integer_zero_node; + } } if (decl)