diff mbox series

expr: build string_constant only for a char type

Message ID 26530bce-6ed5-082f-f5e5-cebe827719a2@suse.cz
State New
Headers show
Series expr: build string_constant only for a char type | expand

Commit Message

Martin Liška July 27, 2020, 12:32 p.m. UTC
Hey.

As mentioned in the PR, we should not create a string constant for a type
that is different from char_type_node. Looking at expr.c, I was inspired
and used 'TYPE_MAIN_VARIANT (chartype) == char_type_node' to verify that underlying
type is a character type.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests. And it fixes chromium
build with gcc-10 branch with the patch applied.

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

	PR tree-optimization/96058
	* expr.c (string_constant): Build string_constant only
	for a type that is main variant of char_type_node.
---
  gcc/expr.c | 22 +++++++++++++---------
  1 file changed, 13 insertions(+), 9 deletions(-)

Comments

Jakub Jelinek July 27, 2020, 1:16 p.m. UTC | #1
On Mon, Jul 27, 2020 at 02:32:15PM +0200, Martin Liška wrote:
> As mentioned in the PR, we should not create a string constant for a type
> that is different from char_type_node. Looking at expr.c, I was inspired
> and used 'TYPE_MAIN_VARIANT (chartype) == char_type_node' to verify that underlying
> type is a character type.

That doesn't look correct, there is char, signed char, unsigned char,
or say std::byte, and all of them are perfectly fine.
So, rather than requiring it is char and nothing else, you should instead
check that it is an INTEGRAL_TYPE_P (maybe better other than BOOLEAN_TYPE?),
which is complete and has the same TYPE_PRECISION as char_type_node.

	Jakub
Martin Liška July 27, 2020, 2:12 p.m. UTC | #2
On 7/27/20 3:16 PM, Jakub Jelinek wrote:
> On Mon, Jul 27, 2020 at 02:32:15PM +0200, Martin Liška wrote:
>> As mentioned in the PR, we should not create a string constant for a type
>> that is different from char_type_node. Looking at expr.c, I was inspired
>> and used 'TYPE_MAIN_VARIANT (chartype) == char_type_node' to verify that underlying
>> type is a character type.
> 
> That doesn't look correct, there is char, signed char, unsigned char,
> or say std::byte, and all of them are perfectly fine.
> So, rather than requiring it is char and nothing else, you should instead
> check that it is an INTEGRAL_TYPE_P (maybe better other than BOOLEAN_TYPE?),
> which is complete and has the same TYPE_PRECISION as char_type_node.

All right, the following survives tests and bootstraps.

Ready to be installed?
Thanks,
Martin

> 
> 	Jakub
>
Jakub Jelinek July 27, 2020, 2:14 p.m. UTC | #3
On Mon, Jul 27, 2020 at 04:12:09PM +0200, Martin Liška wrote:
> On 7/27/20 3:16 PM, Jakub Jelinek wrote:
> > On Mon, Jul 27, 2020 at 02:32:15PM +0200, Martin Liška wrote:
> > > As mentioned in the PR, we should not create a string constant for a type
> > > that is different from char_type_node. Looking at expr.c, I was inspired
> > > and used 'TYPE_MAIN_VARIANT (chartype) == char_type_node' to verify that underlying
> > > type is a character type.
> > 
> > That doesn't look correct, there is char, signed char, unsigned char,
> > or say std::byte, and all of them are perfectly fine.
> > So, rather than requiring it is char and nothing else, you should instead
> > check that it is an INTEGRAL_TYPE_P (maybe better other than BOOLEAN_TYPE?),
> > which is complete and has the same TYPE_PRECISION as char_type_node.
> 
> All right, the following survives tests and bootstraps.

LGTM.

	Jakub
Martin Sebor July 27, 2020, 3:53 p.m. UTC | #4
On 7/27/20 6:32 AM, Martin Liška wrote:
> Hey.
> 
> As mentioned in the PR, we should not create a string constant for a type
> that is different from char_type_node. Looking at expr.c, I was inspired
> and used 'TYPE_MAIN_VARIANT (chartype) == char_type_node' to verify that 
> underlying
> type is a character type.
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests. 
> And it fixes chromium
> build with gcc-10 branch with the patch applied.
> 
> Ready to be installed?
> Thanks,
> Martin
> 
> gcc/ChangeLog:
> 
>      PR tree-optimization/96058
>      * expr.c (string_constant): Build string_constant only
>      for a type that is main variant of char_type_node.
> ---
>   gcc/expr.c | 22 +++++++++++++---------
>   1 file changed, 13 insertions(+), 9 deletions(-)
> 
> diff --git a/gcc/expr.c b/gcc/expr.c
> index 5db0a7a8565..c3fdd82b319 100644
> --- a/gcc/expr.c
> +++ b/gcc/expr.c
> @@ -11828,17 +11828,21 @@ string_constant (tree arg, tree *ptr_offset, 
> tree *mem_size, tree *decl)
>       chartype = TREE_TYPE (chartype);
>         while (TREE_CODE (chartype) == ARRAY_TYPE)
>       chartype = TREE_TYPE (chartype);
> -      /* Convert a char array to an empty STRING_CST having an array
> -     of the expected type and size.  */
> -      if (!initsize)
> -      initsize = integer_zero_node;
> 
> -      unsigned HOST_WIDE_INT size = tree_to_uhwi (initsize);
> -      init = build_string_literal (size, NULL, chartype, size);
> -      init = TREE_OPERAND (init, 0);
> -      init = TREE_OPERAND (init, 0);
> +      if (TYPE_MAIN_VARIANT (chartype) == char_type_node)

The change to c_getstr I recently committed made it clear that
the function can:

   Return a pointer P to a NUL-terminated string containing
   the sequence of bytes corresponding to the representation
   of the object referred to by SRC (or a subsequence of such
   bytes within it if SRC is a reference to an initialized
   constant array plus some constant offset).

I.e., c_getstr returns a STRING_CST for arbitrary non-string
constants.  This enables optimizations like the by-pieces
expansion of calls to raw memory functions like memcpy, or
the folding of other raw memory calls like memchr with non-
string arguments.

c_getstr relies on string_constant for that.  Restricting
the latter function to just character types prevents these
optimizations for zero-initialized constants of other types.
A test case that shows the difference to the by-pieces
expansion goes something like this:

   const struct { char a[64]; } x = { 0 };

   void f (void *d)
   {
     __builtin_memcpy (d, &x, sizeof x - 1);
   }

A test case for the memchr folding is similar:

   const struct { char a[64]; } x = { 0 };

   int f (void *d)
   {
     return __builtin_memchr (&x, 0, sizeof x) != 0;
   }

The tests I committed with the change didn't exercise any of
this so that's my bad.  I'm still not sure I understand how
the problem with the incomplete type comes up (I haven't had
a chance to look into the recent updates on the bug yet) but
to retain the optimization (and keep the comments in sync
with the code) I think a better solution than restricting
the function to integers is to limit it to complete types.
Beyond that, extending the function to also constant arrays
or nonzero aggregates will also enable the optimization for
those.

Martin

> +    {
> +      /* Convert a char array to an empty STRING_CST having an array
> +         of the expected type and size.  */
> +      if (!initsize)
> +        initsize = integer_zero_node;
> +
> +      unsigned HOST_WIDE_INT size = tree_to_uhwi (initsize);
> +      init = build_string_literal (size, NULL, chartype, size);
> +      init = TREE_OPERAND (init, 0);
> +      init = TREE_OPERAND (init, 0);
> 
> -      *ptr_offset = integer_zero_node;
> +      *ptr_offset = integer_zero_node;
> +    }
>       }
> 
>     if (decl)
Martin Liška July 27, 2020, 6:54 p.m. UTC | #5
On 7/27/20 5:53 PM, Martin Sebor wrote:
> The tests I committed with the change didn't exercise any of
> this so that's my bad.  I'm still not sure I understand how
> the problem with the incomplete type comes up (I haven't had
> a chance to look into the recent updates on the bug yet) but
> to retain the optimization (and keep the comments in sync
> with the code) I think a better solution than restricting
> the function to integers is to limit it to complete types.
> Beyond that, extending the function to also constant arrays
> or nonzero aggregates will also enable the optimization for
> those.

Hello.

I must admit that I'm not super-familiar with that code I modified.
Can you please assign the PR and propose a proper fix? I can then
test it on chromium and I'm also deferring backport of the patch I installed
to master.

Thanks,
Martin
Martin Sebor July 27, 2020, 7:39 p.m. UTC | #6
On 7/27/20 12:54 PM, Martin Liška wrote:
> On 7/27/20 5:53 PM, Martin Sebor wrote:
>> The tests I committed with the change didn't exercise any of
>> this so that's my bad.  I'm still not sure I understand how
>> the problem with the incomplete type comes up (I haven't had
>> a chance to look into the recent updates on the bug yet) but
>> to retain the optimization (and keep the comments in sync
>> with the code) I think a better solution than restricting
>> the function to integers is to limit it to complete types.
>> Beyond that, extending the function to also constant arrays
>> or nonzero aggregates will also enable the optimization for
>> those.
> 
> Hello.
> 
> I must admit that I'm not super-familiar with that code I modified.
> Can you please assign the PR and propose a proper fix? I can then
> test it on chromium and I'm also deferring backport of the patch I 
> installed
> to master.

Sure.  I've been trying to wrap something up and it's been taking
longer than I expected.  I'll look into this as soon as I'm done,
hopefully tomorrow.

Martin
Jakub Jelinek July 27, 2020, 8:49 p.m. UTC | #7
On Mon, Jul 27, 2020 at 09:53:31AM -0600, Martin Sebor via Gcc-patches wrote:
>   Return a pointer P to a NUL-terminated string containing
>   the sequence of bytes corresponding to the representation
>   of the object referred to by SRC (or a subsequence of such
>   bytes within it if SRC is a reference to an initialized
>   constant array plus some constant offset).
> 
> I.e., c_getstr returns a STRING_CST for arbitrary non-string
> constants.  This enables optimizations like the by-pieces
> expansion of calls to raw memory functions like memcpy, or
> the folding of other raw memory calls like memchr with non-
> string arguments.
> 
> c_getstr relies on string_constant for that.  Restricting
> the latter function to just character types prevents these
> optimizations for zero-initialized constants of other types.
> A test case that shows the difference to the by-pieces
> expansion goes something like this:

Having STRING_CST in the compiler with arbitrary array type is IMHO a very
bad idea, so if you want something like that, you should come up with a
different representation for that, not STRING_CSTs.
Because most of the compiler assumes STRING_CSTs are what it says, string
literals where elements are some kind of characters, don't have to be
narrow, but better should be integral.
Maybe returning a CONSTRUCTOR with no elements with the right type is a
better idea for that, that in the compiler stands for zero initialized
aggregate.

	Jakub
Richard Biener July 28, 2020, 7 a.m. UTC | #8
On Mon, Jul 27, 2020 at 10:49 PM Jakub Jelinek via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Mon, Jul 27, 2020 at 09:53:31AM -0600, Martin Sebor via Gcc-patches wrote:
> >   Return a pointer P to a NUL-terminated string containing
> >   the sequence of bytes corresponding to the representation
> >   of the object referred to by SRC (or a subsequence of such
> >   bytes within it if SRC is a reference to an initialized
> >   constant array plus some constant offset).
> >
> > I.e., c_getstr returns a STRING_CST for arbitrary non-string
> > constants.  This enables optimizations like the by-pieces
> > expansion of calls to raw memory functions like memcpy, or
> > the folding of other raw memory calls like memchr with non-
> > string arguments.
> >
> > c_getstr relies on string_constant for that.  Restricting
> > the latter function to just character types prevents these
> > optimizations for zero-initialized constants of other types.
> > A test case that shows the difference to the by-pieces
> > expansion goes something like this:
>
> Having STRING_CST in the compiler with arbitrary array type is IMHO a very
> bad idea, so if you want something like that, you should come up with a
> different representation for that, not STRING_CSTs.
> Because most of the compiler assumes STRING_CSTs are what it says, string
> literals where elements are some kind of characters, don't have to be
> narrow, but better should be integral.
> Maybe returning a CONSTRUCTOR with no elements with the right type is a
> better idea for that, that in the compiler stands for zero initialized
> aggregate.

It's also a much shorter representation (that also works for strings, btw)
if it is all about all-zero "constants".

Richard.

>         Jakub
>
Martin Liška July 28, 2020, 10:38 a.m. UTC | #9
On 7/28/20 9:00 AM, Richard Biener via Gcc-patches wrote:
> On Mon, Jul 27, 2020 at 10:49 PM Jakub Jelinek via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>>
>> On Mon, Jul 27, 2020 at 09:53:31AM -0600, Martin Sebor via Gcc-patches wrote:
>>>    Return a pointer P to a NUL-terminated string containing
>>>    the sequence of bytes corresponding to the representation
>>>    of the object referred to by SRC (or a subsequence of such
>>>    bytes within it if SRC is a reference to an initialized
>>>    constant array plus some constant offset).
>>>
>>> I.e., c_getstr returns a STRING_CST for arbitrary non-string
>>> constants.  This enables optimizations like the by-pieces
>>> expansion of calls to raw memory functions like memcpy, or
>>> the folding of other raw memory calls like memchr with non-
>>> string arguments.
>>>
>>> c_getstr relies on string_constant for that.  Restricting
>>> the latter function to just character types prevents these
>>> optimizations for zero-initialized constants of other types.
>>> A test case that shows the difference to the by-pieces
>>> expansion goes something like this:
>>
>> Having STRING_CST in the compiler with arbitrary array type is IMHO a very
>> bad idea, so if you want something like that, you should come up with a
>> different representation for that, not STRING_CSTs.
>> Because most of the compiler assumes STRING_CSTs are what it says, string
>> literals where elements are some kind of characters, don't have to be
>> narrow, but better should be integral.
>> Maybe returning a CONSTRUCTOR with no elements with the right type is a
>> better idea for that, that in the compiler stands for zero initialized
>> aggregate.
> 
> It's also a much shorter representation (that also works for strings, btw)
> if it is all about all-zero "constants".
> 
> Richard.
> 
>>          Jakub
>>

Based on the upcoming discussion, I decided to backport my current fix to releases/gcc-10
branch in order to fix the problematic ICE in chromium. I'm ready to backport whatever
Martin comes up with.

Martin
diff mbox series

Patch

diff --git a/gcc/expr.c b/gcc/expr.c
index 5db0a7a8565..c3fdd82b319 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -11828,17 +11828,21 @@  string_constant (tree arg, tree *ptr_offset, tree *mem_size, tree *decl)
  	chartype = TREE_TYPE (chartype);
        while (TREE_CODE (chartype) == ARRAY_TYPE)
  	chartype = TREE_TYPE (chartype);
-      /* Convert a char array to an empty STRING_CST having an array
-	 of the expected type and size.  */
-      if (!initsize)
-	  initsize = integer_zero_node;
  
-      unsigned HOST_WIDE_INT size = tree_to_uhwi (initsize);
-      init = build_string_literal (size, NULL, chartype, size);
-      init = TREE_OPERAND (init, 0);
-      init = TREE_OPERAND (init, 0);
+      if (TYPE_MAIN_VARIANT (chartype) == char_type_node)
+	{
+	  /* Convert a char array to an empty STRING_CST having an array
+	     of the expected type and size.  */
+	  if (!initsize)
+	    initsize = integer_zero_node;
+
+	  unsigned HOST_WIDE_INT size = tree_to_uhwi (initsize);
+	  init = build_string_literal (size, NULL, chartype, size);
+	  init = TREE_OPERAND (init, 0);
+	  init = TREE_OPERAND (init, 0);
  
-      *ptr_offset = integer_zero_node;
+	  *ptr_offset = integer_zero_node;
+	}
      }
  
    if (decl)