diff mbox series

rtl cse: Fix PR94740, ICE on testsuite/gcc.dg/sso/t5.c with -mcpu=future -mpcrel -O1

Message ID 6748c592-4a51-604c-8dd1-c5f2bae5b5da@linux.ibm.com
State New
Headers show
Series rtl cse: Fix PR94740, ICE on testsuite/gcc.dg/sso/t5.c with -mcpu=future -mpcrel -O1 | expand

Commit Message

Peter Bergner April 27, 2020, 10:30 p.m. UTC
rtl-optimization: ICE on testsuite/gcc.dg/sso/t5.c with -mcpu=future -mpcrel -O1 [PR94740]

We ICE on the test case below because decompose_normal_address()  doesn't
expect to see memory operands with constant addresses like below without
a (const:DI ...) wrapped around the PLUS:

	(mem/c:SI (plus:DI (symbol_ref:DI ("*.LANCHOR0") [flags 0x182])
			   (const_int 4 [0x4])) [1 array+4 S4 A32])

What we expect to see is:

	(mem/c:SI (const:DI (plus:DI (symbol_ref:DI ("*.LANCHOR0") [flags 0x182])
				     (const_int 4 [0x4]))) [1 arrayD.2903+4 S4 A32])

Sometimes, combine adds the (const: ...) for us via simplify_binary_operand
call, but that only happens when combine actually does something with this
insn.  The bad address from the test case actually comes from CSE, so the
fix here is to make CSE add the (const: ...) whenever it creates a MEM with
a constant address.

This passed bootstrap and regtesting on powerpc64le-linux with no regressions.
Ok for trunk?

Peter


gcc/
	PR rtl-optimization/94740
	* cse.c (cse_process_notes):

gcc/testsuite/
	PR rtl-optimization/94740
	* gcc.target/powerpc/pr94740.c: New test.

Comments

Richard Sandiford April 28, 2020, 7:38 a.m. UTC | #1
Peter Bergner <bergner@linux.ibm.com> writes:
> rtl-optimization: ICE on testsuite/gcc.dg/sso/t5.c with -mcpu=future -mpcrel -O1 [PR94740]
>
> We ICE on the test case below because decompose_normal_address()  doesn't
> expect to see memory operands with constant addresses like below without
> a (const:DI ...) wrapped around the PLUS:
>
> 	(mem/c:SI (plus:DI (symbol_ref:DI ("*.LANCHOR0") [flags 0x182])
> 			   (const_int 4 [0x4])) [1 array+4 S4 A32])
>
> What we expect to see is:
>
> 	(mem/c:SI (const:DI (plus:DI (symbol_ref:DI ("*.LANCHOR0") [flags 0x182])
> 				     (const_int 4 [0x4]))) [1 arrayD.2903+4 S4 A32])
>
> Sometimes, combine adds the (const: ...) for us via simplify_binary_operand
> call, but that only happens when combine actually does something with this
> insn.  The bad address from the test case actually comes from CSE, so the
> fix here is to make CSE add the (const: ...) whenever it creates a MEM with
> a constant address.

I think we should do this in cse_process_notes_1, both to avoid creating
an invalid MEM in the first place, and so that we handle embedded MEMs
correctly too.

Also, the (const:P ...) ought to be there even outside of a MEM.  E.g. we
ought to have:

  (set (reg X) (const:P (plus:P (symbol_ref:P S) (const_int D))))

rather than:

  (set (reg X) (plus:P (symbol_ref:P S) (const_int D)))

Adding a PLUS case would fix this example, but I guess a more general
fix would be to add a second switch statement (after the first) that
switches on GET_RTX_CLASS and uses logic like simplify_replace_fn_rtx:

    case RTX_BIN_ARITH:
    case RTX_COMM_ARITH:
      op0 = simplify_replace_fn_rtx (XEXP (x, 0), old_rtx, fn, data);
      op1 = simplify_replace_fn_rtx (XEXP (x, 1), old_rtx, fn, data);
      if (op0 == XEXP (x, 0) && op1 == XEXP (x, 1))
	return x;
      return simplify_gen_binary (code, mode, op0, op1);

Maybe we could even replace cse_process_notes_1 with simplify_replace_fn_rtx,
but that's obviously a much more invasive change :-)

Thanks,
Richard
Peter Bergner April 28, 2020, 12:23 p.m. UTC | #2
On 4/28/20 2:38 AM, Richard Sandiford wrote:
> I think we should do this in cse_process_notes_1, both to avoid creating
> an invalid MEM in the first place, and so that we handle embedded MEMs
> correctly too.
> 
> Also, the (const:P ...) ought to be there even outside of a MEM.  E.g. we
> ought to have:
> 
>   (set (reg X) (const:P (plus:P (symbol_ref:P S) (const_int D))))
> 
> rather than:
> 
>   (set (reg X) (plus:P (symbol_ref:P S) (const_int D)))

I wondered about things such as this, but I couldn't remember ever
seeing a (const:P ...) wrapped around a constant expression before.
Maybe I just wasn't looking for it???



> Adding a PLUS case would fix this example, but I guess a more general
> fix would be to add a second switch statement (after the first) that
> switches on GET_RTX_CLASS and uses logic like simplify_replace_fn_rtx:
> 
>     case RTX_BIN_ARITH:
>     case RTX_COMM_ARITH:
>       op0 = simplify_replace_fn_rtx (XEXP (x, 0), old_rtx, fn, data);
>       op1 = simplify_replace_fn_rtx (XEXP (x, 1), old_rtx, fn, data);
>       if (op0 == XEXP (x, 0) && op1 == XEXP (x, 1))
> 	return x;
>       return simplify_gen_binary (code, mode, op0, op1);

Ok, I'll try this and see whether we survive bootstrap and regtesting.
Thanks!

Peter
Peter Bergner April 28, 2020, 4:02 p.m. UTC | #3
On 4/28/20 2:38 AM, Richard Sandiford wrote:
>     case RTX_BIN_ARITH:
>     case RTX_COMM_ARITH:
>       op0 = simplify_replace_fn_rtx (XEXP (x, 0), old_rtx, fn, data);
>       op1 = simplify_replace_fn_rtx (XEXP (x, 1), old_rtx, fn, data);
>       if (op0 == XEXP (x, 0) && op1 == XEXP (x, 1))
> 	return x;
>       return simplify_gen_binary (code, mode, op0, op1);

Is there a reason you use simplify_replace_fn_rtx here, rather than
just using op0 = simplify_rtx (XEXP (x, 0))?  Ditto for op1.
Does simplify_replace_fn_rtx do something that simplify_rtx doesn't?

Peter
Richard Sandiford April 28, 2020, 4:57 p.m. UTC | #4
Peter Bergner <bergner@linux.ibm.com> writes:
> On 4/28/20 2:38 AM, Richard Sandiford wrote:
>>     case RTX_BIN_ARITH:
>>     case RTX_COMM_ARITH:
>>       op0 = simplify_replace_fn_rtx (XEXP (x, 0), old_rtx, fn, data);
>>       op1 = simplify_replace_fn_rtx (XEXP (x, 1), old_rtx, fn, data);
>>       if (op0 == XEXP (x, 0) && op1 == XEXP (x, 1))
>> 	return x;
>>       return simplify_gen_binary (code, mode, op0, op1);
>
> Is there a reason you use simplify_replace_fn_rtx here, rather than
> just using op0 = simplify_rtx (XEXP (x, 0))?  Ditto for op1.
> Does simplify_replace_fn_rtx do something that simplify_rtx doesn't?

I was just quoting code from simplify_replace_fn_rtx as an example of
something that handles a similar situation.  The recursive calls would
be different for cse_process_notes_1.

Sorry for the confusion :-)

Richard
Peter Bergner April 28, 2020, 9:51 p.m. UTC | #5
On 4/28/20 11:57 AM, Richard Sandiford wrote:
> I was just quoting code from simplify_replace_fn_rtx as an example of
> something that handles a similar situation.  The recursive calls would
> be different for cse_process_notes_1.

Ok, how about the following which adds the (const:P ...) as well, which
fixes the ICE?  The FOR loop which now follows this switch statement is
what used to modify the PLUS's operands without adding the (const:P ).

This is still in the middle of bootstrap and regtesting.

Peter


gcc/
	PR rtl-optimization/94740
	* cse.c (cse_process_notes_1):

gcc/testsuite/
	PR rtl-optimization/94740
	* gcc.target/powerpc/pr94740.c: New test.

diff --git a/gcc/cse.c b/gcc/cse.c
index 5aaba8d80e0..4592820c04c 100644
--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -6314,6 +6314,21 @@ cse_process_notes_1 (rtx x, rtx object, bool *changed)
       break;
     }
 
+  switch (GET_RTX_CLASS (code))
+    {
+    case RTX_BIN_ARITH:
+    case RTX_COMM_ARITH:
+      /* Call simplify_gen_binary with our updated operands in case they
+	 are now constant and need to be wrapped with a (const:P ...).  */
+      validate_change (object, &XEXP (x, 0),
+		       cse_process_notes (XEXP (x, 0), object, changed), false);
+      validate_change (object, &XEXP (x, 1),
+		       cse_process_notes (XEXP (x, 1), object, changed), false);
+      return simplify_gen_binary (code, GET_MODE (x), XEXP (x, 0), XEXP (x, 1));
+    default:
+      break;
+    }
+
   for (i = 0; i < GET_RTX_LENGTH (code); i++)
     if (fmt[i] == 'e')
       validate_change (object, &XEXP (x, i),
diff --git a/gcc/testsuite/gcc.target/powerpc/pr94740.c b/gcc/testsuite/gcc.target/powerpc/pr94740.c
new file mode 100644
index 00000000000..78e40fc84da
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr94740.c
@@ -0,0 +1,11 @@
+/* PR rtl-optimization/94740 */
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_future_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future -mpcrel" } */
+
+int array[8];
+int
+foo (void)
+{
+  return __builtin_bswap32 (array[1]);
+}
Richard Sandiford April 28, 2020, 10:39 p.m. UTC | #6
Peter Bergner <bergner@linux.ibm.com> writes:
> On 4/28/20 11:57 AM, Richard Sandiford wrote:
>> I was just quoting code from simplify_replace_fn_rtx as an example of
>> something that handles a similar situation.  The recursive calls would
>> be different for cse_process_notes_1.
>
> Ok, how about the following which adds the (const:P ...) as well, which
> fixes the ICE?  The FOR loop which now follows this switch statement is
> what used to modify the PLUS's operands without adding the (const:P ).
>
> This is still in the middle of bootstrap and regtesting.
>
> Peter
>
>
> gcc/
> 	PR rtl-optimization/94740
> 	* cse.c (cse_process_notes_1):
>
> gcc/testsuite/
> 	PR rtl-optimization/94740
> 	* gcc.target/powerpc/pr94740.c: New test.
>
> diff --git a/gcc/cse.c b/gcc/cse.c
> index 5aaba8d80e0..4592820c04c 100644
> --- a/gcc/cse.c
> +++ b/gcc/cse.c
> @@ -6314,6 +6314,21 @@ cse_process_notes_1 (rtx x, rtx object, bool *changed)
>        break;
>      }
>  
> +  switch (GET_RTX_CLASS (code))
> +    {
> +    case RTX_BIN_ARITH:
> +    case RTX_COMM_ARITH:
> +      /* Call simplify_gen_binary with our updated operands in case they
> +	 are now constant and need to be wrapped with a (const:P ...).  */
> +      validate_change (object, &XEXP (x, 0),
> +		       cse_process_notes (XEXP (x, 0), object, changed), false);
> +      validate_change (object, &XEXP (x, 1),
> +		       cse_process_notes (XEXP (x, 1), object, changed), false);
> +      return simplify_gen_binary (code, GET_MODE (x), XEXP (x, 0), XEXP (x, 1));

If we use simplify_gen_binary then I don't think we need to validate
each change individually.  It should be enough to do something like:

      x0 = cse_process_notes (XEXP (x, 0), object, changed);
      x1 = cse_process_notes (XEXP (x, 1), object, changed);
      if (x0 == XEXP (x, 0) && x1 == XEXP (x, 1))
        return x;
      return simplify_gen_binary (code, GET_MODE (x), x0, x1);

Alternatively, in the hope of reducing redundant rtl, I guess we could
add the switch statement after the format walk and do:

  switch (GET_RTX_CLASS (code))
    {
    case RTX_BIN_ARITH:
    case RTX_COMM_ARITH:
      if (rtx new_rtx = simplify_binary (code, GET_MODE (x),
					 XEXP (x, 0), XEXP (x, 1)))
	return new_rtx;
      break;
    default:
      break;
   }

Thanks,
Richard
Peter Bergner April 28, 2020, 11:06 p.m. UTC | #7
On 4/28/20 5:39 PM, Richard Sandiford wrote:
> If we use simplify_gen_binary then I don't think we need to validate
> each change individually.  It should be enough to do something like:
> 
>       x0 = cse_process_notes (XEXP (x, 0), object, changed);
>       x1 = cse_process_notes (XEXP (x, 1), object, changed);
>       if (x0 == XEXP (x, 0) && x1 == XEXP (x, 1))
>         return x;
>       return simplify_gen_binary (code, GET_MODE (x), x0, x1);

Ok, I wasn't sure whether simplify_gen_binary verified its changes or not.



> Alternatively, in the hope of reducing redundant rtl, I guess we could
> add the switch statement after the format walk and do:
> 
>   switch (GET_RTX_CLASS (code))
>     {
>     case RTX_BIN_ARITH:
>     case RTX_COMM_ARITH:
>       if (rtx new_rtx = simplify_binary (code, GET_MODE (x),
> 					 XEXP (x, 0), XEXP (x, 1)))
> 	return new_rtx;
>       break;
>     default:
>       break;
>    }

Doesn't moving this after the format walk, create the situation of
producing constant expressions without (const:P ...) and then fixing
them up after the fact like my original patch did?  That's why I placed
the previous version before the format walk, but if you're ok with that
now, then that's fine with me.  That said, the PLUS will be wrapped in
a (const:P ...) before it's substituted into the MEM, so maybe it was
just the MEM you were worried about?

However, do you mean the change to be the following, since I don't think
simplify_gen_binary ever returns NULL?


       validate_change (object, &XEXP (x, i),
                       cse_process_notes (XEXP (x, i), object, changed), 0);
 
+  switch (GET_RTX_CLASS (code))
+    {
+    case RTX_BIN_ARITH:
+    case RTX_COMM_ARITH:
+      return simplify_gen_binary (code, GET_MODE (x), XEXP (x, 0), XEXP (x, 1));
+    default:
+      break;
+   }
+
   return x;
 }


Peter
Peter Bergner April 29, 2020, 12:50 a.m. UTC | #8
On 4/28/20 6:06 PM, Peter Bergner wrote:
> However, do you mean the change to be the following, since I don't think
> simplify_gen_binary ever returns NULL?
> 
> 
>        validate_change (object, &XEXP (x, i),
>                        cse_process_notes (XEXP (x, i), object, changed), 0);
>  
> +  switch (GET_RTX_CLASS (code))
> +    {
> +    case RTX_BIN_ARITH:
> +    case RTX_COMM_ARITH:
> +      return simplify_gen_binary (code, GET_MODE (x), XEXP (x, 0), XEXP (x, 1));
> +    default:
> +      break;
> +   }
> +
>    return x;
>  }

FYI, this passed bootstrap and regtesting with no regressions.
Is the above ok (with comment from previous patch) for trunk?


Peter
Segher Boessenkool April 29, 2020, 12:57 a.m. UTC | #9
Hi!

On Mon, Apr 27, 2020 at 05:30:24PM -0500, Peter Bergner wrote:
> rtl-optimization: ICE on testsuite/gcc.dg/sso/t5.c with -mcpu=future -mpcrel -O1 [PR94740]
> 
> We ICE on the test case below because decompose_normal_address()  doesn't
> expect to see memory operands with constant addresses like below without
> a (const:DI ...) wrapped around the PLUS:
> 
> 	(mem/c:SI (plus:DI (symbol_ref:DI ("*.LANCHOR0") [flags 0x182])
> 			   (const_int 4 [0x4])) [1 array+4 S4 A32])
> 
> What we expect to see is:
> 
> 	(mem/c:SI (const:DI (plus:DI (symbol_ref:DI ("*.LANCHOR0") [flags 0x182])
> 				     (const_int 4 [0x4]))) [1 arrayD.2903+4 S4 A32])
> 
> Sometimes, combine adds the (const: ...) for us via simplify_binary_operand
> call, but that only happens when combine actually does something with this
> insn.  The bad address from the test case actually comes from CSE, so the
> fix here is to make CSE add the (const: ...) whenever it creates a MEM with
> a constant address.

So, is it a rule that we always should have CONST here?  Where is that
documented?

> gcc/
> 	PR rtl-optimization/94740
> 	* cse.c (cse_process_notes):

(Missing changelog entry here).

> diff --git a/gcc/cse.c b/gcc/cse.c
> index 5aaba8d80e0..cbe9e0a0692 100644
> --- a/gcc/cse.c
> +++ b/gcc/cse.c
> @@ -6328,6 +6328,14 @@ cse_process_notes (rtx x, rtx object, bool *changed)
>    rtx new_rtx = cse_process_notes_1 (x, object, changed);
>    if (new_rtx != x)
>      *changed = true;
> +  if (*changed && object != NULL_RTX && MEM_P (object))
> +    {
> +      /* Call simplify_rtx on the updated address in case it is now
> +	 a constant and needs to be wrapped with a (const: ...).  */
> +      rtx simplified_rtx = simplify_rtx (new_rtx);
> +      if (simplified_rtx)
> +	new_rtx = simplified_rtx;
> +    }
>    return new_rtx;
>  }

You can just do

  return simplify_replace_rtx (new_rtx, 0, 0);

(it doesn't return 0 if nothing changed, unlike many other simplify*
things) for everything inside the if (); but you can even lose the if ()
itself?

(Although, maybe you're doing this all somewhere else now?  :-) )

> diff --git a/gcc/testsuite/gcc.target/powerpc/pr94740.c b/gcc/testsuite/gcc.target/powerpc/pr94740.c
> new file mode 100644
> index 00000000000..78e40fc84da
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr94740.c
> @@ -0,0 +1,11 @@
> +/* PR rtl-optimization/94740 */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_future_ok } */
> +/* { dg-options "-O2 -mdejagnu-cpu=future -mpcrel" } */
> +
> +int array[8];
> +int
> +foo (void)
> +{
> +  return __builtin_bswap32 (array[1]);
> +}

That part is okay for trunk, of course :-)


Segher
Segher Boessenkool April 29, 2020, 1:03 a.m. UTC | #10
Hi!

On Tue, Apr 28, 2020 at 08:38:56AM +0100, Richard Sandiford wrote:
> Also, the (const:P ...) ought to be there even outside of a MEM.  E.g. we
> ought to have:
> 
>   (set (reg X) (const:P (plus:P (symbol_ref:P S) (const_int D))))
> 
> rather than:
> 
>   (set (reg X) (plus:P (symbol_ref:P S) (const_int D)))

What are the actual rules?  Where is this documented?

Of course it is highly desirable to have CONST around such constant
addresses, but when is it *required*?  And what *is* a constant address
(in this context)?


Segher
Li, Pan2 via Gcc-patches April 29, 2020, 2:56 p.m. UTC | #11
On Tue, 2020-04-28 at 20:03 -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Tue, Apr 28, 2020 at 08:38:56AM +0100, Richard Sandiford wrote:
> > Also, the (const:P ...) ought to be there even outside of a MEM.  E.g. we
> > ought to have:
> > 
> >   (set (reg X) (const:P (plus:P (symbol_ref:P S) (const_int D))))
> > 
> > rather than:
> > 
> >   (set (reg X) (plus:P (symbol_ref:P S) (const_int D)))
> 
> What are the actual rules?  Where is this documented?
> 
> Of course it is highly desirable to have CONST around such constant
> addresses, but when is it *required*?  And what *is* a constant address
> (in this context)?
I don't know if it's documented, but it's been expected practice for decades.

It includes symbolic constants, and the results of simple arithmetic on symbolic
constants.  It (to the best of my knowledge) does not include logical operations
on symbolic constants.

Jeff
Richard Sandiford April 29, 2020, 3:13 p.m. UTC | #12
Peter Bergner <bergner@linux.ibm.com> writes:
> On 4/28/20 5:39 PM, Richard Sandiford wrote:
>> If we use simplify_gen_binary then I don't think we need to validate
>> each change individually.  It should be enough to do something like:
>> 
>>       x0 = cse_process_notes (XEXP (x, 0), object, changed);
>>       x1 = cse_process_notes (XEXP (x, 1), object, changed);
>>       if (x0 == XEXP (x, 0) && x1 == XEXP (x, 1))
>>         return x;
>>       return simplify_gen_binary (code, GET_MODE (x), x0, x1);
>
> Ok, I wasn't sure whether simplify_gen_binary verified its changes or not.
>
>> Alternatively, in the hope of reducing redundant rtl, I guess we could
>> add the switch statement after the format walk and do:
>> 
>>   switch (GET_RTX_CLASS (code))
>>     {
>>     case RTX_BIN_ARITH:
>>     case RTX_COMM_ARITH:
>>       if (rtx new_rtx = simplify_binary (code, GET_MODE (x),
>> 					 XEXP (x, 0), XEXP (x, 1)))
>> 	return new_rtx;
>>       break;
>>     default:
>>       break;
>>    }

To answer your later question, I meant simplify_binary_operation rather
than simplify_binary here.  Sorry, I always forget that the simplify_gen_*
and non-gen versions use different naming schemes.

But then I guess we're effectively back to acting like simplify_rtx,
as in your original patch, but without checking whether we're in a MEM.

The traditional problem with doing recursive replacement followed by
simplification is that we can lose the original mode for the operands
of unary operators.  E.g. (zero_extend:DI (reg:M X)) can become
(zero_extend:DI (const_int -1)), which isn't well-formed.
So calling simplify_rtx for all rtxes (not just binary ones)
could lead to problems.  More below.

> Doesn't moving this after the format walk, create the situation of
> producing constant expressions without (const:P ...) and then fixing
> them up after the fact like my original patch did?  That's why I placed
> the previous version before the format walk, but if you're ok with that
> now, then that's fine with me.  That said, the PLUS will be wrapped in
> a (const:P ...) before it's substituted into the MEM, so maybe it was
> just the MEM you were worried about?

To be honest, when reviewing the original patch, I think I'd missed that
cse_process_notes and cse_process_notes_1 were co-recursive, and so
every level of the MEM would be simplified by your simplify_rtx call.
I originally thought it was building up a new MEM and only then
simplifying it.

But yeah, like you say, the problem is that if we just add the
simplification on top, we're still validating the original unsimplified
replacement beforehand, so there's still a potential for non-canonical
rtl to seep into the validation routine before it gets corrected by
a parent call.  That might cause problems in future, even if it
doesn't yet.

Ugh.

Unfortunately, there doesn't seem to be a perfect off-the-shelf
routine to use here.  validate_replace_rtx_part looks like it has
the right kind of idea, but can only handle a single "from" and "to"
value.  simplify_replace_fn_rtx can handle general replacements,
but generates more garbage rtl and doesn't verify the new MEMs.

If this is a bug we need to fix for GCC 10 then how about this:

(1) change the MEM case to something like:

      {
        bool addr_changed = false;
        rtx addr = cse_process_notes (XEXP (x, 0), x, &addr_changed);
        if (addr_changed && validate_change (x, &XEXP (x, 0), addr, 0))
	  *changed = true;
        return x;
      }

    which makes it safe to do the validation only at the top level
    of the MEM.

(2) make the:

       for (i = 0; i < GET_RTX_LENGTH (code); i++)
         if (fmt[i] == 'e')

    loop assign directly to XEXP (x, i) if the new rtx is different,
    instead of using validate_change.  This ensures that temporarily
    invalid rtl doesn't get verified.

    (passing a null object to validate_change would have the same effect,
    but that feels a bit hacky)

(3) use the simplify_binary_operator thing above, since that at least
    should be safe, even if it potentially leaves similar bugs unfixed
    for other operators.

Sorry for the runaround :-(

Hopefully this area is something we can clean up for GCC 11.

Thanks,
Richard
Peter Bergner April 29, 2020, 6:19 p.m. UTC | #13
On 4/29/20 10:13 AM, Richard Sandiford wrote:
> If this is a bug we need to fix for GCC 10 then how about this:

I do think this is important to fix.  Mike Meissner just switched our
-mcpu=future code to enable -mpcrel by default in GCC10, so we're going
to see a lot more of these types of addresses than before.



> (1) change the MEM case to something like:
> 
>       {
>         bool addr_changed = false;
>         rtx addr = cse_process_notes (XEXP (x, 0), x, &addr_changed);
>         if (addr_changed && validate_change (x, &XEXP (x, 0), addr, 0))
> 	  *changed = true;
>         return x;
>       }
> 
>     which makes it safe to do the validation only at the top level
>     of the MEM.
> 
> (2) make the:
> 
>        for (i = 0; i < GET_RTX_LENGTH (code); i++)
>          if (fmt[i] == 'e')
> 
>     loop assign directly to XEXP (x, i) if the new rtx is different,
>     instead of using validate_change.  This ensures that temporarily
>     invalid rtl doesn't get verified.
> 
>     (passing a null object to validate_change would have the same effect,
>     but that feels a bit hacky)
> 
> (3) use the simplify_binary_operator thing above, since that at least
>     should be safe, even if it potentially leaves similar bugs unfixed
>     for other operators.
> 
> Sorry for the runaround :-(

No problem.  I appreciate the feedback and I agree we want to get this right.
I'll implement the above and report back.  Thanks!

Peter
Richard Sandiford April 29, 2020, 8:28 p.m. UTC | #14
Peter Bergner <bergner@linux.ibm.com> writes:
> On 4/29/20 10:13 AM, Richard Sandiford wrote:
>> If this is a bug we need to fix for GCC 10 then how about this:
>
> I do think this is important to fix.  Mike Meissner just switched our
> -mcpu=future code to enable -mpcrel by default in GCC10, so we're going
> to see a lot more of these types of addresses than before.

OK.

>> (1) change the MEM case to something like:
>> 
>>       {
>>         bool addr_changed = false;
>>         rtx addr = cse_process_notes (XEXP (x, 0), x, &addr_changed);
>>         if (addr_changed && validate_change (x, &XEXP (x, 0), addr, 0))
>> 	  *changed = true;
>>         return x;
>>       }
>> 
>>     which makes it safe to do the validation only at the top level
>>     of the MEM.
>> 
>> (2) make the:
>> 
>>        for (i = 0; i < GET_RTX_LENGTH (code); i++)
>>          if (fmt[i] == 'e')
>> 
>>     loop assign directly to XEXP (x, i) if the new rtx is different,
>>     instead of using validate_change.  This ensures that temporarily
>>     invalid rtl doesn't get verified.
>> 
>>     (passing a null object to validate_change would have the same effect,
>>     but that feels a bit hacky)
>> 
>> (3) use the simplify_binary_operator thing above, since that at least
>>     should be safe, even if it potentially leaves similar bugs unfixed
>>     for other operators.
>> 
>> Sorry for the runaround :-(
>
> No problem.  I appreciate the feedback and I agree we want to get this right.
> I'll implement the above and report back.  Thanks!

Sigh.  And now of course I realise that that too is flawed.  If we make
a replacement on a subexpression and the containing expression isn't
simplified, the validate_change for the MEM will end up being a no-op,
even if it shouldn't be.  So in one case we really need the recursive
validate_changes, and in one case we really want to avoid them.

I think at this point it would be better to go for the
simplify_replace_fn_rtx thing I mentioned in the original review,
rather than try to hack at the current code even more.  What do you
think about the attached?  Testing in progress.

(Sorry for going ahead and writing an alternative patch, since if we do
go for this, I guess the earlier misdirections will have wasted two days
of your time.  But it seemed like I was just never going to think about
this PR properly unless I actually tried to write something. :-()

Richard
Peter Bergner April 29, 2020, 9:15 p.m. UTC | #15
On 4/29/20 3:28 PM, Richard Sandiford wrote:
> (Sorry for going ahead and writing an alternative patch, since if we do
> go for this, I guess the earlier misdirections will have wasted two days
> of your time.  But it seemed like I was just never going to think about
> this PR properly unless I actually tried to write something. :-()

No worries from me!  I'm just glad to see this fixed before the release.
I'll kill off a bootstrap and regtest on powerpc64le-linux too, in addition
to your tests (arm & x86_64?).  Thanks for your help with this!

Peter
Peter Bergner April 29, 2020, 9:26 p.m. UTC | #16
On 4/29/20 3:28 PM, Richard Sandiford wrote:
> I think at this point it would be better to go for the
> simplify_replace_fn_rtx thing I mentioned in the original review,
> rather than try to hack at the current code even more.  What do you
> think about the attached?  Testing in progress.

My testing is still running, but I like the patch a lot!
It's nice when a patch not only fixes a bug, but makes the code
*MUCH* simpler too.  Well done!

Peter
Segher Boessenkool April 29, 2020, 10:54 p.m. UTC | #17
On Wed, Apr 29, 2020 at 08:56:11AM -0600, Jeff Law wrote:
> On Tue, 2020-04-28 at 20:03 -0500, Segher Boessenkool wrote:
> > What are the actual rules?  Where is this documented?
> > 
> > Of course it is highly desirable to have CONST around such constant
> > addresses, but when is it *required*?  And what *is* a constant address
> > (in this context)?
> I don't know if it's documented, but it's been expected practice for decades.
> 
> It includes symbolic constants, and the results of simple arithmetic on symbolic
> constants.  It (to the best of my knowledge) does not include logical operations
> on symbolic constants.

So should it be done for anything that is "i" as constraint?  Or only
for constant addresses?  In what sense constant, anyway?

None of this ever was clear to me...  I always did the "if it doesn't
ICE anymore, we're good" routine, but *everyone* seems to do that, and
that isn't so optimal ;-)


Segher
Peter Bergner April 29, 2020, 11:14 p.m. UTC | #18
On 4/29/20 4:15 PM, Peter Bergner wrote:
> On 4/29/20 3:28 PM, Richard Sandiford wrote:
>> (Sorry for going ahead and writing an alternative patch, since if we do
>> go for this, I guess the earlier misdirections will have wasted two days
>> of your time.  But it seemed like I was just never going to think about
>> this PR properly unless I actually tried to write something. :-()
> 
> No worries from me!  I'm just glad to see this fixed before the release.
> I'll kill off a bootstrap and regtest on powerpc64le-linux too, in addition
> to your tests (arm & x86_64?).  Thanks for your help with this!

My bootstrap and regtesting of your patch on powerpc64le-linux was clean.
Ship it! :-)

Peter
Li, Pan2 via Gcc-patches April 30, 2020, 3:18 a.m. UTC | #19
On Wed, 2020-04-29 at 17:54 -0500, Segher Boessenkool wrote:
> On Wed, Apr 29, 2020 at 08:56:11AM -0600, Jeff Law wrote:
> > On Tue, 2020-04-28 at 20:03 -0500, Segher Boessenkool wrote:
> > > What are the actual rules?  Where is this documented?
> > > 
> > > Of course it is highly desirable to have CONST around such constant
> > > addresses, but when is it *required*?  And what *is* a constant address
> > > (in this context)?
> > I don't know if it's documented, but it's been expected practice for decades.
> > 
> > It includes symbolic constants, and the results of simple arithmetic on
> > symbolic
> > constants.  It (to the best of my knowledge) does not include logical
> > operations
> > on symbolic constants.
> 
> So should it be done for anything that is "i" as constraint?  Or only
> for constant addresses?  In what sense constant, anyway?
Only symbolic constants.  ie, things involving SYMBOL_REF or LABEL_REF which are
constant when the program is run, but the exact value isn't known at compile
time.  Simple CONST_INT, CONST_DOUBLE and the like do not need the CONST
wrapping.
Jeff
Richard Sandiford April 30, 2020, 7:54 a.m. UTC | #20
Peter Bergner <bergner@linux.ibm.com> writes:
> On 4/29/20 4:15 PM, Peter Bergner wrote:
>> On 4/29/20 3:28 PM, Richard Sandiford wrote:
>>> (Sorry for going ahead and writing an alternative patch, since if we do
>>> go for this, I guess the earlier misdirections will have wasted two days
>>> of your time.  But it seemed like I was just never going to think about
>>> this PR properly unless I actually tried to write something. :-()
>> 
>> No worries from me!  I'm just glad to see this fixed before the release.
>> I'll kill off a bootstrap and regtest on powerpc64le-linux too, in addition
>> to your tests (arm & x86_64?).  Thanks for your help with this!
>
> My bootstrap and regtesting of your patch on powerpc64le-linux was clean.

Thanks.  aarch64-linux-gnu and x86_64-linux-gnu bootstrap & regtests
also came back clean.  I'll kick off an arm-linux-gnueabihf one too
just to be safe.

I guess at this point it needs a review from someone else though.
Jeff, WDYT?  Attached again below, this time without the shonky mime type.

Richard
From 39e20b51af8f977a1786ef5c15af80e47415d352 Mon Sep 17 00:00:00 2001
From: Richard Sandiford <richard.sandiford@arm.com>
Date: Wed, 29 Apr 2020 20:38:13 +0100
Subject: [PATCH] cse: Use simplify_replace_fn_rtx to process notes [PR94740]

cse_process_notes did a very simple substitution, which in the wrong
circumstances could create non-canonical RTL and invalid MEMs.
Various sticking plasters have been applied to cse_process_notes_1
to handle cases like ZERO_EXTEND, SIGN_EXTEND and UNSIGNED_FLOAT,
but I think this PR is a plaster too far.

The code is trying hard to avoid creating unnecessary rtl, which of
course is a good thing.  If we continue to do that, then we can end
up changing subexpressions while keeping the containing rtx.
This in turn means that validate_change will be a no-op on the
containing rtx, even if its contents have changed.  So in these
cases we have to apply validate_change to the individual subexpressions.

On the other hand, if we always apply validate_change to the
individual subexpressions, we'll end up calling validate_change
on something before it has been simplified and canonicalised.
And that's one of the situations we're trying to avoid.

There might be a middle ground in which we queue the validate_changes
as part of a group, and so can cancel the pending validate_changes
for subexpressions if there's a change in the outer expression.
But that seems even more ad-hoc than the current code.
It would also be quite an invasive change.

I think the best thing is just to hook into the existing
simplify_replace_fn_rtx function, keeping the REG and MEM handling
from cse_process_notes_1 essentially unchanged.  It can generate
more redundant rtl when a simplification takes place, but it has
the advantage of being relative well-used code (both directly
and via simplify_replace_rtx).

2020-04-29  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	PR rtl-optimization/94740
	* cse.c (cse_process_notes_1): Replace with...
	(cse_process_note_1): ...this new function, acting as a
	simplify_replace_fn_rtx callback to process_note.  Handle only
	REGs and MEMs directly.  Validate the MEM if cse_process_note
	changes its address.
	(cse_process_notes): Replace with...
	(cse_process_note): ...this new function.
	(cse_extended_basic_block): Update accordingly, iterating over
	the register notes and passing individual notes to cse_process_note.
---
 gcc/cse.c | 118 ++++++++++++++++--------------------------------------
 1 file changed, 34 insertions(+), 84 deletions(-)

diff --git a/gcc/cse.c b/gcc/cse.c
index 5aaba8d80e0..36bcfc354d8 100644
--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -585,7 +585,6 @@ static void cse_insn (rtx_insn *);
 static void cse_prescan_path (struct cse_basic_block_data *);
 static void invalidate_from_clobbers (rtx_insn *);
 static void invalidate_from_sets_and_clobbers (rtx_insn *);
-static rtx cse_process_notes (rtx, rtx, bool *);
 static void cse_extended_basic_block (struct cse_basic_block_data *);
 extern void dump_class (struct table_elt*);
 static void get_cse_reg_info_1 (unsigned int regno);
@@ -6222,75 +6221,28 @@ invalidate_from_sets_and_clobbers (rtx_insn *insn)
     }
 }
 
-/* Process X, part of the REG_NOTES of an insn.  Look at any REG_EQUAL notes
-   and replace any registers in them with either an equivalent constant
-   or the canonical form of the register.  If we are inside an address,
-   only do this if the address remains valid.
+static rtx cse_process_note (rtx);
 
-   OBJECT is 0 except when within a MEM in which case it is the MEM.
+/* A simplify_replace_fn_rtx callback for cse_process_note.  Process X,
+   part of the REG_NOTES of an insn.  Replace any registers with either
+   an equivalent constant or the canonical form of the register.
+   Only replace addresses if the containing MEM remains valid.
 
-   Return the replacement for X.  */
+   Return the replacement for X, or null if it should be simplified
+   recursively.  */
 
 static rtx
-cse_process_notes_1 (rtx x, rtx object, bool *changed)
+cse_process_note_1 (rtx x, const_rtx, void *)
 {
-  enum rtx_code code = GET_CODE (x);
-  const char *fmt = GET_RTX_FORMAT (code);
-  int i;
-
-  switch (code)
+  if (MEM_P (x))
     {
-    case CONST:
-    case SYMBOL_REF:
-    case LABEL_REF:
-    CASE_CONST_ANY:
-    case PC:
-    case CC0:
-    case LO_SUM:
+      validate_change (x, &XEXP (x, 0), cse_process_note (XEXP (x, 0)), false);
       return x;
+    }
 
-    case MEM:
-      validate_change (x, &XEXP (x, 0),
-		       cse_process_notes (XEXP (x, 0), x, changed), 0);
-      return x;
-
-    case EXPR_LIST:
-      if (REG_NOTE_KIND (x) == REG_EQUAL)
-	XEXP (x, 0) = cse_process_notes (XEXP (x, 0), NULL_RTX, changed);
-      /* Fall through.  */
-
-    case INSN_LIST:
-    case INT_LIST:
-      if (XEXP (x, 1))
-	XEXP (x, 1) = cse_process_notes (XEXP (x, 1), NULL_RTX, changed);
-      return x;
-
-    case SIGN_EXTEND:
-    case ZERO_EXTEND:
-    case SUBREG:
-      {
-	rtx new_rtx = cse_process_notes (XEXP (x, 0), object, changed);
-	/* We don't substitute VOIDmode constants into these rtx,
-	   since they would impede folding.  */
-	if (GET_MODE (new_rtx) != VOIDmode)
-	  validate_change (object, &XEXP (x, 0), new_rtx, 0);
-	return x;
-      }
-
-    case UNSIGNED_FLOAT:
-      {
-	rtx new_rtx = cse_process_notes (XEXP (x, 0), object, changed);
-	/* We don't substitute negative VOIDmode constants into these rtx,
-	   since they would impede folding.  */
-	if (GET_MODE (new_rtx) != VOIDmode
-	    || (CONST_INT_P (new_rtx) && INTVAL (new_rtx) >= 0)
-	    || (CONST_DOUBLE_P (new_rtx) && CONST_DOUBLE_HIGH (new_rtx) >= 0))
-	  validate_change (object, &XEXP (x, 0), new_rtx, 0);
-	return x;
-      }
-
-    case REG:
-      i = REG_QTY (REGNO (x));
+  if (REG_P (x))
+    {
+      int i = REG_QTY (REGNO (x));
 
       /* Return a constant or a constant register.  */
       if (REGNO_QTY_VALID_P (REGNO (x)))
@@ -6309,26 +6261,19 @@ cse_process_notes_1 (rtx x, rtx object, bool *changed)
 
       /* Otherwise, canonicalize this register.  */
       return canon_reg (x, NULL);
-
-    default:
-      break;
     }
 
-  for (i = 0; i < GET_RTX_LENGTH (code); i++)
-    if (fmt[i] == 'e')
-      validate_change (object, &XEXP (x, i),
-		       cse_process_notes (XEXP (x, i), object, changed), 0);
-
-  return x;
+  return NULL_RTX;
 }
 
+/* Process X, part of the REG_NOTES of an insn.  Replace any registers in it
+   with either an equivalent constant or the canonical form of the register.
+   Only replace addresses if the containing MEM remains valid.  */
+
 static rtx
-cse_process_notes (rtx x, rtx object, bool *changed)
+cse_process_note (rtx x)
 {
-  rtx new_rtx = cse_process_notes_1 (x, object, changed);
-  if (new_rtx != x)
-    *changed = true;
-  return new_rtx;
+  return simplify_replace_fn_rtx (x, NULL_RTX, cse_process_note_1, NULL);
 }
 
 
@@ -6623,14 +6568,19 @@ cse_extended_basic_block (struct cse_basic_block_data *ebb_data)
 	    {
 	      /* Process notes first so we have all notes in canonical forms
 		 when looking for duplicate operations.  */
-	      if (REG_NOTES (insn))
-		{
-		  bool changed = false;
-		  REG_NOTES (insn) = cse_process_notes (REG_NOTES (insn),
-						        NULL_RTX, &changed);
-		  if (changed)
-		    df_notes_rescan (insn);
-		}
+	      bool changed = false;
+	      for (rtx note = REG_NOTES (insn); note; note = XEXP (note, 1))
+		if (REG_NOTE_KIND (note) == REG_EQUAL)
+		  {
+		    rtx newval = cse_process_note (XEXP (note, 0));
+		    if (newval != XEXP (note, 0))
+		      {
+			XEXP (note, 0) = newval;
+			changed = true;
+		      }
+		  }
+	      if (changed)
+		df_notes_rescan (insn);
 
 	      cse_insn (insn);
Richard Sandiford April 30, 2020, 5:51 p.m. UTC | #21
Richard Sandiford <richard.sandiford@arm.com> writes:
> Peter Bergner <bergner@linux.ibm.com> writes:
>> On 4/29/20 4:15 PM, Peter Bergner wrote:
>>> On 4/29/20 3:28 PM, Richard Sandiford wrote:
>>>> (Sorry for going ahead and writing an alternative patch, since if we do
>>>> go for this, I guess the earlier misdirections will have wasted two days
>>>> of your time.  But it seemed like I was just never going to think about
>>>> this PR properly unless I actually tried to write something. :-()
>>> 
>>> No worries from me!  I'm just glad to see this fixed before the release.
>>> I'll kill off a bootstrap and regtest on powerpc64le-linux too, in addition
>>> to your tests (arm & x86_64?).  Thanks for your help with this!
>>
>> My bootstrap and regtesting of your patch on powerpc64le-linux was clean.
>
> Thanks.  aarch64-linux-gnu and x86_64-linux-gnu bootstrap & regtests
> also came back clean.  I'll kick off an arm-linux-gnueabihf one too
> just to be safe.

FTR, that came back clean too.

> I guess at this point it needs a review from someone else though.
> Jeff, WDYT?  Attached again below, this time without the shonky mime type.

Here's a different attempt to attach the patch (sorry -- experimenting
with different ways of attaching the patch inline, to see how the ML
archive reacts.)

Richard
From 39e20b51af8f977a1786ef5c15af80e47415d352 Mon Sep 17 00:00:00 2001
From: Richard Sandiford <richard.sandiford@arm.com>
Date: Wed, 29 Apr 2020 20:38:13 +0100
Subject: [PATCH] cse: Use simplify_replace_fn_rtx to process notes [PR94740]

cse_process_notes did a very simple substitution, which in the wrong
circumstances could create non-canonical RTL and invalid MEMs.
Various sticking plasters have been applied to cse_process_notes_1
to handle cases like ZERO_EXTEND, SIGN_EXTEND and UNSIGNED_FLOAT,
but I think this PR is a plaster too far.

The code is trying hard to avoid creating unnecessary rtl, which of
course is a good thing.  If we continue to do that, then we can end
up changing subexpressions while keeping the containing rtx.
This in turn means that validate_change will be a no-op on the
containing rtx, even if its contents have changed.  So in these
cases we have to apply validate_change to the individual subexpressions.

On the other hand, if we always apply validate_change to the
individual subexpressions, we'll end up calling validate_change
on something before it has been simplified and canonicalised.
And that's one of the situations we're trying to avoid.

There might be a middle ground in which we queue the validate_changes
as part of a group, and so can cancel the pending validate_changes
for subexpressions if there's a change in the outer expression.
But that seems even more ad-hoc than the current code.
It would also be quite an invasive change.

I think the best thing is just to hook into the existing
simplify_replace_fn_rtx function, keeping the REG and MEM handling
from cse_process_notes_1 essentially unchanged.  It can generate
more redundant rtl when a simplification takes place, but it has
the advantage of being relative well-used code (both directly
and via simplify_replace_rtx).

2020-04-29  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	PR rtl-optimization/94740
	* cse.c (cse_process_notes_1): Replace with...
	(cse_process_note_1): ...this new function, acting as a
	simplify_replace_fn_rtx callback to process_note.  Handle only
	REGs and MEMs directly.  Validate the MEM if cse_process_note
	changes its address.
	(cse_process_notes): Replace with...
	(cse_process_note): ...this new function.
	(cse_extended_basic_block): Update accordingly, iterating over
	the register notes and passing individual notes to cse_process_note.
---
 gcc/cse.c | 118 ++++++++++++++++--------------------------------------
 1 file changed, 34 insertions(+), 84 deletions(-)

diff --git a/gcc/cse.c b/gcc/cse.c
index 5aaba8d80e0..36bcfc354d8 100644
--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -585,7 +585,6 @@ static void cse_insn (rtx_insn *);
 static void cse_prescan_path (struct cse_basic_block_data *);
 static void invalidate_from_clobbers (rtx_insn *);
 static void invalidate_from_sets_and_clobbers (rtx_insn *);
-static rtx cse_process_notes (rtx, rtx, bool *);
 static void cse_extended_basic_block (struct cse_basic_block_data *);
 extern void dump_class (struct table_elt*);
 static void get_cse_reg_info_1 (unsigned int regno);
@@ -6222,75 +6221,28 @@ invalidate_from_sets_and_clobbers (rtx_insn *insn)
     }
 }
 
-/* Process X, part of the REG_NOTES of an insn.  Look at any REG_EQUAL notes
-   and replace any registers in them with either an equivalent constant
-   or the canonical form of the register.  If we are inside an address,
-   only do this if the address remains valid.
+static rtx cse_process_note (rtx);
 
-   OBJECT is 0 except when within a MEM in which case it is the MEM.
+/* A simplify_replace_fn_rtx callback for cse_process_note.  Process X,
+   part of the REG_NOTES of an insn.  Replace any registers with either
+   an equivalent constant or the canonical form of the register.
+   Only replace addresses if the containing MEM remains valid.
 
-   Return the replacement for X.  */
+   Return the replacement for X, or null if it should be simplified
+   recursively.  */
 
 static rtx
-cse_process_notes_1 (rtx x, rtx object, bool *changed)
+cse_process_note_1 (rtx x, const_rtx, void *)
 {
-  enum rtx_code code = GET_CODE (x);
-  const char *fmt = GET_RTX_FORMAT (code);
-  int i;
-
-  switch (code)
+  if (MEM_P (x))
     {
-    case CONST:
-    case SYMBOL_REF:
-    case LABEL_REF:
-    CASE_CONST_ANY:
-    case PC:
-    case CC0:
-    case LO_SUM:
+      validate_change (x, &XEXP (x, 0), cse_process_note (XEXP (x, 0)), false);
       return x;
+    }
 
-    case MEM:
-      validate_change (x, &XEXP (x, 0),
-		       cse_process_notes (XEXP (x, 0), x, changed), 0);
-      return x;
-
-    case EXPR_LIST:
-      if (REG_NOTE_KIND (x) == REG_EQUAL)
-	XEXP (x, 0) = cse_process_notes (XEXP (x, 0), NULL_RTX, changed);
-      /* Fall through.  */
-
-    case INSN_LIST:
-    case INT_LIST:
-      if (XEXP (x, 1))
-	XEXP (x, 1) = cse_process_notes (XEXP (x, 1), NULL_RTX, changed);
-      return x;
-
-    case SIGN_EXTEND:
-    case ZERO_EXTEND:
-    case SUBREG:
-      {
-	rtx new_rtx = cse_process_notes (XEXP (x, 0), object, changed);
-	/* We don't substitute VOIDmode constants into these rtx,
-	   since they would impede folding.  */
-	if (GET_MODE (new_rtx) != VOIDmode)
-	  validate_change (object, &XEXP (x, 0), new_rtx, 0);
-	return x;
-      }
-
-    case UNSIGNED_FLOAT:
-      {
-	rtx new_rtx = cse_process_notes (XEXP (x, 0), object, changed);
-	/* We don't substitute negative VOIDmode constants into these rtx,
-	   since they would impede folding.  */
-	if (GET_MODE (new_rtx) != VOIDmode
-	    || (CONST_INT_P (new_rtx) && INTVAL (new_rtx) >= 0)
-	    || (CONST_DOUBLE_P (new_rtx) && CONST_DOUBLE_HIGH (new_rtx) >= 0))
-	  validate_change (object, &XEXP (x, 0), new_rtx, 0);
-	return x;
-      }
-
-    case REG:
-      i = REG_QTY (REGNO (x));
+  if (REG_P (x))
+    {
+      int i = REG_QTY (REGNO (x));
 
       /* Return a constant or a constant register.  */
       if (REGNO_QTY_VALID_P (REGNO (x)))
@@ -6309,26 +6261,19 @@ cse_process_notes_1 (rtx x, rtx object, bool *changed)
 
       /* Otherwise, canonicalize this register.  */
       return canon_reg (x, NULL);
-
-    default:
-      break;
     }
 
-  for (i = 0; i < GET_RTX_LENGTH (code); i++)
-    if (fmt[i] == 'e')
-      validate_change (object, &XEXP (x, i),
-		       cse_process_notes (XEXP (x, i), object, changed), 0);
-
-  return x;
+  return NULL_RTX;
 }
 
+/* Process X, part of the REG_NOTES of an insn.  Replace any registers in it
+   with either an equivalent constant or the canonical form of the register.
+   Only replace addresses if the containing MEM remains valid.  */
+
 static rtx
-cse_process_notes (rtx x, rtx object, bool *changed)
+cse_process_note (rtx x)
 {
-  rtx new_rtx = cse_process_notes_1 (x, object, changed);
-  if (new_rtx != x)
-    *changed = true;
-  return new_rtx;
+  return simplify_replace_fn_rtx (x, NULL_RTX, cse_process_note_1, NULL);
 }
 
 
@@ -6623,14 +6568,19 @@ cse_extended_basic_block (struct cse_basic_block_data *ebb_data)
 	    {
 	      /* Process notes first so we have all notes in canonical forms
 		 when looking for duplicate operations.  */
-	      if (REG_NOTES (insn))
-		{
-		  bool changed = false;
-		  REG_NOTES (insn) = cse_process_notes (REG_NOTES (insn),
-						        NULL_RTX, &changed);
-		  if (changed)
-		    df_notes_rescan (insn);
-		}
+	      bool changed = false;
+	      for (rtx note = REG_NOTES (insn); note; note = XEXP (note, 1))
+		if (REG_NOTE_KIND (note) == REG_EQUAL)
+		  {
+		    rtx newval = cse_process_note (XEXP (note, 0));
+		    if (newval != XEXP (note, 0))
+		      {
+			XEXP (note, 0) = newval;
+			changed = true;
+		      }
+		  }
+	      if (changed)
+		df_notes_rescan (insn);
 
 	      cse_insn (insn);
Li, Pan2 via Gcc-patches April 30, 2020, 6:55 p.m. UTC | #22
On Thu, 2020-04-30 at 08:54 +0100, Richard Sandiford wrote:
> Peter Bergner <bergner@linux.ibm.com> writes:
> > On 4/29/20 4:15 PM, Peter Bergner wrote:
> > > On 4/29/20 3:28 PM, Richard Sandiford wrote:
> > > > (Sorry for going ahead and writing an alternative patch, since if we do
> > > > go for this, I guess the earlier misdirections will have wasted two days
> > > > of your time.  But it seemed like I was just never going to think about
> > > > this PR properly unless I actually tried to write something. :-()
> > > 
> > > No worries from me!  I'm just glad to see this fixed before the release.
> > > I'll kill off a bootstrap and regtest on powerpc64le-linux too, in addition
> > > to your tests (arm & x86_64?).  Thanks for your help with this!
> > 
> > My bootstrap and regtesting of your patch on powerpc64le-linux was clean.
> 
> Thanks.  aarch64-linux-gnu and x86_64-linux-gnu bootstrap & regtests
> also came back clean.  I'll kick off an arm-linux-gnueabihf one too
> just to be safe.
> 
> I guess at this point it needs a review from someone else though.
> Jeff, WDYT?  Attached again below, this time without the shonky mime type.
It looks reasonable reasonable to me.  Re-using simplify_replace_fn_rtx seems
like a major simplification, which is definitely good.  Presumably one of the
major goals here is to get the CONST wrapping from simplify_plus_minus?

Jeff

ps.  Both forms looked the same in my inbox.  Not sure how they showed up in the
archives.
Richard Sandiford April 30, 2020, 7:02 p.m. UTC | #23
Jeff Law <law@redhat.com> writes:
> On Thu, 2020-04-30 at 08:54 +0100, Richard Sandiford wrote:
>> Peter Bergner <bergner@linux.ibm.com> writes:
>> > On 4/29/20 4:15 PM, Peter Bergner wrote:
>> > > On 4/29/20 3:28 PM, Richard Sandiford wrote:
>> > > > (Sorry for going ahead and writing an alternative patch, since if we do
>> > > > go for this, I guess the earlier misdirections will have wasted two days
>> > > > of your time.  But it seemed like I was just never going to think about
>> > > > this PR properly unless I actually tried to write something. :-()
>> > > 
>> > > No worries from me!  I'm just glad to see this fixed before the release.
>> > > I'll kill off a bootstrap and regtest on powerpc64le-linux too, in addition
>> > > to your tests (arm & x86_64?).  Thanks for your help with this!
>> > 
>> > My bootstrap and regtesting of your patch on powerpc64le-linux was clean.
>> 
>> Thanks.  aarch64-linux-gnu and x86_64-linux-gnu bootstrap & regtests
>> also came back clean.  I'll kick off an arm-linux-gnueabihf one too
>> just to be safe.
>> 
>> I guess at this point it needs a review from someone else though.
>> Jeff, WDYT?  Attached again below, this time without the shonky mime type.
> It looks reasonable reasonable to me. Re-using simplify_replace_fn_rtx seems
> like a major simplification, which is definitely good.

Great, thanks!  Now pushed to master.

> Presumably one of the major goals here is to get the CONST wrapping
> from simplify_plus_minus?

Yeah, that's right (via very indirect means :-))

> ps.  Both forms looked the same in my inbox.  Not sure how they showed up in the
> archives.

The last one came out OK.  I'd missed specifying a charset for
the second one.

Richard
Li, Pan2 via Gcc-patches April 30, 2020, 7:14 p.m. UTC | #24
On Thu, 2020-04-30 at 20:02 +0100, Richard Sandiford wrote:
> Jeff Law <law@redhat.com> writes:
> > On Thu, 2020-04-30 at 08:54 +0100, Richard Sandiford wrote:
> > > Peter Bergner <bergner@linux.ibm.com> writes:
> > > > On 4/29/20 4:15 PM, Peter Bergner wrote:
> > > > > On 4/29/20 3:28 PM, Richard Sandiford wrote:
> > > > > > (Sorry for going ahead and writing an alternative patch, since if we
> > > > > > do
> > > > > > go for this, I guess the earlier misdirections will have wasted two
> > > > > > days
> > > > > > of your time.  But it seemed like I was just never going to think
> > > > > > about
> > > > > > this PR properly unless I actually tried to write something. :-()
> > > > > 
> > > > > No worries from me!  I'm just glad to see this fixed before the
> > > > > release.
> > > > > I'll kill off a bootstrap and regtest on powerpc64le-linux too, in
> > > > > addition
> > > > > to your tests (arm & x86_64?).  Thanks for your help with this!
> > > > 
> > > > My bootstrap and regtesting of your patch on powerpc64le-linux was clean.
> > > 
> > > Thanks.  aarch64-linux-gnu and x86_64-linux-gnu bootstrap & regtests
> > > also came back clean.  I'll kick off an arm-linux-gnueabihf one too
> > > just to be safe.
> > > 
> > > I guess at this point it needs a review from someone else though.
> > > Jeff, WDYT?  Attached again below, this time without the shonky mime type.
> > It looks reasonable reasonable to me. Re-using simplify_replace_fn_rtx seems
> > like a major simplification, which is definitely good.
> 
> Great, thanks!  Now pushed to master.
> 
> > Presumably one of the major goals here is to get the CONST wrapping
> > from simplify_plus_minus?
> 
> Yeah, that's right (via very indirect means :-))
It was certainly indirect :-)  Thankfully there's only one place that does CONST
wrapping in simplify-rtx, so it was just a matter of walking up the most
interesting call sites.

Jeff
Peter Bergner April 30, 2020, 11:26 p.m. UTC | #25
On 4/30/20 2:02 PM, Richard Sandiford wrote:
> Jeff Law <law@redhat.com> writes:
>> On Thu, 2020-04-30 at 08:54 +0100, Richard Sandiford wrote:
>>> I guess at this point it needs a review from someone else though.
>>> Jeff, WDYT?  Attached again below, this time without the shonky mime type.
>> It looks reasonable reasonable to me. Re-using simplify_replace_fn_rtx seems
>> like a major simplification, which is definitely good.
> 
> Great, thanks!  Now pushed to master.

I pushed the test case to master too.  Thank you everyone!

Peter
diff mbox series

Patch

diff --git a/gcc/cse.c b/gcc/cse.c
index 5aaba8d80e0..cbe9e0a0692 100644
--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -6328,6 +6328,14 @@  cse_process_notes (rtx x, rtx object, bool *changed)
   rtx new_rtx = cse_process_notes_1 (x, object, changed);
   if (new_rtx != x)
     *changed = true;
+  if (*changed && object != NULL_RTX && MEM_P (object))
+    {
+      /* Call simplify_rtx on the updated address in case it is now
+	 a constant and needs to be wrapped with a (const: ...).  */
+      rtx simplified_rtx = simplify_rtx (new_rtx);
+      if (simplified_rtx)
+	new_rtx = simplified_rtx;
+    }
   return new_rtx;
 }
 
diff --git a/gcc/testsuite/gcc.target/powerpc/pr94740.c b/gcc/testsuite/gcc.target/powerpc/pr94740.c
new file mode 100644
index 00000000000..78e40fc84da
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr94740.c
@@ -0,0 +1,11 @@ 
+/* PR rtl-optimization/94740 */
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_future_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future -mpcrel" } */
+
+int array[8];
+int
+foo (void)
+{
+  return __builtin_bswap32 (array[1]);
+}