diff mbox series

[aarch64,RFA,rtl-optimization/87763] Fix insv_1 and insv_2 for aarch64

Message ID 08fc62d8-2f8f-d07d-1656-78f36703cc22@redhat.com
State New
Series [aarch64,RFA,rtl-optimization/87763] Fix insv_1 and insv_2 for aarch64

Commit Message

Jeff Law April 22, 2019, 3:09 p.m. UTC
This is a generalized version of the patch I sent last week.  This
variant fixes the remainder of the insv_1 and insv_2 failures on
aarch64.  In combination with an updated patch from Steve this should
be enough to fix 87763.

A couple of notes.

First, it re-uses combine's make_field_assignment in the aarch64
backend.  So we don't have to duplicate any of that logic.  I scanned
make_field_assignment and its children to make sure it didn't use any
data that was only valid during the combine pass, but I could have
missed something.  This is one of those cases where encapsulating the
pass specific bits in a class really helps avoid problems...  Just
saying....

Second, it relaxes the main insv pattern to allow an immediate in the
operand predicate, but forces it into a register during LRA.  I don't
generally like having predicates that are looser than the constraints,
but it really helps here.  Basically we want to see the constant field
we're going to insert.

Primarily looking for feedback from the aarch64 maintainers on the
pattern and Segher on the re-use of make_field_assignment.

I've bootstrapped and regression tested on aarch64 and aarch64_be as
well as x86_64, ppc64le, ppc64, i686, etc.  It's also been regression
tested on a nice variety of *-elf targets.

Segher, Richard/James OK for the trunk?

Jeff

Comments

Kugan Vivekanandarajah April 23, 2019, 12:39 a.m. UTC | #1
Hi Jeff,

[...]

+  "#"
+  "&& 1"
+  [(const_int 0)]
+  "{
+     /* If we do not have an RMW operand, then copy the input
+	to the output before this insn.  Also modify the existing
+	insn in-place so we can have make_field_assignment actually
+	generate a suitable extraction.  */
+     if (!rtx_equal_p (operands[0], operands[1]))
+       {
+         emit_move_insn (operands[0], operands[1]);
+	 XEXP (XEXP (SET_SRC (PATTERN (curr_insn)), 0), 0) = copy_rtx (operands[0]);
+       }
+
+     rtx make_field_assignment (rtx);
+     rtx newpat = make_field_assignment (PATTERN (curr_insn));
+     gcc_assert (newpat);
+     emit_insn (newpat);
+     DONE;

It seems that make_field_assignment returns a new pattern only if it
succeeds and returns the same pattern otherwise.  So I am wondering
whether it is worth simplifying the above, e.g. removing the assert and
checking/inserting the move only when a new pattern is returned?

Thanks,
Kugan



>
> Jeff
Jeff Law April 23, 2019, 1:42 a.m. UTC | #2
On 4/22/19 6:39 PM, Kugan Vivekanandarajah wrote:
> Hi Jeff,
> 
> [...]
> 
> +  "#"
> +  "&& 1"
> +  [(const_int 0)]
> +  "{
> +     /* If we do not have an RMW operand, then copy the input
> +	to the output before this insn.  Also modify the existing
> +	insn in-place so we can have make_field_assignment actually
> +	generate a suitable extraction.  */
> +     if (!rtx_equal_p (operands[0], operands[1]))
> +       {
> +         emit_move_insn (operands[0], operands[1]);
> +	 XEXP (XEXP (SET_SRC (PATTERN (curr_insn)), 0), 0) = copy_rtx (operands[0]);
> +       }
> +
> +     rtx make_field_assignment (rtx);
> +     rtx newpat = make_field_assignment (PATTERN (curr_insn));
> +     gcc_assert (newpat);
> +     emit_insn (newpat);
> +     DONE;
> 
> It seems that make_field_assignment returns a new pattern only if it
> succeeds and returns the same pattern otherwise.  So I am wondering
> whether it is worth simplifying the above, e.g. removing the assert and
> checking/inserting the move only when a new pattern is returned?
Thanks, this is something I meant to clean up and just totally forgot.
If the assert ever triggers it means something has gone horribly wrong.
Essentially the insn's condition wasn't sufficiently tight.

Catching it with the assert at this point is much easier to debug.  The
failure modes when you don't catch the failure to create a field
assignment here occur much later in the pipeline and it won't be obvious
why things went wrong (I know that from experience :-).

So please consider the assert to be

gcc_assert (newpat != PATTERN (curr_insn));

Jeff
Segher Boessenkool April 24, 2019, 1:05 p.m. UTC | #3
Hi Jeff,

On Mon, Apr 22, 2019 at 09:09:12AM -0600, Jeff Law wrote:
> First, it re-uses combine's make_field_assignment in the aarch64
> backend.

That's not really acceptable.  make_field_assignment depends intimately
on make_extraction, which is very much combine-specific (and wrong in so
many ways it's hard to tell).

If you now use this in other places than combine, we will *never* be able
to fix it.  It's also shoddy architecture, of course.

> So we don't have to duplicate any of that logic.  I scanned
> make_field_assignment and its children to make sure it didn't use any
> data that was only valid during the combine pass, but I could have
> missed something.  This is one of those cases where encapsulating the
> pass specific bits in a class really helps avoid problems...  Just
> saying....

It isn't the data that is the problem.  combine has almost no private
data.  But the problem is all the hidden assumptions combine makes, and
for example that it often make **non-canonical** RTL; this is no problem
in combine, because that code will just not recog() (except of course
that you then get less well optimised code, in general).  (Often, combine
makes that RTL canonical later, fwiw).

But it _is_ a big problem in other contexts: you should not create
non-canonical RTL.  In your specific case, where you want to run this
from a splitter, you have to make sure you get only correct, recognised
patterns for your machine.  make_extraction cannot guarantee you
that.

> Second, it relaxes the main insv pattern to allow an immediate in the
> operand predicate, but forces it into a register during LRA.  I don't
> generally like having predicates that are looser than the constraints,
> but it really helps here.  Basically we want to see the constant field
> we're going to insert.

You also have an insn condition that can become invalid by the time the
pattern is split (the !reload_completed part), which means the pattern
will then *not* be split, which won't work here (the template is "#").

> -static int
> +int
>  get_pos_from_mask (unsigned HOST_WIDE_INT m, unsigned HOST_WIDE_INT *plen)

If you want to reuse this, please give it a better name, and move it to
rtlanal or some header file or similar?

It may be nicer to make a more machine-specific routine for this, see
rs6000_is_valid_mask for example.

> +;; This must be split before register allocation and reloading as
> +;; we need to verify it's actually a bitfield insertion by
> +;; examining both immediate operands.  After reload we will lose
> +;; knowledge of operands[3] constant status.

I don't understand what this means?  After reload operands[3] is still a
const_int?

> +(define_insn_and_split ""

Give this a name?  Starting with * if you don't want a gen_* for it.

> +  [(set (match_operand:GPI 0 "register_operand" "=r")
> +	(ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "0")
> +			  (match_operand:GPI 2 "const_int_operand" "n"))
> +		 (match_operand:GPI 3 "const_int_operand" "n")))]
> +  "(!reload_completed
> +    /* make_field_assignment doesn't handle subregs well.  */
> +    && REG_P (operands[0])

Another reason to not use make_field_assignment, I'm afraid :-(


Segher
Jeff Law April 24, 2019, 6:16 p.m. UTC | #4
On 4/24/19 7:05 AM, Segher Boessenkool wrote:
> Hi Jeff,
> 
> On Mon, Apr 22, 2019 at 09:09:12AM -0600, Jeff Law wrote:
>> First, it re-uses combine's make_field_assignment in the aarch64
>> backend.
> 
> That's not really acceptable.  make_field_assignment depends intimately
> on make_extraction, which is very much combine-specific (and wrong in so
> many ways it's hard to tell).
I think the structure of the pattern and its condition avoid the most
problematic issues.  But yea, it's not ideal.

>> So we don't have to duplicate any of that logic.  I scanned
>> make_field_assignment and its children to make sure it didn't use any
>> data that was only valid during the combine pass, but I could have
>> missed something.  This is one of those cases where encapsulating the
>> pass specific bits in a class really helps avoid problems...  Just
>> saying....
> 
> It isn't the data that is the problem.  combine has almost no private
> data.  But the problem is all the hidden assumptions combine makes, and
> for example that it often make **non-canonical** RTL; this is no problem
> in combine, because that code will just not recog() (except of course
> that you then get less well optimised code, in general).  (Often, combine
> makes that RTL canonical later, fwiw).
> 
> But it _is_ a big problem in other contexts: you should not create
> non-canonical RTL.  In your specific case, where you want to run this
> from a splitter, you have to make sure you get only correct, recognised
> patterns for your machine.  make_extraction cannot guarantee you
> that.
Even in the case where the form and operands are limited to what's in
the new pattern?  My worry with make_extraction is those cases where it
gets too smart for its own good and returns something even simpler than
extraction.  I'm not sure that can happen in this case or not.

I'm not keen on duplicating the logic from make_field_assignment and
make_extraction.  Though again, given the limited forms we accept it may
not be too bad.

> 
>> Second, it relaxes the main insv pattern to allow an immediate in the
>> operand predicate, but forces it into a register during LRA.  I don't
>> generally like having predicates that are looser than the constraints,
>> but it really helps here.  Basically we want to see the constant field
>> we're going to insert.
> 
> You also have an insn condition that can become invalid by the time the
> > pattern is split (the !reload_completed part), which means the pattern
> will then *not* be split, which won't work here (the template is "#").
The pattern is split by the splitting pass before register allocation
and reload.  It's never supposed to exist beyond that splitting pass.
I'm pretty sure we've done this kind of thing regularly in the past.


> 
>> -static int
>> +int
>>  get_pos_from_mask (unsigned HOST_WIDE_INT m, unsigned HOST_WIDE_INT *plen)
> 
> If you want to reuse this, please give it a better name, and move it to
> rtlanal or some header file or similar?
If we go forward, yea, makes sense.

> 
> It may be nicer to make a more machine-specific routine for this, see
> rs6000_is_valid_mask for example.
> 
>> +;; This must be split before register allocation and reloading as
>> +;; we need to verify it's actually a bitfield insertion by
>> +;; examining both immediate operands.  After reload we will lose
>> +;; knowledge of operands[3] constant status.
> 
> I don't understand what this means?  After reload operands[3] is still a
> const_int?
It has to be an integer prior to register allocation.  Otherwise we
can't verify we have a valid field assignment.

The pattern should never exist past the insn splitting pass.  The
comment is obsolete.  It should always be split prior to register
allocation into a field assignment.



> 
>> +(define_insn_and_split ""
> 
> Give this a name?  Starting with * if you don't want a gen_* for it.
Didn't figure it was worth the trouble.  But it's easy enough to do.

> 
>> +  [(set (match_operand:GPI 0 "register_operand" "=r")
>> +	(ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "0")
>> +			  (match_operand:GPI 2 "const_int_operand" "n"))
>> +		 (match_operand:GPI 3 "const_int_operand" "n")))]
>> +  "(!reload_completed
>> +    /* make_field_assignment doesn't handle subregs well.  */
>> +    && REG_P (operands[0])
> 
> Another reason to not use make_field_assignment, I'm afraid :-(
Perhaps.  I'd probably want to avoid SUBREGs here regardless :-)


jeff
Segher Boessenkool April 25, 2019, 12:54 p.m. UTC | #5
On Wed, Apr 24, 2019 at 12:16:39PM -0600, Jeff Law wrote:
> On 4/24/19 7:05 AM, Segher Boessenkool wrote:
> > On Mon, Apr 22, 2019 at 09:09:12AM -0600, Jeff Law wrote:
> >> First, it re-uses combine's make_field_assignment in the aarch64
> >> backend.
> > 
> > That's not really acceptable.  make_field_assignment depends intimately
> > on make_extraction, which is very much combine-specific (and wrong in so
> > many ways it's hard to tell).
> I think the structure of the pattern and its condition avoid the most
> problematic issues.  But yea, it's not ideal.

Oh, I'm not saying it does not work, I trust you tested it well.  But there
are so many ways this can completely break (with ICEs and everything), this
really isn't good.

You say all the problematic cases cannot happen...  So make a clone of this
function without any of those cases?  :-)

> > But it _is_ a big problem in other contexts: you should not create
> > non-canonical RTL.  In your specific case, where you want to run this
> > from a splitter, you have to make sure you get only correct, recognised
> > patterns for your machine.  make_extraction cannot guarantee you
> > that.
> Even in the case where the form and operands are limited to what's in
> the new pattern?  My worry with make_extraction is those cases where it
> gets too smart for its own good and returns something even simpler than
> extraction.  I'm not sure that can happen in this case or not.
> 
> I'm not keen on duplicating the logic from make_field_assignment and
> make_extraction.  Though again, given the limited forms we accept it may
> not be too bad.

Just praying that the RTL it creates is the RTL you prefer to see, or even
just RTL the backend can handle, doesn't sound too great.

(And any time this breaks it is a target bug, not combine).

> > You also have an insn condition that can become invalid by the time the
> > pattern is split (the !reload_completed part), which means the pattern
> > will then *not* be split, which won't work here (the template is "#").
> The pattern is split by the splitting pass before register allocation
> and reload.  It's never supposed to exist beyond that splitting pass.
> I'm pretty sure we've done this kind of thing regularly in the past.

Are you sure this code can never be created between the split pass and
when reload is finished?  Like maybe during LRA itself.  It's hard to
convince myself of this.

> >> +(define_insn_and_split ""
> > 
> > Give this a name?  Starting with * if you don't want a gen_* for it.
> Didn't figure it was worth the trouble.  But it's easy enough to do.

It helps debugging (and it helps naming stuff in the changelog :-) )

> >> +  [(set (match_operand:GPI 0 "register_operand" "=r")
> >> +	(ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "0")
> >> +			  (match_operand:GPI 2 "const_int_operand" "n"))
> >> +		 (match_operand:GPI 3 "const_int_operand" "n")))]
> >> +  "(!reload_completed
> >> +    /* make_field_assignment doesn't handle subregs well.  */
> >> +    && REG_P (operands[0])
> > 
> > Another reason to not use make_field_assignment, I'm afraid :-(
> Perhaps.  I'd probably want to avoid SUBREGs here regardless :-)

You get a subreg for using a 64-bit pseudo as 32-bit (a pseudo has only
one mode).  This happens a lot for insert insns, from what I see.


Segher
Patch

diff --git a/gcc/combine.c b/gcc/combine.c
index 5616e6b1bac..03be306f5bc 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -449,8 +449,6 @@  static rtx expand_compound_operation (rtx);
 static const_rtx expand_field_assignment (const_rtx);
 static rtx make_extraction (machine_mode, rtx, HOST_WIDE_INT,
 			    rtx, unsigned HOST_WIDE_INT, int, int, int);
-static int get_pos_from_mask (unsigned HOST_WIDE_INT,
-			      unsigned HOST_WIDE_INT *);
 static rtx canon_reg_for_combine (rtx, rtx);
 static rtx force_int_to_mode (rtx, scalar_int_mode, scalar_int_mode,
 			      scalar_int_mode, unsigned HOST_WIDE_INT, int);
@@ -459,7 +457,6 @@  static rtx force_to_mode (rtx, machine_mode,
 static rtx if_then_else_cond (rtx, rtx *, rtx *);
 static rtx known_cond (rtx, enum rtx_code, rtx, rtx);
 static int rtx_equal_for_field_assignment_p (rtx, rtx, bool = false);
-static rtx make_field_assignment (rtx);
 static rtx apply_distributive_law (rtx);
 static rtx distribute_and_simplify_rtx (rtx, int);
 static rtx simplify_and_const_int_1 (scalar_int_mode, rtx,
@@ -8569,7 +8566,7 @@  make_compound_operation (rtx x, enum rtx_code in_code)
 
    *PLEN is set to the length of the field.  */
 
-static int
+int
 get_pos_from_mask (unsigned HOST_WIDE_INT m, unsigned HOST_WIDE_INT *plen)
 {
   /* Get the bit number of the first 1 bit from the right, -1 if none.  */
@@ -8584,7 +8581,8 @@  get_pos_from_mask (unsigned HOST_WIDE_INT m, unsigned HOST_WIDE_INT *plen)
   if (len <= 0)
     pos = -1;
 
-  *plen = len;
+  if (plen)
+    *plen = len;
   return pos;
 }
 
@@ -9745,7 +9743,7 @@  rtx_equal_for_field_assignment_p (rtx x, rtx y, bool widen_x)
 
    We only handle the most common cases.  */
 
-static rtx
+rtx
 make_field_assignment (rtx x)
 {
   rtx dest = SET_DEST (x);
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 5a1894063a1..a2a0a86cb51 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5473,7 +5473,7 @@ 
   [(set (zero_extract:GPI (match_operand:GPI 0 "register_operand" "+r")
 			  (match_operand 1 "const_int_operand" "n")
 			  (match_operand 2 "const_int_operand" "n"))
-	(match_operand:GPI 3 "register_operand" "r"))]
+	(match_operand:GPI 3 "aarch64_reg_or_imm" "r"))]
   "!(UINTVAL (operands[1]) == 0
      || (UINTVAL (operands[2]) + UINTVAL (operands[1])
 	 > GET_MODE_BITSIZE (<MODE>mode)))"
@@ -5481,6 +5481,43 @@ 
   [(set_attr "type" "bfm")]
 )
 
+;; This must be split before register allocation and reloading as
+;; we need to verify it's actually a bitfield insertion by
+;; examining both immediate operands.  After reload we will lose
+;; knowledge of operands[3] constant status.
+;;
+(define_insn_and_split ""
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+	(ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "0")
+			  (match_operand:GPI 2 "const_int_operand" "n"))
+		 (match_operand:GPI 3 "const_int_operand" "n")))]
+  "(!reload_completed
+    /* make_field_assignment doesn't handle subregs well.  */
+    && REG_P (operands[0])
+    && get_pos_from_mask (~UINTVAL (operands[2]), NULL) >= 0
+    && (UINTVAL (operands[2]) & UINTVAL (operands[3])) == 0)"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  "{
+     /* If we do not have an RMW operand, then copy the input
+	to the output before this insn.  Also modify the existing
+	insn in-place so we can have make_field_assignment actually
+	generate a suitable extraction.  */
+     if (!rtx_equal_p (operands[0], operands[1]))
+       {
+         emit_move_insn (operands[0], operands[1]);
+	 XEXP (XEXP (SET_SRC (PATTERN (curr_insn)), 0), 0) = copy_rtx (operands[0]);
+       }
+
+     rtx make_field_assignment (rtx);
+     rtx newpat = make_field_assignment (PATTERN (curr_insn));
+     gcc_assert (newpat);
+     emit_insn (newpat);
+     DONE;
+   }"
+)
+
 (define_insn "*aarch64_bfi<GPI:mode><ALLX:mode>4"
   [(set (zero_extract:GPI (match_operand:GPI 0 "register_operand" "+r")
 			  (match_operand 1 "const_int_operand" "n")
diff --git a/gcc/rtl.h b/gcc/rtl.h
index b4a906f9181..5824d4a7770 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -4015,6 +4015,8 @@  extern rtx remove_death (unsigned int, rtx_insn *);
 extern void dump_combine_stats (FILE *);
 extern void dump_combine_total_stats (FILE *);
 extern rtx make_compound_operation (rtx, enum rtx_code);
+extern int get_pos_from_mask (unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT *);
+extern rtx make_field_assignment (rtx);
 
 /* In sched-rgn.c.  */
 extern void schedule_insns (void);