diff mbox series

[Aarch64] PR rtl-optimization/87763 - this patch fixes gcc.target/aarch64/lsl_asr_sbfiz.c

Message ID 78cb6082e68ed593a8e247c295ff3c415c2194ac.camel@marvell.com
State New
Headers show
Series [Aarch64] PR rtl-optimization/87763 - this patch fixes gcc.target/aarch64/lsl_asr_sbfiz.c | expand

Commit Message

Steve Ellcey April 10, 2019, 10:03 p.m. UTC
Here is another patch to fix one of the failures
listed in PR rtl-optimization/87763. This change
fixes gcc.target/aarch64/lsl_asr_sbfiz.c by adding
an alternative version of *ashiftsi_extv_bfiz that
has a subreg in it.

Tested with bootstrap and regression test run.

OK for checkin?

Steve Ellcey


2018-04-10  Steve Ellcey  <sellcey@marvell.com>

	PR rtl-optimization/87763
	* config/aarch64/aarch64.md (*ashiftsi_extv_bfiz_alt):
	New Instruction.

Comments

Steve Ellcey April 10, 2019, 10:35 p.m. UTC | #1
On Wed, 2019-04-10 at 15:03 -0700, Steve Ellcey wrote:
> Here is another patch to fix one of the failures
> listed in PR rtl-optimization/87763. This change
> fixes gcc.target/aarch64/lsl_asr_sbfiz.c by adding
> an alternative version of *ashiftsi_extv_bfiz that
> has a subreg in it.
> 
> Tested with bootstrap and regression test run.
> 
> OK for checkin?
> 
> Steve Ellcey
> 
> 
> 2018-04-10  Steve Ellcey  <sellcey@marvell.com>
> 
> 	PR rtl-optimization/87763
> 	* config/aarch64/aarch64.md (*ashiftsi_extv_bfiz_alt):
> 	New Instruction.

I forgot I sent this out before (in January).  It generated some
discussion then but no action.  So I guess this is a ping.

https://gcc.gnu.org/ml/gcc-patches/2019-01/msg01694.html

Steve Ellcey
sellcey@marvell.com
Steve Ellcey April 16, 2019, 4:29 p.m. UTC | #2
Re-ping.  I know there are discussions about bigger changes to fix the
various failures listed in PR rtl-optimization/87763 but this patch
at least fixes one of them (gcc.target/aarch64/lsl_asr_sbfiz.c).

Steve Ellcey
sellcey@marvell.com


On Wed, 2019-04-10 at 15:35 -0700, Steve Ellcey wrote:
> On Wed, 2019-04-10 at 15:03 -0700, Steve Ellcey wrote:
> > Here is another patch to fix one of the failures
> > listed in PR rtl-optimization/87763. This change
> > fixes gcc.target/aarch64/lsl_asr_sbfiz.c by adding
> > an alternative version of *ashiftsi_extv_bfiz that
> > has a subreg in it.
> > 
> > Tested with bootstrap and regression test run.
> > 
> > OK for checkin?
> > 
> > Steve Ellcey
> > 
> > 
> > 2018-04-10  Steve Ellcey  <sellcey@marvell.com>
> > 
> > 	PR rtl-optimization/87763
> > 	* config/aarch64/aarch64.md (*ashiftsi_extv_bfiz_alt):
> > 	New Instruction.
> 
> I forgot I sent this out before (in January).  It generated some
> discussion then but no action.  So I guess this is a ping.
> 
> https://gcc.gnu.org/ml/gcc-patches/2019-01/msg01694.html
> 
> Steve Ellcey
> sellcey@marvell.com
Richard Earnshaw (lists) April 17, 2019, 10:19 a.m. UTC | #3
On 10/04/2019 23:03, Steve Ellcey wrote:
> 
> Here is another patch to fix one of the failures
> listed in PR rtl-optimization/87763. This change
> fixes gcc.target/aarch64/lsl_asr_sbfiz.c by adding
> an alternative version of *ashiftsi_extv_bfiz that
> has a subreg in it.
> 
> Tested with bootstrap and regression test run.
> 
> OK for checkin?
> 
> Steve Ellcey
> 
> 
> 2018-04-10  Steve Ellcey  <sellcey@marvell.com>
> 
> 	PR rtl-optimization/87763
> 	* config/aarch64/aarch64.md (*ashiftsi_extv_bfiz_alt):
> 	New Instruction.
> 
> 
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index e0df975..04dc06f 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -5634,6 +5634,22 @@
>    [(set_attr "type" "bfx")]
>  )
>  
> +(define_insn "*ashiftsi_extv_bfiz_alt"
> +  [(set (match_operand:SI 0 "register_operand" "=r")
> +	(ashift:SI
> +	  (subreg:SI
> +	    (sign_extract:DI
> +	      (subreg:DI (match_operand:SI 1 "register_operand" "r") 0)
> +	      (match_operand 2 "aarch64_simd_shift_imm_offset_si" "n")
> +	      (const_int 0))
> +	    0)
> +	  (match_operand 3 "aarch64_simd_shift_imm_si" "n")))]
> +  "IN_RANGE (INTVAL (operands[2]) + INTVAL (operands[3]),
> +	     1, GET_MODE_BITSIZE (SImode) - 1)"
> +  "sbfiz\\t%w0, %w1, %3, %2"
> +  [(set_attr "type" "bfx")]
> +)
> +
>  ;; When the bit position and width of the equivalent extraction add up to 32
>  ;; we can use a W-reg LSL instruction taking advantage of the implicit
>  ;; zero-extension of the X-reg.
> 

I don't think this is right for big-endian, where the subreg offset is
not zero.  Perhaps you should look at using subreg_lowpart_operator.

Due to that, I think this also needs some test cases.

R.
Jeff Law April 17, 2019, 1:27 p.m. UTC | #4
On 4/17/19 4:19 AM, Richard Earnshaw (lists) wrote:
> On 10/04/2019 23:03, Steve Ellcey wrote:
>>
>> Here is another patch to fix one of the failures
>> listed in PR rtl-optimization/87763. This change
>> fixes gcc.target/aarch64/lsl_asr_sbfiz.c by adding
>> an alternative version of *ashiftsi_extv_bfiz that
>> has a subreg in it.
>>
>> Tested with bootstrap and regression test run.
>>
>> OK for checkin?
>>
>> Steve Ellcey
>>
>>
>> 2018-04-10  Steve Ellcey  <sellcey@marvell.com>
>>
>> 	PR rtl-optimization/87763
>> 	* config/aarch64/aarch64.md (*ashiftsi_extv_bfiz_alt):
>> 	New Instruction.
>>
>>
>> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
>> index e0df975..04dc06f 100644
>> --- a/gcc/config/aarch64/aarch64.md
>> +++ b/gcc/config/aarch64/aarch64.md
>> @@ -5634,6 +5634,22 @@
>>    [(set_attr "type" "bfx")]
>>  )
>>  
>> +(define_insn "*ashiftsi_extv_bfiz_alt"
>> +  [(set (match_operand:SI 0 "register_operand" "=r")
>> +	(ashift:SI
>> +	  (subreg:SI
>> +	    (sign_extract:DI
>> +	      (subreg:DI (match_operand:SI 1 "register_operand" "r") 0)
>> +	      (match_operand 2 "aarch64_simd_shift_imm_offset_si" "n")
>> +	      (const_int 0))
>> +	    0)
>> +	  (match_operand 3 "aarch64_simd_shift_imm_si" "n")))]
>> +  "IN_RANGE (INTVAL (operands[2]) + INTVAL (operands[3]),
>> +	     1, GET_MODE_BITSIZE (SImode) - 1)"
>> +  "sbfiz\\t%w0, %w1, %3, %2"
>> +  [(set_attr "type" "bfx")]
>> +)
>> +
>>  ;; When the bit position and width of the equivalent extraction add up to 32
>>  ;; we can use a W-reg LSL instruction taking advantage of the implicit
>>  ;; zero-extension of the X-reg.
>>
> 
> I don't think this is right for big-endian, where the subreg offset is
> not zero.  Perhaps you should look at using subreg_lowpart_operator.
As general guidance anytime I see a subreg in the .md files I suspect
we're likely gone the wrong direction at some point.

That doesn't mean we can't use subregs, nor does it mean it's wrong in
this instance, but it certainly makes me look at the changes more
carefully to see if we can do something earlier or later so that we're
not matching subreg expressions in the md files.

I agree that in this specific case, it's likely incorrect.
subreg_lowpart_* would likely help, either as a predicate or as an operator.

jeff
Jeff Law April 23, 2019, 3:03 p.m. UTC | #5
On 4/16/19 10:29 AM, Steve Ellcey wrote:
> Re-ping.  I know there are discussions about bigger changes to fix the
> various failures listed in PR rtl-optimization/87763 but this patch
> at least fixes one of them (gcc.target/aarch64/lsl_asr_sbfiz.c).
So one thing to ponder here is whether or not combine should be handling
the subregs better.  A year or so ago I improved make_field_assignment
to handle some cases where subregs were appearing.  It may be the case
that we just need further refinements to subreg handling.

At some point we call combine_simplify_rtx with:

(set (reg:SI 93)
    (ashift:SI (subreg:SI (sign_extract:DI (subreg:DI (reg:SI 95) 0)
                (const_int 3 [0x3])
                (const_int 0 [0])) 0)
        (const_int 19 [0x13])))

The subregs are obviously inconvenient.  More importantly, I think
they're redundant.  I think that's equivalent to:

(set (reg:SI 93)
    (ashift:SI (sign_extract:SI (reg:SI 95)
                (const_int 3 [0x3])
                (const_int 0 [0]))
        (const_int 19 [0x13])))


At which point I think it'd match the existing pattern in aarch64.md

Thoughts?

jeff
Jeff Law April 23, 2019, 3:29 p.m. UTC | #6
On 4/16/19 10:29 AM, Steve Ellcey wrote:
> Re-ping.  I know there are discussions about bigger changes to fix the
> various failures listed in PR rtl-optimization/87763 but this patch
> at least fixes one of them (gcc.target/aarch64/lsl_asr_sbfiz.c).
So we actually call simplify_set with:

(set (reg:SI 93)
    (ashiftrt:SI (ashift:SI (reg:SI 95)
            (const_int 29 [0x1d]))
        (const_int 10 [0xa])))

Then we expand that via make_compound_operation into:

(set (reg:SI 93)
    (ashift:SI (subreg:SI (sign_extract:DI (subreg:DI (reg:SI 95) 0)
                (const_int 3 [0x3])
                (const_int 0 [0])) 0)
        (const_int 19 [0x13])))


This is almost certainly related to the target preferring DImode for its
insertion/extraction insns.  See get_best_extraction_insn.

Still investigating...

jeff
Jeff Law April 23, 2019, 4:06 p.m. UTC | #7
On 4/16/19 10:29 AM, Steve Ellcey wrote:
> Re-ping.  I know there are discussions about bigger changes to fix the
> various failures listed in PR rtl-optimization/87763 but this patch
> at least fixes one of them (gcc.target/aarch64/lsl_asr_sbfiz.c).
So I think we can address this by making the extv/extzv expander handle
both modes just like we do for insv.    That's cleaner than addding new
patterns to match the subreg nonsense IMHO.

Something like this:

> (define_expand "<optab><mode>"
>   [(set (match_operand:GPI 0 "register_operand" "=r")
>         (ANY_EXTRACT:GPI (match_operand:GPI 1 "register_operand")
>                         (match_operand 2
>                           "aarch64_simd_shift_imm_offset_<mode>")
>                         (match_operand 3 "aarch64_simd_shift_imm_<mode>")))]
>   ""
>   {
>     if (!IN_RANGE (INTVAL (operands[2]) + INTVAL (operands[3]),
>                    1, GET_MODE_BITSIZE (<MODE>mode) - 1))
>      FAIL;
>   }
> )

Note how we're using <mode>, <MODE> and the GPI iterator.  With that
updated expander I get:

sbfiz32:
        sbfiz   w0, w0, 19, 3   // 12   [c=8 l=4]  *ashiftsi_extv_bfiz
        ret             // 21   [c=0 l=4]  *do_return


and

sbfiz64:
        sbfiz   x0, x0, 38, 6   // 12   [c=8 l=4]  *ashiftdi_extv_bfiz
        ret             // 21   [c=0 l=4]  *do_return


I'm going to throw that into the tester and see what we get...

jeff
Jeff Law April 26, 2019, 2:13 p.m. UTC | #8
On 4/16/19 10:29 AM, Steve Ellcey wrote:
> Re-ping.  I know there are discussions about bigger changes to fix the
> various failures listed in PR rtl-optimization/87763 but this patch
> at least fixes one of them (gcc.target/aarch64/lsl_asr_sbfiz.c).
> 
> Steve Ellcey
> sellcey@marvell.com
So here's my take on fixing the lsl_asr_sbfix.c regression.

As I mentioned earlier the problem is the aarch64 is using the old way
of describing bitfield extractions (extv, extzv).  Specifically it
defined a single expander that operated on the natural word mode (DImode).

This forces the input operand to be used in DImode as well.  So we get
those annoying subregs in the expressions that combine generates.  But
there's a better way :-)

The new way to handle this stuff is to define expanders for supported
modes (SI and DI for aarch64).  Interestingly enough the aarch64 port
already does this for bitfield insertions via insv.

So I made the obvious changes to the extv/extzv expander.  That fixes
the lsl_asr_sbfiz regression.  But doesn't bootstrap.  The availability
of the new expander makes extract_bit_field_using_extv want to extract a
field from an HFmode object and shove it into an SImode.  That runs
afoul of checks in validate_subreg (as it should).  We shouldn't be
using subregs to change the size of a floating point object, so that
needs to be filtered out in extract_bit_field_using_extv.

That fixes the bootstrap issue.  But regression testing trips over
ashtidisi.c.  That can be easily fixed by allowing zero/sign extractions
which change size.  ie:

(set (reg:DI) (sign_extract:DI (reg:SI) ...)))

Which seems like a reasonable thing to handle.

So here's what I've got.  I've bootstrapped and regression tested on
aarch64.  It's also bootstrapped on aarch64_be for good measure.

OK (from aarch64 maintainers) for the gcc-9 branch and trunk?  Or we
could just address on the trunk for gcc-10.  I don't have strong
preferences either way.

Jeff

ps.  Time to return insv regressions and address Segher's comments :-)
* aarch64.md (extv, extzv expander): Generalize so that it works
	for both SImode and DImode.
	(extv_3264, extzv_3264): New pattern to handle extractions with
	size change between the input and output operand.
	* expmed.c (extract_bitfield_using_extv): Do not allow changing
	sizes of floating point objects using SUBREGs.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 5a1894063a1..13e2bca05a1 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5390,16 +5390,16 @@
 ;; Bitfields
 ;; -------------------------------------------------------------------
 
-(define_expand "<optab>"
-  [(set (match_operand:DI 0 "register_operand" "=r")
-	(ANY_EXTRACT:DI (match_operand:DI 1 "register_operand")
+(define_expand "<optab><mode>"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+	(ANY_EXTRACT:GPI (match_operand:GPI 1 "register_operand")
 			(match_operand 2
-			  "aarch64_simd_shift_imm_offset_di")
-			(match_operand 3 "aarch64_simd_shift_imm_di")))]
+			  "aarch64_simd_shift_imm_offset_<mode>")
+			(match_operand 3 "aarch64_simd_shift_imm_<mode>")))]
   ""
   {
     if (!IN_RANGE (INTVAL (operands[2]) + INTVAL (operands[3]),
-		   1, GET_MODE_BITSIZE (DImode) - 1))
+		   1, GET_MODE_BITSIZE (<MODE>mode) - 1))
      FAIL;
   }
 )
@@ -5418,6 +5418,21 @@
   [(set_attr "type" "bfx")]
 )
 
+;; Similar to the previous pattern, but 32->64 extraction
+(define_insn "*<optab>_3264"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+	(ANY_EXTRACT:DI (match_operand:SI 1 "register_operand" "r")
+			 (match_operand 2
+			   "aarch64_simd_shift_imm_offset_si" "n")
+			 (match_operand 3
+			   "aarch64_simd_shift_imm_si" "n")))]
+  "IN_RANGE (INTVAL (operands[2]) + INTVAL (operands[3]),
+	     1, GET_MODE_BITSIZE (DImode) - 1)"
+  "<su>bfx\\t%x0, %x1, %3, %2"
+  [(set_attr "type" "bfx")]
+)
+
+
 ;; When the bit position and width add up to 32 we can use a W-reg LSR
 ;; instruction taking advantage of the implicit zero-extension of the X-reg.
 (define_split
diff --git a/gcc/expmed.c b/gcc/expmed.c
index d7f8e9a5d76..ce8f9595b9a 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -1544,7 +1544,14 @@ extract_bit_field_using_extv (const extraction_insn *extv, rtx op0,
 	 mode.  Instead, create a temporary and use convert_move to set
 	 the target.  */
       if (REG_P (target)
-	  && TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (target), ext_mode))
+	  && TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (target), ext_mode)
+	  /* We can't change the size of float mode subregs (see
+	     validate_subreg).  So if either mode is a floating point
+	     mode and the sizes are not equal, then reject doing this
+	     via gen_lowpart.  */
+	  && !((FLOAT_MODE_P (GET_MODE (target)) || FLOAT_MODE_P (ext_mode))
+	       && maybe_ne (GET_MODE_BITSIZE (GET_MODE (target)),
+			    GET_MODE_BITSIZE (ext_mode))))
 	{
 	  target = gen_lowpart (ext_mode, target);
 	  if (partial_subreg_p (GET_MODE (spec_target), ext_mode))
Richard Earnshaw (lists) April 26, 2019, 2:52 p.m. UTC | #9
On 26/04/2019 15:13, Jeff Law wrote:
> On 4/16/19 10:29 AM, Steve Ellcey wrote:
>> Re-ping.  I know there are discussions about bigger changes to fix the
>> various failures listed in PR rtl-optimization/87763 but this patch
>> at least fixes one of them (gcc.target/aarch64/lsl_asr_sbfiz.c).
>>
>> Steve Ellcey
>> sellcey@marvell.com
> So here's my take on fixing the lsl_asr_sbfix.c regression.
> 
> As I mentioned earlier the problem is the aarch64 is using the old way
> of describing bitfield extractions (extv, extzv).  Specifically it
> defined a single expander that operated on the natural word mode (DImode).
> 
> This forces the input operand to be used in DImode as well.  So we get
> those annoying subregs in the expressions that combine generates.  But
> there's a better way :-)
> 
> The new way to handle this stuff is to define expanders for supported
> modes (SI and DI for aarch64).  Interestingly enough the aarch64 port
> already does this for bitfield insertions via insv.
> 
> So I made the obvious changes to the extv/extzv expander.  That fixes
> the lsl_asr_sbfiz regression.  But doesn't bootstrap.  The availability
> of the new expander makes extract_bit_field_using_extv want to extract a
> field from an HFmode object and shove it into an SImode.  That runs
> afoul of checks in validate_subreg (as it should).  We shouldn't be
> using subregs to change the size of a floating point object, so that
> needs to be filtered out in extract_bit_field_using_extv.
> 
> That fixes the bootstrap issue.  But regression testing trips over
> ashtidisi.c.  That can be easily fixed by allowing zero/sign extractions
> which change size.  ie:
> 
> (set (reg:DI) (sign_extract:DI (reg:SI) ...)))
> 
> Which seems like a reasonable thing to handle.
> 
> So here's what I've got.  I've bootstrapped and regression tested on
> aarch64.  It's also bootstrapped on aarch64_be for good measure.
> 
> OK (from aarch64 maintainers) for the gcc-9 branch and trunk?  Or we
> could just address on the trunk for gcc-10.  I don't have strong
> preferences either way.
> 
> Jeff
> 
> ps.  Time to return insv regressions and address Segher's comments :-)
> 
> 
> 
> P
> 
> 	* aarch64.md (extv, extzv expander): Generalize so that it works
> 	for both SImode and DImode.
> 	(extv_3264, extzv_3264): New pattern to handle extractions with
> 	size change between the input and output operand.
> 	* expmed.c (extract_bitfield_using_extv): Do not allow changing
> 	sizes of floating point objects using SUBREGs.
> 
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 5a1894063a1..13e2bca05a1 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -5390,16 +5390,16 @@
>  ;; Bitfields
>  ;; -------------------------------------------------------------------
>  
> -(define_expand "<optab>"
> -  [(set (match_operand:DI 0 "register_operand" "=r")
> -	(ANY_EXTRACT:DI (match_operand:DI 1 "register_operand")
> +(define_expand "<optab><mode>"
> +  [(set (match_operand:GPI 0 "register_operand" "=r")
> +	(ANY_EXTRACT:GPI (match_operand:GPI 1 "register_operand")
>  			(match_operand 2
> -			  "aarch64_simd_shift_imm_offset_di")
> -			(match_operand 3 "aarch64_simd_shift_imm_di")))]
> +			  "aarch64_simd_shift_imm_offset_<mode>")
> +			(match_operand 3 "aarch64_simd_shift_imm_<mode>")))]
>    ""
>    {
>      if (!IN_RANGE (INTVAL (operands[2]) + INTVAL (operands[3]),
> -		   1, GET_MODE_BITSIZE (DImode) - 1))
> +		   1, GET_MODE_BITSIZE (<MODE>mode) - 1))
>       FAIL;
>    }
>  )
> @@ -5418,6 +5418,21 @@
>    [(set_attr "type" "bfx")]
>  )
>  
> +;; Similar to the previous pattern, but 32->64 extraction
> +(define_insn "*<optab>_3264"
> +  [(set (match_operand:DI 0 "register_operand" "=r")
> +	(ANY_EXTRACT:DI (match_operand:SI 1 "register_operand" "r")

So is that valid RTL (DI extract of SI value)?  Why wouldn't that just
use a paradoxical subreg to widen the register being extracted?

R.


> +			 (match_operand 2
> +			   "aarch64_simd_shift_imm_offset_si" "n")
> +			 (match_operand 3
> +			   "aarch64_simd_shift_imm_si" "n")))]
> +  "IN_RANGE (INTVAL (operands[2]) + INTVAL (operands[3]),
> +	     1, GET_MODE_BITSIZE (DImode) - 1)"
> +  "<su>bfx\\t%x0, %x1, %3, %2"
> +  [(set_attr "type" "bfx")]
> +)
> +
> +
>  ;; When the bit position and width add up to 32 we can use a W-reg LSR
>  ;; instruction taking advantage of the implicit zero-extension of the X-reg.
>  (define_split
> diff --git a/gcc/expmed.c b/gcc/expmed.c
> index d7f8e9a5d76..ce8f9595b9a 100644
> --- a/gcc/expmed.c
> +++ b/gcc/expmed.c
> @@ -1544,7 +1544,14 @@ extract_bit_field_using_extv (const extraction_insn *extv, rtx op0,
>  	 mode.  Instead, create a temporary and use convert_move to set
>  	 the target.  */
>        if (REG_P (target)
> -	  && TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (target), ext_mode))
> +	  && TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (target), ext_mode)
> +	  /* We can't change the size of float mode subregs (see
> +	     validate_subreg).  So if either mode is a floating point
> +	     mode and the sizes are not equal, then reject doing this
> +	     via gen_lowpart.  */
> +	  && !((FLOAT_MODE_P (GET_MODE (target)) || FLOAT_MODE_P (ext_mode))
> +	       && maybe_ne (GET_MODE_BITSIZE (GET_MODE (target)),
> +			    GET_MODE_BITSIZE (ext_mode))))
>  	{
>  	  target = gen_lowpart (ext_mode, target);
>  	  if (partial_subreg_p (GET_MODE (spec_target), ext_mode))
>
Jeff Law April 26, 2019, 2:57 p.m. UTC | #10
On 4/26/19 8:52 AM, Richard Earnshaw (lists) wrote:
> On 26/04/2019 15:13, Jeff Law wrote:
>> On 4/16/19 10:29 AM, Steve Ellcey wrote:
>>> Re-ping.  I know there are discussions about bigger changes to fix the
>>> various failures listed in PR rtl-optimization/87763 but this patch
>>> at least fixes one of them (gcc.target/aarch64/lsl_asr_sbfiz.c).
>>>
>>> Steve Ellcey
>>> sellcey@marvell.com
>> So here's my take on fixing the lsl_asr_sbfix.c regression.
>>
>> As I mentioned earlier the problem is the aarch64 is using the old way
>> of describing bitfield extractions (extv, extzv).  Specifically it
>> defined a single expander that operated on the natural word mode (DImode).
>>
>> This forces the input operand to be used in DImode as well.  So we get
>> those annoying subregs in the expressions that combine generates.  But
>> there's a better way :-)
>>
>> The new way to handle this stuff is to define expanders for supported
>> modes (SI and DI for aarch64).  Interestingly enough the aarch64 port
>> already does this for bitfield insertions via insv.
>>
>> So I made the obvious changes to the extv/extzv expander.  That fixes
>> the lsl_asr_sbfiz regression.  But doesn't bootstrap.  The availability
>> of the new expander makes extract_bit_field_using_extv want to extract a
>> field from an HFmode object and shove it into an SImode.  That runs
>> afoul of checks in validate_subreg (as it should).  We shouldn't be
>> using subregs to change the size of a floating point object, so that
>> needs to be filtered out in extract_bit_field_using_extv.
>>
>> That fixes the bootstrap issue.  But regression testing trips over
>> ashtidisi.c.  That can be easily fixed by allowing zero/sign extractions
>> which change size.  ie:
>>
>> (set (reg:DI) (sign_extract:DI (reg:SI) ...)))
>>
>> Which seems like a reasonable thing to handle.
>>
>> So here's what I've got.  I've bootstrapped and regression tested on
>> aarch64.  It's also bootstrapped on aarch64_be for good measure.
>>
>> OK (from aarch64 maintainers) for the gcc-9 branch and trunk?  Or we
>> could just address on the trunk for gcc-10.  I don't have strong
>> preferences either way.
>>
>> Jeff
>>
>> ps.  Time to return insv regressions and address Segher's comments :-)
>>
>>
>>
>> P
>>
>> 	* aarch64.md (extv, extzv expander): Generalize so that it works
>> 	for both SImode and DImode.
>> 	(extv_3264, extzv_3264): New pattern to handle extractions with
>> 	size change between the input and output operand.
>> 	* expmed.c (extract_bitfield_using_extv): Do not allow changing
>> 	sizes of floating point objects using SUBREGs.
>>
>> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
>> index 5a1894063a1..13e2bca05a1 100644
>> --- a/gcc/config/aarch64/aarch64.md
>> +++ b/gcc/config/aarch64/aarch64.md
>> @@ -5390,16 +5390,16 @@
>>  ;; Bitfields
>>  ;; -------------------------------------------------------------------
>>  
>> -(define_expand "<optab>"
>> -  [(set (match_operand:DI 0 "register_operand" "=r")
>> -	(ANY_EXTRACT:DI (match_operand:DI 1 "register_operand")
>> +(define_expand "<optab><mode>"
>> +  [(set (match_operand:GPI 0 "register_operand" "=r")
>> +	(ANY_EXTRACT:GPI (match_operand:GPI 1 "register_operand")
>>  			(match_operand 2
>> -			  "aarch64_simd_shift_imm_offset_di")
>> -			(match_operand 3 "aarch64_simd_shift_imm_di")))]
>> +			  "aarch64_simd_shift_imm_offset_<mode>")
>> +			(match_operand 3 "aarch64_simd_shift_imm_<mode>")))]
>>    ""
>>    {
>>      if (!IN_RANGE (INTVAL (operands[2]) + INTVAL (operands[3]),
>> -		   1, GET_MODE_BITSIZE (DImode) - 1))
>> +		   1, GET_MODE_BITSIZE (<MODE>mode) - 1))
>>       FAIL;
>>    }
>>  )
>> @@ -5418,6 +5418,21 @@
>>    [(set_attr "type" "bfx")]
>>  )
>>  
>> +;; Similar to the previous pattern, but 32->64 extraction
>> +(define_insn "*<optab>_3264"
>> +  [(set (match_operand:DI 0 "register_operand" "=r")
>> +	(ANY_EXTRACT:DI (match_operand:SI 1 "register_operand" "r")
> 
> So is that valid RTL (DI extract of SI value)?  Why wouldn't that just
> use a paradoxical subreg to widen the register being extracted?
It wasn't clear to me -- I couldn't find anything which indicated it
wasn't valid.  Given that the RTL code explicitly sign/zero extends the
value we should have the freedom to have the destination be wider.

No idea why combine didn't wrap it in a subreg when it didn't match the
first time around.  I suspect it's never been worth doing until we added
the multi-mode extraction expander support and the targets where we're
using multi-mode extraction expanders didn't have a testcase where it'd
matter.

Jeff
Jeff Law April 26, 2019, 4:08 p.m. UTC | #11
On 4/26/19 8:52 AM, Richard Earnshaw (lists) wrote:
> On 26/04/2019 15:13, Jeff Law wrote:
>> On 4/16/19 10:29 AM, Steve Ellcey wrote:
>>> Re-ping.  I know there are discussions about bigger changes to fix the
>>> various failures listed in PR rtl-optimization/87763 but this patch
>>> at least fixes one of them (gcc.target/aarch64/lsl_asr_sbfiz.c).
>>>
>>> Steve Ellcey
>>> sellcey@marvell.com
>> So here's my take on fixing the lsl_asr_sbfix.c regression.
>>
>> As I mentioned earlier the problem is the aarch64 is using the old way
>> of describing bitfield extractions (extv, extzv).  Specifically it
>> defined a single expander that operated on the natural word mode (DImode).
>>
>> This forces the input operand to be used in DImode as well.  So we get
>> those annoying subregs in the expressions that combine generates.  But
>> there's a better way :-)
>>
>> The new way to handle this stuff is to define expanders for supported
>> modes (SI and DI for aarch64).  Interestingly enough the aarch64 port
>> already does this for bitfield insertions via insv.
>>
>> So I made the obvious changes to the extv/extzv expander.  That fixes
>> the lsl_asr_sbfiz regression.  But doesn't bootstrap.  The availability
>> of the new expander makes extract_bit_field_using_extv want to extract a
>> field from an HFmode object and shove it into an SImode.  That runs
>> afoul of checks in validate_subreg (as it should).  We shouldn't be
>> using subregs to change the size of a floating point object, so that
>> needs to be filtered out in extract_bit_field_using_extv.
>>
>> That fixes the bootstrap issue.  But regression testing trips over
>> ashtidisi.c.  That can be easily fixed by allowing zero/sign extractions
>> which change size.  ie:
>>
>> (set (reg:DI) (sign_extract:DI (reg:SI) ...)))
>>
>> Which seems like a reasonable thing to handle.
>>
>> So here's what I've got.  I've bootstrapped and regression tested on
>> aarch64.  It's also bootstrapped on aarch64_be for good measure.
>>
>> OK (from aarch64 maintainers) for the gcc-9 branch and trunk?  Or we
>> could just address on the trunk for gcc-10.  I don't have strong
>> preferences either way.
>>
>> Jeff
>>
>> ps.  Time to return insv regressions and address Segher's comments :-)
>>
>>
>>
>> P
>>
>> 	* aarch64.md (extv, extzv expander): Generalize so that it works
>> 	for both SImode and DImode.
>> 	(extv_3264, extzv_3264): New pattern to handle extractions with
>> 	size change between the input and output operand.
>> 	* expmed.c (extract_bitfield_using_extv): Do not allow changing
>> 	sizes of floating point objects using SUBREGs.
>>
>> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
>> index 5a1894063a1..13e2bca05a1 100644
>> --- a/gcc/config/aarch64/aarch64.md
>> +++ b/gcc/config/aarch64/aarch64.md
>> @@ -5390,16 +5390,16 @@
>>  ;; Bitfields
>>  ;; -------------------------------------------------------------------
>>  
>> -(define_expand "<optab>"
>> -  [(set (match_operand:DI 0 "register_operand" "=r")
>> -	(ANY_EXTRACT:DI (match_operand:DI 1 "register_operand")
>> +(define_expand "<optab><mode>"
>> +  [(set (match_operand:GPI 0 "register_operand" "=r")
>> +	(ANY_EXTRACT:GPI (match_operand:GPI 1 "register_operand")
>>  			(match_operand 2
>> -			  "aarch64_simd_shift_imm_offset_di")
>> -			(match_operand 3 "aarch64_simd_shift_imm_di")))]
>> +			  "aarch64_simd_shift_imm_offset_<mode>")
>> +			(match_operand 3 "aarch64_simd_shift_imm_<mode>")))]
>>    ""
>>    {
>>      if (!IN_RANGE (INTVAL (operands[2]) + INTVAL (operands[3]),
>> -		   1, GET_MODE_BITSIZE (DImode) - 1))
>> +		   1, GET_MODE_BITSIZE (<MODE>mode) - 1))
>>       FAIL;
>>    }
>>  )
>> @@ -5418,6 +5418,21 @@
>>    [(set_attr "type" "bfx")]
>>  )
>>  
>> +;; Similar to the previous pattern, but 32->64 extraction
>> +(define_insn "*<optab>_3264"
>> +  [(set (match_operand:DI 0 "register_operand" "=r")
>> +	(ANY_EXTRACT:DI (match_operand:SI 1 "register_operand" "r")
> 
> So is that valid RTL (DI extract of SI value)?  Why wouldn't that just
> use a paradoxical subreg to widen the register being extracted?
[ ... ]
And for completeness, the patch also survived regression testing
aarch64be where it fixes the lsl_asr_sbfiz test.

I think the big question here is whether or not we consider this valid RTL.

jeff
Richard Earnshaw (lists) April 26, 2019, 5:06 p.m. UTC | #12
On 26/04/2019 17:08, Jeff Law wrote:
> On 4/26/19 8:52 AM, Richard Earnshaw (lists) wrote:
>> On 26/04/2019 15:13, Jeff Law wrote:
>>> On 4/16/19 10:29 AM, Steve Ellcey wrote:
>>>> Re-ping.  I know there are discussions about bigger changes to fix the
>>>> various failures listed in PR rtl-optimization/87763 but this patch
>>>> at least fixes one of them (gcc.target/aarch64/lsl_asr_sbfiz.c).
>>>>
>>>> Steve Ellcey
>>>> sellcey@marvell.com
>>> So here's my take on fixing the lsl_asr_sbfix.c regression.
>>>
>>> As I mentioned earlier the problem is the aarch64 is using the old way
>>> of describing bitfield extractions (extv, extzv).  Specifically it
>>> defined a single expander that operated on the natural word mode (DImode).
>>>
>>> This forces the input operand to be used in DImode as well.  So we get
>>> those annoying subregs in the expressions that combine generates.  But
>>> there's a better way :-)
>>>
>>> The new way to handle this stuff is to define expanders for supported
>>> modes (SI and DI for aarch64).  Interestingly enough the aarch64 port
>>> already does this for bitfield insertions via insv.
>>>
>>> So I made the obvious changes to the extv/extzv expander.  That fixes
>>> the lsl_asr_sbfiz regression.  But doesn't bootstrap.  The availability
>>> of the new expander makes extract_bit_field_using_extv want to extract a
>>> field from an HFmode object and shove it into an SImode.  That runs
>>> afoul of checks in validate_subreg (as it should).  We shouldn't be
>>> using subregs to change the size of a floating point object, so that
>>> needs to be filtered out in extract_bit_field_using_extv.
>>>
>>> That fixes the bootstrap issue.  But regression testing trips over
>>> ashtidisi.c.  That can be easily fixed by allowing zero/sign extractions
>>> which change size.  ie:
>>>
>>> (set (reg:DI) (sign_extract:DI (reg:SI) ...)))
>>>
>>> Which seems like a reasonable thing to handle.
>>>
>>> So here's what I've got.  I've bootstrapped and regression tested on
>>> aarch64.  It's also bootstrapped on aarch64_be for good measure.
>>>
>>> OK (from aarch64 maintainers) for the gcc-9 branch and trunk?  Or we
>>> could just address on the trunk for gcc-10.  I don't have strong
>>> preferences either way.
>>>
>>> Jeff
>>>
>>> ps.  Time to return insv regressions and address Segher's comments :-)
>>>
>>>
>>>
>>> P
>>>
>>> 	* aarch64.md (extv, extzv expander): Generalize so that it works
>>> 	for both SImode and DImode.
>>> 	(extv_3264, extzv_3264): New pattern to handle extractions with
>>> 	size change between the input and output operand.
>>> 	* expmed.c (extract_bitfield_using_extv): Do not allow changing
>>> 	sizes of floating point objects using SUBREGs.
>>>
>>> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
>>> index 5a1894063a1..13e2bca05a1 100644
>>> --- a/gcc/config/aarch64/aarch64.md
>>> +++ b/gcc/config/aarch64/aarch64.md
>>> @@ -5390,16 +5390,16 @@
>>>  ;; Bitfields
>>>  ;; -------------------------------------------------------------------
>>>  
>>> -(define_expand "<optab>"
>>> -  [(set (match_operand:DI 0 "register_operand" "=r")
>>> -	(ANY_EXTRACT:DI (match_operand:DI 1 "register_operand")
>>> +(define_expand "<optab><mode>"
>>> +  [(set (match_operand:GPI 0 "register_operand" "=r")
>>> +	(ANY_EXTRACT:GPI (match_operand:GPI 1 "register_operand")
>>>  			(match_operand 2
>>> -			  "aarch64_simd_shift_imm_offset_di")
>>> -			(match_operand 3 "aarch64_simd_shift_imm_di")))]
>>> +			  "aarch64_simd_shift_imm_offset_<mode>")
>>> +			(match_operand 3 "aarch64_simd_shift_imm_<mode>")))]
>>>    ""
>>>    {
>>>      if (!IN_RANGE (INTVAL (operands[2]) + INTVAL (operands[3]),
>>> -		   1, GET_MODE_BITSIZE (DImode) - 1))
>>> +		   1, GET_MODE_BITSIZE (<MODE>mode) - 1))
>>>       FAIL;
>>>    }
>>>  )
>>> @@ -5418,6 +5418,21 @@
>>>    [(set_attr "type" "bfx")]
>>>  )
>>>  
>>> +;; Similar to the previous pattern, but 32->64 extraction
>>> +(define_insn "*<optab>_3264"
>>> +  [(set (match_operand:DI 0 "register_operand" "=r")
>>> +	(ANY_EXTRACT:DI (match_operand:SI 1 "register_operand" "r")
>>
>> So is that valid RTL (DI extract of SI value)?  Why wouldn't that just
>> use a paradoxical subreg to widen the register being extracted?
> [ ... ]
> And for completeness, the patch also survived regression testing
> aarch64be where it fixes the lsl_asr_sbfiz test.
> 
> I think the big question here is whether or not we consider this valid RTL.
> 

The documentation for sign_extract says:

@item (sign_extract:@var{m} @var{loc} @var{size} @var{pos})

...

If @var{loc} is in memory, its mode must be a single-byte integer mode.
If @var{loc} is in a register, the mode to use is specified by the
operand of the @code{insv} or @code{extv} pattern
(@pxref{Standard Names}) and is usually a full-word integer mode,
which is the default if none is specified.

So it's a little unclear to me whether the mode of loc is ignored for
registers, or if it must match the mode of the extract.  It would be a
bit perverse if the mode on loc were completely ignored.  What would it
mean if the register was a floating-point or vector mode?

If we do allow a mode mismatch, should the extract be wider than the
contained register, or can it also be narrower (subject to the range
information being compatible with both the source and the destination)?

R.

> jeff
>
Segher Boessenkool April 26, 2019, 9:09 p.m. UTC | #13
On Fri, Apr 26, 2019 at 06:06:37PM +0100, Richard Earnshaw (lists) wrote:
> On 26/04/2019 17:08, Jeff Law wrote:
> >> So is that valid RTL (DI extract of SI value)?  Why wouldn't that just
> >> use a paradoxical subreg to widen the register being extracted?
> > [ ... ]
> > And for completeness, the patch also survived regression testing
> > aarch64be where it fixes the lsl_asr_sbfiz test.
> > 
> > I think the big question here is whether or not we consider this valid RTL.
> > 
> 
> The documentation for sign_extract says:
> 
> @item (sign_extract:@var{m} @var{loc} @var{size} @var{pos})
> 
> ...
> 
> If @var{loc} is in memory, its mode must be a single-byte integer mode.
> If @var{loc} is in a register, the mode to use is specified by the
> operand of the @code{insv} or @code{extv} pattern
> (@pxref{Standard Names}) and is usually a full-word integer mode,
> which is the default if none is specified.
> 
> So it's a little unclear to me whether the mode of loc is ignored for
> registers, or if it must match the mode of the extract.

It must use the mode the extv pattern says to use.   But you don't *have*
such a pattern here, you have extv<mode> instead.

It makes most sense if the mode for extv<mode> is the same both in and out
(it has only one mode in the pattern name, to start with), and for
sign_extract to be similar.  The docs aren't quite clear, but defining it
to have multiple modes doesn't really solve anything afaics, subregs work
just fine here?


Segher
Jeff Law April 26, 2019, 9:54 p.m. UTC | #14
On 4/26/19 3:09 PM, Segher Boessenkool wrote:
> On Fri, Apr 26, 2019 at 06:06:37PM +0100, Richard Earnshaw (lists) wrote:
>> On 26/04/2019 17:08, Jeff Law wrote:
>>>> So is that valid RTL (DI extract of SI value)?  Why wouldn't that just
>>>> use a paradoxical subreg to widen the register being extracted?
>>> [ ... ]
>>> And for completeness, the patch also survived regression testing
>>> aarch64be where it fixes the lsl_asr_sbfiz test.
>>>
>>> I think the big question here is whether or not we consider this valid RTL.
>>>
>>
>> The documentation for sign_extract says:
>>
>> @item (sign_extract:@var{m} @var{loc} @var{size} @var{pos})
>>
>> ...
>>
>> If @var{loc} is in memory, its mode must be a single-byte integer mode.
>> If @var{loc} is in a register, the mode to use is specified by the
>> operand of the @code{insv} or @code{extv} pattern
>> (@pxref{Standard Names}) and is usually a full-word integer mode,
>> which is the default if none is specified.
>>
>> So it's a little unclear to me whether the mode of loc is ignored for
>> registers, or if it must match the mode of the extract.
> 
> It must use the mode the extv pattern says to use.   But you don't *have*
> such a pattern here, you have extv<mode> instead.
Right.

> 
> It makes most sense if the mode for extv<mode> is the same both in and out
> (it has only one mode in the pattern name, to start with), and for
> sign_extract to be similar.  The docs aren't quite clear, but defining it
> to have multiple modes doesn't really solve anything afaics, subregs work
> just fine here?
You'd have to scatter them in the MD file.  That's generally frowned upon.

My argument is that we have very clear semantics here.  We're taking a
field from an object in a mode.  We zero or sign extract it to the mode
of the destination.   Two modes actually make reasonable sense here.

Think of it like zero_extend or sign_extend, but for a bitfield.

jeff
Richard Earnshaw (lists) April 26, 2019, 10:30 p.m. UTC | #15
On 26/04/2019 22:54, Jeff Law wrote:
> On 4/26/19 3:09 PM, Segher Boessenkool wrote:
>> On Fri, Apr 26, 2019 at 06:06:37PM +0100, Richard Earnshaw (lists) wrote:
>>> On 26/04/2019 17:08, Jeff Law wrote:
>>>>> So is that valid RTL (DI extract of SI value)?  Why wouldn't that just
>>>>> use a paradoxical subreg to widen the register being extracted?
>>>> [ ... ]
>>>> And for completeness, the patch also survived regression testing
>>>> aarch64be where it fixes the lsl_asr_sbfiz test.
>>>>
>>>> I think the big question here is whether or not we consider this valid RTL.
>>>>
>>>
>>> The documentation for sign_extract says:
>>>
>>> @item (sign_extract:@var{m} @var{loc} @var{size} @var{pos})
>>>
>>> ...
>>>
>>> If @var{loc} is in memory, its mode must be a single-byte integer mode.
>>> If @var{loc} is in a register, the mode to use is specified by the
>>> operand of the @code{insv} or @code{extv} pattern
>>> (@pxref{Standard Names}) and is usually a full-word integer mode,
>>> which is the default if none is specified.
>>>
>>> So it's a little unclear to me whether the mode of loc is ignored for
>>> registers, or if it must match the mode of the extract.
>>
>> It must use the mode the extv pattern says to use.   But you don't *have*
>> such a pattern here, you have extv<mode> instead.
> Right.
> 
>>
>> It makes most sense if the mode for extv<mode> is the same both in and out
>> (it has only one mode in the pattern name, to start with), and for
>> sign_extract to be similar.  The docs aren't quite clear, but defining it
>> to have multiple modes doesn't really solve anything afaics, subregs work
>> just fine here?
> You'd have to scatter them in the MD file.  That's generally frowned upon.

A subreg on a reg is fine (which is what we'd have in this specific
case).  It's when the subreg gets left on something else (other than a
mem) when the problems start.

> 
> My argument is that we have very clear semantics here.  We're taking a
> field from an object in a mode.  We zero or sign extract it to the mode
> of the destination.   Two modes actually make reasonable sense here.
> 
> Think of it like zero_extend or sign_extend, but for a bitfield.

This certainly makes sense, but the cases are more general since the
extraction might produce a result that is either wider or narrower than
the original reg being acted upon.

R.
Jeff Law April 26, 2019, 10:38 p.m. UTC | #16
On 4/26/19 4:30 PM, Richard Earnshaw (lists) wrote:
>>> It makes most sense if the mode for extv<mode> is the same both in and out
>>> (it has only one mode in the pattern name, to start with), and for
>>> sign_extract to be similar.  The docs aren't quite clear, but defining it
>>> to have multiple modes doesn't really solve anything afaics, subregs work
>>> just fine here?
>> You'd have to scatter them in the MD file.  That's generally frowned upon.
> 
> A subreg on a reg is fine (which is what we'd have in this specific
> case).  It's when the subreg gets left on something else (other than a
> mem) when the problems start.
I've generally found that we're better off avoiding (subreg (reg)) when
we reasonably can inside MD patterns.  But I won't object if you'd
prefer to go with a subreg style solution.

> 
>>
>> My argument is that we have very clear semantics here.  We're taking a
>> field from an object in a mode.  We zero or sign extract it to the mode
>> of the destination.   Two modes actually make reasonable sense here.
>>
>> Think of it like zero_extend or sign_extend, but for a bitfield.
> 
> This certainly makes sense, but the cases are more general since the
> extraction might produce a result that is either wider or narrower than
> the original reg being acted upon.
True, but aarch64 promotes just about everything to at least SImode.  So
supporting QI/HI destinations doesn't seem to make sense.  Supporting
different modes on the source could make sense, though I suspect the
vast majority of the benefit will be from SImode & DImode.

How do you want to proceed?   I'm just trying to move things along and
will happily step aside if you'd prefer to go with something like
Steve's approach.

Jeff
Segher Boessenkool April 26, 2019, 11:54 p.m. UTC | #17
On Fri, Apr 26, 2019 at 03:54:49PM -0600, Jeff Law wrote:
> > It makes most sense if the mode for extv<mode> is the same both in and out
> > (it has only one mode in the pattern name, to start with), and for
> > sign_extract to be similar.  The docs aren't quite clear, but defining it
> > to have multiple modes doesn't really solve anything afaics, subregs work
> > just fine here?
> You'd have to scatter them in the MD file.  That's generally frowned upon.

Why would you need that?  You just say  match_operand:M "register_operand"
and that will match a  subreg:M (reg:N)  just fine?

And on the outside...  Do you have insns that act on DImode regs but only
output to a SImode one?!

> My argument is that we have very clear semantics here.  We're taking a
> field from an object in a mode.  We zero or sign extract it to the mode
> of the destination.   Two modes actually make reasonable sense here.
> 
> Think of it like zero_extend or sign_extend, but for a bitfield.

The pattern with the standard name extv<mode> can only have one mode
obviously...  Do you want to define a new standard one (and handle it
everywhere you need to handle it)?  Or just have the aarch64 port define
its own non-standard pattern?  Or make extv<mode> have semantics that
are different per target, like extv?


Segher
Segher Boessenkool April 26, 2019, 11:59 p.m. UTC | #18
On Fri, Apr 26, 2019 at 11:30:47PM +0100, Richard Earnshaw (lists) wrote:
> A subreg on a reg is fine (which is what we'd have in this specific
> case).  It's when the subreg gets left on something else (other than a
> mem) when the problems start.

Yeah.  We typically push the subregs as far inward as we can.  I have a
patch for one case where we explicitly pull a subreg to the left instead,
and that destroys all of rs6000's insert patterns :-/


Segher
Jeff Law April 27, 2019, 2:37 p.m. UTC | #19
On 4/26/19 5:59 PM, Segher Boessenkool wrote:
> On Fri, Apr 26, 2019 at 11:30:47PM +0100, Richard Earnshaw (lists) wrote:
>> A subreg on a reg is fine (which is what we'd have in this specific
>> case).  It's when the subreg gets left on something else (other than a
>> mem) when the problems start.
> 
> Yeah.  We typically push the subregs as far inward as we can.  I have a
> patch for one case where we explicitly pull a subreg to the left instead,
> and that destroys all of rs6000's insert patterns :-/
Richard, if you go back to Steve's patch and its discussion from Jan
you'll see that the subregs are not just on the register operand, but
also on the sign_extract:

Quoting from Steve's post:

> 
> (set (reg:SI 93)
>     (ashift:SI (subreg:SI (sign_extract:DI (subreg:DI (reg:SI 95) 0)
>                 (const_int 3 [0x3])
>                 (const_int 0 [0])) 0)
>         (const_int 19 [0x13])))
> 
> 
> The subreg's were not there before.  My proposed fix is to add an new
> instruction like *ashiftsi_extv_bfiz but with the subregs.  This fixes
> lsl_asr_sbfiz.c.  Does this seem like the right way to fix this?
> 


To fix this without having the two mode pattern I've proposed or a
pattern which matches the subreg mess above we'd need combine or
simplify-rtx to simplify the above into something like this (From Andrew P):

> Seems to me rather this should have been simplified to just:
>  (set (reg:SI 93)
>      (ashift:SI (sign_extract:SI (reg:SI 95)
>                  (const_int 3 [0x3])
>                  (const_int 0 [0]))
>          (const_int 19 [0x13])))
> 
> Because the two subreg cancel each other out.
> This would be a thing to add to simplify-rtx.c.

Jeff
Segher Boessenkool April 27, 2019, 3:26 p.m. UTC | #20
On Sat, Apr 27, 2019 at 08:37:37AM -0600, Jeff Law wrote:
> To fix this without having the two mode pattern I've proposed or a
> pattern which matches the subreg mess above we'd need combine or
> simplify-rtx to simplify the above into something like this (From Andrew P):

Combine will do this just fine, if the backend didn't force DImode for
extv?  Can the extv pattern be made to work for GPI?


Segher
Jeff Law April 27, 2019, 3:39 p.m. UTC | #21
On 4/27/19 9:26 AM, Segher Boessenkool wrote:
> On Sat, Apr 27, 2019 at 08:37:37AM -0600, Jeff Law wrote:
>> To fix this without having the two mode pattern I've proposed or a
>> pattern which matches the subreg mess above we'd need combine or
>> simplify-rtx to simplify the above into something like this (From Andrew P):
> 
> Combine will do this just fine, if the backend didn't force DImode for
> extv?  Can the extv pattern be made to work for GPI?
It certainly doesn't.  Even if the extv expander is changed into an
extv<mode> for SI and DI.

jeff
Jeff Law April 27, 2019, 3:40 p.m. UTC | #22
On 4/27/19 9:26 AM, Segher Boessenkool wrote:
> On Sat, Apr 27, 2019 at 08:37:37AM -0600, Jeff Law wrote:
>> To fix this without having the two mode pattern I've proposed or a
>> pattern which matches the subreg mess above we'd need combine or
>> simplify-rtx to simplify the above into something like this (From Andrew P):
> 
> Combine will do this just fine, if the backend didn't force DImode for
> extv?  Can the extv pattern be made to work for GPI?
> 
> 
> Segher
> 
SOrry, misspoke.

If we create extv<mode> combine will do the right thing for the tests
mentioned in 87763, but we get a regression on a different test
(mentioned in my email) which requires the multi-mode pattern to resolve.

jeff
diff mbox series

Patch

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index e0df975..04dc06f 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5634,6 +5634,22 @@ 
   [(set_attr "type" "bfx")]
 )
 
+(define_insn "*ashiftsi_extv_bfiz_alt"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+	(ashift:SI
+	  (subreg:SI
+	    (sign_extract:DI
+	      (subreg:DI (match_operand:SI 1 "register_operand" "r") 0)
+	      (match_operand 2 "aarch64_simd_shift_imm_offset_si" "n")
+	      (const_int 0))
+	    0)
+	  (match_operand 3 "aarch64_simd_shift_imm_si" "n")))]
+  "IN_RANGE (INTVAL (operands[2]) + INTVAL (operands[3]),
+	     1, GET_MODE_BITSIZE (SImode) - 1)"
+  "sbfiz\\t%w0, %w1, %3, %2"
+  [(set_attr "type" "bfx")]
+)
+
 ;; When the bit position and width of the equivalent extraction add up to 32
 ;; we can use a W-reg LSL instruction taking advantage of the implicit
 ;; zero-extension of the X-reg.