diff mbox series

simplify-rtx: Extend (truncate (*extract ...)) fold [PR87763]

Message ID mpteevkhmws.fsf@arm.com
State New
Headers show
Series simplify-rtx: Extend (truncate (*extract ...)) fold [PR87763] | expand

Commit Message

Richard Sandiford Jan. 27, 2020, 4:41 p.m. UTC
In the gcc.target/aarch64/lsl_asr_sbfiz.c part of this PR, we have:

Failed to match this instruction:
(set (reg:SI 95)
    (ashift:SI (subreg:SI (sign_extract:DI (subreg:DI (reg:SI 97) 0)
                (const_int 3 [0x3])
                (const_int 0 [0])) 0)
        (const_int 19 [0x13])))

If we perform the natural simplification to:

(set (reg:SI 95)
    (ashift:SI (sign_extract:SI (reg:SI 97)
                (const_int 3 [0x3])
                (const_int 0 [0])) 0)
        (const_int 19 [0x13])))

then the pattern matches.  And it turns out that we do have a
simplification like that already, but it would only kick in for
extractions from a reg, not a subreg.  E.g.:

(set (reg:SI 95)
    (ashift:SI (subreg:SI (sign_extract:DI (reg:DI X)
                (const_int 3 [0x3])
                (const_int 0 [0])) 0)
        (const_int 19 [0x13])))

would simplify to:

(set (reg:SI 95)
    (ashift:SI (sign_extract:SI (subreg:SI (reg:DI X) 0)
                (const_int 3 [0x3])
                (const_int 0 [0])) 0)
        (const_int 19 [0x13])))

IMO the subreg case is even more obviously a simplification
than the bare reg case, since the net effect is to remove
either one or two subregs, rather than simply change the
position of a subreg/truncation.

However, doing that regressed gcc.dg/tree-ssa/pr64910-2.c
for -m32 on x86_64-linux-gnu, because we could then simplify
a :HI zero_extract to a :QI one.  The associated *testqi_ext_3
pattern did already seem to want to handle QImode extractions:

  "ix86_match_ccmode (insn, CCNOmode)
   && ((TARGET_64BIT && GET_MODE (operands[2]) == DImode)
       || GET_MODE (operands[2]) == SImode
       || GET_MODE (operands[2]) == HImode
       || GET_MODE (operands[2]) == QImode)

but I'm not sure how often the QI case would trigger in practice,
since the zero_extract mode was restricted to HI and above.  I checked
the other x86 patterns and couldn't see any other instances of this.

Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu,
OK to install?

Richard


2020-01-27  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	PR rtl-optimization/87763
	* simplify-rtx.c (simplify_truncation): Extend sign/zero_extract
	simplification to handle subregs as well as bare regs.
	* config/i386/i386.md (*testqi_ext_3): Match QI extracts too.
---
 gcc/config/i386/i386.md | 2 +-
 gcc/simplify-rtx.c      | 4 +++-
 2 files changed, 4 insertions(+), 2 deletions(-)

Comments

Jeff Law Jan. 27, 2020, 5:38 p.m. UTC | #1
On Mon, 2020-01-27 at 16:41 +0000, Richard Sandiford wrote:
> In the gcc.target/aarch64/lsl_asr_sbfiz.c part of this PR, we have:
> 
> Failed to match this instruction:
> (set (reg:SI 95)
>     (ashift:SI (subreg:SI (sign_extract:DI (subreg:DI (reg:SI 97) 0)
>                 (const_int 3 [0x3])
>                 (const_int 0 [0])) 0)
>         (const_int 19 [0x13])))
> 
> If we perform the natural simplification to:
> 
> (set (reg:SI 95)
>     (ashift:SI (sign_extract:SI (reg:SI 97)
>                 (const_int 3 [0x3])
>                 (const_int 0 [0])) 0)
>         (const_int 19 [0x13])))
> 
> then the pattern matches.  And it turns out that we do have a
> simplification like that already, but it would only kick in for
> extractions from a reg, not a subreg.  E.g.:
Yea.  I ran into similar problems with the extract/extend bits in
combine.  And we know it's a fairly general problem that we don't
handle SUBREGs anywhere near as consistently as REGs.


> 
> (set (reg:SI 95)
>     (ashift:SI (subreg:SI (sign_extract:DI (reg:DI X)
>                 (const_int 3 [0x3])
>                 (const_int 0 [0])) 0)
>         (const_int 19 [0x13])))
> 
> would simplify to:
> 
> (set (reg:SI 95)
>     (ashift:SI (sign_extract:SI (subreg:SI (reg:DI X) 0)
>                 (const_int 3 [0x3])
>                 (const_int 0 [0])) 0)
>         (const_int 19 [0x13])))
> 
> IMO the subreg case is even more obviously a simplification
> than the bare reg case, since the net effect is to remove
> either one or two subregs, rather than simply change the
> position of a subreg/truncation.
> 
> However, doing that regressed gcc.dg/tree-ssa/pr64910-2.c
> for -m32 on x86_64-linux-gnu, because we could then simplify
> a :HI zero_extract to a :QI one.  The associated *testqi_ext_3
> pattern did already seem to want to handle QImode extractions:
> 
>   "ix86_match_ccmode (insn, CCNOmode)
>    && ((TARGET_64BIT && GET_MODE (operands[2]) == DImode)
>        || GET_MODE (operands[2]) == SImode
>        || GET_MODE (operands[2]) == HImode
>        || GET_MODE (operands[2]) == QImode)
> 
> but I'm not sure how often the QI case would trigger in practice,
> since the zero_extract mode was restricted to HI and above.  I checked
> the other x86 patterns and couldn't see any other instances of this.
> 
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu,
> OK to install?
> 
> Richard
> 
> 
> 2020-01-27  Richard Sandiford  <richard.sandiford@arm.com>
> 
> gcc/
> 	PR rtl-optimization/87763
> 	* simplify-rtx.c (simplify_truncation): Extend sign/zero_extract
> 	simplification to handle subregs as well as bare regs.
> 	* config/i386/i386.md (*testqi_ext_3): Match QI extracts too.
Do you need to check for and reject paradoxicals here?  If not, OK as-
is.  If you need to check, then that's pre-approved as well.

jeff
>
Richard Sandiford Jan. 28, 2020, 11:08 a.m. UTC | #2
Jeff Law <law@redhat.com> writes:
> On Mon, 2020-01-27 at 16:41 +0000, Richard Sandiford wrote:
>> In the gcc.target/aarch64/lsl_asr_sbfiz.c part of this PR, we have:
>> 
>> Failed to match this instruction:
>> (set (reg:SI 95)
>>     (ashift:SI (subreg:SI (sign_extract:DI (subreg:DI (reg:SI 97) 0)
>>                 (const_int 3 [0x3])
>>                 (const_int 0 [0])) 0)
>>         (const_int 19 [0x13])))
>> 
>> If we perform the natural simplification to:
>> 
>> (set (reg:SI 95)
>>     (ashift:SI (sign_extract:SI (reg:SI 97)
>>                 (const_int 3 [0x3])
>>                 (const_int 0 [0])) 0)
>>         (const_int 19 [0x13])))
>> 
>> then the pattern matches.  And it turns out that we do have a
>> simplification like that already, but it would only kick in for
>> extractions from a reg, not a subreg.  E.g.:
> Yea.  I ran into similar problems with the extract/extend bits in
> combine.  And we know it's a fairly general problem that we don't
> handle SUBREGs anywhere near as consistently as REGs.
>
>
>> 
>> (set (reg:SI 95)
>>     (ashift:SI (subreg:SI (sign_extract:DI (reg:DI X)
>>                 (const_int 3 [0x3])
>>                 (const_int 0 [0])) 0)
>>         (const_int 19 [0x13])))
>> 
>> would simplify to:
>> 
>> (set (reg:SI 95)
>>     (ashift:SI (sign_extract:SI (subreg:SI (reg:DI X) 0)
>>                 (const_int 3 [0x3])
>>                 (const_int 0 [0])) 0)
>>         (const_int 19 [0x13])))
>> 
>> IMO the subreg case is even more obviously a simplification
>> than the bare reg case, since the net effect is to remove
>> either one or two subregs, rather than simply change the
>> position of a subreg/truncation.
>> 
>> However, doing that regressed gcc.dg/tree-ssa/pr64910-2.c
>> for -m32 on x86_64-linux-gnu, because we could then simplify
>> a :HI zero_extract to a :QI one.  The associated *testqi_ext_3
>> pattern did already seem to want to handle QImode extractions:
>> 
>>   "ix86_match_ccmode (insn, CCNOmode)
>>    && ((TARGET_64BIT && GET_MODE (operands[2]) == DImode)
>>        || GET_MODE (operands[2]) == SImode
>>        || GET_MODE (operands[2]) == HImode
>>        || GET_MODE (operands[2]) == QImode)
>> 
>> but I'm not sure how often the QI case would trigger in practice,
>> since the zero_extract mode was restricted to HI and above.  I checked
>> the other x86 patterns and couldn't see any other instances of this.
>> 
>> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu,
>> OK to install?
>> 
>> Richard
>> 
>> 
>> 2020-01-27  Richard Sandiford  <richard.sandiford@arm.com>
>> 
>> gcc/
>> 	PR rtl-optimization/87763
>> 	* simplify-rtx.c (simplify_truncation): Extend sign/zero_extract
>> 	simplification to handle subregs as well as bare regs.
>> 	* config/i386/i386.md (*testqi_ext_3): Match QI extracts too.
> Do you need to check for and reject paradoxicals here?  If not, OK as-
> is.  If you need to check, then that's pre-approved as well.

Thanks, pushed.

On the paradoxical thing: the outer subreg is always non-paradoxical
for simplify_truncation, so we don't need to check that specifically.
For the inner subreg we need to handle the paradoxical case for the PR.

I wondered at one point about punting for non-paradoxical inner subregs,
but there didn't seem to be a good reason to keep an expression like
(truncate (op (truncate...))) over (op (truncate ...)).  If it turns
out there is a good reason though, changing SUBREG_P to
paradoxical_subreg_p would be fine in terms of this PR.

Richard
Andreas Schwab Jan. 29, 2020, 3:28 p.m. UTC | #3
On Jan 27 2020, Richard Sandiford wrote:

> 	* simplify-rtx.c (simplify_truncation): Extend sign/zero_extract
> 	simplification to handle subregs as well as bare regs.

That breaks gcc.target/m68k/pr39726.c

@@ -67,39 +67,39 @@ Disassembly of section .text:
 00000088 <a0wbs>:
   88:	302f 0006      	movew %sp@(6),%d0
   8c:	906f 000a      	subw %sp@(10),%d0
-  90:	4a00           	tstb %d0
-  92:	6a08           	bpls 9c <a0wbs+0x14>
-  94:	13fc 0001 0000 	moveb #1,0 <a0bs>
-  9a:	0000 
-			98: R_68K_32	v
-  9c:	4e75           	rts
+  90:	0240 0080      	andiw #128,%d0
+  94:	6708           	beqs 9e <a0wbs+0x16>
+  96:	13fc 0001 0000 	moveb #1,0 <a0bs>
+  9c:	0000 
+			9a: R_68K_32	v
+  9e:	4e75           	rts
 
-0000009e <a1wbs>:
-  9e:	302f 0006      	movew %sp@(6),%d0
-  a2:	d06f 000a      	addw %sp@(10),%d0
-  a6:	4a00           	tstb %d0
-  a8:	6a08           	bpls b2 <a1wbs+0x14>
-  aa:	13fc 0001 0000 	moveb #1,0 <a0bs>
-  b0:	0000 
-			ae: R_68K_32	v
-  b2:	4e75           	rts
+000000a0 <a1wbs>:
+  a0:	302f 0006      	movew %sp@(6),%d0
+  a4:	d06f 000a      	addw %sp@(10),%d0
+  a8:	0240 0080      	andiw #128,%d0
+  ac:	6708           	beqs b6 <a1wbs+0x16>
+  ae:	13fc 0001 0000 	moveb #1,0 <a0bs>
+  b4:	0000 
+			b2: R_68K_32	v
+  b6:	4e75           	rts
 
-000000b4 <a0w>:
-  b4:	302f 0006      	movew %sp@(6),%d0
-  b8:	906f 000a      	subw %sp@(10),%d0
-  bc:	0240 8421      	andiw #-31711,%d0
-  c0:	6708           	beqs ca <a0w+0x16>
-  c2:	13fc 0001 0000 	moveb #1,0 <a0bs>
-  c8:	0000 
-			c6: R_68K_32	v
-  ca:	4e75           	rts
+000000b8 <a0w>:
+  b8:	302f 0006      	movew %sp@(6),%d0
+  bc:	906f 000a      	subw %sp@(10),%d0
+  c0:	0240 8421      	andiw #-31711,%d0
+  c4:	6708           	beqs ce <a0w+0x16>
+  c6:	13fc 0001 0000 	moveb #1,0 <a0bs>
+  cc:	0000 
+			ca: R_68K_32	v
+  ce:	4e75           	rts
 
-000000cc <a1w>:
-  cc:	302f 0006      	movew %sp@(6),%d0
-  d0:	d06f 000a      	addw %sp@(10),%d0
-  d4:	0240 8421      	andiw #-31711,%d0
-  d8:	6708           	beqs e2 <a1w+0x16>
-  da:	13fc 0001 0000 	moveb #1,0 <a0bs>
-  e0:	0000 
-			de: R_68K_32	v
-  e2:	4e75           	rts
+000000d0 <a1w>:
+  d0:	302f 0006      	movew %sp@(6),%d0
+  d4:	d06f 000a      	addw %sp@(10),%d0
+  d8:	0240 8421      	andiw #-31711,%d0
+  dc:	6708           	beqs e6 <a1w+0x16>
+  de:	13fc 0001 0000 	moveb #1,0 <a0bs>
+  e4:	0000 
+			e2: R_68K_32	v
+  e6:	4e75           	rts

Andreas.
Richard Sandiford Jan. 29, 2020, 7:18 p.m. UTC | #4
Andreas Schwab <schwab@suse.de> writes:
> On Jan 27 2020, Richard Sandiford wrote:
>
>> 	* simplify-rtx.c (simplify_truncation): Extend sign/zero_extract
>> 	simplification to handle subregs as well as bare regs.
>
> That breaks gcc.target/m68k/pr39726.c

Gah.  Jeff pointed out off-list that it also broke
gcc.target/sh/pr64345-1.c on sh3-linux-gnu.  It didn't look like either
of them would be fixable in an acceptably non-invasive and unhacky way,
so I've reverted the patch "for now".

Thanks,
Richard
Jeff Law Jan. 30, 2020, 5:23 p.m. UTC | #5
On Wed, 2020-01-29 at 19:18 +0000, Richard Sandiford wrote:
> Andreas Schwab <schwab@suse.de> writes:
> > On Jan 27 2020, Richard Sandiford wrote:
> > 
> > > 	* simplify-rtx.c (simplify_truncation): Extend sign/zero_extract
> > > 	simplification to handle subregs as well as bare regs.
> > 
> > That breaks gcc.target/m68k/pr39726.c
> 
> Gah.  Jeff pointed out off-list that it also broke
> gcc.target/sh/pr64345-1.c on sh3-linux-gnu.  It didn't look like either
> of them would be fixable in an acceptably non-invasive and unhacky way,
> so I've reverted the patch "for now".
I would have considered letting those two targets regress those tests
to move forward on 87763.  aarch64 is (IMHO) more important than the sh
and m68k combined ;-)  It also seems to me that your patch generates
better RTL and that we could claim that a port that regresses on code
quality needs its port maintainer to step in and fix the port.

WRT the m68k issue I'd think it could be fixed by twiddling
cbranchsi4_btst_reg_insn_1 to accept a mode iterator on the
zero_extract and making some minor adjustments in its output code. 
Something like the attached.  I haven't tested it in any real way and
haven't really thought about whether or not it does the right thing for
a MEM operand.

I'd be surprised if the SH fix wasn't similar, but I know almost
nothing about the SH implementation.

Jeff
Jakub Jelinek Jan. 30, 2020, 5:27 p.m. UTC | #6
On Thu, Jan 30, 2020 at 10:23:35AM -0700, Jeff Law wrote:
> diff --git a/gcc/config/m68k/m68k.md b/gcc/config/m68k/m68k.md
> index 8e35357ea23..78c4cbe4753 100644
> --- a/gcc/config/m68k/m68k.md
> +++ b/gcc/config/m68k/m68k.md
> @@ -644,12 +644,12 @@
>    return m68k_output_branch_integer (code);
>  })
>  
> -(define_insn "cbranchsi4_btst_reg_insn_1"
> +(define_insn "cbranch<mode>4_btst_reg_insn_1"
>    [(set (pc)
>  	(if_then_else (match_operator 0 "equality_comparison_operator"
> -		       [(zero_extract:SI (match_operand:SI 1 "nonimmediate_operand" "do,dQ")
> -					 (const_int 1)
> -					 (match_operand:SI 2 "const_int_operand" "n,n"))
> +		       [(zero_extract:I (match_operand:I 1 "nonimmediate_operand" "do,dQ")
> +					(const_int 1)
> +					(match_operand:I 2 "const_int_operand" "n,n"))
>  			(const_int 0)])
>  		      (label_ref (match_operand 3 ""))
>  		      (pc)))]
> @@ -665,8 +665,9 @@
>      }
>    else
>      {
> -      operands[2] = GEN_INT (31 - INTVAL (operands[2]));
> -      code = m68k_output_btst (operands[2], operands[1], code, 31);
> +      operands[2] = GEN_INT (GET_MODE_BITSIZE (GET_MODE (operands[1]))
> +			     - INTVAL (operands[2]) - 1);
> +      code = m68k_output_btst (operands[2], operands[1], code, GET_MODE_BITSIZE (GET_MODE (operands[1])) - 1);

s/GET_MODE (operands[1])/<MODE>mode/g ?
Also, the last line is too long.

	Jakub
Jeff Law Jan. 30, 2020, 5:29 p.m. UTC | #7
On Thu, 2020-01-30 at 18:27 +0100, Jakub Jelinek wrote:
> On Thu, Jan 30, 2020 at 10:23:35AM -0700, Jeff Law wrote:
> > diff --git a/gcc/config/m68k/m68k.md b/gcc/config/m68k/m68k.md
> > index 8e35357ea23..78c4cbe4753 100644
> > --- a/gcc/config/m68k/m68k.md
> > +++ b/gcc/config/m68k/m68k.md
> > @@ -644,12 +644,12 @@
> >    return m68k_output_branch_integer (code);
> >  })
> >  
> > -(define_insn "cbranchsi4_btst_reg_insn_1"
> > +(define_insn "cbranch<mode>4_btst_reg_insn_1"
> >    [(set (pc)
> >  	(if_then_else (match_operator 0 "equality_comparison_operator"
> > -		       [(zero_extract:SI (match_operand:SI 1 "nonimmediate_operand" "do,dQ")
> > -					 (const_int 1)
> > -					 (match_operand:SI 2 "const_int_operand" "n,n"))
> > +		       [(zero_extract:I (match_operand:I 1 "nonimmediate_operand" "do,dQ")
> > +					(const_int 1)
> > +					(match_operand:I 2 "const_int_operand" "n,n"))
> >  			(const_int 0)])
> >  		      (label_ref (match_operand 3 ""))
> >  		      (pc)))]
> > @@ -665,8 +665,9 @@
> >      }
> >    else
> >      {
> > -      operands[2] = GEN_INT (31 - INTVAL (operands[2]));
> > -      code = m68k_output_btst (operands[2], operands[1], code, 31);
> > +      operands[2] = GEN_INT (GET_MODE_BITSIZE (GET_MODE (operands[1]))
> > +			     - INTVAL (operands[2]) - 1);
> > +      code = m68k_output_btst (operands[2], operands[1], code, GET_MODE_BITSIZE (GET_MODE (operands[1])) - 1);
> 
> s/GET_MODE (operands[1])/<MODE>mode/g ?
> Also, the last line is too long.
It's not meant for inclusion as-is, but to show how we might fix this
and allow us to move forward on 87763.
Jeff
Richard Sandiford Jan. 30, 2020, 5:54 p.m. UTC | #8
Jeff Law <law@redhat.com> writes:
> On Wed, 2020-01-29 at 19:18 +0000, Richard Sandiford wrote:
>> Andreas Schwab <schwab@suse.de> writes:
>> > On Jan 27 2020, Richard Sandiford wrote:
>> > 
>> > > 	* simplify-rtx.c (simplify_truncation): Extend sign/zero_extract
>> > > 	simplification to handle subregs as well as bare regs.
>> > 
>> > That breaks gcc.target/m68k/pr39726.c
>> 
>> Gah.  Jeff pointed out off-list that it also broke
>> gcc.target/sh/pr64345-1.c on sh3-linux-gnu.  It didn't look like either
>> of them would be fixable in an acceptably non-invasive and unhacky way,
>> so I've reverted the patch "for now".
> I would have considered letting those two targets regress those tests
> to move forward on 87763.  aarch64 is (IMHO) more important than the sh
> and m68k combined ;-)  It also seems to me that your patch generates
> better RTL and that we could claim that a port that regresses on code
> quality needs its port maintainer to step in and fix the port.
>
> WRT the m68k issue I'd think it could be fixed by twiddling
> cbranchsi4_btst_reg_insn_1 to accept a mode iterator on the
> zero_extract and making some minor adjustments in its output code. 
> Something like the attached.  I haven't tested it in any real way and
> haven't really thought about whether or not it does the right thing for
> a MEM operand.

It just feels like this is breaking the contract with extv and extzv,
where all *_extracts are supposed to be in the mode that those patterns
accept.  My i386 patch was doing that too TBH, I was just being
optimistic given that QImode was already handled by that pattern. :-)

So IMO my patch has too many knock-on effects for it to be suitable for
stage 4.  While we have make_extraction, we probably have to leave this
kind of expression untouched.

For AArch64 I was planning on adding a new pattern to match the
(subreg:SI (zero_extract:DI ...)) -- as one of the comments in the
PR suggested -- but with the subreg matched via subreg_lowpart_operator,
to make it endian-safe.  This is similar in spirit to the new i686
popcount patterns.

Thanks,
Richard

>
> I'd be surprised if the SH fix wasn't similar, but I know almost
> nothing about the SH implementation.
>
> Jeff
Jeff Law Jan. 31, 2020, 5:09 p.m. UTC | #9
On Thu, 2020-01-30 at 17:54 +0000, Richard Sandiford wrote:
> Jeff Law <law@redhat.com> writes:
> > On Wed, 2020-01-29 at 19:18 +0000, Richard Sandiford wrote:
> > > Andreas Schwab <schwab@suse.de> writes:
> > > > On Jan 27 2020, Richard Sandiford wrote:
> > > > 
> > > > > 	* simplify-rtx.c (simplify_truncation): Extend sign/zero_extract
> > > > > 	simplification to handle subregs as well as bare regs.
> > > > 
> > > > That breaks gcc.target/m68k/pr39726.c
> > > 
> > > Gah.  Jeff pointed out off-list that it also broke
> > > gcc.target/sh/pr64345-1.c on sh3-linux-gnu.  It didn't look like either
> > > of them would be fixable in an acceptably non-invasive and unhacky way,
> > > so I've reverted the patch "for now".
> > I would have considered letting those two targets regress those tests
> > to move forward on 87763.  aarch64 is (IMHO) more important than the sh
> > and m68k combined ;-)  It also seems to me that your patch generates
> > better RTL and that we could claim that a port that regresses on code
> > quality needs its port maintainer to step in and fix the port.
> > 
> > WRT the m68k issue I'd think it could be fixed by twiddling
> > cbranchsi4_btst_reg_insn_1 to accept a mode iterator on the
> > zero_extract and making some minor adjustments in its output code. 
> > Something like the attached.  I haven't tested it in any real way and
> > haven't really thought about whether or not it does the right thing for
> > a MEM operand.
> 
> It just feels like this is breaking the contract with extv and extzv,
> where all *_extracts are supposed to be in the mode that those patterns
> accept.  My i386 patch was doing that too TBH, I was just being
> optimistic given that QImode was already handled by that pattern. :-)
> 
> So IMO my patch has too many knock-on effects for it to be suitable for
> stage 4.  While we have make_extraction, we probably have to leave this
> kind of expression untouched.
> 
> For AArch64 I was planning on adding a new pattern to match the
> (subreg:SI (zero_extract:DI ...)) -- as one of the comments in the
> PR suggested -- but with the subreg matched via subreg_lowpart_operator,
> to make it endian-safe.  This is similar in spirit to the new i686
> popcount patterns.
Fair enough on the simplify-rtx change :-)  One might argue that we
should be loosening the requirements, but memory operands are
particularly troublesome.  I think HP and I had a light discussion on
that a year or so ago.

WRT adding patterns to aarch64.  Yea, I went that route last year, but
never was able to go from "hey, this seems to work pretty well" to
"it's OK for the trunk".

Jeff
Maxim Kuvyrkov Feb. 3, 2020, 9:29 a.m. UTC | #10
Hi Richard,

You patch below increases code-size on aarch64-linux-gnu with -Os on SPEC2k6 400.perlbench and 453.povray -- by 1% and 2% respectively.

400.perlbench,perlbench_base.default,	101,939261,951221
453.povray,povray_base.default,		102,707807,721399

Would you please check whether these can be avoided?  [Let me know if you need help reproducing this problem.]

Thank you,

--
Maxim Kuvyrkov
https://www.linaro.org

> On Jan 27, 2020, at 7:41 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> In the gcc.target/aarch64/lsl_asr_sbfiz.c part of this PR, we have:
> 
> Failed to match this instruction:
> (set (reg:SI 95)
>    (ashift:SI (subreg:SI (sign_extract:DI (subreg:DI (reg:SI 97) 0)
>                (const_int 3 [0x3])
>                (const_int 0 [0])) 0)
>        (const_int 19 [0x13])))
> 
> If we perform the natural simplification to:
> 
> (set (reg:SI 95)
>    (ashift:SI (sign_extract:SI (reg:SI 97)
>                (const_int 3 [0x3])
>                (const_int 0 [0])) 0)
>        (const_int 19 [0x13])))
> 
> then the pattern matches.  And it turns out that we do have a
> simplification like that already, but it would only kick in for
> extractions from a reg, not a subreg.  E.g.:
> 
> (set (reg:SI 95)
>    (ashift:SI (subreg:SI (sign_extract:DI (reg:DI X)
>                (const_int 3 [0x3])
>                (const_int 0 [0])) 0)
>        (const_int 19 [0x13])))
> 
> would simplify to:
> 
> (set (reg:SI 95)
>    (ashift:SI (sign_extract:SI (subreg:SI (reg:DI X) 0)
>                (const_int 3 [0x3])
>                (const_int 0 [0])) 0)
>        (const_int 19 [0x13])))
> 
> IMO the subreg case is even more obviously a simplification
> than the bare reg case, since the net effect is to remove
> either one or two subregs, rather than simply change the
> position of a subreg/truncation.
> 
> However, doing that regressed gcc.dg/tree-ssa/pr64910-2.c
> for -m32 on x86_64-linux-gnu, because we could then simplify
> a :HI zero_extract to a :QI one.  The associated *testqi_ext_3
> pattern did already seem to want to handle QImode extractions:
> 
>  "ix86_match_ccmode (insn, CCNOmode)
>   && ((TARGET_64BIT && GET_MODE (operands[2]) == DImode)
>       || GET_MODE (operands[2]) == SImode
>       || GET_MODE (operands[2]) == HImode
>       || GET_MODE (operands[2]) == QImode)
> 
> but I'm not sure how often the QI case would trigger in practice,
> since the zero_extract mode was restricted to HI and above.  I checked
> the other x86 patterns and couldn't see any other instances of this.
> 
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu,
> OK to install?
> 
> Richard
> 
> 
> 2020-01-27  Richard Sandiford  <richard.sandiford@arm.com>
> 
> gcc/
> 	PR rtl-optimization/87763
> 	* simplify-rtx.c (simplify_truncation): Extend sign/zero_extract
> 	simplification to handle subregs as well as bare regs.
> 	* config/i386/i386.md (*testqi_ext_3): Match QI extracts too.
> ---
> gcc/config/i386/i386.md | 2 +-
> gcc/simplify-rtx.c      | 4 +++-
> 2 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 6e9c9bd2fb6..a125ab350bb 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -8927,7 +8927,7 @@ (define_insn "*testqi_ext_2"
> (define_insn_and_split "*testqi_ext_3"
>   [(set (match_operand 0 "flags_reg_operand")
>         (match_operator 1 "compare_operator"
> -	  [(zero_extract:SWI248
> +	  [(zero_extract:SWI
> 	     (match_operand 2 "nonimmediate_operand" "rm")
> 	     (match_operand 3 "const_int_operand" "n")
> 	     (match_operand 4 "const_int_operand" "n"))
> diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
> index eff1d07a253..db4f9339c15 100644
> --- a/gcc/simplify-rtx.c
> +++ b/gcc/simplify-rtx.c
> @@ -736,7 +736,9 @@ simplify_truncation (machine_mode mode, rtx op,
>      (*_extract:M1 (truncate:M1 (reg:M2)) (len) (pos')) if possible without
>      changing len.  */
>   if ((GET_CODE (op) == ZERO_EXTRACT || GET_CODE (op) == SIGN_EXTRACT)
> -      && REG_P (XEXP (op, 0))
> +      && (REG_P (XEXP (op, 0))
> +	  || (SUBREG_P (XEXP (op, 0))
> +	      && REG_P (SUBREG_REG (XEXP (op, 0)))))
>       && GET_MODE (XEXP (op, 0)) == GET_MODE (op)
>       && CONST_INT_P (XEXP (op, 1))
>       && CONST_INT_P (XEXP (op, 2)))
> -- 
> 2.17.1
>
Richard Sandiford Feb. 3, 2020, 1:15 p.m. UTC | #11
Maxim Kuvyrkov <maxim.kuvyrkov@gmail.com> writes:
> Hi Richard,
>
> You patch below increases code-size on aarch64-linux-gnu with -Os on SPEC2k6 400.perlbench and 453.povray -- by 1% and 2% respectively.
>
> 400.perlbench,perlbench_base.default,	101,939261,951221
> 453.povray,povray_base.default,		102,707807,721399
>
> Would you please check whether these can be avoided?  [Let me know if you need help reproducing this problem.]

I reverted the patch on Wednesday, see:
https://gcc.gnu.org/ml/gcc-patches/2020-01/msg01962.html

Thanks,
Richard

>
> Thank you,
>
> --
> Maxim Kuvyrkov
> https://www.linaro.org
>
>> On Jan 27, 2020, at 7:41 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>> 
>> In the gcc.target/aarch64/lsl_asr_sbfiz.c part of this PR, we have:
>> 
>> Failed to match this instruction:
>> (set (reg:SI 95)
>>    (ashift:SI (subreg:SI (sign_extract:DI (subreg:DI (reg:SI 97) 0)
>>                (const_int 3 [0x3])
>>                (const_int 0 [0])) 0)
>>        (const_int 19 [0x13])))
>> 
>> If we perform the natural simplification to:
>> 
>> (set (reg:SI 95)
>>    (ashift:SI (sign_extract:SI (reg:SI 97)
>>                (const_int 3 [0x3])
>>                (const_int 0 [0])) 0)
>>        (const_int 19 [0x13])))
>> 
>> then the pattern matches.  And it turns out that we do have a
>> simplification like that already, but it would only kick in for
>> extractions from a reg, not a subreg.  E.g.:
>> 
>> (set (reg:SI 95)
>>    (ashift:SI (subreg:SI (sign_extract:DI (reg:DI X)
>>                (const_int 3 [0x3])
>>                (const_int 0 [0])) 0)
>>        (const_int 19 [0x13])))
>> 
>> would simplify to:
>> 
>> (set (reg:SI 95)
>>    (ashift:SI (sign_extract:SI (subreg:SI (reg:DI X) 0)
>>                (const_int 3 [0x3])
>>                (const_int 0 [0])) 0)
>>        (const_int 19 [0x13])))
>> 
>> IMO the subreg case is even more obviously a simplification
>> than the bare reg case, since the net effect is to remove
>> either one or two subregs, rather than simply change the
>> position of a subreg/truncation.
>> 
>> However, doing that regressed gcc.dg/tree-ssa/pr64910-2.c
>> for -m32 on x86_64-linux-gnu, because we could then simplify
>> a :HI zero_extract to a :QI one.  The associated *testqi_ext_3
>> pattern did already seem to want to handle QImode extractions:
>> 
>>  "ix86_match_ccmode (insn, CCNOmode)
>>   && ((TARGET_64BIT && GET_MODE (operands[2]) == DImode)
>>       || GET_MODE (operands[2]) == SImode
>>       || GET_MODE (operands[2]) == HImode
>>       || GET_MODE (operands[2]) == QImode)
>> 
>> but I'm not sure how often the QI case would trigger in practice,
>> since the zero_extract mode was restricted to HI and above.  I checked
>> the other x86 patterns and couldn't see any other instances of this.
>> 
>> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu,
>> OK to install?
>> 
>> Richard
>> 
>> 
>> 2020-01-27  Richard Sandiford  <richard.sandiford@arm.com>
>> 
>> gcc/
>> 	PR rtl-optimization/87763
>> 	* simplify-rtx.c (simplify_truncation): Extend sign/zero_extract
>> 	simplification to handle subregs as well as bare regs.
>> 	* config/i386/i386.md (*testqi_ext_3): Match QI extracts too.
>> ---
>> gcc/config/i386/i386.md | 2 +-
>> gcc/simplify-rtx.c      | 4 +++-
>> 2 files changed, 4 insertions(+), 2 deletions(-)
>> 
>> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
>> index 6e9c9bd2fb6..a125ab350bb 100644
>> --- a/gcc/config/i386/i386.md
>> +++ b/gcc/config/i386/i386.md
>> @@ -8927,7 +8927,7 @@ (define_insn "*testqi_ext_2"
>> (define_insn_and_split "*testqi_ext_3"
>>   [(set (match_operand 0 "flags_reg_operand")
>>         (match_operator 1 "compare_operator"
>> -	  [(zero_extract:SWI248
>> +	  [(zero_extract:SWI
>> 	     (match_operand 2 "nonimmediate_operand" "rm")
>> 	     (match_operand 3 "const_int_operand" "n")
>> 	     (match_operand 4 "const_int_operand" "n"))
>> diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
>> index eff1d07a253..db4f9339c15 100644
>> --- a/gcc/simplify-rtx.c
>> +++ b/gcc/simplify-rtx.c
>> @@ -736,7 +736,9 @@ simplify_truncation (machine_mode mode, rtx op,
>>      (*_extract:M1 (truncate:M1 (reg:M2)) (len) (pos')) if possible without
>>      changing len.  */
>>   if ((GET_CODE (op) == ZERO_EXTRACT || GET_CODE (op) == SIGN_EXTRACT)
>> -      && REG_P (XEXP (op, 0))
>> +      && (REG_P (XEXP (op, 0))
>> +	  || (SUBREG_P (XEXP (op, 0))
>> +	      && REG_P (SUBREG_REG (XEXP (op, 0)))))
>>       && GET_MODE (XEXP (op, 0)) == GET_MODE (op)
>>       && CONST_INT_P (XEXP (op, 1))
>>       && CONST_INT_P (XEXP (op, 2)))
>> -- 
>> 2.17.1
>>
diff mbox series

Patch

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 6e9c9bd2fb6..a125ab350bb 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -8927,7 +8927,7 @@  (define_insn "*testqi_ext_2"
 (define_insn_and_split "*testqi_ext_3"
   [(set (match_operand 0 "flags_reg_operand")
         (match_operator 1 "compare_operator"
-	  [(zero_extract:SWI248
+	  [(zero_extract:SWI
 	     (match_operand 2 "nonimmediate_operand" "rm")
 	     (match_operand 3 "const_int_operand" "n")
 	     (match_operand 4 "const_int_operand" "n"))
diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index eff1d07a253..db4f9339c15 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -736,7 +736,9 @@  simplify_truncation (machine_mode mode, rtx op,
      (*_extract:M1 (truncate:M1 (reg:M2)) (len) (pos')) if possible without
      changing len.  */
   if ((GET_CODE (op) == ZERO_EXTRACT || GET_CODE (op) == SIGN_EXTRACT)
-      && REG_P (XEXP (op, 0))
+      && (REG_P (XEXP (op, 0))
+	  || (SUBREG_P (XEXP (op, 0))
+	      && REG_P (SUBREG_REG (XEXP (op, 0)))))
       && GET_MODE (XEXP (op, 0)) == GET_MODE (op)
       && CONST_INT_P (XEXP (op, 1))
       && CONST_INT_P (XEXP (op, 2)))