diff mbox

combine: Convert subreg-of-lshiftrt to zero_extract properly (PR78390)

Message ID 9f8b8662c941ed5972a6f61b9da0fa7dd81409ff.1479910323.git.segher@kernel.crashing.org
State New
Headers show

Commit Message

Segher Boessenkool Nov. 23, 2016, 2:22 p.m. UTC
r242414, for PR77881, introduces some bugs (PR78390, PR78438, PR78477).
It all has the same root cause: that patch makes combine convert every
lowpart subreg of a logical shift right to a zero_extract.  This cannot
work at all if it is not a constant shift, and it has to be a bit more
careful exactly which bits it extracts.

Tested on powerpc64-linux {-m32,-m64} (where it fixes the regression
c-c++-common/torture/vector-compare-1.c fails at -O1, and where it also
has a bootstrap failure with some other patches).  Also tested that the
x86_64 compiler still generates the wanted code for the PR77881 testcase.


Segher


2016-11-23  Segher Boessenkool  <segher@kernel.crashing.org>

	PR target/77881
	PR bootstrap/78390
	PR target/78438
	PR bootstrap/78477
	* combine.c (make_compound_operation_int): Do not convert a subreg of
	a non-constant logical shift right to a zero_extract.  Handle the case
	where some zero bits have been shifted into the range covered by that
	subreg.

---
 gcc/combine.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

Comments

Michael Matz Nov. 23, 2016, 3:24 p.m. UTC | #1
Hi,

On Wed, 23 Nov 2016, Segher Boessenkool wrote:

> r242414, for PR77881, introduces some bugs (PR78390, PR78438, PR78477).
> It all has the same root cause: that patch makes combine convert every
> lowpart subreg of a logical shift right to a zero_extract.  This cannot
> work at all if it is not a constant shift,

Even with non-constant shifts it remains an extract and make_extraction 
does support variable start positions (which is the shift amount), as does 
the zero_extract pattern (depending on target of course).

>  	if (GET_CODE (inner) == LSHIFTRT
> +	    && CONST_INT_P (XEXP (inner, 1))
>  	    && GET_MODE_SIZE (mode) < GET_MODE_SIZE (GET_MODE (inner))
>  	    && subreg_lowpart_p (x))
>  	  {
>  	    new_rtx = make_compound_operation (XEXP (inner, 0), next_code);
> +	    int width = GET_MODE_PRECISION (GET_MODE (inner))

GET_MODE (new_rtx), because that's the object you're giving to 
make_extraction, not inner (and not XEXP(inner, 0)).

> +			- INTVAL (XEXP (inner, 1));
> +	    if (width > mode_width)
> +	      width = mode_width;
>  	    new_rtx = make_extraction (mode, new_rtx, 0, XEXP (inner, 1),
> -				       mode_width, 1, 0, in_code == COMPARE);
> +				       width, 1, 0, in_code == COMPARE);
>  	    break;
>  	  }


Ciao,
Michael.
Segher Boessenkool Nov. 23, 2016, 3:41 p.m. UTC | #2
On Wed, Nov 23, 2016 at 04:24:37PM +0100, Michael Matz wrote:
> > r242414, for PR77881, introduces some bugs (PR78390, PR78438, PR78477).
> > It all has the same root cause: that patch makes combine convert every
> > lowpart subreg of a logical shift right to a zero_extract.  This cannot
> > work at all if it is not a constant shift,
> 
> Even with non-constant shifts it remains an extract and make_extraction 
> does support variable start positions (which is the shift amount), as does 
> the zero_extract pattern (depending on target of course).

Sure, but the extraction length is non-constant as well then!  And not
just not-constant, with a conditional inside it, even.

> >  	if (GET_CODE (inner) == LSHIFTRT
> > +	    && CONST_INT_P (XEXP (inner, 1))
> >  	    && GET_MODE_SIZE (mode) < GET_MODE_SIZE (GET_MODE (inner))
> >  	    && subreg_lowpart_p (x))
> >  	  {
> >  	    new_rtx = make_compound_operation (XEXP (inner, 0), next_code);
> > +	    int width = GET_MODE_PRECISION (GET_MODE (inner))
> > +			- INTVAL (XEXP (inner, 1));
> 
> GET_MODE (new_rtx), because that's the object you're giving to 
> make_extraction, not inner (and not XEXP(inner, 0)).

This is about the *original* rtx, the lshiftrt: how many non-zero bits
does that leave.  The bits zeroed out by the lshiftrt have to be zeroed
out by the zero_extract as well, to keep the same semantics in the
resulting rtl.

> > +	    if (width > mode_width)
> > +	      width = mode_width;
> >  	    new_rtx = make_extraction (mode, new_rtx, 0, XEXP (inner, 1),
> > -				       mode_width, 1, 0, in_code == COMPARE);
> > +				       width, 1, 0, in_code == COMPARE);
> >  	    break;
> >  	  }


Segher
Michael Matz Nov. 23, 2016, 4:11 p.m. UTC | #3
Hi,

On Wed, 23 Nov 2016, Segher Boessenkool wrote:

> > Even with non-constant shifts it remains an extract and make_extraction 
> > does support variable start positions (which is the shift amount), as does 
> > the zero_extract pattern (depending on target of course).
> 
> Sure, but the extraction length is non-constant as well then!  And not
> just not-constant, with a conditional inside it, even.

Right, that would be the consequence.  I don't see much care for this 
problem anywhere, which probably hints that there aren't many targets 
accepting variant positions for zero_extract :)

> > >  	if (GET_CODE (inner) == LSHIFTRT
> > > +	    && CONST_INT_P (XEXP (inner, 1))
> > >  	    && GET_MODE_SIZE (mode) < GET_MODE_SIZE (GET_MODE (inner))
> > >  	    && subreg_lowpart_p (x))
> > >  	  {
> > >  	    new_rtx = make_compound_operation (XEXP (inner, 0), next_code);
> > > +	    int width = GET_MODE_PRECISION (GET_MODE (inner))
> > > +			- INTVAL (XEXP (inner, 1));
> > 
> > GET_MODE (new_rtx), because that's the object you're giving to 
> > make_extraction, not inner (and not XEXP(inner, 0)).
> 
> This is about the *original* rtx, the lshiftrt: how many non-zero bits
> does that leave.  The bits zeroed out by the lshiftrt have to be zeroed
> out by the zero_extract as well, to keep the same semantics in the
> resulting rtl.

Of course.  I saw it more as a requirements on the inputs of 
make_extraction.  In the end this is moot, as all three modes (of inner, 
of (inner,0) and of new_rtx) are the same :)


Ciao,
Michael.
Dominik Vogt Nov. 24, 2016, 12:32 p.m. UTC | #4
On Wed, Nov 23, 2016 at 02:22:07PM +0000, Segher Boessenkool wrote:
> r242414, for PR77881, introduces some bugs (PR78390, PR78438, PR78477).
> It all has the same root cause: that patch makes combine convert every
> lowpart subreg of a logical shift right to a zero_extract.  This cannot
> work at all if it is not a constant shift, and it has to be a bit more
> careful exactly which bits it extracts.
> 
> Tested on powerpc64-linux {-m32,-m64} (where it fixes the regression
> c-c++-common/torture/vector-compare-1.c fails at -O1, and where it also
> has a bootstrap failure with some other patches).  Also tested that the
> x86_64 compiler still generates the wanted code for the PR77881 testcase.

On s390 (31-Bit) we get two (easily fixable) test regression
supposedly because of the original path (+ this fix), and I don't
know what to do about it.  Perhaps these point to two situations,
where the change from lshiftrt to zero_extract sould not be done?

1) (subtests f29 and f31 in s390/risbg-ll-1.c)

An unpatched Gcc would generate this:

--
(insn 19 17 20 2 (set (reg:DI 2 %r2 [+-4 ])
        (lshiftrt:DI (reg:DI 66 [ v_shl ])
            (const_int 32 [0x20])))

(insn 20 19 21 2 (set (reg:SI 3 %r3)
        (subreg:SI (reg:DI 66 [ v_shl ]) 4))
--

A patched Gcc generates

--
(insn 19 17 20 2 (parallel [
            (set (reg:DI 2 %r2 [+-4 ])
                (zero_extract:DI (reg:DI 66 [ v_shl ])
                    (const_int 32 [0x20])
                    (const_int 0 [0])))
            (clobber (reg:CC 33 %cc))
--

which results in a more complicated "risbg" instruction instead of
a simpler shift right logical instruction "srlg".  (The former is
harder to read and got a CC-less variant only recently).

==> Should the change from lshiftrt to zero_extract be skipped if
it's just a 32 bit shift right operation in DImode, or should this
be "fixed" in the backend by adding a special case to the
"extzv<mode>" pattern?

--------

2) There's not committed test case for the second finding yet.  In
a future patch, this will be added:

--
i32 f44 (i64 v_x)
{
  /* { dg-final { scan-assembler "f44:\n\(\t.*\n\)*\tnilf\t" { target { ! lp64 } } } } */
  i64 v_shr4 = ((ui64)v_x) >> 12;
  i32 v_conv = (ui32)v_shr4;
  i32 v_and = v_conv & 10;
  return v_and;
}
--

Rtl before combine is:

--
(insn 4 3 5 2 (set (reg:DI 65 [ v_x ])
        (subreg:DI (reg:SI 67 [ v_x+4 ]) 0))

(insn 5 4 7 2 (parallel [
            (set (zero_extract:DI (reg:DI 65 [ v_x ])
                    (const_int 32 [0x20])
                    (const_int 0 [0]))
                (subreg:DI (reg:SI 66 [ v_x ]) 0))
            (clobber (reg:CC 33 %cc))

(insn 10 7 11 2 (set (reg:DI 69)
        (lshiftrt:DI (reg:DI 65 [ v_x ])
            (const_int 12 [0xc])))

(insn 11 10 16 2 (parallel [
            (set (reg:SI 68 [ v_and ])
                (and:SI (subreg:SI (reg:DI 69) 4)
                    (const_int 10 [0xa])))
            (clobber (reg:CC 33 %cc))
--

Before the patch, combine generated

--
Trying 5, 10 -> 11:
Failed to match this instruction:
(parallel [
...
Failed to match this instruction:
(set (reg:SI 68 [ v_and ])
    (and:SI (subreg:SI (lshiftrt:DI (reg:DI 65 [ v_x ])
                (const_int 12 [0xc])) 4)
        (const_int 10 [0xa])))
Successfully matched this instruction:
(set (reg:DI 69)
    (lshiftrt:DI (reg:DI 65 [ v_x ])  <---------
        (const_int 12 [0xc])))
Successfully matched this instruction:
(set (reg:SI 68 [ v_and ])
    (and:SI (subreg:SI (reg:DI 69) 4)
        (const_int 10 [0xa])))
allowing combination of insns 5, 10 and 11
...
--

With the patch it uses zero_extract instead of lshiftrt (which is
the whole point of the patch?):

--
Trying 5, 10 -> 11:
Failed to match this instruction:
...
Successfully matched this instruction:
(set (reg:DI 69)
    (zero_extract:DI (reg:DI 65 [ v_x ])   <---------
        (const_int 32 [0x20])
        (const_int 20 [0x14])))
Successfully matched this instruction:
(set (reg:SI 68 [ v_and ])
    (and:SI (subreg:SI (reg:DI 69) 4)
        (const_int 10 [0xa])))
allowing combination of insns 5, 10 and 11
--

Regardless of whether zeroes are shifted into the word or
zero_extract explicitly extends the extracted bits, none of the
bits set to zero are used at all because of the following "(and
... (const_int 10))".

In this specific case zero_extract has more (unnecessary) "side
effects" than lshiftrt, and s390 needs a more complicated
instruction to do the extraction (risbg) than it needs for the
shift (slrg).  However, on s390 both instructions have the same
size and cost, so it doesn't really matter.  On other targets a
shift may actually be cheaper than the zero_extract.

While this seems to be all right on s390, it may still indicate a
case that should be handled differently?


Ciao

Dominik ^_^  ^_^
Michael Matz Nov. 24, 2016, 2:01 p.m. UTC | #5
Hi,

On Thu, 24 Nov 2016, Dominik Vogt wrote:

> On s390 (31-Bit) we get two (easily fixable) test regression
> supposedly because of the original path (+ this fix), and I don't
> know what to do about it.  Perhaps these point to two situations,
> where the change from lshiftrt to zero_extract sould not be done?

The _extract forms are considered to be the canonical ones for backends 
that support them (I don't know why).  It's just that the canonicalization 
wasn't applied as often as it could and hence some backends aren't 
prepared to see it for also trivial cases.

> 1) (subtests f29 and f31 in s390/risbg-ll-1.c)
> 
> An unpatched Gcc would generate this:
> 
> --
> (insn 19 17 20 2 (set (reg:DI 2 %r2 [+-4 ])
>         (lshiftrt:DI (reg:DI 66 [ v_shl ])
>             (const_int 32 [0x20])))
> 
> (insn 20 19 21 2 (set (reg:SI 3 %r3)
>         (subreg:SI (reg:DI 66 [ v_shl ]) 4))
> --
> 
> A patched Gcc generates
> 
> --
> (insn 19 17 20 2 (parallel [
>             (set (reg:DI 2 %r2 [+-4 ])
>                 (zero_extract:DI (reg:DI 66 [ v_shl ])
>                     (const_int 32 [0x20])
>                     (const_int 0 [0])))
>             (clobber (reg:CC 33 %cc))

Well, one insn instead of two, seems sensible to me.

> ==> Should the change from lshiftrt to zero_extract be skipped if
> it's just a 32 bit shift right operation in DImode, or should this
> be "fixed" in the backend by adding a special case to the
> "extzv<mode>" pattern?

Special cases in extzv IMHO.

> 2) There's not committed test case for the second finding yet.  In
> 
> Rtl before combine is:
> 
> --
> (insn 4 3 5 2 (set (reg:DI 65 [ v_x ])
>         (subreg:DI (reg:SI 67 [ v_x+4 ]) 0))
> 
> (insn 5 4 7 2 (parallel [
>             (set (zero_extract:DI (reg:DI 65 [ v_x ])
>                     (const_int 32 [0x20])
>                     (const_int 0 [0]))
>                 (subreg:DI (reg:SI 66 [ v_x ]) 0))
>             (clobber (reg:CC 33 %cc))
> 
> (insn 10 7 11 2 (set (reg:DI 69)
>         (lshiftrt:DI (reg:DI 65 [ v_x ])
>             (const_int 12 [0xc])))
> 
> (insn 11 10 16 2 (parallel [
>             (set (reg:SI 68 [ v_and ])
>                 (and:SI (subreg:SI (reg:DI 69) 4)
>                     (const_int 10 [0xa])))
>             (clobber (reg:CC 33 %cc))
> --
> 
> Before the patch, combine generated
> 
> --
> Trying 5, 10 -> 11:
> Failed to match this instruction:
> (parallel [
> ...
> Failed to match this instruction:
> (set (reg:SI 68 [ v_and ])
>     (and:SI (subreg:SI (lshiftrt:DI (reg:DI 65 [ v_x ])
>                 (const_int 12 [0xc])) 4)
>         (const_int 10 [0xa])))
> Successfully matched this instruction:
> (set (reg:DI 69)
>     (lshiftrt:DI (reg:DI 65 [ v_x ])  <---------
>         (const_int 12 [0xc])))
> Successfully matched this instruction:
> (set (reg:SI 68 [ v_and ])
>     (and:SI (subreg:SI (reg:DI 69) 4)
>         (const_int 10 [0xa])))
> allowing combination of insns 5, 10 and 11
> ...
> --
> 
> With the patch it uses zero_extract instead of lshiftrt (which is
> the whole point of the patch?):
> 
> --
> Trying 5, 10 -> 11:
> Failed to match this instruction:
> ...
> Successfully matched this instruction:
> (set (reg:DI 69)
>     (zero_extract:DI (reg:DI 65 [ v_x ])   <---------
>         (const_int 32 [0x20])
>         (const_int 20 [0x14])))

Hmm, this transformation (from v_x>>12 to zext(v_x,32,20) is only valid 
when the top 20 bits of v_x are known to be zero already, or alternatively 
if it's know that the top 32bits of reg 69 won't matter in the future.  I 
wonder where that knowledge is coming from.

> Successfully matched this instruction:
> (set (reg:SI 68 [ v_and ])
>     (and:SI (subreg:SI (reg:DI 69) 4)
>         (const_int 10 [0xa])))
> allowing combination of insns 5, 10 and 11
> --
> 
> Regardless of whether zeroes are shifted into the word or
> zero_extract explicitly extends the extracted bits, none of the
> bits set to zero are used at all because of the following "(and
> ... (const_int 10))".
> 
> In this specific case zero_extract has more (unnecessary) "side
> effects" than lshiftrt, and s390 needs a more complicated
> instruction to do the extraction (risbg) than it needs for the
> shift (slrg).

The above zero-extract is equivalent to the shift under the above 
conditions so it could also be special cased in the extzv pattern again.  
As said I'm not sure where the knowledge of that condition comes from and 
if it's available to the backend.


Ciao,
Michael.
Dominik Vogt Nov. 24, 2016, 2:46 p.m. UTC | #6
On Thu, Nov 24, 2016 at 03:01:02PM +0100, Michael Matz wrote:
> Hi,
> 
> On Thu, 24 Nov 2016, Dominik Vogt wrote:
> 
> > On s390 (31-Bit) we get two (easily fixable) test regression
> > supposedly because of the original path (+ this fix), and I don't
> > know what to do about it.  Perhaps these point to two situations,
> > where the change from lshiftrt to zero_extract sould not be done?
> 
> The _extract forms are considered to be the canonical ones for backends 
> that support them (I don't know why).  It's just that the canonicalization 
> wasn't applied as often as it could and hence some backends aren't 
> prepared to see it for also trivial cases.
> 
> > 1) (subtests f29 and f31 in s390/risbg-ll-1.c)
> > 
> > An unpatched Gcc would generate this:
> > 
> > --
> > (insn 19 17 20 2 (set (reg:DI 2 %r2 [+-4 ])
> >         (lshiftrt:DI (reg:DI 66 [ v_shl ])
> >             (const_int 32 [0x20])))
> > 
> > (insn 20 19 21 2 (set (reg:SI 3 %r3)
> >         (subreg:SI (reg:DI 66 [ v_shl ]) 4))
> > --
> > 
> > A patched Gcc generates
> > 
> > --
> > (insn 19 17 20 2 (parallel [
> >             (set (reg:DI 2 %r2 [+-4 ])
> >                 (zero_extract:DI (reg:DI 66 [ v_shl ])
> >                     (const_int 32 [0x20])
> >                     (const_int 0 [0])))
> >             (clobber (reg:CC 33 %cc))
> 
> Well, one insn instead of two, seems sensible to me.

That was a copy-and-paste mistake.  Insn 20 does not go away with
the patched Gcc.

Ciao

Dominik ^_^  ^_^
Paolo Bonzini Nov. 25, 2016, 11:09 a.m. UTC | #7
On 24/11/2016 15:01, Michael Matz wrote:
>> > (set (reg:DI 69)
>> >     (zero_extract:DI (reg:DI 65 [ v_x ])   <---------
>> >         (const_int 32 [0x20])
>> >         (const_int 20 [0x14])))
> Hmm, this transformation (from v_x>>12 to zext(v_x,32,20) is only valid 
> when the top 20 bits of v_x are known to be zero already, or alternatively 
> if it's know that the top 32bits of reg 69 won't matter in the future.  I 
> wonder where that knowledge is coming from.

s390 is BITS_BIG_ENDIAN, so zext(v_x,32,20) really means the same as
zext(v_x,32,12) with the more common convention.

Insn 4 and 5 respectively set the low and high 32-bits of (reg:DI 65),
so the transform seems fine to me.

Paolo
Dominik Vogt Nov. 30, 2016, 1:12 p.m. UTC | #8
On Wed, Nov 23, 2016 at 02:22:07PM +0000, Segher Boessenkool wrote:
> r242414, for PR77881, introduces some bugs (PR78390, PR78438, PR78477).
> It all has the same root cause: that patch makes combine convert every
> lowpart subreg of a logical shift right to a zero_extract.  This cannot
> work at all if it is not a constant shift, and it has to be a bit more
> careful exactly which bits it extracts.
> 
> Tested on powerpc64-linux {-m32,-m64} (where it fixes the regression
> c-c++-common/torture/vector-compare-1.c fails at -O1, and where it also
> has a bootstrap failure with some other patches).  Also tested that the
> x86_64 compiler still generates the wanted code for the PR77881 testcase.

Is this a side effect of the patch series?

  Trying 7 -> 9:
  ...
  Failed to match this instruction:
  (set (reg:SI 68 [ v_or ])
      (ior:SI (subreg:SI (zero_extract:DI (reg:DI 2 %r2 [ v_x ])
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
                  (const_int 16 [0x10])
                  (const_int 0 [0])) 4)
                                    ^^^
          (reg:SI 70 [ v_and1 ])))

Shouldn't this be simply

  ...
  (ior:SI (zero_extract:SI (reg:DI) (16) (0)))
  ...

?

Ciao

Dominik ^_^  ^_^
Segher Boessenkool Nov. 30, 2016, 1:35 p.m. UTC | #9
On Wed, Nov 30, 2016 at 02:12:35PM +0100, Dominik Vogt wrote:
> On Wed, Nov 23, 2016 at 02:22:07PM +0000, Segher Boessenkool wrote:
> > r242414, for PR77881, introduces some bugs (PR78390, PR78438, PR78477).
> > It all has the same root cause: that patch makes combine convert every
> > lowpart subreg of a logical shift right to a zero_extract.  This cannot
> > work at all if it is not a constant shift, and it has to be a bit more
> > careful exactly which bits it extracts.
> > 
> > Tested on powerpc64-linux {-m32,-m64} (where it fixes the regression
> > c-c++-common/torture/vector-compare-1.c fails at -O1, and where it also
> > has a bootstrap failure with some other patches).  Also tested that the
> > x86_64 compiler still generates the wanted code for the PR77881 testcase.
> 
> Is this a side effect of the patch series?

I do not know; I cannot tell just from this, and there is no source
snippet to try.  Maybe Michael can tell?

>   Trying 7 -> 9:
>   ...
>   Failed to match this instruction:
>   (set (reg:SI 68 [ v_or ])
>       (ior:SI (subreg:SI (zero_extract:DI (reg:DI 2 %r2 [ v_x ])
>                ^^^^^^^^^^^^^^^^^^^^^^^^^^
>                   (const_int 16 [0x10])
>                   (const_int 0 [0])) 4)
>                                     ^^^
>           (reg:SI 70 [ v_and1 ])))
> 
> Shouldn't this be simply
> 
>   ...
>   (ior:SI (zero_extract:SI (reg:DI) (16) (0)))
>   ...

That seems nicer, sure.  OTOH that will never match on a target that
does not have zero_extract:SI from :DI.  *_extracts are nasty.


Segher
Michael Matz Nov. 30, 2016, 1:43 p.m. UTC | #10
Hi,

On Wed, 30 Nov 2016, Dominik Vogt wrote:

> On Wed, Nov 23, 2016 at 02:22:07PM +0000, Segher Boessenkool wrote:
> > r242414, for PR77881, introduces some bugs (PR78390, PR78438, PR78477).
> > It all has the same root cause: that patch makes combine convert every
> > lowpart subreg of a logical shift right to a zero_extract.  This cannot
> > work at all if it is not a constant shift, and it has to be a bit more
> > careful exactly which bits it extracts.
> > 
> > Tested on powerpc64-linux {-m32,-m64} (where it fixes the regression
> > c-c++-common/torture/vector-compare-1.c fails at -O1, and where it also
> > has a bootstrap failure with some other patches).  Also tested that the
> > x86_64 compiler still generates the wanted code for the PR77881 testcase.
> 
> Is this a side effect of the patch series?

What is "this"?

>   Trying 7 -> 9:
>   ...
>   Failed to match this instruction:
>   (set (reg:SI 68 [ v_or ])
>       (ior:SI (subreg:SI (zero_extract:DI (reg:DI 2 %r2 [ v_x ])
>                ^^^^^^^^^^^^^^^^^^^^^^^^^^
>                   (const_int 16 [0x10])
>                   (const_int 0 [0])) 4)
>                                     ^^^
>           (reg:SI 70 [ v_and1 ])))
> 
> Shouldn't this be simply
> 
>   ...
>   (ior:SI (zero_extract:SI (reg:DI) (16) (0)))
>   ...

I don't think mode-changing _extracts are valid in this context.  From the 
docu:

  `(sign_extract:M LOC SIZE POS)'
  ...
     The mode M is the same as the mode that would be used for LOC if
     it were a register.

Probably it could be made to work just fine, but I'm not sure it'd be 
worth much, as then the targets would need to care for mode-changes 
occuring not just through subregs as usual, but also through extracts.


Ciao,
Michael.
Segher Boessenkool Nov. 30, 2016, 2:16 p.m. UTC | #11
On Wed, Nov 30, 2016 at 02:43:12PM +0100, Michael Matz wrote:
> > Shouldn't this be simply
> > 
> >   ...
> >   (ior:SI (zero_extract:SI (reg:DI) (16) (0)))
> >   ...
> 
> I don't think mode-changing _extracts are valid in this context.  From the 
> docu:
> 
>   `(sign_extract:M LOC SIZE POS)'
>   ...
>      The mode M is the same as the mode that would be used for LOC if
>      it were a register.
> 
> Probably it could be made to work just fine, but I'm not sure it'd be 
> worth much, as then the targets would need to care for mode-changes 
> occuring not just through subregs as usual, but also through extracts.

The patch https://gcc.gnu.org/ml/gcc-patches/2016-11/msg02987.html I
submitted yesterday deals with this same issue, FWIW -- some ports
apparently already do mode-changing extracts.


Segher
Michael Matz Nov. 30, 2016, 2:40 p.m. UTC | #12
Hi,

On Wed, 30 Nov 2016, Segher Boessenkool wrote:

> > I don't think mode-changing _extracts are valid in this context.  From the 
> > docu:
> > 
> >   `(sign_extract:M LOC SIZE POS)'
> >   ...
> >      The mode M is the same as the mode that would be used for LOC if
> >      it were a register.
> > 
> > Probably it could be made to work just fine, but I'm not sure it'd be 
> > worth much, as then the targets would need to care for mode-changes 
> > occuring not just through subregs as usual, but also through extracts.
> 
> The patch https://gcc.gnu.org/ml/gcc-patches/2016-11/msg02987.html I
> submitted yesterday deals with this same issue, FWIW -- some ports
> apparently already do mode-changing extracts.

Yeah, saw that a bit later.  So, hmmm.  I'm not sure what to make of it, 
if the targets choose to use mode-changing extracts I guess that's fine, 
as they presumably will have written patterns that recognize them.  But I 
don't think we should willy-nilly generate such patterns as we can't know 
if the target deals with them or not.  We could of course always generate 
both variants: (subreg:M1 (extract:M2 (object:M2)) and
(extract:M1 (object:M2)) and see if either matches, but that seems a bit 
too much work.


Ciao,
Michael.
Dominik Vogt Nov. 30, 2016, 4:01 p.m. UTC | #13
On Wed, Nov 30, 2016 at 03:40:32PM +0100, Michael Matz wrote:
> Hi,
> 
> On Wed, 30 Nov 2016, Segher Boessenkool wrote:
> 
> > > I don't think mode-changing _extracts are valid in this context.  From the 
> > > docu:
> > > 
> > >   `(sign_extract:M LOC SIZE POS)'
> > >   ...
> > >      The mode M is the same as the mode that would be used for LOC if
> > >      it were a register.
> > > 
> > > Probably it could be made to work just fine, but I'm not sure it'd be 
> > > worth much, as then the targets would need to care for mode-changes 
> > > occuring not just through subregs as usual, but also through extracts.
> > 
> > The patch https://gcc.gnu.org/ml/gcc-patches/2016-11/msg02987.html I
> > submitted yesterday deals with this same issue, FWIW -- some ports
> > apparently already do mode-changing extracts.
> 
> Yeah, saw that a bit later.  So, hmmm.  I'm not sure what to make of it, 
> if the targets choose to use mode-changing extracts I guess that's fine, 
> as they presumably will have written patterns that recognize them.  But I 
> don't think we should willy-nilly generate such patterns as we can't know 
> if the target deals with them or not.

Just working on such a pattern for s390x, I had the impression
that combine uses the automatic mode change when it's there, and
otherwise it does things differently, that is, it tries different
combinations when it has the pattern than without.  There seems to
be at least some kind of detection.

> We could of course always generate 
> both variants: (subreg:M1 (extract:M2 (object:M2)) and
> (extract:M1 (object:M2)) and see if either matches, but that seems a bit 
> too much work.

--

The insns that combine tried are:

  (insn 7 4 8 2 (set (reg:DI 69)
        (lshiftrt:DI (reg/v:DI 66 [ v_x ])
            (const_int 48 [0x30])))

  (insn 9 8 10 2 (parallel [
            (set (reg:SI 68 [ v_or ])
                (ior:SI (reg:SI 70 [ v_and1 ])
                    (subreg:SI (reg:DI 69) 4)))
            (clobber (reg:CC 33 %cc))
        ])

A while ago combine handled the situation well, resulting in the
new "risbg" instruction, but for a while it's not been working.
It's a bit difficult to track that down to a specific commit
because of the broken "combine"-patch that took a while to fix.

Ciao

Dominik ^_^  ^_^
diff mbox

Patch

diff --git a/gcc/combine.c b/gcc/combine.c
index c025125..ca944c6 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -8143,12 +8143,17 @@  make_compound_operation_int (machine_mode mode, rtx *x_ptr,
 	/* If the SUBREG is masking of a logical right shift,
 	   make an extraction.  */
 	if (GET_CODE (inner) == LSHIFTRT
+	    && CONST_INT_P (XEXP (inner, 1))
 	    && GET_MODE_SIZE (mode) < GET_MODE_SIZE (GET_MODE (inner))
 	    && subreg_lowpart_p (x))
 	  {
 	    new_rtx = make_compound_operation (XEXP (inner, 0), next_code);
+	    int width = GET_MODE_PRECISION (GET_MODE (inner))
+			- INTVAL (XEXP (inner, 1));
+	    if (width > mode_width)
+	      width = mode_width;
 	    new_rtx = make_extraction (mode, new_rtx, 0, XEXP (inner, 1),
-				       mode_width, 1, 0, in_code == COMPARE);
+				       width, 1, 0, in_code == COMPARE);
 	    break;
 	  }