diff mbox series

dse: Handle SUBREGs of word REGs differently for WORD_REGISTER_OPERATIONS targets [PR109040]

Message ID ZC08hc8fUczEywig@tucnak
State New
Headers show
Series dse: Handle SUBREGs of word REGs differently for WORD_REGISTER_OPERATIONS targets [PR109040] | expand

Commit Message

Jakub Jelinek April 5, 2023, 9:16 a.m. UTC
Hi!

The following testcase is miscompiled on riscv since the addition
of *mvconst_internal define_insn_and_split.
I believe the bug is in DSE.  We have:
(insn 36 35 39 2 (set (mem/c:SI (plus:SI (reg/f:SI 65 frame)
                (const_int -64 [0xffffffffffffffc0])) [2  S4 A128])
        (reg:SI 166)) "pr109040.c":9:11 178 {*movsi_internal}
     (expr_list:REG_DEAD (reg:SI 166)
        (nil)))
(insn 39 36 40 2 (set (reg:SI 171)
        (zero_extend:SI (mem/c:HI (plus:SI (reg/f:SI 65 frame)
                    (const_int -64 [0xffffffffffffffc0])) [0  S2 A128]))) "pr109040.c":9:11 111 {*zero_extendhisi2}
     (nil))
and RTL DSE's replace_read since r0-86337-g18b526e806ab6455 handles
even different modes like in the above case, and so it optimizes it into:
(insn 47 35 39 2 (set (reg:HI 175)
        (subreg:HI (reg:SI 166) 0)) "pr109040.c":9:11 179 {*movhi_internal}
     (expr_list:REG_DEAD (reg:SI 166)
        (nil)))
(insn 39 47 40 2 (set (reg:SI 171)
        (zero_extend:SI (reg:HI 175))) "pr109040.c":9:11 111 {*zero_extendhisi2}
     (expr_list:REG_DEAD (reg:HI 175)
        (nil)))
Pseudo 166 is result of AND with 0x8084c constant (forced into a register).
Combine attempts to combine the AND with the insn 47 above created by DSE,
and turns it because of WORD_REGISTER_OPERATIONS and its assumption that all
the subword operations are actually done on word mode into:
(set (subreg:SI (reg:HI 175) 0)
    (and:SI (reg:SI 167 [ m ])
        (reg:SI 168)))
and later on the ZERO_EXTEND is thrown away.

The following patch changes DSE to instead emit
(insn 47 35 39 2 (set (reg:SI 175)
        (reg:SI 166)) "pr109040.c":10:11 180 {*movsi_internal}
     (expr_list:REG_DEAD (reg:SI 166)
        (nil)))
(insn 39 47 40 2 (set (reg:SI 171)
        (zero_extend:SI (subreg:HI (reg:SI 175) 0))) "pr109040.c":10:11 111 {*zero_extendhisi2}
     (expr_list:REG_DEAD (reg:SI 175)
        (nil)))
i.e. in the new insn copy whole reg rather than subword part of it where
we don't really know anything about the upper bits.
With this change, combine manages to do the right thing, optimies the
(unsigned) (unsigned short) (reg & 0x8084c) into
reg & 0x84c.

Bootstrapped/regtested on x86_64-linux and i686-linux (admittedly not
WORD_REGISTER_OPERATIONS targets) and tested using a cross to riscv
on the testcase.  I have unfortunately no way to bootstrap this on
risc*-linux, could somebody do that?  Ok for trunk if that passes/

2023-04-05  Jakub Jelinek  <jakub@redhat.com>

	PR target/109040
	* dse.cc (replace_read): If read_reg is a SUBREG of a word mode
	REG, for WORD_REGISTER_OPERATIONS copy SUBREG_REG of it into
	a new REG rather than the SUBREG.

	* gcc.c-torture/execute/pr109040.c: New test.


	Jakub

Comments

Jeff Law April 5, 2023, 1:14 p.m. UTC | #1
On 4/5/23 03:16, Jakub Jelinek wrote:
> Hi!
> 
> The following testcase is miscompiled on riscv since the addition
> of *mvconst_internal define_insn_and_split.
> I believe the bug is in DSE.  We have:
> (insn 36 35 39 2 (set (mem/c:SI (plus:SI (reg/f:SI 65 frame)
>                  (const_int -64 [0xffffffffffffffc0])) [2  S4 A128])
>          (reg:SI 166)) "pr109040.c":9:11 178 {*movsi_internal}
>       (expr_list:REG_DEAD (reg:SI 166)
>          (nil)))
> (insn 39 36 40 2 (set (reg:SI 171)
>          (zero_extend:SI (mem/c:HI (plus:SI (reg/f:SI 65 frame)
>                      (const_int -64 [0xffffffffffffffc0])) [0  S2 A128]))) "pr109040.c":9:11 111 {*zero_extendhisi2}
>       (nil))
> and RTL DSE's replace_read since r0-86337-g18b526e806ab6455 handles
> even different modes like in the above case, and so it optimizes it into:
> (insn 47 35 39 2 (set (reg:HI 175)
>          (subreg:HI (reg:SI 166) 0)) "pr109040.c":9:11 179 {*movhi_internal}
>       (expr_list:REG_DEAD (reg:SI 166)
>          (nil)))
> (insn 39 47 40 2 (set (reg:SI 171)
>          (zero_extend:SI (reg:HI 175))) "pr109040.c":9:11 111 {*zero_extendhisi2}
>       (expr_list:REG_DEAD (reg:HI 175)
>          (nil)))
> Pseudo 166 is result of AND with 0x8084c constant (forced into a register).
Right.  But do we agree that the two above are equivalent?  If they are 
then changing DSE just papers over the combine issue downstream.





> Combine attempts to combine the AND with the insn 47 above created by DSE,
> and turns it because of WORD_REGISTER_OPERATIONS and its assumption that all
> the subword operations are actually done on word mode into:
> (set (subreg:SI (reg:HI 175) 0)
>      (and:SI (reg:SI 167 [ m ])
>          (reg:SI 168)))
> and later on the ZERO_EXTEND is thrown away.
And isn't that where the bug really is.  There's a patch from pan2.li in 
this exact space.  That patch still doesn't look right to me, but it 
seems to be attacking this problem closer to the right place.


> 
> The following patch changes DSE to instead emit
> (insn 47 35 39 2 (set (reg:SI 175)
>          (reg:SI 166)) "pr109040.c":10:11 180 {*movsi_internal}
>       (expr_list:REG_DEAD (reg:SI 166)
>          (nil)))
> (insn 39 47 40 2 (set (reg:SI 171)
>          (zero_extend:SI (subreg:HI (reg:SI 175) 0))) "pr109040.c":10:11 111 {*zero_extendhisi2}
>       (expr_list:REG_DEAD (reg:SI 175)
>          (nil)))
> i.e. in the new insn copy whole reg rather than subword part of it where
> we don't really know anything about the upper bits.
> With this change, combine manages to do the right thing, optimies the
> (unsigned) (unsigned short) (reg & 0x8084c) into
> reg & 0x84c.
This may still be desirable but it feels like papering over the problem 
to me.


> 
> Bootstrapped/regtested on x86_64-linux and i686-linux (admittedly not
> WORD_REGISTER_OPERATIONS targets) and tested using a cross to riscv
> on the testcase.  I have unfortunately no way to bootstrap this on
> risc*-linux, could somebody do that?  Ok for trunk if that passes/
> 
> 2023-04-05  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR target/109040
> 	* dse.cc (replace_read): If read_reg is a SUBREG of a word mode
> 	REG, for WORD_REGISTER_OPERATIONS copy SUBREG_REG of it into
> 	a new REG rather than the SUBREG.
> 
> 	* gcc.c-torture/execute/pr109040.c: New test.
No problem with the patch itself.  I just want to make sure we're fixing 
the problem in the right place.

jeff
Jakub Jelinek April 5, 2023, 2:51 p.m. UTC | #2
On Wed, Apr 05, 2023 at 07:14:23AM -0600, Jeff Law wrote:
> > The following testcase is miscompiled on riscv since the addition
> > of *mvconst_internal define_insn_and_split.
> > I believe the bug is in DSE.  We have:
> > (insn 36 35 39 2 (set (mem/c:SI (plus:SI (reg/f:SI 65 frame)
> >                  (const_int -64 [0xffffffffffffffc0])) [2  S4 A128])
> >          (reg:SI 166)) "pr109040.c":9:11 178 {*movsi_internal}
> >       (expr_list:REG_DEAD (reg:SI 166)
> >          (nil)))
> > (insn 39 36 40 2 (set (reg:SI 171)
> >          (zero_extend:SI (mem/c:HI (plus:SI (reg/f:SI 65 frame)
> >                      (const_int -64 [0xffffffffffffffc0])) [0  S2 A128]))) "pr109040.c":9:11 111 {*zero_extendhisi2}
> >       (nil))
> > and RTL DSE's replace_read since r0-86337-g18b526e806ab6455 handles
> > even different modes like in the above case, and so it optimizes it into:
> > (insn 47 35 39 2 (set (reg:HI 175)
> >          (subreg:HI (reg:SI 166) 0)) "pr109040.c":9:11 179 {*movhi_internal}
> >       (expr_list:REG_DEAD (reg:SI 166)
> >          (nil)))
> > (insn 39 47 40 2 (set (reg:SI 171)
> >          (zero_extend:SI (reg:HI 175))) "pr109040.c":9:11 111 {*zero_extendhisi2}
> >       (expr_list:REG_DEAD (reg:HI 175)
> >          (nil)))
> > Pseudo 166 is result of AND with 0x8084c constant (forced into a register).
> Right.  But do we agree that the two above are equivalent?  If they are then
> changing DSE just papers over the combine issue downstream.

It is true that an instruction like
(insn 8 7 9 2 (set (reg:HI 141)
        (subreg:HI (reg:SI 142) 0)) "aauu.c":6:18 181 {*movhi_internal}
     (nil))
can appear in the IL on WORD_REGISTER_OPERATIONS target, but I think the
upper bits shouldn't be random garbage in that case, it should be zero
extended or sign extended.

What happens in combine is we enter combine.cc (simplify_set) with
(set (reg:HI 175)
    (subreg:HI (and:SI (reg:SI 167 [ m ])
            (reg:SI 168)) 0))
and there trigger the
  /* If we have (set x (subreg:m1 (op:m2 ...) 0)) with OP being some operation,
     and X being a REG or (subreg (reg)), we may be able to convert this to
     (set (subreg:m2 x) (op)).

     We can always do this if M1 is narrower than M2 because that means that
     we only care about the low bits of the result.

     However, on machines without WORD_REGISTER_OPERATIONS defined, we cannot
     perform a narrower operation than requested since the high-order bits will
     be undefined.  On machine where it is defined, this transformation is safe
     as long as M1 and M2 have the same number of words.  */
transformation into:
(set (subreg:SI (reg:HI 175) 0)
    (and:SI (reg:SI 167 [ m ])
        (reg:SI 168)))
Though, it is !paradoxical_subreg_p (src) in that case, so it is done
regardless of WORD_REGISTER_OPERATIONS I think.

Then after that try_combine we do:
13325		record_value_for_reg (dest, record_dead_insn,
13326				      WORD_REGISTER_OPERATIONS
13327				      && word_register_operation_p (SET_SRC (setter))
13328				      && paradoxical_subreg_p (SET_DEST (setter))
13329				      ? SET_SRC (setter)
13330				      : gen_lowpart (GET_MODE (dest),
13331						     SET_SRC (setter)));
and the 3 conditions are true here and so record value of the whole setter.
That then records among other things nonzero_bits as 0x8084c.

Next when trying to combine
(insn 39 47 40 2 (set (reg:SI 171)
        (zero_extend:SI (reg:HI 175))) "pr109040.c":10:11 111 {*zero_extendhisi2}
     (expr_list:REG_DEAD (reg:HI 175)
        (nil)))
into
(insn 40 39 43 2 (set (reg:SI 172)
        (leu:SI (reg:SI 171)
            (const_int 5 [0x5]))) "pr109040.c":10:11 291 {*sleu_sisi}
     (expr_list:REG_DEAD (reg:SI 171)
        (nil)))
we i2src = subst (i2src, pc_rtx, pc_rtx, 0, 0, 0);
and that correctly simplifies it into
(and:SI (subreg:SI (reg:HI 175) 0)
    (const_int 2124 [0x84c]))
We substitute that
(leu:SI (and:SI (subreg:SI (reg:HI 175) 0)
        (const_int 2124 [0x84c]))
    (const_int 5 [0x5]))
but then trigger the WORD_REGISTER_OPERATIONS block in simplify_comparison:
          /* If this is (and:M1 (subreg:M1 X:M2 0) (const_int C1)) where C1
             fits in both M1 and M2 and the SUBREG is either paradoxical
             or represents the low part, permute the SUBREG and the AND
             and try again.  */
          if (GET_CODE (XEXP (op0, 0)) == SUBREG
              && CONST_INT_P (XEXP (op0, 1)))
            {
              unsigned HOST_WIDE_INT c1 = INTVAL (XEXP (op0, 1));
              /* Require an integral mode, to avoid creating something like
                 (AND:SF ...).  */
              if ((is_a <scalar_int_mode>
                   (GET_MODE (SUBREG_REG (XEXP (op0, 0))), &tmode))
                  /* It is unsafe to commute the AND into the SUBREG if the
                     SUBREG is paradoxical and WORD_REGISTER_OPERATIONS is
                     not defined.  As originally written the upper bits
                     have a defined value due to the AND operation.
                     However, if we commute the AND inside the SUBREG then
                     they no longer have defined values and the meaning of
                     the code has been changed.
                     Also C1 should not change value in the smaller mode,
                     see PR67028 (a positive C1 can become negative in the
                     smaller mode, so that the AND does no longer mask the
                     upper bits).  */
                  && ((WORD_REGISTER_OPERATIONS
                       && mode_width > GET_MODE_PRECISION (tmode)
                       && mode_width <= BITS_PER_WORD
                       && trunc_int_for_mode (c1, tmode) == (HOST_WIDE_INT) c1)
                      || (mode_width <= GET_MODE_PRECISION (tmode)
                          && subreg_lowpart_p (XEXP (op0, 0))))
                  && mode_width <= HOST_BITS_PER_WIDE_INT
                  && HWI_COMPUTABLE_MODE_P (tmode)
                  && (c1 & ~mask) == 0
                  && (c1 & ~GET_MODE_MASK (tmode)) == 0
                  && c1 != mask
                  && c1 != GET_MODE_MASK (tmode))
                {
                  op0 = simplify_gen_binary (AND, tmode,
                                             SUBREG_REG (XEXP (op0, 0)),
                                             gen_int_mode (c1, tmode));
                  op0 = gen_lowpart (mode, op0);
                  continue;
                }
            }
c1 is 0x84c.  I believe this is the exact spot where things go wrong,
and is because for WORD_REGISTER_OPERATIONS we assume something that the
DSE added instruction didn't guarantee.

	Jakub
Jeff Law April 5, 2023, 4:17 p.m. UTC | #3
On 4/5/23 08:51, Jakub Jelinek wrote:
> On Wed, Apr 05, 2023 at 07:14:23AM -0600, Jeff Law wrote:
>>> The following testcase is miscompiled on riscv since the addition
>>> of *mvconst_internal define_insn_and_split.
>>> I believe the bug is in DSE.  We have:
>>> (insn 36 35 39 2 (set (mem/c:SI (plus:SI (reg/f:SI 65 frame)
>>>                   (const_int -64 [0xffffffffffffffc0])) [2  S4 A128])
>>>           (reg:SI 166)) "pr109040.c":9:11 178 {*movsi_internal}
>>>        (expr_list:REG_DEAD (reg:SI 166)
>>>           (nil)))
>>> (insn 39 36 40 2 (set (reg:SI 171)
>>>           (zero_extend:SI (mem/c:HI (plus:SI (reg/f:SI 65 frame)
>>>                       (const_int -64 [0xffffffffffffffc0])) [0  S2 A128]))) "pr109040.c":9:11 111 {*zero_extendhisi2}
>>>        (nil))
>>> and RTL DSE's replace_read since r0-86337-g18b526e806ab6455 handles
>>> even different modes like in the above case, and so it optimizes it into:
>>> (insn 47 35 39 2 (set (reg:HI 175)
>>>           (subreg:HI (reg:SI 166) 0)) "pr109040.c":9:11 179 {*movhi_internal}
>>>        (expr_list:REG_DEAD (reg:SI 166)
>>>           (nil)))
>>> (insn 39 47 40 2 (set (reg:SI 171)
>>>           (zero_extend:SI (reg:HI 175))) "pr109040.c":9:11 111 {*zero_extendhisi2}
>>>        (expr_list:REG_DEAD (reg:HI 175)
>>>           (nil)))
>>> Pseudo 166 is result of AND with 0x8084c constant (forced into a register).
>> Right.  But do we agree that the two above are equivalent?  If they are then
>> changing DSE just papers over the combine issue downstream.
> 
> It is true that an instruction like
> (insn 8 7 9 2 (set (reg:HI 141)
>          (subreg:HI (reg:SI 142) 0)) "aauu.c":6:18 181 {*movhi_internal}
>       (nil))
> can appear in the IL on WORD_REGISTER_OPERATIONS target, but I think the
> upper bits shouldn't be random garbage in that case, it should be zero
> extended or sign extended.
Well, that's one of the core questions here.  What are the state of the 
upper 16 bits of (reg:HI 141)?  The WORD_REGISTER_OPERATIONS docs aren't 
100% clear as we're not really doing any operation.

So again, I think we need to decide if the DSE transformation is correct 
or not.  I *think* we can aggree that insn 39 is OK.  It's really the 
semantics of insn 47 that I think we need to agree on.  What is the 
state of the upper 16 bits of (reg:HI 175) after insn 47?




> 
> What happens in combine is we enter combine.cc (simplify_set) with
> (set (reg:HI 175)
>      (subreg:HI (and:SI (reg:SI 167 [ m ])
>              (reg:SI 168)) 0))
> and there trigger the
>    /* If we have (set x (subreg:m1 (op:m2 ...) 0)) with OP being some operation,
>       and X being a REG or (subreg (reg)), we may be able to convert this to
>       (set (subreg:m2 x) (op)).
> 
>       We can always do this if M1 is narrower than M2 because that means that
>       we only care about the low bits of the result.
> 
>       However, on machines without WORD_REGISTER_OPERATIONS defined, we cannot
>       perform a narrower operation than requested since the high-order bits will
>       be undefined.  On machine where it is defined, this transformation is safe
>       as long as M1 and M2 have the same number of words.  */
> transformation into:
> (set (subreg:SI (reg:HI 175) 0)
>      (and:SI (reg:SI 167 [ m ])
>          (reg:SI 168)))
I think we're OK a this point.


> Though, it is !paradoxical_subreg_p (src) in that case, so it is done
> regardless of WORD_REGISTER_OPERATIONS I think.
> 
> Then after that try_combine we do:
> 13325		record_value_for_reg (dest, record_dead_insn,
> 13326				      WORD_REGISTER_OPERATIONS
> 13327				      && word_register_operation_p (SET_SRC (setter))
> 13328				      && paradoxical_subreg_p (SET_DEST (setter))
> 13329				      ? SET_SRC (setter)
> 13330				      : gen_lowpart (GET_MODE (dest),
> 13331						     SET_SRC (setter)));
> and the 3 conditions are true here and so record value of the whole setter.
> That then records among other things nonzero_bits as 0x8084c.
> 
> Next when trying to combine
> (insn 39 47 40 2 (set (reg:SI 171)
>          (zero_extend:SI (reg:HI 175))) "pr109040.c":10:11 111 {*zero_extendhisi2}
>       (expr_list:REG_DEAD (reg:HI 175)
>          (nil)))
> into
> (insn 40 39 43 2 (set (reg:SI 172)
>          (leu:SI (reg:SI 171)
>              (const_int 5 [0x5]))) "pr109040.c":10:11 291 {*sleu_sisi}
>       (expr_list:REG_DEAD (reg:SI 171)
>          (nil)))
> we i2src = subst (i2src, pc_rtx, pc_rtx, 0, 0, 0);
> and that correctly simplifies it into
> (and:SI (subreg:SI (reg:HI 175) 0)
>      (const_int 2124 [0x84c]))
Right.  Still seems sane at this point.


> We substitute that
> (leu:SI (and:SI (subreg:SI (reg:HI 175) 0)
>          (const_int 2124 [0x84c]))
>      (const_int 5 [0x5]))
> but then trigger the WORD_REGISTER_OPERATIONS block in simplify_comparison:
>            /* If this is (and:M1 (subreg:M1 X:M2 0) (const_int C1)) where C1
>               fits in both M1 and M2 and the SUBREG is either paradoxical
>               or represents the low part, permute the SUBREG and the AND
>               and try again.  */
>            if (GET_CODE (XEXP (op0, 0)) == SUBREG
>                && CONST_INT_P (XEXP (op0, 1)))
>              {
>                unsigned HOST_WIDE_INT c1 = INTVAL (XEXP (op0, 1));
>                /* Require an integral mode, to avoid creating something like
>                   (AND:SF ...).  */
>                if ((is_a <scalar_int_mode>
>                     (GET_MODE (SUBREG_REG (XEXP (op0, 0))), &tmode))
>                    /* It is unsafe to commute the AND into the SUBREG if the
>                       SUBREG is paradoxical and WORD_REGISTER_OPERATIONS is
>                       not defined.  As originally written the upper bits
>                       have a defined value due to the AND operation.
>                       However, if we commute the AND inside the SUBREG then
>                       they no longer have defined values and the meaning of
>                       the code has been changed.
>                       Also C1 should not change value in the smaller mode,
>                       see PR67028 (a positive C1 can become negative in the
>                       smaller mode, so that the AND does no longer mask the
>                       upper bits).  */
>                    && ((WORD_REGISTER_OPERATIONS
>                         && mode_width > GET_MODE_PRECISION (tmode)
>                         && mode_width <= BITS_PER_WORD
>                         && trunc_int_for_mode (c1, tmode) == (HOST_WIDE_INT) c1)
>                        || (mode_width <= GET_MODE_PRECISION (tmode)
>                            && subreg_lowpart_p (XEXP (op0, 0))))
>                    && mode_width <= HOST_BITS_PER_WIDE_INT
>                    && HWI_COMPUTABLE_MODE_P (tmode)
>                    && (c1 & ~mask) == 0
>                    && (c1 & ~GET_MODE_MASK (tmode)) == 0
>                    && c1 != mask
>                    && c1 != GET_MODE_MASK (tmode))
>                  {
>                    op0 = simplify_gen_binary (AND, tmode,
>                                               SUBREG_REG (XEXP (op0, 0)),
>                                               gen_int_mode (c1, tmode));
>                    op0 = gen_lowpart (mode, op0);
>                    continue;
>                  }
>              }
> c1 is 0x84c.  I believe this is the exact spot where things go wrong,
> and is because for WORD_REGISTER_OPERATIONS we assume something that the
> DSE added instruction didn't guarantee.
pan2.li zero'd in on the same block of code.

The leap I'm strugging with is the assertion that this combine code 
assumes something that the DSE added instruction does not guarantee. 
That's why I asked if we agreed that the before/after from DSE was 
correct or not.  I think that's still the outstanding question.

Jeff
Jakub Jelinek April 5, 2023, 4:48 p.m. UTC | #4
On Wed, Apr 05, 2023 at 10:17:59AM -0600, Jeff Law wrote:
> > It is true that an instruction like
> > (insn 8 7 9 2 (set (reg:HI 141)
> >          (subreg:HI (reg:SI 142) 0)) "aauu.c":6:18 181 {*movhi_internal}
> >       (nil))
> > can appear in the IL on WORD_REGISTER_OPERATIONS target, but I think the
> > upper bits shouldn't be random garbage in that case, it should be zero
> > extended or sign extended.
> Well, that's one of the core questions here.  What are the state of the
> upper 16 bits of (reg:HI 141)?  The WORD_REGISTER_OPERATIONS docs aren't
> 100% clear as we're not really doing any operation.
> 
> So again, I think we need to decide if the DSE transformation is correct or
> not.  I *think* we can aggree that insn 39 is OK.  It's really the semantics
> of insn 47 that I think we need to agree on.  What is the state of the upper
> 16 bits of (reg:HI 175) after insn 47?

I'm afraid I don't know the answers here, I think Eric is
WORD_REGISTER_OPERATIONS expert here I think these days (most of the major
targets are !WORD_REGISTER_OPERATIONS).
Intuitively, WORD_REGISTER_OPERATIONS from the description would be
that
(insn 47 35 39 2 (set (reg:HI 175)                                                                                                                                                    
        (subreg:HI (reg:SI 166) 0)) "pr109040.c":9:11 179 {*movhi_internal}                                                                                                           
     (expr_list:REG_DEAD (reg:SI 166)                                                                                                                                                 
        (nil)))                                                                                                                                                                       
would copy not just the low 16-bits of pseudo 166 to 175, but also the upper
16-bits.  But if that is so, then something is broken in the code below.
Some archeology shows that we were doing that on all arches initially
and then
Wed May 13 17:38:35 1998  Richard Kenner  <kenner@vlsi1.ultra.nyu.edu>

	* combine.c (simplify_comparison, case AND): Don't commute AND
	with SUBREG if constant is whole mode and don't do if lowpart
	and not WORD_REGISTER_OPERATIONS.
restricted that to WORD_REGISTER_OPERATIONS only.
The whole optimization is then likely
Wed Mar 18 05:54:25 1998  Richard Kenner  <kenner@vlsi1.ultra.nyu.edu>

	* combine.c (gen_binary): Don't make AND that does nothing.
	(simplify_comparison, case AND): Commute AND and SUBREG.
	* i386.h (CONST_CONSTS, case CONST_INT): One-byte integers are cost 0.
but the code has been tweaked further many times since then.

> > We substitute that
> > (leu:SI (and:SI (subreg:SI (reg:HI 175) 0)
> >          (const_int 2124 [0x84c]))
> >      (const_int 5 [0x5]))
> > but then trigger the WORD_REGISTER_OPERATIONS block in simplify_comparison:
> >            /* If this is (and:M1 (subreg:M1 X:M2 0) (const_int C1)) where C1
> >               fits in both M1 and M2 and the SUBREG is either paradoxical
> >               or represents the low part, permute the SUBREG and the AND
> >               and try again.  */
> >            if (GET_CODE (XEXP (op0, 0)) == SUBREG
> >                && CONST_INT_P (XEXP (op0, 1)))
> >              {
> >                unsigned HOST_WIDE_INT c1 = INTVAL (XEXP (op0, 1));
> >                /* Require an integral mode, to avoid creating something like
> >                   (AND:SF ...).  */
> >                if ((is_a <scalar_int_mode>
> >                     (GET_MODE (SUBREG_REG (XEXP (op0, 0))), &tmode))
> >                    /* It is unsafe to commute the AND into the SUBREG if the
> >                       SUBREG is paradoxical and WORD_REGISTER_OPERATIONS is
> >                       not defined.  As originally written the upper bits
> >                       have a defined value due to the AND operation.
> >                       However, if we commute the AND inside the SUBREG then
> >                       they no longer have defined values and the meaning of
> >                       the code has been changed.
> >                       Also C1 should not change value in the smaller mode,
> >                       see PR67028 (a positive C1 can become negative in the
> >                       smaller mode, so that the AND does no longer mask the
> >                       upper bits).  */
> >                    && ((WORD_REGISTER_OPERATIONS
> >                         && mode_width > GET_MODE_PRECISION (tmode)
> >                         && mode_width <= BITS_PER_WORD
> >                         && trunc_int_for_mode (c1, tmode) == (HOST_WIDE_INT) c1)
> >                        || (mode_width <= GET_MODE_PRECISION (tmode)
> >                            && subreg_lowpart_p (XEXP (op0, 0))))
> >                    && mode_width <= HOST_BITS_PER_WIDE_INT
> >                    && HWI_COMPUTABLE_MODE_P (tmode)
> >                    && (c1 & ~mask) == 0
> >                    && (c1 & ~GET_MODE_MASK (tmode)) == 0
> >                    && c1 != mask
> >                    && c1 != GET_MODE_MASK (tmode))
> >                  {
> >                    op0 = simplify_gen_binary (AND, tmode,
> >                                               SUBREG_REG (XEXP (op0, 0)),
> >                                               gen_int_mode (c1, tmode));
> >                    op0 = gen_lowpart (mode, op0);
> >                    continue;
> >                  }
> >              }
> > c1 is 0x84c.  I believe this is the exact spot where things go wrong,
> > and is because for WORD_REGISTER_OPERATIONS we assume something that the
> > DSE added instruction didn't guarantee.
> pan2.li zero'd in on the same block of code.
> 
> The leap I'm strugging with is the assertion that this combine code assumes
> something that the DSE added instruction does not guarantee. That's why I
> asked if we agreed that the before/after from DSE was correct or not.  I
> think that's still the outstanding question.

	Jakub
Jeff Law April 5, 2023, 5:31 p.m. UTC | #5
On 4/5/23 10:48, Jakub Jelinek wrote:
> On Wed, Apr 05, 2023 at 10:17:59AM -0600, Jeff Law wrote:
>>> It is true that an instruction like
>>> (insn 8 7 9 2 (set (reg:HI 141)
>>>           (subreg:HI (reg:SI 142) 0)) "aauu.c":6:18 181 {*movhi_internal}
>>>        (nil))
>>> can appear in the IL on WORD_REGISTER_OPERATIONS target, but I think the
>>> upper bits shouldn't be random garbage in that case, it should be zero
>>> extended or sign extended.
>> Well, that's one of the core questions here.  What are the state of the
>> upper 16 bits of (reg:HI 141)?  The WORD_REGISTER_OPERATIONS docs aren't
>> 100% clear as we're not really doing any operation.
>>
>> So again, I think we need to decide if the DSE transformation is correct or
>> not.  I *think* we can aggree that insn 39 is OK.  It's really the semantics
>> of insn 47 that I think we need to agree on.  What is the state of the upper
>> 16 bits of (reg:HI 175) after insn 47?
> 
> I'm afraid I don't know the answers here, I think Eric is
> WORD_REGISTER_OPERATIONS expert here I think these days (most of the major
> targets are !WORD_REGISTER_OPERATIONS).
Hopefully he'll chime in.

> Intuitively, WORD_REGISTER_OPERATIONS from the description would be
> that
> (insn 47 35 39 2 (set (reg:HI 175)
>          (subreg:HI (reg:SI 166) 0)) "pr109040.c":9:11 179 {*movhi_internal}
>       (expr_list:REG_DEAD (reg:SI 166)
>          (nil)))
> would copy not just the low 16-bits of pseudo 166 to 175, but also the upper
> 16-bits.  
I've gone back and forth repeatedly on this.

Originally I didn't really see this as an operation.  But the more and 
more I ponder it feels like it's an operation and thus should be subject 
to WORD_REGISTER_OPERATIONS.

While it's not really binding on RTL semantics, if we look at how some 
architectures implement reg->reg copies, they're actually implemented 
with an ADD or IOR -- so a real operation under the hood.

If we accept a subreg copy as an operation and thus subject to 
WORD_REGISTER_OPERATIONS then that would seem to imply the combine is 
the real problem here.  Otherwise dse is the culprit.







But if that is so, then something is broken in the code below.
> Some archeology shows that we were doing that on all arches initially
> and then
> Wed May 13 17:38:35 1998  Richard Kenner  <kenner@vlsi1.ultra.nyu.edu>
> 
> 	* combine.c (simplify_comparison, case AND): Don't commute AND
> 	with SUBREG if constant is whole mode and don't do if lowpart
> 	and not WORD_REGISTER_OPERATIONS.
> restricted that to WORD_REGISTER_OPERATIONS only.
> The whole optimization is then likely
> Wed Mar 18 05:54:25 1998  Richard Kenner  <kenner@vlsi1.ultra.nyu.edu>
> 
> 	* combine.c (gen_binary): Don't make AND that does nothing.
> 	(simplify_comparison, case AND): Commute AND and SUBREG.
> 	* i386.h (CONST_CONSTS, case CONST_INT): One-byte integers are cost 0.
> but the code has been tweaked further many times since then.
Sigh.  1998 and probably directly from Kenner's tree at the time.  So no 
public discussion, likely no testcases either.  Off to my private 
archives.  Yup, no discussion of Kenner's patch or testcase.

Interestingly enough I did stumble across Andreas Schwab reporting a bug 
in Kenner's change, though on the m68k and unrelated to 
WORD_REGISTER_OPERATIONS


jeff
Richard Sandiford April 6, 2023, 9:31 a.m. UTC | #6
Jeff Law <jeffreyalaw@gmail.com> writes:
> On 4/5/23 10:48, Jakub Jelinek wrote:
>> On Wed, Apr 05, 2023 at 10:17:59AM -0600, Jeff Law wrote:
>>>> It is true that an instruction like
>>>> (insn 8 7 9 2 (set (reg:HI 141)
>>>>           (subreg:HI (reg:SI 142) 0)) "aauu.c":6:18 181 {*movhi_internal}
>>>>        (nil))
>>>> can appear in the IL on WORD_REGISTER_OPERATIONS target, but I think the
>>>> upper bits shouldn't be random garbage in that case, it should be zero
>>>> extended or sign extended.
>>> Well, that's one of the core questions here.  What are the state of the
>>> upper 16 bits of (reg:HI 141)?  The WORD_REGISTER_OPERATIONS docs aren't
>>> 100% clear as we're not really doing any operation.
>>>
>>> So again, I think we need to decide if the DSE transformation is correct or
>>> not.  I *think* we can aggree that insn 39 is OK.  It's really the semantics
>>> of insn 47 that I think we need to agree on.  What is the state of the upper
>>> 16 bits of (reg:HI 175) after insn 47?
>> 
>> I'm afraid I don't know the answers here, I think Eric is
>> WORD_REGISTER_OPERATIONS expert here I think these days (most of the major
>> targets are !WORD_REGISTER_OPERATIONS).
> Hopefully he'll chime in.

Just curious: have you experimented with making RISC-V
!WORD_REGISTER_OPERATIONS too?  Realise it's not the right way
to fix the bug, just curious in general.

Not defining it seems to have worked well for AArch64.  And IMO
the semantics are much easier to follow when there is no special
treatment of upper bits.  Subregs are hard enough to reason about
as it is...

Richard
Li, Pan2 via Gcc-patches April 6, 2023, 9:37 a.m. UTC | #7
Yes, RISC-V riscv.h defined the WORD_REGISTER_OPERATIONS to be 1, while aarch64.h defined it as 0, with below comments. No idea this can fit RISC-V or not.

/* WORD_REGISTER_OPERATIONS does not hold for AArch64.
   The assigned word_mode is DImode but operations narrower than SImode
   behave as 32-bit operations if using the W-form of the registers rather
   than as word_mode (64-bit) operations as WORD_REGISTER_OPERATIONS
   expects.  */
#define WORD_REGISTER_OPERATIONS 0

Pan

-----Original Message-----
From: Gcc-patches <gcc-patches-bounces+pan2.li=intel.com@gcc.gnu.org> On Behalf Of Richard Sandiford via Gcc-patches
Sent: Thursday, April 6, 2023 5:31 PM
To: Jeff Law <jeffreyalaw@gmail.com>
Cc: Jakub Jelinek <jakub@redhat.com>; Richard Biener <rguenther@suse.de>; Eric Botcazou <botcazou@adacore.com>; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] dse: Handle SUBREGs of word REGs differently for WORD_REGISTER_OPERATIONS targets [PR109040]

Jeff Law <jeffreyalaw@gmail.com> writes:
> On 4/5/23 10:48, Jakub Jelinek wrote:
>> On Wed, Apr 05, 2023 at 10:17:59AM -0600, Jeff Law wrote:
>>>> It is true that an instruction like (insn 8 7 9 2 (set (reg:HI 141)
>>>>           (subreg:HI (reg:SI 142) 0)) "aauu.c":6:18 181 {*movhi_internal}
>>>>        (nil))
>>>> can appear in the IL on WORD_REGISTER_OPERATIONS target, but I 
>>>> think the upper bits shouldn't be random garbage in that case, it 
>>>> should be zero extended or sign extended.
>>> Well, that's one of the core questions here.  What are the state of 
>>> the upper 16 bits of (reg:HI 141)?  The WORD_REGISTER_OPERATIONS 
>>> docs aren't 100% clear as we're not really doing any operation.
>>>
>>> So again, I think we need to decide if the DSE transformation is 
>>> correct or not.  I *think* we can aggree that insn 39 is OK.  It's 
>>> really the semantics of insn 47 that I think we need to agree on.  
>>> What is the state of the upper
>>> 16 bits of (reg:HI 175) after insn 47?
>> 
>> I'm afraid I don't know the answers here, I think Eric is 
>> WORD_REGISTER_OPERATIONS expert here I think these days (most of the 
>> major targets are !WORD_REGISTER_OPERATIONS).
> Hopefully he'll chime in.

Just curious: have you experimented with making RISC-V !WORD_REGISTER_OPERATIONS too?  Realise it's not the right way to fix the bug, just curious in general.

Not defining it seems to have worked well for AArch64.  And IMO the semantics are much easier to follow when there is no special treatment of upper bits.  Subregs are hard enough to reason about as it is...

Richard
Eric Botcazou April 6, 2023, 10:15 a.m. UTC | #8
> Originally I didn't really see this as an operation.  But the more and
> more I ponder it feels like it's an operation and thus should be subject
> to WORD_REGISTER_OPERATIONS.
> 
> While it's not really binding on RTL semantics, if we look at how some
> architectures implement reg->reg copies, they're actually implemented
> with an ADD or IOR -- so a real operation under the hood.
> 
> If we accept a subreg copy as an operation and thus subject to
> WORD_REGISTER_OPERATIONS then that would seem to imply the combine is
> the real problem here.  Otherwise dse is the culprit.

Yes, I agree that there is an ambiguity for subreg copy operations.  At some 
point I tried to define what register operations are and added a predicate to 
that effect (word_register_operation_p ); while it returns true for SUBREG, 
it's an opt-out predicate so this does not mean much.

I don't think that DSE does anything wrong: as I wrote in the PR, defining 
WORD_REGISTER_OPERATIONS should not prevent any particular form of RTL.

I therefore think that the problem is in the combiner and probably in the 
intermediate step shown by Jakub:

"Then after that try_combine we do:
13325           record_value_for_reg (dest, record_dead_insn,
13326                                 WORD_REGISTER_OPERATIONS
13327                                 && word_register_operation_p (SET_SRC 
(setter))
13328                                 && paradoxical_subreg_p (SET_DEST 
(setter))
13329                                 ? SET_SRC (setter)
13330                                 : gen_lowpart (GET_MODE (dest),
13331                                                SET_SRC (setter)));
and the 3 conditions are true here and so record value of the whole setter.
That then records among other things nonzero_bits as 0x8084c."

That's a recent addition of mine (ae20d760b1ed69f631c3bf9351bf7e5005d52297) 
and I think that it probably abuses WORD_REGISTER_OPERATIONS and should either 
be reverted or restricted to the load case documented in its comment.  I can 
provide testing on SPARC if need be.
Jeff Law April 6, 2023, 2:45 p.m. UTC | #9
On 4/6/23 03:31, Richard Sandiford wrote:
> Jeff Law <jeffreyalaw@gmail.com> writes:
>> On 4/5/23 10:48, Jakub Jelinek wrote:
>>> On Wed, Apr 05, 2023 at 10:17:59AM -0600, Jeff Law wrote:
>>>>> It is true that an instruction like
>>>>> (insn 8 7 9 2 (set (reg:HI 141)
>>>>>            (subreg:HI (reg:SI 142) 0)) "aauu.c":6:18 181 {*movhi_internal}
>>>>>         (nil))
>>>>> can appear in the IL on WORD_REGISTER_OPERATIONS target, but I think the
>>>>> upper bits shouldn't be random garbage in that case, it should be zero
>>>>> extended or sign extended.
>>>> Well, that's one of the core questions here.  What are the state of the
>>>> upper 16 bits of (reg:HI 141)?  The WORD_REGISTER_OPERATIONS docs aren't
>>>> 100% clear as we're not really doing any operation.
>>>>
>>>> So again, I think we need to decide if the DSE transformation is correct or
>>>> not.  I *think* we can aggree that insn 39 is OK.  It's really the semantics
>>>> of insn 47 that I think we need to agree on.  What is the state of the upper
>>>> 16 bits of (reg:HI 175) after insn 47?
>>>
>>> I'm afraid I don't know the answers here, I think Eric is
>>> WORD_REGISTER_OPERATIONS expert here I think these days (most of the major
>>> targets are !WORD_REGISTER_OPERATIONS).
>> Hopefully he'll chime in.
> 
> Just curious: have you experimented with making RISC-V
> !WORD_REGISTER_OPERATIONS too?  Realise it's not the right way
> to fix the bug, just curious in general.
We haven't experimented with it AFAIK.  We don't have a full set of 
SImode operations, but we may have enough of a subset to try. 
Alternately, we could potentially see what happens if we ignore the 
32bit ops that we do support.  Both general directions are probably 
worth exploring, but not right now and probably not even for gcc-14 
(where we're going to be busy as hell on the vector side).

> 
> Not defining it seems to have worked well for AArch64.  And IMO
> the semantics are much easier to follow when there is no special
> treatment of upper bits.  Subregs are hard enough to reason about
> as it is...
Amen to that.  My sense is that the risc-v port relies far too heavily 
on SUBREGs and the WORD_REGISTER_OPERATIONS just makes reasoning about 
correctness harder.

jeff
Jeff Law April 6, 2023, 2:49 p.m. UTC | #10
On 4/6/23 03:37, Li, Pan2 wrote:
> Yes, RISC-V riscv.h defined the WORD_REGISTER_OPERATIONS to be 1, while aarch64.h defined it as 0, with below comments. No idea this can fit RISC-V or not.
I don't see any fundamental reason why it won't work.  Most of the 
expansion code already has code to widen types as necessary.  And given 
that we have a subset of 32bit ops, even in 64bit modes makes a 
WORD_REGISTER_OPERATIONS 0 a more sensible choice.

Jeff
Jeff Law April 6, 2023, 2:53 p.m. UTC | #11
On 4/6/23 04:15, Eric Botcazou wrote:
>> Originally I didn't really see this as an operation.  But the more and
>> more I ponder it feels like it's an operation and thus should be subject
>> to WORD_REGISTER_OPERATIONS.
>>
>> While it's not really binding on RTL semantics, if we look at how some
>> architectures implement reg->reg copies, they're actually implemented
>> with an ADD or IOR -- so a real operation under the hood.
>>
>> If we accept a subreg copy as an operation and thus subject to
>> WORD_REGISTER_OPERATIONS then that would seem to imply the combine is
>> the real problem here.  Otherwise dse is the culprit.
> 
> Yes, I agree that there is an ambiguity for subreg copy operations.  At some
> point I tried to define what register operations are and added a predicate to
> that effect (word_register_operation_p ); while it returns true for SUBREG,
> it's an opt-out predicate so this does not mean much.
Yea, I saw word_register_operation_p.  I was hesitant to treat it as a 
canonical definition of what ops are and are not subject to 
WORD_REGISTER_OPERATIONS.

> 
> I don't think that DSE does anything wrong: as I wrote in the PR, defining
> WORD_REGISTER_OPERATIONS should not prevent any particular form of RTL.
That was the conclusion I'd come to, predicated on treating SUBREGs as 
affected by WORD_REGISTER_OPERATIONS.

> 
> I therefore think that the problem is in the combiner and probably in the
> intermediate step shown by Jakub:
> 
> "Then after that try_combine we do:
> 13325           record_value_for_reg (dest, record_dead_insn,
> 13326                                 WORD_REGISTER_OPERATIONS
> 13327                                 && word_register_operation_p (SET_SRC
> (setter))
> 13328                                 && paradoxical_subreg_p (SET_DEST
> (setter))
> 13329                                 ? SET_SRC (setter)
> 13330                                 : gen_lowpart (GET_MODE (dest),
> 13331                                                SET_SRC (setter)));
> and the 3 conditions are true here and so record value of the whole setter.
> That then records among other things nonzero_bits as 0x8084c."
> 
> That's a recent addition of mine (ae20d760b1ed69f631c3bf9351bf7e5005d52297)
> and I think that it probably abuses WORD_REGISTER_OPERATIONS and should either
> be reverted or restricted to the load case documented in its comment.  I can
> provide testing on SPARC if need be.
I think that's the job for today.  Pan2, Jakub and myself have all 
zero'd in on this code in combine.

jeff
diff mbox series

Patch

--- gcc/dse.cc.jj	2023-01-02 09:32:50.369880943 +0100
+++ gcc/dse.cc	2023-04-04 22:17:22.906347794 +0200
@@ -2012,7 +2012,19 @@  replace_read (store_info *store_info, in
     }
   /* Force the value into a new register so that it won't be clobbered
      between the store and the load.  */
-  read_reg = copy_to_mode_reg (read_mode, read_reg);
+  if (WORD_REGISTER_OPERATIONS
+      && GET_CODE (read_reg) == SUBREG
+      && REG_P (SUBREG_REG (read_reg))
+      && GET_MODE (SUBREG_REG (read_reg)) == word_mode)
+    {
+      /* For WORD_REGISTER_OPERATIONS with subreg of word_mode register
+	 force SUBREG_REG into a new register rather than the SUBREG.  */
+      rtx r = copy_to_mode_reg (word_mode, SUBREG_REG (read_reg));
+      read_reg = shallow_copy_rtx (read_reg);
+      SUBREG_REG (read_reg) = r;
+    }
+  else
+    read_reg = copy_to_mode_reg (read_mode, read_reg);
   insns = get_insns ();
   end_sequence ();
 
--- gcc/testsuite/gcc.c-torture/execute/pr109040.c.jj	2023-04-04 16:57:31.739658564 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr109040.c	2023-04-04 16:57:15.238900638 +0200
@@ -0,0 +1,23 @@ 
+/* PR target/109040 */
+
+typedef unsigned short __attribute__((__vector_size__ (32))) V;
+
+unsigned short a, b, c, d;
+
+void
+foo (V m, unsigned short *ret)
+{
+  V v = 6 > ((V) { 2124, 8 } & m);
+  unsigned short uc = v[0] + a + b + c + d;
+  *ret = uc;
+}
+
+int
+main ()
+{
+  unsigned short x;
+  foo ((V) { 0, 15 }, &x);
+  if (x != (unsigned short) ~0)
+    __builtin_abort ();
+  return 0;
+}