riscv: implement TARGET_MODE_REP_EXTENDED

Message ID	20220905214437.1275139-1-philipp.tomsich@vrull.eu
State	New
Headers	show Return-Path: <gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org> DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org EB3053858D28 From: Philipp Tomsich <philipp.tomsich@vrull.eu> To: gcc-patches@gcc.gnu.org Subject: [PATCH] riscv: implement TARGET_MODE_REP_EXTENDED Date: Mon, 5 Sep 2022 23:44:37 +0200 Message-Id: <20220905214437.1275139-1-philipp.tomsich@vrull.eu> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: list Cc: Vineet Gupta <vineetg@rivosinc.com>, Kito Cheng <kito.cheng@gmail.com>, Philipp Tomsich <philipp.tomsich@vrull.eu>, Jojo R <rjiejie@linux.alibaba.com> Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" <gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org>
Series	riscv: implement TARGET_MODE_REP_EXTENDED \| expand riscv: implement TARGET_MODE_REP_EXTENDED

Philipp Tomsich Sept. 5, 2022, 9:44 p.m. UTC

TARGET_MODE_REP_EXTENDED is supposed to match LOAD_EXTEND_OP, so this
adds an implementation using the same logic as in LOAD_EXTEND_OP.

This reduces the number of extension operations, as evidenced in the
reduction of dynamic instructions for the xz benchmark in SPEC CPU:

                    # dynamic instructions
                    baseline          new          improvement
xz, workload 1    384681308026    374464538911        2.66%
xz, workload 2    985995327109    974304030498        1.19%
xz, workload 3    545372994523    533717744260        2.14%

The shift-shift-2.c testcase needs to be adjusted, as it will no
longer use slliw/slriw for sub5, but will instead emit slli/slri.

No new regressions runnung the riscv.exp suite.

gcc/ChangeLog:

	* config/riscv/riscv.cc (riscv_mode_rep_extended):
	(TARGET_MODE_REP_EXTENDED): Implement.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/shift-shift-2.c: Adjust.

Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>

---

 gcc/config/riscv/riscv.cc                      | 15 +++++++++++++++
 gcc/testsuite/gcc.target/riscv/shift-shift-2.c |  2 --
 2 files changed, 15 insertions(+), 2 deletions(-)

Alexander Monakov Sept. 6, 2022, 11:39 a.m. UTC | #1

On Mon, 5 Sep 2022, Philipp Tomsich wrote:

> +riscv_mode_rep_extended (scalar_int_mode mode, scalar_int_mode mode_rep)
> +{
> +  /* On 64-bit targets, SImode register values are sign-extended to DImode.  */
> +  if (TARGET_64BIT && mode == SImode && mode_rep == DImode)
> +    return SIGN_EXTEND;

I think this leads to a counter-intuitive requirement that a hand-written
inline asm must sign-extend its output operands that are bound to either
signed or unsigned 32-bit lvalues. Will compiler users be aware of that?

Moreover, without adjusting TARGET_TRULY_NOOP_TRUNCATION this should cause
miscompilation when a 64-bit variable is truncated to 32 bits: the pre-existing
hook says that nothing needs to be done to truncate, but the new hook says
that the result of the truncation is properly sign-extended.

The documentation for TARGET_MODE_REP_EXTENDED warns about that:

    In order to enforce the representation of mode, TARGET_TRULY_NOOP_TRUNCATION
    should return false when truncating to mode. 

Alexander

Jeff Law Sept. 16, 2022, 11:48 p.m. UTC | #2

On 9/6/22 05:39, Alexander Monakov via Gcc-patches wrote:
> On Mon, 5 Sep 2022, Philipp Tomsich wrote:
>
>> +riscv_mode_rep_extended (scalar_int_mode mode, scalar_int_mode mode_rep)
>> +{
>> +  /* On 64-bit targets, SImode register values are sign-extended to DImode.  */
>> +  if (TARGET_64BIT && mode == SImode && mode_rep == DImode)
>> +    return SIGN_EXTEND;
> I think this leads to a counter-intuitive requirement that a hand-written
> inline asm must sign-extend its output operands that are bound to either
> signed or unsigned 32-bit lvalues. Will compiler users be aware of that?

Is this significantly different than on MIPS?  Hand-written code there 
also has to ensure that the results are properly sign extended and it's 
been that way for 20+ years since the introduction of mips64 IIRC.  
Though I don't think we had MODE_REP_EXTENDED that long.

Haha, MIPS is the only target that currently defines 
TARGET_MODE_REP_EXTENDED :-)



>
> Moreover, without adjusting TARGET_TRULY_NOOP_TRUNCATION this should cause
> miscompilation when a 64-bit variable is truncated to 32 bits: the pre-existing
> hook says that nothing needs to be done to truncate, but the new hook says
> that the result of the truncation is properly sign-extended.
>
> The documentation for TARGET_MODE_REP_EXTENDED warns about that:
>
>      In order to enforce the representation of mode, TARGET_TRULY_NOOP_TRUNCATION
>      should return false when truncating to mode.

This may well need adjusting in Philipp's patch.   I'd be surprised if 
the MIPS definition wasn't usable nearly verbatim here.


jeff

Palmer Dabbelt Sept. 17, 2022, 7:59 a.m. UTC | #3

On Fri, 16 Sep 2022 16:48:24 PDT (-0700), gcc-patches@gcc.gnu.org wrote:
>
> On 9/6/22 05:39, Alexander Monakov via Gcc-patches wrote:
>> On Mon, 5 Sep 2022, Philipp Tomsich wrote:
>>
>>> +riscv_mode_rep_extended (scalar_int_mode mode, scalar_int_mode mode_rep)
>>> +{
>>> +  /* On 64-bit targets, SImode register values are sign-extended to DImode.  */
>>> +  if (TARGET_64BIT && mode == SImode && mode_rep == DImode)
>>> +    return SIGN_EXTEND;
>> I think this leads to a counter-intuitive requirement that a hand-written
>> inline asm must sign-extend its output operands that are bound to either
>> signed or unsigned 32-bit lvalues. Will compiler users be aware of that?
>
> Is this significantly different than on MIPS?  Hand-written code there
> also has to ensure that the results are properly sign extended and it's
> been that way for 20+ years since the introduction of mips64 IIRC. 
> Though I don't think we had MODE_REP_EXTENDED that long.

IMO the problem isn't so much that asm has this constraint, it's that 
it's a new constraint and thus risks breaking code that used to work.  
That said...

> Haha, MIPS is the only target that currently defines
> TARGET_MODE_REP_EXTENDED :-)
>
>
>
>>
>> Moreover, without adjusting TARGET_TRULY_NOOP_TRUNCATION this should cause
>> miscompilation when a 64-bit variable is truncated to 32 bits: the pre-existing
>> hook says that nothing needs to be done to truncate, but the new hook says
>> that the result of the truncation is properly sign-extended.
>>
>> The documentation for TARGET_MODE_REP_EXTENDED warns about that:
>>
>>      In order to enforce the representation of mode, TARGET_TRULY_NOOP_TRUNCATION
>>      should return false when truncating to mode.
>
> This may well need adjusting in Philipp's patch.   I'd be surprised if
> the MIPS definition wasn't usable nearly verbatim here.

Yes, and we have a few MIPS-isms in the ISA but don't have the same 
flavor of TRULY_NOOP_TRUNCATION.  It's been pointed out a handful of 
times and I'm not sure what the right way to go is here, every time I 
try and reason about which is going to produce better code I come up 
with a different answer.  IIRC last time I looked at this I came to the 
conclusion that we're doing the right thing for RISC-V because most of 
our instructions implicitly truncate.  It's pretty easy to generate bad 
code here and I'm pretty sure we could fix some of that by moving to a 
more MIPS-like TRULY_MODE_TRUNCATION, but I think we'd end up just 
pushing the problems around.

Every time I look at this I also get worried that we've leaked some of 
these internal promotion rules into something visible to inline asm, but 
when I poke around it seems like things generally work.

>
>
> jeff

Philipp Tomsich Nov. 4, 2022, 11 p.m. UTC | #4

Alexander,

I had missed your comment until now.

On Tue, 6 Sept 2022 at 13:39, Alexander Monakov <amonakov@ispras.ru> wrote:
>
> On Mon, 5 Sep 2022, Philipp Tomsich wrote:
>
> > +riscv_mode_rep_extended (scalar_int_mode mode, scalar_int_mode
mode_rep)
> > +{
> > +  /* On 64-bit targets, SImode register values are sign-extended to
DImode.  */
> > +  if (TARGET_64BIT && mode == SImode && mode_rep == DImode)
> > +    return SIGN_EXTEND;
>
> I think this leads to a counter-intuitive requirement that a hand-written
> inline asm must sign-extend its output operands that are bound to either
> signed or unsigned 32-bit lvalues. Will compiler users be aware of that?

I am not sure if I fully understand your concern, as the mode of the
asm-output will be derived from the variable type.
So "asm (... : "=r" (a))" will take DI/SI/HI/QImode depending on the type
of a.

The concern, as far as I understand would be the case where the
assembly-sequence leaves an incompatible extension in the register.
So l understand the problem being:
    int a;
    asm volatile ("zext.w %0, %1" : "=r"(a) : "r"(b));  // possible pitfall
-- the programmer wants a SImode representation (so it needs to be extended)
but not
    long long a;
    asm volatile ("zext.w %0, %1" : "=r"(a): "r"(b));   // possible pitfall
-- the programmer wants the DImode representation (don't extend)

If we look at the output of expand for the first case (I made sure to
consume "a" again using a "asm volatile ("nop" : : "r"(a))"), we see that
an explicit extension is created from SImode to DImode (the "(subreg:SI
(reg:DI ...) 0)" in the asm-operand for the next block is a bigger concern
— this may be an artifact of TARGET_TRULY_NOOP_TRUNCATION not being
properly defined on RISC-V; but this issue didn't originate from this
patch):

;; __asm__ __volatile__("zext.w %0,%1" : "=r" a_2 : "r" b_1(D));
(insn 7 5 6 (set (reg:SI 75 [ aD.1490 ])
        (asm_operands/v:SI ("zext.w %0,%1") ("=r") 0 [
                (reg/v:DI 74 [ bD.1487 ])
            ]
             [
                (asm_input:DI ("r") ./rep-asm.c:5)
            ]
             [] ./rep-asm.c:5)) "./rep-asm.c":5:3 -1
     (nil))
(insn 6 7 0 (set (reg/v:DI 72 [ aD.1490 ])
        (sign_extend:DI (reg:SI 75 [ aD.1490 ]))) "./rep-asm.c":5:3 -1
     (nil))
;; __asm__ __volatile__("nop" :  : "r" a_2);
(insn 8 6 0 (asm_operands/v ("nop") ("") 0 [
            (subreg/s/u:SI (reg/v:DI 72 [ aD.1490 ]) 0)
        ]
         [
            (asm_input:SI ("r") ./rep-asm.c:6)
        ]
         [] ./rep-asm.c:6) "./rep-asm.c":6:3 -1
     (nil))

which translates to:

f:

#APP

# 5 "./rep-asm.c" 1

zext.w a0,a0

# 0 "" 2

#NO_APP

sext.w a0,a0

#APP

# 6 "./rep-asm.c" 1

nop

# 0 "" 2

#NO_APP


If this patch is not applied (and TARGET_MODE_REP_EXTENDED is not defined),
we see the sext.w missing.
So in fact, we may have unintentionally fixed an issue?

> Moreover, without adjusting TARGET_TRULY_NOOP_TRUNCATION this should cause
> miscompilation when a 64-bit variable is truncated to 32 bits: the
pre-existing
> hook says that nothing needs to be done to truncate, but the new hook says
> that the result of the truncation is properly sign-extended.
>
> The documentation for TARGET_MODE_REP_EXTENDED warns about that:
>
>     In order to enforce the representation of mode,
TARGET_TRULY_NOOP_TRUNCATION
>     should return false when truncating to mode.

Good point.  This should indeed be added and we'll look into this in more
detail for v2.

Looking into this in more detail, it seems that this change doesn't make
things worse (we already had that problem before, as an explicit extension
might also lead to the same reasoning).
So somehow we've been avoiding this bullet so far, although I don't know
yet how...

Philipp.

Zhongyunde Nov. 5, 2022, 6:16 a.m. UTC | #5

hi,
  This patch is try to fix the issue https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107190, 
would you like to give me some suggestion, thanks.

~/source/gccUpstreamDir/gcc/testsuite(cfg) » git format-patch -1 --start-number=00 HEAD -o ~/patch
/home/zhongyunde/patch/0000-PHIOPT-Add-A-B-CST-B-match-and-simplify-optimization.patch

Andrew Pinski Nov. 5, 2022, 6:34 a.m. UTC | #6

On Fri, Nov 4, 2022 at 11:17 PM Zhongyunde <zhongyunde@huawei.com> wrote:
>
> hi,
>   This patch is try to fix the issue https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107190,
> would you like to give me some suggestion, thanks.

This seems like a "simplified" version of
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584411.html
which just handles power of 2 constants where we know the cond will be
removed.
We could do even more "simplified" of 1 if needed really.
What is the IR before PHI-OPT? Is it just + 1?

Also your pattern can be simplified to use integer_pow2p in the match
part instead of INTEGER_CST.

Thanks,
Andrew

>
> ~/source/gccUpstreamDir/gcc/testsuite(cfg) » git format-patch -1 --start-number=00 HEAD -o ~/patch
> /home/zhongyunde/patch/0000-PHIOPT-Add-A-B-CST-B-match-and-simplify-optimization.patch

Zhongyunde Nov. 5, 2022, 9:03 a.m. UTC | #7

> -----Original Message-----
> From: Andrew Pinski [mailto:pinskia@gcc.gnu.org]
> Sent: Saturday, November 5, 2022 2:34 PM
> To: Zhongyunde <zhongyunde@huawei.com>
> Cc: hongtao.liu@intel.com; gcc-patches@gcc.gnu.org; Zhangwen(Esan)
> <zwzhangwen.zhang@huawei.com>; Weiwei (weiwei, Compiler)
> <weiwei64@huawei.com>; zhong_1985624@163.com
> Subject: Re: [PATCH] [PHIOPT] Add A ? B + CST : B match and simplify
> optimizations
> 
> On Fri, Nov 4, 2022 at 11:17 PM Zhongyunde <zhongyunde@huawei.com>
> wrote:
> >
> > hi,
> >   This patch is try to fix the issue
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107190,
> > would you like to give me some suggestion, thanks.
> 
> This seems like a "simplified" version of
> https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584411.html
> which just handles power of 2 constants where we know the cond will be
> removed.
> We could do even more "simplified" of 1 if needed really.
> What is the IR before PHI-OPT? Is it just + 1?

Thanks for your attention. It is + 4294967296 before PHI-OPT  (See detail https://gcc.godbolt.org/z/6zEc6ja1z)
So we should keep matching the power of 2 constants ?

> Also your pattern can be simplified to use integer_pow2p in the match part
> instead of INTEGER_CST.
> 
Apply your comment, thanks

> Thanks,
> Andrew

Alexander Monakov Nov. 7, 2022, 1:55 p.m. UTC | #8

On Sat, 5 Nov 2022, Philipp Tomsich wrote:

> Alexander,
> 
> I had missed your comment until now.

Please make sure to read replies from Jeff and Palmer as well (their responses
went to gcc-patches with empty Cc list):
https://inbox.sourceware.org/gcc-patches/ba895f78-7f47-0f4-5bfe-e21893c4bcb@ispras.ru/T/#m7b7e5708b82de3b05ba8007ae6544891a03bdc42

For now let me respond to some of the more prominent points:

> > I think this leads to a counter-intuitive requirement that a hand-written
> > inline asm must sign-extend its output operands that are bound to either
> > signed or unsigned 32-bit lvalues. Will compiler users be aware of that?
> 
> I am not sure if I fully understand your concern, as the mode of the
> asm-output will be derived from the variable type.
> So "asm (... : "=r" (a))" will take DI/SI/HI/QImode depending on the type
> of a.

Yes. The problem arises when 'a' later undergoes conversion to a wider type.

> The concern, as far as I understand would be the case where the
> assembly-sequence leaves an incompatible extension in the register.

Existing documentation like the psABI does not constrain compiler users in any
way, so their inline asm snippets are free to leave garbage in upper bits. Thus
there's no "incompatibility" to speak of.

To give a specific example that will be problematic if you go far enough down
the road of matching MIPS64 behavior:

long f(void)
{
    int x;
    asm("" : "=r"(x));
    return x;
}

here GCC (unlike LLVM) omits sign extension of 'x', assuming that asm output
must have been sign-extended to 64 bits by the programmer.

Alexander

Richard Biener Nov. 8, 2022, 2:58 p.m. UTC | #9

On Sat, Nov 5, 2022 at 10:03 AM Zhongyunde via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
>
> > -----Original Message-----
> > From: Andrew Pinski [mailto:pinskia@gcc.gnu.org]
> > Sent: Saturday, November 5, 2022 2:34 PM
> > To: Zhongyunde <zhongyunde@huawei.com>
> > Cc: hongtao.liu@intel.com; gcc-patches@gcc.gnu.org; Zhangwen(Esan)
> > <zwzhangwen.zhang@huawei.com>; Weiwei (weiwei, Compiler)
> > <weiwei64@huawei.com>; zhong_1985624@163.com
> > Subject: Re: [PATCH] [PHIOPT] Add A ? B + CST : B match and simplify
> > optimizations
> >
> > On Fri, Nov 4, 2022 at 11:17 PM Zhongyunde <zhongyunde@huawei.com>
> > wrote:
> > >
> > > hi,
> > >   This patch is try to fix the issue
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107190,
> > > would you like to give me some suggestion, thanks.
> >
> > This seems like a "simplified" version of
> > https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584411.html
> > which just handles power of 2 constants where we know the cond will be
> > removed.
> > We could do even more "simplified" of 1 if needed really.
> > What is the IR before PHI-OPT? Is it just + 1?
>
> Thanks for your attention. It is + 4294967296 before PHI-OPT  (See detail https://gcc.godbolt.org/z/6zEc6ja1z)
> So we should keep matching the power of 2 constants ?
>
> > Also your pattern can be simplified to use integer_pow2p in the match part
> > instead of INTEGER_CST.
> >
> Apply your comment, thanks

How does the patch fix the mentioned bug?  match.pd patterns should make things
"simpler" but x + (a << C') isn't simpler than a ? x + C : x.  It
looks you are targeting
PHI-OPT here, so maybe instead extend value_replacement to handle this case,
it does look similar to the case with neutral/absorbing element there?

Richard.

>
> > Thanks,
> > Andrew
>
>

钟云德 Nov. 8, 2022, 3:51 p.m. UTC | #10

At 2022-11-08 22:58:34, "Richard Biener" <richard.guenther@gmail.com> wrote:

>On Sat, Nov 5, 2022 at 10:03 AM Zhongyunde via Gcc-patches
><gcc-patches@gcc.gnu.org> wrote:
>>
>>
>> > -----Original Message-----
>> > From: Andrew Pinski [mailto:pinskia@gcc.gnu.org]
>> > Sent: Saturday, November 5, 2022 2:34 PM
>> > To: Zhongyunde <zhongyunde@huawei.com>
>> > Cc: hongtao.liu@intel.com; gcc-patches@gcc.gnu.org; Zhangwen(Esan)
>> > <zwzhangwen.zhang@huawei.com>; Weiwei (weiwei, Compiler)
>> > <weiwei64@huawei.com>; zhong_1985624@163.com
>> > Subject: Re: [PATCH] [PHIOPT] Add A ? B + CST : B match and simplify
>> > optimizations
>> >
>> > On Fri, Nov 4, 2022 at 11:17 PM Zhongyunde <zhongyunde@huawei.com>
>> > wrote:
>> > >
>> > > hi,
>> > >   This patch is try to fix the issue
>> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107190,
>> > > would you like to give me some suggestion, thanks.
>> >
>> > This seems like a "simplified" version of
>> > https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584411.html
>> > which just handles power of 2 constants where we know the cond will be
>> > removed.
>> > We could do even more "simplified" of 1 if needed really.
>> > What is the IR before PHI-OPT? Is it just + 1?
>>
>> Thanks for your attention. It is + 4294967296 before PHI-OPT  (See detail https://gcc.godbolt.org/z/6zEc6ja1z)
>> So we should keep matching the power of 2 constants ?
>>
>> > Also your pattern can be simplified to use integer_pow2p in the match part
>> > instead of INTEGER_CST.
>> >
>> Apply your comment, thanks
>
>How does the patch fix the mentioned bug?  match.pd patterns should make things
>"simpler" but x + (a << C') isn't simpler than a ? x + C : x.  It
>looks you are targeting
>PHI-OPT here, so maybe instead extend value_replacement to handle this case,
>it does look similar to the case with neutral/absorbing element there?
>

>Richard.


Thanks. This patch try to fix the 1st issued mentioned in107090 – [aarch64] sequence logic should be combined with mul and umulh (gnu.org)
Sure, I'll take a look at the function value_replacement. 
I have also noticed that the function of two_value_replacement is very close to patch I want to achieve, and it may be easy to extend.
It seems can be expressed equally in match.pd (called by match_simplify_replacement), so how do we
choose where to implement may be better?
```
|
/* Do the replacement of conditional if it can be done.  */if (!early_p                                                                                                                && !diamond_p                                                                                                           && two_value_replacement (bb, bb1, e2, phi, arg0, arg1))                                                              cfgchanged = true;                                                                                          elseif (!diamond_p                                                                                                              && match_simplify_replacement (bb, bb1, e1, e2, phi,                                                                                                   arg0, arg1, early_p))                                                             cfgchanged = true;                                                                                          
|
```
>> > Thanks, >> > Andrew >> >>

Philipp Tomsich Nov. 8, 2022, 11:45 p.m. UTC | #11

On Mon, 7 Nov 2022 at 14:55, Alexander Monakov <amonakov@ispras.ru> wrote:
>
>
>
> On Sat, 5 Nov 2022, Philipp Tomsich wrote:
>
> > Alexander,
> >
> > I had missed your comment until now.
>
> Please make sure to read replies from Jeff and Palmer as well (their responses
> went to gcc-patches with empty Cc list):
> https://inbox.sourceware.org/gcc-patches/ba895f78-7f47-0f4-5bfe-e21893c4bcb@ispras.ru/T/#m7b7e5708b82de3b05ba8007ae6544891a03bdc42
>
> For now let me respond to some of the more prominent points:
>
> > > I think this leads to a counter-intuitive requirement that a hand-written
> > > inline asm must sign-extend its output operands that are bound to either
> > > signed or unsigned 32-bit lvalues. Will compiler users be aware of that?
> >
> > I am not sure if I fully understand your concern, as the mode of the
> > asm-output will be derived from the variable type.
> > So "asm (... : "=r" (a))" will take DI/SI/HI/QImode depending on the type
> > of a.
>
> Yes. The problem arises when 'a' later undergoes conversion to a wider type.
>
> > The concern, as far as I understand would be the case where the
> > assembly-sequence leaves an incompatible extension in the register.
>
> Existing documentation like the psABI does not constrain compiler users in any
> way, so their inline asm snippets are free to leave garbage in upper bits. Thus
> there's no "incompatibility" to speak of.
>
> To give a specific example that will be problematic if you go far enough down
> the road of matching MIPS64 behavior:
>
> long f(void)
> {
>     int x;
>     asm("" : "=r"(x));
>     return x;
> }
>
> here GCC (unlike LLVM) omits sign extension of 'x', assuming that asm output
> must have been sign-extended to 64 bits by the programmer.

In fact, with the proposed patch (but also without it), GCC will sign-extend:
f:
  sext.w a0,a0
  ret
  .size f, .-f

To make sure that this is not just an extension to promote the int to
long for the function return, I next added another empty asm to
consume 'x'.
This clearly shows that the extension is performed to postprocess the
output of the asm-statement:

f:
  # ./asm2.c:4:     asm("" : "=r"(x));
  sext.w a0,a0 # x, x
  # ./asm2.c:5:     asm("" : : "r"(x));
  # ./asm2.c:7: }
  ret

Thanks,
Philipp.

Richard Biener Nov. 9, 2022, 8 a.m. UTC | #12

On Tue, Nov 8, 2022 at 4:51 PM 钟云德 <zhong_1985624@163.com> wrote:

> At 2022-11-08 22:58:34, "Richard Biener" <richard.guenther@gmail.com>
> wrote:
>
> >On Sat, Nov 5, 2022 at 10:03 AM Zhongyunde via Gcc-patches
> ><gcc-patches@gcc.gnu.org> wrote:
> >>
> >>
> >> > -----Original Message-----
> >> > From: Andrew Pinski [mailto:pinskia@gcc.gnu.org]
> >> > Sent: Saturday, November 5, 2022 2:34 PM
> >> > To: Zhongyunde <zhongyunde@huawei.com>
> >> > Cc: hongtao.liu@intel.com; gcc-patches@gcc.gnu.org; Zhangwen(Esan)
> >> > <zwzhangwen.zhang@huawei.com>; Weiwei (weiwei, Compiler)
> >> > <weiwei64@huawei.com>; zhong_1985624@163.com
> >> > Subject: Re: [PATCH] [PHIOPT] Add A ? B + CST : B match and simplify
> >> > optimizations
> >> >
> >> > On Fri, Nov 4, 2022 at 11:17 PM Zhongyunde <zhongyunde@huawei.com>
> >> > wrote:
> >> > >
> >> > > hi,
> >> > >   This patch is try to fix the issue
> >> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107190,
> >> > > would you like to give me some suggestion, thanks.
> >> >
> >> > This seems like a "simplified" version of
> >> > https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584411.html
> >> > which just handles power of 2 constants where we know the cond will be
> >> > removed.
> >> > We could do even more "simplified" of 1 if needed really.
> >> > What is the IR before PHI-OPT? Is it just + 1?
> >>
> >> Thanks for your attention. It is + 4294967296 before PHI-OPT  (See detail https://gcc.godbolt.org/z/6zEc6ja1z)
> >> So we should keep matching the power of 2 constants ?
> >>
> >> > Also your pattern can be simplified to use integer_pow2p in the match part
> >> > instead of INTEGER_CST.
> >> >
> >> Apply your comment, thanks
> >
> >How does the patch fix the mentioned bug?  match.pd patterns should make things
> >"simpler" but x + (a << C') isn't simpler than a ? x + C : x.  It
> >looks you are targeting
> >PHI-OPT here, so maybe instead extend value_replacement to handle this case,
> >it does look similar to the case with neutral/absorbing element there?
> >
> >Richard.
>
> Thanks. This patch try to fix the 1st issued mentioned in 107090 – [aarch64] sequence logic should be combined with mul and umulh (gnu.org) <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107090>
> Sure, I'll take a look at the function value_replacement.
> I have also noticed that the function of two_value_replacement is very close to patch I want to achieve, and it may be easy to extend.
> It seems can be expressed equally in match.pd (called by match_simplify_replacement), so how do we
> choose where to implement may be better?
>
>
I think that if we realize that we don't want a simplification to always
apply (like by checking
for canonicalize_math_p ()), then we should look for a pass which is a good
fit and since
you add the pattern to trigger a PHI-OPT optimization that looked like an
obvious choice ...

```
>
>           /* Do the replacement of conditional if it can be done.  */                                                             if (!early_p                                                                                                                && !diamond_p                                                                                                           && two_value_replacement (bb, bb1, e2, phi, arg0, arg1))                                                              cfgchanged = true;                                                                                                    else if (!diamond_p                                                                                                              && match_simplify_replacement (bb, bb1, e1, e2, phi,                                                                                                   arg0, arg1, early_p))                                                             cfgchanged = true;
>
> ```
> >> > Thanks, >> > Andrew >> >>
>
>

Alexander Monakov Nov. 9, 2022, 5:21 p.m. UTC | #13

On Wed, 9 Nov 2022, Philipp Tomsich wrote:

> > To give a specific example that will be problematic if you go far enough down
> > the road of matching MIPS64 behavior:
> >
> > long f(void)
> > {
> >     int x;
> >     asm("" : "=r"(x));
> >     return x;
> > }
> >
> > here GCC (unlike LLVM) omits sign extension of 'x', assuming that asm output
> > must have been sign-extended to 64 bits by the programmer.
> 
> In fact, with the proposed patch (but also without it), GCC will sign-extend:
> f:
>   sext.w a0,a0
>   ret
>   .size f, .-f

I'm aware. I said "will be problematic if ...", meaning that GCC omits sign
extension when targeting MIPS64, and if you match MIPS64 behavior on RISC-V, 
you'll get in that situation as well.

> To make sure that this is not just an extension to promote the int to
> long for the function return, I next added another empty asm to
> consume 'x'.
> This clearly shows that the extension is performed to postprocess the
> output of the asm-statement:
> 
> f:
>   # ./asm2.c:4:     asm("" : "=r"(x));
>   sext.w a0,a0 # x, x
>   # ./asm2.c:5:     asm("" : : "r"(x));
>   # ./asm2.c:7: }
>   ret

No, you cannot distinguish post-processing the output of the first asm vs.
pre-processing the input of the second asm. Try

  asm("" : "+r"(x));

as the second asm instead, and you'll get

f:
# t.c:17:     asm("" : "=r"(x));
# t.c:18:     asm("" : "+r"(x));
# t.c:20: }
        sext.w  a0,a0   #, x
        ret

If it helps, here's a Compiler Explorer link comparing with MIPS64:
https://godbolt.org/z/7eobvdKdK

Alexander

Jeff Law Nov. 20, 2022, 4:09 p.m. UTC | #14

On 11/4/22 17:00, Philipp Tomsich wrote:
> Alexander,
>
> I had missed your comment until now.
>
> On Tue, 6 Sept 2022 at 13:39, Alexander Monakov <amonakov@ispras.ru> wrote:
>> On Mon, 5 Sep 2022, Philipp Tomsich wrote:
>>
>>> +riscv_mode_rep_extended (scalar_int_mode mode, scalar_int_mode
> mode_rep)
>>> +{
>>> +  /* On 64-bit targets, SImode register values are sign-extended to
> DImode.  */
>>> +  if (TARGET_64BIT && mode == SImode && mode_rep == DImode)
>>> +    return SIGN_EXTEND;
>> I think this leads to a counter-intuitive requirement that a hand-written
>> inline asm must sign-extend its output operands that are bound to either
>> signed or unsigned 32-bit lvalues. Will compiler users be aware of that?
> I am not sure if I fully understand your concern, as the mode of the
> asm-output will be derived from the variable type.
> So "asm (... : "=r" (a))" will take DI/SI/HI/QImode depending on the type
> of a.

Correct.


>
> The concern, as far as I understand would be the case where the
> assembly-sequence leaves an incompatible extension in the register.

Right.  The question in my mind is whether or not the responsibility 
should be on the compiler or on the developer to ensure the ASM output 
is properly extended.  If someone's writing ASMs, then to a large 
degree, I consider it their responsibility to make sure things are 
properly extended -- even more so if having the compiler do it results 
in slower code independent of ASMs.


Jeff

Alexander Monakov Nov. 21, 2022, 1:49 p.m. UTC | #15

On Sun, 20 Nov 2022, Jeff Law wrote:

> > The concern, as far as I understand would be the case where the
> > assembly-sequence leaves an incompatible extension in the register.
> 
> Right.  The question in my mind is whether or not the responsibility should be
> on the compiler or on the developer to ensure the ASM output is properly
> extended.  If someone's writing ASMs, then to a large degree, I consider it
> their responsibility to make sure things are properly extended

Right. They should also find out the hard way, with zero documentation telling
them they had to (let alone *why* they had to), and without a hypothetical
-fsanitize=abi that would catch exactly the point of the missing extension
instead of letting the program crash mysteriously at a much later point.
Uphill both ways, in a world where LLVM does not place such burden on the
programmer, and even among GCC targets only mips64 made a precedent (also
without documenting the new requirement).

> -- even more so
> if having the compiler do it results in slower code independent of ASMs.

I think LLVM demonstrates well enough that a compiler can do a better job
than GCC at eliminating redundant extensions without upgrading requirements
for inline asm (in the usual C code, not for sub-word outputs of an asm).

Alexander

Jeff Law Nov. 21, 2022, 2:56 p.m. UTC | #16

On 11/21/22 06:49, Alexander Monakov wrote:
> On Sun, 20 Nov 2022, Jeff Law wrote:
>
>>> The concern, as far as I understand would be the case where the
>>> assembly-sequence leaves an incompatible extension in the register.
>> Right.  The question in my mind is whether or not the responsibility should be
>> on the compiler or on the developer to ensure the ASM output is properly
>> extended.  If someone's writing ASMs, then to a large degree, I consider it
>> their responsibility to make sure things are properly extended
> Right. They should also find out the hard way, with zero documentation telling
> them they had to (let alone *why* they had to), and without a hypothetical
> -fsanitize=abi that would catch exactly the point of the missing extension
> instead of letting the program crash mysteriously at a much later point.
> Uphill both ways, in a world where LLVM does not place such burden on the
> programmer, and even among GCC targets only mips64 made a precedent (also
> without documenting the new requirement).

They're writing assembly code -- in my book that means they'd better 
have a pretty good understanding of the architecture, its limitations 
and quirks.

  I'm happy to give them some diagnostics, guardrails, etc etc, but 
slowing down standard C code to placate someone who doesn't really know 
the architecture, but is trying to write assembly code is a step too far 
for me.


>
>> -- even more so
>> if having the compiler do it results in slower code independent of ASMs.
> I think LLVM demonstrates well enough that a compiler can do a better job
> than GCC at eliminating redundant extensions without upgrading requirements
> for inline asm (in the usual C code, not for sub-word outputs of an asm).

Perhaps.  But it's also the case the LLVM and GCC have some significant 
differences in implementation.  Sometimes those differences in 
impementations have notable impacts on how code is generated.


jeff

>
> Alexander

Alexander Monakov Nov. 21, 2022, 3:33 p.m. UTC | #17

On Mon, 21 Nov 2022, Jeff Law wrote:

> They're writing assembly code -- in my book that means they'd better have a
> pretty good understanding of the architecture, its limitations and quirks.

That GCC ties together optimization and inline asm interface via its internal
TARGET_MODE_REP_EXTENDED macro is not a quirk of the RISC-V architecture.
It's a quirk of GCC.

Alexander

riscv: implement TARGET_MODE_REP_EXTENDED

Commit Message

Comments

Patch