[v2,0/5] RISC-V big endian support

Message ID	20210221000903.32039-1-marcus@mc.pp.se
From: "Marcus Comstedt" <marcus@mc.pp.se>
To: GCC Patches <gcc-patches@gcc.gnu.org>
Subject: [PATCH v2 0/5] RISC-V big endian support
Date: Sun, 21 Feb 2021 01:08:58 +0100

Series: RISC-V big endian support
  [v2,0/5] RISC-V big endian support
  [v2,1/5] RISC-V: Support -mlittle-endian and -mbig-endian
  [v2,2/5] RISC-V: Add riscv{32,64}be with big endian as default
  [v2,3/5] RISC-V: Update soft-fp config for big-endian
  [v2,4/5] RISC-V: Fix trampoline generation on big endian
  [v2,5/5] RISC-V: Update shift-shift-5.c testcase for big endian

Marcus Comstedt Feb. 21, 2021, 12:08 a.m. UTC

This is an update to the patch series for big endian RISC-V support.

Changes since last version:

  * Added documentation of -mbig-endian and -mlittle-endian

  * New patch: Fix soft-fp endianness setting

  * New patch: Fix trampoline generation on big endian

  * New patch: Update the shift-shift-5.c testcase to work correctly
    on big endian

With these changes, and two fixes to newlib (setting correct floating
point byteorder, and an update to the handcoded assembler for strcmp),
I'm now down to

               ========= Summary of gcc testsuite =========
                            | # of unexpected case / # of unique unexpected case
                            |          gcc |          g++ |     gfortran |
     rv64gc/   lp64/ medlow |   14 /     8 |   39 /    10 |      - |

and of these only two failures do not also occur for little endian:

FAIL: gcc.target/riscv/shift-and-1.c scan-assembler-not andi
FAIL: gcc.target/riscv/shift-and-2.c scan-assembler-not andi

I'm quite puzzled why these two testcases give different results with
-mbig-endian compared to with -mlittle-endian though, since they only
deal with register-to-register operations so the memory model should be
completely irrelevant...


  // Marcus

Kito Cheng Feb. 23, 2021, 2:38 a.m. UTC | #1

Hi Marcus:

Thanks for the quick update. I am testing your v2 patch now and the
results look really good. Some of the failing cases do not seem to be
caused by the big-endian patch; I am reviewing them and comparing
against the little-endian build.

> Should I make a PR against riscv-newlib on GitHub, or would you prefer
> some other process for dealing with newlib fixes related to these
> patches?

Could you send them to the newlib mailing list directly? Ideally
riscv-newlib is just a buffer, and we want to hold as few patches
there as possible.
https://sourceware.org/mailman/listinfo/newlib/





Kito Cheng Feb. 23, 2021, 7:12 a.m. UTC | #2

It seems only 3 failures are related to big-endian; you don't need to
worry about the other failures.

FAIL: gcc.c-torture/execute/string-opt-5.c
FAIL: gcc.target/riscv/shift-and-1.c scan-assembler-not andi
FAIL: gcc.target/riscv/shift-and-2.c scan-assembler-not andi


Marcus Comstedt Feb. 23, 2021, 7:23 a.m. UTC | #3

Hi Kito,

Kito Cheng <kito.cheng@gmail.com> writes:

> FAIL: gcc.c-torture/execute/string-opt-5.c
> FAIL: gcc.target/riscv/shift-and-1.c scan-assembler-not andi
> FAIL: gcc.target/riscv/shift-and-2.c scan-assembler-not andi

string-opt-5.c is one of the newlib issues I mentioned: the handcoded
assembler for strcmp assumed LE.  It was intended to #error out on BE,
but used "BYTE_ORDER" instead of "__BYTE_ORDER__", so the check never
worked.  I'll send the fixes later today.
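
To illustrate how a check like that can silently pass, here is a minimal
sketch in C.  It is not the newlib source (the exact condition used there
is not shown in this thread); it only relies on the rule that an identifier
which is not defined as a macro evaluates to 0 inside #if, and on the
__BYTE_ORDER__ / __ORDER_BIG_ENDIAN__ macros that GCC predefines.  The
function names are made up for the example.

```c
#include <stdint.h>
#include <string.h>

/* Pitfall: if no included header defines BYTE_ORDER and LITTLE_ENDIAN,
   both names evaluate to 0 inside #if, the condition below becomes
   "0 != 0", and the #error can never fire on any target.  */
#if !defined(BYTE_ORDER) && !defined(LITTLE_ENDIAN)
# if BYTE_ORDER != LITTLE_ENDIAN
#  error "unreachable while both names are undefined"
# endif
#endif

/* Compile-time answer based on the compiler's predefined macros.  */
static int target_is_big_endian(void)
{
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    return 1;
#else
    return 0;
#endif
}

/* Run-time cross-check: look at the first byte of a known 32-bit value.  */
static int runtime_is_big_endian(void)
{
    const uint32_t probe = 0x01020304u;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 0x01;
}
```

With GCC the two functions should agree on any target, which makes the
pair a cheap sanity check when porting hand-written assembler guards.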

The shift-and tests don't generate incorrect code or anything, but
it's still puzzling why the generated code is different from with
-mlittle-endian.

  // Marcus

Kito Cheng Feb. 24, 2021, 7:45 a.m. UTC | #4

Hi Marcus:

I just spent some time on those two testcases; I think they could
simply be skipped on big-endian.

> FAIL: gcc.target/riscv/shift-and-1.c scan-assembler-not andi
> FAIL: gcc.target/riscv/shift-and-2.c scan-assembler-not andi

However, it seems rv32be still has some strange failures; would you
mind taking a look at that?

../configure --prefix=$PREFIX --with-arch=rv32gc --with-multilib-generator=rv32gc-ilp32--


diff --git a/gcc/testsuite/gcc.target/riscv/shift-and-1.c b/gcc/testsuite/gcc.target/riscv/shift-and-1.c
index d1f3a05db2c..6f4dccc709f 100644
--- a/gcc/testsuite/gcc.target/riscv/shift-and-1.c
+++ b/gcc/testsuite/gcc.target/riscv/shift-and-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-march=rv32gc -mabi=ilp32 -O" } */
+/* { dg-options "-march=rv32gc -mabi=ilp32 -O -mlittle-endian" } */

 /* Test for <optab>si3_mask.  */
 int
diff --git a/gcc/testsuite/gcc.target/riscv/shift-and-2.c b/gcc/testsuite/gcc.target/riscv/shift-and-2.c
index 2c98e50101b..19ce5a60b30 100644
--- a/gcc/testsuite/gcc.target/riscv/shift-and-2.c
+++ b/gcc/testsuite/gcc.target/riscv/shift-and-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target { riscv64*-*-* } } } */
-/* { dg-options "-march=rv64gc -mabi=lp64 -O" } */
+/* { dg-options "-march=rv64gc -mabi=lp64 -O -mlittle-endian" } */

 /* Test for <optab>si3_mask_1.  */
 extern int k;


Marcus Comstedt Feb. 24, 2021, 6:03 p.m. UTC | #5

Hi Kito,

Kito Cheng <kito.cheng@gmail.com> writes:

> I just spent some time on those two testcases; I think they could
> simply be skipped on big-endian.

Well, that sounds like a pretty big cop out.  If the software doesn't
behave like we expect it to, I feel we should at least have some idea
_why_...

>> FAIL: gcc.target/riscv/shift-and-1.c scan-assembler-not andi
>> FAIL: gcc.target/riscv/shift-and-2.c scan-assembler-not andi
>
> However, it seems rv32be still has some strange failures; would you
> mind taking a look at that?

Do you mean in those two test cases specifically?  Or rv32be in
general?

  // Marcus

Marcus Comstedt Feb. 24, 2021, 7:23 p.m. UTC | #6

Hi again.

I've found the reason for the shift-and test fails.

riscv.md does a match on

  (subreg:QI (and:SI ...) 0)

Unfortunately, due to the way "subreg" is defined, this needs to be

  (subreg:QI (and:SI ...) 3)

on big endian.  I can fix the failures by duplicating the rule and
making the one with "0" check !BYTES_BIG_ENDIAN and the one with "3"
check BYTES_BIG_ENDIAN.  But that's a bit heavy handed of course.
I'll try to come up with a solution using subreg_lowpart_p instead of
hardcoding "0" or "3".
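
As a small illustration of why the byte offset changes, the sketch below
models a subreg byte offset as the offset the low part would have if the
inner value were stored in memory.  It is a simplified stand-alone model in
C, not GCC code: the helper names are hypothetical, and it assumes byte and
word endianness agree.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte offset of the least significant outer_size bytes inside a value
   of inner_size bytes (hypothetical helper, simplified model).  */
static size_t lowpart_byte_offset(size_t outer_size, size_t inner_size,
                                  int big_endian)
{
    return big_endian ? inner_size - outer_size : 0;
}

/* Store a 32-bit value into memory in the requested byte order.  */
static void store_u32(unsigned char out[4], uint32_t v, int big_endian)
{
    for (size_t i = 0; i < 4; i++) {
        size_t shift = big_endian ? (3 - i) * 8 : i * 8;
        out[i] = (unsigned char)(v >> shift);
    }
}

/* Fetch the low byte the way (subreg:QI (x:SI) N) would address it.  */
static unsigned char low_byte(uint32_t v, int big_endian)
{
    unsigned char bytes[4];

    store_u32(bytes, v, big_endian);
    return bytes[lowpart_byte_offset(1, 4, big_endian)];
}
```

In this model the QImode low part of an SImode value sits at offset 0 on
little endian and offset 3 on big endian, and low_byte() returns the same
byte in both cases, which is the property a lowpart-based match would rely on.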


  // Marcus

Marcus Comstedt Feb. 26, 2021, 8:46 p.m. UTC | #7

Hi Kito.

I fixed almost all of the rv32be testcase failures simply by taking
endianness into account on the first line of riscv_subword, which is
used for long long handling on 32-bit.
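
To make the word-order point concrete, here is a hedged stand-alone model
in C of picking the high or low 32-bit word out of a 64-bit value.  It is
not the riscv_subword patch itself; WORD_BYTES stands in for UNITS_PER_WORD
on RV32, and all function names are invented for the example.

```c
#include <stdint.h>

#define WORD_BYTES 4u  /* stands in for UNITS_PER_WORD on RV32 */

/* Byte offset of the requested word: the high word comes second on
   little endian and first on big endian.  */
static unsigned subword_byte_offset(int high, int big_endian)
{
    return (!!high != !!big_endian) ? WORD_BYTES : 0u;
}

/* Store a 64-bit value into memory in the requested byte order.  */
static void store_u64(unsigned char out[8], uint64_t v, int big_endian)
{
    for (unsigned i = 0; i < 8; i++) {
        unsigned shift = big_endian ? (7 - i) * 8 : i * 8;
        out[i] = (unsigned char)(v >> shift);
    }
}

/* Load a 32-bit word from memory in the requested byte order.  */
static uint32_t load_u32(const unsigned char *p, int big_endian)
{
    uint32_t v = 0;

    for (unsigned i = 0; i < 4; i++) {
        unsigned shift = big_endian ? (3 - i) * 8 : i * 8;
        v |= (uint32_t)p[i] << shift;
    }
    return v;
}

/* Extract the high or low word via its memory offset.  */
static uint32_t subword(uint64_t v, int high, int big_endian)
{
    unsigned char bytes[8];

    store_u64(bytes, v, big_endian);
    return load_u32(bytes + subword_byte_offset(high, big_endian),
                    big_endian);
}
```

If the offset ignored endianness, the high and low halves would be swapped
on big endian, which is the kind of error that would show up across many
long long testcases at once.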

Now, I only have one failing testcase (which does not also fail on
little endian), and it's a doozy.

The test in question is gcc.c-torture/compile/pr35318.c.  The test in
its entirety is

  
  void
  foo ()
  {
    double x = 4, y;
    __asm__ volatile ("# %0,%1,%2,%3" : "=r,r" (x), "=r,r" (y) : "%0,0" (x), "m,r" (8));
  }


(the asm comment in the first argument was added by me to track what
 the actual assignments were.)

When compiled with -mbig-endian, this results in an ICE:

---8<---
/tmp/pr35318.c: In function 'foo':
/tmp/pr35318.c:9:1: error: unrecognizable insn:
    9 | }
      | ^
(insn 12 24 25 2 (parallel [
            (set (reg:DF 11 a1 [orig:74 x ] [74])
                (asm_operands/v:DF ("# %0,%1,%2,%3") ("=r,r") 0 [
                        (reg:SI 12 a2 [orig:74 x+4 ] [74])
                        (mem/c:DF (plus:SI (reg/f:SI 8 s0)
                                (const_int -40 [0xffffffffffffffd8])) [2 %sfp+-24 S8 A64])
                    ]
                     [
                        (asm_input:DF ("%0,0") /tmp/pr35318.c:8)
                        (asm_input:SI ("m,r") /tmp/pr35318.c:8)
                    ]
                     [] /tmp/pr35318.c:8))
            (set (reg:DF 15 a5 [orig:75 y ] [75])
                (asm_operands/v:DF ("# %0,%1,%2,%3") ("=r,r") 1 [
                        (reg:SI 12 a2 [orig:74 x+4 ] [74])
                        (mem/c:DF (plus:SI (reg/f:SI 8 s0)
                                (const_int -40 [0xffffffffffffffd8])) [2 %sfp+-24 S8 A64])
                    ]
                     [
                        (asm_input:DF ("%0,0") /tmp/pr35318.c:8)
                        (asm_input:SI ("m,r") /tmp/pr35318.c:8)
                    ]
                     [] /tmp/pr35318.c:8))
        ]) "/tmp/pr35318.c":8:3 -1
     (nil))
during RTL pass: reload
dump file: /tmp/pr35318b.txt
/tmp/pr35318.c:9:1: internal compiler error: in extract_constrain_insn, at recog.c:2670
0x101bf90b _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
	../../../riscv-gcc/gcc/rtl-error.c:108
0x101bf953 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
	../../../riscv-gcc/gcc/rtl-error.c:116
0x10a1193f extract_constrain_insn(rtx_insn*)
	../../../riscv-gcc/gcc/recog.c:2670
0x1088fc77 check_rtl
	../../../riscv-gcc/gcc/lra.c:2087
0x108971c7 lra(_IO_FILE*)
	../../../riscv-gcc/gcc/lra.c:2505
0x1082fcb7 do_reload
	../../../riscv-gcc/gcc/ira.c:5827
0x1082fcb7 execute
	../../../riscv-gcc/gcc/ira.c:6013
---8<---

This insn looks extremely similar to one that's in the dump-rtl for
little endian:

---8<---
(insn 12 20 21 2 (parallel [
            (set (reg:DF 13 a3 [orig:74 x ] [74])
                (asm_operands/v:DF ("# %0,%1,%2,%3") ("=r,r") 0 [
                        (reg:SI 13 a3 [orig:74 x ] [74])
                        (mem/c:DF (plus:SI (reg/f:SI 8 s0)
                                (const_int -40 [0xffffffffffffffd8])) [2 %sfp+-24 S8 A64])
                    ]
                     [
                        (asm_input:DF ("%0,0") /tmp/pr35318.c:8)
                        (asm_input:SI ("m,r") /tmp/pr35318.c:8)
                    ]
                     [] /tmp/pr35318.c:8))
            (set (reg:DF 15 a5 [orig:75 y ] [75])
                (asm_operands/v:DF ("# %0,%1,%2,%3") ("=r,r") 1 [
                        (reg:SI 13 a3 [orig:74 x ] [74])
                        (mem/c:DF (plus:SI (reg/f:SI 8 s0)
                                (const_int -40 [0xffffffffffffffd8])) [2 %sfp+-24 S8 A64])
                    ]
                     [
                        (asm_input:DF ("%0,0") /tmp/pr35318.c:8)
                        (asm_input:SI ("m,r") /tmp/pr35318.c:8)
                    ]
                     [] /tmp/pr35318.c:8))
        ]) "/tmp/pr35318.c":8:3 -1
     (nil))
---8<---

So I don't know what's "unrecognizable" about it...

I also don't understand the code that is actually generated in the
little-endian case.

The way I read the asm statement, %2 should be a register (same as %0)
containing the (floating point?) value "4", and %3 should be a memory
location (assuming the first alternative is chosen) containing the
value "8".

However, looking at the generated assembler code, it seems that %2 is
a register (a3) which contains the integer value "8" and %3 is a
memory location (-40(s0)) which contains the floating point value
"4.0".  This seems mixed up.

---8<---
foo:
	addi	sp,sp,-48
	sw	s0,44(sp)
	addi	s0,sp,48
	lui	a5,%hi(.LC0)
	fld	fa5,%lo(.LC0)(a5)
	fsd	fa5,-24(s0)
	fld	fa5,-24(s0)
	li	a5,8
	fsd	fa5,-40(s0)
	mv	a3,a5
 #APP
# 8 "/tmp/pr35318.c" 1
	# a3,a5,a3,-40(s0)
# 0 "" 2
 #NO_APP
	sw	a3,-40(s0)
	sw	a4,-36(s0)
	fld	fa5,-40(s0)
	fsd	fa5,-24(s0)
	sw	a5,-32(s0)
	sw	a6,-28(s0)
	nop
	lw	s0,44(sp)
	addi	sp,sp,48
	jr	ra
	.size	foo, .-foo
	.section	.rodata
	.align	3
.LC0:  # little endian double "4.0"
	.word	0
	.word	1074790400
---8<---

Is this code correct, or is there some deeper issue at play here?
(AFAIU the testcase only checks that the compiler doesn't ICE, not
 that the generated code is correct...)
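
As a side check on the .LC0 constant in the listing, the following C sketch
splits a double into its two 32-bit words.  It assumes double is IEEE 754
binary64 (true for RISC-V, but an assumption of the example); the helper
names are made up.

```c
#include <stdint.h>
#include <string.h>

/* Raw bit pattern of a double, assuming IEEE 754 binary64.  */
static uint64_t double_bits(double d)
{
    uint64_t bits;

    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Least significant 32 bits: emitted first on little endian.  */
static uint32_t low_word(double d)
{
    return (uint32_t)double_bits(d);
}

/* Most significant 32 bits: sign, exponent and top of the mantissa.  */
static uint32_t high_word(double d)
{
    return (uint32_t)(double_bits(d) >> 32);
}
```

For 4.0 the pattern is 0x4010000000000000, so the low word is 0 and the
high word is 0x40100000 = 1074790400, matching the two .word values in
little-endian order.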

If the code generated for LE is bad, I probably should not try to make
BE generate the same thing.  :-/


  // Marcus

Marcus Comstedt March 14, 2021, 9:42 p.m. UTC | #8

Hello again Kito.

I've now delved a bit deeper into the failure of the testcase
gcc.c-torture/compile/pr35318.c on big endian RV32.

The point at which big endian diverges from little endian is where
process_alt_operands() is processing the "%0" constraint.  It calls
operands_match_p(), which succeeds on little endian but fails on
big endian.

On little endian, the two rtx:es passed to operands_match_p are
"r79:DF#0" and "r79:DF", while on big endian they are "r79:DF#4" and
"r79:DF".  While the first operand is different, it's actually saying
the same thing: The subreg with the least significant bits (meaning
the second register in the pair on big endian, and the first register
in the pair on little endian, what with two 32-bit integer registers
being allocated to hold a single 64-bit float).

The helper function lra_constraint_offset(), which is used by
operands_match_p, seems to be intended to handle this discrepancy.  It
contains the code

  if (WORDS_BIG_ENDIAN
      && is_a <scalar_int_mode> (mode, &int_mode)
      && GET_MODE_SIZE (int_mode) > UNITS_PER_WORD)
    return hard_regno_nregs (regno, mode) - 1;

However, in this case the rule does not trigger because the mode of
the second operand (which is the one where an adjustment would be
needed) does not have a scalar_int_mode, it has DFmode.  If I relax
this code to also allow scalar_float_mode, then the operands_match_p
call succeeds also on big endian.  There is still an ICE triggered
further down the line though.
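
The effect of that condition, and of relaxing it, can be sketched with a
simplified stand-alone model in C.  This is not GCC code: the enum and the
function are hypothetical stand-ins, registers are assumed to be one 4-byte
word each, and allow_float models the experimental relaxation described
above rather than any committed change.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for GCC's mode classes.  */
enum mode_class { MODE_SCALAR_INT, MODE_SCALAR_FLOAT };

#define WORD_BYTES 4u  /* stands in for UNITS_PER_WORD on RV32 */

/* Number of word-sized registers needed for a value of this size.  */
static unsigned nregs_for(unsigned size_bytes)
{
    return (size_bytes + WORD_BYTES - 1) / WORD_BYTES;
}

/* Register offset of the least significant word of a multiword value.
   allow_float == false mirrors the quoted condition; true models the
   relaxation to scalar float modes.  */
static unsigned lowpart_reg_offset(enum mode_class cls, unsigned size_bytes,
                                   bool words_big_endian, bool allow_float)
{
    bool class_ok = cls == MODE_SCALAR_INT
                    || (allow_float && cls == MODE_SCALAR_FLOAT);

    if (words_big_endian && class_ok && size_bytes > WORD_BYTES)
        return nregs_for(size_bytes) - 1;
    return 0;
}
```

In this model an 8-byte integer on a words-big-endian target gets offset 1
(its low word is in the second register), while an 8-byte float gets 0
unless the relaxation is enabled, which is the mismatch that makes the
comparison against "r79:DF#4" fail.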

I seem to be finding more questions than answers here.  Questions such
as "is it really correct that the first operand to operands_match_p()
has SImode but the second one has DFmode?", "_should_ the operands
match?", and "why is the least significant half singled out when there
is no computation being performed?".

Given that the code generated for LE seems incorrect, I still suspect
that there is some deeper issue here not related to endianness (but
possibly related to using integer registers for passing floating point
values to/from asm statements) and that it just happens to not cause
an internal error (only bad code) on LE.

How would you like to proceed?  I don't feel confident that I will
find a definitive solution to this issue anytime soon, but it feels
like such a weird special case (who passes 64-bit floats in 32-bit
integer registers to their asm?) that it might be ok to just ignore
it.  If you agree I'll just repost the patchset with the final fix
added (solves all remaining 32-bit testcases save for this one)...


  // Marcus

Kito Cheng March 19, 2021, 4:22 p.m. UTC | #9

Hi Marcus:

Thank you for digging into this issue. I would suggest you send a v4
patch series that is just v3 plus the riscv_subword fix, and get that
merged into master first, then send a separate patch for this issue.
I'm not sure what your fix will be, but I guess it may touch IRA/LRA
code, so a separate patch would be easier to discuss with the other
(non-RISC-V) maintainers.



Maciej W. Rozycki March 22, 2021, 2:36 p.m. UTC | #10

On Sun, 14 Mar 2021, Marcus Comstedt wrote:

> How would you like to proceed?  I don't feel confident that I will
> find a definitive solution to this issue anytime soon, but it feels
> like such a weird special case (who passes 64-bit floats in 32-bit
> integer registers to their asm?) that it might be ok to just ignore
> it.  If you agree I'll just repost the patchset with the final fix
> added (solves all remaining 32-bit testcases save for this one)...

 Soft-float use case?  Also VAX does even for hard float as it does not 
have separate FPRs, but then it is little-endian exclusively too.

 Overall I think this is analogous to `long long' with 32-bit targets, 
though individual psABIs may specify different conventions as to the order 
of the two parts of the FP datum between the registers in such a pair.

  Maciej

Jim Wilson March 23, 2021, 10:52 p.m. UTC | #11

On Fri, Mar 19, 2021 at 9:22 AM Kito Cheng via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

> On Mon, Mar 15, 2021 at 5:42 AM Marcus Comstedt <marcus@mc.pp.se> wrote:
> > I've now delved a bit deeper into the failure of the testcase
> > gcc.c-torture/compile/pr35318.c on big endian RV32.
>

Looking at this testcase, I think this is triggering undefined behavior for
extended asms.

We have an SImode integer constant 8, a DFmode input/output, a 0
constraint that matches the input to output, and then a % commutative
operator that lets us swap operands, except once we swap operands we are
now trying to force SImode and DFmode values to match via the 0 constraint
which is unreasonable, plus a m constraint that then forces an input to
memory.  It works by accident for little-endian because we reload the +0
word of the double and it is still considered the same operand, and it
fails for big-endian by accident because we reload the +4 word of the
double and now it is considered a different operand.

If I change the "8" to "(double)8" or "8.0" then the testcase works for
both big and little endian, as now we have only DFmode values.

I tried ppc-eabi and ppcle-eabi to see what happens there, and the main
difference is that it chooses the 1 alternative in both cases.  However,
for RISC-V, we choose the 0 alternative with operands swapped.  The reason
for this is that we have a DFmode pseudo that wants an FP reg for a load,
and the same pseudo wants a general reg in the asm, and rv32gc does not
have an instruction to move directly between 64-bit FP regs and 32-bit
general regs, so it gets put in memory as the lowest cost option.  That
then leads to the case that alt 0 with swapped operands has the lowest
cost, except this case is the invalid case that tries to match SImode and
DFmode operands with 0 and m constraints and fails.

To summarize, I think that there are two problems here.
1) The testcase is invalid, and can be fixed by changing the "8" to
"(double)8" or "8.0" to ensure that we have a double constant that matches
the type of the other operands.
2) GCC should be giving an error for an asm like this rather than an ICE.
Note that if I manually swap the operands and remove the % I get
void
foo ()
{
  double x = 4, y;
  __asm__ volatile ("" : "=r" (x), "=r" (y) : "0" (8), "m" (x));
}
which fails with an ICE for big-endian ppc exactly the same as it does for
big-endian RISC-V.  We should be generating an error here rather than an
ICE.

Jim
