diff mbox series

[v2] RISC-V: XFAIL pr30957-1.c when loop vectorized with variable factor

Message ID 20231226093445.1860961-1-pan2.li@intel.com
State New
Headers show
Series [v2] RISC-V: XFAIL pr30957-1.c when loop vectorized with variable factor | expand

Commit Message

Li, Pan2 Dec. 26, 2023, 9:34 a.m. UTC
From: Pan Li <pan2.li@intel.com>

This patch would like to XFAIL the test case pr30957-1.c for the RVV when
build the elf with some configurations (list at the end of the log)
It will be vectorized during vect_transform_loop with a variable factor.
It won't benefit from unrolling/peeling and mark the loop->unroll as 1.
Of course, it will do nothing during unroll_loops when loop->unroll is 1.

The aarch64_sve may have the similar issue but it initialize the const
`0.0 / -5.0` in the test file to `+0.0` before pass to the function foo.
Then it will pass the execution test.

aarch64:
movi    v0.2s, #0x0
stp     x29, x30, [sp, #-16]!
mov     w0, #0xa
mov     x29, sp
bl      400280 <foo> <== s0 is +0.0

Unfortunately, the riscv initialize the the const `0.0 / -5.0` to the
`-0.0`, and then pass it to the function foo. Of course it the execution
test will fail.

riscv:
flw     fa0,388(gp) # 1299c <__SDATA_BEGIN__+0x4>
addi    sp,sp,-16
li      a0,10
sd      ra,8(sp)
jal     101fc <foo>  <== fa0 is -0.0

After this patch the loops vectorized with a variable factor of the RVV
will be treated as XFAIL by the tree dump when riscv_v and
variable_vect_length.

The below configurations are validated as XFAIL for RV64.

* riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m2/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m4/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m8/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=dynamic/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl256b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl256b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m2/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl256b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m4/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl256b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m8/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl256b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=dynamic/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl512b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl512b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m2/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl512b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m4/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl512b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m8/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl512b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=dynamic/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl1024b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl1024b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m2/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl1024b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m4/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl1024b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m8/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl1024b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=dynamic/--param=riscv-autovec-preference=fixed-vlmax

gcc/testsuite/ChangeLog:

	* gcc.dg/pr30957-1.c: Add XFAIL for RVV when vectorized with
	variable length.

Signed-off-by: Pan Li <pan2.li@intel.com>
---
 gcc/testsuite/gcc.dg/pr30957-1.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Jeff Law Dec. 28, 2023, 4:39 p.m. UTC | #1
On 12/26/23 02:34, pan2.li@intel.com wrote:
> From: Pan Li <pan2.li@intel.com>
> 
> This patch would like to XFAIL the test case pr30957-1.c for the RVV when
> build the elf with some configurations (list at the end of the log)
> It will be vectorized during vect_transform_loop with a variable factor.
> It won't benefit from unrolling/peeling and mark the loop->unroll as 1.
> Of course, it will do nothing during unroll_loops when loop->unroll is 1.
> 
> The aarch64_sve may have the similar issue but it initialize the const
> `0.0 / -5.0` in the test file to `+0.0` before pass to the function foo.
> Then it will pass the execution test.
> 
> aarch64:
> movi    v0.2s, #0x0
> stp     x29, x30, [sp, #-16]!
> mov     w0, #0xa
> mov     x29, sp
> bl      400280 <foo> <== s0 is +0.0
> 
> Unfortunately, the riscv initialize the the const `0.0 / -5.0` to the
> `-0.0`, and then pass it to the function foo. Of course it the execution
> test will fail.
> 
> riscv:
> flw     fa0,388(gp) # 1299c <__SDATA_BEGIN__+0x4>
> addi    sp,sp,-16
> li      a0,10
> sd      ra,8(sp)
> jal     101fc <foo>  <== fa0 is -0.0
> 
> After this patch the loops vectorized with a variable factor of the RVV
> will be treated as XFAIL by the tree dump when riscv_v and
> variable_vect_length.
> 
> The below configurations are validated as XFAIL for RV64.
Interesting.  So I'd actually peel one more layer off this onion.  Why 
do the aarch64 and riscv targets generate different constants (0.0 vs 
-0.0)?

Is it possible that the aarch64 is generating 0.0 when asked for -0.0 
and -fno-signed-zeros is in effect?  That's a valid thing to do when 
-fno-signed-zeros is on.  Look for HONOR_SIGNED_ZEROs in the aarch64 
backend.



Jeff
Li, Pan2 Dec. 29, 2023, 12:42 a.m. UTC | #2
Thanks Jeff for comments, and Happy new year!

> Interesting.  So I'd actually peel one more layer off this onion.  Why 
> do the aarch64 and riscv targets generate different constants (0.0 vs 
> -0.0)?

Yeah, it surprise me too when debugging the foo function. But didn't dig into it in previous as it may be unrelated to vectorize.

> Is it possible that the aarch64 is generating 0.0 when asked for -0.0 
> and -fno-signed-zeros is in effect?  That's a valid thing to do when 
> -fno-signed-zeros is on.  Look for HONOR_SIGNED_ZEROs in the aarch64 
> backend.

Sure, will have a try for making the -0.0 happen in aarch64.

Pan


-----Original Message-----
From: Jeff Law <jeffreyalaw@gmail.com> 
Sent: Friday, December 29, 2023 12:39 AM
To: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com; richard.guenther@gmail.com
Subject: Re: [PATCH v2] RISC-V: XFAIL pr30957-1.c when loop vectorized with variable factor



On 12/26/23 02:34, pan2.li@intel.com wrote:
> From: Pan Li <pan2.li@intel.com>
> 
> This patch would like to XFAIL the test case pr30957-1.c for the RVV when
> build the elf with some configurations (list at the end of the log)
> It will be vectorized during vect_transform_loop with a variable factor.
> It won't benefit from unrolling/peeling and mark the loop->unroll as 1.
> Of course, it will do nothing during unroll_loops when loop->unroll is 1.
> 
> The aarch64_sve may have the similar issue but it initialize the const
> `0.0 / -5.0` in the test file to `+0.0` before pass to the function foo.
> Then it will pass the execution test.
> 
> aarch64:
> movi    v0.2s, #0x0
> stp     x29, x30, [sp, #-16]!
> mov     w0, #0xa
> mov     x29, sp
> bl      400280 <foo> <== s0 is +0.0
> 
> Unfortunately, the riscv initialize the the const `0.0 / -5.0` to the
> `-0.0`, and then pass it to the function foo. Of course it the execution
> test will fail.
> 
> riscv:
> flw     fa0,388(gp) # 1299c <__SDATA_BEGIN__+0x4>
> addi    sp,sp,-16
> li      a0,10
> sd      ra,8(sp)
> jal     101fc <foo>  <== fa0 is -0.0
> 
> After this patch the loops vectorized with a variable factor of the RVV
> will be treated as XFAIL by the tree dump when riscv_v and
> variable_vect_length.
> 
> The below configurations are validated as XFAIL for RV64.
Interesting.  So I'd actually peel one more layer off this onion.  Why 
do the aarch64 and riscv targets generate different constants (0.0 vs 
-0.0)?

Is it possible that the aarch64 is generating 0.0 when asked for -0.0 
and -fno-signed-zeros is in effect?  That's a valid thing to do when 
-fno-signed-zeros is on.  Look for HONOR_SIGNED_ZEROs in the aarch64 
backend.



Jeff
Jeff Law Dec. 29, 2023, 1:03 a.m. UTC | #3
On 12/28/23 17:42, Li, Pan2 wrote:
> Thanks Jeff for comments, and Happy new year!
> 
>> Interesting.  So I'd actually peel one more layer off this onion.  Why
>> do the aarch64 and riscv targets generate different constants (0.0 vs
>> -0.0)?
> 
> Yeah, it surprise me too when debugging the foo function. But didn't dig into it in previous as it may be unrelated to vectorize.
> 
>> Is it possible that the aarch64 is generating 0.0 when asked for -0.0
>> and -fno-signed-zeros is in effect?  That's a valid thing to do when
>> -fno-signed-zeros is on.  Look for HONOR_SIGNED_ZEROs in the aarch64
>> backend.
> 
> Sure, will have a try for making the -0.0 happen in aarch64.
I would first look at the .optimized dump, then I'd look at the .final 
dump alongside the resulting assembly for aarch64.

I bet we're going to find that the aarch64 target internally converts 
-0.0 to 0.0 when we're not honoring signed zeros.

jeff
Li, Pan2 Dec. 29, 2023, 5:56 a.m. UTC | #4
Thanks Jeff.

I think I locate where aarch64 performs the trick here.

1. In the .final we have rtl like

(insn:TI 6 8 29 (set (reg:SF 32 v0)
        (const_double:SF -0.0 [-0x0.0p+0])) "/home/box/panli/gnu-toolchain/gcc/gcc/testsuite/gcc.dg/pr30957-1.c":31:7 79 {*movsf_aarch64}
     (nil))

2. the movsf_aarch64 comes from the aarch64.md file similar to the below rtl. Aka, it will generate movi\t%0.2s, #0 if
the aarch64_reg_or_fp_zero is true.

1640 (define_insn "*mov<mode>_aarch64"
1641   [(set (match_operand:SFD 0 "nonimmediate_operand")
1642       match_operand:SFD 1 "general_operand"))]
1643   "TARGET_FLOAT && (register_operand (operands[0], <MODE>mode)
1644     || aarch64_reg_or_fp_zero (operands[1], <MODE>mode))"
1645   {@ [ cons: =0 , 1   ; attrs: type , arch  ]
1646      [ w        , Y   ; neon_move   , simd  ] movi\t%0.2s, #0

3. Then we will have aarch64_float_const_zero_rtx_p here, and the -0.0 input rtl will return true in line 10873 because of no-signed-zero is given.

10863 bool
10864 aarch64_float_const_zero_rtx_p (rtx x
10865 {
10866   /* 0.0 in Decimal Floating Point cannot be represented by #0 or
10867      zr as our callers expect, so no need to check the actual
10868      value if X is of Decimal Floating Point type.  */
10869   if (GET_MODE_CLASS (GET_MODE (x)) == MODE_DECIMAL_FLOAT)
10870     return false;
10871
10872   if (REAL_VALUE_MINUS_ZERO (*CONST_DOUBLE_REAL_VALUE (x)))
10873     return !HONOR_SIGNED_ZEROS (GET_MODE (x));
10874   return real_equal (CONST_DOUBLE_REAL_VALUE (x), &dconst0);
10875 }

I think that explain why we have +0.0 in aarch64 here.

Pan

-----Original Message-----
From: Jeff Law <jeffreyalaw@gmail.com> 
Sent: Friday, December 29, 2023 9:04 AM
To: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com; richard.guenther@gmail.com
Subject: Re: [PATCH v2] RISC-V: XFAIL pr30957-1.c when loop vectorized with variable factor



On 12/28/23 17:42, Li, Pan2 wrote:
> Thanks Jeff for comments, and Happy new year!
> 
>> Interesting.  So I'd actually peel one more layer off this onion.  Why
>> do the aarch64 and riscv targets generate different constants (0.0 vs
>> -0.0)?
> 
> Yeah, it surprise me too when debugging the foo function. But didn't dig into it in previous as it may be unrelated to vectorize.
> 
>> Is it possible that the aarch64 is generating 0.0 when asked for -0.0
>> and -fno-signed-zeros is in effect?  That's a valid thing to do when
>> -fno-signed-zeros is on.  Look for HONOR_SIGNED_ZEROs in the aarch64
>> backend.
> 
> Sure, will have a try for making the -0.0 happen in aarch64.
I would first look at the .optimized dump, then I'd look at the .final 
dump alongside the resulting assembly for aarch64.

I bet we're going to find that the aarch64 target internally converts 
-0.0 to 0.0 when we're not honoring signed zeros.

jeff
Jeff Law Dec. 30, 2023, 3:13 a.m. UTC | #5
On 12/28/23 22:56, Li, Pan2 wrote:
> Thanks Jeff.
> 
> I think I locate where aarch64 performs the trick here.
> 
> 1. In the .final we have rtl like
> 
> (insn:TI 6 8 29 (set (reg:SF 32 v0)
>          (const_double:SF -0.0 [-0x0.0p+0])) "/home/box/panli/gnu-toolchain/gcc/gcc/testsuite/gcc.dg/pr30957-1.c":31:7 79 {*movsf_aarch64}
>       (nil))
> 
> 2. the movsf_aarch64 comes from the aarch64.md file similar to the below rtl. Aka, it will generate movi\t%0.2s, #0 if
> the aarch64_reg_or_fp_zero is true.
> 
> 1640 (define_insn "*mov<mode>_aarch64"
> 1641   [(set (match_operand:SFD 0 "nonimmediate_operand")
> 1642       match_operand:SFD 1 "general_operand"))]
> 1643   "TARGET_FLOAT && (register_operand (operands[0], <MODE>mode)
> 1644     || aarch64_reg_or_fp_zero (operands[1], <MODE>mode))"
> 1645   {@ [ cons: =0 , 1   ; attrs: type , arch  ]
> 1646      [ w        , Y   ; neon_move   , simd  ] movi\t%0.2s, #0
> 
> 3. Then we will have aarch64_float_const_zero_rtx_p here, and the -0.0 input rtl will return true in line 10873 because of no-signed-zero is given.
> 
> 10863 bool
> 10864 aarch64_float_const_zero_rtx_p (rtx x
> 10865 {
> 10866   /* 0.0 in Decimal Floating Point cannot be represented by #0 or
> 10867      zr as our callers expect, so no need to check the actual
> 10868      value if X is of Decimal Floating Point type.  */
> 10869   if (GET_MODE_CLASS (GET_MODE (x)) == MODE_DECIMAL_FLOAT)
> 10870     return false;
> 10871
> 10872   if (REAL_VALUE_MINUS_ZERO (*CONST_DOUBLE_REAL_VALUE (x)))
> 10873     return !HONOR_SIGNED_ZEROS (GET_MODE (x));
> 10874   return real_equal (CONST_DOUBLE_REAL_VALUE (x), &dconst0);
> 10875 }
> 
> I think that explain why we have +0.0 in aarch64 here.
Yup.  Thanks a ton for diving into this.  So I think that points us to 
the right fix, specifically we should be turning -0.0 into 0.0 when 
!HONOR_SIGNED_ZEROS rather than xfailing the test.

I think we'd need to adjust reg_or_0_operand and riscv_output_move, 
probably the G constraint as well.   We might also need to adjust 
move_operand and perhaps riscv_legitimize_move.

jeff
Li, Pan2 Jan. 1, 2024, 8:56 a.m. UTC | #6
Thanks Jeff for the confirmation and suggestions. It looks like not a corner case for the option no-signed-zero.
Consider 2 sample function as below with build with option " -march=rv64gcv -mabi=lp64d -O2 -fno-signed-zeros".

void
__attribute__ ((noinline))
test_float_zero_assign_0 (float *a)
{
  *a = +0.0;
}

void
__attribute__ ((noinline))
test_float_zero_assign_1 (float *a)
{
  *a = -0.0;
}

For the first one (aka float 0.0) we have rtl as below:
(insn 6 3 0 2 (set (mem:SF (reg/v/f:DI 134 [ a ]) [1 *a_2(D)+0 S4 A32])
        (const_double:SF 0.0 [0x0.0p+0])) "test.c":14:6 -1
     (nil))

But for the second one (aka float -0.0 with no-signed-zero) we have rtl as below but we expect const_double -0.0 here.
(insn 6 3 7 2 (set (reg:DI 135
        (high:DI (symbol_ref/u:DI ("*.LC0") [flags 0x82]))) "test.c":21:6 -1
     (nil))
(insn 7 6 8 2 (set (reg:SF 136)
        (mem/u/c:SF (lo_sum:DI (reg:DI 135)
                (symbol_ref/u:DI ("*.LC0") [flags 0x82])) [0  S4 A32])) "test.c":21:6 -1
     (nil))

I will have a try to fix it in V3.

Pan

-----Original Message-----
From: Jeff Law <jeffreyalaw@gmail.com> 
Sent: Saturday, December 30, 2023 11:14 AM
To: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com; richard.guenther@gmail.com
Subject: Re: [PATCH v2] RISC-V: XFAIL pr30957-1.c when loop vectorized with variable factor



On 12/28/23 22:56, Li, Pan2 wrote:
> Thanks Jeff.
> 
> I think I locate where aarch64 performs the trick here.
> 
> 1. In the .final we have rtl like
> 
> (insn:TI 6 8 29 (set (reg:SF 32 v0)
>          (const_double:SF -0.0 [-0x0.0p+0])) "/home/box/panli/gnu-toolchain/gcc/gcc/testsuite/gcc.dg/pr30957-1.c":31:7 79 {*movsf_aarch64}
>       (nil))
> 
> 2. the movsf_aarch64 comes from the aarch64.md file similar to the below rtl. Aka, it will generate movi\t%0.2s, #0 if
> the aarch64_reg_or_fp_zero is true.
> 
> 1640 (define_insn "*mov<mode>_aarch64"
> 1641   [(set (match_operand:SFD 0 "nonimmediate_operand")
> 1642       match_operand:SFD 1 "general_operand"))]
> 1643   "TARGET_FLOAT && (register_operand (operands[0], <MODE>mode)
> 1644     || aarch64_reg_or_fp_zero (operands[1], <MODE>mode))"
> 1645   {@ [ cons: =0 , 1   ; attrs: type , arch  ]
> 1646      [ w        , Y   ; neon_move   , simd  ] movi\t%0.2s, #0
> 
> 3. Then we will have aarch64_float_const_zero_rtx_p here, and the -0.0 input rtl will return true in line 10873 because of no-signed-zero is given.
> 
> 10863 bool
> 10864 aarch64_float_const_zero_rtx_p (rtx x
> 10865 {
> 10866   /* 0.0 in Decimal Floating Point cannot be represented by #0 or
> 10867      zr as our callers expect, so no need to check the actual
> 10868      value if X is of Decimal Floating Point type.  */
> 10869   if (GET_MODE_CLASS (GET_MODE (x)) == MODE_DECIMAL_FLOAT)
> 10870     return false;
> 10871
> 10872   if (REAL_VALUE_MINUS_ZERO (*CONST_DOUBLE_REAL_VALUE (x)))
> 10873     return !HONOR_SIGNED_ZEROS (GET_MODE (x));
> 10874   return real_equal (CONST_DOUBLE_REAL_VALUE (x), &dconst0);
> 10875 }
> 
> I think that explain why we have +0.0 in aarch64 here.
Yup.  Thanks a ton for diving into this.  So I think that points us to 
the right fix, specifically we should be turning -0.0 into 0.0 when 
!HONOR_SIGNED_ZEROS rather than xfailing the test.

I think we'd need to adjust reg_or_0_operand and riscv_output_move, 
probably the G constraint as well.   We might also need to adjust 
move_operand and perhaps riscv_legitimize_move.

jeff
diff mbox series

Patch

diff --git a/gcc/testsuite/gcc.dg/pr30957-1.c b/gcc/testsuite/gcc.dg/pr30957-1.c
index 564410913ab..7a7242ec16d 100644
--- a/gcc/testsuite/gcc.dg/pr30957-1.c
+++ b/gcc/testsuite/gcc.dg/pr30957-1.c
@@ -3,7 +3,7 @@ 
    where each addition is a library call.  /
 /* { dg-require-effective-target hard_float } */
 /* -fassociative-math requires -fno-trapping-math and -fno-signed-zeros. */
-/* { dg-options "-O2 -funroll-loops -fassociative-math -fno-trapping-math -fno-signed-zeros -fvariable-expansion-in-unroller -fdump-rtl-loop2_unroll" } */
+/* { dg-options "-O2 -funroll-loops -fassociative-math -fno-trapping-math -fno-signed-zeros -fvariable-expansion-in-unroller -fdump-rtl-loop2_unroll -fdump-tree-vect-details" } */
 
 extern void abort (void);
 extern void exit (int);
@@ -34,3 +34,4 @@  main ()
 }
 
 /* { dg-final { scan-rtl-dump "Expanding Accumulator" "loop2_unroll" { xfail mmix-*-* } } } */
+/* { dg-final { scan-tree-dump-times "Disabling unrolling due to variable-length vectorization factor" 1 "vect" { target { riscv_v } xfail { vect_variable_length } } } } */