diff mbox series

[v1] LoongArch: Fixed an issue with the implementation of the template atomic_compare_and_swapsi.

Message ID 20240307011252.11808-1-chenglulu@loongson.cn
State New
Headers show
Series [v1] LoongArch: Fixed an issue with the implementation of the template atomic_compare_and_swapsi. | expand

Commit Message

Lulu Cheng March 7, 2024, 1:12 a.m. UTC
If the hardware does not support LAMCAS, atomic_compare_and_swapsi needs to be
implemented through "ll.w+sc.w". In the implementation of the instruction sequence,
it is necessary to determine whether the two registers are equal.
Since LoongArch's comparison instructions do not distinguish between 32-bit
and 64-bit, the two operand registers that need to be compared are symbolically
extended, and one of the operand registers is obtained from memory through the
"ll.w" instruction, which can ensure that the symbolic expansion is carried out.
However, the value of the other operand register is not guaranteed to be the
value of the sign extension.

gcc/ChangeLog:

	* config/loongarch/sync.md (atomic_cas_value_strong<mode>):
	In loongarch64, a sign extension operation is added when
	operands[2] is a register operand and the mode is SImode.

gcc/testsuite/ChangeLog:

	* g++.target/loongarch/atomic-cas-int.C: New test.
---
 gcc/config/loongarch/sync.md                  | 46 ++++++++++++++-----
 .../g++.target/loongarch/atomic-cas-int.C     | 32 +++++++++++++
 2 files changed, 67 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/loongarch/atomic-cas-int.C

Comments

Xi Ruoyao March 7, 2024, 12:52 p.m. UTC | #1
On Thu, 2024-03-07 at 09:12 +0800, Lulu Cheng wrote:

> +  output_asm_insn ("1:", operands);
> +  output_asm_insn ("ll.<amo>\t%0,%1", operands);
> +
> +  /* Like the test case atomic-cas-int.C, in loongarch64, O1 and higher, the
> +     return value of the val_without_const_folding will not be truncated and
> +     will be passed directly to the function compare_exchange_strong.
> +     However, the instruction 'bne' does not distinguish between 32-bit and
> +     64-bit operations.  so if the upper 32 bits of the register are not
> +     extended by the 32nd bit symbol, then the comparison may not be valid
> +     here.  This will affect the result of the operation.  */
> +
> +  if (TARGET_64BIT && REG_P (operands[2])
> +      && GET_MODE (operands[2]) == SImode)
> +    {
> +      output_asm_insn ("addi.w\t%5,%2,0", operands);
> +      output_asm_insn ("bne\t%0,%5,2f", operands);

It should be better to extend the expected value before the ll/sc loop
(like what LLVM does), instead of repeating the extending in each
iteration.  Something like:

diff --git a/gcc/config/loongarch/sync.md b/gcc/config/loongarch/sync.md
index 8f35a5b48d2..c21781947fd 100644
--- a/gcc/config/loongarch/sync.md
+++ b/gcc/config/loongarch/sync.md
@@ -234,11 +234,11 @@ (define_insn "atomic_exchange<mode>_short"
   "amswap%A3.<amo>\t%0,%z2,%1"
   [(set (attr "length") (const_int 4))])
 
-(define_insn "atomic_cas_value_strong<mode>"
+(define_insn "atomic_cas_value_strong<GPR:mode><X:mode>"
   [(set (match_operand:GPR 0 "register_operand" "=&r")
 	(match_operand:GPR 1 "memory_operand" "+ZC"))
    (set (match_dup 1)
-	(unspec_volatile:GPR [(match_operand:GPR 2 "reg_or_0_operand" "rJ")
+	(unspec_volatile:GPR [(match_operand:X 2 "reg_or_0_operand" "rJ")
 			      (match_operand:GPR 3 "reg_or_0_operand" "rJ")
 			      (match_operand:SI 4 "const_int_operand")]  ;; mod_s
 	 UNSPEC_COMPARE_AND_SWAP))
@@ -246,10 +246,10 @@ (define_insn "atomic_cas_value_strong<mode>"
   ""
 {
   return "1:\\n\\t"
-	 "ll.<amo>\\t%0,%1\\n\\t"
+	 "ll.<GPR:amo>\\t%0,%1\\n\\t"
 	 "bne\\t%0,%z2,2f\\n\\t"
 	 "or%i3\\t%5,$zero,%3\\n\\t"
-	 "sc.<amo>\\t%5,%1\\n\\t"
+	 "sc.<GPR:amo>\\t%5,%1\\n\\t"
 	 "beqz\\t%5,1b\\n\\t"
 	 "b\\t3f\\n\\t"
 	 "2:\\n\\t"
@@ -301,9 +301,23 @@ (define_expand "atomic_compare_and_swap<mode>"
 							 operands[3], operands[4],
 							 operands[6]));
   else
-    emit_insn (gen_atomic_cas_value_strong<mode> (operands[1], operands[2],
-						  operands[3], operands[4],
-						  operands[6]));
+    {
+      rtx (*cas)(rtx, rtx, rtx, rtx, rtx) =
+	TARGET_64BIT ? gen_atomic_cas_value_strong<mode>di
+		     : gen_atomic_cas_value_strong<mode>si;
+      rtx expect = operands[3];
+
+      if (<MODE>mode == SImode
+	  && TARGET_64BIT
+	  && operands[3] != const0_rtx)
+	{
+	  expect = gen_reg_rtx (DImode);
+	  emit_insn (gen_extendsidi2 (expect, operands[3]));
+	}
+
+      emit_insn (cas (operands[1], operands[2], expect, operands[4],
+		      operands[6]));
+    }
 
   rtx compare = operands[1];
   if (operands[3] != const0_rtx)

It produces:

        slli.w  $r4,$r4,0
        1:
        ll.w    $r14,$r3,0
        bne     $r14,$r4,2f
        or      $r15,$zero,$r12
        sc.w    $r15,$r3,0
        beqz    $r15,1b
        b       3f
        2:
        dbar    0b10100
        3:

for the test case and the compiled test case runs successfully.  I've
not done a full bootstrap yet though.
Lulu Cheng March 7, 2024, 1:07 p.m. UTC | #2
在 2024/3/7 下午8:52, Xi Ruoyao 写道:
> It should be better to extend the expected value before the ll/sc loop
> (like what LLVM does), instead of repeating the extending in each
> iteration.  Something like:

I wanted to do this at first, but it didn't work out.

But then I thought about it, and there are two benefits to putting it in 
the middle of ll/sc:

1. If there is an operation that uses the $r4 register after this atomic 
operation, another

register is required to store $r4.

2. ll.w requires long cycles, so putting an addi.w command after ll.w 
won't make a difference.

So based on the above, I didn't try again, but directly made a 
modification like a patch.
Xi Ruoyao March 8, 2024, 6:22 a.m. UTC | #3
On Thu, 2024-03-07 at 21:07 +0800, chenglulu wrote:
> 
> 在 2024/3/7 下午8:52, Xi Ruoyao 写道:
> > It should be better to extend the expected value before the ll/sc loop
> > (like what LLVM does), instead of repeating the extending in each
> > iteration.  Something like:
> 
> I wanted to do this at first, but it didn't work out.
> 
> But then I thought about it, and there are two benefits to putting it in 
> the middle of ll/sc:
> 
> 1. If there is an operation that uses the $r4 register after this atomic 
> operation, another
> 
> register is required to store $r4.
> 
> 2. ll.w requires long cycles, so putting an addi.w command after ll.w 
> won't make a difference.
> 
> So based on the above, I didn't try again, but directly made a 
> modification like a patch.

Ah, the explanation makes sense to me.  Ok with the original patch then.
Lulu Cheng March 9, 2024, 1:06 a.m. UTC | #4
在 2024/3/8 下午2:22, Xi Ruoyao 写道:
> On Thu, 2024-03-07 at 21:07 +0800, chenglulu wrote:
>> 在 2024/3/7 下午8:52, Xi Ruoyao 写道:
>>> It should be better to extend the expected value before the ll/sc loop
>>> (like what LLVM does), instead of repeating the extending in each
>>> iteration.  Something like:
>> I wanted to do this at first, but it didn't work out.
>>
>> But then I thought about it, and there are two benefits to putting it in
>> the middle of ll/sc:
>>
>> 1. If there is an operation that uses the $r4 register after this atomic
>> operation, another
>>
>> register is required to store $r4.
>>
>> 2. ll.w requires long cycles, so putting an addi.w command after ll.w
>> won't make a difference.
>>
>> So based on the above, I didn't try again, but directly made a
>> modification like a patch.
> Ah, the explanation makes sense to me.  Ok with the original patch then.
Ok.Thank you so much!:-)
Lulu Cheng March 9, 2024, 1:48 a.m. UTC | #5
Pushed to r14-9407.

在 2024/3/7 上午9:12, Lulu Cheng 写道:
> If the hardware does not support LAMCAS, atomic_compare_and_swapsi needs to be
> implemented through "ll.w+sc.w". In the implementation of the instruction sequence,
> it is necessary to determine whether the two registers are equal.
> Since LoongArch's comparison instructions do not distinguish between 32-bit
> and 64-bit, the two operand registers that need to be compared are symbolically
> extended, and one of the operand registers is obtained from memory through the
> "ll.w" instruction, which can ensure that the symbolic expansion is carried out.
> However, the value of the other operand register is not guaranteed to be the
> value of the sign extension.
>
> gcc/ChangeLog:
>
> 	* config/loongarch/sync.md (atomic_cas_value_strong<mode>):
> 	In loongarch64, a sign extension operation is added when
> 	operands[2] is a register operand and the mode is SImode.
>
> gcc/testsuite/ChangeLog:
>
> 	* g++.target/loongarch/atomic-cas-int.C: New test.
> ---
>   gcc/config/loongarch/sync.md                  | 46 ++++++++++++++-----
>   .../g++.target/loongarch/atomic-cas-int.C     | 32 +++++++++++++
>   2 files changed, 67 insertions(+), 11 deletions(-)
>   create mode 100644 gcc/testsuite/g++.target/loongarch/atomic-cas-int.C
>
> diff --git a/gcc/config/loongarch/sync.md b/gcc/config/loongarch/sync.md
> index 8f35a5b48d2..d41c2d26811 100644
> --- a/gcc/config/loongarch/sync.md
> +++ b/gcc/config/loongarch/sync.md
> @@ -245,18 +245,42 @@ (define_insn "atomic_cas_value_strong<mode>"
>      (clobber (match_scratch:GPR 5 "=&r"))]
>     ""
>   {
> -  return "1:\\n\\t"
> -	 "ll.<amo>\\t%0,%1\\n\\t"
> -	 "bne\\t%0,%z2,2f\\n\\t"
> -	 "or%i3\\t%5,$zero,%3\\n\\t"
> -	 "sc.<amo>\\t%5,%1\\n\\t"
> -	 "beqz\\t%5,1b\\n\\t"
> -	 "b\\t3f\\n\\t"
> -	 "2:\\n\\t"
> -	 "%G4\\n\\t"
> -	 "3:\\n\\t";
> +  output_asm_insn ("1:", operands);
> +  output_asm_insn ("ll.<amo>\t%0,%1", operands);
> +
> +  /* Like the test case atomic-cas-int.C, in loongarch64, O1 and higher, the
> +     return value of the val_without_const_folding will not be truncated and
> +     will be passed directly to the function compare_exchange_strong.
> +     However, the instruction 'bne' does not distinguish between 32-bit and
> +     64-bit operations.  so if the upper 32 bits of the register are not
> +     extended by the 32nd bit symbol, then the comparison may not be valid
> +     here.  This will affect the result of the operation.  */
> +
> +  if (TARGET_64BIT && REG_P (operands[2])
> +      && GET_MODE (operands[2]) == SImode)
> +    {
> +      output_asm_insn ("addi.w\t%5,%2,0", operands);
> +      output_asm_insn ("bne\t%0,%5,2f", operands);
> +    }
> +  else
> +    output_asm_insn ("bne\t%0,%z2,2f", operands);
> +
> +  output_asm_insn ("or%i3\t%5,$zero,%3", operands);
> +  output_asm_insn ("sc.<amo>\t%5,%1", operands);
> +  output_asm_insn ("beqz\t%5,1b", operands);
> +  output_asm_insn ("b\t3f", operands);
> +  output_asm_insn ("2:", operands);
> +  output_asm_insn ("%G4", operands);
> +  output_asm_insn ("3:", operands);
> +
> +  return "";
>   }
> -  [(set (attr "length") (const_int 28))])
> +  [(set (attr "length")
> +     (if_then_else
> +	(and (match_test "GET_MODE (operands[2]) == SImode")
> +	     (match_test "REG_P (operands[2])"))
> +	(const_int 32)
> +	(const_int 28)))])
>   
>   (define_insn "atomic_cas_value_strong<mode>_amcas"
>     [(set (match_operand:QHWD 0 "register_operand" "=&r")
> diff --git a/gcc/testsuite/g++.target/loongarch/atomic-cas-int.C b/gcc/testsuite/g++.target/loongarch/atomic-cas-int.C
> new file mode 100644
> index 00000000000..830ce48267a
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/loongarch/atomic-cas-int.C
> @@ -0,0 +1,32 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2" } */
> +
> +#include <atomic>
> +#include <cstdio>
> +
> +__attribute__ ((noinline)) long
> +val_without_const_folding (long val)
> +{
> +  return val;
> +}
> +
> +int
> +main ()
> +{
> +  int oldval = 0xaa;
> +  int newval = 0xbb;
> +  std::atomic<int> amo;
> +
> +  amo.store (oldval);
> +
> +  long longval = val_without_const_folding (0xff80000000000000 + oldval);
> +  oldval = static_cast<int> (longval);
> +
> +  amo.compare_exchange_strong (oldval, newval);
> +
> +  if (newval != amo.load (std::memory_order_relaxed))
> +    __builtin_abort ();
> +
> +  return 0;
> +}
> +
Lulu Cheng March 9, 2024, 7:27 a.m. UTC | #6
在 2024/3/9 上午9:48, chenglulu 写道:
> Pushed to r14-9407.
Cherry picked to r13-8413 and r12-10200.
>
> 在 2024/3/7 上午9:12, Lulu Cheng 写道:
>> If the hardware does not support LAMCAS, atomic_compare_and_swapsi 
>> needs to be
>> implemented through "ll.w+sc.w". In the implementation of the 
>> instruction sequence,
>> it is necessary to determine whether the two registers are equal.
>> Since LoongArch's comparison instructions do not distinguish between 
>> 32-bit
>> and 64-bit, the two operand registers that need to be compared are 
>> symbolically
>> extended, and one of the operand registers is obtained from memory 
>> through the
>> "ll.w" instruction, which can ensure that the symbolic expansion is 
>> carried out.
>> However, the value of the other operand register is not guaranteed to 
>> be the
>> value of the sign extension.
>>
>> gcc/ChangeLog:
>>
>>     * config/loongarch/sync.md (atomic_cas_value_strong<mode>):
>>     In loongarch64, a sign extension operation is added when
>>     operands[2] is a register operand and the mode is SImode.
>>
>> gcc/testsuite/ChangeLog:
>>
>>     * g++.target/loongarch/atomic-cas-int.C: New test.
>> ---
>>   gcc/config/loongarch/sync.md                  | 46 ++++++++++++++-----
>>   .../g++.target/loongarch/atomic-cas-int.C     | 32 +++++++++++++
>>   2 files changed, 67 insertions(+), 11 deletions(-)
>>   create mode 100644 gcc/testsuite/g++.target/loongarch/atomic-cas-int.C
>>
>> diff --git a/gcc/config/loongarch/sync.md b/gcc/config/loongarch/sync.md
>> index 8f35a5b48d2..d41c2d26811 100644
>> --- a/gcc/config/loongarch/sync.md
>> +++ b/gcc/config/loongarch/sync.md
>> @@ -245,18 +245,42 @@ (define_insn "atomic_cas_value_strong<mode>"
>>      (clobber (match_scratch:GPR 5 "=&r"))]
>>     ""
>>   {
>> -  return "1:\\n\\t"
>> -     "ll.<amo>\\t%0,%1\\n\\t"
>> -     "bne\\t%0,%z2,2f\\n\\t"
>> -     "or%i3\\t%5,$zero,%3\\n\\t"
>> -     "sc.<amo>\\t%5,%1\\n\\t"
>> -     "beqz\\t%5,1b\\n\\t"
>> -     "b\\t3f\\n\\t"
>> -     "2:\\n\\t"
>> -     "%G4\\n\\t"
>> -     "3:\\n\\t";
>> +  output_asm_insn ("1:", operands);
>> +  output_asm_insn ("ll.<amo>\t%0,%1", operands);
>> +
>> +  /* Like the test case atomic-cas-int.C, in loongarch64, O1 and 
>> higher, the
>> +     return value of the val_without_const_folding will not be 
>> truncated and
>> +     will be passed directly to the function compare_exchange_strong.
>> +     However, the instruction 'bne' does not distinguish between 
>> 32-bit and
>> +     64-bit operations.  so if the upper 32 bits of the register are 
>> not
>> +     extended by the 32nd bit symbol, then the comparison may not be 
>> valid
>> +     here.  This will affect the result of the operation.  */
>> +
>> +  if (TARGET_64BIT && REG_P (operands[2])
>> +      && GET_MODE (operands[2]) == SImode)
>> +    {
>> +      output_asm_insn ("addi.w\t%5,%2,0", operands);
>> +      output_asm_insn ("bne\t%0,%5,2f", operands);
>> +    }
>> +  else
>> +    output_asm_insn ("bne\t%0,%z2,2f", operands);
>> +
>> +  output_asm_insn ("or%i3\t%5,$zero,%3", operands);
>> +  output_asm_insn ("sc.<amo>\t%5,%1", operands);
>> +  output_asm_insn ("beqz\t%5,1b", operands);
>> +  output_asm_insn ("b\t3f", operands);
>> +  output_asm_insn ("2:", operands);
>> +  output_asm_insn ("%G4", operands);
>> +  output_asm_insn ("3:", operands);
>> +
>> +  return "";
>>   }
>> -  [(set (attr "length") (const_int 28))])
>> +  [(set (attr "length")
>> +     (if_then_else
>> +    (and (match_test "GET_MODE (operands[2]) == SImode")
>> +         (match_test "REG_P (operands[2])"))
>> +    (const_int 32)
>> +    (const_int 28)))])
>>     (define_insn "atomic_cas_value_strong<mode>_amcas"
>>     [(set (match_operand:QHWD 0 "register_operand" "=&r")
>> diff --git a/gcc/testsuite/g++.target/loongarch/atomic-cas-int.C 
>> b/gcc/testsuite/g++.target/loongarch/atomic-cas-int.C
>> new file mode 100644
>> index 00000000000..830ce48267a
>> --- /dev/null
>> +++ b/gcc/testsuite/g++.target/loongarch/atomic-cas-int.C
>> @@ -0,0 +1,32 @@
>> +/* { dg-do run } */
>> +/* { dg-options "-O2" } */
>> +
>> +#include <atomic>
>> +#include <cstdio>
>> +
>> +__attribute__ ((noinline)) long
>> +val_without_const_folding (long val)
>> +{
>> +  return val;
>> +}
>> +
>> +int
>> +main ()
>> +{
>> +  int oldval = 0xaa;
>> +  int newval = 0xbb;
>> +  std::atomic<int> amo;
>> +
>> +  amo.store (oldval);
>> +
>> +  long longval = val_without_const_folding (0xff80000000000000 + 
>> oldval);
>> +  oldval = static_cast<int> (longval);
>> +
>> +  amo.compare_exchange_strong (oldval, newval);
>> +
>> +  if (newval != amo.load (std::memory_order_relaxed))
>> +    __builtin_abort ();
>> +
>> +  return 0;
>> +}
>> +
diff mbox series

Patch

diff --git a/gcc/config/loongarch/sync.md b/gcc/config/loongarch/sync.md
index 8f35a5b48d2..d41c2d26811 100644
--- a/gcc/config/loongarch/sync.md
+++ b/gcc/config/loongarch/sync.md
@@ -245,18 +245,42 @@  (define_insn "atomic_cas_value_strong<mode>"
    (clobber (match_scratch:GPR 5 "=&r"))]
   ""
 {
-  return "1:\\n\\t"
-	 "ll.<amo>\\t%0,%1\\n\\t"
-	 "bne\\t%0,%z2,2f\\n\\t"
-	 "or%i3\\t%5,$zero,%3\\n\\t"
-	 "sc.<amo>\\t%5,%1\\n\\t"
-	 "beqz\\t%5,1b\\n\\t"
-	 "b\\t3f\\n\\t"
-	 "2:\\n\\t"
-	 "%G4\\n\\t"
-	 "3:\\n\\t";
+  output_asm_insn ("1:", operands);
+  output_asm_insn ("ll.<amo>\t%0,%1", operands);
+
+  /* Like the test case atomic-cas-int.C, in loongarch64, O1 and higher, the
+     return value of the val_without_const_folding will not be truncated and
+     will be passed directly to the function compare_exchange_strong.
+     However, the instruction 'bne' does not distinguish between 32-bit and
+     64-bit operations.  so if the upper 32 bits of the register are not
+     extended by the 32nd bit symbol, then the comparison may not be valid
+     here.  This will affect the result of the operation.  */
+
+  if (TARGET_64BIT && REG_P (operands[2])
+      && GET_MODE (operands[2]) == SImode)
+    {
+      output_asm_insn ("addi.w\t%5,%2,0", operands);
+      output_asm_insn ("bne\t%0,%5,2f", operands);
+    }
+  else
+    output_asm_insn ("bne\t%0,%z2,2f", operands);
+
+  output_asm_insn ("or%i3\t%5,$zero,%3", operands);
+  output_asm_insn ("sc.<amo>\t%5,%1", operands);
+  output_asm_insn ("beqz\t%5,1b", operands);
+  output_asm_insn ("b\t3f", operands);
+  output_asm_insn ("2:", operands);
+  output_asm_insn ("%G4", operands);
+  output_asm_insn ("3:", operands);
+
+  return "";
 }
-  [(set (attr "length") (const_int 28))])
+  [(set (attr "length")
+     (if_then_else
+	(and (match_test "GET_MODE (operands[2]) == SImode")
+	     (match_test "REG_P (operands[2])"))
+	(const_int 32)
+	(const_int 28)))])
 
 (define_insn "atomic_cas_value_strong<mode>_amcas"
   [(set (match_operand:QHWD 0 "register_operand" "=&r")
diff --git a/gcc/testsuite/g++.target/loongarch/atomic-cas-int.C b/gcc/testsuite/g++.target/loongarch/atomic-cas-int.C
new file mode 100644
index 00000000000..830ce48267a
--- /dev/null
+++ b/gcc/testsuite/g++.target/loongarch/atomic-cas-int.C
@@ -0,0 +1,32 @@ 
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include <atomic>
+#include <cstdio>
+
+__attribute__ ((noinline)) long
+val_without_const_folding (long val)
+{
+  return val;
+}
+
+int
+main ()
+{
+  int oldval = 0xaa;
+  int newval = 0xbb;
+  std::atomic<int> amo;
+
+  amo.store (oldval);
+
+  long longval = val_without_const_folding (0xff80000000000000 + oldval);
+  oldval = static_cast<int> (longval);
+
+  amo.compare_exchange_strong (oldval, newval);
+
+  if (newval != amo.load (std::memory_order_relaxed))
+    __builtin_abort ();
+
+  return 0;
+}
+