diff mbox series

rs6000: Retry tbegin. instructions that can fail intermittently

Message ID 9eb00ba1-e01e-3b47-7ac5-8092021d1114@linux.ibm.com
State New
Headers show
Series rs6000: Retry tbegin. instructions that can fail intermittently | expand

Commit Message

Peter Bergner Feb. 15, 2022, 7:03 p.m. UTC
The HTM tbegin. instruction can fail intermittently due to many reasons.
This can lead to the htm-1.c testsuite test case FAILing from time to time.
The solution is to allow retrying the instruction a few times before aborting.

Bill has seen this fail a 1-2 times per 10000 runs.  With the change below, I
haven't seen the test case fail in over 1 million runs.

Ok for trunk and backports to the release branches?

Peter


gcc/testsuite/
	* gcc.target/powerpc/htm-1.c: Retry intermittent failing tbegins.

Comments

Segher Boessenkool Feb. 15, 2022, 9:54 p.m. UTC | #1
Hi!

On Tue, Feb 15, 2022 at 01:03:09PM -0600, Peter Bergner wrote:
> The HTM tbegin. instruction can fail intermittently due to many reasons.

Just a few really.  But, if the transaction fails it will appear as if
the tbegin. had an error as well, is that what you are seeing?

> This can lead to the htm-1.c testsuite test case FAILing from time to time.
> The solution is to allow retrying the instruction a few times before aborting.

That is the way any HTM code should be written in the first place
(except for rollback-only transactions, but let's not go there --
besides, it is normal for those to fail as well, and there needs to be a
fallback there as well :-) )

The patch is fine.  Okay for trunk and backports (after soak time ofc).
Thanks!


Segher


> gcc/testsuite/
> 	* gcc.target/powerpc/htm-1.c: Retry intermittent failing tbegins.
Peter Bergner Feb. 15, 2022, 10:59 p.m. UTC | #2
On 2/15/22 3:54 PM, Segher Boessenkool wrote:
> On Tue, Feb 15, 2022 at 01:03:09PM -0600, Peter Bergner wrote:
>> The HTM tbegin. instruction can fail intermittently due to many reasons.
> 
> Just a few really.  But, if the transaction fails it will appear as if
> the tbegin. had an error as well, is that what you are seeing?

Yeah, exactly.  The transaction itself fails, which rolls us back to
the tbegin. with an error return value so we fall into the else/error
clause.



> That is the way any HTM code should be written in the first place
> (except for rollback-only transactions, but let's not go there --
> besides, it is normal for those to fail as well, and there needs to be a
> fallback there as well :-) )

Agreed and I'm not sure why I didn't write it that way to begin with.
Maybe I thought it was so simple that the likelihood of it failing was
so small we'd never see it?  Anyway, we do now, so...



> The patch is fine.  Okay for trunk and backports (after soak time ofc).
> Thanks!

Thanks, pushed.

Peter
Segher Boessenkool Feb. 16, 2022, 9:55 a.m. UTC | #3
On Tue, Feb 15, 2022 at 04:59:45PM -0600, Peter Bergner wrote:
> > That is the way any HTM code should be written in the first place
> > (except for rollback-only transactions, but let's not go there --
> > besides, it is normal for those to fail as well, and there needs to be a
> > fallback there as well :-) )
> 
> Agreed and I'm not sure why I didn't write it that way to begin with.
> Maybe I thought it was so simple that the likelihood of it failing was
> so small we'd never see it?  Anyway, we do now, so...

Yeah, and you perhaps were misled by not seeing it fail in any testing
(it fails only .02% of the time you said).  For that reason it helps to
make testcases fail *more* often.  That isn't very trivial to do with
HTM of course.  Since we don't do HTM anymore it will all fade away, and
let's not bother, why am I typing still :-)


Segher
diff mbox series

Patch

diff --git a/gcc/testsuite/gcc.target/powerpc/htm-1.c b/gcc/testsuite/gcc.target/powerpc/htm-1.c
index f27e32ca281..399a7ec8812 100644
--- a/gcc/testsuite/gcc.target/powerpc/htm-1.c
+++ b/gcc/testsuite/gcc.target/powerpc/htm-1.c
@@ -12,14 +12,21 @@  main (void)
 {
   long i;
   unsigned long mask = 0;
+  unsigned long retry_count = 0;
 
 repeat:
   if (__builtin_tbegin (0))
     {
       mask++;
+      retry_count = 0;
     }
   else
-    abort();
+    {
+      /* Retry a limited number of times before aborting.  */
+      if (retry_count++ < 10)
+	goto repeat;
+      abort ();
+    }
 
   if (mask == 1)
     {