diff mbox

libitm: Fix handling of reentrancy in the HTM fastpath

Message ID 1371653823.16968.25276.camel@triegel.csb
State New
Headers show

Commit Message

Torvald Riegel June 19, 2013, 2:57 p.m. UTC
(Re-sending to the proper list. Sorry for the noise at gcc@.)

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57643

The HTM fastpath didn't handle a situation in which a relaxed
transaction executed unsafe code that in turn starts a transaction; it
simply tried to wait for the "other" transaction, not checking whether
the current thread started the other transaction.

We fix this by doing this check, and if we have the lock, we just
continue with the fallback serial-mode path instead of using a HW
transaction.  The current code won't do the check before starting a HW
transaction, but this way we can keep the HTM fastpath unchanged; also,
this particular "reentrancy" is probably infrequent in practice, so I
suppose the small slowdown shouldn't matter much.

Also, I first thought about trying to use the HTM in the reentrancy
case, but this doesn't make any sense because other transactions can't
run anyway, and we should really just finish the serial-mode transaction
as fast as possible.

Peter and/or Andreas: Could you please check that this fixes the bug you
see on Power/s390?  Thanks.

Torvald
commit 185af84e365e1bae31aea5afd6e67e81f3c32c72
Author: Torvald Riegel <triegel@redhat.com>
Date:   Wed Jun 19 16:42:24 2013 +0200

    libitm: Fix handling of reentrancy in the HTM fastpath.
    
    PR libitm/57643

Comments

Peter Bergner June 19, 2013, 3:49 p.m. UTC | #1
On Wed, 2013-06-19 at 16:57 +0200, Torvald Riegel wrote:
> (Re-sending to the proper list. Sorry for the noise at gcc@.)
> 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57643
> 
> The HTM fastpath didn't handle a situation in which a relaxed
> transaction executed unsafe code that in turn starts a transaction; it
> simply tried to wait for the "other" transaction, not checking whether
> the current thread started the other transaction.
[snip]
> Peter and/or Andreas: Could you please check that this fixes the bug you
> see on Power/s390?  Thanks.

This patch fixed the hang, but now I'm dying due to an abort in the
test case.  Specifically, the first abort in unsafe()

        int __attribute__((transaction_unsafe)) unsafe(int i)
         {
           if (_ITM_inTransaction() != inIrrevocableTransaction)
 dying here:  abort();
           __transaction_atomic {
             x++;
           }
           if (_ITM_inTransaction() != inIrrevocableTransaction)
             abort();
           return i+1;
         }

This is due to the following in _ITM_inTransaction():

47	  if (tx && (tx->nesting > 0))
(gdb) p tx
$2 = (GTM::gtm_thread *) 0x10901bf0
(gdb) p tx->nesting
$3 = 1
(gdb) step
49	      if (tx->state & gtm_thread::STATE_IRREVOCABLE)
(gdb) p tx->state
$4 = 3
(gdb) p gtm_thread::STATE_IRREVOCABLE
$5 = 2
(gdb) step
50		return inIrrevocableTransaction;

Peter
Peter Bergner June 19, 2013, 3:57 p.m. UTC | #2
On Wed, 2013-06-19 at 10:49 -0500, Peter Bergner wrote:
> This is due to the following in _ITM_inTransaction():
> 
> 47	  if (tx && (tx->nesting > 0))
> (gdb) p tx
> $2 = (GTM::gtm_thread *) 0x10901bf0
> (gdb) p tx->nesting
> $3 = 1
> (gdb) step
> 49	      if (tx->state & gtm_thread::STATE_IRREVOCABLE)
> (gdb) p tx->state
> $4 = 3
> (gdb) p gtm_thread::STATE_IRREVOCABLE
> $5 = 2
> (gdb) step
> 50		return inIrrevocableTransaction;

Bah, ignore this.  It's a different call that is returning something other
than inIrrevocableTransaction.  Unfortunately, gdb is having problems inside
hw txns and I'm having trouble seeing why/when _ITM_inTransaction() is
returning something other than inIrrevocableTransaction.  I'll see if I can
determine why and will report back.

Peter
Peter Bergner June 19, 2013, 7:43 p.m. UTC | #3
On Wed, 2013-06-19 at 10:57 -0500, Peter Bergner wrote:
> On Wed, 2013-06-19 at 10:49 -0500, Peter Bergner wrote:
> > This is due to the following in _ITM_inTransaction():
> > 
> > 47	  if (tx && (tx->nesting > 0))
> > (gdb) p tx
> > $2 = (GTM::gtm_thread *) 0x10901bf0
> > (gdb) p tx->nesting
> > $3 = 1
> > (gdb) step
> > 49	      if (tx->state & gtm_thread::STATE_IRREVOCABLE)
> > (gdb) p tx->state
> > $4 = 3
> > (gdb) p gtm_thread::STATE_IRREVOCABLE
> > $5 = 2
> > (gdb) step
> > 50		return inIrrevocableTransaction;
> 
> Bah, ignore this.  It's a different call that is returning something other
> than inIrrevocableTransaction.  Unfortunately, gdb is having problems inside
> hw txns and I'm having trouble seeing why/when _ITM_inTransaction() is
> returning something other than inIrrevocableTransaction.  I'll see if I can
> determine why and will report back.

Ok, we return outsideTransaction because the nesting level (tx->nesting)
is zero.

Peter
Richard Henderson June 20, 2013, 4:21 p.m. UTC | #4
On 06/19/2013 07:57 AM, Torvald Riegel wrote:
> commit 185af84e365e1bae31aea5afd6e67e81f3c32c72
> Author: Torvald Riegel <triegel@redhat.com>
> Date:   Wed Jun 19 16:42:24 2013 +0200
> 
>     libitm: Fix handling of reentrancy in the HTM fastpath.
>     
>     PR libitm/57643

Ok.


r~
diff mbox

Patch

diff --git a/libitm/beginend.cc b/libitm/beginend.cc
index 93e702e..a3bf549 100644
--- a/libitm/beginend.cc
+++ b/libitm/beginend.cc
@@ -197,6 +197,8 @@  GTM::gtm_thread::begin_transaction (uint32_t prop, const gtm_jmpbuf *jb)
 	      // We are executing a transaction now.
 	      // Monitor the writer flag in the serial-mode lock, and abort
 	      // if there is an active or waiting serial-mode transaction.
+	      // Note that this can also happen due to an enclosing
+	      // serial-mode transaction; we handle this case below.
 	      if (unlikely(serial_lock.is_write_locked()))
 		htm_abort();
 	      else
@@ -219,6 +221,14 @@  GTM::gtm_thread::begin_transaction (uint32_t prop, const gtm_jmpbuf *jb)
 	          tx = new gtm_thread();
 	          set_gtm_thr(tx);
 	        }
+	      // Check whether there is an enclosing serial-mode transaction;
+	      // if so, we just continue as a nested transaction and don't
+	      // try to use the HTM fastpath.  This case can happen when an
+	      // outermost relaxed transaction calls unsafe code that starts
+	      // a transaction.
+	      if (tx->nesting > 0)
+		break;
+	      // Another thread is running a serial-mode transaction.  Wait.
 	      serial_lock.read_lock(tx);
 	      serial_lock.read_unlock(tx);
 	      // TODO We should probably reset the retry count t here, unless