Message ID | 509C5B26.6000204@redhat.com |
---|---|
State | New |
Headers | show |
Richard Henderson <rth@redhat.com> writes: > > static inline void > -htm_abort () > +htm_abort_retry () > { > // ??? According to a yet unpublished ABI rule, 0xff is reserved and > // supposed to signal a busy lock. Source: andi.kleen@intel.com > _xabort(0xff); > } > > +static inline void > +htm_abort_cancel () > +{ > + // ??? What's the unpublished ABI rule for this, Andi? There is none for cancel, just for lock-is-locked (0xfe) and lock-busy (0xff) The convention is just for easier abort profiling. The profiler (perf) can display this abort code and it's far easier to understand if common situations have their standard code. But you can always make up your own too. -Andi
On 2012-11-08 17:23, Richard Henderson wrote: > I believe this is the sort of patch that Torvald was talking about > for handling abortTransaction with RTM. FYI, I realized that this patch doesn't handle aborts on a nested transaction properly. r~
On Thu, 2012-11-08 at 17:23 -0800, Richard Henderson wrote: > + // Honor an abort from abortTransaction. > + else if (htm_abort_is_cancel(ret)) > + return a_abortTransaction | a_restoreLiveVariables; The problem is that we cannot reliably detect whether an abort with a certain abort reason code really means that we got canceled. A nested transaction is one example: how do we distinguish whether the nested or the outermost transaction were canceled?. Another example is transaction_pure or tm_wrapper code that might reuse the same abort reason for explicit aborts: we would like to allow malloc() calls in transactions, but if malloc uses transactions with explicit aborts, we might abort a transaction without a cause. It might be possible to avoid that by carefully specifying the rules for abort reasons, which of those can be used in nested transactions, and so on. But I'm not aware of any specification like that, and it seems to likely be a pretty fragile approach. Thus, there are two options for how to handle transactions that may abort: Either execute them transactionally the first time, and do an explicit HTM abort on transaction_cancel that tells libitm not not retry as HW transaction, or execute such transactions always as SW transactions. Which of these two options is better depends on how frequently transaction_cancel would actually be called. If it's infrequent, then trying to run as HW transactions might be better. Torvald
On Thu, 2012-11-08 at 20:34 -0800, Andi Kleen wrote: > Richard Henderson <rth@redhat.com> writes: > > > > static inline void > > -htm_abort () > > +htm_abort_retry () > > { > > // ??? According to a yet unpublished ABI rule, 0xff is reserved and > > // supposed to signal a busy lock. Source: andi.kleen@intel.com > > _xabort(0xff); > > } > > > > +static inline void > > +htm_abort_cancel () > > +{ > > + // ??? What's the unpublished ABI rule for this, Andi? > > There is none for cancel, just for lock-is-locked (0xfe) and lock-busy (0xff) > > The convention is just for easier abort profiling. The profiler (perf) can display > this abort code and it's far easier to understand if common situations > have their standard code. But you can always make up your own too. I'm not sure this is quite true. If a libitm-executed transaction is started from within some other transactional region (e.g., managed explicitly by the user), and those two disagree about what is an abort that should be retried or not, then this can at least have a negative impact on performance: Either because the outermost HW transaction handler thinks it shouldn't retry even though it should, or because the outermost transaction retries continuously even though there's no way it can succeed. If all code that uses transactions will eventually choose the fallback path, this is not a correctness issue. But it can be a performance issue, so I think that HTMs and the code using them should be careful to agree about abort reasons and what they communicate. Torvald
Torvald Riegel <triegel@redhat.com> writes: > > I'm not sure this is quite true. If a libitm-executed transaction is > started from within some other transactional region (e.g., managed > explicitly by the user), and those two disagree about what is an abort > that should be retried or not, then this can at least have a negative > impact on performance: Either because the outermost HW transaction > handler thinks it shouldn't retry even though it should, or because the > outermost transaction retries continuously even though there's no way it > can succeed. It shouldn't retry continuously for a nested abort. Also retrying a lot is dangerous anyways, you have to be very careful about that. Short term your biggest problem is profiling the whole thing in any case, and that is what the XABORT with well defined codes make much easier. -Andi
Torvald Riegel <triegel@redhat.com> writes: > On Thu, 2012-11-08 at 17:23 -0800, Richard Henderson wrote: >> + // Honor an abort from abortTransaction. >> + else if (htm_abort_is_cancel(ret)) >> + return a_abortTransaction | a_restoreLiveVariables; > > The problem is that we cannot reliably detect whether an abort with a > certain abort reason code really means that we got canceled. A nested > transaction is one example: how do we distinguish whether the nested or > the outermost transaction were canceled?. In TSX you always come back to the outermost anyways. Look at the nested bit in the abort code? > Thus, there are two options for how to handle transactions that may > abort: Either execute them transactionally the first time, and do an > explicit HTM abort on transaction_cancel that tells libitm not not retry > as HW transaction, or execute such transactions always as SW > transactions. Which of these two options is better depends on how > frequently transaction_cancel would actually be called. If it's > infrequent, then trying to run as HW transactions might be better. I believe the current evidence is that cancel is very uncommon. -Andi
On Fri, 2012-11-09 at 10:24 -0800, Andi Kleen wrote: > Torvald Riegel <triegel@redhat.com> writes: > > > On Thu, 2012-11-08 at 17:23 -0800, Richard Henderson wrote: > >> + // Honor an abort from abortTransaction. > >> + else if (htm_abort_is_cancel(ret)) > >> + return a_abortTransaction | a_restoreLiveVariables; > > > > The problem is that we cannot reliably detect whether an abort with a > > certain abort reason code really means that we got canceled. A nested > > transaction is one example: how do we distinguish whether the nested or > > the outermost transaction were canceled?. > > In TSX you always come back to the outermost anyways. That, and the fact that we cannot reliably communicate which transaction (on the programming language level) was supposed to be aborted, is exactly the problem that I described.
> I'm not sure this is quite true. If a libitm-executed transaction is
It's just a convention. You don't have to use it. Not doing it will
just make abort profiling harder.
-Andi
On Fri, 2012-11-09 at 21:54 +0100, Andi Kleen wrote: > > I'm not sure this is quite true. If a libitm-executed transaction is > > It's just a convention. You don't have to use it. That's true ... > Not doing it will > just make abort profiling harder. ... but I disagree with this one. This won't only make profiling harder, but it also can have a negative impact on performance. If you think that there's something wrong with the reasons that I gave for why this can be a performance issue, please say why you think so instead of just saying that it's just about profiling. To avoid the potential performance complications, it would probably be sufficient to encourage people to follow this convention. For example, making sure that libitm and other libraries such as glibc are aware of it (or, ideally, follow it) would cover most of the combinations I suppose; perhaps the convention should also be mentioned in the RTM intrinsics header file. Torvald
diff --git a/libitm/beginend.cc b/libitm/beginend.cc index 4369946..8f5c4ef 100644 --- a/libitm/beginend.cc +++ b/libitm/beginend.cc @@ -166,18 +166,16 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const gtm_jmpbuf *jb) GTM_fatal("pr_undoLogCode not supported"); #if defined(USE_HTM_FASTPATH) && !defined(HTM_CUSTOM_FASTPATH) - // HTM fastpath. Only chosen in the absence of transaction_cancel to allow - // using an uninstrumented code path. - // The fastpath is enabled only by dispatch_htm's method group, which uses - // serial-mode methods as fallback. Serial-mode transactions cannot execute - // concurrently with HW transactions because the latter monitor the serial - // lock's writer flag and thus abort if another thread is or becomes a - // serial transaction. Therefore, if the fastpath is enabled, then a - // transaction is not executing as a HW transaction iff the serial lock is - // write-locked. This allows us to use htm_fastpath and the serial lock's - // writer flag to reliable determine whether the current thread runs a HW - // transaction, and thus we do not need to maintain this information in - // per-thread state. + // HTM fastpath. The fastpath is enabled only by dispatch_htm's method + // group, which uses serial-mode methods as fallback. Serial-mode + // transactions cannot execute concurrently with HW transactions because + // the latter monitor the serial lock's writer flag and thus abort if + // another thread is or becomes a serial transaction. Therefore, if the + // fastpath is enabled, then a transaction is not executing as a HW + // transaction iff the serial lock is write-locked. This allows us to + // use htm_fastpath and the serial lock's writer flag to reliable determine + // whether the current thread runs a HW transaction, and thus we do not + // need to maintain this information in per-thread state. // If an uninstrumented code path is not available, we can still run // instrumented code from a HW transaction because the HTM fastpath kicks // in early in both begin and commit, and the transaction is not canceled. @@ -187,7 +185,7 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const gtm_jmpbuf *jb) // indeed in serial mode, and HW transactions should never need serial mode // for any internal changes (e.g., they never abort visibly to the STM code // and thus do not trigger the standard retry handling). - if (likely(htm_fastpath && (prop & pr_hasNoAbort))) + if (likely(htm_fastpath)) { for (uint32_t t = htm_fastpath; t; t--) { @@ -198,33 +196,41 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const gtm_jmpbuf *jb) // Monitor the writer flag in the serial-mode lock, and abort // if there is an active or waiting serial-mode transaction. if (unlikely(serial_lock.is_write_locked())) - htm_abort(); + htm_abort_retry(); else // We do not need to set a_saveLiveVariables because of HTM. return (prop & pr_uninstrumentedCode) ? a_runUninstrumentedCode : a_runInstrumentedCode; } - // The transaction has aborted. Don't retry if it's unlikely that + // The transaction has aborted. Retry if it's likely that // retrying the transaction will be successful. - if (!htm_abort_should_retry(ret)) - break; - // Wait until any concurrent serial-mode transactions have finished. - // This is an empty critical section, but won't be elided. - if (serial_lock.is_write_locked()) + if (htm_abort_should_retry(ret)) { - tx = gtm_thr(); - if (unlikely(tx == NULL)) - { - // See below. - tx = new gtm_thread(); - set_gtm_thr(tx); - } - serial_lock.read_lock(tx); - serial_lock.read_unlock(tx); - // TODO We should probably reset the retry count t here, unless - // we have retried so often that we should go serial to avoid - // starvation. + // Wait until any concurrent serial-mode transactions have + // finished. This is an empty critical section, but won't + // be elided. + if (serial_lock.is_write_locked()) + { + tx = gtm_thr(); + if (unlikely(tx == NULL)) + { + // See below. + tx = new gtm_thread(); + set_gtm_thr(tx); + } + serial_lock.read_lock(tx); + serial_lock.read_unlock(tx); + // TODO We should probably reset the retry count t here, + // unless we have retried so often that we should go + // serial to avoid starvation. + } } + // Honor an abort from abortTransaction. + else if (htm_abort_is_cancel(ret)) + return a_abortTransaction | a_restoreLiveVariables; + // Otherwise quit using HTM and fall back to STM. + else + break; } } #endif @@ -438,6 +444,11 @@ GTM::gtm_thread::rollback (gtm_transaction_cp *cp, bool aborting) void ITM_REGPARM _ITM_abortTransaction (_ITM_abortReason reason) { +#if defined(USE_HTM_FASTPATH) + if (likely(htm_fastpath && !gtm_thread::serial_lock.is_write_locked())) + htm_abort_cancel(); +#endif + gtm_thread *tx = gtm_thr(); assert (reason == userAbort || reason == (userAbort | outerAbort)); diff --git a/libitm/config/x86/target.h b/libitm/config/x86/target.h index 41ae2eb..1d2e822 100644 --- a/libitm/config/x86/target.h +++ b/libitm/config/x86/target.h @@ -125,18 +125,32 @@ htm_commit () } static inline void -htm_abort () +htm_abort_retry () { // ??? According to a yet unpublished ABI rule, 0xff is reserved and // supposed to signal a busy lock. Source: andi.kleen@intel.com _xabort(0xff); } +static inline void +htm_abort_cancel () +{ + // ??? What's the unpublished ABI rule for this, Andi? + _xabort(0); +} + static inline bool htm_abort_should_retry (uint32_t begin_ret) { return begin_ret & _XABORT_RETRY; } + +static inline bool +htm_abort_is_cancel (uint32_t begin_ret) +{ + // return (begin_ret & _XABORT_EXPLICIT) && _XABORT_CODE(begin_ret) == 0; + return begin_ret == _XABORT_EXPLICIT; +} #endif