Message ID: 1329756857.2986.1153.camel@triegel.csb
State: New
On 02/20/12 08:54, Torvald Riegel wrote:
> The following patches don't depend on each other but are all for the
> gl_wt TM method.
>
> patch1:
> For memory transfers from source regions with the read-for-write
> modifier, undo logging of these regions was missing. Also optimize the
> number of accesses to gtm_thr().

Out of curiosity, does an optional argument as in

  static void pre_write(const void *addr, size_t len,
                        gtm_thread *tx = gtm_thr())

work just as well as

+  static void pre_write(const void *addr, size_t len)
+  {
+    gtm_thread *tx = gtm_thr();
+    pre_write(addr, len, tx);
+  }

> +      if (dst_mod == WaW || dst_mod == NONTXNAL)
> +        tx = gtm_thr();

This sort of micro-optimization can't be good.  If we are, for some reason,
not unifying the two loads, it seems to me that it'd be better to simply
unconditionally load the thread data here:

> +    gtm_thread *tx = 0;

The actual bug fix, calling pre_write for RfW, is of course ok.

> patch2:
> Similar to the ml_wt code, we don't need to use acq_rel but just acquire
> memory order when acquiring the orec. The following release fence is
> still necessary.

Ok.

> patch3:
> Recently, I changed how gtm_rwlock handles upgrades to serial mode.
> This change was to fix privatization safety and basically consists of
> not modifying gtm_thread::shared_state until trycommit() and/or
> rollback() have finished after the upgrade (and before the switch to
> serialirr dispatch). With this in place, we can avoid the previous
> workarounds for the commit-in-serial-mode corner case.

Ok.

> patch4:
> Privatization safety in gl_wt rollback was previously enforced with
> something that probably worked but for which the precise C++11 mem model
> reasoning was unclear. The patch fixes that (so it's clear why it works
> on the basis of the language's memory model, not on the HW models), and
> it is less costly than the previously used seq_cst barrier.

Ok.

> patch5:
> Just put gl_wt's global orec on a separate cacheline, same as we do in
> ml_wt.

> +  atomic<gtm_word> orec __attribute__((aligned(HW_CACHELINE_SIZE)));
> +  char tailpadding[HW_CACHELINE_SIZE - sizeof(atomic<gtm_word>)];

The "tailpadding" structure should not be required.  Since an element of
the structure is aligned, the whole class will be aligned.  And the
compiler automatically pads aligned structures so that arrays of them
work, and satisfy the alignment.

Or is the orec overlapping with the vtable or something?

r~
On Mon, 2012-02-20 at 10:36 -0800, Richard Henderson wrote:
> On 02/20/12 08:54, Torvald Riegel wrote:
> > The following patches don't depend on each other but are all for the
> > gl_wt TM method.
> >
> > patch1:
> > For memory transfers from source regions with the read-for-write
> > modifier, undo logging of these regions was missing. Also optimize the
> > number of accesses to gtm_thr().
>
> Out of curiosity, does an optional argument as in
>
>   static void pre_write(const void *addr, size_t len,
>                         gtm_thread *tx = gtm_thr())
>
> work just as well as
>
> +  static void pre_write(const void *addr, size_t len)
> +  {
> +    gtm_thread *tx = gtm_thr();
> +    pre_write(addr, len, tx);
> +  }
>
> > +      if (dst_mod == WaW || dst_mod == NONTXNAL)
> > +        tx = gtm_thr();
>
> This sort of micro-optimization can't be good.  If we are, for some
> reason, not unifying the two loads, it seems to me that it'd be better
> to simply unconditionally load the thread data here:
>
> > +    gtm_thread *tx = 0;
>
> The actual bug fix, calling pre_write for RfW, is of course ok.
>
> > patch2:
> > Similar to the ml_wt code, we don't need to use acq_rel but just acquire
> > memory order when acquiring the orec. The following release fence is
> > still necessary.
>
> Ok.
>
> > patch3:
> > Recently, I changed how gtm_rwlock handles upgrades to serial mode.
> > This change was to fix privatization safety and basically consists of
> > not modifying gtm_thread::shared_state until trycommit() and/or
> > rollback() have finished after the upgrade (and before the switch to
> > serialirr dispatch). With this in place, we can avoid the previous
> > workarounds for the commit-in-serial-mode corner case.
>
> Ok.
>
> > patch4:
> > Privatization safety in gl_wt rollback was previously enforced with
> > something that probably worked but for which the precise C++11 mem model
> > reasoning was unclear. The patch fixes that (so it's clear why it works
> > on the basis of the language's memory model, not on the HW models), and
> > it is less costly than the previously used seq_cst barrier.
>
> Ok.
>
> > patch5:
> > Just put gl_wt's global orec on a separate cacheline, same as we do in
> > ml_wt.
>
> > +  atomic<gtm_word> orec __attribute__((aligned(HW_CACHELINE_SIZE)));
> > +  char tailpadding[HW_CACHELINE_SIZE - sizeof(atomic<gtm_word>)];
>
> The "tailpadding" structure should not be required.  Since an element of
> the structure is aligned, the whole class will be aligned.  And the
> compiler automatically pads aligned structures so that arrays of them
> work, and satisfy the alignment.
>
> Or is the orec overlapping with the vtable or something?

It is without the tail-padding, but currently the virtual functions are
called very infrequently.  Removed the tailpadding and added a comment
instead.

Committed with your changes applied.
diff --git a/libitm/method-gl.cc b/libitm/method-gl.cc
index d6d050d..8df3d30 100644
--- a/libitm/method-gl.cc
+++ b/libitm/method-gl.cc
@@ -85,9 +85,8 @@ static gl_mg o_gl_mg;
 class gl_wt_dispatch : public abi_dispatch
 {
 protected:
-  static void pre_write(const void *addr, size_t len)
+  static void pre_write(const void *addr, size_t len, gtm_thread *tx)
   {
-    gtm_thread *tx = gtm_thr();
     gtm_word v = tx->shared_state.load(memory_order_relaxed);
     if (unlikely(!gl_mg::is_locked(v)))
       {
@@ -123,7 +122,13 @@ protected:
     tx->undolog.log(addr, len);
   }

-  static void validate()
+  static void pre_write(const void *addr, size_t len)
+  {
+    gtm_thread *tx = gtm_thr();
+    pre_write(addr, len, tx);
+  }
+
+  static void validate(gtm_thread *tx)
   {
     // Check that snapshot is consistent. We expect the previous data load to
     // have acquire memory order, or be atomic and followed by an acquire
@@ -137,12 +142,17 @@ protected:
     // or read an orec value that was written after the data had been written.
     // Either will allow us to detect inconsistent reads because it will have
     // a higher/different value.
-    gtm_thread *tx = gtm_thr();
     gtm_word l = o_gl_mg.orec.load(memory_order_relaxed);
     if (l != tx->shared_state.load(memory_order_relaxed))
       tx->restart(RESTART_VALIDATE_READ);
   }

+  static void validate()
+  {
+    gtm_thread *tx = gtm_thr();
+    validate(tx);
+  }
+
   template <typename V> static V load(const V* addr, ls_modifier mod)
   {
     // Read-for-write should be unlikely, but we need to handle it or will
@@ -198,9 +208,20 @@ public:
   static void memtransfer_static(void *dst, const void* src, size_t size,
       bool may_overlap, ls_modifier dst_mod, ls_modifier src_mod)
   {
-    if ((dst_mod != WaW && src_mod != RaW)
-        && (dst_mod != NONTXNAL || src_mod == RfW))
-      pre_write(dst, size);
+    gtm_thread *tx = 0;
+    if (dst_mod != WaW && dst_mod != NONTXNAL)
+      {
+        tx = gtm_thr();
+        pre_write(dst, size, tx);
+      }
+    // We need at least undo-logging for an RfW src region because we might
+    // subsequently write there with WaW.
+    if (src_mod == RfW)
+      {
+        if (dst_mod == WaW || dst_mod == NONTXNAL)
+          tx = gtm_thr();
+        pre_write(src, size, tx);
+      }

     // FIXME We should use atomics here (see store()). Let's just hope that
     // memcpy/memmove are good enough.
@@ -211,7 +232,11 @@ public:

     if (src_mod != RfW && src_mod != RaW && src_mod != NONTXNAL
         && dst_mod != WaW)
-      validate();
+      {
+        if ((dst_mod == WaW || dst_mod == NONTXNAL) && src_mod != RfW)
+          tx = gtm_thr();
+        validate(tx);
+      }
   }

   static void memset_static(void *dst, int c, size_t size, ls_modifier mod)