Message ID | af6efceb-8b05-b44f-7487-bb1192bda652@mentor.com |
---|---|
State | New |
Headers | show |
Series | [nvptx,PR84041] Add memory_barrier insn | expand |
On 04/09/2018 03:19 PM, Tom de Vries wrote: > Hi, > > we've been having hanging OpenMP tests for nvptx offloading: > for-{3,5,6}.c and the corresponding C++ test-cases. > > The failures have now been analyzed down to gomp_ptrlock_get in > libgomp/config/nvptx/ptrlock.h: > ... > static inline void *gomp_ptrlock_get (gomp_ptrlock_t *ptrlock) > { > uintptr_t v = (uintptr_t) __atomic_load_n (ptrlock, MEMMODEL_ACQUIRE); > if (v > 2) > return (void *) v; > > if (v == 0 > && __atomic_compare_exchange_n (ptrlock, &v, 1, false, > MEMMODEL_ACQUIRE, > MEMMODEL_ACQUIRE)) > return NULL; > > while (v == 1) > v = (uintptr_t) __atomic_load_n (ptrlock, MEMMODEL_ACQUIRE); > > return (void *) v; > } > ... > > There's no atomic load insn defined for nvptx, and also no memory > barrier insn, so the atomic load ends up generating a normal load. The > JIT compiler does loop-invariant code motion, and moves the load out of > the loop, which turns the while into an eternal loop. > > > Fix conservatively by defining the memory_barrier insn. This can > possibly be fixed more optimally by implementing an atomic load > operation in nvptx. > > Build x86_64 with nvptx accelerator and reg-tested libgomp. > > Committed to stage4 trunk. > And back-ported to og7 branch. Thanks, - Tom > 0001-nvptx-Add-memory_barrier-insn.patch > > > [nvptx] Add memory_barrier insn > > 2018-04-09 Tom de Vries <tom@codesourcery.com> > > PR target/84041 > * config/nvptx/nvptx.md (define_c_enum "unspecv"): Add UNSPECV_MEMBAR. > (define_expand "*memory_barrier"): New define_expand. > (define_insn "memory_barrier"): New insn. > > --- > gcc/config/nvptx/nvptx.md | 22 ++++++++++++++++++++++ > 1 file changed, 22 insertions(+) > > diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md > index 4f4453d..68bba36 100644 > --- a/gcc/config/nvptx/nvptx.md > +++ b/gcc/config/nvptx/nvptx.md > @@ -55,6 +55,7 @@ > UNSPECV_CAS > UNSPECV_XCHG > UNSPECV_BARSYNC > + UNSPECV_MEMBAR > UNSPECV_DIM_POS > > UNSPECV_FORK > @@ -1459,6 +1460,27 @@ > "\\tbar.sync\\t%0;" > [(set_attr "predicable" "false")]) > > +(define_expand "memory_barrier" > + [(set (match_dup 0) > + (unspec_volatile:BLK [(match_dup 0)] UNSPECV_MEMBAR))] > + "" > +{ > + operands[0] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode)); > + MEM_VOLATILE_P (operands[0]) = 1; > +}) > + > +;; Ptx defines the memory barriers membar.cta, membar.gl and membar.sys > +;; (corresponding to cuda functions threadfence_block, threadfence and > +;; threadfence_system). For the insn memory_barrier we use membar.sys. This > +;; may be overconservative, but before using membar.gl instead we'll need to > +;; explain in detail why it's safe to use. For now, use membar.sys. > +(define_insn "*memory_barrier" > + [(set (match_operand:BLK 0 "" "") > + (unspec_volatile:BLK [(match_dup 0)] UNSPECV_MEMBAR))] > + "" > + "\\tmembar.sys;" > + [(set_attr "predicable" "false")]) > + > (define_insn "nvptx_nounroll" > [(unspec_volatile [(const_int 0)] UNSPECV_NOUNROLL)] > "" >
[nvptx] Add memory_barrier insn 2018-04-09 Tom de Vries <tom@codesourcery.com> PR target/84041 * config/nvptx/nvptx.md (define_c_enum "unspecv"): Add UNSPECV_MEMBAR. (define_expand "*memory_barrier"): New define_expand. (define_insn "memory_barrier"): New insn. --- gcc/config/nvptx/nvptx.md | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md index 4f4453d..68bba36 100644 --- a/gcc/config/nvptx/nvptx.md +++ b/gcc/config/nvptx/nvptx.md @@ -55,6 +55,7 @@ UNSPECV_CAS UNSPECV_XCHG UNSPECV_BARSYNC + UNSPECV_MEMBAR UNSPECV_DIM_POS UNSPECV_FORK @@ -1459,6 +1460,27 @@ "\\tbar.sync\\t%0;" [(set_attr "predicable" "false")]) +(define_expand "memory_barrier" + [(set (match_dup 0) + (unspec_volatile:BLK [(match_dup 0)] UNSPECV_MEMBAR))] + "" +{ + operands[0] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode)); + MEM_VOLATILE_P (operands[0]) = 1; +}) + +;; Ptx defines the memory barriers membar.cta, membar.gl and membar.sys +;; (corresponding to cuda functions threadfence_block, threadfence and +;; threadfence_system). For the insn memory_barrier we use membar.sys. This +;; may be overconservative, but before using membar.gl instead we'll need to +;; explain in detail why it's safe to use. For now, use membar.sys. +(define_insn "*memory_barrier" + [(set (match_operand:BLK 0 "" "") + (unspec_volatile:BLK [(match_dup 0)] UNSPECV_MEMBAR))] + "" + "\\tmembar.sys;" + [(set_attr "predicable" "false")]) + (define_insn "nvptx_nounroll" [(unspec_volatile [(const_int 0)] UNSPECV_NOUNROLL)] ""