[nvptx,PR84041] Add memory_barrier insn

Message ID	af6efceb-8b05-b44f-7487-bb1192bda652@mentor.com
State	New
Headers	show Return-Path: <gcc-patches-return-476060-incoming=patchwork.ozlabs.org@gcc.gnu.org> DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:to :from:cc:subject:message-id:date:mime-version:content-type; q= dns; s=default; b=rG5r0CCicl8vcHfT+/6aK2u9Zi5qwTLiUiLRhjmDH97Tek NFfQAK41cmCQ1VBI4gbhb2wGgL7ROtO9ptW1e2f8O7JhjGRAXKWoXBeVm4uAQPly sReUedtDuJZMz3EjhDai6u/iALI1DUBI7OorL8TDlYsc844F+llSLFbUed4LU= Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk Sender: gcc-patches-owner@gcc.gnu.org To: GCC Patches <gcc-patches@gcc.gnu.org> From: Tom de Vries <Tom_deVries@mentor.com> CC: Jakub Jelinek <jakub@redhat.com> Subject: [nvptx, PR84041] Add memory_barrier insn Message-ID: <af6efceb-8b05-b44f-7487-bb1192bda652@mentor.com> Date: Mon, 9 Apr 2018 15:19:52 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.7.0 MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="------------AA459AEA8C3D97D146A835D4"
Series	[nvptx,PR84041] Add memory_barrier insn \| expand [nvptx,PR84041] Add memory_barrier insn

Message ID

af6efceb-8b05-b44f-7487-bb1192bda652@mentor.com

State

New

Headers

DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender:to
	:from:cc:subject:message-id:date:mime-version:content-type; q=
	dns; s=default; b=rG5r0CCicl8vcHfT+/6aK2u9Zi5qwTLiUiLRhjmDH97Tek
	NFfQAK41cmCQ1VBI4gbhb2wGgL7ROtO9ptW1e2f8O7JhjGRAXKWoXBeVm4uAQPly
	sReUedtDuJZMz3EjhDai6u/iALI1DUBI7OorL8TDlYsc844F+llSLFbUed4LU=
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
Sender: gcc-patches-owner@gcc.gnu.org
To: GCC Patches <gcc-patches@gcc.gnu.org>
From: Tom de Vries <Tom_deVries@mentor.com>
CC: Jakub Jelinek <jakub@redhat.com>
Subject: [nvptx, PR84041] Add memory_barrier insn
Message-ID: <af6efceb-8b05-b44f-7487-bb1192bda652@mentor.com>
Date: Mon, 9 Apr 2018 15:19:52 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
	rv:52.0) Gecko/20100101 Thunderbird/52.7.0
MIME-Version: 1.0
Content-Type: multipart/mixed;
	boundary="------------AA459AEA8C3D97D146A835D4"

Series

[nvptx,PR84041] Add memory_barrier insn | expand

Commit Message

Tom de Vries April 9, 2018, 1:19 p.m. UTC

Hi,

we've been having hanging OpenMP tests for nvptx offloading: 
for-{3,5,6}.c and the corresponding C++ test-cases.

The failures have now been analyzed down to gomp_ptrlock_get in 
libgomp/config/nvptx/ptrlock.h:
...
  static inline void *gomp_ptrlock_get (gomp_ptrlock_t *ptrlock)
{
   uintptr_t v = (uintptr_t) __atomic_load_n (ptrlock, MEMMODEL_ACQUIRE);
   if (v > 2)
     return (void *) v;

   if (v == 0
       && __atomic_compare_exchange_n (ptrlock, &v, 1, false,
                                       MEMMODEL_ACQUIRE,
                                       MEMMODEL_ACQUIRE))
     return NULL;

   while (v == 1)
     v = (uintptr_t) __atomic_load_n (ptrlock, MEMMODEL_ACQUIRE);

   return (void *) v;
}
...

There's no atomic load insn defined for nvptx, and also no memory 
barrier insn, so the atomic load ends up generating a normal load. The 
JIT compiler does loop-invariant code motion, and moves the load out of 
the loop, which turns the while into an eternal loop.


Fix conservatively by defining the memory_barrier insn. This can 
possibly be fixed more optimally by implementing an atomic load 
operation in nvptx.

Build x86_64 with nvptx accelerator and reg-tested libgomp.

Committed to stage4 trunk.

Thanks,
- Tom

Comments

Tom de Vries April 11, 2018, 9:04 a.m. UTC | #1

On 04/09/2018 03:19 PM, Tom de Vries wrote:
> Hi,
> 
> we've been having hanging OpenMP tests for nvptx offloading: 
> for-{3,5,6}.c and the corresponding C++ test-cases.
> 
> The failures have now been analyzed down to gomp_ptrlock_get in 
> libgomp/config/nvptx/ptrlock.h:
> ...
>   static inline void *gomp_ptrlock_get (gomp_ptrlock_t *ptrlock)
> {
>    uintptr_t v = (uintptr_t) __atomic_load_n (ptrlock, MEMMODEL_ACQUIRE);
>    if (v > 2)
>      return (void *) v;
> 
>    if (v == 0
>        && __atomic_compare_exchange_n (ptrlock, &v, 1, false,
>                                        MEMMODEL_ACQUIRE,
>                                        MEMMODEL_ACQUIRE))
>      return NULL;
> 
>    while (v == 1)
>      v = (uintptr_t) __atomic_load_n (ptrlock, MEMMODEL_ACQUIRE);
> 
>    return (void *) v;
> }
> ...
> 
> There's no atomic load insn defined for nvptx, and also no memory 
> barrier insn, so the atomic load ends up generating a normal load. The 
> JIT compiler does loop-invariant code motion, and moves the load out of 
> the loop, which turns the while into an eternal loop.
> 
> 
> Fix conservatively by defining the memory_barrier insn. This can 
> possibly be fixed more optimally by implementing an atomic load 
> operation in nvptx.
> 
> Build x86_64 with nvptx accelerator and reg-tested libgomp.
> 
> Committed to stage4 trunk.
> 

And back-ported to og7 branch.

Thanks,
- Tom

> 0001-nvptx-Add-memory_barrier-insn.patch
> 
> 
> [nvptx] Add memory_barrier insn
> 
> 2018-04-09  Tom de Vries  <tom@codesourcery.com>
> 
> 	PR target/84041
> 	* config/nvptx/nvptx.md (define_c_enum "unspecv"): Add UNSPECV_MEMBAR.
> 	(define_expand "*memory_barrier"): New define_expand.
> 	(define_insn "memory_barrier"): New insn.
> 
> ---
>   gcc/config/nvptx/nvptx.md | 22 ++++++++++++++++++++++
>   1 file changed, 22 insertions(+)
> 
> diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
> index 4f4453d..68bba36 100644
> --- a/gcc/config/nvptx/nvptx.md
> +++ b/gcc/config/nvptx/nvptx.md
> @@ -55,6 +55,7 @@
>      UNSPECV_CAS
>      UNSPECV_XCHG
>      UNSPECV_BARSYNC
> +   UNSPECV_MEMBAR
>      UNSPECV_DIM_POS
>   
>      UNSPECV_FORK
> @@ -1459,6 +1460,27 @@
>     "\\tbar.sync\\t%0;"
>     [(set_attr "predicable" "false")])
>   
> +(define_expand "memory_barrier"
> +  [(set (match_dup 0)
> +	(unspec_volatile:BLK [(match_dup 0)] UNSPECV_MEMBAR))]
> +  ""
> +{
> +  operands[0] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
> +  MEM_VOLATILE_P (operands[0]) = 1;
> +})
> +
> +;; Ptx defines the memory barriers membar.cta, membar.gl and membar.sys
> +;; (corresponding to cuda functions threadfence_block, threadfence and
> +;; threadfence_system).  For the insn memory_barrier we use membar.sys.  This
> +;; may be overconservative, but before using membar.gl instead we'll need to
> +;; explain in detail why it's safe to use.  For now, use membar.sys.
> +(define_insn "*memory_barrier"
> +  [(set (match_operand:BLK 0 "" "")
> +	(unspec_volatile:BLK [(match_dup 0)] UNSPECV_MEMBAR))]
> +  ""
> +  "\\tmembar.sys;"
> +  [(set_attr "predicable" "false")])
> +
>   (define_insn "nvptx_nounroll"
>     [(unspec_volatile [(const_int 0)] UNSPECV_NOUNROLL)]
>     ""
>

[nvptx] Add memory_barrier insn

2018-04-09  Tom de Vries  <tom@codesourcery.com>

	PR target/84041
	* config/nvptx/nvptx.md (define_c_enum "unspecv"): Add UNSPECV_MEMBAR.
	(define_expand "*memory_barrier"): New define_expand.
	(define_insn "memory_barrier"): New insn.

---
 gcc/config/nvptx/nvptx.md | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 4f4453d..68bba36 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -55,6 +55,7 @@ 
    UNSPECV_CAS
    UNSPECV_XCHG
    UNSPECV_BARSYNC
+   UNSPECV_MEMBAR
    UNSPECV_DIM_POS
 
    UNSPECV_FORK
@@ -1459,6 +1460,27 @@ 
   "\\tbar.sync\\t%0;"
   [(set_attr "predicable" "false")])
 
+(define_expand "memory_barrier"
+  [(set (match_dup 0)
+	(unspec_volatile:BLK [(match_dup 0)] UNSPECV_MEMBAR))]
+  ""
+{
+  operands[0] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
+  MEM_VOLATILE_P (operands[0]) = 1;
+})
+
+;; Ptx defines the memory barriers membar.cta, membar.gl and membar.sys
+;; (corresponding to cuda functions threadfence_block, threadfence and
+;; threadfence_system).  For the insn memory_barrier we use membar.sys.  This
+;; may be overconservative, but before using membar.gl instead we'll need to
+;; explain in detail why it's safe to use.  For now, use membar.sys.
+(define_insn "*memory_barrier"
+  [(set (match_operand:BLK 0 "" "")
+	(unspec_volatile:BLK [(match_dup 0)] UNSPECV_MEMBAR))]
+  ""
+  "\\tmembar.sys;"
+  [(set_attr "predicable" "false")])
+
 (define_insn "nvptx_nounroll"
   [(unspec_volatile [(const_int 0)] UNSPECV_NOUNROLL)]
   ""