diff mbox series

[v4,4/4] OpenMP/OpenACC: Unordered/non-constant component offset struct mapping

Message ID 3ff03cb463d35ffe96b1271a146f24899b2cb573.1665351785.git.julian@codesourcery.com
State New
Headers show
Series [v4,1/4] OpenMP: Pointers and member mappings | expand

Commit Message

Julian Brown Oct. 9, 2022, 9:51 p.m. UTC
This patch adds support for non-constant component offsets in "map"
clauses for OpenMP (and the equivalants for OpenACC), which are not able
to be sorted into order at compile time.  Normally struct accesses in
such clauses are gathered together and sorted into increasing address
order after a "GOMP_MAP_STRUCT" node: if we have variable indices,
that is no longer possible.

This patch adds support for such mappings by adding a new variant of
GOMP_MAP_STRUCT that does not require the list of nodes following to
be in sorted order.  This passes right down to the runtime: the list is
sorted in libgomp according to the dynamic values of the offsets after
the newly-introduced GOMP_MAP_STRUCT_UNORD node.

This mostly affects arrays of structs indexed by variables in C and C++,
but can also affect derived-type arrays with constant indexes when they
have an array descriptor.

2022-10-09  Julian Brown  <julian@codesourcery.com>

gcc/
	* gimplify.cc (extract_base_bit_offset): Add VARIABLE_OFFSET parameter.
	(omp_get_attachment, omp_group_last, omp_group_base,
	omp_directive_maps_explicitly): Add GOMP_MAP_STRUCT_UNORD support.
	(omp_accumulate_sibling_list): Update calls to extract_base_bit_offset.
	Support GOMP_MAP_STRUCT_UNORD.
	(omp_build_struct_sibling_lists, gimplify_scan_omp_clauses,
	gimplify_adjust_omp_clauses, gimplify_omp_target_update): Add
	GOMP_MAP_STRUCT_UNORD support.
	* omp-low.cc (lower_omp_target): Add GOMP_MAP_STRUCT_UNORD support.
	* tree-pretty-print.cc (dump_omp_clause): Likewise.

include/
	* gomp-constants.h (gomp_map_kind): Add GOMP_MAP_STRUCT_UNORD.

libgomp/
	* oacc-mem.c (find_group_last, goacc_enter_data_internal,
	goacc_exit_data_internal, GOACC_enter_exit_data): Add
	GOMP_MAP_STRUCT_UNORD support.
	* target.c (compare_addr_r): New helper function.
	(gomp_map_vars_internal, GOMP_target_enter_exit_data,
	gomp_target_task_fn): Add GOMP_MAP_STRUCT_UNORD support.
	* testsuite/libgomp.c-c++-common/map-arrayofstruct-1.c: New test.
	* testsuite/libgomp.c-c++-common/map-arrayofstruct-2.c: New test.
	* testsuite/libgomp.c-c++-common/map-arrayofstruct-3.c: New test.
	* testsuite/libgomp.fortran/map-subarray-3.f90: New test.
	* testsuite/libgomp.fortran/map-subarray-5.f90: New test.
---
 gcc/gimplify.cc                               | 110 +++++++++---
 gcc/omp-low.cc                                |   1 +
 gcc/tree-pretty-print.cc                      |   3 +
 include/gomp-constants.h                      |   9 +
 libgomp/oacc-mem.c                            |   6 +-
 libgomp/target.c                              | 162 ++++++++++++++----
 .../map-arrayofstruct-1.c                     |  38 ++++
 .../map-arrayofstruct-2.c                     |  54 ++++++
 .../map-arrayofstruct-3.c                     |  64 +++++++
 .../libgomp.fortran/map-subarray-3.f90        |  48 ++++++
 .../libgomp.fortran/map-subarray-5.f90        |  50 ++++++
 11 files changed, 495 insertions(+), 50 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/map-arrayofstruct-1.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/map-arrayofstruct-2.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/map-arrayofstruct-3.c
 create mode 100644 libgomp/testsuite/libgomp.fortran/map-subarray-3.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/map-subarray-5.f90

Comments

Julian Brown Oct. 10, 2022, 6:24 a.m. UTC | #1
On Sun, 9 Oct 2022 14:51:37 -0700
Julian Brown <julian@codesourcery.com> wrote:

> diff --git a/libgomp/testsuite/libgomp.fortran/map-subarray-3.f90
> b/libgomp/testsuite/libgomp.fortran/map-subarray-3.f90 new file mode
> 100644 index 00000000000..b009a4224cc
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.fortran/map-subarray-3.f90
> @@ -0,0 +1,48 @@
> +
> +gvar(1)%arr => arr1
> +gvar(2)%arr => arr2
> +gvar(3)%arr => arr3
> +
> +gvar(1)%arr = 0
> +gvar(2)%arr = 0
> +gvar(3)%arr = 0
> +
> +i = 1
> +j = 2
> +
> +!$omp target map(gvar(i)%arr, gvar(j)%arr, gvar(j)%arr(1:5))
> +gvar(i)%arr(1) = gvar(i)%arr(1) + 1
> +gvar(j)%arr(1) = gvar(j)%arr(1) + 2
> +!$omp end target
> +
> +i = 2
> +j = 1
> +
> +!$omp target map(gvar(i)%arr, gvar(j)%arr, gvar(j)%arr(1:5))
> +gvar(i)%arr(1) = gvar(i)%arr(1) + 3
> +gvar(j)%arr(1) = gvar(j)%arr(1) + 4
> +!$omp end target
> +
> +if (gvar(i)%arr(1).ne.4) stop 1
> +if (gvar(j)%arr(1).ne.6) stop 2

Oops, this should have been:

-if (gvar(i)%arr(1).ne.4) stop 1
-if (gvar(j)%arr(1).ne.6) stop 2
+if (gvar(1)%arr(1).ne.5) stop 1
+if (gvar(2)%arr(1).ne.5) stop 2

Julian
Jakub Jelinek Oct. 10, 2022, 10:38 a.m. UTC | #2
On Sun, Oct 09, 2022 at 02:51:37PM -0700, Julian Brown wrote:
> This patch adds support for non-constant component offsets in "map"
> clauses for OpenMP (and the equivalants for OpenACC), which are not able
> to be sorted into order at compile time.  Normally struct accesses in
> such clauses are gathered together and sorted into increasing address
> order after a "GOMP_MAP_STRUCT" node: if we have variable indices,
> that is no longer possible.
> 
> This patch adds support for such mappings by adding a new variant of
> GOMP_MAP_STRUCT that does not require the list of nodes following to
> be in sorted order.  This passes right down to the runtime: the list is
> sorted in libgomp according to the dynamic values of the offsets after
> the newly-introduced GOMP_MAP_STRUCT_UNORD node.
> 
> This mostly affects arrays of structs indexed by variables in C and C++,
> but can also affect derived-type arrays with constant indexes when they
> have an array descriptor.

I don't understand why this is needed.
We have a restriction that ought to make all such cases invalid:
"If multiple list items are explicitly mapped on the same construct and have the same containing
array or have base pointers that share original storage, and if any of the list items do not have
corresponding list items that are present in the device data environment prior to a task
encountering the construct, then the list items must refer to the same array elements of either the
containing array or the implicit array of the base pointers."

So, all those
#pragma omp target map(t->a[i].p, t->a[j].p) etc. cases are invalid unless
i == j, so IMNSHO one doesn't need to worry about unordered cases.

> +#if defined(_GNU_SOURCE) || defined(__GNUC__)
> +static int
> +compare_addr_r (const void *a, const void *b, void *data)
> +{
> +  void **hostaddrs = (void **) data;
> +  int ai = *(int *) a, bi = *(int *) b;
> +  if (hostaddrs[ai] < hostaddrs[bi])
> +    return -1;
> +  else if (hostaddrs[ai] > hostaddrs[bi])
> +    return 1;
> +  return 0;
> +}
> +#endif

Note, not all glibcs have qsort_r and _GNU_SOURCE macro being defined
doesn't imply glibc.  You'd need something like _GLIBC_PREREQ macro
and require 2.8 or later.

> +
>  static inline __attribute__((always_inline)) struct target_mem_desc *
>  gomp_map_vars_internal (struct gomp_device_descr *devicep,
>  			struct goacc_asyncqueue *aq, size_t mapnum,
> @@ -968,6 +982,17 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep,
>    tgt->device_descr = devicep;
>    tgt->prev = NULL;
>    struct gomp_coalesce_buf cbuf, *cbufp = NULL;
> +  size_t hostaddr_idx;
> +
> +#if !defined(_GNU_SOURCE) && defined(__GNUC__)
> +  /* If we don't have _GNU_SOURCE (thus no qsort_r), but we are compiling with
> +     GCC (and why wouldn't we be?), we can use this nested function for
> +     regular qsort.  */
> +  int compare_addr (const void *a, const void *b)
> +    {
> +      return compare_addr_r (a, b, (void *) &hostaddrs[hostaddr_idx]);
> +    }
> +#endif

Please don't use nested functions in libgomp.

> +	  int *order = NULL;
> +	  if ((kind & typemask) == GOMP_MAP_STRUCT_UNORD)
> +	    {
> +	      order = (int *) gomp_alloca (sizeof (int) * sizes[i]);
> +	      for (int j = 0; j < sizes[i]; j++)
> +		order[j] = j;
> +#ifdef _GNU_SOURCE
> +	      qsort_r (order, sizes[i], sizeof (int), &compare_addr_r,
> +		       &hostaddrs[i + 1]);
> +#elif defined(__GNUC__)
> +	      hostaddr_idx = i + 1;
> +	      qsort (order, sizes[i], sizeof (int), &compare_addr);
> +#else
> +#error no threadsafe qsort
> +#endif

This is too ugly.  If this is really needed (see above) and
you need fallback for missing qsort_t, better sort array of tuples
containing the order number and some data pointer needed for the comparison
routine.

	Jakub
diff mbox series

Patch

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index e245adfec3a..e8e0973eff0 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -8861,7 +8861,8 @@  build_omp_struct_comp_nodes (enum tree_code code, tree grp_start, tree grp_end,
 
 static tree
 extract_base_bit_offset (tree base, poly_int64 *bitposp,
-			 poly_offset_int *poffsetp)
+			 poly_offset_int *poffsetp,
+			 bool *variable_offset)
 {
   tree offset;
   poly_int64 bitsize, bitpos;
@@ -8879,10 +8880,13 @@  extract_base_bit_offset (tree base, poly_int64 *bitposp,
   if (offset && poly_int_tree_p (offset))
     {
       poffset = wi::to_poly_offset (offset);
-      offset = NULL_TREE;
+      *variable_offset = false;
     }
   else
-    poffset = 0;
+    {
+      poffset = 0;
+      *variable_offset = (offset != NULL_TREE);
+    }
 
   if (maybe_ne (bitpos, 0))
     poffset += bits_to_bytes_round_down (bitpos);
@@ -9038,6 +9042,7 @@  omp_get_attachment (omp_mapping_group *grp)
       return error_mark_node;
 
     case GOMP_MAP_STRUCT:
+    case GOMP_MAP_STRUCT_UNORD:
     case GOMP_MAP_FORCE_DEVICEPTR:
     case GOMP_MAP_DEVICE_RESIDENT:
     case GOMP_MAP_LINK:
@@ -9123,6 +9128,7 @@  omp_group_last (tree *start_p)
       break;
 
     case GOMP_MAP_STRUCT:
+    case GOMP_MAP_STRUCT_UNORD:
       {
 	unsigned HOST_WIDE_INT num_mappings
 	  = tree_to_uhwi (OMP_CLAUSE_SIZE (c));
@@ -9282,6 +9288,7 @@  omp_group_base (omp_mapping_group *grp, unsigned int *chained,
       return error_mark_node;
 
     case GOMP_MAP_STRUCT:
+    case GOMP_MAP_STRUCT_UNORD:
       {
 	unsigned HOST_WIDE_INT num_mappings
 	  = tree_to_uhwi (OMP_CLAUSE_SIZE (node));
@@ -9898,7 +9905,8 @@  omp_directive_maps_explicitly (hash_map<tree_operand_hash,
       /* We might be called during omp_build_struct_sibling_lists, when
 	 GOMP_MAP_STRUCT might have been inserted at the start of the group.
 	 Skip over that, and also possibly the node after it.  */
-      if (OMP_CLAUSE_MAP_KIND (grp_first) == GOMP_MAP_STRUCT)
+      if (OMP_CLAUSE_MAP_KIND (grp_first) == GOMP_MAP_STRUCT
+	  || OMP_CLAUSE_MAP_KIND (grp_first) == GOMP_MAP_STRUCT_UNORD)
 	{
 	  grp_first = OMP_CLAUSE_CHAIN (grp_first);
 	  if (OMP_CLAUSE_MAP_KIND (grp_first) == GOMP_MAP_FIRSTPRIVATE_POINTER
@@ -10600,7 +10608,9 @@  omp_accumulate_sibling_list (enum omp_region_type region_type,
 	}
     }
 
-  tree base = extract_base_bit_offset (ocd, &cbitpos, &coffset);
+  bool variable_offset;
+  tree base
+    = extract_base_bit_offset (ocd, &cbitpos, &coffset, &variable_offset);
 
   int base_token;
   for (base_token = addr_tokens.length () - 1; base_token >= 0; base_token--)
@@ -10628,14 +10638,20 @@  omp_accumulate_sibling_list (enum omp_region_type region_type,
 
   if (!struct_map_to_clause || struct_map_to_clause->get (base) == NULL)
     {
-      tree l = build_omp_clause (OMP_CLAUSE_LOCATION (grp_end), OMP_CLAUSE_MAP);
-
-      OMP_CLAUSE_SET_MAP_KIND (l, GOMP_MAP_STRUCT);
-      OMP_CLAUSE_DECL (l) = unshare_expr (base);
-      OMP_CLAUSE_SIZE (l) = size_int (1);
+      enum gomp_map_kind str_kind = GOMP_MAP_STRUCT;
 
       if (struct_map_to_clause == NULL)
 	struct_map_to_clause = new hash_map<tree_operand_hash, tree>;
+
+      if (variable_offset)
+	str_kind = GOMP_MAP_STRUCT_UNORD;
+
+      tree l = build_omp_clause (OMP_CLAUSE_LOCATION (grp_end), OMP_CLAUSE_MAP);
+
+      OMP_CLAUSE_SET_MAP_KIND (l, str_kind);
+      OMP_CLAUSE_DECL (l) = unshare_expr (base);
+      OMP_CLAUSE_SIZE (l) = size_int (1);
+
       struct_map_to_clause->put (base, l);
 
       /* On first iterating through the clause list, we insert the struct node
@@ -10863,6 +10879,11 @@  omp_accumulate_sibling_list (enum omp_region_type region_type,
     {
       tree *osc = struct_map_to_clause->get (base);
       tree *sc = NULL, *scp = NULL;
+      bool unordered = false;
+
+      if (osc && OMP_CLAUSE_MAP_KIND (*osc) == GOMP_MAP_STRUCT_UNORD)
+	unordered = true;
+
       unsigned HOST_WIDE_INT i, elems = tree_to_uhwi (OMP_CLAUSE_SIZE (*osc));
       sc = &OMP_CLAUSE_CHAIN (*osc);
       /* The struct mapping might be immediately followed by a
@@ -10903,12 +10924,20 @@  omp_accumulate_sibling_list (enum omp_region_type region_type,
 			 == REFERENCE_TYPE))
 	      sc_decl = TREE_OPERAND (sc_decl, 0);
 
-	    tree base2 = extract_base_bit_offset (sc_decl, &bitpos, &offset);
+	    bool variable_offset2;
+	    tree base2 = extract_base_bit_offset (sc_decl, &bitpos, &offset,
+						  &variable_offset2);
 	    if (!base2 || !operand_equal_p (base2, base, 0))
 	      break;
 	    if (scp)
 	      continue;
-	    if ((region_type & ORT_ACC) != 0)
+	    if (variable_offset2)
+	      {
+		OMP_CLAUSE_SET_MAP_KIND (*osc, GOMP_MAP_STRUCT_UNORD);
+		unordered = true;
+		break;
+	      }
+	    else if ((region_type & ORT_ACC) != 0)
 	      {
 		/* For OpenACC, allow (ignore) duplicate struct accesses in
 		   the middle of a mapping clause, e.g. "mystruct->foo" in:
@@ -10940,6 +10969,15 @@  omp_accumulate_sibling_list (enum omp_region_type region_type,
 	      }
 	  }
 
+      /* If this is an unordered struct, just insert the new element at the
+	 end of the list.  */
+      if (unordered)
+	{
+	  for (; i < elems; i++)
+	    sc = &OMP_CLAUSE_CHAIN (*sc);
+	  scp = NULL;
+	}
+
       OMP_CLAUSE_SIZE (*osc)
 	= size_binop (PLUS_EXPR, OMP_CLAUSE_SIZE (*osc), size_one_node);
 
@@ -11319,14 +11357,42 @@  omp_build_struct_sibling_lists (enum tree_code code,
 
 	/* This is the first sorted node in the struct sibling list.  Use it
 	   to recalculate the correct bias to use.
-	   (&first_node - attach_decl).  */
-	tree first_node = OMP_CLAUSE_DECL (OMP_CLAUSE_CHAIN (attach));
-	first_node = build_fold_addr_expr (first_node);
-	first_node = fold_convert (ptrdiff_type_node, first_node);
+	   (&first_node - attach_decl).
+	   For GOMP_MAP_STRUCT_UNORD, we need e.g. the
+	   min(min(min(first,second),third),fourth) element, because the
+	   elements aren't in any particular order.  */
+	tree lowest_addr;
+	if (OMP_CLAUSE_MAP_KIND (struct_node) == GOMP_MAP_STRUCT_UNORD)
+	  {
+	    tree first_node = OMP_CLAUSE_CHAIN (attach);
+	    unsigned HOST_WIDE_INT num_mappings
+	      = tree_to_uhwi (OMP_CLAUSE_SIZE (struct_node));
+	    lowest_addr = OMP_CLAUSE_DECL (first_node);
+	    lowest_addr = build_fold_addr_expr (lowest_addr);
+	    lowest_addr = fold_convert (pointer_sized_int_node, lowest_addr);
+	    tree next_node = OMP_CLAUSE_CHAIN (first_node);
+	    while (num_mappings > 1)
+	      {
+		tree tmp = OMP_CLAUSE_DECL (next_node);
+		tmp = build_fold_addr_expr (tmp);
+		tmp = fold_convert (pointer_sized_int_node, tmp);
+		lowest_addr = fold_build2 (MIN_EXPR, pointer_sized_int_node,
+					   lowest_addr, tmp);
+		next_node = OMP_CLAUSE_CHAIN (next_node);
+		num_mappings--;
+	      }
+	    lowest_addr = fold_convert (ptrdiff_type_node, lowest_addr);
+	  }
+	else
+	  {
+	    tree first_node = OMP_CLAUSE_DECL (OMP_CLAUSE_CHAIN (attach));
+	    first_node = build_fold_addr_expr (first_node);
+	    lowest_addr = fold_convert (ptrdiff_type_node, first_node);
+	  }
 	tree attach_decl = OMP_CLAUSE_DECL (attach);
 	attach_decl = fold_convert (ptrdiff_type_node, attach_decl);
 	OMP_CLAUSE_SIZE (attach)
-	  = fold_build2 (MINUS_EXPR, ptrdiff_type_node, first_node,
+	  = fold_build2 (MINUS_EXPR, ptrdiff_type_node, lowest_addr,
 			 attach_decl);
 
 	/* Remove GOMP_MAP_ATTACH node from after struct node.  */
@@ -11874,7 +11940,8 @@  gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
 				  GOVD_FIRSTPRIVATE | GOVD_SEEN);
 	    }
 
-	  if (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_STRUCT
+	  if ((OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_STRUCT
+	       || OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_STRUCT_UNORD)
 	      && (addr_tokens[0]->type == STRUCTURE_BASE
 		  || addr_tokens[0]->type == ARRAY_BASE)
 	      && addr_tokens[0]->u.structure_base_kind == BASE_DECL)
@@ -13461,7 +13528,8 @@  gimplify_adjust_omp_clauses (gimple_seq *pre_p, gimple_seq body, tree *list_p,
 		    }
 		}
 	    }
-	  if (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_STRUCT
+	  if ((OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_STRUCT
+	       || OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_STRUCT_UNORD)
 	      && (code == OMP_TARGET_EXIT_DATA || code == OACC_EXIT_DATA))
 	    {
 	      remove = true;
@@ -13505,7 +13573,8 @@  gimplify_adjust_omp_clauses (gimple_seq *pre_p, gimple_seq body, tree *list_p,
 		 in target block and none of the mapping has always modifier,
 		 remove all the struct element mappings, which immediately
 		 follow the GOMP_MAP_STRUCT map clause.  */
-	      if (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_STRUCT)
+	      if (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_STRUCT
+		  || OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_STRUCT_UNORD)
 		{
 		  HOST_WIDE_INT cnt = tree_to_shwi (OMP_CLAUSE_SIZE (c));
 		  while (cnt--)
@@ -16284,6 +16353,7 @@  gimplify_omp_target_update (tree *expr_p, gimple_seq *pre_p)
 	      have_clause = false;
 	      break;
 	    case GOMP_MAP_STRUCT:
+	    case GOMP_MAP_STRUCT_UNORD:
 	      have_clause = false;
 	      break;
 	    default:
diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index 67db528f252..92346672f1d 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -12780,6 +12780,7 @@  lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 	  case GOMP_MAP_FIRSTPRIVATE_POINTER:
 	  case GOMP_MAP_FIRSTPRIVATE_REFERENCE:
 	  case GOMP_MAP_STRUCT:
+	  case GOMP_MAP_STRUCT_UNORD:
 	  case GOMP_MAP_ALWAYS_POINTER:
 	  case GOMP_MAP_ATTACH:
 	  case GOMP_MAP_DETACH:
diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc
index e7a8c9481a6..c0656104196 100644
--- a/gcc/tree-pretty-print.cc
+++ b/gcc/tree-pretty-print.cc
@@ -967,6 +967,9 @@  dump_omp_clause (pretty_printer *pp, tree clause, int spc, dump_flags_t flags)
 	case GOMP_MAP_STRUCT:
 	  pp_string (pp, "struct");
 	  break;
+	case GOMP_MAP_STRUCT_UNORD:
+	  pp_string (pp, "struct_unord");
+	  break;
 	case GOMP_MAP_ALWAYS_POINTER:
 	  pp_string (pp, "always_pointer");
 	  break;
diff --git a/include/gomp-constants.h b/include/gomp-constants.h
index 84316f953d0..564b119feca 100644
--- a/include/gomp-constants.h
+++ b/include/gomp-constants.h
@@ -138,6 +138,15 @@  enum gomp_map_kind
        (address of the last adjacent entry plus its size).  */
     GOMP_MAP_STRUCT =			(GOMP_MAP_FLAG_SPECIAL_2
 					 | GOMP_MAP_FLAG_SPECIAL | 0),
+    /* As above, but followed by an unordered list of adjacent entries.
+       Slightly less efficient at runtime, but allows for struct components
+       with dynamic offsets.  We can get those e.g. by indexing into an array
+       of structs using a non-constant expression, or even with a constant
+       expression when a Fortran array of derived types has an array
+       descriptor).  */
+    GOMP_MAP_STRUCT_UNORD =		(GOMP_MAP_FLAG_SPECIAL_3
+					 | GOMP_MAP_FLAG_SPECIAL_2
+					 | GOMP_MAP_FLAG_SPECIAL | 0),
     /* On a location of a pointer/reference that is assumed to be already mapped
        earlier, store the translated address of the preceeding mapping.
        No refcount is bumped by this, and the store is done unconditionally.  */
diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index 73b2710c2b8..6bdee906387 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -1028,6 +1028,7 @@  find_group_last (int pos, size_t mapnum, size_t *sizes, unsigned short *kinds)
       break;
 
     case GOMP_MAP_STRUCT:
+    case GOMP_MAP_STRUCT_UNORD:
       pos += sizes[pos];
       break;
 
@@ -1088,6 +1089,7 @@  goacc_enter_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum,
       switch (kinds[i] & 0xff)
 	{
 	case GOMP_MAP_STRUCT:
+	case GOMP_MAP_STRUCT_UNORD:
 	  {
 	    size = (uintptr_t) hostaddrs[group_last] + sizes[group_last]
 		   - (uintptr_t) hostaddrs[i];
@@ -1297,6 +1299,7 @@  goacc_exit_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum,
 	  break;
 
 	case GOMP_MAP_STRUCT:
+	case GOMP_MAP_STRUCT_UNORD:
 	  /* Skip the 'GOMP_MAP_STRUCT' itself, and use the regular processing
 	     for all its entries.  This special handling exists for GCC 10.1
 	     compatibility; afterwards, we're not generating these no-op
@@ -1435,7 +1438,8 @@  GOACC_enter_exit_data (int flags_m, size_t mapnum, void **hostaddrs,
 
       if (kind == GOMP_MAP_POINTER
 	  || kind == GOMP_MAP_TO_PSET
-	  || kind == GOMP_MAP_STRUCT)
+	  || kind == GOMP_MAP_STRUCT
+	  || kind == GOMP_MAP_STRUCT_UNORD)
 	continue;
 
       if (kind == GOMP_MAP_FORCE_ALLOC
diff --git a/libgomp/target.c b/libgomp/target.c
index e5dec469519..015c25be86b 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -945,6 +945,20 @@  gomp_map_val (struct target_mem_desc *tgt, void **hostaddrs, size_t i)
     }
 }
 
+#if defined(_GNU_SOURCE) || defined(__GNUC__)
+static int
+compare_addr_r (const void *a, const void *b, void *data)
+{
+  void **hostaddrs = (void **) data;
+  int ai = *(int *) a, bi = *(int *) b;
+  if (hostaddrs[ai] < hostaddrs[bi])
+    return -1;
+  else if (hostaddrs[ai] > hostaddrs[bi])
+    return 1;
+  return 0;
+}
+#endif
+
 static inline __attribute__((always_inline)) struct target_mem_desc *
 gomp_map_vars_internal (struct gomp_device_descr *devicep,
 			struct goacc_asyncqueue *aq, size_t mapnum,
@@ -968,6 +982,17 @@  gomp_map_vars_internal (struct gomp_device_descr *devicep,
   tgt->device_descr = devicep;
   tgt->prev = NULL;
   struct gomp_coalesce_buf cbuf, *cbufp = NULL;
+  size_t hostaddr_idx;
+
+#if !defined(_GNU_SOURCE) && defined(__GNUC__)
+  /* If we don't have _GNU_SOURCE (thus no qsort_r), but we are compiling with
+     GCC (and why wouldn't we be?), we can use this nested function for
+     regular qsort.  */
+  int compare_addr (const void *a, const void *b)
+    {
+      return compare_addr_r (a, b, (void *) &hostaddrs[hostaddr_idx]);
+    }
+#endif
 
   if (mapnum == 0)
     {
@@ -1061,13 +1086,34 @@  gomp_map_vars_internal (struct gomp_device_descr *devicep,
 	    tgt->list[i].offset = 0;
 	  continue;
 	}
-      else if ((kind & typemask) == GOMP_MAP_STRUCT)
+      else if ((kind & typemask) == GOMP_MAP_STRUCT
+	       || (kind & typemask) == GOMP_MAP_STRUCT_UNORD)
 	{
-	  size_t first = i + 1;
-	  size_t last = i + sizes[i];
+	  int *order = NULL;
+	  if ((kind & typemask) == GOMP_MAP_STRUCT_UNORD)
+	    {
+	      order = (int *) gomp_alloca (sizeof (int) * sizes[i]);
+	      for (int j = 0; j < sizes[i]; j++)
+		order[j] = j;
+#ifdef _GNU_SOURCE
+	      qsort_r (order, sizes[i], sizeof (int), &compare_addr_r,
+		       &hostaddrs[i + 1]);
+#elif defined(__GNUC__)
+	      hostaddr_idx = i + 1;
+	      qsort (order, sizes[i], sizeof (int), &compare_addr);
+#else
+#error no threadsafe qsort
+#endif
+	    }
+	  size_t first = i + 1, last = i + sizes[i];
+	  size_t argmin = first, argmax = last;
+	  if (order)
+	    {
+	      argmin = first + order[0];
+	      argmax = first + order[sizes[i] - 1];
+	    }
 	  cur_node.host_start = (uintptr_t) hostaddrs[i];
-	  cur_node.host_end = (uintptr_t) hostaddrs[last]
-			      + sizes[last];
+	  cur_node.host_end = (uintptr_t) hostaddrs[argmax] + sizes[argmax];
 	  tgt->list[i].key = NULL;
 	  tgt->list[i].offset = OFFSET_STRUCT;
 	  splay_tree_key n = splay_tree_lookup (mem_map, &cur_node);
@@ -1076,21 +1122,26 @@  gomp_map_vars_internal (struct gomp_device_descr *devicep,
 	      size_t align = (size_t) 1 << (kind >> rshift);
 	      if (tgt_align < align)
 		tgt_align = align;
-	      tgt_size -= (uintptr_t) hostaddrs[first] - cur_node.host_start;
+	      tgt_size -= (uintptr_t) hostaddrs[argmin] - cur_node.host_start;
 	      tgt_size = (tgt_size + align - 1) & ~(align - 1);
 	      tgt_size += cur_node.host_end - cur_node.host_start;
 	      not_found_cnt += last - i;
+	      void *prev_addr = NULL;
 	      for (i = first; i <= last; i++)
 		{
+		  int oi = order ? first + order[i - first] : i;
 		  tgt->list[i].key = NULL;
+		  if (order && i > first && prev_addr == hostaddrs[oi])
+		    continue;
 		  if (!aq
-		      && gomp_to_device_kind_p (get_kind (short_mapkind, kinds, i)
-						& typemask)
-		      && sizes[i] != 0)
+		      && gomp_to_device_kind_p (get_kind (short_mapkind, kinds,
+							  oi) & typemask)
+		      && sizes[oi] != 0)
 		    gomp_coalesce_buf_add (&cbuf,
 					   tgt_size - cur_node.host_end
-					   + (uintptr_t) hostaddrs[i],
-					   sizes[i]);
+					   + (uintptr_t) hostaddrs[oi],
+					   sizes[oi]);
+		  prev_addr = hostaddrs[oi];
 		}
 	      i--;
 	      continue;
@@ -1368,11 +1419,12 @@  gomp_map_vars_internal (struct gomp_device_descr *devicep,
 	  {
 	    int kind = get_kind (short_mapkind, kinds, i);
 	    bool implicit = get_implicit (short_mapkind, kinds, i);
+	    int *order = NULL;
 	    if (hostaddrs[i] == NULL)
 	      continue;
 	    switch (kind & typemask)
 	      {
-		size_t align, len, first, last;
+		size_t align, len, first, last, argmin, argmax;
 		splay_tree_key n;
 	      case GOMP_MAP_FIRSTPRIVATE:
 		align = (size_t) 1 << (kind >> rshift);
@@ -1440,39 +1492,58 @@  gomp_map_vars_internal (struct gomp_device_descr *devicep,
 		    tgt->list[i].offset = OFFSET_INLINED;
 		  }
 		continue;
+	      case GOMP_MAP_STRUCT_UNORD:
+		order = (int *) gomp_alloca (sizeof (int) * sizes[i]);
+		for (int j = 0; j < sizes[i]; j++)
+		  order[j] = j;
+#ifdef _GNU_SOURCE
+		qsort_r (order, sizes[i], sizeof (int), &compare_addr_r,
+			 &hostaddrs[i + 1]);
+#elif defined(__GNUC__)
+		hostaddr_idx = i + 1;
+		qsort (order, sizes[i], sizeof (int), &compare_addr);
+#else
+#error no threadsafe qsort
+#endif
+		/* Fallthrough.  */
 	      case GOMP_MAP_STRUCT:
-		first = i + 1;
-		last = i + sizes[i];
+		first = argmin = i + 1;
+		last = argmax = i + sizes[i];
+		if (order)
+		  {
+		    argmin = first + order[0];
+		    argmax = first + order[sizes[i] - 1];
+		  }
 		cur_node.host_start = (uintptr_t) hostaddrs[i];
-		cur_node.host_end = (uintptr_t) hostaddrs[last]
-				    + sizes[last];
-		if (tgt->list[first].key != NULL)
+		cur_node.host_end = (uintptr_t) hostaddrs[argmax]
+				    + sizes[argmax];
+		if (tgt->list[argmin].key != NULL)
 		  continue;
-		if (sizes[last] == 0)
+		if (sizes[argmax] == 0)
 		  cur_node.host_end++;
 		n = splay_tree_lookup (mem_map, &cur_node);
-		if (sizes[last] == 0)
+		if (sizes[argmax] == 0)
 		  cur_node.host_end--;
 		if (n == NULL && cur_node.host_start == cur_node.host_end)
 		  {
 		    gomp_mutex_unlock (&devicep->lock);
 		    gomp_fatal ("Struct pointer member not mapped (%p)",
-				(void*) hostaddrs[first]);
+				(void*) hostaddrs[argmin]);
 		  }
 		if (n == NULL)
 		  {
 		    size_t align = (size_t) 1 << (kind >> rshift);
-		    tgt_size -= (uintptr_t) hostaddrs[first]
+		    tgt_size -= (uintptr_t) hostaddrs[argmin]
 				- (uintptr_t) hostaddrs[i];
 		    tgt_size = (tgt_size + align - 1) & ~(align - 1);
-		    tgt_size += (uintptr_t) hostaddrs[first]
+		    tgt_size += (uintptr_t) hostaddrs[argmin]
 				- (uintptr_t) hostaddrs[i];
-		    field_tgt_base = (uintptr_t) hostaddrs[first];
+		    field_tgt_base = (uintptr_t) hostaddrs[argmin];
 		    field_tgt_offset = tgt_size;
 		    field_tgt_clear = last;
 		    field_tgt_structelem_first = NULL;
 		    tgt_size += cur_node.host_end
-				- (uintptr_t) hostaddrs[first];
+				- (uintptr_t) hostaddrs[argmin];
 		    continue;
 		  }
 		for (i = first; i <= last; i++)
@@ -1557,9 +1628,40 @@  gomp_map_vars_internal (struct gomp_device_descr *devicep,
 	      k->host_end = k->host_start + sizeof (void *);
 	    splay_tree_key n = splay_tree_lookup (mem_map, k);
 	    if (n && n->refcount != REFCOUNT_LINK)
-	      gomp_map_vars_existing (devicep, aq, n, k, &tgt->list[i],
-				      kind & typemask, false, implicit, cbufp,
-				      refcount_set);
+	      {
+		if (field_tgt_clear != FIELD_TGT_EMPTY)
+		  {
+		    /* For this condition to be true, there must be a
+		       duplicate struct element mapping.  This can happen with
+		       GOMP_MAP_STRUCT_UNORD mappings, for example.  */
+		    tgt->list[i].key = n;
+		    if (openmp_p)
+		      {
+			assert ((n->refcount & REFCOUNT_STRUCTELEM) != 0);
+			assert (field_tgt_structelem_first != NULL);
+
+			if (i == field_tgt_clear)
+			  {
+			    n->refcount |= REFCOUNT_STRUCTELEM_FLAG_LAST;
+			    field_tgt_structelem_first = NULL;
+			  }
+		      }
+		    if (i == field_tgt_clear)
+		      field_tgt_clear = FIELD_TGT_EMPTY;
+		    gomp_increment_refcount (n, refcount_set);
+		    tgt->list[i].copy_from
+		      = GOMP_MAP_COPY_FROM_P (kind & typemask);
+		    tgt->list[i].always_copy_from
+		      = GOMP_MAP_ALWAYS_FROM_P (kind & typemask);
+		    tgt->list[i].is_attach = false;
+		    tgt->list[i].offset = 0;
+		    tgt->list[i].length = k->host_end - k->host_start;
+		  }
+		else
+		  gomp_map_vars_existing (devicep, aq, n, k, &tgt->list[i],
+					  kind & typemask, false, implicit,
+					  cbufp, refcount_set);
+	      }
 	    else
 	      {
 		k->aux = NULL;
@@ -3314,7 +3416,8 @@  GOMP_target_enter_exit_data (int device, size_t mapnum, void **hostaddrs,
   size_t i, j;
   if ((flags & GOMP_TARGET_FLAG_EXIT_DATA) == 0)
     for (i = 0; i < mapnum; i++)
-      if ((kinds[i] & 0xff) == GOMP_MAP_STRUCT)
+      if ((kinds[i] & 0xff) == GOMP_MAP_STRUCT
+	  || (kinds[i] & 0xff) == GOMP_MAP_STRUCT_UNORD)
 	{
 	  gomp_map_vars (devicep, sizes[i] + 1, &hostaddrs[i], NULL, &sizes[i],
 			 &kinds[i], true, &refcount_set,
@@ -3409,7 +3512,8 @@  gomp_target_task_fn (void *data)
       htab_t refcount_set = htab_create (ttask->mapnum);
       if ((ttask->flags & GOMP_TARGET_FLAG_EXIT_DATA) == 0)
 	for (i = 0; i < ttask->mapnum; i++)
-	  if ((ttask->kinds[i] & 0xff) == GOMP_MAP_STRUCT)
+	  if ((ttask->kinds[i] & 0xff) == GOMP_MAP_STRUCT
+	      || (ttask->kinds[i] & 0xff) == GOMP_MAP_STRUCT_UNORD)
 	    {
 	      gomp_map_vars (devicep, ttask->sizes[i] + 1, &ttask->hostaddrs[i],
 			     NULL, &ttask->sizes[i], &ttask->kinds[i], true,
diff --git a/libgomp/testsuite/libgomp.c-c++-common/map-arrayofstruct-1.c b/libgomp/testsuite/libgomp.c-c++-common/map-arrayofstruct-1.c
new file mode 100644
index 00000000000..b0994c0a7bb
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/map-arrayofstruct-1.c
@@ -0,0 +1,38 @@ 
+#include <stdlib.h>
+#include <assert.h>
+
+struct st {
+  int *p;
+};
+
+int main (void)
+{
+  struct st s[2];
+  s[0].p = (int *) calloc (5, sizeof (int));
+  s[1].p = (int *) calloc (5, sizeof (int));
+
+#pragma omp target map(s[0].p, s[1].p, s[0].p[0:2], s[1].p[1:3])
+  {
+    s[0].p[0] = 5;
+    s[1].p[1] = 7;
+  }
+
+#pragma omp target map(s, s[0].p[0:2], s[1].p[1:3])
+  {
+    s[0].p[0]++;
+    s[1].p[1]++;
+  }
+
+#pragma omp target map(s[0:2], s[0].p[0:2], s[1].p[1:3])
+  {
+    s[0].p[0]++;
+    s[1].p[1]++;
+  }
+
+  assert (s[0].p[0] == 7);
+  assert (s[1].p[1] == 9);
+
+  free (s[0].p);
+  free (s[1].p);
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/map-arrayofstruct-2.c b/libgomp/testsuite/libgomp.c-c++-common/map-arrayofstruct-2.c
new file mode 100644
index 00000000000..fe2cc8c0515
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/map-arrayofstruct-2.c
@@ -0,0 +1,54 @@ 
+#include <stdlib.h>
+#include <assert.h>
+
+struct st {
+  int *p;
+};
+
+int main (void)
+{
+  struct st s[10];
+
+  for (int i = 0; i < 10; i++)
+    s[i].p = (int *) calloc (5, sizeof (int));
+
+  for (int i = 0; i < 10; i++)
+    for (int j = 0; j < 10; j++)
+      for (int k = 0; k < 10; k++)
+	{
+	  if (i == j || j == k || i == k)
+	    continue;
+
+#pragma omp target map(s[i].p, s[j].p, s[k].p, s[i].p[0:2], s[j].p[1:3], \
+		       s[k].p[2])
+	  {
+	    s[i].p[0]++;
+	    s[j].p[1]++;
+	    s[k].p[2]++;
+	  }
+
+#pragma omp target map(s, s[i].p[0:2], s[j].p[1:3], s[k].p[2])
+	  {
+	    s[i].p[0]++;
+	    s[j].p[1]++;
+	    s[k].p[2]++;
+	  }
+
+#pragma omp target map(s[0:10], s[i].p[0:2], s[j].p[1:3], s[k].p[2])
+	  {
+	    s[i].p[0]++;
+	    s[j].p[1]++;
+	    s[k].p[2]++;
+	  }
+	}
+
+  for (int i = 0; i < 10; i++)
+    {
+      assert (s[i].p[0] == 216);
+      assert (s[i].p[1] == 216);
+      assert (s[i].p[2] == 216);
+      free (s[i].p);
+    }
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/map-arrayofstruct-3.c b/libgomp/testsuite/libgomp.c-c++-common/map-arrayofstruct-3.c
new file mode 100644
index 00000000000..8ed7e1d60a2
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/map-arrayofstruct-3.c
@@ -0,0 +1,64 @@ 
+#include <stdlib.h>
+#include <assert.h>
+
+struct st {
+  int *p;
+};
+
+struct tt {
+  struct st a[10];
+};
+
+struct ut {
+  struct tt *t;
+};
+
+int main (void)
+{
+  struct tt *t = (struct tt *) malloc (sizeof *t);
+  struct ut *u = (struct ut *) malloc (sizeof *u);
+
+  for (int i = 0; i < 10; i++)
+    t->a[i].p = (int *) calloc (5, sizeof (int));
+
+  u->t = t;
+
+  for (int i = 0; i < 10; i++)
+    for (int j = 0; j < 10; j++)
+      for (int k = 0; k < 10; k++)
+	{
+	  if (i == j || j == k || i == k)
+	    continue;
+
+	  /* This one can use "firstprivate" for T...  */
+#pragma omp target map(t->a[i].p, t->a[j].p, t->a[k].p, \
+		       t->a[i].p[0:2], t->a[j].p[1:3], t->a[k].p[2])
+	  {
+	    t->a[i].p[0]++;
+	    t->a[j].p[1]++;
+	    t->a[k].p[2]++;
+	  }
+
+	  /* ...but this one must use attach/detach for T.  */
+#pragma omp target map(u->t, u->t->a[i].p, u->t->a[j].p, u->t->a[k].p, \
+		       u->t->a[i].p[0:2], u->t->a[j].p[1:3], u->t->a[k].p[2])
+	  {
+	    u->t->a[i].p[0]++;
+	    u->t->a[j].p[1]++;
+	    u->t->a[k].p[2]++;
+	  }
+	}
+
+  for (int i = 0; i < 10; i++)
+    {
+      assert (t->a[i].p[0] == 144);
+      assert (t->a[i].p[1] == 144);
+      assert (t->a[i].p[2] == 144);
+      free (t->a[i].p);
+    }
+
+  free (u);
+  free (t);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.fortran/map-subarray-3.f90 b/libgomp/testsuite/libgomp.fortran/map-subarray-3.f90
new file mode 100644
index 00000000000..b009a4224cc
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/map-subarray-3.f90
@@ -0,0 +1,48 @@ 
+! { dg-do run }
+
+module mymod
+type G
+integer :: x, y
+integer, pointer :: arr(:)
+integer :: z
+end type G
+end module mymod
+
+program myprog
+use mymod
+
+integer, target :: arr1(10)
+integer, target :: arr2(10)
+integer, target :: arr3(10)
+type(G), dimension(3) :: gvar
+
+integer :: i, j
+
+gvar(1)%arr => arr1
+gvar(2)%arr => arr2
+gvar(3)%arr => arr3
+
+gvar(1)%arr = 0
+gvar(2)%arr = 0
+gvar(3)%arr = 0
+
+i = 1
+j = 2
+
+!$omp target map(gvar(i)%arr, gvar(j)%arr, gvar(j)%arr(1:5))
+gvar(i)%arr(1) = gvar(i)%arr(1) + 1
+gvar(j)%arr(1) = gvar(j)%arr(1) + 2
+!$omp end target
+
+i = 2
+j = 1
+
+!$omp target map(gvar(i)%arr, gvar(j)%arr, gvar(j)%arr(1:5))
+gvar(i)%arr(1) = gvar(i)%arr(1) + 3
+gvar(j)%arr(1) = gvar(j)%arr(1) + 4
+!$omp end target
+
+if (gvar(i)%arr(1).ne.4) stop 1
+if (gvar(j)%arr(1).ne.6) stop 2
+
+end program myprog
diff --git a/libgomp/testsuite/libgomp.fortran/map-subarray-5.f90 b/libgomp/testsuite/libgomp.fortran/map-subarray-5.f90
new file mode 100644
index 00000000000..33a64292bfd
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/map-subarray-5.f90
@@ -0,0 +1,50 @@ 
+! { dg-do run }
+
+type t
+  integer, pointer :: p(:)
+end type t
+
+type(t) :: var(3)
+integer :: i, j
+
+allocate (var(1)%p, source=[1,2,3,5])
+allocate (var(2)%p, source=[2,3,5])
+allocate (var(3)%p(1:3))
+
+var(3)%p = 0
+
+do i = 1, 3
+  do j = 1, 3
+!$omp target map(var(i)%p, var(j)%p)
+    var(i)%p(1) = 5
+    var(j)%p(2) = 7
+!$omp end target
+
+    if (i.ne.j) then
+!$omp target map(var(i)%p(1:3), var(i)%p, var(j)%p)
+      var(i)%p(1) = var(i)%p(1) + 1
+      var(j)%p(2) = var(j)%p(2) + 1
+!$omp end target
+
+!$omp target map(var(i)%p, var(j)%p, var(j)%p(1:3))
+      var(i)%p(1) = var(i)%p(1) + 1
+      var(j)%p(2) = var(j)%p(2) + 1
+!$omp end target
+
+!$omp target map(var(i)%p, var(i)%p(1:3), var(j)%p, var(j)%p(2))
+      var(i)%p(1) = var(i)%p(1) + 1
+      var(j)%p(2) = var(j)%p(2) + 1
+!$omp end target
+    end if
+
+    if (i.eq.j) then
+      if (var(i)%p(1).ne.5) stop 1
+      if (var(j)%p(2).ne.7) stop 2
+    else
+      if (var(i)%p(1).ne.8) stop 3
+      if (var(j)%p(2).ne.10) stop 4
+    end if
+  end do
+end do
+
+end