Message ID | 20151111161418.GI5675@tucnak.redhat.com |
---|---|
State | New |
Headers | show |
On Wed, 11 Nov 2015, Jakub Jelinek wrote: > Hi! > > Alex reported to me privately that with the OpenMP 4.5 handling of > array section bases (that they are firstprivate instead of mapped) > we unnecessarily mark the pointers addressable and that result > in less efficient way of passing them as shared to inner constructs. Thanks! Would you be interested in further (minimized) cases where new implementation no longer manages to perform copy-in/copy-out optimization? E.g. the following. Or I can try to put such cases in Bugzilla, if you like. Alexander void f(int *p, int n) { int tmp; #pragma omp target map(to:p[0:n]) map(tofrom:tmp) { #pragma omp parallel asm volatile ("" : "=r" (tmp) : "r" (p)); } #pragma omp target /* Missing optimization for 'tmp' here. */ #pragma omp parallel asm volatile ("" : : "r" (tmp)); }
On Wed, Nov 11, 2015 at 07:27:51PM +0300, Alexander Monakov wrote: > > Alex reported to me privately that with the OpenMP 4.5 handling of > > array section bases (that they are firstprivate instead of mapped) > > we unnecessarily mark the pointers addressable and that result > > in less efficient way of passing them as shared to inner constructs. > > Thanks! Would you be interested in further (minimized) cases where new > implementation no longer manages to perform copy-in/copy-out optimization? > E.g. the following. Or I can try to put such cases in Bugzilla, if you like. > > Alexander > > void f(int *p, int n) > { > int tmp; > #pragma omp target map(to:p[0:n]) map(tofrom:tmp) > { > #pragma omp parallel > asm volatile ("" : "=r" (tmp) : "r" (p)); > } > > #pragma omp target > /* Missing optimization for 'tmp' here. */ > #pragma omp parallel > asm volatile ("" : : "r" (tmp)); > } There is nothing to do in this case, map(tofrom:tmp) really has to make tmp addressable, it needs to deal with its address, and the copy-in/out optimization really relies on the var not being addressable in any way; the problem is that you need to be 100% sure that the thread invoking parallel owns the variable and nobody else can do anything with the variable concurrently, otherwise the compiler creates a data race that might not exist in the original program. And OpenMP 4.5 says that on the second target, tmp is implicitly firstprivate (tmp). People who care about the generated code would use firstprivate (tmp) on the second parallel anyway. Jakub
--- gcc/c/c-typeck.c.jj 2015-11-09 17:36:17.000000000 +0100 +++ gcc/c/c-typeck.c 2015-11-10 14:25:53.592499759 +0100 @@ -12865,7 +12865,10 @@ c_finish_omp_clauses (tree clauses, bool omp_clause_code_name[OMP_CLAUSE_CODE (c)]); remove = true; } - else if (!c_mark_addressable (t)) + else if ((OMP_CLAUSE_CODE (c) != OMP_CLAUSE_MAP + || (OMP_CLAUSE_MAP_KIND (c) + != GOMP_MAP_FIRSTPRIVATE_POINTER)) + && !c_mark_addressable (t)) remove = true; else if (!(OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP && (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_POINTER --- gcc/cp/semantics.c.jj 2015-11-06 08:08:37.000000000 +0100 +++ gcc/cp/semantics.c 2015-11-10 14:27:14.916355747 +0100 @@ -6566,6 +6566,9 @@ finish_omp_clauses (tree clauses, bool a } else if (!processing_template_decl && TREE_CODE (TREE_TYPE (t)) != REFERENCE_TYPE + && (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_MAP + || (OMP_CLAUSE_MAP_KIND (c) + != GOMP_MAP_FIRSTPRIVATE_POINTER)) && !cxx_mark_addressable (t)) remove = true; else if (!(OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP