[nvptx,libgomp,testsuite,PR85519] Reduce recursion depth in declare_target-{1,2}.f90

Message ID a6367053-6f15-0466-cbad-460d88b2d720@mentor.com
State New

Commit Message

Tom de Vries April 25, 2018, 10:58 a.m. UTC
Hi,

when running the libgomp tests with nvptx accelerator on an Nvidia Titan 
V, we run into these failures:
...
FAIL: libgomp.fortran/examples-4/declare_target-1.f90   -O1  execution test
FAIL: libgomp.fortran/examples-4/declare_target-1.f90   -O2  execution test
FAIL: libgomp.fortran/examples-4/declare_target-1.f90   -Os  execution test
FAIL: libgomp.fortran/examples-4/declare_target-2.f90   -O1  execution test
FAIL: libgomp.fortran/examples-4/declare_target-2.f90   -O2  execution test
FAIL: libgomp.fortran/examples-4/declare_target-2.f90   -Os  execution test
...

These tests contain recursive functions, and they fail because execution
runs out of thread stack. The symptom is:
...
libgomp: cuCtxSynchronize error: an illegal memory access was encountered
...
which we can turn into this symptom:
...
libgomp: cuStreamSynchronize error: an illegal instruction was encountered
...
by using GOMP_NVPTX_JIT=-O0, which inserts a valid thread stack check 
after the thread stack decrement at the start of each function.

The thread stack limit defaults to 1024 bytes on all the boards that
I've checked, including Titan V. The tests have a recursion depth of
~25, so when the frame size of the recursive function exceeds ~40 bytes
(1024 / 25), we can be sure to run out of thread stack. [ It may also
happen at a smaller frame size, given that some thread stack space may
already have been consumed before calling the recursive function. ]
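
As a rough sketch of that estimate (not part of the patch; the helper
name is made up for illustration, and the limit is assumed to be in
bytes and entirely available to the recursive function):

```python
# Back-of-the-envelope frame budget. Illustrative only: the helper name is
# hypothetical, and the whole stack is assumed to be free on entry.
STACK_LIMIT_BYTES = 1024  # default thread stack limit quoted in the mail


def max_frame_size(stack_limit, depth):
    """Largest per-call frame size (bytes) that still fits `depth` nested calls."""
    return stack_limit // depth


if __name__ == "__main__":
    for depth in (25, 23):
        print(depth, max_frame_size(STACK_LIMIT_BYTES, depth))
```

With these assumptions a depth of 25 leaves about 40 bytes per frame,
and a depth of 23 leaves about 44.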

[ The nvptx libgomp port uses a 128k per-warp stack in global memory,
avoiding the use of the .local directive in offloading functions, which
would be mapped onto thread stack. But doing so does not eliminate
thread stack usage. For instance, device routine parameters can be
stored on thread stack. ]


Concluding, these tests run out of thread stack on Nvidia Titan V
because the recursive functions have a larger frame size than we've seen
for the Nvidia architecture flavours that we've tested before.

The patch fixes this by reducing the recursion depth.
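
To illustrate why lowering the argument lowers the stack demand, here is
a small sketch that tracks the deepest nesting level of a recursive fib.
It assumes the textbook doubly recursive definition; the body of the
test's fib is not shown in this mail, and fib_depth is a made-up name.

```python
# Sketch only: assumes fib(n) = fib(n-1) + fib(n-2) with fib(0)=0, fib(1)=1.
def fib_depth(n, depth=1):
    """Return (fib(n), deepest nesting level reached) for the naive recursion."""
    if n < 2:
        return n, depth
    a, depth_a = fib_depth(n - 1, depth + 1)
    b, depth_b = fib_depth(n - 2, depth + 1)
    return a + b, max(depth_a, depth_b)


if __name__ == "__main__":
    for n in (25, 23):
        value, deepest = fib_depth(n)
        print(n, value, deepest)
```

Under that assumption the deepest call chain for fib (n) has n live
frames, so going from 25 to 23 removes two frames from the worst case.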

OK for stage4 trunk?

Thanks,
- Tom

Comments

Jakub Jelinek April 25, 2018, 11:08 a.m. UTC | #1
On Wed, Apr 25, 2018 at 12:58:47PM +0200, Tom de Vries wrote:
> Concluding, these tests run out thread stack on Nvidia Titan V because the
> recursive functions have a larger frame size than we've seen for the Nvidia
> architecture flavours that we've tested before.
> 
> The patch fixes this by reducing the recursion depth.
> 
> OK for stage4 trunk?

Ok for trunk (i.e. 9.x) and 8.2 after 8.1 is released.

> [nvptx, libgomp, testsuite] Reduce recursion depth in declare_target-{1,2}.f90
> 
> 2018-04-25  Tom de Vries  <tom@codesourcery.com>
> 
> 	PR target/85519
> 	* testsuite/libgomp.fortran/examples-4/declare_target-1.f90: Reduce
> 	recursion depth from 25 to 23.
> 	* testsuite/libgomp.fortran/examples-4/declare_target-2.f90: Same.

	Jakub

Patch

[nvptx, libgomp, testsuite] Reduce recursion depth in declare_target-{1,2}.f90

2018-04-25  Tom de Vries  <tom@codesourcery.com>

	PR target/85519
	* testsuite/libgomp.fortran/examples-4/declare_target-1.f90: Reduce
	recursion depth from 25 to 23.
	* testsuite/libgomp.fortran/examples-4/declare_target-2.f90: Same.

---
 libgomp/testsuite/libgomp.fortran/examples-4/declare_target-1.f90 | 4 +++-
 libgomp/testsuite/libgomp.fortran/examples-4/declare_target-2.f90 | 6 ++++--
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-1.f90 b/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-1.f90
index df941ee..51de6b2 100644
--- a/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-1.f90
+++ b/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-1.f90
@@ -27,5 +27,7 @@  end module
 program e_53_1
   use e_53_1_mod, only : fib, fib_wrapper
   if (fib (15) /= fib_wrapper (15)) STOP 1
-  if (fib (25) /= fib_wrapper (25)) STOP 2
+  ! Reduced from 25 to 23, otherwise execution runs out of thread stack on
+  ! Nvidia Titan V.
+  if (fib (23) /= fib_wrapper (23)) STOP 2
 end program
diff --git a/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-2.f90 b/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-2.f90
index 9c31569..76cce01 100644
--- a/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-2.f90
+++ b/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-2.f90
@@ -4,9 +4,11 @@  program e_53_2
   !$omp declare target (fib)
   integer :: x, fib
   !$omp target map(from: x)
-    x = fib (25)
+    ! Reduced from 25 to 23, otherwise execution runs out of thread stack on
+    ! Nvidia Titan V.
+    x = fib (23)
   !$omp end target
-  if (x /= fib (25)) STOP 1
+  if (x /= fib (23)) STOP 1
 end program
 
 integer recursive function fib (n) result (f)