diff mbox

[gomp4,11/14] libgomp: avoid variable-length stack allocation in team.c

Message ID 1445366076-16082-12-git-send-email-amonakov@ispras.ru
State New
Headers show

Commit Message

Alexander Monakov Oct. 20, 2015, 6:34 p.m. UTC
NVPTX does not support alloca or variable-length stack allocations, thus
heap allocation needs to be used instead.  I've opted to make this a generic
change instead of guarding it with an #ifdef: libgomp usually leaves thread
stack size up to libc, so avoiding unbounded stack allocation makes sense.

	* task.c (GOMP_task): Use a fixed-size on-stack buffer or a heap
        allocation instead of a variable-size on-stack allocation.
---
 libgomp/task.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

Comments

Bernd Schmidt Oct. 20, 2015, 8:45 p.m. UTC | #1
On 10/20/2015 08:34 PM, Alexander Monakov wrote:
> NVPTX does not support alloca or variable-length stack allocations, thus
> heap allocation needs to be used instead.  I've opted to make this a generic
> change instead of guarding it with an #ifdef: libgomp usually leaves thread
> stack size up to libc, so avoiding unbounded stack allocation makes sense.
>
> 	* task.c (GOMP_task): Use a fixed-size on-stack buffer or a heap
>          allocation instead of a variable-size on-stack allocation.

> +	  char buf_fixed[2048], *buf = buf_fixed;

This might also not be the best of ideas on a GPU - the stack size isn't 
all that unlimited, what with there being lots of threads. If I do

   size_t stack, heap;
   cuCtxGetLimit (&stack, CU_LIMIT_STACK_SIZE);

in the nvptx-run program we've used for testing, it shows a default 
stack size of just 1kB.


Bernd
Alexander Monakov Oct. 20, 2015, 9:36 p.m. UTC | #2
On Tue, 20 Oct 2015, Bernd Schmidt wrote:

> On 10/20/2015 08:34 PM, Alexander Monakov wrote:
> > NVPTX does not support alloca or variable-length stack allocations, thus
> > heap allocation needs to be used instead.  I've opted to make this a generic
> > change instead of guarding it with an #ifdef: libgomp usually leaves thread
> > stack size up to libc, so avoiding unbounded stack allocation makes sense.
> >
> >  * task.c (GOMP_task): Use a fixed-size on-stack buffer or a heap
> >          allocation instead of a variable-size on-stack allocation.
> 
> > +	  char buf_fixed[2048], *buf = buf_fixed;
> 
> This might also not be the best of ideas on a GPU - the stack size isn't all
> that unlimited, what with there being lots of threads. If I do
> 
>   size_t stack, heap;
>   cuCtxGetLimit (&stack, CU_LIMIT_STACK_SIZE);
> 
> in the nvptx-run program we've used for testing, it shows a default stack size
> of just 1kB.

Thanks, NVPTX will need a low buf_fixed size, perhaps 64 bytes or so.
What about the generic case, should it use a more generous threshold,
or revert to existing unbounded alloca?

Any ideas how big is the required allocation size is in practice?

Thanks.
Alexander
Bernd Schmidt Oct. 20, 2015, 9:41 p.m. UTC | #3
On 10/20/2015 11:36 PM, Alexander Monakov wrote:
> Thanks, NVPTX will need a low buf_fixed size, perhaps 64 bytes or so.
> What about the generic case, should it use a more generous threshold,
> or revert to existing unbounded alloca?
>
> Any ideas how big is the required allocation size is in practice?

I'll defer to Jakub for questions and patches that are more strongly 
libgomp-related than ptx-related.


Bernd
Jakub Jelinek Oct. 21, 2015, 9:58 a.m. UTC | #4
On Tue, Oct 20, 2015 at 09:34:33PM +0300, Alexander Monakov wrote:
> NVPTX does not support alloca or variable-length stack allocations, thus
> heap allocation needs to be used instead.  I've opted to make this a generic
> change instead of guarding it with an #ifdef: libgomp usually leaves thread
> stack size up to libc, so avoiding unbounded stack allocation makes sense.
> 
> 	* task.c (GOMP_task): Use a fixed-size on-stack buffer or a heap
>         allocation instead of a variable-size on-stack allocation.

I don't like this unconditionally.
This really isn't unbounded, the buffer just contains the privatized
variables.  If one uses
int c[124];
void foo (void)
{
  int a, b[10];
  #pragma omp parallel firstprivate (a, b, c)
  {
    use (a, b, c);
  }
}
then the private copies of the variables are allocated on the stack too,
a, b already in addition to the original non-privatized vars a and b,
c, which has been above a global var, is automatic just in the private copy.
Now, for #pragma omp task firstprivate (a, b, c) if there are copy constructors
involved, the copy ctors for the firstprivate vars need to be run before
GOMP_task returns, and therefore we let those variables live in the heap
rather than on the stack; but if we know the task needs to execute immediately,
with the alloca we do pretty much the same thing as parallel does, all the
privatized variables are allocated on the stack (just all of them together
using alloca instead of individually by the compiler).

I'm fine with temporarily having some #ifdef HAVE_BROKEN_ALLOCA or similar,
but as nvptx I think doesn't support setjmp/longjmp nor computed goto,
I think just supporting alloca by using malloc instead and freeing at the
end of function if any allocations happened in the function is the right
thing.
> ---
>  libgomp/task.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/libgomp/task.c b/libgomp/task.c
> index 74920d5..ffb7ed2 100644
> --- a/libgomp/task.c
> +++ b/libgomp/task.c
> @@ -162,11 +162,16 @@ GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
>        thr->task = &task;
>        if (__builtin_expect (cpyfn != NULL, 0))
>  	{
> -	  char buf[arg_size + arg_align - 1];
> +	  long buf_size = arg_size + arg_align - 1;
> +	  char buf_fixed[2048], *buf = buf_fixed;
> +	  if (sizeof(buf_fixed) < buf_size)
> +	    buf = gomp_malloc (buf_size);
>  	  char *arg = (char *) (((uintptr_t) buf + arg_align - 1)
>  				& ~(uintptr_t) (arg_align - 1));
>  	  cpyfn (arg, data);
>  	  fn (arg);
> +	  if (buf != buf_fixed)
> +	    free (buf);
>  	}
>        else
>  	fn (data);

	Jakub
diff mbox

Patch

diff --git a/libgomp/task.c b/libgomp/task.c
index 74920d5..ffb7ed2 100644
--- a/libgomp/task.c
+++ b/libgomp/task.c
@@ -162,11 +162,16 @@  GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
       thr->task = &task;
       if (__builtin_expect (cpyfn != NULL, 0))
 	{
-	  char buf[arg_size + arg_align - 1];
+	  long buf_size = arg_size + arg_align - 1;
+	  char buf_fixed[2048], *buf = buf_fixed;
+	  if (sizeof(buf_fixed) < buf_size)
+	    buf = gomp_malloc (buf_size);
 	  char *arg = (char *) (((uintptr_t) buf + arg_align - 1)
 				& ~(uintptr_t) (arg_align - 1));
 	  cpyfn (arg, data);
 	  fn (arg);
+	  if (buf != buf_fixed)
+	    free (buf);
 	}
       else
 	fn (data);