Message ID | bd5d7ef4-55dd-13be-849b-daa09c2801db@codesourcery.com |
---|---|
State | New |
Headers | show |
Series | [nvptx,libgomp] Avoid use of GOMP_PLUGIN_acc_thread() in nvptx_free() | expand |
On 8/20/20 3:03 PM, Chung-Lin Tang wrote: > Hi Tom, > this patch adjusts nvptx_free() in libgomp/plugin/plugin-nvptx.c to avoid a > "GOMP_PLUGIN_acc_thread() == NULL" check that was causing problems under > OpenMP offloading. > > This check was originally used to determine if nvptx_free() was running > under > CUDA callback context, when freeing resources from an OpenACC > asynchronous compute > region. Since CUDA API calls are not allowed inside callback context, we > have > to save the freed block to ptx_dev->free_blocks, and cuMemFree it later. > > The check to see if GOMP_PLUGIN_acc_thread() exists to determine normal > host thread > vs. callback thread worked under -fopenacc, but since the OpenACC > per-thread data > is not created under -fopenmp, and always returned NULL, we have a leak > situation > where OpenMP offloading kept accumulating freed device memory blocks > until failing; > nvptx_free() never reaches the part where cuMemFree() is actually called. > > I reviewed the CUDA API docs and see that CUDA_ERROR_NOT_PERMITTED is > returned > for such CUDA calls inside callback context, Right, that's "Callbacks must not make any CUDA API calls. Attempting to use a CUDA API will result in CUDA_ERROR_NOT_PERMITTED" at cuStreamAddCallback. Perhaps mention more precisely where you found this. > and it appears to be enough > to replace > the current check, so the new code sees if this error is returned on the > first > cuMemGetAddressRange call to determine callback context. This should now > work > for both OpenACC/OpenMP. > > (Tobias, Catherine, the earlier internal patch to re-organize this > callback context > checking did not work in general, since OpenACC also uses the > .queue_callback > functionality to free the struct target_mem_desc asynchronously, so in > general we > have to ensure nvptx_free() could be used under both normal/callback > context) > > This patch has been libgomp tested for x86_64-linux with nvptx > offloading without > regressions, and should be applied for mainline and GCC10. Is this okay? > Ok, thanks for fixing this. - Tom > Thanks, > Chung-Lin > > 2020-08-20 Chung-Lin Tang <cltang@codesourcery.com> > > libgomp/ > * plugin/plugin-nvptx.c (nvptx_free): Change "GOMP_PLUGIN_acc_thread > () == NULL" > test into check of CUDA_ERROR_NOT_PERMITTED status for > cuMemGetAddressRange. > Adjust comments. > diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c > index ec103a2f40b..188a34f1d04 100644 > --- a/libgomp/plugin/plugin-nvptx.c > +++ b/libgomp/plugin/plugin-nvptx.c > @@ -1038,27 +1038,34 @@ goacc_profiling_acc_ev_free (struct goacc_thread *thr, void *p) > } > > static bool > nvptx_free (void *p, struct ptx_device *ptx_dev) > { > - /* Assume callback context if this is null. */ > - if (GOMP_PLUGIN_acc_thread () == NULL) > + CUdeviceptr pb; > + size_t ps; > + > + CUresult r = CUDA_CALL_NOCHECK (cuMemGetAddressRange, &pb, &ps, > + (CUdeviceptr) p); > + if (r == CUDA_ERROR_NOT_PERMITTED) > { > + /* We assume that this error indicates we are in a CUDA callback context, > + where all CUDA calls are not allowed. Arrange to free this piece of > + device memory later. */ > struct ptx_free_block *n > = GOMP_PLUGIN_malloc (sizeof (struct ptx_free_block)); > n->ptr = p; > pthread_mutex_lock (&ptx_dev->free_blocks_lock); > n->next = ptx_dev->free_blocks; > ptx_dev->free_blocks = n; > pthread_mutex_unlock (&ptx_dev->free_blocks_lock); > return true; > } > - > - CUdeviceptr pb; > - size_t ps; > - > - CUDA_CALL (cuMemGetAddressRange, &pb, &ps, (CUdeviceptr) p); > + else if (r != CUDA_SUCCESS) > + { > + GOMP_PLUGIN_error ("cuMemGetAddressRange error: %s", cuda_error (r)); > + return false; > + } > if ((CUdeviceptr) p != pb) > { > GOMP_PLUGIN_error ("invalid device address"); > return false; > }
Hi Tom, thanks for the extremely quick review :) On 2020/8/20 9:33 PM, Tom de Vries wrote: >> I reviewed the CUDA API docs and see that CUDA_ERROR_NOT_PERMITTED is >> returned >> for such CUDA calls inside callback context, > Right, that's "Callbacks must not make any CUDA API calls. Attempting to > use a CUDA API will result in CUDA_ERROR_NOT_PERMITTED" at > cuStreamAddCallback. Perhaps mention more precisely where you found this. I've added a little bit more in the comments in the final patch pushed. >> and it appears to be enough >> to replace >> the current check, so the new code sees if this error is returned on the >> first >> cuMemGetAddressRange call to determine callback context. This should now >> work >> for both OpenACC/OpenMP. >> >> (Tobias, Catherine, the earlier internal patch to re-organize this >> callback context >> checking did not work in general, since OpenACC also uses the >> .queue_callback >> functionality to free the struct target_mem_desc asynchronously, so in >> general we >> have to ensure nvptx_free() could be used under both normal/callback >> context) >> >> This patch has been libgomp tested for x86_64-linux with nvptx >> offloading without >> regressions, and should be applied for mainline and GCC10. Is this okay? >> > Ok, thanks for fixing this. Just pushed to master, releases/gcc-10, and devel/omp/gcc-10. Chung-Lin
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c index ec103a2f40b..188a34f1d04 100644 --- a/libgomp/plugin/plugin-nvptx.c +++ b/libgomp/plugin/plugin-nvptx.c @@ -1038,27 +1038,34 @@ goacc_profiling_acc_ev_free (struct goacc_thread *thr, void *p) } static bool nvptx_free (void *p, struct ptx_device *ptx_dev) { - /* Assume callback context if this is null. */ - if (GOMP_PLUGIN_acc_thread () == NULL) + CUdeviceptr pb; + size_t ps; + + CUresult r = CUDA_CALL_NOCHECK (cuMemGetAddressRange, &pb, &ps, + (CUdeviceptr) p); + if (r == CUDA_ERROR_NOT_PERMITTED) { + /* We assume that this error indicates we are in a CUDA callback context, + where all CUDA calls are not allowed. Arrange to free this piece of + device memory later. */ struct ptx_free_block *n = GOMP_PLUGIN_malloc (sizeof (struct ptx_free_block)); n->ptr = p; pthread_mutex_lock (&ptx_dev->free_blocks_lock); n->next = ptx_dev->free_blocks; ptx_dev->free_blocks = n; pthread_mutex_unlock (&ptx_dev->free_blocks_lock); return true; } - - CUdeviceptr pb; - size_t ps; - - CUDA_CALL (cuMemGetAddressRange, &pb, &ps, (CUdeviceptr) p); + else if (r != CUDA_SUCCESS) + { + GOMP_PLUGIN_error ("cuMemGetAddressRange error: %s", cuda_error (r)); + return false; + } if ((CUdeviceptr) p != pb) { GOMP_PLUGIN_error ("invalid device address"); return false; }