diff mbox series

libgomp, OpenMP: Fix issue for omp_get_device_num on gfx targets.

Message ID 42b806a0-55a2-7fea-3e69-2d82bfc18007@codesourcery.com
State New
Headers show
Series libgomp, OpenMP: Fix issue for omp_get_device_num on gfx targets. | expand

Commit Message

Marcel Vollweiler Jan. 12, 2022, 9:43 a.m. UTC
Hi,

Currently omp_get_device_num does not work on gcn targets with more than
one offload device. The reason is that GOMP_DEVICE_NUM_VAR is static in
icv-device.c and thus "__gomp_device_num" is not visible in the offload
image.

This patch removes "static" such that "__gomp_device_num" is now part of
the offload image and can now be found in GOMP_OFFLOAD_load_image in the
plugin.

This is not an issue for nvptx. There, "__gomp_device_num" is in the
offload image even with "static".

The patch was tested on x86_64-linux with amdgcn offloading with no
regressions.

Marcel
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
libgomp, OpenMP: Fix issue for omp_get_device_num on gcn targets.

Currently omp_get_device_num does not work on gcn targets with more than one
offload device. The reason is that GOMP_DEVICE_NUM_VAR is static in
icv-device.c and thus "__gomp_device_num" is not visible in the offload image.

This patch removes "static" such that "__gomp_device_num" is now part of the
offload image and can now be found in GOMP_OFFLOAD_load_image in the plugin.

This is not an issue for nvptx. There, "__gomp_device_num" is in the offload
image even with "static".

libgomp/ChangeLog:

	* config/gcn/icv-device.c: Make GOMP_DEVICE_NUM_VAR public (remove
	"static") to make the device num available in the offload image.

Comments

Thomas Schwinge Jan. 18, 2022, 12:25 p.m. UTC | #1
Hi!

Maybe I'm just totally confused -- as so often ;-) -- but things seem
strange here:

On 2022-01-12T10:43:05+0100, Marcel Vollweiler <marcel@codesourcery.com> wrote:
> Currently omp_get_device_num does not work on gcn targets with more than
> one offload device. The reason is that GOMP_DEVICE_NUM_VAR

I understand the 'GOMP_DEVICE_NUM_VAR' "macro indirection" is so that we
define the actual symbol name ('__gomp_device_num') in one place
('libgomp/libgomp-plugin.h'), and then use it (via macro expansion) in
several places, right?

> is static in
> icv-device.c and thus "__gomp_device_num" is not visible in the offload
> image.

That behavior seems correct -- but undesired indeed?

> This patch removes "static" such that "__gomp_device_num" is now part of
> the offload image and can now be found in GOMP_OFFLOAD_load_image in the
> plugin.

That seems correct?

Or, is there a reason to have it 'static', say, so that several such
local variables can co-exist, instead of just one global one?

> This is not an issue for nvptx. There, "__gomp_device_num" is in the
> offload image even with "static".

That's unexpected then, and should be looked into?

Still, should 'static' be removed here, too?


Grüße
 Thomas


> The patch was tested on x86_64-linux with amdgcn offloading with no
> regressions.


> libgomp, OpenMP: Fix issue for omp_get_device_num on gcn targets.
>
> Currently omp_get_device_num does not work on gcn targets with more than one
> offload device. The reason is that GOMP_DEVICE_NUM_VAR is static in
> icv-device.c and thus "__gomp_device_num" is not visible in the offload image.
>
> This patch removes "static" such that "__gomp_device_num" is now part of the
> offload image and can now be found in GOMP_OFFLOAD_load_image in the plugin.
>
> This is not an issue for nvptx. There, "__gomp_device_num" is in the offload
> image even with "static".
>
> libgomp/ChangeLog:
>
>       * config/gcn/icv-device.c: Make GOMP_DEVICE_NUM_VAR public (remove
>       "static") to make the device num available in the offload image.
>
> diff --git a/libgomp/config/gcn/icv-device.c b/libgomp/config/gcn/icv-device.c
> index fcfa0f3..f70b7e6 100644
> --- a/libgomp/config/gcn/icv-device.c
> +++ b/libgomp/config/gcn/icv-device.c
> @@ -60,7 +60,7 @@ omp_is_initial_device (void)
>
>  /* This is set to the device number of current GPU during device initialization,
>     when the offload image containing this libgomp portion is loaded.  */
> -static volatile int GOMP_DEVICE_NUM_VAR;
> +volatile int GOMP_DEVICE_NUM_VAR;
>
>  int
>  omp_get_device_num (void)
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
Andrew Stubbs Jan. 18, 2022, 1:47 p.m. UTC | #2
Sorry, I had not seen that this was entirely within my amdgcn remit....

On 12/01/2022 09:43, Marcel Vollweiler wrote:
> Hi,
> 
> Currently omp_get_device_num does not work on gcn targets with more than
> one offload device. The reason is that GOMP_DEVICE_NUM_VAR is static in
> icv-device.c and thus "__gomp_device_num" is not visible in the offload
> image.
> 
> This patch removes "static" such that "__gomp_device_num" is now part of
> the offload image and can now be found in GOMP_OFFLOAD_load_image in the
> plugin.

To be clear, omp_get_device_num doesn't work correctly with any number 
of offload devices. It so happens that the uninitialized value (zero) is 
the correct answer on the first device so the failure was hidden on 
single-device machines.

> This is not an issue for nvptx. There, "__gomp_device_num" is in the
> offload image even with "static".

This is probably related to the unusual way PTX files are "linked".

> The patch was tested on x86_64-linux with amdgcn offloading with no
> regressions.

OK to commit. I know this isn't a regression, but it's a silly bug in a 
new feature on a secondary target, so I think we can have this in stage 4.

Andrew

P.S. Like Thomas says, the static can probably be safely removed in the 
NVPTX file also, but then I think you're planning to unify the two files 
in any case?
Andrew Stubbs Jan. 18, 2022, 1:55 p.m. UTC | #3
On 18/01/2022 12:25, Thomas Schwinge wrote:
> Hi!
> 
> Maybe I'm just totally confused -- as so often ;-) -- but things seem
> strange here:
> 
> On 2022-01-12T10:43:05+0100, Marcel Vollweiler <marcel@codesourcery.com> wrote:
>> Currently omp_get_device_num does not work on gcn targets with more than
>> one offload device. The reason is that GOMP_DEVICE_NUM_VAR
> 
> I understand the 'GOMP_DEVICE_NUM_VAR' "macro indirection" is so that we
> define the actual symbol name ('__gomp_device_num') in one place
> ('libgomp/libgomp-plugin.h'), and then use it (via macro expansion) in
> several places, right?

Right. I don't know what the motivation behind that design was, but 
Marcel has not invented this.

>> is static in
>> icv-device.c and thus "__gomp_device_num" is not visible in the offload
>> image.
> 
> That behavior seems correct -- but undesired indeed?
> 
>> This patch removes "static" such that "__gomp_device_num" is now part of
>> the offload image and can now be found in GOMP_OFFLOAD_load_image in the
>> plugin.
> 
> That seems correct?
> 
> Or, is there a reason to have it 'static', say, so that several such
> local variables can co-exist, instead of just one global one?

If there were several similar symbols "coexisting" then the libgomp 
plugin would not know which to update at kernel launch time, and 
different parts of libgomp would see different values for the same ICV. 
I can't think of any reason why we'd want that?

My guess is that it started out as a static variable inside a function 
and was then moved to file-scope without removing the keyword.

> 
>> This is not an issue for nvptx. There, "__gomp_device_num" is in the
>> offload image even with "static".
> 
> That's unexpected then, and should be looked into?

I suspect it's just because NVPTX doesn't use ELF at link time. Multiple 
static variables with the same name in different translation units might 
even be broken?

> Still, should 'static' be removed here, too?

I would have thought so. In fact there's no need for nvptx and amdgcn to 
have independent icv-device.c files, given that they must both implement 
the same features and there's nothing architecture-specific going on 
here (thus far).

Andrew
Marcel Vollweiler Jan. 18, 2022, 2:31 p.m. UTC | #4
Hi Thomas,

Am 18.01.2022 um 13:25 schrieb Thomas Schwinge:
> Hi!
>
> Maybe I'm just totally confused -- as so often ;-) -- but things seem
> strange here:
>
> On 2022-01-12T10:43:05+0100, Marcel Vollweiler <marcel@codesourcery.com> wrote:
>> Currently omp_get_device_num does not work on gcn targets with more than
>> one offload device. The reason is that GOMP_DEVICE_NUM_VAR
>
> I understand the 'GOMP_DEVICE_NUM_VAR' "macro indirection" is so that we
> define the actual symbol name ('__gomp_device_num') in one place
> ('libgomp/libgomp-plugin.h'), and then use it (via macro expansion) in
> several places, right?

Yes, as far as I understood.

>
>> is static in
>> icv-device.c and thus "__gomp_device_num" is not visible in the offload
>> image.
>
> That behavior seems correct -- but undesired indeed?

Good question. In contrast to nvptx I observed that __gomp_device_num is
not part of the offload image which we read out in
GOMP_OFFLOAD_load_image ("if (status != HSA_STATUS_SUCCESS)" in
libgomp/plugin/plugin-gcn.c returns false). I validated it with some
additional output in the if-branches.

>
>> This patch removes "static" such that "__gomp_device_num" is now part of
>> the offload image and can now be found in GOMP_OFFLOAD_load_image in the
>> plugin.
>
> That seems correct?
>
> Or, is there a reason to have it 'static', say, so that several such
> local variables can co-exist, instead of just one global one?
>
>> This is not an issue for nvptx. There, "__gomp_device_num" is in the
>> offload image even with "static".
>
> That's unexpected then, and should be looked into?

Actually, I don't see the reason for the different behaviour for nvptx.
I just tested that for nvptx the correct device number is returned by
omp_get_device_num on the device - also if we have more than one device.

>
> Still, should 'static' be removed here, too?

I wouldn't suggest unless it is really necessary? I mean, there we don't
have any issue. Although I aggree with Andrew that we could combine both
icv-device.c files into one common file.

Marcel
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
diff mbox series

Patch

diff --git a/libgomp/config/gcn/icv-device.c b/libgomp/config/gcn/icv-device.c
index fcfa0f3..f70b7e6 100644
--- a/libgomp/config/gcn/icv-device.c
+++ b/libgomp/config/gcn/icv-device.c
@@ -60,7 +60,7 @@  omp_is_initial_device (void)
 
 /* This is set to the device number of current GPU during device initialization,
    when the offload image containing this libgomp portion is loaded.  */
-static volatile int GOMP_DEVICE_NUM_VAR;
+volatile int GOMP_DEVICE_NUM_VAR;
 
 int
 omp_get_device_num (void)