Message ID | 20130918133457.GQ1817@tucnak.redhat.com |
---|---|
State | New |
Headers | show |
> As discussed earlier, I'd like to pass __OPENMP_TARGET__ argument to > all of GOMP_target{,_data,_update}, so that all those functions > can get at the offloading data section in the shared library or binary > making the call, so that the first time they encounter such a call > in the shared library or binary, it can attempt to actually offload > it to the target (and, if that fails, remember it, and do host fallback). Actually, I thought that that we'd perform loading the target code in gomp-initialization. So, first time libgomp is called, it loads target image (from somewhere) and starts the target-side MAIN on the target (which actually just starts to wait for offload requests). Then, when GOMP_target is called, it just passes address (or name, as it was initially) to this target-side MAIN and the target executes the requested function. I thought that at the moment GOMP_target is called everything is ready and loaded as initialization has been already executed. Isn't that correct? Maybe I misunderstood something - do you want to load target images only in GOMP_target, not earlier? And I still not quite get it, how would target code know, which function to call in this scheme. How will it figure out, which function to call if we didn't even pass the function name? As far as I understand, __OPENMP_TARGET__ will lead to a dedicated section, which contains addresses of host-side versions of the functions. But how could we obtain the corresponding addresses for the target? COI, e.g., as I understand it, currently doesn't have a capability of returning target-side function address - it only could call a function by name. Or am I missing something? Thanks, Michael > Jakub
On Wed, Sep 18, 2013 at 06:13:25PM +0400, Michael V. Zolotukhin wrote: > > As discussed earlier, I'd like to pass __OPENMP_TARGET__ argument to > > all of GOMP_target{,_data,_update}, so that all those functions > > can get at the offloading data section in the shared library or binary > > making the call, so that the first time they encounter such a call > > in the shared library or binary, it can attempt to actually offload > > it to the target (and, if that fails, remember it, and do host fallback). > Actually, I thought that that we'd perform loading the target code in > gomp-initialization. So, first time libgomp is called, it loads target image > (from somewhere) and starts the target-side MAIN on the target (which actually > just starts to wait for offload requests). Then, when GOMP_target is called, it > just passes address (or name, as it was initially) to this target-side MAIN and > the target executes the requested function. I thought that at the moment > GOMP_target is called everything is ready and loaded as initialization has been > already executed. Isn't that correct? Maybe I misunderstood something - do you > want to load target images only in GOMP_target, not earlier? Yes, I want to load target images only in GOMP_target{,_data,_update}. Loading it earlier is both premature (if you have say two MICs, one HSAIL available, why would you upload everything to all 3 devices, even when you don't know if the program will just print usage and exit and never enter any target region), and hard to do (a program can have many shared libraries (plus the binary), and each of them can have their offload stuff, how do you find out what to offload and what not, and from where?). __OPENMP_TARGET__ would be a linker plugin inserted symbol at the start of some linker plugin created data section, which would start with some header and then data. Say uleb128 number_of_supported_targets - n uleb128 number_of_host_var_pairs - m [ name of offload target (asciiz?) relative offset to the start of the offload data for the target (in MIC case embedded DSO) size of the offload data perhaps something how to find the target addresses array ] repeated n times [ host_address, size ] repeated m times (for the functions passed to GOMP_target the pair would be [ foobar.omp_fn.25, 1 ] ). If no offload data is provided at all, then the section would be just one byte - 0 (or two bytes, 0 0)? So, when GOMP_target{,_data,_update} is called, it could easily determine if the calling shared library resp. binary has been offloaded or not (it could e.g. have a few entries array as cache, otherwise lookup the __OPENMP_TARGET__ address in the splay tree or something? If it sees it hasn't been initialized yet, if it is the very first time, for MIC it would upload the binary (from the plugin data section?), libgomp and finally the shared library from the __OPENMP_TARGET__ and initialize the splay tree with the variables. > And I still not quite get it, how would target code know, which function to call > in this scheme. How will it figure out, which function to call if we didn't > even pass the function name? As far as I understand, __OPENMP_TARGET__ will See earlier mail, function names aren't neccessarily unique. After the __OPENMP_TARGET__ shared library is dlopened in the target and splay tree initialized from the host array of [host_address, size] pairs and corresponding array of target_address in the target code, you just lookup the [ fnaddr, 1 ] in the splay tree and use the target address from that mapping. > lead to a dedicated section, which contains addresses of host-side versions of > the functions. But how could we obtain the corresponding addresses for the > target? COI, e.g., as I understand it, currently doesn't have a capability of > returning target-side function address - it only could call a function by name. > Or am I missing something? Jakub
Hi Jakub, Thanks for the explanation, it's getting a bit clearer, though I still have some questions. > __OPENMP_TARGET__ would be a linker plugin inserted symbol at the start of > some linker plugin created data section, which would start with some header > and then data. > Say > uleb128 number_of_supported_targets - n > uleb128 number_of_host_var_pairs - m > [ name of offload target (asciiz?) > relative offset to the start of the offload data for the target (in MIC case embedded DSO) > size of the offload data > perhaps something how to find the target addresses array > ] repeated n times > [ host_address, size ] repeated m times > (for the functions passed to GOMP_target the pair would be [ foobar.omp_fn.25, 1 ] ). So, in this table we store host addresses of global variables, marked with 'pragma omp declare target', and addresses of host-versions of OMP-versioned functions. Correct? Also, there are pointers to images of target-binaries, which are (presumably) placed in other (or the same?) data sections. > So, when GOMP_target{,_data,_update} is called, it could easily determine > if the calling shared library resp. binary has been offloaded or not That's right. Then, if no initialization has been performed yet, GOMP_target{,_data,_update} is initialized. Now let's look at the initialization. In initialization GOMP_target* looks at the __OPENMP_TARGET__ table (its address is passed as the 3rd argument), finds pointer to a data section with target-binary image, loads it to memory, runs a process on a target from it (e.g. in COI using COIProcessCreateFromFile and/or COIProcessLoadLibraryFromMemory). Global variables are mapped and the corresponding host<->target address pairs are inserted to the splay tree, as usual. Also, GOMP_target* should do the same for function addresses. Could you please describe this step in more details? Do we want to just add some offset to host_function_address (as we want host versions of functions to be ordered exactly as the target versions)? > See above, names are just a bad idea. You can just use some magic wrapper > name in the target binary (the one sitting in libgomp), to which you just > pass the pair of function address and it's argument and the named function > will just read the (target) function pointer and (target) pointer argument > from misc data block and tail call that function. Yes, if we know target function pointer, we can do this. Basically, the main question I have now is how would we figure out target function address? Of course, after initialization we just look for it in our splay tree, so the question relates to the initialization step. Thanks, Michael
On Thu, Sep 19, 2013 at 12:58:28PM +0400, Michael V. Zolotukhin wrote: > Thanks for the explanation, it's getting a bit clearer, though I still have some > questions. > > > __OPENMP_TARGET__ would be a linker plugin inserted symbol at the start of > > some linker plugin created data section, which would start with some header > > and then data. > > Say > > uleb128 number_of_supported_targets - n > > uleb128 number_of_host_var_pairs - m > > [ name of offload target (asciiz?) > > relative offset to the start of the offload data for the target (in MIC case embedded DSO) > > size of the offload data > > perhaps something how to find the target addresses array > > ] repeated n times > > [ host_address, size ] repeated m times > > (for the functions passed to GOMP_target the pair would be [ foobar.omp_fn.25, 1 ] ). > So, in this table we store host addresses of global variables, marked with > 'pragma omp declare target', and addresses of host-versions of OMP-versioned > functions. Correct? Also, there are pointers to images of target-binaries, > which are (presumably) placed in other (or the same?) data sections. Yeah. How exactly we define the section is up to us, but it should have all the information that GOMP_target* will need to offload the stuff from the current shared library or binary, and everything needed to initialize the { host_addr, size } -> { target_addr } mapping of declare target global var definitions and functions passed to GOMP_target. The fewer relocations the section has, the better. But, if we need any relocations, it will need to be in a relro section, and supposedly the embedded shared library (resp. libraries) don't need any relocations on them and will be large, thus supposedly they should live in different sections and the header should just point to them (e.g. using offset relative to __OPENMP_TARGET__ or something that doesn't dynamic relocation). Similarly, if the linker plugin puts in the array of [ host_address, size ] rewritten such that host_address is an offset from __OPENMP_TARGET__, then we won't need dynamic relocations for that. Another complication is dependent shared libraries. Consider liba.c: #pragma omp declare target int i; int foo (void) { return ++i; } #pragma omp end declare target main.c: #pragma omp declare target extern int i; extern int foo (void); #pragma omp end declare target int main () { int j; #pragma omp target { j = i; j += foo (); } if (j != 1) abort (); return 0; } gcc -shared -O2 -fpic -fopenmp -o liba.so -Wl,-soname,liba.so liba.c gcc -O2 -fopenmp -o main main.c -L. -la ./main Perhaps the linker plugin can extract the target shared libraries from the embedded sections of dependent shared libraries (if any), and link the "main" shared library against that, but GOMP_target will need to know that it can't just offload main.so, but also has to offload the dependent liba.so (and of course libgomp.so.1 from the libgomp plugin). What does ICC do in this case? > > So, when GOMP_target{,_data,_update} is called, it could easily determine > > if the calling shared library resp. binary has been offloaded or not > That's right. > Then, if no initialization has been performed yet, GOMP_target{,_data,_update} > is initialized. Now let's look at the initialization. > In initialization GOMP_target* looks at the __OPENMP_TARGET__ table (its address > is passed as the 3rd argument), finds pointer to a data section with > target-binary image, loads it to memory, runs a process on a target from it > (e.g. in COI using COIProcessCreateFromFile and/or > COIProcessLoadLibraryFromMemory). > Global variables are mapped and the corresponding host<->target address pairs > are inserted to the splay tree, as usual. > Also, GOMP_target* should do the same for function addresses. Could you please > describe this step in more details? Do we want to just add some offset to > host_function_address (as we want host versions of functions to be ordered > exactly as the target versions)? The idea was that the host [ host_addr, size ] array (in some named section) would be ordered exactly the same as corresponding [ targ_addr ] array in the target shared library. So, [25] pair in the host array will correspond to [25] in the target shared library array. So you just walk the whole arrays, and in each iteration pick nth host array pair plus corresponding nth target array address, and put it into the splay tree. In the above testcase, host liba.so would contain a [ &i, sizeof(int) ] pair and target liba.so corresponding [ &i ] entry (target i in that case). In host main there would be [ &main.omp_fn.0, 1 ] and in target main.so corresponding [ &main.omp_fn.0 ] (target main.omp_fn.0 in that case). > > See above, names are just a bad idea. You can just use some magic wrapper > > name in the target binary (the one sitting in libgomp), to which you just > > pass the pair of function address and it's argument and the named function > > will just read the (target) function pointer and (target) pointer argument > > from misc data block and tail call that function. > Yes, if we know target function pointer, we can do this. > > Basically, the main question I have now is how would we figure out target > function address? Of course, after initialization we just look for it in our > splay tree, so the question relates to the initialization step. After you ensure the shared library is offloaded and splay tree initialized, you just splay_tree_lookup the host [ fnaddr, 1 ] and get corresponding target address (if not present in splay tree, that would be toolchain bug, so gomp_fatal or something). And, then you just make sure you call that function in the target, whether it is done by calling a named wrapper function to which you pass that target fn address and target address of the pointer array, or something else. Jakub
On 19 Sep 11:23, Jakub Jelinek wrote: > that. Another complication is dependent shared libraries. > Consider > liba.c: > #pragma omp declare target > int i; > int foo (void) > { > return ++i; > } > #pragma omp end declare target > main.c: > #pragma omp declare target > extern int i; > extern int foo (void); > #pragma omp end declare target > int main () > { > int j; > #pragma omp target > { > j = i; > j += foo (); > } > if (j != 1) > abort (); > return 0; > } > gcc -shared -O2 -fpic -fopenmp -o liba.so -Wl,-soname,liba.so liba.c > gcc -O2 -fopenmp -o main main.c -L. -la > ./main > > Perhaps the linker plugin can extract the target shared libraries from > the embedded sections of dependent shared libraries (if any), and link the > "main" shared library against that, but GOMP_target will need to know that > it can't just offload main.so, but also has to offload the dependent > liba.so (and of course libgomp.so.1 from the libgomp plugin). > What does ICC do in this case? > > Jakub Hi Jakub, Here's what ICC does. Suppose we have liba.c and main.c, both with target regions: 1. Building liba.c -> liba.so. A call to offload-runtime library is inserted into _init of liba.so. Target region is compiled into liba_target.so, and placed into .rodata of liba.so. 2. Building main.c -> main.exe. Similarly, a call to offload-runtime library is inserted into _init of main.exe. Target region is compiled into main_target.so, and placed into .rodata of main.exe. 3. Runtime. So, when liba.so and main.exe are loaded at host-side, the runtime library knows, that it should transfer liba_target.so and main_target.so to the target-side. Then, main.exe starts execution. At every entry point to the target region, runtime library checks whether it should perform an initialization. If target is not initialized, runtime library calls COIProcessCreateFromMemory(main_target.exe), that transfers some standard main_target.exe to the target and starts it. Then, runtime library calls COIProcessLoadLibraryFromMemory(liba_target.so, main_target.so), that transfers these libraries to the target and loads them into the main_target.exe. The target-side functions are called from host through COIProcessGetFunctionHandles("f_name") and COIPipelineRunFunction(handle). The addresses of target-side functions are obtained from *_target.so by dlsym(). So, the host-side knows nothing about target addresses. What do you think, how will such an approach work with other target architectures, and with current implementation of GOMP_target{,_data,_update}? Thanks, -- Ilya
--- gcc/omp-builtins.def.jj 2013-09-05 09:19:03.000000000 +0200 +++ gcc/omp-builtins.def 2013-09-18 14:16:38.522988068 +0200 @@ -229,10 +229,10 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TARGET, BT_FN_VOID_INT_OMPFN_PTR_SIZE_PTR_PTR_PTR, ATTR_NOTHROW_LIST) DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TARGET_DATA, "GOMP_target_data", - BT_FN_VOID_INT_SIZE_PTR_PTR_PTR, ATTR_NOTHROW_LIST) + BT_FN_VOID_INT_PTR_SIZE_PTR_PTR_PTR, ATTR_NOTHROW_LIST) DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TARGET_END_DATA, "GOMP_target_end_data", BT_FN_VOID, ATTR_NOTHROW_LIST) DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TARGET_UPDATE, "GOMP_target_update", - BT_FN_VOID_INT_SIZE_PTR_PTR_PTR, ATTR_NOTHROW_LIST) + BT_FN_VOID_INT_PTR_SIZE_PTR_PTR_PTR, ATTR_NOTHROW_LIST) DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TEAMS, "GOMP_teams", BT_FN_VOID_UINT_UINT, ATTR_NOTHROW_LIST) --- gcc/builtin-types.def.jj 2013-09-05 09:19:03.000000000 +0200 +++ gcc/builtin-types.def 2013-09-18 14:17:20.964764926 +0200 @@ -473,8 +473,6 @@ DEF_FUNCTION_TYPE_5 (BT_FN_BOOL_VPTR_PTR DEF_FUNCTION_TYPE_5 (BT_FN_VOID_OMPFN_PTR_UINT_UINT_UINT, BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR, BT_UINT, BT_UINT, BT_UINT) -DEF_FUNCTION_TYPE_5 (BT_FN_VOID_INT_SIZE_PTR_PTR_PTR, - BT_VOID, BT_INT, BT_SIZE, BT_PTR, BT_PTR, BT_PTR) DEF_FUNCTION_TYPE_6 (BT_FN_INT_STRING_SIZE_INT_SIZE_CONST_STRING_VALIST_ARG, BT_INT, BT_STRING, BT_SIZE, BT_INT, BT_SIZE, @@ -502,6 +500,8 @@ DEF_FUNCTION_TYPE_6 (BT_FN_BOOL_VPTR_PTR BT_INT) DEF_FUNCTION_TYPE_6 (BT_FN_BOOL_SIZE_VPTR_PTR_PTR_INT_INT, BT_BOOL, BT_SIZE, BT_VOLATILE_PTR, BT_PTR, BT_PTR, BT_INT, BT_INT) +DEF_FUNCTION_TYPE_6 (BT_FN_VOID_INT_PTR_SIZE_PTR_PTR_PTR, + BT_VOID, BT_INT, BT_PTR, BT_SIZE, BT_PTR, BT_PTR, BT_PTR) DEF_FUNCTION_TYPE_7 (BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG_UINT, BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR, BT_UINT, --- gcc/fortran/types.def.jj 2013-09-05 09:19:03.000000000 +0200 +++ gcc/fortran/types.def 2013-09-18 14:23:23.914822416 +0200 @@ -168,8 +168,6 @@ DEF_FUNCTION_TYPE_5 (BT_FN_BOOL_LONG_LON BT_PTR_LONG, BT_PTR_LONG) DEF_FUNCTION_TYPE_5 (BT_FN_VOID_SIZE_VPTR_PTR_PTR_INT, BT_VOID, BT_SIZE, BT_VOLATILE_PTR, BT_PTR, BT_PTR, BT_INT) -DEF_FUNCTION_TYPE_5 (BT_FN_VOID_INT_SIZE_PTR_PTR_PTR, - BT_VOID, BT_INT, BT_SIZE, BT_PTR, BT_PTR, BT_PTR) DEF_FUNCTION_TYPE_6 (BT_FN_BOOL_LONG_LONG_LONG_LONG_LONGPTR_LONGPTR, BT_BOOL, BT_LONG, BT_LONG, BT_LONG, BT_LONG, @@ -194,6 +192,8 @@ DEF_FUNCTION_TYPE_6 (BT_FN_BOOL_VPTR_PTR BT_INT) DEF_FUNCTION_TYPE_6 (BT_FN_BOOL_SIZE_VPTR_PTR_PTR_INT_INT, BT_BOOL, BT_SIZE, BT_VOLATILE_PTR, BT_PTR, BT_PTR, BT_INT, BT_INT) +DEF_FUNCTION_TYPE_6 (BT_FN_VOID_INT_PTR_SIZE_PTR_PTR_PTR, + BT_VOID, BT_INT, BT_PTR, BT_SIZE, BT_PTR, BT_PTR, BT_PTR) DEF_FUNCTION_TYPE_7 (BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG_UINT, BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR, BT_UINT, --- gcc/omp-low.c.jj 2013-09-18 12:43:23.000000000 +0200 +++ gcc/omp-low.c 2013-09-18 14:49:01.708727891 +0200 @@ -7900,23 +7900,19 @@ expand_omp_target (struct omp_region *re } gimple g; + /* FIXME: This will be address of + extern char __OPENMP_TARGET__[] __attribute__((visibility ("hidden"))) + symbol, as soon as the linker plugin is able to create it for us. */ + tree openmp_target = build_zero_cst (ptr_type_node); if (kind == GF_OMP_TARGET_KIND_REGION) { tree fnaddr = build_fold_addr_expr (child_fn); - unsigned fnnamelen = IDENTIFIER_LENGTH (DECL_NAME (child_fn)); - tree fnname = build_string (fnnamelen, - IDENTIFIER_POINTER (DECL_NAME (child_fn))); - TREE_TYPE (fnname) = build_array_type_nelts (char_type_node, - fnnamelen); - TREE_READONLY (fnname) = 1; - TREE_STATIC (fnname) = 1; - fnname = build_fold_addr_expr (fnname); g = gimple_build_call (builtin_decl_explicit (start_ix), 7, - device, fnaddr, fnname, t1, t2, t3, t4); + device, fnaddr, openmp_target, t1, t2, t3, t4); } else - g = gimple_build_call (builtin_decl_explicit (start_ix), 5, - device, t1, t2, t3, t4); + g = gimple_build_call (builtin_decl_explicit (start_ix), 6, + device, openmp_target, t1, t2, t3, t4); gimple_set_location (g, gimple_location (entry_stmt)); gsi_insert_before (&gsi, g, GSI_SAME_STMT); if (kind != GF_OMP_TARGET_KIND_REGION) --- libgomp/target.c.jj 2013-09-16 10:10:07.000000000 +0200 +++ libgomp/target.c 2013-09-18 15:03:41.752244012 +0200 @@ -406,14 +406,15 @@ gomp_update (size_t mapnum, void **hosta /* Called when encountering a target directive. If DEVICE is -1, it means use device-var ICV. If it is -2 (or any other value larger than last available hw device, use host fallback. - FN is address of host code, FNNAME corresponding name to lookup - in the target code. HOSTADDRS, SIZES and KINDS are arrays + FN is address of host code, OPENMP_TARGET contains value of the + __OPENMP_TARGET__ symbol in the shared library or binary that invokes + GOMP_target. HOSTADDRS, SIZES and KINDS are arrays with MAPNUM entries, with addresses of the host objects, sizes of the host objects (resp. for pointer kind pointer bias and assumed sizeof (void *) size) and kinds. */ void -GOMP_target (int device, void (*fn) (void *), const char *fnname, +GOMP_target (int device, void (*fn) (void *), const void *openmp_target, size_t mapnum, void **hostaddrs, size_t *sizes, unsigned char *kinds) { @@ -434,8 +435,8 @@ GOMP_target (int device, void (*fn) (voi } void -GOMP_target_data (int device, size_t mapnum, void **hostaddrs, size_t *sizes, - unsigned char *kinds) +GOMP_target_data (int device, const void *openmp_target, size_t mapnum, + void **hostaddrs, size_t *sizes, unsigned char *kinds) { device = resolve_device (device); if (device == -1) @@ -479,8 +480,8 @@ GOMP_target_end_data (void) } void -GOMP_target_update (int device, size_t mapnum, void **hostaddrs, size_t *sizes, - unsigned char *kinds) +GOMP_target_update (int device, const void *openmp_target, size_t mapnum, + void **hostaddrs, size_t *sizes, unsigned char *kinds) { device = resolve_device (device); if (device == -1) --- libgomp/libgomp_g.h.jj 2013-09-05 09:19:03.000000000 +0200 +++ libgomp/libgomp_g.h 2013-09-18 15:03:22.800341890 +0200 @@ -201,12 +201,13 @@ extern void GOMP_single_copy_end (void * /* target.c */ -extern void GOMP_target (int, void (*) (void *), const char *, +extern void GOMP_target (int, void (*) (void *), const void *, size_t, void **, size_t *, unsigned char *); -extern void GOMP_target_data (int, size_t, void **, size_t *, unsigned char *); +extern void GOMP_target_data (int, const void *, + size_t, void **, size_t *, unsigned char *); extern void GOMP_target_end_data (void); -extern void GOMP_target_update (int, size_t, void **, size_t *, - unsigned char *); +extern void GOMP_target_update (int, const void *, + size_t, void **, size_t *, unsigned char *); extern void GOMP_teams (unsigned int, unsigned int); #endif /* LIBGOMP_G_H */