diff mbox

nvptx offloading patches [2/n]

Message ID 20150219130929.GG1746@tucnak.redhat.com
State New
Headers show

Commit Message

Jakub Jelinek Feb. 19, 2015, 1:09 p.m. UTC
On Tue, Feb 17, 2015 at 09:55:32PM +0100, Bernd Schmidt wrote:
> On 02/17/2015 06:10 PM, Jakub Jelinek wrote:
> >
> >What exact testcase are you trying to fix with this patch, and how do you
> >think offloading of code using va_list can work?
> 
> The exact testcase is any offloaded program - streaming in lto will crash if
> there is a mismatch in these preloaded nodes.
> 
> For OpenACC programs using va_list - I don't expect them to work at all. I
> don't believe the spec considers such issues, and ptx isn't expected to
> support variadic functions in the first place ("The current version of PTX
> does not support variadic functions" is what the spec has to say; the gcc
> port overachieves a little by implementing them anyway).

How do you support printf etc.?  Those are all varargs functions.

Anyway, could following untested patch be used as a temporary hack?
It might pessimize for GCC5 slightly the intelmic offloading, but I hope
the way forward is the stdarg late lowering for GCC 6.

Richard on IRC said it might be better for the lto_stream_offload_p
path to whitelist nodes it has to preload rather than blacklist the ones
that it doesn't, otherwise e.g. if you try to offload from say x86_64-mingw
with 32-bit long, but 64-bit pointers to 64-bit intelmic, preloading
the long_type_node will certainly break lots of things, while not preloading
them would only be problematic for builtins, we'll need some pass over the
builtins in the IL in any case, to find out if they are compatible or not,
adjust if needed and give up otherwise.  But I'd hope it can be worked on
incrementally, if this patch (plus the approved nvptx offloading patches,
plus mode_table streaming) makes the nvptx offloading work.

2015-02-19  Bernd Schmidt  <bernds@codesourcery.com>
	    Jakub Jelinek  <jakub@redhat.com>

	* tree-streamer.c (preload_common_nodes): Don't preload
	TI_VA_LIST* for offloading.
	* tree-stdarg.c (pass_stdarg::gate): Disable for ACCEL_COMPILER
	in_lto_p.


	Jakub

Comments

Richard Biener Feb. 19, 2015, 1:38 p.m. UTC | #1
On Thu, Feb 19, 2015 at 2:09 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Tue, Feb 17, 2015 at 09:55:32PM +0100, Bernd Schmidt wrote:
>> On 02/17/2015 06:10 PM, Jakub Jelinek wrote:
>> >
>> >What exact testcase are you trying to fix with this patch, and how do you
>> >think offloading of code using va_list can work?
>>
>> The exact testcase is any offloaded program - streaming in lto will crash if
>> there is a mismatch in these preloaded nodes.
>>
>> For OpenACC programs using va_list - I don't expect them to work at all. I
>> don't believe the spec considers such issues, and ptx isn't expected to
>> support variadic functions in the first place ("The current version of PTX
>> does not support variadic functions" is what the spec has to say; the gcc
>> port overachieves a little by implementing them anyway).
>
> How do you support printf etc.?  Those are all varargs functions.
>
> Anyway, could following untested patch be used as a temporary hack?
> It might pessimize for GCC5 slightly the intelmic offloading, but I hope
> the way forward is the stdarg late lowering for GCC 6.
>
> Richard on IRC said it might be better for the lto_stream_offload_p
> path to whitelist nodes it has to preload rather than blacklist the ones
> that it doesn't, otherwise e.g. if you try to offload from say x86_64-mingw
> with 32-bit long, but 64-bit pointers to 64-bit intelmic, preloading
> the long_type_node will certainly break lots of things, while not preloading
> them would only be problematic for builtins, we'll need some pass over the
> builtins in the IL in any case, to find out if they are compatible or not,
> adjust if needed and give up otherwise.  But I'd hope it can be worked on
> incrementally, if this patch (plus the approved nvptx offloading patches,
> plus mode_table streaming) makes the nvptx offloading work.

The patch works for me if it helps anything.

Thanks,
Richard.

> 2015-02-19  Bernd Schmidt  <bernds@codesourcery.com>
>             Jakub Jelinek  <jakub@redhat.com>
>
>         * tree-streamer.c (preload_common_nodes): Don't preload
>         TI_VA_LIST* for offloading.
>         * tree-stdarg.c (pass_stdarg::gate): Disable for ACCEL_COMPILER
>         in_lto_p.
>
> --- gcc/tree-streamer.c.jj      2015-02-18 12:36:20.000000000 +0100
> +++ gcc/tree-streamer.c 2015-02-19 13:57:26.089626006 +0100
> @@ -342,7 +342,14 @@ preload_common_nodes (struct streamer_tr
>         && i != TI_TARGET_OPTION_DEFAULT
>         && i != TI_TARGET_OPTION_CURRENT
>         && i != TI_CURRENT_TARGET_PRAGMA
> -       && i != TI_CURRENT_OPTIMIZE_PRAGMA)
> +       && i != TI_CURRENT_OPTIMIZE_PRAGMA
> +       /* Skip va_list* related nodes if offloading.  For native LTO
> +          we want them to be merged for the stdarg pass, for offloading
> +          they might not be identical between host and offloading target.  */
> +       && (!lto_stream_offload_p
> +           || (i != TI_VA_LIST_TYPE
> +               && i != TI_VA_LIST_GPR_COUNTER_FIELD
> +               && i != TI_VA_LIST_FPR_COUNTER_FIELD)))
>        record_common_node (cache, global_trees[i]);
>  }
>
> --- gcc/tree-stdarg.c.jj        2015-02-18 22:55:52.000000000 +0100
> +++ gcc/tree-stdarg.c   2015-02-19 14:00:11.882905823 +0100
> @@ -705,6 +705,13 @@ public:
>    virtual bool gate (function *fun)
>      {
>        return (flag_stdarg_opt
> +#ifdef ACCEL_COMPILER
> +             /* Disable for GCC5 in the offloading compilers, as
> +                va_list and gpr/fpr counter fields are not merged.
> +                In GCC6 when stdarg is lowered late this shouldn't be
> +                an issue.  */
> +             && !in_lto_p
> +#endif
>               /* This optimization is only for stdarg functions.  */
>               && fun->stdarg != 0);
>      }
>
>         Jakub
Thomas Schwinge Feb. 20, 2015, 9:33 a.m. UTC | #2
Hi!

On Thu, 19 Feb 2015 14:09:29 +0100, Jakub Jelinek <jakub@redhat.com> wrote:
> On Tue, Feb 17, 2015 at 09:55:32PM +0100, Bernd Schmidt wrote:
> > On 02/17/2015 06:10 PM, Jakub Jelinek wrote:
> > >
> > >What exact testcase are you trying to fix with this patch, and how do you
> > >think offloading of code using va_list can work?
> > 
> > The exact testcase is any offloaded program - streaming in lto will crash if
> > there is a mismatch in these preloaded nodes.

> could following untested patch be used as a temporary hack?

Thanks!  I'll leave the approval to Bernd, but can already report that
this works fine in my testing, for intelmic and nvptx offloading.

> It might pessimize for GCC5 slightly the intelmic offloading, but I hope
> the way forward is the stdarg late lowering for GCC 6.

> Richard on IRC said it might be better for the lto_stream_offload_p
> path to whitelist nodes it has to preload rather than blacklist the ones
> that it doesn't, otherwise e.g. if you try to offload from say x86_64-mingw
> with 32-bit long, but 64-bit pointers to 64-bit intelmic

That's a good consideration (for the future), but we're not currently
supporting the case of offloading with non-matching ABIs (data types).

> preloading
> the long_type_node will certainly break lots of things, while not preloading
> them would only be problematic for builtins, we'll need some pass over the
> builtins in the IL in any case, to find out if they are compatible or not,
> adjust if needed and give up otherwise.  But I'd hope it can be worked on
> incrementally, if this patch (plus the approved nvptx offloading patches,
> plus mode_table streaming) makes the nvptx offloading work.


Grüße,
 Thomas
Jakub Jelinek Feb. 20, 2015, 9:39 a.m. UTC | #3
On Fri, Feb 20, 2015 at 10:33:38AM +0100, Thomas Schwinge wrote:
> On Thu, 19 Feb 2015 14:09:29 +0100, Jakub Jelinek <jakub@redhat.com> wrote:
> > On Tue, Feb 17, 2015 at 09:55:32PM +0100, Bernd Schmidt wrote:
> > > On 02/17/2015 06:10 PM, Jakub Jelinek wrote:
> > > >
> > > >What exact testcase are you trying to fix with this patch, and how do you
> > > >think offloading of code using va_list can work?
> > > 
> > > The exact testcase is any offloaded program - streaming in lto will crash if
> > > there is a mismatch in these preloaded nodes.
> 
> > could following untested patch be used as a temporary hack?
> 
> Thanks!  I'll leave the approval to Bernd, but can already report that
> this works fine in my testing, for intelmic and nvptx offloading.

Richard already approved it if it helps anything.  So, if your testing
suggests it helps something, I'll apply it.

The mode_table patch is still awaiting approval, and Bernd's approved patches
aren't applied, ditto your toplevel configure patch.

	Jakub
diff mbox

Patch

--- gcc/tree-streamer.c.jj	2015-02-18 12:36:20.000000000 +0100
+++ gcc/tree-streamer.c	2015-02-19 13:57:26.089626006 +0100
@@ -342,7 +342,14 @@  preload_common_nodes (struct streamer_tr
 	&& i != TI_TARGET_OPTION_DEFAULT
 	&& i != TI_TARGET_OPTION_CURRENT
 	&& i != TI_CURRENT_TARGET_PRAGMA
-	&& i != TI_CURRENT_OPTIMIZE_PRAGMA)
+	&& i != TI_CURRENT_OPTIMIZE_PRAGMA
+	/* Skip va_list* related nodes if offloading.  For native LTO
+	   we want them to be merged for the stdarg pass, for offloading
+	   they might not be identical between host and offloading target.  */
+	&& (!lto_stream_offload_p
+	    || (i != TI_VA_LIST_TYPE
+		&& i != TI_VA_LIST_GPR_COUNTER_FIELD
+		&& i != TI_VA_LIST_FPR_COUNTER_FIELD)))
       record_common_node (cache, global_trees[i]);
 }
 
--- gcc/tree-stdarg.c.jj	2015-02-18 22:55:52.000000000 +0100
+++ gcc/tree-stdarg.c	2015-02-19 14:00:11.882905823 +0100
@@ -705,6 +705,13 @@  public:
   virtual bool gate (function *fun)
     {
       return (flag_stdarg_opt
+#ifdef ACCEL_COMPILER
+	      /* Disable for GCC5 in the offloading compilers, as
+		 va_list and gpr/fpr counter fields are not merged.
+		 In GCC6 when stdarg is lowered late this shouldn't be
+		 an issue.  */
+	      && !in_lto_p
+#endif
 	      /* This optimization is only for stdarg functions.  */
 	      && fun->stdarg != 0);
     }