diff mbox

Handle BUILT_IN_GOACC_PARALLEL in ipa-pta

Message ID 56718B13.4040601@mentor.com
State New
Headers show

Commit Message

Tom de Vries Dec. 16, 2015, 4:02 p.m. UTC
On 10/12/15 14:14, Tom de Vries wrote:
> [ copy-pasting-with-quote from
> https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00420.html , for some
> reason I didn't get this email ]
>
>> On Thu, 3 Dec 2015, Tom de Vries wrote:
>>> The flag is set here in expand_omp_target:
>>> ...
>>> 12682         /* Prevent IPA from removing child_fn as unreachable,
>>>                  since there are no
>>> 12683            refs from the parent function to child_fn in offload
>>>                  LTO mode.  */
>>> 12684         if (ENABLE_OFFLOADING)
>>> 12685           cgraph_node::get (child_fn)->mark_force_output ();
>>> ...
>>>
>>
>> How are there no refs from the "parent"?  Are there not refs from
>> some kind of descriptor that maps fallback CPU and offloaded variants?
>
> That descriptor is the offload table, which is emitted in
> omp_finish_file. The function iterates over vectors offload_vars and
> offload_funcs.
>
> [ I would guess there's a one-on-one correspondance between
> symtab_node::offloadable and membership of either offload_vars or
> offload_funcs. ]
>
>> I think the above needs sorting out in somw way, making the refs
>> explicit rather than implicit via force_output.
>
> I've tried an approach where I add a test for node->offloadable next to
> each test for node->force_output, except for the test in the nonlocal_p
> def in ipa_pta_execute. But I didn't (yet) manage to make that work.
>
>>> I guess setting forced_by_abi instead would also mean child_fn is not
>>> removed
>>> as unreachable, while still allowing optimizations:
>>> ...
>>>   /* Like FORCE_OUTPUT, but in the case it is ABI requiring the symbol
>>>      to be exported.  Unlike FORCE_OUTPUT this flag gets cleared to
>>>      symbols promoted to static and it does not inhibit
>>>      optimization.  */
>>>   unsigned forced_by_abi : 1;
>>> ...
>>>
>>> But I suspect that other optimizations (than ipa-pta) might break
>>> things.
>>
>> How so?
>
> Probably it's more accurate to say that I do not understand the
> difference very well between force_output and force_by_abi, and what is
> the class of optimizations enabled by using forced_by_abi instead of
> force_output.'
>
>>> Essentially we have two situations:
>>> - in the host compiler, there is no need for the forced_output flag,
>>>   and it inhibits optimization
>>> - in the accelerator compiler, it (or some equivalent) is needed
>
> Actually, things are slightly more complicated, I realize now. There's
> also the distinction between:
> - symbols declared as offloadable in the source code, and
> - symbols create by the compiler and marked offloadable
>
>>> I wonder if setting the force_output flag only when streaming the
>>> bytecode for
>>> offloading would work. That way, it wouldn't be set in the host
>>> compiler,
>>> while being set in the accelerator compiler.
>>
>> Yeah, that was my original thinking btw.
>
> FTR, I've tried that approach, as attached. It fixed the
> goacc/kernels-alias-ipa-pta*.c failures. And I ran target-libgomp (also
> using an accelerator configuration) without any regressions.

How about this patch?

We remove the setting of force_output when:
- encountering offloadable symbols in the frontend, or
- creating offloadable symbols in expand-omp.

Instead, we set force_output in input_offload_tables.

This is an improvement because:
- it moves the force_output setting to a single location
- it does the force_output setting ALAP

Thanks,
- Tom

Comments

Tom de Vries Jan. 5, 2016, 2:56 p.m. UTC | #1
[ was: Re: [PATCH] Handle BUILT_IN_GOACC_PARALLEL in ipa-pta ]

On 16/12/15 17:02, Tom de Vries wrote:
> On 10/12/15 14:14, Tom de Vries wrote:
>> [ copy-pasting-with-quote from
>> https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00420.html , for some
>> reason I didn't get this email ]
>>
>>> On Thu, 3 Dec 2015, Tom de Vries wrote:
>>>> The flag is set here in expand_omp_target:
>>>> ...
>>>> 12682         /* Prevent IPA from removing child_fn as unreachable,
>>>>                  since there are no
>>>> 12683            refs from the parent function to child_fn in offload
>>>>                  LTO mode.  */
>>>> 12684         if (ENABLE_OFFLOADING)
>>>> 12685           cgraph_node::get (child_fn)->mark_force_output ();
>>>> ...
>>>>
>>>
>>> How are there no refs from the "parent"?  Are there not refs from
>>> some kind of descriptor that maps fallback CPU and offloaded variants?
>>
>> That descriptor is the offload table, which is emitted in
>> omp_finish_file. The function iterates over vectors offload_vars and
>> offload_funcs.
>>
>> [ I would guess there's a one-on-one correspondance between
>> symtab_node::offloadable and membership of either offload_vars or
>> offload_funcs. ]
>>
>>> I think the above needs sorting out in somw way, making the refs
>>> explicit rather than implicit via force_output.
>>
>> I've tried an approach where I add a test for node->offloadable next to
>> each test for node->force_output, except for the test in the nonlocal_p
>> def in ipa_pta_execute. But I didn't (yet) manage to make that work.
>>
>>>> I guess setting forced_by_abi instead would also mean child_fn is not
>>>> removed
>>>> as unreachable, while still allowing optimizations:
>>>> ...
>>>>   /* Like FORCE_OUTPUT, but in the case it is ABI requiring the symbol
>>>>      to be exported.  Unlike FORCE_OUTPUT this flag gets cleared to
>>>>      symbols promoted to static and it does not inhibit
>>>>      optimization.  */
>>>>   unsigned forced_by_abi : 1;
>>>> ...
>>>>
>>>> But I suspect that other optimizations (than ipa-pta) might break
>>>> things.
>>>
>>> How so?
>>
>> Probably it's more accurate to say that I do not understand the
>> difference very well between force_output and force_by_abi, and what is
>> the class of optimizations enabled by using forced_by_abi instead of
>> force_output.'
>>
>>>> Essentially we have two situations:
>>>> - in the host compiler, there is no need for the forced_output flag,
>>>>   and it inhibits optimization
>>>> - in the accelerator compiler, it (or some equivalent) is needed
>>
>> Actually, things are slightly more complicated, I realize now. There's
>> also the distinction between:
>> - symbols declared as offloadable in the source code, and
>> - symbols create by the compiler and marked offloadable
>>
>>>> I wonder if setting the force_output flag only when streaming the
>>>> bytecode for
>>>> offloading would work. That way, it wouldn't be set in the host
>>>> compiler,
>>>> while being set in the accelerator compiler.
>>>
>>> Yeah, that was my original thinking btw.
>>
>> FTR, I've tried that approach, as attached. It fixed the
>> goacc/kernels-alias-ipa-pta*.c failures. And I ran target-libgomp (also
>> using an accelerator configuration) without any regressions.
>
> How about this patch?
>

Ping.

Thanks,
- Tom

> We remove the setting of force_output when:
> - encountering offloadable symbols in the frontend, or
> - creating offloadable symbols in expand-omp.
>
> Instead, we set force_output in input_offload_tables.
>
> This is an improvement because:
> - it moves the force_output setting to a single location
> - it does the force_output setting ALAP
>
> Thanks,
> - Tom
>
> 0008-Mark-symbols-in-offload-tables-with-force_output-in-read_offload_tables.patch
>
>
> Mark symbols in offload tables with force_output in read_offload_tables
>
> 2015-12-15  Tom de Vries<tom@codesourcery.com>
>
> 	* c-parser.c (c_parser_oacc_declare, c_parser_omp_declare_target): Don't
> 	set force_output.
>
> 	* parser.c (cp_parser_oacc_declare, cp_parser_omp_declare_target): Don't
> 	set force_output.
>
> 	* omp-low.c (expand_omp_target): Don't set force_output.
> 	* varpool.c (varpool_node::get_create): Same.
> 	* lto-cgraph.c (input_offload_tables): Mark entries in offload_vars and
> 	offload_funcs with force_output.
>
> ---
>   gcc/c/c-parser.c | 10 ++--------
>   gcc/cp/parser.c  | 10 ++--------
>   gcc/lto-cgraph.c |  9 +++++++++
>   gcc/omp-low.c    |  5 -----
>   gcc/varpool.c    |  1 -
>   5 files changed, 13 insertions(+), 22 deletions(-)
>
> diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
> index 124c30b..6e6f4b8 100644
> --- a/gcc/c/c-parser.c
> +++ b/gcc/c/c-parser.c
> @@ -13527,10 +13527,7 @@ c_parser_oacc_declare (c_parser *parser)
>   		    {
>   		      g->have_offload = true;
>   		      if (is_a <varpool_node *> (node))
> -			{
> -			  vec_safe_push (offload_vars, decl);
> -			  node->force_output = 1;
> -			}
> +			vec_safe_push (offload_vars, decl);
>   		    }
>   		}
>   	    }
> @@ -16412,10 +16409,7 @@ c_parser_omp_declare_target (c_parser *parser)
>   		{
>   		  g->have_offload = true;
>   		  if (is_a <varpool_node *> (node))
> -		    {
> -		      vec_safe_push (offload_vars, t);
> -		      node->force_output = 1;
> -		    }
> +		    vec_safe_push (offload_vars, t);
>   		}
>   	    }
>   	}
> diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
> index a420cf1..340cc4a 100644
> --- a/gcc/cp/parser.c
> +++ b/gcc/cp/parser.c
> @@ -35091,10 +35091,7 @@ cp_parser_oacc_declare (cp_parser *parser, cp_token *pragma_tok)
>   		    {
>   		      g->have_offload = true;
>   		      if (is_a <varpool_node *> (node))
> -			{
> -			  vec_safe_push (offload_vars, decl);
> -			  node->force_output = 1;
> -			}
> +			vec_safe_push (offload_vars, decl);
>   		    }
>   		}
>   	    }
> @@ -35631,10 +35628,7 @@ cp_parser_omp_declare_target (cp_parser *parser, cp_token *pragma_tok)
>   		{
>   		  g->have_offload = true;
>   		  if (is_a <varpool_node *> (node))
> -		    {
> -		      vec_safe_push (offload_vars, t);
> -		      node->force_output = 1;
> -		    }
> +		    vec_safe_push (offload_vars, t);
>   		}
>   	    }
>   	}
> diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
> index 62e5454..cdaee41 100644
> --- a/gcc/lto-cgraph.c
> +++ b/gcc/lto-cgraph.c
> @@ -1911,6 +1911,11 @@ input_offload_tables (void)
>   	      tree fn_decl
>   		= lto_file_decl_data_get_fn_decl (file_data, decl_index);
>   	      vec_safe_push (offload_funcs, fn_decl);
> +
> +	      /* Prevent IPA from removing fn_decl as unreachable, since there
> +		 may be no refs from the parent function to child_fn in offload
> +		 LTO mode.  */
> +	      cgraph_node::get (fn_decl)->mark_force_output ();
>   	    }
>   	  else if (tag == LTO_symtab_variable)
>   	    {
> @@ -1918,6 +1923,10 @@ input_offload_tables (void)
>   	      tree var_decl
>   		= lto_file_decl_data_get_var_decl (file_data, decl_index);
>   	      vec_safe_push (offload_vars, var_decl);
> +
> +	      /* Prevent IPA from removing var_decl as unused, since there
> +		 may be no refs to var_decl in offload LTO mode.  */
> +	      varpool_node::get (var_decl)->force_output = 1;
>   	    }
>   	  else
>   	    fatal_error (input_location,
> diff --git a/gcc/omp-low.c b/gcc/omp-low.c
> index e1d7c09..a76d8fc 100644
> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c
> @@ -12813,11 +12813,6 @@ expand_omp_target (struct omp_region *region)
>   	assign_assembler_name_if_neeeded (child_fn);
>         cgraph_edge::rebuild_edges ();
>
> -      /* Prevent IPA from removing child_fn as unreachable, since there are no
> -	 refs from the parent function to child_fn in offload LTO mode.  */
> -      if (ENABLE_OFFLOADING)
> -	cgraph_node::get (child_fn)->mark_force_output ();
> -
>         /* Some EH regions might become dead, see PR34608.  If
>   	 pass_cleanup_cfg isn't the first pass to happen with the
>   	 new child, these dead EH edges might cause problems.
> diff --git a/gcc/varpool.c b/gcc/varpool.c
> index 5e4fcbf..5ed65e5 100644
> --- a/gcc/varpool.c
> +++ b/gcc/varpool.c
> @@ -158,7 +158,6 @@ varpool_node::get_create (tree decl)
>   	  g->have_offload = true;
>   	  if (!in_lto_p)
>   	    vec_safe_push (offload_vars, decl);
> -	  node->force_output = 1;
>   	}
>       }
>
>
Richard Biener Jan. 8, 2016, 8:16 a.m. UTC | #2
On Wed, 16 Dec 2015, Tom de Vries wrote:

> On 10/12/15 14:14, Tom de Vries wrote:
> > [ copy-pasting-with-quote from
> > https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00420.html , for some
> > reason I didn't get this email ]
> > 
> > > On Thu, 3 Dec 2015, Tom de Vries wrote:
> > > > The flag is set here in expand_omp_target:
> > > > ...
> > > > 12682         /* Prevent IPA from removing child_fn as unreachable,
> > > >                  since there are no
> > > > 12683            refs from the parent function to child_fn in offload
> > > >                  LTO mode.  */
> > > > 12684         if (ENABLE_OFFLOADING)
> > > > 12685           cgraph_node::get (child_fn)->mark_force_output ();
> > > > ...
> > > > 
> > > 
> > > How are there no refs from the "parent"?  Are there not refs from
> > > some kind of descriptor that maps fallback CPU and offloaded variants?
> > 
> > That descriptor is the offload table, which is emitted in
> > omp_finish_file. The function iterates over vectors offload_vars and
> > offload_funcs.
> > 
> > [ I would guess there's a one-on-one correspondance between
> > symtab_node::offloadable and membership of either offload_vars or
> > offload_funcs. ]
> > 
> > > I think the above needs sorting out in somw way, making the refs
> > > explicit rather than implicit via force_output.
> > 
> > I've tried an approach where I add a test for node->offloadable next to
> > each test for node->force_output, except for the test in the nonlocal_p
> > def in ipa_pta_execute. But I didn't (yet) manage to make that work.
> > 
> > > > I guess setting forced_by_abi instead would also mean child_fn is not
> > > > removed
> > > > as unreachable, while still allowing optimizations:
> > > > ...
> > > >   /* Like FORCE_OUTPUT, but in the case it is ABI requiring the symbol
> > > >      to be exported.  Unlike FORCE_OUTPUT this flag gets cleared to
> > > >      symbols promoted to static and it does not inhibit
> > > >      optimization.  */
> > > >   unsigned forced_by_abi : 1;
> > > > ...
> > > > 
> > > > But I suspect that other optimizations (than ipa-pta) might break
> > > > things.
> > > 
> > > How so?
> > 
> > Probably it's more accurate to say that I do not understand the
> > difference very well between force_output and force_by_abi, and what is
> > the class of optimizations enabled by using forced_by_abi instead of
> > force_output.'
> > 
> > > > Essentially we have two situations:
> > > > - in the host compiler, there is no need for the forced_output flag,
> > > >   and it inhibits optimization
> > > > - in the accelerator compiler, it (or some equivalent) is needed
> > 
> > Actually, things are slightly more complicated, I realize now. There's
> > also the distinction between:
> > - symbols declared as offloadable in the source code, and
> > - symbols create by the compiler and marked offloadable
> > 
> > > > I wonder if setting the force_output flag only when streaming the
> > > > bytecode for
> > > > offloading would work. That way, it wouldn't be set in the host
> > > > compiler,
> > > > while being set in the accelerator compiler.
> > > 
> > > Yeah, that was my original thinking btw.
> > 
> > FTR, I've tried that approach, as attached. It fixed the
> > goacc/kernels-alias-ipa-pta*.c failures. And I ran target-libgomp (also
> > using an accelerator configuration) without any regressions.
> 
> How about this patch?
> 
> We remove the setting of force_output when:
> - encountering offloadable symbols in the frontend, or
> - creating offloadable symbols in expand-omp.
> 
> Instead, we set force_output in input_offload_tables.
> 
> This is an improvement because:
> - it moves the force_output setting to a single location
> - it does the force_output setting ALAP

Works for me.

Richard.
Thomas Schwinge Jan. 14, 2016, 9:31 a.m. UTC | #3
Hi!

On Fri, 8 Jan 2016 09:16:55 +0100, Richard Biener <rguenther@suse.de> wrote:
> On Wed, 16 Dec 2015, Tom de Vries wrote:
> > On 10/12/15 14:14, Tom de Vries wrote:
> > > > On Thu, 3 Dec 2015, Tom de Vries wrote:
> > > > > Essentially we have two situations:
> > > > > - in the host compiler, there is no need for the forced_output flag,
> > > > >   and it inhibits optimization
> > > > > - in the accelerator compiler, it (or some equivalent) is needed
> > > 
> > > Actually, things are slightly more complicated, I realize now. There's
> > > also the distinction between:
> > > - symbols declared as offloadable in the source code, and
> > > - symbols create by the compiler and marked offloadable
> > > 
> > > > > I wonder if setting the force_output flag only when streaming the
> > > > > bytecode for
> > > > > offloading would work. That way, it wouldn't be set in the host
> > > > > compiler,
> > > > > while being set in the accelerator compiler.
> > > > 
> > > > Yeah, that was my original thinking btw.
> > > 
> > > FTR, I've tried that approach, as attached. It fixed the
> > > goacc/kernels-alias-ipa-pta*.c failures.

Confirmed.  And, no change in offloading testing when applying the patch
to gomp-4_0-branch (where these FAILs didn't appear to begin with).

> > > And I ran target-libgomp (also
> > > using an accelerator configuration) without any regressions.

Confirmed.

> > How about this patch?
> > 
> > We remove the setting of force_output when:
> > - encountering offloadable symbols in the frontend, or
> > - creating offloadable symbols in expand-omp.
> > 
> > Instead, we set force_output in input_offload_tables.
> > 
> > This is an improvement because:
> > - it moves the force_output setting to a single location
> > - it does the force_output setting ALAP
> 
> Works for me.

Do we need review from Jakub or is this good for trunk?


Grüße
 Thomas
Richard Biener Jan. 14, 2016, 9:32 a.m. UTC | #4
On Thu, 14 Jan 2016, Thomas Schwinge wrote:

> Hi!
> 
> On Fri, 8 Jan 2016 09:16:55 +0100, Richard Biener <rguenther@suse.de> wrote:
> > On Wed, 16 Dec 2015, Tom de Vries wrote:
> > > On 10/12/15 14:14, Tom de Vries wrote:
> > > > > On Thu, 3 Dec 2015, Tom de Vries wrote:
> > > > > > Essentially we have two situations:
> > > > > > - in the host compiler, there is no need for the forced_output flag,
> > > > > >   and it inhibits optimization
> > > > > > - in the accelerator compiler, it (or some equivalent) is needed
> > > > 
> > > > Actually, things are slightly more complicated, I realize now. There's
> > > > also the distinction between:
> > > > - symbols declared as offloadable in the source code, and
> > > > - symbols create by the compiler and marked offloadable
> > > > 
> > > > > > I wonder if setting the force_output flag only when streaming the
> > > > > > bytecode for
> > > > > > offloading would work. That way, it wouldn't be set in the host
> > > > > > compiler,
> > > > > > while being set in the accelerator compiler.
> > > > > 
> > > > > Yeah, that was my original thinking btw.
> > > > 
> > > > FTR, I've tried that approach, as attached. It fixed the
> > > > goacc/kernels-alias-ipa-pta*.c failures.
> 
> Confirmed.  And, no change in offloading testing when applying the patch
> to gomp-4_0-branch (where these FAILs didn't appear to begin with).
> 
> > > > And I ran target-libgomp (also
> > > > using an accelerator configuration) without any regressions.
> 
> Confirmed.
> 
> > > How about this patch?
> > > 
> > > We remove the setting of force_output when:
> > > - encountering offloadable symbols in the frontend, or
> > > - creating offloadable symbols in expand-omp.
> > > 
> > > Instead, we set force_output in input_offload_tables.
> > > 
> > > This is an improvement because:
> > > - it moves the force_output setting to a single location
> > > - it does the force_output setting ALAP
> > 
> > Works for me.
> 
> Do we need review from Jakub or is this good for trunk?

It's good for trunk.

Richard.
Ilya Verbin Jan. 25, 2016, 1:27 p.m. UTC | #5
Hi!

On Tue, Jan 05, 2016 at 15:56:15 +0100, Tom de Vries wrote:
> >diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
> >index 62e5454..cdaee41 100644
> >--- a/gcc/lto-cgraph.c
> >+++ b/gcc/lto-cgraph.c
> >@@ -1911,6 +1911,11 @@ input_offload_tables (void)
> >  	      tree fn_decl
> >  		= lto_file_decl_data_get_fn_decl (file_data, decl_index);
> >  	      vec_safe_push (offload_funcs, fn_decl);
> >+
> >+	      /* Prevent IPA from removing fn_decl as unreachable, since there
> >+		 may be no refs from the parent function to child_fn in offload
> >+		 LTO mode.  */
> >+	      cgraph_node::get (fn_decl)->mark_force_output ();
> >  	    }
> >  	  else if (tag == LTO_symtab_variable)
> >  	    {
> >@@ -1918,6 +1923,10 @@ input_offload_tables (void)
> >  	      tree var_decl
> >  		= lto_file_decl_data_get_var_decl (file_data, decl_index);
> >  	      vec_safe_push (offload_vars, var_decl);
> >+
> >+	      /* Prevent IPA from removing var_decl as unused, since there
> >+		 may be no refs to var_decl in offload LTO mode.  */
> >+	      varpool_node::get (var_decl)->force_output = 1;
> >  	    }

This doesn't work when there is more than one LTO partition, because only first
partition contains full offload table to maintain correct order, but cgraph and
varpool nodes aren't necessarily created for the first partition.  To reproduce:

$ make check-target-libgomp RUNTESTFLAGS="c.exp=for-* --target_board=unix/-flto"
FAIL: libgomp.c/for-3.c (internal compiler error)
FAIL: libgomp.c/for-5.c (internal compiler error)
FAIL: libgomp.c/for-6.c (internal compiler error)
$ make check-target-libgomp RUNTESTFLAGS="c++.exp=for-* --target_board=unix/-flto"
FAIL: libgomp.c++/for-11.C (internal compiler error)
FAIL: libgomp.c++/for-13.C (internal compiler error)
FAIL: libgomp.c++/for-14.C (internal compiler error)

  -- Ilya
diff mbox

Patch

Mark symbols in offload tables with force_output in read_offload_tables

2015-12-15  Tom de Vries  <tom@codesourcery.com>

	* c-parser.c (c_parser_oacc_declare, c_parser_omp_declare_target): Don't
	set force_output.

	* parser.c (cp_parser_oacc_declare, cp_parser_omp_declare_target): Don't
	set force_output.

	* omp-low.c (expand_omp_target): Don't set force_output.
	* varpool.c (varpool_node::get_create): Same.
	* lto-cgraph.c (input_offload_tables): Mark entries in offload_vars and
	offload_funcs with force_output.

---
 gcc/c/c-parser.c | 10 ++--------
 gcc/cp/parser.c  | 10 ++--------
 gcc/lto-cgraph.c |  9 +++++++++
 gcc/omp-low.c    |  5 -----
 gcc/varpool.c    |  1 -
 5 files changed, 13 insertions(+), 22 deletions(-)

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 124c30b..6e6f4b8 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -13527,10 +13527,7 @@  c_parser_oacc_declare (c_parser *parser)
 		    {
 		      g->have_offload = true;
 		      if (is_a <varpool_node *> (node))
-			{
-			  vec_safe_push (offload_vars, decl);
-			  node->force_output = 1;
-			}
+			vec_safe_push (offload_vars, decl);
 		    }
 		}
 	    }
@@ -16412,10 +16409,7 @@  c_parser_omp_declare_target (c_parser *parser)
 		{
 		  g->have_offload = true;
 		  if (is_a <varpool_node *> (node))
-		    {
-		      vec_safe_push (offload_vars, t);
-		      node->force_output = 1;
-		    }
+		    vec_safe_push (offload_vars, t);
 		}
 	    }
 	}
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index a420cf1..340cc4a 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -35091,10 +35091,7 @@  cp_parser_oacc_declare (cp_parser *parser, cp_token *pragma_tok)
 		    {
 		      g->have_offload = true;
 		      if (is_a <varpool_node *> (node))
-			{
-			  vec_safe_push (offload_vars, decl);
-			  node->force_output = 1;
-			}
+			vec_safe_push (offload_vars, decl);
 		    }
 		}
 	    }
@@ -35631,10 +35628,7 @@  cp_parser_omp_declare_target (cp_parser *parser, cp_token *pragma_tok)
 		{
 		  g->have_offload = true;
 		  if (is_a <varpool_node *> (node))
-		    {
-		      vec_safe_push (offload_vars, t);
-		      node->force_output = 1;
-		    }
+		    vec_safe_push (offload_vars, t);
 		}
 	    }
 	}
diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
index 62e5454..cdaee41 100644
--- a/gcc/lto-cgraph.c
+++ b/gcc/lto-cgraph.c
@@ -1911,6 +1911,11 @@  input_offload_tables (void)
 	      tree fn_decl
 		= lto_file_decl_data_get_fn_decl (file_data, decl_index);
 	      vec_safe_push (offload_funcs, fn_decl);
+
+	      /* Prevent IPA from removing fn_decl as unreachable, since there
+		 may be no refs from the parent function to child_fn in offload
+		 LTO mode.  */
+	      cgraph_node::get (fn_decl)->mark_force_output ();
 	    }
 	  else if (tag == LTO_symtab_variable)
 	    {
@@ -1918,6 +1923,10 @@  input_offload_tables (void)
 	      tree var_decl
 		= lto_file_decl_data_get_var_decl (file_data, decl_index);
 	      vec_safe_push (offload_vars, var_decl);
+
+	      /* Prevent IPA from removing var_decl as unused, since there
+		 may be no refs to var_decl in offload LTO mode.  */
+	      varpool_node::get (var_decl)->force_output = 1;
 	    }
 	  else
 	    fatal_error (input_location,
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index e1d7c09..a76d8fc 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -12813,11 +12813,6 @@  expand_omp_target (struct omp_region *region)
 	assign_assembler_name_if_neeeded (child_fn);
       cgraph_edge::rebuild_edges ();
 
-      /* Prevent IPA from removing child_fn as unreachable, since there are no
-	 refs from the parent function to child_fn in offload LTO mode.  */
-      if (ENABLE_OFFLOADING)
-	cgraph_node::get (child_fn)->mark_force_output ();
-
       /* Some EH regions might become dead, see PR34608.  If
 	 pass_cleanup_cfg isn't the first pass to happen with the
 	 new child, these dead EH edges might cause problems.
diff --git a/gcc/varpool.c b/gcc/varpool.c
index 5e4fcbf..5ed65e5 100644
--- a/gcc/varpool.c
+++ b/gcc/varpool.c
@@ -158,7 +158,6 @@  varpool_node::get_create (tree decl)
 	  g->have_offload = true;
 	  if (!in_lto_p)
 	    vec_safe_push (offload_vars, decl);
-	  node->force_output = 1;
 	}
     }