Message ID | 23b4998b-bbe6-b052-d7f5-5304ee0f46a3@linux.ibm.com |
---|---|
State | New |
Headers | show |
Series | predict: Adjust optimize_function_for_size_p [PR105818] | expand |
> Hi, > > Function optimize_function_for_size_p returns OPTIMIZE_SIZE_NO > if func->decl is not null but no cgraph node is available for it. > As PR105818 shows, this could give unexpected result. For the > case in PR105818, when parsing bar decl in function foo, the cfun > is a function structure for foo, for which there is none cgraph > node, so it returns OPTIMIZE_SIZE_NO. But it's incorrect since > the context is to optimize for size, the flag optimize_size is > true. > > The patch is to make optimize_function_for_size_p to check > optimize_size as what it does when func->decl is unavailable. > > One regression failure got exposed on aarch64-linux-gnu: > > PASS->FAIL: gcc.dg/guality/pr54693-2.c -Os \ > -DPREVENT_OPTIMIZATION line 21 x == 10 - i > > The difference comes from the macro LOGICAL_OP_NON_SHORT_CIRCUIT > used in function fold_range_test during c parsing, it uses > optimize_function_for_speed_p which is equal to the invertion > of optimize_function_for_size_p. At that time cfun->decl is valid > but no cgraph node for it, w/o this patch function > optimize_function_for_speed_p returns true eventually, while it > returns false with this patch. Since the command line option -Os > is specified, there is no reason to interpret it as "for speed". > I think this failure is expected and adjust the test case > accordingly. > > Is it ok for trunk? > > BR, > Kewen > ----- > > PR target/105818 > > gcc/ChangeLog: > > * predict.cc (optimize_function_for_size_p): Check optimize_size when > func->decl is valid but its cgraph node is unavailable. > > gcc/testsuite/ChangeLog: > > * gcc.target/powerpc/pr105818.c: New test. > * gcc.dg/guality/pr54693-2.c: Adjust for aarch64. > --- > gcc/predict.cc | 2 +- > gcc/testsuite/gcc.dg/guality/pr54693-2.c | 2 +- > gcc/testsuite/gcc.target/powerpc/pr105818.c | 9 +++++++++ > 3 files changed, 11 insertions(+), 2 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/powerpc/pr105818.c > > diff --git a/gcc/predict.cc b/gcc/predict.cc > index 5734e4c8516..6c60a973236 100644 > --- a/gcc/predict.cc > +++ b/gcc/predict.cc > @@ -268,7 +268,7 @@ optimize_function_for_size_p (struct function *fun) > cgraph_node *n = cgraph_node::get (fun->decl); > if (n) > return n->optimize_for_size_p (); > - return OPTIMIZE_SIZE_NO; > + return optimize_size ? OPTIMIZE_SIZE_MAX : OPTIMIZE_SIZE_NO; We could also do (opt_for_fn (cfun->decl, optimize_size) that is probably better since one can change optimize_size with optimization attribute. However I think in most cases we check for optimize_size early I think we are doing something wrong, since at that time htere is no profile available. Why exactly PR105818 hits the flag change issue? Honza > } > > /* Return true if function FUN should always be optimized for speed. */ > diff --git a/gcc/testsuite/gcc.dg/guality/pr54693-2.c b/gcc/testsuite/gcc.dg/guality/pr54693-2.c > index 68aa6c63d71..14ca94ad37d 100644 > --- a/gcc/testsuite/gcc.dg/guality/pr54693-2.c > +++ b/gcc/testsuite/gcc.dg/guality/pr54693-2.c > @@ -17,7 +17,7 @@ foo (int x, int y, int z) > int i = 0; > while (x > 3 && y > 3 && z > 3) > { /* { dg-final { gdb-test .+2 "i" "v + 1" } } */ > - /* { dg-final { gdb-test .+1 "x" "10 - i" } } */ > + /* { dg-final { gdb-test .+1 "x" "10 - i" { xfail { aarch64*-*-* && { any-opts "-Os" } } } } } */ > bar (i); /* { dg-final { gdb-test . "y" "20 - 2 * i" } } */ > /* { dg-final { gdb-test .-1 "z" "30 - 3 * i" { xfail { aarch64*-*-* && { any-opts "-fno-fat-lto-objects" "-Os" } } } } } */ > i++, x--, y -= 2, z -= 3; > diff --git a/gcc/testsuite/gcc.target/powerpc/pr105818.c b/gcc/testsuite/gcc.target/powerpc/pr105818.c > new file mode 100644 > index 00000000000..18781edbf9e > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/pr105818.c > @@ -0,0 +1,9 @@ > +/* { dg-options "-Os -fno-tree-vectorize" } */ > + > +#pragma GCC optimize "-fno-tree-vectorize" > + > +void > +foo (void) > +{ > + void bar (void); > +} > -- > 2.27.0
On Tue, Jun 14, 2022 at 03:57:08PM +0800, Kewen.Lin wrote:
> PR target/105818
Change this to something else? It is not a target bug. "middle-end"
perhaps?
Thanks for fixing this,
Segher
Hi Honza, Thanks for the comments! Some replies are inlined below. on 2022/6/14 19:37, Jan Hubicka wrote: >> Hi, >> >> Function optimize_function_for_size_p returns OPTIMIZE_SIZE_NO >> if func->decl is not null but no cgraph node is available for it. >> As PR105818 shows, this could give unexpected result. For the >> case in PR105818, when parsing bar decl in function foo, the cfun >> is a function structure for foo, for which there is none cgraph >> node, so it returns OPTIMIZE_SIZE_NO. But it's incorrect since >> the context is to optimize for size, the flag optimize_size is >> true. >> >> The patch is to make optimize_function_for_size_p to check >> optimize_size as what it does when func->decl is unavailable. >> >> One regression failure got exposed on aarch64-linux-gnu: >> >> PASS->FAIL: gcc.dg/guality/pr54693-2.c -Os \ >> -DPREVENT_OPTIMIZATION line 21 x == 10 - i >> >> The difference comes from the macro LOGICAL_OP_NON_SHORT_CIRCUIT >> used in function fold_range_test during c parsing, it uses >> optimize_function_for_speed_p which is equal to the invertion >> of optimize_function_for_size_p. At that time cfun->decl is valid >> but no cgraph node for it, w/o this patch function >> optimize_function_for_speed_p returns true eventually, while it >> returns false with this patch. Since the command line option -Os >> is specified, there is no reason to interpret it as "for speed". >> I think this failure is expected and adjust the test case >> accordingly. >> >> Is it ok for trunk? >> >> BR, >> Kewen >> ----- >> >> PR target/105818 >> >> gcc/ChangeLog: >> >> * predict.cc (optimize_function_for_size_p): Check optimize_size when >> func->decl is valid but its cgraph node is unavailable. >> >> gcc/testsuite/ChangeLog: >> >> * gcc.target/powerpc/pr105818.c: New test. >> * gcc.dg/guality/pr54693-2.c: Adjust for aarch64. >> --- >> gcc/predict.cc | 2 +- >> gcc/testsuite/gcc.dg/guality/pr54693-2.c | 2 +- >> gcc/testsuite/gcc.target/powerpc/pr105818.c | 9 +++++++++ >> 3 files changed, 11 insertions(+), 2 deletions(-) >> create mode 100644 gcc/testsuite/gcc.target/powerpc/pr105818.c >> >> diff --git a/gcc/predict.cc b/gcc/predict.cc >> index 5734e4c8516..6c60a973236 100644 >> --- a/gcc/predict.cc >> +++ b/gcc/predict.cc >> @@ -268,7 +268,7 @@ optimize_function_for_size_p (struct function *fun) >> cgraph_node *n = cgraph_node::get (fun->decl); >> if (n) >> return n->optimize_for_size_p (); >> - return OPTIMIZE_SIZE_NO; >> + return optimize_size ? OPTIMIZE_SIZE_MAX : OPTIMIZE_SIZE_NO; > > We could also do (opt_for_fn (cfun->decl, optimize_size) that is > probably better since one can change optimize_size with optimization > attribute. Good point, agree! > However I think in most cases we check for optimize_size early I think > we are doing something wrong, since at that time htere is no profile > available. Why exactly PR105818 hits the flag change issue? For PR105818, the reason why the flag changs is that: Firstly, the inconsistent flag is OPTION_MASK_SAVE_TOC_INDIRECT bit of rs6000_isa_flags_explicit, it's set as below: /* If we can shrink-wrap the TOC register save separately, then use -msave-toc-indirect unless explicitly disabled. */ if ((rs6000_isa_flags_explicit & OPTION_MASK_SAVE_TOC_INDIRECT) == 0 && flag_shrink_wrap_separate && optimize_function_for_speed_p (cfun)) rs6000_isa_flags |= OPTION_MASK_SAVE_TOC_INDIRECT; Initially, rs6000 initialize target_option_default_node with OPTION_MASK_SAVE_TOC_INDIRECT unset, at that time cfun is NULL and optimize_size is true. Later, when c parser handling function foo, it builds target option node as target_option_default_node in function handle_optimize_attribute, it does global option saving and verifying there as well, at that time the cfun is NULL, no issue is found. And function store_parm_decls allocates struct_function for foo then, cfun becomes function struct for foo, when c parser continues to handle the decl bar in foo, function handle_optimize_attribute works as before, tries to restore the target options at the end, it calls targetm.target_option.restore (rs6000_function_specific_restore) which calls function rs6000_option_override_internal again, at this time the cfun is not NULL while there is no cgraph node for its decl, optimize_function_for_speed_p returns true and gets the OPTION_MASK_SAVE_TOC_INDIRECT bit of flag rs6000_isa_flags set unexpectedly. It becomes inconsistent as the one saved previously. IMHO, both contexts of global and function decl foo here hold optimize_size, function optimize_function_for_speed_p should not return true anyway. btw, the aarch64 failed case also gets the unexpected result for optimize_function_for_speed_p during c parsing (fold_range_test <- ... <- c_parser_condition). IIUC, in parsing time we don't have the profile information available. BR, Kewen
on 2022/6/14 20:53, Segher Boessenkool wrote: > On Tue, Jun 14, 2022 at 03:57:08PM +0800, Kewen.Lin wrote: >> PR target/105818 > > Change this to something else? It is not a target bug. "middle-end" > perhaps? > Good catch, will fix this. Thanks Segher! BR, Kewen
on 2022/6/15 14:20, Kewen.Lin wrote: > Hi Honza, > > Thanks for the comments! Some replies are inlined below. > > on 2022/6/14 19:37, Jan Hubicka wrote: >>> Hi, >>> >>> Function optimize_function_for_size_p returns OPTIMIZE_SIZE_NO >>> if func->decl is not null but no cgraph node is available for it. >>> As PR105818 shows, this could give unexpected result. For the >>> case in PR105818, when parsing bar decl in function foo, the cfun >>> is a function structure for foo, for which there is none cgraph >>> node, so it returns OPTIMIZE_SIZE_NO. But it's incorrect since >>> the context is to optimize for size, the flag optimize_size is >>> true. >>> >>> The patch is to make optimize_function_for_size_p to check >>> optimize_size as what it does when func->decl is unavailable. >>> >>> One regression failure got exposed on aarch64-linux-gnu: >>> >>> PASS->FAIL: gcc.dg/guality/pr54693-2.c -Os \ >>> -DPREVENT_OPTIMIZATION line 21 x == 10 - i >>> >>> The difference comes from the macro LOGICAL_OP_NON_SHORT_CIRCUIT >>> used in function fold_range_test during c parsing, it uses >>> optimize_function_for_speed_p which is equal to the invertion >>> of optimize_function_for_size_p. At that time cfun->decl is valid >>> but no cgraph node for it, w/o this patch function >>> optimize_function_for_speed_p returns true eventually, while it >>> returns false with this patch. Since the command line option -Os >>> is specified, there is no reason to interpret it as "for speed". >>> I think this failure is expected and adjust the test case >>> accordingly. >>> >>> Is it ok for trunk? >>> >>> BR, >>> Kewen >>> ----- >>> >>> PR target/105818 >>> >>> gcc/ChangeLog: >>> >>> * predict.cc (optimize_function_for_size_p): Check optimize_size when >>> func->decl is valid but its cgraph node is unavailable. >>> >>> gcc/testsuite/ChangeLog: >>> >>> * gcc.target/powerpc/pr105818.c: New test. >>> * gcc.dg/guality/pr54693-2.c: Adjust for aarch64. >>> --- >>> gcc/predict.cc | 2 +- >>> gcc/testsuite/gcc.dg/guality/pr54693-2.c | 2 +- >>> gcc/testsuite/gcc.target/powerpc/pr105818.c | 9 +++++++++ >>> 3 files changed, 11 insertions(+), 2 deletions(-) >>> create mode 100644 gcc/testsuite/gcc.target/powerpc/pr105818.c >>> >>> diff --git a/gcc/predict.cc b/gcc/predict.cc >>> index 5734e4c8516..6c60a973236 100644 >>> --- a/gcc/predict.cc >>> +++ b/gcc/predict.cc >>> @@ -268,7 +268,7 @@ optimize_function_for_size_p (struct function *fun) >>> cgraph_node *n = cgraph_node::get (fun->decl); >>> if (n) >>> return n->optimize_for_size_p (); >>> - return OPTIMIZE_SIZE_NO; >>> + return optimize_size ? OPTIMIZE_SIZE_MAX : OPTIMIZE_SIZE_NO; >> >> We could also do (opt_for_fn (cfun->decl, optimize_size) that is >> probably better since one can change optimize_size with optimization >> attribute. > > Good point, agree! > >> However I think in most cases we check for optimize_size early I think >> we are doing something wrong, since at that time htere is no profile >> available. Why exactly PR105818 hits the flag change issue? > > For PR105818, the reason why the flag changs is that: > > Firstly, the inconsistent flag is OPTION_MASK_SAVE_TOC_INDIRECT bit > of rs6000_isa_flags_explicit, it's set as below: > > /* If we can shrink-wrap the TOC register save separately, then use > -msave-toc-indirect unless explicitly disabled. */ > if ((rs6000_isa_flags_explicit & OPTION_MASK_SAVE_TOC_INDIRECT) == 0 > && flag_shrink_wrap_separate > && optimize_function_for_speed_p (cfun)) > rs6000_isa_flags |= OPTION_MASK_SAVE_TOC_INDIRECT; > > Initially, rs6000 initialize target_option_default_node with > OPTION_MASK_SAVE_TOC_INDIRECT unset, at that time cfun is NULL > and optimize_size is true. > > Later, when c parser handling function foo, it builds target > option node as target_option_default_node in function > handle_optimize_attribute, it does global option saving and > verifying there as well, at that time the cfun is NULL, no > issue is found. And function store_parm_decls allocates > struct_function for foo then, cfun becomes function struct > for foo, when c parser continues to handle the decl bar in > foo, function handle_optimize_attribute works as before, > tries to restore the target options at the end, it calls > targetm.target_option.restore (rs6000_function_specific_restore) > which calls function rs6000_option_override_internal again, > at this time the cfun is not NULL while there is no cgraph > node for its decl, optimize_function_for_speed_p returns true > and gets the OPTION_MASK_SAVE_TOC_INDIRECT bit of flag > rs6000_isa_flags set unexpectedly. It becomes inconsistent > as the one saved previously. > > IMHO, both contexts of global and function decl foo here hold > optimize_size, function optimize_function_for_speed_p should > not return true anyway. > > btw, the aarch64 failed case also gets the unexpected > result for optimize_function_for_speed_p during c parsing > (fold_range_test <- ... <- c_parser_condition). > > IIUC, in parsing time we don't have the profile information > available. > Hi Honza, Does the above explanation sound reasonable to you? BR, Kewen
on 2022/7/11 11:42, Kewen.Lin wrote: > on 2022/6/15 14:20, Kewen.Lin wrote: >> Hi Honza, >> >> Thanks for the comments! Some replies are inlined below. >> >> on 2022/6/14 19:37, Jan Hubicka wrote: >>>> Hi, >>>> >>>> Function optimize_function_for_size_p returns OPTIMIZE_SIZE_NO >>>> if func->decl is not null but no cgraph node is available for it. >>>> As PR105818 shows, this could give unexpected result. For the >>>> case in PR105818, when parsing bar decl in function foo, the cfun >>>> is a function structure for foo, for which there is none cgraph >>>> node, so it returns OPTIMIZE_SIZE_NO. But it's incorrect since >>>> the context is to optimize for size, the flag optimize_size is >>>> true. >>>> >>>> The patch is to make optimize_function_for_size_p to check >>>> optimize_size as what it does when func->decl is unavailable. >>>> >>>> One regression failure got exposed on aarch64-linux-gnu: >>>> >>>> PASS->FAIL: gcc.dg/guality/pr54693-2.c -Os \ >>>> -DPREVENT_OPTIMIZATION line 21 x == 10 - i >>>> >>>> The difference comes from the macro LOGICAL_OP_NON_SHORT_CIRCUIT >>>> used in function fold_range_test during c parsing, it uses >>>> optimize_function_for_speed_p which is equal to the invertion >>>> of optimize_function_for_size_p. At that time cfun->decl is valid >>>> but no cgraph node for it, w/o this patch function >>>> optimize_function_for_speed_p returns true eventually, while it >>>> returns false with this patch. Since the command line option -Os >>>> is specified, there is no reason to interpret it as "for speed". >>>> I think this failure is expected and adjust the test case >>>> accordingly. >>>> >>>> Is it ok for trunk? >>>> >>>> BR, >>>> Kewen >>>> ----- >>>> >>>> PR target/105818 >>>> >>>> gcc/ChangeLog: >>>> >>>> * predict.cc (optimize_function_for_size_p): Check optimize_size when >>>> func->decl is valid but its cgraph node is unavailable. >>>> >>>> gcc/testsuite/ChangeLog: >>>> >>>> * gcc.target/powerpc/pr105818.c: New test. >>>> * gcc.dg/guality/pr54693-2.c: Adjust for aarch64. >>>> --- >>>> gcc/predict.cc | 2 +- >>>> gcc/testsuite/gcc.dg/guality/pr54693-2.c | 2 +- >>>> gcc/testsuite/gcc.target/powerpc/pr105818.c | 9 +++++++++ >>>> 3 files changed, 11 insertions(+), 2 deletions(-) >>>> create mode 100644 gcc/testsuite/gcc.target/powerpc/pr105818.c >>>> >>>> diff --git a/gcc/predict.cc b/gcc/predict.cc >>>> index 5734e4c8516..6c60a973236 100644 >>>> --- a/gcc/predict.cc >>>> +++ b/gcc/predict.cc >>>> @@ -268,7 +268,7 @@ optimize_function_for_size_p (struct function *fun) >>>> cgraph_node *n = cgraph_node::get (fun->decl); >>>> if (n) >>>> return n->optimize_for_size_p (); >>>> - return OPTIMIZE_SIZE_NO; >>>> + return optimize_size ? OPTIMIZE_SIZE_MAX : OPTIMIZE_SIZE_NO; >>> >>> We could also do (opt_for_fn (cfun->decl, optimize_size) that is >>> probably better since one can change optimize_size with optimization >>> attribute. >> >> Good point, agree! >> >>> However I think in most cases we check for optimize_size early I think >>> we are doing something wrong, since at that time htere is no profile >>> available. Why exactly PR105818 hits the flag change issue? >> >> For PR105818, the reason why the flag changs is that: >> >> Firstly, the inconsistent flag is OPTION_MASK_SAVE_TOC_INDIRECT bit >> of rs6000_isa_flags_explicit, it's set as below: >> >> /* If we can shrink-wrap the TOC register save separately, then use >> -msave-toc-indirect unless explicitly disabled. */ >> if ((rs6000_isa_flags_explicit & OPTION_MASK_SAVE_TOC_INDIRECT) == 0 >> && flag_shrink_wrap_separate >> && optimize_function_for_speed_p (cfun)) >> rs6000_isa_flags |= OPTION_MASK_SAVE_TOC_INDIRECT; >> >> Initially, rs6000 initialize target_option_default_node with >> OPTION_MASK_SAVE_TOC_INDIRECT unset, at that time cfun is NULL >> and optimize_size is true. >> >> Later, when c parser handling function foo, it builds target >> option node as target_option_default_node in function >> handle_optimize_attribute, it does global option saving and >> verifying there as well, at that time the cfun is NULL, no >> issue is found. And function store_parm_decls allocates >> struct_function for foo then, cfun becomes function struct >> for foo, when c parser continues to handle the decl bar in >> foo, function handle_optimize_attribute works as before, >> tries to restore the target options at the end, it calls >> targetm.target_option.restore (rs6000_function_specific_restore) >> which calls function rs6000_option_override_internal again, >> at this time the cfun is not NULL while there is no cgraph >> node for its decl, optimize_function_for_speed_p returns true >> and gets the OPTION_MASK_SAVE_TOC_INDIRECT bit of flag >> rs6000_isa_flags set unexpectedly. It becomes inconsistent >> as the one saved previously. >> >> IMHO, both contexts of global and function decl foo here hold >> optimize_size, function optimize_function_for_speed_p should >> not return true anyway. >> >> btw, the aarch64 failed case also gets the unexpected >> result for optimize_function_for_speed_p during c parsing >> (fold_range_test <- ... <- c_parser_condition). >> >> IIUC, in parsing time we don't have the profile information >> available. >> > > Hi Honza, > > Does the above explanation sound reasonable to you? > Hi Honza, Gentle ping ... BR, Kewen
on 2022/8/15 16:33, Kewen.Lin via Gcc-patches wrote: > on 2022/7/11 11:42, Kewen.Lin wrote: >> on 2022/6/15 14:20, Kewen.Lin wrote: >>> Hi Honza, >>> >>> Thanks for the comments! Some replies are inlined below. >>> >>> on 2022/6/14 19:37, Jan Hubicka wrote: >>>>> Hi, >>>>> >>>>> Function optimize_function_for_size_p returns OPTIMIZE_SIZE_NO >>>>> if func->decl is not null but no cgraph node is available for it. >>>>> As PR105818 shows, this could give unexpected result. For the >>>>> case in PR105818, when parsing bar decl in function foo, the cfun >>>>> is a function structure for foo, for which there is none cgraph >>>>> node, so it returns OPTIMIZE_SIZE_NO. But it's incorrect since >>>>> the context is to optimize for size, the flag optimize_size is >>>>> true. >>>>> >>>>> The patch is to make optimize_function_for_size_p to check >>>>> optimize_size as what it does when func->decl is unavailable. >>>>> >>>>> One regression failure got exposed on aarch64-linux-gnu: >>>>> >>>>> PASS->FAIL: gcc.dg/guality/pr54693-2.c -Os \ >>>>> -DPREVENT_OPTIMIZATION line 21 x == 10 - i >>>>> >>>>> The difference comes from the macro LOGICAL_OP_NON_SHORT_CIRCUIT >>>>> used in function fold_range_test during c parsing, it uses >>>>> optimize_function_for_speed_p which is equal to the invertion >>>>> of optimize_function_for_size_p. At that time cfun->decl is valid >>>>> but no cgraph node for it, w/o this patch function >>>>> optimize_function_for_speed_p returns true eventually, while it >>>>> returns false with this patch. Since the command line option -Os >>>>> is specified, there is no reason to interpret it as "for speed". >>>>> I think this failure is expected and adjust the test case >>>>> accordingly. >>>>> >>>>> Is it ok for trunk? >>>>> >>>>> BR, >>>>> Kewen >>>>> ----- >>>>> >>>>> PR target/105818 >>>>> >>>>> gcc/ChangeLog: >>>>> >>>>> * predict.cc (optimize_function_for_size_p): Check optimize_size when >>>>> func->decl is valid but its cgraph node is unavailable. >>>>> >>>>> gcc/testsuite/ChangeLog: >>>>> >>>>> * gcc.target/powerpc/pr105818.c: New test. >>>>> * gcc.dg/guality/pr54693-2.c: Adjust for aarch64. >>>>> --- >>>>> gcc/predict.cc | 2 +- >>>>> gcc/testsuite/gcc.dg/guality/pr54693-2.c | 2 +- >>>>> gcc/testsuite/gcc.target/powerpc/pr105818.c | 9 +++++++++ >>>>> 3 files changed, 11 insertions(+), 2 deletions(-) >>>>> create mode 100644 gcc/testsuite/gcc.target/powerpc/pr105818.c >>>>> >>>>> diff --git a/gcc/predict.cc b/gcc/predict.cc >>>>> index 5734e4c8516..6c60a973236 100644 >>>>> --- a/gcc/predict.cc >>>>> +++ b/gcc/predict.cc >>>>> @@ -268,7 +268,7 @@ optimize_function_for_size_p (struct function *fun) >>>>> cgraph_node *n = cgraph_node::get (fun->decl); >>>>> if (n) >>>>> return n->optimize_for_size_p (); >>>>> - return OPTIMIZE_SIZE_NO; >>>>> + return optimize_size ? OPTIMIZE_SIZE_MAX : OPTIMIZE_SIZE_NO; >>>> >>>> We could also do (opt_for_fn (cfun->decl, optimize_size) that is >>>> probably better since one can change optimize_size with optimization >>>> attribute. >>> >>> Good point, agree! >>> >>>> However I think in most cases we check for optimize_size early I think >>>> we are doing something wrong, since at that time htere is no profile >>>> available. Why exactly PR105818 hits the flag change issue? >>> >>> For PR105818, the reason why the flag changs is that: >>> >>> Firstly, the inconsistent flag is OPTION_MASK_SAVE_TOC_INDIRECT bit >>> of rs6000_isa_flags_explicit, it's set as below: >>> >>> /* If we can shrink-wrap the TOC register save separately, then use >>> -msave-toc-indirect unless explicitly disabled. */ >>> if ((rs6000_isa_flags_explicit & OPTION_MASK_SAVE_TOC_INDIRECT) == 0 >>> && flag_shrink_wrap_separate >>> && optimize_function_for_speed_p (cfun)) >>> rs6000_isa_flags |= OPTION_MASK_SAVE_TOC_INDIRECT; >>> >>> Initially, rs6000 initialize target_option_default_node with >>> OPTION_MASK_SAVE_TOC_INDIRECT unset, at that time cfun is NULL >>> and optimize_size is true. >>> >>> Later, when c parser handling function foo, it builds target >>> option node as target_option_default_node in function >>> handle_optimize_attribute, it does global option saving and >>> verifying there as well, at that time the cfun is NULL, no >>> issue is found. And function store_parm_decls allocates >>> struct_function for foo then, cfun becomes function struct >>> for foo, when c parser continues to handle the decl bar in >>> foo, function handle_optimize_attribute works as before, >>> tries to restore the target options at the end, it calls >>> targetm.target_option.restore (rs6000_function_specific_restore) >>> which calls function rs6000_option_override_internal again, >>> at this time the cfun is not NULL while there is no cgraph >>> node for its decl, optimize_function_for_speed_p returns true >>> and gets the OPTION_MASK_SAVE_TOC_INDIRECT bit of flag >>> rs6000_isa_flags set unexpectedly. It becomes inconsistent >>> as the one saved previously. >>> >>> IMHO, both contexts of global and function decl foo here hold >>> optimize_size, function optimize_function_for_speed_p should >>> not return true anyway. >>> >>> btw, the aarch64 failed case also gets the unexpected >>> result for optimize_function_for_speed_p during c parsing >>> (fold_range_test <- ... <- c_parser_condition). >>> >>> IIUC, in parsing time we don't have the profile information >>> available. >>> >> >> Hi Honza, >> >> Does the above explanation sound reasonable to you? >> > Hi Honza, Gentle ping again ... BR, Kewen
on 2022/8/29 14:35, Kewen.Lin via Gcc-patches wrote: > on 2022/8/15 16:33, Kewen.Lin via Gcc-patches wrote: >> on 2022/7/11 11:42, Kewen.Lin wrote: >>> on 2022/6/15 14:20, Kewen.Lin wrote: >>>> Hi Honza, >>>> >>>> Thanks for the comments! Some replies are inlined below. >>>> >>>> on 2022/6/14 19:37, Jan Hubicka wrote: >>>>>> Hi, >>>>>> >>>>>> Function optimize_function_for_size_p returns OPTIMIZE_SIZE_NO >>>>>> if func->decl is not null but no cgraph node is available for it. >>>>>> As PR105818 shows, this could give unexpected result. For the >>>>>> case in PR105818, when parsing bar decl in function foo, the cfun >>>>>> is a function structure for foo, for which there is none cgraph >>>>>> node, so it returns OPTIMIZE_SIZE_NO. But it's incorrect since >>>>>> the context is to optimize for size, the flag optimize_size is >>>>>> true. >>>>>> >>>>>> The patch is to make optimize_function_for_size_p to check >>>>>> optimize_size as what it does when func->decl is unavailable. >>>>>> >>>>>> One regression failure got exposed on aarch64-linux-gnu: >>>>>> >>>>>> PASS->FAIL: gcc.dg/guality/pr54693-2.c -Os \ >>>>>> -DPREVENT_OPTIMIZATION line 21 x == 10 - i >>>>>> >>>>>> The difference comes from the macro LOGICAL_OP_NON_SHORT_CIRCUIT >>>>>> used in function fold_range_test during c parsing, it uses >>>>>> optimize_function_for_speed_p which is equal to the invertion >>>>>> of optimize_function_for_size_p. At that time cfun->decl is valid >>>>>> but no cgraph node for it, w/o this patch function >>>>>> optimize_function_for_speed_p returns true eventually, while it >>>>>> returns false with this patch. Since the command line option -Os >>>>>> is specified, there is no reason to interpret it as "for speed". >>>>>> I think this failure is expected and adjust the test case >>>>>> accordingly. >>>>>> >>>>>> Is it ok for trunk? >>>>>> >>>>>> BR, >>>>>> Kewen >>>>>> ----- >>>>>> >>>>>> PR target/105818 >>>>>> >>>>>> gcc/ChangeLog: >>>>>> >>>>>> * predict.cc (optimize_function_for_size_p): Check optimize_size when >>>>>> func->decl is valid but its cgraph node is unavailable. >>>>>> >>>>>> gcc/testsuite/ChangeLog: >>>>>> >>>>>> * gcc.target/powerpc/pr105818.c: New test. >>>>>> * gcc.dg/guality/pr54693-2.c: Adjust for aarch64. >>>>>> --- >>>>>> gcc/predict.cc | 2 +- >>>>>> gcc/testsuite/gcc.dg/guality/pr54693-2.c | 2 +- >>>>>> gcc/testsuite/gcc.target/powerpc/pr105818.c | 9 +++++++++ >>>>>> 3 files changed, 11 insertions(+), 2 deletions(-) >>>>>> create mode 100644 gcc/testsuite/gcc.target/powerpc/pr105818.c >>>>>> >>>>>> diff --git a/gcc/predict.cc b/gcc/predict.cc >>>>>> index 5734e4c8516..6c60a973236 100644 >>>>>> --- a/gcc/predict.cc >>>>>> +++ b/gcc/predict.cc >>>>>> @@ -268,7 +268,7 @@ optimize_function_for_size_p (struct function *fun) >>>>>> cgraph_node *n = cgraph_node::get (fun->decl); >>>>>> if (n) >>>>>> return n->optimize_for_size_p (); >>>>>> - return OPTIMIZE_SIZE_NO; >>>>>> + return optimize_size ? OPTIMIZE_SIZE_MAX : OPTIMIZE_SIZE_NO; >>>>> >>>>> We could also do (opt_for_fn (cfun->decl, optimize_size) that is >>>>> probably better since one can change optimize_size with optimization >>>>> attribute. >>>> >>>> Good point, agree! >>>> >>>>> However I think in most cases we check for optimize_size early I think >>>>> we are doing something wrong, since at that time htere is no profile >>>>> available. Why exactly PR105818 hits the flag change issue? >>>> >>>> For PR105818, the reason why the flag changs is that: >>>> >>>> Firstly, the inconsistent flag is OPTION_MASK_SAVE_TOC_INDIRECT bit >>>> of rs6000_isa_flags_explicit, it's set as below: >>>> >>>> /* If we can shrink-wrap the TOC register save separately, then use >>>> -msave-toc-indirect unless explicitly disabled. */ >>>> if ((rs6000_isa_flags_explicit & OPTION_MASK_SAVE_TOC_INDIRECT) == 0 >>>> && flag_shrink_wrap_separate >>>> && optimize_function_for_speed_p (cfun)) >>>> rs6000_isa_flags |= OPTION_MASK_SAVE_TOC_INDIRECT; >>>> >>>> Initially, rs6000 initialize target_option_default_node with >>>> OPTION_MASK_SAVE_TOC_INDIRECT unset, at that time cfun is NULL >>>> and optimize_size is true. >>>> >>>> Later, when c parser handling function foo, it builds target >>>> option node as target_option_default_node in function >>>> handle_optimize_attribute, it does global option saving and >>>> verifying there as well, at that time the cfun is NULL, no >>>> issue is found. And function store_parm_decls allocates >>>> struct_function for foo then, cfun becomes function struct >>>> for foo, when c parser continues to handle the decl bar in >>>> foo, function handle_optimize_attribute works as before, >>>> tries to restore the target options at the end, it calls >>>> targetm.target_option.restore (rs6000_function_specific_restore) >>>> which calls function rs6000_option_override_internal again, >>>> at this time the cfun is not NULL while there is no cgraph >>>> node for its decl, optimize_function_for_speed_p returns true >>>> and gets the OPTION_MASK_SAVE_TOC_INDIRECT bit of flag >>>> rs6000_isa_flags set unexpectedly. It becomes inconsistent >>>> as the one saved previously. >>>> >>>> IMHO, both contexts of global and function decl foo here hold >>>> optimize_size, function optimize_function_for_speed_p should >>>> not return true anyway. >>>> >>>> btw, the aarch64 failed case also gets the unexpected >>>> result for optimize_function_for_speed_p during c parsing >>>> (fold_range_test <- ... <- c_parser_condition). >>>> >>>> IIUC, in parsing time we don't have the profile information >>>> available. >>>> >>> >>> Hi Honza, >>> >>> Does the above explanation sound reasonable to you? >>> Hi Honza, Gentle ping^4 ... BR, Kewen
diff --git a/gcc/predict.cc b/gcc/predict.cc index 5734e4c8516..6c60a973236 100644 --- a/gcc/predict.cc +++ b/gcc/predict.cc @@ -268,7 +268,7 @@ optimize_function_for_size_p (struct function *fun) cgraph_node *n = cgraph_node::get (fun->decl); if (n) return n->optimize_for_size_p (); - return OPTIMIZE_SIZE_NO; + return optimize_size ? OPTIMIZE_SIZE_MAX : OPTIMIZE_SIZE_NO; } /* Return true if function FUN should always be optimized for speed. */ diff --git a/gcc/testsuite/gcc.dg/guality/pr54693-2.c b/gcc/testsuite/gcc.dg/guality/pr54693-2.c index 68aa6c63d71..14ca94ad37d 100644 --- a/gcc/testsuite/gcc.dg/guality/pr54693-2.c +++ b/gcc/testsuite/gcc.dg/guality/pr54693-2.c @@ -17,7 +17,7 @@ foo (int x, int y, int z) int i = 0; while (x > 3 && y > 3 && z > 3) { /* { dg-final { gdb-test .+2 "i" "v + 1" } } */ - /* { dg-final { gdb-test .+1 "x" "10 - i" } } */ + /* { dg-final { gdb-test .+1 "x" "10 - i" { xfail { aarch64*-*-* && { any-opts "-Os" } } } } } */ bar (i); /* { dg-final { gdb-test . "y" "20 - 2 * i" } } */ /* { dg-final { gdb-test .-1 "z" "30 - 3 * i" { xfail { aarch64*-*-* && { any-opts "-fno-fat-lto-objects" "-Os" } } } } } */ i++, x--, y -= 2, z -= 3; diff --git a/gcc/testsuite/gcc.target/powerpc/pr105818.c b/gcc/testsuite/gcc.target/powerpc/pr105818.c new file mode 100644 index 00000000000..18781edbf9e --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr105818.c @@ -0,0 +1,9 @@ +/* { dg-options "-Os -fno-tree-vectorize" } */ + +#pragma GCC optimize "-fno-tree-vectorize" + +void +foo (void) +{ + void bar (void); +}