Message ID | 20140518205956.GG1828@kam.mff.cuni.cz |
---|---|
State | New |
Headers | show |
On 05/18/2014 02:59 PM, Jan Hubicka wrote: > Sandra, >> This patch seems quite similar in purpose to the >> remove_local_statics optimization that Mentor has proposed, although >> the implementation is quite different. Here is the last version of >> our patch, prepared by Bernd Schmidt last year: >> >> https://gcc.gnu.org/ml/gcc-patches/2013-06/msg00317.html > > Thanks for pointer, I did not notice this patch! > The approach is indeed very different. So the patch works on function basis > and cares only about local statics of functions that was not inlined? Yes. I should probably mention here that we did the analysis and initial implementation of this optimization 7+ years ago against GCC 4.2, and in some cases we were being conservative in deciding the optimization was not valid because the information required for more detailed analysis wasn't being collected in the right place back then, etc. >> The failing tests are remove-local-statics-{4,5,7,12,14b}.c. > > +/* Verify that we don't eliminate a global static variable. */ > + > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > +/* { dg-final { scan-assembler "global_static" } } */ > + > +static int global_static; > + > +int > +test1 (int x) > +{ > + global_static = x; > + > + return global_static + x; > +} > > here test1 optimizes into > > global_static=x; > return x+x; > > this makes global_static write only and thus it is correctly eliminated. > So I think this testcase is bogus. Yes, I agree that this one was for a restriction of our implementation approach. > +++ b/gcc/testsuite/gcc.dg/remove-local-statics-5.c > @@ -0,0 +1,24 @@ > +/* Verify that we do not eliminate a static local variable whose uses > + are dominated by a def when the function calls setjmp. */ > + > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > +/* { dg-final { scan-assembler "thestatic" } } */ > + > +#include <setjmp.h> > + > +int > +foo (int x) > +{ > + static int thestatic; > + int retval; > + jmp_buf env; > + > + thestatic = x; > + > + retval = thestatic + x; > + > + setjmp (env); > + > + return retval; > +} > > I belive this is similar case. I do not see setjmp changing anything here, since > local optimizers turns retval = x+x; > What it was intended to test? Hmmmm, I'm guessing this was some concern about invalid code motion around a setjmp. Our original analysis document lists "F does not call setjmp" as a requirement for the optimization, so this was probably a case where we were being excessively conservative. > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/remove-local-statics-7.c > @@ -0,0 +1,19 @@ > +/* Verify that we eliminate a static local variable where it is defined > + along all paths leading to a use. */ > + > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > +/* { dg-final { scan-assembler-not "thestatic" } } */ > + > +int > +test1 (int x) > +{ > + static int thestatic; > + > + if (x < 0) > + thestatic = x; > + else > + thestatic = -x; > + > + return thestatic + x; > +} > > Here we get after early optimizations: > > int > test1 (int x) > { > static int thestatic; > int thestatic.0_5; > int thestatic.1_7; > int _8; > > <bb 2>: > if (x_2(D) < 0) > goto <bb 3>; > else > goto <bb 4>; > > <bb 3>: > thestatic = x_2(D); > goto <bb 5>; > > <bb 4>: > thestatic.0_5 = -x_2(D); > thestatic = thestatic.0_5; > > <bb 5>: > thestatic.1_7 = thestatic; > _8 = thestatic.1_7 + x_2(D); > return _8; > > } > > and thus we still have bogus read from thestatic. Because my analysis works at IPA level, > we won't benefit from fact that dom2 eventually cleans it up as: > int > test1 (int x) > { > static int thestatic; > int thestatic.0_5; > int thestatic.1_7; > int _8; > int prephitmp_10; > > <bb 2>: > if (x_2(D) < 0) > goto <bb 3>; > else > goto <bb 4>; > > <bb 3>: > thestatic = x_2(D); > goto <bb 5>; > > <bb 4>: > thestatic.0_5 = -x_2(D); > thestatic = thestatic.0_5; > > <bb 5>: > # prephitmp_10 = PHI <x_2(D)(3), thestatic.0_5(4)> > thestatic.1_7 = prephitmp_10; > _8 = thestatic.1_7 + x_2(D); > return _8; > > } > > Richi, is there a way to teach early FRE to get this transformation? > I see it is a partial redundancy problem... Hmmmm, bummer that we don't get this one for free. :-( > +/* Verify that we do not eliminate a static variable when it is declared > + in a function that has nested functions. */ > + > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > +/* { dg-final { scan-assembler "thestatic" } } */ > + > +int test1 (int x) > +{ > + static int thestatic; > + > + int nested_test1 (int x) > + { > + return x + thestatic; > + } > + > + thestatic = x; > + > + return thestatic + x + nested_test1 (x); > +} > > Here we work hard enough to optimize test1 as: > int > test1 (int x) > { > static int thestatic; > int _4; > int _5; > > <bb 2>: > thestatic = x_2(D); > _4 = x_2(D) + x_2(D); > _5 = _4 + _4; > return _5; > > } > > thus inlining nested_test1 during early optimization. This makes the removal valid. Yes. This is one we had to punt on due to the one-function-at-a-time approach. > +/* Verify that we do not eliminate a static local variable if the function > + containing it is inlined. */ > + > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > +/* { dg-final { scan-assembler "thestatic" } } */ > + > +int > +test2 (int x) > +{ > + if (x < 0) > + return 0; > + else > + return test1 (x - 1); > +} > + > +inline int > +test1 (int x) > +{ > + static int thestatic; > + int y; > + > + thestatic = x; > + > + y = test2 (thestatic - 1); > + > + return y + x; > +} > > Here thestatic becomes write only during early optimization, so again we can correctly eliminate it. OK. > Sandra, > do you think you can drop the testcases that are not valid and commit the valid one minus > remove-local-statics-7.c for which we can fill in enhancement request? OK. Keep the original numbering or re-number them to fill up the holes left by the deletions? > For cases like local-statics-7 your approach can be "saved" by adding simple IPA analysis > to look for static vars that are used only by one function and keeping your DSE code active > for them, so we can still get rid of this special case assignments during late compilation. > I am however not quite convinced it is worth the effort - do you have some real world > cases where it helps? Um, the well-known benchmark. ;-) > I am rather thinking about cutting the passmanager queue once again after main > tree optimization and re-running IPA unreachable code removal after them. This > should help with rather common cases where we optimize out code as effect > of inlining. > > This would basically mean running pass_all_optimizations from late IPA pass > and scheduling one extra fixup_cfg and perhaps DCE pass at begginig of > pass_all_optimizations. > > Honza > -Sandra
> > Hmmmm, I'm guessing this was some concern about invalid code motion > around a setjmp. Our original analysis document lists "F does not > call setjmp" as a requirement for the optimization, so this was > probably a case where we were being excessively conservative. I suppose it was because you needed to prove that the value stored in static variable is dead at the function return and setjmp may let you to jump from somewhere else. This should transparently work with mainline approach. > > > >Richi, is there a way to teach early FRE to get this transformation? > >I see it is a partial redundancy problem... > > Hmmmm, bummer that we don't get this one for free. :-( Yeah, lets not forget about this one. On the plus side we got quite few cases your code didn't ;) > >Sandra, > >do you think you can drop the testcases that are not valid and commit the valid one minus > >remove-local-statics-7.c for which we can fill in enhancement request? > > OK. Keep the original numbering or re-number them to fill up the > holes left by the deletions? Since the original testcases never hit mainline, I would preffer them to be renumbered ;) > > >For cases like local-statics-7 your approach can be "saved" by adding simple IPA analysis > >to look for static vars that are used only by one function and keeping your DSE code active > >for them, so we can still get rid of this special case assignments during late compilation. > >I am however not quite convinced it is worth the effort - do you have some real world > >cases where it helps? > > Um, the well-known benchmark. ;-) Very informative, does my implementation handle it well? ;) I suppose for benchmarks using static where they should not, the analysis that static is used only in one function and pass to turn it into automatic variable would still make sense. The approach removing write only variables and relying on FRE to completely clean up is bit fragile by requiring very complex machinery to work perfectly... Honza > > >I am rather thinking about cutting the passmanager queue once again after main > >tree optimization and re-running IPA unreachable code removal after them. This > >should help with rather common cases where we optimize out code as effect > >of inlining. > > > >This would basically mean running pass_all_optimizations from late IPA pass > >and scheduling one extra fixup_cfg and perhaps DCE pass at begginig of > >pass_all_optimizations. > > > >Honza > > > > -Sandra >
On Sun, May 18, 2014 at 10:59 PM, Jan Hubicka <hubicka@ucw.cz> wrote: > Sandra, >> This patch seems quite similar in purpose to the >> remove_local_statics optimization that Mentor has proposed, although >> the implementation is quite different. Here is the last version of >> our patch, prepared by Bernd Schmidt last year: >> >> https://gcc.gnu.org/ml/gcc-patches/2013-06/msg00317.html > > Thanks for pointer, I did not notice this patch! > The approach is indeed very different. So the patch works on function basis > and cares only about local statics of functions that was not inlined? >> >> I think we can drop our patch from our local tree now, but it >> includes a large number of test cases which I think are worth >> keeping on mainline. A few of them fail with your implementation, >> though -- which might be genuine bugs, or just different limitations >> of the two approaches. Can you take a look? >> >> The failing tests are remove-local-statics-{4,5,7,12,14b}.c. > > +/* Verify that we don't eliminate a global static variable. */ > + > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > +/* { dg-final { scan-assembler "global_static" } } */ > + > +static int global_static; > + > +int > +test1 (int x) > +{ > + global_static = x; > + > + return global_static + x; > +} > > here test1 optimizes into > > global_static=x; > return x+x; > > this makes global_static write only and thus it is correctly eliminated. > So I think this testcase is bogus. > > +++ b/gcc/testsuite/gcc.dg/remove-local-statics-5.c > @@ -0,0 +1,24 @@ > +/* Verify that we do not eliminate a static local variable whose uses > + are dominated by a def when the function calls setjmp. */ > + > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > +/* { dg-final { scan-assembler "thestatic" } } */ > + > +#include <setjmp.h> > + > +int > +foo (int x) > +{ > + static int thestatic; > + int retval; > + jmp_buf env; > + > + thestatic = x; > + > + retval = thestatic + x; > + > + setjmp (env); > + > + return retval; > +} > > I belive this is similar case. I do not see setjmp changing anything here, since > local optimizers turns retval = x+x; > What it was intended to test? > > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/remove-local-statics-7.c > @@ -0,0 +1,19 @@ > +/* Verify that we eliminate a static local variable where it is defined > + along all paths leading to a use. */ > + > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > +/* { dg-final { scan-assembler-not "thestatic" } } */ > + > +int > +test1 (int x) > +{ > + static int thestatic; > + > + if (x < 0) > + thestatic = x; > + else > + thestatic = -x; > + > + return thestatic + x; > +} > > Here we get after early optimizations: > > int > test1 (int x) > { > static int thestatic; > int thestatic.0_5; > int thestatic.1_7; > int _8; > > <bb 2>: > if (x_2(D) < 0) > goto <bb 3>; > else > goto <bb 4>; > > <bb 3>: > thestatic = x_2(D); > goto <bb 5>; > > <bb 4>: > thestatic.0_5 = -x_2(D); > thestatic = thestatic.0_5; > > <bb 5>: > thestatic.1_7 = thestatic; > _8 = thestatic.1_7 + x_2(D); > return _8; > > } > > and thus we still have bogus read from thestatic. Because my analysis works at IPA level, > we won't benefit from fact that dom2 eventually cleans it up as: > int > test1 (int x) > { > static int thestatic; > int thestatic.0_5; > int thestatic.1_7; > int _8; > int prephitmp_10; > > <bb 2>: > if (x_2(D) < 0) > goto <bb 3>; > else > goto <bb 4>; > > <bb 3>: > thestatic = x_2(D); > goto <bb 5>; > > <bb 4>: > thestatic.0_5 = -x_2(D); > thestatic = thestatic.0_5; > > <bb 5>: > # prephitmp_10 = PHI <x_2(D)(3), thestatic.0_5(4)> > thestatic.1_7 = prephitmp_10; > _8 = thestatic.1_7 + x_2(D); > return _8; > > } > > Richi, is there a way to teach early FRE to get this transformation? > I see it is a partial redundancy problem... Yeah, nothing FRE can do about (needs insert of a PHI node). Eventually for this particular case running phiopt early would solve it. > +/* Verify that we do not eliminate a static variable when it is declared > + in a function that has nested functions. */ > + > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > +/* { dg-final { scan-assembler "thestatic" } } */ > + > +int test1 (int x) > +{ > + static int thestatic; > + > + int nested_test1 (int x) > + { > + return x + thestatic; > + } > + > + thestatic = x; > + > + return thestatic + x + nested_test1 (x); > +} > > Here we work hard enough to optimize test1 as: > int > test1 (int x) > { > static int thestatic; > int _4; > int _5; > > <bb 2>: > thestatic = x_2(D); > _4 = x_2(D) + x_2(D); > _5 = _4 + _4; > return _5; > > } > > thus inlining nested_test1 during early optimization. This makes the removal valid. > > +/* Verify that we do not eliminate a static local variable if the function > + containing it is inlined. */ > + > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > +/* { dg-final { scan-assembler "thestatic" } } */ > + > +int > +test2 (int x) > +{ > + if (x < 0) > + return 0; > + else > + return test1 (x - 1); > +} > + > +inline int > +test1 (int x) > +{ > + static int thestatic; > + int y; > + > + thestatic = x; > + > + y = test2 (thestatic - 1); > + > + return y + x; > +} > > Here thestatic becomes write only during early optimization, so again we can correctly eliminate it. > > Sandra, > do you think you can drop the testcases that are not valid and commit the valid one minus > remove-local-statics-7.c for which we can fill in enhancement request? > > For cases like local-statics-7 your approach can be "saved" by adding simple IPA analysis > to look for static vars that are used only by one function and keeping your DSE code active > for them, so we can still get rid of this special case assignments during late compilation. > I am however not quite convinced it is worth the effort - do you have some real world > cases where it helps? > > I am rather thinking about cutting the passmanager queue once again after main > tree optimization and re-running IPA unreachable code removal after them. This > should help with rather common cases where we optimize out code as effect > of inlining. > > This would basically mean running pass_all_optimizations from late IPA pass > and scheduling one extra fixup_cfg and perhaps DCE pass at begginig of > pass_all_optimizations. > > Honza
On Sun, 18 May 2014, Sandra Loosemore wrote: > Hmmmm, I'm guessing this was some concern about invalid code motion around a > setjmp. Our original analysis document lists "F does not call setjmp" as a > requirement for the optimization, so this was probably a case where we were > being excessively conservative. Reading my internal analysis from 24 Nov 2005, it was in terms of converting a function-local static variable to automatic. The concern may well have been about "the values of objects of automatic storage duration that are local to the function containing the invocation of the corresponding setjmp macro that do not have volatile-qualified type and have been changed between the setjmp invocation and longjmp call are indeterminate" meaning that program that's valid when the variable is static can become invalid on converting it to automatic. Of course that depends on the particular uses of the variable (whether it's possible for it to be changed between the two calls); I think we can presume the existing optimizations (that deal with everything except the final store) already deal with keeping the transformations valid in the presence of setjmp / longjmp.
On 05/18/2014 08:45 PM, Sandra Loosemore wrote: > On 05/18/2014 02:59 PM, Jan Hubicka wrote: >> For cases like local-statics-7 your approach can be "saved" by adding >> simple IPA analysis >> to look for static vars that are used only by one function and keeping >> your DSE code active >> for them, so we can still get rid of this special case assignments >> during late compilation. >> I am however not quite convinced it is worth the effort - do you have >> some real world >> cases where it helps? > > Um, the well-known benchmark. ;-) I looked at the generated code for this benchmark and see that your patch is indeed not getting rid of all the pointless static variables, while ours is, so this is quite disappointing. I'm thinking now that we will still need to retain our patch at least locally, although we'd really like to get it on trunk if possible. Here is another testcase vaguely based on the same kind of signal-processing algorithm as the benchmark, but coded without reference to it. I have arm-none-eabi builds around for both 4.9.0 with our remove_local_statics patch applied, and trunk with your patch. With -O2, our optimization produces 23 instructions and gets rid of all 3 statics, yours produces 33 and only gets rid of 1 of them. Of course it's lame to use static variables in this way, but OTOH it's lame if GCC can't recognize them and optimize them away, too. -Sandra
> On 05/18/2014 08:45 PM, Sandra Loosemore wrote: > >On 05/18/2014 02:59 PM, Jan Hubicka wrote: > >>For cases like local-statics-7 your approach can be "saved" by adding > >>simple IPA analysis > >>to look for static vars that are used only by one function and keeping > >>your DSE code active > >>for them, so we can still get rid of this special case assignments > >>during late compilation. > >>I am however not quite convinced it is worth the effort - do you have > >>some real world > >>cases where it helps? > > > >Um, the well-known benchmark. ;-) > > I looked at the generated code for this benchmark and see that your > patch is indeed not getting rid of all the pointless static > variables, while ours is, so this is quite disappointing. I'm > thinking now that we will still need to retain our patch at least > locally, although we'd really like to get it on trunk if possible. Yours patch can really be improved by adding simple IPA analysis to work out what variables are refernced by one function only instead of using not-longer-that-relevant local static info. As last IPA pass for each static variable with no address taken, look at all references and see if they all come from one function or functions inlined to it. > > Here is another testcase vaguely based on the same kind of > signal-processing algorithm as the benchmark, but coded without > reference to it. I have arm-none-eabi builds around for both 4.9.0 > with our remove_local_statics patch applied, and trunk with your > patch. With -O2, our optimization produces 23 instructions and gets > rid of all 3 statics, yours produces 33 and only gets rid of 1 of > them. > > Of course it's lame to use static variables in this way, but OTOH > it's lame if GCC can't recognize them and optimize them away, too. > > -Sandra > > void > fir (int *coeffs, int coeff_length, int *input, int input_length, int *output) > { > static int *coeffp; > static int *inputp; > static int *outputp; Here inputp read is rmeoved only at 101.dceloop1 pass. That is really late. coeffp is removed late, too. > int i, c, acc; > > for (i = 0; i < input_length; i++) > { > coeffp = coeffs; > inputp = input + i; > outputp = output + i; > acc = 0; > for (c = 0; c < coeff_length; c++) > { > acc += *coeffp * *inputp; > coeffp++; > inputp--; > } It seems like AA can not work out the fact that inputp is unchanged until that late. Richi, any ideas? Honza > *outputp = acc; > } > } > >
--- /dev/null +++ b/gcc/testsuite/gcc.dg/remove-local-statics-7.c @@ -0,0 +1,19 @@ +/* Verify that we eliminate a static local variable where it is defined + along all paths leading to a use. */ + +/* { dg-do compile } */ +/* { dg-options "-O2" } */ +/* { dg-final { scan-assembler-not "thestatic" } } */ + +int +test1 (int x) +{ + static int thestatic; + + if (x < 0) + thestatic = x; + else + thestatic = -x; + + return thestatic + x; +}