Patchwork [rtl-optimization] : Fix PR 45223, RTL PRE GCSE pass hoists trapping insn out of loop

login
register
mail settings
Submitter Uros Bizjak
Date Aug. 7, 2010, 3:25 p.m.
Message ID <AANLkTin6ho5bxdOWBTU59mrj5Y_QRqOYea60ZqFJR=Bk@mail.gmail.com>
Download mbox | patch
Permalink /patch/61181/
State New
Headers show

Comments

Uros Bizjak - Aug. 7, 2010, 3:25 p.m.
Hello!

Attached patch fixes wrong hoisting of trapping insns out from the
loop.  As commented in the original PR [1], the problem is in PRE GCSE
pass that does not look if the moved instruction can trap.  The
original problem reliably triggers on targets, where modulo is a
single HW insn (as in case of moxie-elf target), but the problem can
also be shown on x86_64-pc-linux-gnu with slightly modified testcase
[2].

2010-08-07  Uros Bizjak  <ubizjak@gmail.com>

	PR rtl-optimization/45223
	* gcse.c (compute_hash_table_work): Skip insns that may trap when
	building hash table.

The patch was bootstrapped and regression tested on
x86_64-pc-linux-gnu {,-m32} on gcc-4_5 branch. I have also checked
that patched gcc produces correct asm for moxie and for modified FP
case. In the modified FP case, --ffast-math still hoists FP division
out of the loop (allowed by may_trap_p predicate).

H.J. will check the performance effect of the patch on the SPEC, but
--ffast-math compilation seems to be unaffected.

OK for mainline and release branches?

[1] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38819#c16
[2] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45223#c2

Uros.
H.J. Lu - Aug. 10, 2010, 4:11 p.m.
On Sat, Aug 7, 2010 at 8:25 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> Hello!
>
> Attached patch fixes wrong hoisting of trapping insns out from the
> loop.  As commented in the original PR [1], the problem is in PRE GCSE
> pass that does not look if the moved instruction can trap.  The
> original problem reliably triggers on targets, where modulo is a
> single HW insn (as in case of moxie-elf target), but the problem can
> also be shown on x86_64-pc-linux-gnu with slightly modified testcase
> [2].
>
> 2010-08-07  Uros Bizjak  <ubizjak@gmail.com>
>
>        PR rtl-optimization/45223
>        * gcse.c (compute_hash_table_work): Skip insns that may trap when
>        building hash table.
>
> The patch was bootstrapped and regression tested on
> x86_64-pc-linux-gnu {,-m32} on gcc-4_5 branch. I have also checked
> that patched gcc produces correct asm for moxie and for modified FP
> case. In the modified FP case, --ffast-math still hoists FP division
> out of the loop (allowed by may_trap_p predicate).
>
> H.J. will check the performance effect of the patch on the SPEC, but
> --ffast-math compilation seems to be unaffected.
>

Here are the ration of before and after on Intel Core i7. Gzip slowed
down by 10 to 20%.

32bit: -O2:

164.gzip 			 -23.8895%
175.vpr 			 -0.0407%
176.gcc 			 0.362131%
181.mcf 			 -2.74021%
186.crafty 			 0.0768344%
197.parser 			 -0.104384%
252.eon 			 -0.0320718%
253.perlbmk 			 -0.910384%
254.gap 			 0.0637146%
255.vortex 			 -0.0968679%
256.bzip2 			 -4.41989%
300.twolf 			 1.3125%
SPECint_base2000 			 -2.79381%

168.wupwise 			 -4.88964%
171.swim 			 -1.05833%
172.mgrid 			 0.0604595%
173.applu 			 -0.326158%
177.mesa 			 0.479386%
178.galgel 			 0.076266%
179.art 			 0.0865333%
183.equake 			 -0.0255297%
187.facerec 			 -0.223364%
188.ammp 			 0.237047%
189.lucas 			 -0.719759%
191.fma3d 			 -0.297354%
200.sixtrack 			 0.0784314%
301.apsi 			 -1.1%
SPECfp_base2000 			 -0.552806%

400.perlbench 			 -0.763359%
401.bzip2 			 -0.641026%
403.gcc 			 0.823045%
429.mcf 			 0%
445.gobmk 			 0.45045%
456.hmmer 			 -0.578035%
458.sjeng 			 -1.82648%
462.libquantum 			 0%
464.h264ref 			 0%
471.omnetpp 			 2.06186%
473.astar 			 1.38889%
483.xalancbmk 			 0.787402%
SPECint(R)_base2006 			 0%

410.bwaves 			 0.35461%
416.gamess 			 0%
433.milc 			 4.14508%
434.zeusmp 			 3.80952%
435.gromacs 			 0.649351%
436.cactusADM 			 4.54545%
437.leslie3d 			 3.1746%
444.namd 			 0.606061%
447.dealII 			 0%
450.soplex 			 2.77778%
453.povray 			 -0.456621%
454.calculix 			 -0.3663%
459.GemsFDTD 			 0.52356%
465.tonto 			 1.27389%
470.lbm 			 -0.227273%
481.wrf 			 0%
482.sphinx3 			 0.331126%
SPECfp(R)_base2006 			 1.52284%

32bit: -O3

164.gzip 			 -8.45154%
175.vpr 			 0.401768%
176.gcc 			 0.0254065%
181.mcf 			 2.13227%
186.crafty 			 -0.388048%
197.parser 			 0.179856%
252.eon 			 0.632911%
253.perlbmk 			 -0.721587%
254.gap 			 -0.326264%
255.vortex 			 -0.25102%
256.bzip2 			 -0.317749%
300.twolf 			 0.217189%
SPECint_base2000 			 -0.606289%

168.wupwise 			 0.720339%
171.swim 			 -1.60307%
172.mgrid 			 0.131637%
173.applu 			 -0.825627%
177.mesa 			 -0.0324992%
178.galgel 			 0.147232%
179.art 			 0.234982%
183.equake 			 0.167609%
187.facerec 			 -0.624721%
188.ammp 			 -0.0332005%
189.lucas 			 -0.908033%
191.fma3d 			 0.572738%
200.sixtrack 			 0.384615%
301.apsi 			 -0.258065%
SPECfp_base2000 			 -0.139548%

400.perlbench 			 1.13636%
401.bzip2 			 0.632911%
403.gcc 			 -1.51515%
429.mcf 			 -0.869565%
445.gobmk 			 -0.913242%
456.hmmer 			 0%
458.sjeng 			 0%
462.libquantum 			 -0.234742%
464.h264ref 			 0%
471.omnetpp 			 0%
473.astar 			 -1.96078%
483.xalancbmk 			 -0.355872%
SPECint(R)_base2006 			 -0.414938%

410.bwaves 			 -0.353357%
416.gamess 			 0%
433.milc 			 -2.27273%
434.zeusmp 			 -2.28311%
435.gromacs 			 0%
436.cactusADM 			 -4.29185%
437.leslie3d 			 -0.4329%
444.namd 			 0%
447.dealII 			 0%
450.soplex 			 -1.0101%
453.povray 			 -0.456621%
454.calculix 			 -0.636943%
459.GemsFDTD 			 0%
465.tonto 			 0%
470.lbm 			 0%
481.wrf 			 0%
482.sphinx3 			 -0.623053%
SPECfp(R)_base2006 			 -0.884956%

64bit: -O2

164.gzip 			 -8.81881%
175.vpr 			 0.121753%
176.gcc 			 -1.31579%
181.mcf 			 1.74008%
186.crafty 			 0.18272%
197.parser 			 0.618238%
252.eon 			 0.82801%
253.perlbmk 			 1.38561%
254.gap 			 -0.563631%
255.vortex 			 -0.640342%
256.bzip2 			 -9.00375%
300.twolf 			 0.442478%
SPECint_base2000 			 -1.31644%

168.wupwise 			 -3.92157%
171.swim 			 -0.508553%
172.mgrid 			 -0.0758438%
173.applu 			 -0.745199%
177.mesa 			 -0.361613%
178.galgel 			 -0.194986%
179.art 			 -0.289017%
183.equake 			 -0.180424%
187.facerec 			 -0.399167%
188.ammp 			 1.63432%
189.lucas 			 -0.172444%
191.fma3d 			 -0.388199%
200.sixtrack 			 0.0789266%
301.apsi 			 -0.175644%
SPECfp_base2000 			 -0.412859%

400.perlbench 			 0.363636%
401.bzip2 			 5.37634%
403.gcc 			 -1.63265%
429.mcf 			 0%
445.gobmk 			 -0.416667%
456.hmmer 			 0%
458.sjeng 			 0%
462.libquantum 			 -0.430108%
464.h264ref 			 -3.00546%
471.omnetpp 			 -0.537634%
473.astar 			 -1.24224%
483.xalancbmk 			 -0.373134%
SPECint(R)_base2006 			 0%

410.bwaves 			 -0.2849%
416.gamess 			 0%
433.milc 			 -3.16742%
434.zeusmp 			 -3.47826%
435.gromacs 			 0%
436.cactusADM 			 -4.29448%
437.leslie3d 			 -0.956938%
444.namd 			 -0.555556%
447.dealII 			 0.3367%
450.soplex 			 -0.986842%
453.povray 			 -0.373134%
454.calculix 			 0%
459.GemsFDTD 			 -0.373134%
465.tonto 			 0%
470.lbm 			 0.394477%
481.wrf 			 -0.561798%
482.sphinx3 			 0.3125%
SPECfp(R)_base2006 			 -0.83682%

64bit: -O3

164.gzip 			 -18.5222%
175.vpr 			 0.404694%
176.gcc 			 0.217569%
181.mcf 			 -1.29206%
186.crafty 			 0.580016%
197.parser 			 0.434153%
252.eon 			 0.40678%
253.perlbmk 			 2.09394%
254.gap 			 0.115307%
255.vortex 			 0.59927%
256.bzip2 			 -7.56539%
300.twolf 			 0.283197%
SPECint_base2000 			 -2.02524%

168.wupwise 			 -1.3712%
171.swim 			 0.0453104%
172.mgrid 			 0.182482%
173.applu 			 -1.29227%
177.mesa 			 0.24187%
178.galgel 			 -0.531777%
179.art 			 -0.210128%
183.equake 			 -0.86905%
187.facerec 			 0.356295%
188.ammp 			 -0.420032%
189.lucas 			 0.25%
191.fma3d 			 -1.63522%
200.sixtrack 			 0.228137%
301.apsi 			 0.387382%
SPECfp_base2000 			 -0.332982%

400.perlbench 			 1.87266%
401.bzip2 			 0%
403.gcc 			 -0.4%
429.mcf 			 -1.2%
445.gobmk 			 -0.851064%
456.hmmer 			 0.420168%
458.sjeng 			 0%
462.libquantum 			 -0.365631%
464.h264ref 			 0.271003%
471.omnetpp 			 0.531915%
473.astar 			 -2.92398%
483.xalancbmk 			 0.357143%
SPECint(R)_base2006 			 0%

410.bwaves 			 -0.57971%
416.gamess 			 -0.404858%
433.milc 			 -2.21239%
434.zeusmp 			 -3.30579%
435.gromacs 			 0.47619%
436.cactusADM 			 -2.38095%
437.leslie3d 			 0%
444.namd 			 -0.555556%
447.dealII 			 0%
450.soplex 			 -1.28205%
453.povray 			 -0.392157%
454.calculix 			 -0.462963%
459.GemsFDTD 			 -0.369004%
465.tonto 			 0.458716%
470.lbm 			 -0.19685%
481.wrf 			 0%
482.sphinx3 			 0.837989%
SPECfp(R)_base2006 			 -0.37594%
Uros Bizjak - Aug. 12, 2010, 10:01 a.m.
On Tue, Aug 10, 2010 at 6:11 PM, H.J. Lu <hjl.tools@gmail.com> wrote:

>> Attached patch fixes wrong hoisting of trapping insns out from the
>> loop.  As commented in the original PR [1], the problem is in PRE GCSE
>> pass that does not look if the moved instruction can trap.  The
>> original problem reliably triggers on targets, where modulo is a
>> single HW insn (as in case of moxie-elf target), but the problem can
>> also be shown on x86_64-pc-linux-gnu with slightly modified testcase
>> [2].
>>
>> 2010-08-07  Uros Bizjak  <ubizjak@gmail.com>
>>
>>        PR rtl-optimization/45223
>>        * gcse.c (compute_hash_table_work): Skip insns that may trap when
>>        building hash table.
>>
>> The patch was bootstrapped and regression tested on
>> x86_64-pc-linux-gnu {,-m32} on gcc-4_5 branch. I have also checked
>> that patched gcc produces correct asm for moxie and for modified FP
>> case. In the modified FP case, --ffast-math still hoists FP division
>> out of the loop (allowed by may_trap_p predicate).
>>
>> H.J. will check the performance effect of the patch on the SPEC, but
>> --ffast-math compilation seems to be unaffected.
>>
>
> Here are the ration of before and after on Intel Core i7. Gzip slowed
> down by 10 to 20%.
>
> 32bit: -O2:
>
> 164.gzip                         -23.8895%

Ouch.

So, patch retraced, see also comment in [1]:

"The problem is that this is hard to fix without pessimizing the common case."

Please also note, that due to your results, it looks that RTL
optimizer optimizes gzip in the wrong way. Tree optimizer left the
division inside the loop, but RTL optimizer hoisted the division out.
Probably the right thing to do, but this can't be proved (otherwise
tree optimizer would hoist the division).

There is nothing that can be done here. If we fix RTL optimizer, there
will be severe degradation in integer code. FP is OK with
--ffast-math.

I have added some CCs that would eventually be interested in this issue.

[1]: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38819#c20
Richard Guenther - Aug. 12, 2010, 10:34 a.m.
On Thu, Aug 12, 2010 at 12:01 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Tue, Aug 10, 2010 at 6:11 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>
>>> Attached patch fixes wrong hoisting of trapping insns out from the
>>> loop.  As commented in the original PR [1], the problem is in PRE GCSE
>>> pass that does not look if the moved instruction can trap.  The
>>> original problem reliably triggers on targets, where modulo is a
>>> single HW insn (as in case of moxie-elf target), but the problem can
>>> also be shown on x86_64-pc-linux-gnu with slightly modified testcase
>>> [2].
>>>
>>> 2010-08-07  Uros Bizjak  <ubizjak@gmail.com>
>>>
>>>        PR rtl-optimization/45223
>>>        * gcse.c (compute_hash_table_work): Skip insns that may trap when
>>>        building hash table.
>>>
>>> The patch was bootstrapped and regression tested on
>>> x86_64-pc-linux-gnu {,-m32} on gcc-4_5 branch. I have also checked
>>> that patched gcc produces correct asm for moxie and for modified FP
>>> case. In the modified FP case, --ffast-math still hoists FP division
>>> out of the loop (allowed by may_trap_p predicate).
>>>
>>> H.J. will check the performance effect of the patch on the SPEC, but
>>> --ffast-math compilation seems to be unaffected.
>>>
>>
>> Here are the ration of before and after on Intel Core i7. Gzip slowed
>> down by 10 to 20%.
>>
>> 32bit: -O2:
>>
>> 164.gzip                         -23.8895%
>
> Ouch.
>
> So, patch retraced, see also comment in [1]:
>
> "The problem is that this is hard to fix without pessimizing the common case."
>
> Please also note, that due to your results, it looks that RTL
> optimizer optimizes gzip in the wrong way. Tree optimizer left the
> division inside the loop, but RTL optimizer hoisted the division out.
> Probably the right thing to do, but this can't be proved (otherwise
> tree optimizer would hoist the division).
>
> There is nothing that can be done here. If we fix RTL optimizer, there
> will be severe degradation in integer code. FP is OK with
> --ffast-math.
>
> I have added some CCs that would eventually be interested in this issue.
>
> [1]: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38819#c20

The fix is to teach LIM to do conditional invariant motion.

Richard.

Patch

Index: gcse.c
===================================================================
--- gcse.c	(revision 162975)
+++ gcse.c	(working copy)
@@ -1693,7 +1693,7 @@  compute_hash_table_work (struct hash_tab
 
       /* The next pass builds the hash table.  */
       FOR_BB_INSNS (current_bb, insn)
-	if (INSN_P (insn))
+	if (INSN_P (insn) && !may_trap_p (PATTERN (insn)))
 	  hash_scan_insn (insn, table);
     }