Message ID | 4FEB1F9C.40004@ispras.ru |
---|---|
State | New |
Headers | show |
On Wed, Jun 27, 2012 at 4:58 PM, Dmitry Melnik <dm@ispras.ru> wrote: > This patch can be applied to current trunk and passes regtest successfully > on qemu-arm. > Maybe it will be good to have it in trunk? > If everybody agrees, we can take care of committing it. If the patch is approved, can you please add a brief comment before the define_split to explain why it's there and what it does? Ciao! Steven
On Wed, 27 Jun 2012 18:58:36 +0400 Dmitry Melnik <dm@ispras.ru> wrote: > This patch can be applied to current trunk and passes regtest > successfully on qemu-arm. > Maybe it will be good to have it in trunk? > If everybody agrees, we can take care of committing it. No objection from me (as the original author), FWIW. Thanks! Julian
On 27/06/12 15:58, Dmitry Melnik wrote: > Hi, > > We'd like to note about CodeSourcery's patch for ARM backend, from which > GCC mainline can gain 4% on SPEC2K INT: > http://cgit.openembedded.org/openembedded/plain/recipes/gcc/gcc-4.5/linaro/gcc-4.5-linaro-r99369.patch > (also the patch is attached). > > Originally, we noticed that GNU Go works 6% faster on cortex-a8 with > -fno-gcse. After profiling we found that this is most likely caused by > cache misses when accessing global variables. GCC generates ldr > instructions for them, while this can be avoided by emitting movt/movw > pair for such cases. RTL expressions for these instructions is high_ and > lo_sum. Currently, symbol_ref expands as high_ and lo_sum but then > cprop1 decides that this is redundant and merges them into one load insn. > > The problem was also found by Linaro community: > https://bugs.launchpad.net/gcc-linaro/+bug/886124 . > Also there is a patch from codesourcery (attached), which was ported to > linaro gcc 4.5, but is missing in later linaro releases. > This patch makes split of symbol_refs at the later stage (after cprop), > instead of generating movt/movw at expand. > > It fixed our test case on GNU Go. Also we tested it on SPEC2K INT (ref) > with GCC 4.8 snapshot from May 12, 2012 on cortex-a9 with -O2 and -mthumb: > > Base Base Base Peak Peak Peak > Benchmarks Ref Time Run Time Ratio Ref Time Run Time Ratio > ---------- -------- -------- -------- -------- -------- ------- > 164.gzip 1400 492 284 1400 497 282 -0.70% > 175.vpr 1400 433 323 1400 458 306 -5.26% > 176.gcc 1100 203 542 1100 198 557 2.77% > 181.mcf 1800 529 340 1800 528 341 0.29% > 186.crafty 1000 261 383 1000 256 391 2.09% > 197.parser 1800 709 254 1800 701 257 1.18% > 252.eon 1300 219 594 1300 202 644 8.42% > 253.perlbmk 1800 389 463 1800 367 490 5.83% > 254.gap 1100 259 425 1100 236 467 9.88% > 255.vortex 1900 498 382 1900 442 430 12.57% > 256.bzip2 1500 452 332 1500 424 354 6.63% > 300.twolf 3000 916 328 3000 853 352 7.32% > SPECint_base2000 376 > SPECint2000 391 3.99% > > > SPEC2K INT grows by 4% (up to 12.5% on vortex; vpr slowdown is likely > because of big variance on this test). > > Similarly, there are gains of 3-4% without -mthumb on cortex-a9 and on > cortex-a8 (thumb2 and ARM modes). > > This patch can be applied to current trunk and passes regtest > successfully on qemu-arm. > Maybe it will be good to have it in trunk? > If everybody agrees, we can take care of committing it. > > -- > Best regards, > Dmitry > > > gcc-4.5-linaro-r99369.patch > Please update the ChangeLog entry (it's not appropriate to mention Sourcery G++) and add a comment as Steven has suggested. Otherwise OK. R.
On 27 June 2012 15:58, Dmitry Melnik <dm@ispras.ru> wrote: > Hi, > > We'd like to note about CodeSourcery's patch for ARM backend, from which GCC > mainline can gain 4% on SPEC2K INT: > http://cgit.openembedded.org/openembedded/plain/recipes/gcc/gcc-4.5/linaro/gcc-4.5-linaro-r99369.patch > (also the patch is attached). > > Originally, we noticed that GNU Go works 6% faster on cortex-a8 with > -fno-gcse. After profiling we found that this is most likely caused by > cache misses when accessing global variables. GCC generates ldr > instructions for them, while this can be avoided by emitting movt/movw pair > for such cases. RTL expressions for these instructions is high_ and lo_sum. > Currently, symbol_ref expands as high_ and lo_sum but then cprop1 decides > that this is redundant and merges them into one load insn. > > The problem was also found by Linaro community: > https://bugs.launchpad.net/gcc-linaro/+bug/886124 . The reason IIRC this isn't in our later releases is that it wasn't thought beneficial enough to upstream. Now you've got some evidence to the contrary. > Also there is a patch from codesourcery (attached), which was ported to > linaro gcc 4.5, but is missing in later linaro releases. > This patch makes split of symbol_refs at the later stage (after cprop), > instead of generating movt/movw at expand. I must admit that I had been suggesting to Zhenqiang about turning this off by tightening the movsi_insn predicates rather than adding a split, but given that it appears to produce enough benefit in this case I don't have any reasons to object ... However it's interesting that this doesn't seem to help vpr .... Ramana
On 06/27/2012 07:55 PM, Ramana Radhakrishnan wrote: > I must admit that I had been suggesting to Zhenqiang about turning > this off by tightening the movsi_insn predicates rather than adding a > split, but given that it appears to produce enough benefit in this > case I don't have any reasons to object ... > > However it's interesting that this doesn't seem to help vpr .... We retested vpr, but it just seems to be unstable: base peak time time 175.vpr 1400 502 X 1400 526 X 175.vpr 1400 500 X 1400 524 X 175.vpr 1400 516 X 1400 526 X 175.vpr 1400 492 X 1400 481 X 175.vpr 1400 496 X 1400 485 X median 500 524 However, the minimum time is still better with the patch. And here are all 3 runs for previously reported data: test base ratio peak ratio median base median peak improvement 164.gzip 284 281 284 282 -0.70% 164.gzip 284 282 164.gzip 285 283 175.vpr 329 306 323 306 -5.26% 175.vpr 323 305 175.vpr 306 307 176.gcc 542 554 542 557 2.77% 176.gcc 541 558 176.gcc 544 557 181.mcf 343 340 340 341 0.29% 181.mcf 339 342 181.mcf 340 341 186.crafty 383 399 383 391 2.09% 186.crafty 390 391 186.crafty 380 386 197.parser 254 257 254 257 1.18% 197.parser 254 257 197.parser 254 257 252.eon 591 644 594 644 8.42% 252.eon 598 644 252.eon 594 643 253.perlbmk 462 490 463 490 5.83% 253.perlbmk 463 490 253.perlbmk 463 490 254.gap 415 473 425 467 9.88% 254.gap 430 467 254.gap 425 464 255.vortex 384 430 382 430 12.57% 255.vortex 382 430 255.vortex 381 430 256.bzip2 331 354 332 354 6.63% 256.bzip2 332 354 256.bzip2 335 349 300.twolf 323 356 328 352 7.32% 300.twolf 347 337 300.twolf 328 352 -- Best regards, Dmitry
On 29 June 2012 14:48, Dmitry Melnik <dm@ispras.ru> wrote: > > On 06/27/2012 07:55 PM, Ramana Radhakrishnan wrote: > >> I must admit that I had been suggesting to Zhenqiang about turning >> this off by tightening the movsi_insn predicates rather than adding a >> split, but given that it appears to produce enough benefit in this >> case I don't have any reasons to object ... >> >> However it's interesting that this doesn't seem to help vpr .... > > We retested vpr, but it just seems to be unstable: Unfortunate but ok. Ramana
2010-08-20 Jie Zhang <jie@codesourcery.com> Merged from Sourcery G++ 4.4: gcc/ 2009-05-29 Julian Brown <julian@codesourcery.com> Merged from Sourcery G++ 4.3: * config/arm/arm.md (movsi): Don't split symbol refs here. (define_split): New. 2010-08-18 Julian Brown <julian@codesourcery.com> Issue #9222 === modified file 'gcc/config/arm/arm.md' --- old/gcc/config/arm/arm.md 2010-08-20 16:41:37 +0000 +++ new/gcc/config/arm/arm.md 2010-08-23 14:39:12 +0000 @@ -5150,14 +5150,6 @@ optimize && can_create_pseudo_p ()); DONE; } - - if (TARGET_USE_MOVT && !target_word_relocations - && GET_CODE (operands[1]) == SYMBOL_REF - && !flag_pic && !arm_tls_referenced_p (operands[1])) - { - arm_emit_movpair (operands[0], operands[1]); - DONE; - } } else /* TARGET_THUMB1... */ { @@ -5265,6 +5257,19 @@ " ) +(define_split + [(set (match_operand:SI 0 "arm_general_register_operand" "") + (match_operand:SI 1 "general_operand" ""))] + "TARGET_32BIT + && TARGET_USE_MOVT && GET_CODE (operands[1]) == SYMBOL_REF + && !flag_pic && !target_word_relocations + && !arm_tls_referenced_p (operands[1])" + [(clobber (const_int 0))] +{ + arm_emit_movpair (operands[0], operands[1]); + DONE; +}) + (define_insn "*thumb1_movsi_insn" [(set (match_operand:SI 0 "nonimmediate_operand" "=l,l,l,l,l,>,l, m,*lhk") (match_operand:SI 1 "general_operand" "l, I,J,K,>,l,mi,l,*lhk"))]