Message ID | 4C8B9496.6020101@codesourcery.com |
---|---|
State | New |
On Sat, Sep 11, 2010 at 4:39 PM, Jie Zhang <jie@codesourcery.com> wrote:
> Benchmarking using EEMBC shows that enabling -fno-sched-interblock is
> helpful for program performance on Cortex-M4 and Cortex-M4F. This patch just
> does it. This patch depends that patch I sent yesterday which added
> -mcpu=cortex-m4f. Is it OK?

You say "enable -fno-sched-interblock" but it's really "disable
-fsched-interblock".

Don't you want to know *why* this is helpful for performance on your
target? This option is enabled by default because it is supposed to be
helpful for performance. If that's not the case for you for one
benchmark, the usual "quick hack" of disabling it is IMHO just Not
Good Enough.

Have you tried, instead, to enable -fsched-pressure by default?
Checked that it's not a problem with the new scheduler pipeline
descriptions rather than in the scheduler itself? Etc.

Ciao!
Steven
On 09/11/2010 10:50 PM, Steven Bosscher wrote:
> On Sat, Sep 11, 2010 at 4:39 PM, Jie Zhang<jie@codesourcery.com> wrote:
>> Benchmarking using EEMBC shows that enabling -fno-sched-interblock is
>> helpful for program performance on Cortex-M4 and Cortex-M4F. This patch just
>> does it. This patch depends that patch I sent yesterday which added
>> -mcpu=cortex-m4f. Is it OK?
>
> You say "enable -fno-sched-interblock" but it's really "disable
> -fsched-interblock".
>
Yes. ;-)

> Don't you want to know *why* this is helpful for performance on your
> target? This option is enabled by default because it is supposed to be
> helpful for performance. If that's not the case for you for one
> benchmark, the usual "quick hack" of disabling it is IMHO just Not
> Good Enough.
>
This improves the performance of 6 EEMBC tests by 8% to 20%, with only
one test regressing, by 2%. I took a look at one test. Let me draw a
flow graph first. In one of the hottest functions of that test, there
is a loop:

       |
       |<--------+
       |         |
       v         |
    block 1   block 2
       |         ^
       |         |
       +---------+
       |
       v

An instruction is scheduled from block 2 into block 1. As a result, that
instruction is executed one more time per call than it was before
scheduling. That is only one extra cycle per function call, but the
function is called many times, so the effect adds up to a noticeable
performance loss.

The Cortex-M4 integer pipeline has 3 stages, and most instructions take
1 cycle, so there is little benefit from interblock scheduling.

> Have you tried, instead, to enable -fsched-pressure by default?

Not yet.

> Checked that it's not a problem with the new scheduler pipeline
> descriptions rather than in the scheduler itself? Etc.
>
I think interblock scheduling might be more helpful for processors with
deep pipelines.
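The extra execution described for the loop above can be checked with a small counting model. This is only a sketch and not GCC code: it assumes block 1 is the loop header (run once more than the body, because its last run takes the exit edge) and block 2 is the loop body. The names `exec_counts`, `run_loop` and `extra_executions` are hypothetical and exist only for this illustration.

```c
/* Hypothetical model of the loop in the flow graph: block 1 is assumed
   to be the loop header and block 2 the loop body.  Nothing here is
   taken from GCC; it only counts how often each block runs.  */
struct exec_counts
{
  unsigned long block1;   /* dynamic executions of block 1 */
  unsigned long block2;   /* dynamic executions of block 2 */
};

/* Simulate one call of the function in which the body runs TRIPS times.  */
static struct exec_counts
run_loop (unsigned long trips)
{
  struct exec_counts c = { 0, 0 };

  for (;;)
    {
      c.block1++;               /* block 1: runs on every pass, then tests */
      if (c.block2 == trips)
        break;                  /* exit edge out of the loop */
      c.block2++;               /* block 2: branches back to block 1 */
    }
  return c;
}

/* Extra dynamic executions of a single instruction moved from block 2
   into block 1, summed over CALLS calls of the function.  */
static unsigned long
extra_executions (unsigned long calls, unsigned long trips)
{
  struct exec_counts c = run_loop (trips);

  return calls * (c.block1 - c.block2);
}
```

Under these assumptions the difference is one execution per call regardless of the trip count, so `extra_executions (1000, 10)` evaluates to 1000. On a core where most instructions take a single cycle, that is roughly the cost the reply attributes to the moved instruction.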
	* config/arm/arm.c (arm_override_options): Enable
	-fno-sched-interblock for Cortex-M4 and Cortex-M4F.

Index: config/arm/arm.c
===================================================================
--- config/arm/arm.c	(revision 164143)
+++ config/arm/arm.c	(working copy)
@@ -1886,6 +1886,11 @@ arm_override_options (void)
       fix_cm3_ldrd = 0;
     }
 
+  /* Enable -fno-sched-interblock for Cortex-M4 and Cortex-M4F.  */
+  if (arm_selected_tune->core == cortexm4
+      || arm_selected_tune->core == cortexm4f)
+    flag_schedule_interblock = 0;
+
   if (TARGET_THUMB1 && flag_schedule_insns)
     {
       /* Don't warn since it's on by default in -O2.  */