Patchwork [ARM] Enable -fno-sched-interblock for Cortex-M4 and Cortex-M4F

Submitter Jie Zhang
Date Sept. 11, 2010, 2:39 p.m.
Message ID <4C8B9496.6020101@codesourcery.com>
Permalink /patch/64521/
State New

Comments

Jie Zhang - Sept. 11, 2010, 2:39 p.m.
Benchmarking with EEMBC shows that enabling -fno-sched-interblock helps
program performance on Cortex-M4 and Cortex-M4F. This patch does just
that. It depends on the patch I sent yesterday, which added
-mcpu=cortex-m4f. Is it OK?
Steven Bosscher - Sept. 11, 2010, 2:50 p.m.
On Sat, Sep 11, 2010 at 4:39 PM, Jie Zhang <jie@codesourcery.com> wrote:
> Benchmarking using EEMBC shows that enabling -fno-sched-interblock is
> helpful for program performance on Cortex-M4 and Cortex-M4F. This patch just
> does it. This patch depends that patch I sent yesterday which added
> -mcpu=cortex-m4f. Is it OK?

You say "enable -fno-sched-interblock" but it's really "disable
-fsched-interblock".

Don't you want to know *why* this is helpful for performance on your
target? This option is enabled by default because it is supposed to be
helpful for performance. If that's not the case for you for one
benchmark, the usual "quick hack" of disabling it is IMHO just Not
Good Enough.

Have you tried, instead, to enable -fsched-pressure by default?
Checked that it's not a problem with the new scheduler pipeline
descriptions rather than in the scheduler itself? Etc.

Ciao!
Steven
Jie Zhang - Sept. 11, 2010, 3:12 p.m.
On 09/11/2010 10:50 PM, Steven Bosscher wrote:
> On Sat, Sep 11, 2010 at 4:39 PM, Jie Zhang<jie@codesourcery.com>  wrote:
>> Benchmarking using EEMBC shows that enabling -fno-sched-interblock is
>> helpful for program performance on Cortex-M4 and Cortex-M4F. This patch just
>> does it. This patch depends that patch I sent yesterday which added
>> -mcpu=cortex-m4f. Is it OK?
>
> You say "enable -fno-sched-interblock" but it's really "disable
> -fsched-interblock".
>
Yes. ;-)

> Don't you want to know *why* this is helpful for performance on your
> target? This option is enabled by default because it is supposed to be
> helpful for performance. If that's not the case for you for one
> benchmark, the usual "quick hack" of disabling it is IMHO just Not
> Good Enough.
>
This improves the performance of 6 EEMBC tests by 8% to 20%, with only
one test regressing, by 2%. I took a look at one test. Let me draw a flow
graph first. In one of the hottest functions of that test, there is a loop:

    |
    |<--------+
    |         |
    v         |
block 1   block 2
    |         ^
    |         |
    +---------+
    |
    v

An instruction is moved from block 2 into block 1. Since block 1 runs
once more than block 2 on every pass through the loop, that instruction
is now executed one more time per function call than before scheduling.
That is only one extra cycle per call, but the function is called many
times, so the effect adds up to a noticeable performance loss.
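
To make the execution counts concrete, here is a minimal C sketch. It is
not part of the patch, and the names block_counts and count_blocks are
hypothetical. It assumes, as the flow graph suggests, that block 1 holds
the loop test and the exit edge, while block 2 is the loop body.

```c
/* Hypothetical model of the loop in the flow graph above.
   Assumption: block 1 is the loop test (it owns the exit edge),
   block 2 is the loop body that branches back to block 1.  */
struct block_counts {
    unsigned long block1;   /* times block 1 was entered */
    unsigned long block2;   /* times block 2 was entered */
};

static struct block_counts count_blocks(unsigned long iterations)
{
    struct block_counts c = {0, 0};
    unsigned long i = 0;

    for (;;) {
        c.block1++;             /* block 1 runs on every pass */
        if (i >= iterations)
            break;              /* exit edge leaves from block 1 */
        c.block2++;             /* block 2 runs only when the loop continues */
        i++;
    }
    return c;
}
```

For any iteration count n, block 1 is entered n + 1 times and block 2 n
times, so an instruction moved from block 2 into block 1 runs exactly once
more per call. With 1-cycle instructions that is roughly one extra cycle
per call, which matches the cost described above.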

The Cortex-M4 integer pipeline has 3 stages, and most instructions take
1 cycle, so there is little benefit from interblock scheduling.

> Have you tried, instead, to enable -fsched-pressure by default?

Not yet.

> Checked that it's not a problem with the new scheduler pipeline
> descriptions rather than in the scheduler itself? Etc.
>
I think interblock scheduling might be more helpful for processors with 
deep pipelines.

Patch


	* config/arm/arm.c (arm_override_options): Enable
	-fno-sched-interblock for Cortex-M4 and Cortex-M4F.

Index: config/arm/arm.c
===================================================================
--- config/arm/arm.c	(revision 164143)
+++ config/arm/arm.c	(working copy)
@@ -1886,6 +1886,11 @@  arm_override_options (void)
 	fix_cm3_ldrd = 0;
     }
 
+  /* Enable -fno-sched-interblock for Cortex-M4 and Cortex-M4F.  */
+  if (arm_selected_tune->core == cortexm4
+      || arm_selected_tune->core == cortexm4f)
+    flag_schedule_interblock = 0;
+
   if (TARGET_THUMB1 && flag_schedule_insns)
     {
       /* Don't warn since it's on by default in -O2.  */