Patchwork Fix ARM ldm/stm peephole2 loop

login
register
mail settings
Submitter Bernd Schmidt
Date Aug. 5, 2010, 1:09 a.m.
Message ID <4C5A0F50.4000904@codesourcery.com>
Download mbox | patch
Permalink /patch/60912/
State New
Headers show

Comments

Bernd Schmidt - Aug. 5, 2010, 1:09 a.m.
With -mtune=xscale, arm_gen_load_multiple_1 doesn't actually generate a
load-multiple for two loads, but single instructions.  This causes a
peephole2 loop where we try to optimize the same instructions over and
over again.

Fixed by teaching multiple_operation_profitable_p about this situation.  Ok?


Bernd
PR bootstrap/45177
	* config/arm/arm.c (multiple_operation_profitable_p): Return false
	if we'd generate single operations.
Richard Earnshaw - Aug. 5, 2010, 7:46 a.m.
On Thu, 2010-08-05 at 03:09 +0200, Bernd Schmidt wrote:
> With -mtune=xscale, arm_gen_load_multiple_1 doesn't actually generate a
> load-multiple for two loads, but single instructions.  This causes a
> peephole2 loop where we try to optimize the same instructions over and
> over again.
> 
> Fixed by teaching multiple_operation_profitable_p about this situation.  Ok?
> 
> 
> Bernd

+  if (nops <= 2 && arm_tune_xscale && !optimize_size)

It seems wrong to me to have an explicit test for XScale here.  Other
cores could surely exist that result in a similar loop.  Ie the test
should be on the cause, not on 'it's an xscale'

R
Phil Blundell - Aug. 5, 2010, 9:21 a.m.
On Thu, 2010-08-05 at 08:46 +0100, Richard Earnshaw wrote:
> On Thu, 2010-08-05 at 03:09 +0200, Bernd Schmidt wrote:
> > With -mtune=xscale, arm_gen_load_multiple_1 doesn't actually generate a
> > load-multiple for two loads, but single instructions.  This causes a
> > peephole2 loop where we try to optimize the same instructions over and
> > over again.
> > 
> > Fixed by teaching multiple_operation_profitable_p about this situation.  Ok?
> > 
> > 
> > Bernd
> 
> +  if (nops <= 2 && arm_tune_xscale && !optimize_size)
> 
> It seems wrong to me to have an explicit test for XScale here.  Other
> cores could surely exist that result in a similar loop.  Ie the test
> should be on the cause, not on 'it's an xscale'

I think the equivalent condition in arm_gen_load_multiple (and
arm_gen_store_multiple, fwiw) is also written in terms of
arm_tune_xcale.  So, to that extent, "it's an xscale" is actually the
cause here, at least until and unless another CPU appears which has the
same pipeline behaviour.

It does seem a little bit fragile to require the conditions in the two
places to match in order to avoid loops, though.  Maybe there should be
a comment at the appropriate place in arm_gen_xx_multiple to say that it
needs to stay in sync with the code in multiple_operation_profitable_p,
or maybe those two functions could be reworked to actually use
multiple_operation_profitable_p() rather than duplicating its logic.

p.

Patch

Index: config/arm/arm.c
===================================================================
--- config/arm/arm.c	(revision 162821)
+++ config/arm/arm.c	(working copy)
@@ -9185,7 +9192,11 @@  multiple_operation_profitable_p (bool is
      changes, then the test below needs to be reworked.  */
   if (nops == 2 && arm_ld_sched && add_offset != 0)
     return false;
-
+  /* Return false if arm_gen_load_multiple_1 or
+     arm_gen_store_multiple_1 would just emit single operations.  See
+     the discussion there.  */
+  if (nops <= 2 && arm_tune_xscale && !optimize_size)
+    return false;
   return true;
 }