Patchwork [PATCH:,PR,target/44999] Replace "and r0, r0, #255" with uxtb in thumb2

login
register
mail settings
Submitter Carrot Wei
Date Nov. 7, 2010, 2:05 p.m.
Message ID <AANLkTimibrqy+5xv+CWAa9S=OZsoK2cvrJgxM_gcuaUX@mail.gmail.com>
Download mbox | patch
Permalink /patch/70349/
State New
Headers show

Comments

Carrot Wei - Nov. 7, 2010, 2:05 p.m.
On Wed, Nov 3, 2010 at 9:04 PM, Paul Brook <paul@codesourcery.com> wrote:
>> > +             operands[1] = convert_to_mode (QImode, operands[1], 1);
>> > +             emit_insn (gen_thumb2_zero_extendqisi2_v6 (operands[0],
>> > +                                                        operands[1]));
>> > +           }
>> >
>> >          }
>>
>> This is OK.
>
> No it's not. This code is inside if (TARGET_32BIT). Using thumb specific
> patterns in ARM mode is just plain wrong.
>
You are right. Following is additional patch to fix it. Tested with
regression testing in qemu.

> Even ignoring that, I don't think this is the right fix. Why do we need this
> for Thumb mode, and not ARM mode? Why extendqi and not extendhi?

In Thumb mode, and with a constant is 32 bit, and uxtb is 16 bit.
In ARM mode, both are 32 bit, so it is not beneficial.
The test case shows that and with 0xFFFF is already converted to uxth,
so it is not handled here.


ChangeLog:
2010-11-07  Wei Guozhi  <carrot@google.com>

        * config/arm/arm.md (andsi3): Call gen_thumb2_zero_extendqisi2_v6 only
        when the target is TARGET_THUMB2.
Paul Brook - Nov. 8, 2010, 11:55 a.m.
> > Even ignoring that, I don't think this is the right fix. Why do we need
> > this for Thumb mode, and not ARM mode? Why extendqi and not extendhi?
> 
> In Thumb mode, and with a constant is 32 bit, and uxtb is 16 bit.
> In ARM mode, both are 32 bit, so it is not beneficial.
> The test case shows that and with 0xFFFF is already converted to uxth,
> so it is not handled here.

I'm still not convinced. Why is HImode different to QImode? Why has this 
suddenly changed? Why does gcc generate uxtb in ARM mode but not Thumb mode? 
i.e. I've little confidence that this is actually fixing the problem rather than 
papering over a bug elsewhere, especially given we have a perfectly good 
zero_extendqisi2 pattern+expander.

Paul
Carrot Wei - Nov. 9, 2010, 7:10 a.m.
1. Why is HImode different to QImode?

Actually it is not HImode VS QImode, it is 65535 VS 255. The actual
processing of and with constant is handled by function
arm_gen_constant. In this function it finds that 255 can be directly
encoded in an and instruction, so it generates a simple and
instruction. On the other hand, 65535 can't be  directly encoded in an
and instruction, it must find some other ways to handle it, then it
find this and can be accomplished with two shifts. In the later
combine pass, the two shifts are combined into a single uxth.

2. Why do we need it for Thumb mode, and not for ARM mode?

In thumb mode, instruction and with constant is 32 bit, uxtb is 16
bit, with this enhancement we can save 2 bytes.
In ARM mode, all instructions are 32 bit, convert and to uxtb doesn't
help us, so we don't care which instruction is used.
So we need it for Thumb mode only.

thanks
Carrot

On Mon, Nov 8, 2010 at 7:55 PM, Paul Brook <paul@codesourcery.com> wrote:
>> > Even ignoring that, I don't think this is the right fix. Why do we need
>> > this for Thumb mode, and not ARM mode? Why extendqi and not extendhi?
>>
>> In Thumb mode, and with a constant is 32 bit, and uxtb is 16 bit.
>> In ARM mode, both are 32 bit, so it is not beneficial.
>> The test case shows that and with 0xFFFF is already converted to uxth,
>> so it is not handled here.
>
> I'm still not convinced. Why is HImode different to QImode? Why has this
> suddenly changed? Why does gcc generate uxtb in ARM mode but not Thumb mode?
> i.e. I've little confidence that this is actually fixing the problem rather than
> papering over a bug elsewhere, especially given we have a perfectly good
> zero_extendqisi2 pattern+expander.
>
> Paul
>
Paul Brook - Nov. 9, 2010, 11:16 a.m.
> 2. Why do we need it for Thumb mode, and not for ARM mode?
> 
> In thumb mode, instruction and with constant is 32 bit, uxtb is 16
> bit, with this enhancement we can save 2 bytes.
> In ARM mode, all instructions are 32 bit, convert and to uxtb doesn't
> help us, so we don't care which instruction is used.
> So we need it for Thumb mode only.

No. Look at the output.
In ARM mode we generate uxtb without your patch.
In thumb mode gcc 4.5 generated uxtb, so this is a recent regression. What 
caused this regression?

The fact that this used to work (without andsi expander hacks), and still 
works in ARM mode, suggests there's something you're missing. Given there's 
clearly code somewhere capable of dong this transformation, it seems 
surprising that it's not already undoing you expander hack, and may do so in 
the future.

Paul
Carrot Wei - Nov. 16, 2010, 1:29 a.m.
On Tue, Nov 9, 2010 at 3:16 AM, Paul Brook <paul@codesourcery.com> wrote:
>> 2. Why do we need it for Thumb mode, and not for ARM mode?
>>
>> In thumb mode, instruction and with constant is 32 bit, uxtb is 16
>> bit, with this enhancement we can save 2 bytes.
>> In ARM mode, all instructions are 32 bit, convert and to uxtb doesn't
>> help us, so we don't care which instruction is used.
>> So we need it for Thumb mode only.
>
> No. Look at the output.
> In ARM mode we generate uxtb without your patch.
> In thumb mode gcc 4.5 generated uxtb, so this is a recent regression. What
> caused this regression?
>
I've built a new gcc 4.5 compiler, it actually can't generate uxtb in
thumb mode. I've also tried the compiler for each month from 2009 Jan
to now, all of them didn't generate uxtb in thumb mode. And all of
them generated uxtb in arm mode.

> The fact that this used to work (without andsi expander hacks), and still
> works in ARM mode, suggests there's something you're missing. Given there's
> clearly code somewhere capable of dong this transformation, it seems
> surprising that it's not already undoing you expander hack, and may do so in
> the future.
>

For arm mode, the rtl before combine is:

(insn 2 5 3 2 src/tp.c:2 (set (reg/v:SI 134 [ x ])
        (reg:SI 0 r0 [ x ])) 167 {*arm_movsi_insn} (expr_list:REG_DEAD
(reg:SI 0 r0 [ x ])
        (nil)))

(insn 3 2 4 2 src/tp.c:2 (set (reg/v:SI 135 [ y ])
        (reg:SI 1 r1 [ y ])) 167 {*arm_movsi_insn} (expr_list:REG_DEAD
(reg:SI 1 r1 [ y ])
        (nil)))

(note 4 3 7 2 NOTE_INSN_FUNCTION_BEG)

(insn 7 4 8 2 src/tp.c:2 (set (reg:SI 137)
        (and:SI (reg/v:SI 134 [ x ])
            (const_int 255 [0xff]))) 67 {*arm_andsi3_insn}
(expr_list:REG_DEAD (reg/v:SI 134 [ x ])
        (nil)))

(insn 8 7 9 2 src/tp.c:2 (set (reg:SI 139)
        (const_int 65535 [0xffff])) 167 {*arm_movsi_insn} (nil))

(insn 9 8 10 2 src/tp.c:2 (set (reg:SI 138)
        (and:SI (reg/v:SI 135 [ y ])
            (reg:SI 139))) 67 {*arm_andsi3_insn} (expr_list:REG_DEAD
(reg:SI 139)
        (expr_list:REG_DEAD (reg/v:SI 135 [ y ])
            (expr_list:REG_EQUAL (and:SI (reg/v:SI 135 [ y ])
                    (const_int 65535 [0xffff]))
                (nil)))))

...

At combine pass, insn 2 and insn 7 are combined into a single unsigned
extend instruction, similarly insns 3,8,9 are combined into another
unsigned extend instruction, the result is:

(insn 7 4 8 2 src/tp.c:2 (set (reg:SI 137)
        (zero_extend:SI (reg:QI 0 r0 [ x ]))) 149
{*arm_zero_extendqisi2_v6} (expr_list:REG_DEAD (reg:SI 0 r0 [ x ])
        (nil)))

(note 8 7 9 2 NOTE_INSN_DELETED)

(insn 9 8 10 2 src/tp.c:2 (set (reg:SI 138)
        (zero_extend:SI (reg:HI 1 r1 [ y ]))) 144
{*arm_zero_extendhisi2_v6} (expr_list:REG_DEAD (reg:SI 1 r1 [ y ])
        (nil)))
...

In thumb mode, insn 2 and 7 are failed to be combined because function
cant_combine_insn_p returns true for insn 2, which is caused by
CLASS_LIKELY_SPILLED_P returns true for register r0. The
implementation of CLASS_LIKELY_SPILLED_P is

static bool
arm_class_likely_spilled_p (reg_class_t rclass)
{
  if ((TARGET_THUMB && rclass == LO_REGS)
      || rclass  == CC_REG)
    return true;

  return false;
}

The problem is the condition (TARGET_THUMB && rclass == LO_REGS), in
thumb2 we can use the same number of registers as arm mode, so it
should not be likely spilled. The condition should be changed to

    (TARGET_THUMB1 && rclass == LO_REGS)

After this change, gcc now can generate uxtb instruction.

But it is still not general enough because the combine needs two or
more instructions, so it actually depends on the existence of the
parameter

(insn 2 5 3 2 src/tp.c:2 (set (reg/v:SI 134 [ x ])
        (reg:SI 0 r0 [ x ])) 167 {*arm_movsi_insn} (expr_list:REG_DEAD
(reg:SI 0 r0 [ x ])
        (nil)))

If I modify the test case a little

int tp(int x, int y)
{
  return ((x + 3) & 0xff) - (y & 0xffff);
}

The compiler always generates "and" instruction for both arm and thumb
modes. But the enhanced expander can still handle this case.

thanks
Carrot
Paolo Bonzini - Nov. 16, 2010, 2:07 a.m.
On 11/16/2010 02:29 AM, Carrot Wei wrote:
> The compiler always generates "and" instruction for both arm and thumb
> modes. But the enhanced expander can still handle this case.

If anything, a splitter seems preferrable to hacking the expander. 
Within an expander you cannot be sure that passes such as combine won't 
undo your change.

Paolo

Patch

Index: arm.md
===================================================================
--- arm.md	(revision 165492)
+++ arm.md	(working copy)
@@ -2015,7 +2015,7 @@ 
     {
       if (GET_CODE (operands[2]) == CONST_INT)
         {
-	  if (INTVAL (operands[2]) == 255 && arm_arch6)
+	  if (INTVAL (operands[2]) == 255 && TARGET_THUMB2 && arm_arch6)
 	    {
 	      operands[1] = convert_to_mode (QImode, operands[1], 1);
 	      emit_insn (gen_thumb2_zero_extendqisi2_v6 (operands[0],