Message ID | AANLkTimibrqy+5xv+CWAa9S=OZsoK2cvrJgxM_gcuaUX@mail.gmail.com |
---|---|
State | New |
Headers | show |
> > Even ignoring that, I don't think this is the right fix. Why do we need > > this for Thumb mode, and not ARM mode? Why extendqi and not extendhi? > > In Thumb mode, and with a constant is 32 bit, and uxtb is 16 bit. > In ARM mode, both are 32 bit, so it is not beneficial. > The test case shows that and with 0xFFFF is already converted to uxth, > so it is not handled here. I'm still not convinced. Why is HImode different to QImode? Why has this suddenly changed? Why does gcc generate uxtb in ARM mode but not Thumb mode? i.e. I've little confidence that this is actually fixing the problem rather than papering over a bug elsewhere, especially given we have a perfectly good zero_extendqisi2 pattern+expander. Paul
1. Why is HImode different to QImode? Actually it is not HImode VS QImode, it is 65535 VS 255. The actual processing of and with constant is handled by function arm_gen_constant. In this function it finds that 255 can be directly encoded in an and instruction, so it generates a simple and instruction. On the other hand, 65535 can't be directly encoded in an and instruction, it must find some other ways to handle it, then it find this and can be accomplished with two shifts. In the later combine pass, the two shifts are combined into a single uxth. 2. Why do we need it for Thumb mode, and not for ARM mode? In thumb mode, instruction and with constant is 32 bit, uxtb is 16 bit, with this enhancement we can save 2 bytes. In ARM mode, all instructions are 32 bit, convert and to uxtb doesn't help us, so we don't care which instruction is used. So we need it for Thumb mode only. thanks Carrot On Mon, Nov 8, 2010 at 7:55 PM, Paul Brook <paul@codesourcery.com> wrote: >> > Even ignoring that, I don't think this is the right fix. Why do we need >> > this for Thumb mode, and not ARM mode? Why extendqi and not extendhi? >> >> In Thumb mode, and with a constant is 32 bit, and uxtb is 16 bit. >> In ARM mode, both are 32 bit, so it is not beneficial. >> The test case shows that and with 0xFFFF is already converted to uxth, >> so it is not handled here. > > I'm still not convinced. Why is HImode different to QImode? Why has this > suddenly changed? Why does gcc generate uxtb in ARM mode but not Thumb mode? > i.e. I've little confidence that this is actually fixing the problem rather than > papering over a bug elsewhere, especially given we have a perfectly good > zero_extendqisi2 pattern+expander. > > Paul >
> 2. Why do we need it for Thumb mode, and not for ARM mode? > > In thumb mode, instruction and with constant is 32 bit, uxtb is 16 > bit, with this enhancement we can save 2 bytes. > In ARM mode, all instructions are 32 bit, convert and to uxtb doesn't > help us, so we don't care which instruction is used. > So we need it for Thumb mode only. No. Look at the output. In ARM mode we generate uxtb without your patch. In thumb mode gcc 4.5 generated uxtb, so this is a recent regression. What caused this regression? The fact that this used to work (without andsi expander hacks), and still works in ARM mode, suggests there's something you're missing. Given there's clearly code somewhere capable of dong this transformation, it seems surprising that it's not already undoing you expander hack, and may do so in the future. Paul
On Tue, Nov 9, 2010 at 3:16 AM, Paul Brook <paul@codesourcery.com> wrote: >> 2. Why do we need it for Thumb mode, and not for ARM mode? >> >> In thumb mode, instruction and with constant is 32 bit, uxtb is 16 >> bit, with this enhancement we can save 2 bytes. >> In ARM mode, all instructions are 32 bit, convert and to uxtb doesn't >> help us, so we don't care which instruction is used. >> So we need it for Thumb mode only. > > No. Look at the output. > In ARM mode we generate uxtb without your patch. > In thumb mode gcc 4.5 generated uxtb, so this is a recent regression. What > caused this regression? > I've built a new gcc 4.5 compiler, it actually can't generate uxtb in thumb mode. I've also tried the compiler for each month from 2009 Jan to now, all of them didn't generate uxtb in thumb mode. And all of them generated uxtb in arm mode. > The fact that this used to work (without andsi expander hacks), and still > works in ARM mode, suggests there's something you're missing. Given there's > clearly code somewhere capable of dong this transformation, it seems > surprising that it's not already undoing you expander hack, and may do so in > the future. > For arm mode, the rtl before combine is: (insn 2 5 3 2 src/tp.c:2 (set (reg/v:SI 134 [ x ]) (reg:SI 0 r0 [ x ])) 167 {*arm_movsi_insn} (expr_list:REG_DEAD (reg:SI 0 r0 [ x ]) (nil))) (insn 3 2 4 2 src/tp.c:2 (set (reg/v:SI 135 [ y ]) (reg:SI 1 r1 [ y ])) 167 {*arm_movsi_insn} (expr_list:REG_DEAD (reg:SI 1 r1 [ y ]) (nil))) (note 4 3 7 2 NOTE_INSN_FUNCTION_BEG) (insn 7 4 8 2 src/tp.c:2 (set (reg:SI 137) (and:SI (reg/v:SI 134 [ x ]) (const_int 255 [0xff]))) 67 {*arm_andsi3_insn} (expr_list:REG_DEAD (reg/v:SI 134 [ x ]) (nil))) (insn 8 7 9 2 src/tp.c:2 (set (reg:SI 139) (const_int 65535 [0xffff])) 167 {*arm_movsi_insn} (nil)) (insn 9 8 10 2 src/tp.c:2 (set (reg:SI 138) (and:SI (reg/v:SI 135 [ y ]) (reg:SI 139))) 67 {*arm_andsi3_insn} (expr_list:REG_DEAD (reg:SI 139) (expr_list:REG_DEAD (reg/v:SI 135 [ y ]) (expr_list:REG_EQUAL (and:SI (reg/v:SI 135 [ y ]) (const_int 65535 [0xffff])) (nil))))) ... At combine pass, insn 2 and insn 7 are combined into a single unsigned extend instruction, similarly insns 3,8,9 are combined into another unsigned extend instruction, the result is: (insn 7 4 8 2 src/tp.c:2 (set (reg:SI 137) (zero_extend:SI (reg:QI 0 r0 [ x ]))) 149 {*arm_zero_extendqisi2_v6} (expr_list:REG_DEAD (reg:SI 0 r0 [ x ]) (nil))) (note 8 7 9 2 NOTE_INSN_DELETED) (insn 9 8 10 2 src/tp.c:2 (set (reg:SI 138) (zero_extend:SI (reg:HI 1 r1 [ y ]))) 144 {*arm_zero_extendhisi2_v6} (expr_list:REG_DEAD (reg:SI 1 r1 [ y ]) (nil))) ... In thumb mode, insn 2 and 7 are failed to be combined because function cant_combine_insn_p returns true for insn 2, which is caused by CLASS_LIKELY_SPILLED_P returns true for register r0. The implementation of CLASS_LIKELY_SPILLED_P is static bool arm_class_likely_spilled_p (reg_class_t rclass) { if ((TARGET_THUMB && rclass == LO_REGS) || rclass == CC_REG) return true; return false; } The problem is the condition (TARGET_THUMB && rclass == LO_REGS), in thumb2 we can use the same number of registers as arm mode, so it should not be likely spilled. The condition should be changed to (TARGET_THUMB1 && rclass == LO_REGS) After this change, gcc now can generate uxtb instruction. But it is still not general enough because the combine needs two or more instructions, so it actually depends on the existence of the parameter (insn 2 5 3 2 src/tp.c:2 (set (reg/v:SI 134 [ x ]) (reg:SI 0 r0 [ x ])) 167 {*arm_movsi_insn} (expr_list:REG_DEAD (reg:SI 0 r0 [ x ]) (nil))) If I modify the test case a little int tp(int x, int y) { return ((x + 3) & 0xff) - (y & 0xffff); } The compiler always generates "and" instruction for both arm and thumb modes. But the enhanced expander can still handle this case. thanks Carrot
On 11/16/2010 02:29 AM, Carrot Wei wrote: > The compiler always generates "and" instruction for both arm and thumb > modes. But the enhanced expander can still handle this case. If anything, a splitter seems preferrable to hacking the expander. Within an expander you cannot be sure that passes such as combine won't undo your change. Paolo
Index: arm.md =================================================================== --- arm.md (revision 165492) +++ arm.md (working copy) @@ -2015,7 +2015,7 @@ { if (GET_CODE (operands[2]) == CONST_INT) { - if (INTVAL (operands[2]) == 255 && arm_arch6) + if (INTVAL (operands[2]) == 255 && TARGET_THUMB2 && arm_arch6) { operands[1] = convert_to_mode (QImode, operands[1], 1); emit_insn (gen_thumb2_zero_extendqisi2_v6 (operands[0],