Message ID | 000001cf4a5a$0cc1c010$26454030$@arm.com |
---|---|
State | New |
Headers | show |
> -----Original Message----- > From: gcc-patches-owner@gcc.gnu.org [mailto:gcc-patches- > owner@gcc.gnu.org] On Behalf Of Terry Guo > Sent: Friday, March 28, 2014 3:48 PM > To: gcc-patches@gcc.gnu.org > Subject: [Patch]Simplify SUBREG with operand whose target bits are cleared > by AND operation > > Hi there, > > When compile below case for ARM Thumb-2 target: > > long long int > test (unsigned long long int a, unsigned int b) > { > return (a & 0xFFFFFFFF) * b; > } > > I find the GCC function simplify_subreg fails to simplify rtx (subreg:SI > (and:DI (reg/v:DI 115 [ a ]) (const_int 4294967295 [0xffffffff])) 4) to zero > during the fwprop1 pass, considering the fact that the high 32-bit part of > (a & 0xFFFFFFFF) is zero. This leads to some unnecessary multiplications for > high 32-bit part of the result of AND operation. The attached patch is > trying to improve simplify_rtx to handle such case. Other target like x86 > seems hasn't such issue because it generates different RTX to handle 64bit > multiplication on a 32bit machine. > > Bootstrapped gcc on x86 machine, no problem. Tested with gcc regression > test > for x86 and Thumb2, no regression. > > Is it OK to stage-1? > > BR, > Terry Sorry for missing the ChangeLog part: gcc/ 2014-03-28 Terry Guo <terry.guo@arm.com> * fwprop.c (simplify_subreg): Handle case that bits are cleared by AND operation. gcc/testsuite/ 2014-03-28 Terry Guo <terry.guo@arm.com> * gcc.target/arm/umull.c: New testcase.
> I find the GCC function simplify_subreg fails to simplify rtx (subreg:SI > (and:DI (reg/v:DI 115 [ a ]) (const_int 4294967295 [0xffffffff])) 4) to zero > during the fwprop1 pass, considering the fact that the high 32-bit part of > (a & 0xFFFFFFFF) is zero. This leads to some unnecessary multiplications > for high 32-bit part of the result of AND operation. The attached patch is > trying to improve simplify_rtx to handle such case. Other target like x86 > seems hasn't such issue because it generates different RTX to handle 64bit > multiplication on a 32bit machine. See http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00073.html for another try, which led to the simplification in combine.c:combine_simplify_rtx line 5448. Your variant is both more general, because it isn't restricted to the lowpart, and less general, because it is artificially restricted to AND. Some remarks: - this needs to be restricted to non-paradoxical subregs, - you need to test HWI_COMPUTABLE_MODE_P (innermode), - you need to test !side_effects_p (op). I think we need to find a common ground between Jakub's patch and yours and put a single transformation in simplify_subreg.
diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c index 04af01e..0ed88fb 100644 --- a/gcc/simplify-rtx.c +++ b/gcc/simplify-rtx.c @@ -6099,6 +6099,19 @@ simplify_subreg (enum machine_mode outermode, rtx op, return CONST0_RTX (outermode); } + /* The AND operation may clear the target bits of SUBREG to zero. + Then we just need to return a zero. Here is an example: + (subreg:SI (and:DI (reg:DI X) (const_int 0xFFFFFFFF)) 4). */ + if (GET_CODE (op) == AND && SCALAR_INT_MODE_P (innermode)) + { + unsigned int bitpos = subreg_lsb_1 (outermode, innermode, byte); + unsigned HOST_WIDE_INT nzmask = nonzero_bits (op, innermode); + unsigned HOST_WIDE_INT smask = GET_MODE_MASK (outermode); + + if (((smask << bitpos) & nzmask) == 0) + return CONST0_RTX (outermode); + } + if (SCALAR_INT_MODE_P (outermode) && SCALAR_INT_MODE_P (innermode) && GET_MODE_PRECISION (outermode) < GET_MODE_PRECISION (innermode) diff --git a/gcc/testsuite/gcc.target/arm/umull.c b/gcc/testsuite/gcc.target/arm/umull.c new file mode 100644 index 0000000..2e39baa --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/umull.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-skip-if "" { arm_thumb1 } } */ +/* { dg-options "-O2" } */ + +long long int +test (unsigned long long int a, unsigned int b) +{ + return (a & 0xFFFFFFFF) * b; +} + +/* { dg-final { scan-assembler-not "mla" } } */