Patchwork Simplify SUBREG with operand whose target bits are cleared by AND operation

login
register
mail settings
Submitter Terry Guo
Date March 28, 2014, 7:48 a.m.
Message ID <000001cf4a5a$0cc1c010$26454030$@arm.com>
Download mbox | patch
Permalink /patch/334614/
State New
Headers show

Comments

Terry Guo - March 28, 2014, 7:48 a.m.
Hi there,

When compile below case for ARM Thumb-2 target:

long long int
test (unsigned long long int a, unsigned int b)
{
  return (a & 0xFFFFFFFF) * b;
}

I find the GCC function simplify_subreg fails to simplify rtx (subreg:SI
(and:DI (reg/v:DI 115 [ a ]) (const_int 4294967295 [0xffffffff])) 4) to zero
during the fwprop1 pass, considering the fact that the high 32-bit part of
(a & 0xFFFFFFFF) is zero. This leads to some unnecessary multiplications for
high 32-bit part of the result of AND operation. The attached patch is
trying to improve simplify_rtx to handle such case. Other target like x86
seems hasn't such issue because it generates different RTX to handle 64bit
multiplication on a 32bit machine.

Bootstrapped gcc on x86 machine, no problem. Tested with gcc regression test
for x86 and Thumb2, no regression.

Is it OK to stage-1?

BR,
Terry
Terry Guo - March 28, 2014, 8:42 a.m.
> -----Original Message-----
> From: gcc-patches-owner@gcc.gnu.org [mailto:gcc-patches-
> owner@gcc.gnu.org] On Behalf Of Terry Guo
> Sent: Friday, March 28, 2014 3:48 PM
> To: gcc-patches@gcc.gnu.org
> Subject: [Patch]Simplify SUBREG with operand whose target bits are cleared
> by AND operation
> 
> Hi there,
> 
> When compile below case for ARM Thumb-2 target:
> 
> long long int
> test (unsigned long long int a, unsigned int b)
> {
>   return (a & 0xFFFFFFFF) * b;
> }
> 
> I find the GCC function simplify_subreg fails to simplify rtx (subreg:SI
> (and:DI (reg/v:DI 115 [ a ]) (const_int 4294967295 [0xffffffff])) 4) to
zero
> during the fwprop1 pass, considering the fact that the high 32-bit part of
> (a & 0xFFFFFFFF) is zero. This leads to some unnecessary multiplications
for
> high 32-bit part of the result of AND operation. The attached patch is
> trying to improve simplify_rtx to handle such case. Other target like x86
> seems hasn't such issue because it generates different RTX to handle 64bit
> multiplication on a 32bit machine.
> 
> Bootstrapped gcc on x86 machine, no problem. Tested with gcc regression
> test
> for x86 and Thumb2, no regression.
> 
> Is it OK to stage-1?
> 
> BR,
> Terry

Sorry for missing the ChangeLog part:

gcc/
2014-03-28  Terry Guo  <terry.guo@arm.com>

        * fwprop.c (simplify_subreg): Handle case that bits are
        cleared by AND operation.

gcc/testsuite/
2014-03-28  Terry Guo  <terry.guo@arm.com>

        * gcc.target/arm/umull.c: New testcase.
Eric Botcazou - April 3, 2014, 2:11 p.m.
> I find the GCC function simplify_subreg fails to simplify rtx (subreg:SI
> (and:DI (reg/v:DI 115 [ a ]) (const_int 4294967295 [0xffffffff])) 4) to zero
> during the fwprop1 pass, considering the fact that the high 32-bit part of
> (a & 0xFFFFFFFF) is zero. This leads to some unnecessary multiplications
> for high 32-bit part of the result of AND operation. The attached patch is
> trying to improve simplify_rtx to handle such case. Other target like x86
> seems hasn't such issue because it generates different RTX to handle 64bit
> multiplication on a 32bit machine.

See http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00073.html for another try,
which led to the simplification in combine.c:combine_simplify_rtx line 5448.

Your variant is both more general, because it isn't restricted to the lowpart, 
and less general, because it is artificially restricted to AND.

Some remarks:
 - this needs to be restricted to non-paradoxical subregs,
 - you need to test HWI_COMPUTABLE_MODE_P (innermode),
 - you need to test !side_effects_p (op).

I think we need to find a common ground between Jakub's patch and yours and 
put a single transformation in simplify_subreg.

Patch

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 04af01e..0ed88fb 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -6099,6 +6099,19 @@  simplify_subreg (enum machine_mode outermode, rtx op,
 	return CONST0_RTX (outermode);
     }
 
+  /* The AND operation may clear the target bits of SUBREG to zero.
+     Then we just need to return a zero.  Here is an example:
+     (subreg:SI (and:DI (reg:DI X) (const_int 0xFFFFFFFF)) 4).  */
+  if (GET_CODE (op) == AND && SCALAR_INT_MODE_P (innermode))
+    {
+      unsigned int bitpos = subreg_lsb_1 (outermode, innermode, byte);
+      unsigned HOST_WIDE_INT nzmask = nonzero_bits (op, innermode);
+      unsigned HOST_WIDE_INT smask = GET_MODE_MASK (outermode);
+
+      if (((smask << bitpos) & nzmask) == 0)
+	return CONST0_RTX (outermode);
+    }
+
   if (SCALAR_INT_MODE_P (outermode)
       && SCALAR_INT_MODE_P (innermode)
       && GET_MODE_PRECISION (outermode) < GET_MODE_PRECISION (innermode)
diff --git a/gcc/testsuite/gcc.target/arm/umull.c b/gcc/testsuite/gcc.target/arm/umull.c
new file mode 100644
index 0000000..2e39baa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/umull.c
@@ -0,0 +1,11 @@ 
+/* { dg-do compile } */
+/* { dg-skip-if "" { arm_thumb1 } } */
+/* { dg-options "-O2" } */
+
+long long int
+test (unsigned long long int a, unsigned int b)
+{
+  return (a & 0xFFFFFFFF) * b;
+}
+
+/* { dg-final { scan-assembler-not "mla" } } */