PR rtl-optimization/65220: avoid integer division in stack alignment
diff mbox

Message ID 54EFCC88.6080203@redhat.com
State New
Headers show

Commit Message

Aldy Hernandez Feb. 27, 2015, 1:46 a.m. UTC
This is actually Jakub's patch from the PR, with a few minor tweaks that 
were needed to bootstrap and pass the regression suite.

The splitter was using operand 0 without setting it first.  It should've 
been operand 2.  Also, there was a division by zero that was causing an 
invalid insn; fixed by changing the INTVAL + 2 into INTVAL - 2. 
Finally, a reload_completed was added at Jakub's request.

Bootstrapped and tested on x86-64 Linux.

OK for mainline?
PR rtl-optimization/65220
    	* config/i386/i386.md (*udivmod<mode>4_pow2): New.

Comments

Richard Henderson March 2, 2015, 5:55 p.m. UTC | #1
On 02/26/2015 05:46 PM, Aldy Hernandez wrote:
> +;; Optimize division or modulo by constant power of 2, if the constant
> +;; materializes only after expansion.
> +(define_insn_and_split "*udivmod<mode>4_pow2"
> +  [(set (match_operand:SWI48 0 "register_operand" "=r")
> +	(udiv:SWI48 (match_operand:SWI48 2 "register_operand" "0")
> +		    (match_operand:SWI48 3 "const_int_operand" "n")))
> +   (set (match_operand:SWI48 1 "register_operand" "=r")
> +	(umod:SWI48 (match_dup 2) (match_dup 3)))
> +   (clobber (reg:CC FLAGS_REG))]
> +  "UINTVAL (operands[3]) - 2 < <MODE_SIZE> * BITS_PER_UNIT
> +   && (UINTVAL (operands[3]) & (UINTVAL (operands[3]) - 1)) == 0"
> +  "#"
> +  "reload_completed"
> +  [(set (match_dup 1) (match_dup 2))
> +   (parallel [(set (match_dup 0) (lshiftrt:<MODE> (match_dup 2) (match_dup 4)))
> +	      (clobber (reg:CC FLAGS_REG))])
> +   (parallel [(set (match_dup 1) (and:<MODE> (match_dup 1) (match_dup 5)))
> +	      (clobber (reg:CC FLAGS_REG))])]
> +{

Do you actually need the reload completed?  Couldn't this be legitimately split
before reload (and then parts of it DCE'd as necessary)?

Otherwise, the split condition does need "&&" as mentioned by Uros.


r~
Aldy Hernandez March 2, 2015, 6:25 p.m. UTC | #2
On 03/02/2015 09:55 AM, Richard Henderson wrote:
> On 02/26/2015 05:46 PM, Aldy Hernandez wrote:
>> +;; Optimize division or modulo by constant power of 2, if the constant
>> +;; materializes only after expansion.
>> +(define_insn_and_split "*udivmod<mode>4_pow2"
>> +  [(set (match_operand:SWI48 0 "register_operand" "=r")
>> +	(udiv:SWI48 (match_operand:SWI48 2 "register_operand" "0")
>> +		    (match_operand:SWI48 3 "const_int_operand" "n")))
>> +   (set (match_operand:SWI48 1 "register_operand" "=r")
>> +	(umod:SWI48 (match_dup 2) (match_dup 3)))
>> +   (clobber (reg:CC FLAGS_REG))]
>> +  "UINTVAL (operands[3]) - 2 < <MODE_SIZE> * BITS_PER_UNIT
>> +   && (UINTVAL (operands[3]) & (UINTVAL (operands[3]) - 1)) == 0"
>> +  "#"
>> +  "reload_completed"
>> +  [(set (match_dup 1) (match_dup 2))
>> +   (parallel [(set (match_dup 0) (lshiftrt:<MODE> (match_dup 2) (match_dup 4)))
>> +	      (clobber (reg:CC FLAGS_REG))])
>> +   (parallel [(set (match_dup 1) (and:<MODE> (match_dup 1) (match_dup 5)))
>> +	      (clobber (reg:CC FLAGS_REG))])]
>> +{
>
> Do you actually need the reload completed?  Couldn't this be legitimately split
> before reload (and then parts of it DCE'd as necessary)?

I think Jakub mentioned we needed it, otherwise we needed to check for a 
division by zero for the INTVAL.  But I could be misunderstanding.

Aldy
Richard Henderson March 2, 2015, 6:34 p.m. UTC | #3
On 03/02/2015 10:25 AM, Aldy Hernandez wrote:
> On 03/02/2015 09:55 AM, Richard Henderson wrote:
>> On 02/26/2015 05:46 PM, Aldy Hernandez wrote:
>>> +;; Optimize division or modulo by constant power of 2, if the constant
>>> +;; materializes only after expansion.
>>> +(define_insn_and_split "*udivmod<mode>4_pow2"
>>> +  [(set (match_operand:SWI48 0 "register_operand" "=r")
>>> +    (udiv:SWI48 (match_operand:SWI48 2 "register_operand" "0")
>>> +            (match_operand:SWI48 3 "const_int_operand" "n")))
>>> +   (set (match_operand:SWI48 1 "register_operand" "=r")
>>> +    (umod:SWI48 (match_dup 2) (match_dup 3)))
>>> +   (clobber (reg:CC FLAGS_REG))]
>>> +  "UINTVAL (operands[3]) - 2 < <MODE_SIZE> * BITS_PER_UNIT
>>> +   && (UINTVAL (operands[3]) & (UINTVAL (operands[3]) - 1)) == 0"
>>> +  "#"
>>> +  "reload_completed"
>>> +  [(set (match_dup 1) (match_dup 2))
>>> +   (parallel [(set (match_dup 0) (lshiftrt:<MODE> (match_dup 2) (match_dup
>>> 4)))
>>> +          (clobber (reg:CC FLAGS_REG))])
>>> +   (parallel [(set (match_dup 1) (and:<MODE> (match_dup 1) (match_dup 5)))
>>> +          (clobber (reg:CC FLAGS_REG))])]
>>> +{
>>
>> Do you actually need the reload completed?  Couldn't this be legitimately split
>> before reload (and then parts of it DCE'd as necessary)?
> 
> I think Jakub mentioned we needed it, otherwise we needed to check for a
> division by zero for the INTVAL.  But I could be misunderstanding.

The UINTVAL - 2 < MODE_BIT_SIZE condition means that 0 & 1 are excluded.  So I
don't see how division by zero applies here.


r~
Richard Henderson March 2, 2015, 6:38 p.m. UTC | #4
On 03/02/2015 09:55 AM, Richard Henderson wrote:
> On 02/26/2015 05:46 PM, Aldy Hernandez wrote:
>> +;; Optimize division or modulo by constant power of 2, if the constant
>> +;; materializes only after expansion.
>> +(define_insn_and_split "*udivmod<mode>4_pow2"
>> +  [(set (match_operand:SWI48 0 "register_operand" "=r")
>> +	(udiv:SWI48 (match_operand:SWI48 2 "register_operand" "0")
>> +		    (match_operand:SWI48 3 "const_int_operand" "n")))
>> +   (set (match_operand:SWI48 1 "register_operand" "=r")
>> +	(umod:SWI48 (match_dup 2) (match_dup 3)))
>> +   (clobber (reg:CC FLAGS_REG))]
>> +  "UINTVAL (operands[3]) - 2 < <MODE_SIZE> * BITS_PER_UNIT
>> +   && (UINTVAL (operands[3]) & (UINTVAL (operands[3]) - 1)) == 0"
>> +  "#"
>> +  "reload_completed"
>> +  [(set (match_dup 1) (match_dup 2))
>> +   (parallel [(set (match_dup 0) (lshiftrt:<MODE> (match_dup 2) (match_dup 4)))
>> +	      (clobber (reg:CC FLAGS_REG))])
>> +   (parallel [(set (match_dup 1) (and:<MODE> (match_dup 1) (match_dup 5)))
>> +	      (clobber (reg:CC FLAGS_REG))])]
>> +{
> 
> Do you actually need the reload completed?  Couldn't this be legitimately split
> before reload (and then parts of it DCE'd as necessary)?
> 
> Otherwise, the split condition does need "&&" as mentioned by Uros.

Oh, I forgot -- even if you can exclude reload_completed,
you'd then need to use "&& 1" to effectively copy the first
condition.


r~

Patch
diff mbox

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 2d3d075..ad02156 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -7331,6 +7331,32 @@ 
   [(set_attr "type" "multi")
    (set_attr "mode" "<MODE>")])
 
+;; Optimize division or modulo by constant power of 2, if the constant
+;; materializes only after expansion.
+(define_insn_and_split "*udivmod<mode>4_pow2"
+  [(set (match_operand:SWI48 0 "register_operand" "=r")
+	(udiv:SWI48 (match_operand:SWI48 2 "register_operand" "0")
+		    (match_operand:SWI48 3 "const_int_operand" "n")))
+   (set (match_operand:SWI48 1 "register_operand" "=r")
+	(umod:SWI48 (match_dup 2) (match_dup 3)))
+   (clobber (reg:CC FLAGS_REG))]
+  "UINTVAL (operands[3]) - 2 < <MODE_SIZE> * BITS_PER_UNIT
+   && (UINTVAL (operands[3]) & (UINTVAL (operands[3]) - 1)) == 0"
+  "#"
+  "reload_completed"
+  [(set (match_dup 1) (match_dup 2))
+   (parallel [(set (match_dup 0) (lshiftrt:<MODE> (match_dup 2) (match_dup 4)))
+	      (clobber (reg:CC FLAGS_REG))])
+   (parallel [(set (match_dup 1) (and:<MODE> (match_dup 1) (match_dup 5)))
+	      (clobber (reg:CC FLAGS_REG))])]
+{
+  int v = exact_log2 (UINTVAL (operands[3]));
+  operands[4] = GEN_INT (v);
+  operands[5] = GEN_INT ((HOST_WIDE_INT_1U << v) - 1);
+}
+  [(set_attr "type" "multi")
+   (set_attr "mode" "<MODE>")])
+
 (define_insn "*udivmod<mode>4_noext"
   [(set (match_operand:SWIM248 0 "register_operand" "=a")
 	(udiv:SWIM248 (match_operand:SWIM248 2 "register_operand" "0")
diff --git a/gcc/testsuite/gcc.target/i386/pr65520.c b/gcc/testsuite/gcc.target/i386/pr65520.c
new file mode 100644
index 0000000..8a62c39
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr65520.c
@@ -0,0 +1,20 @@ 
+/* PR target/65520 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int foo (void *);
+
+void
+bar (void)
+{
+  unsigned s = 128;
+  while (1)
+    {
+      unsigned b[s];
+      if (foo (b))
+	break;
+      s *= 2;
+    }
+}
+
+/* { dg-final { scan-assembler-not "div\[^\n\r]*%" } } */