Message ID | 54EFCC88.6080203@redhat.com |
---|---|
State | New |
Headers | show |
On 02/26/2015 05:46 PM, Aldy Hernandez wrote: > +;; Optimize division or modulo by constant power of 2, if the constant > +;; materializes only after expansion. > +(define_insn_and_split "*udivmod<mode>4_pow2" > + [(set (match_operand:SWI48 0 "register_operand" "=r") > + (udiv:SWI48 (match_operand:SWI48 2 "register_operand" "0") > + (match_operand:SWI48 3 "const_int_operand" "n"))) > + (set (match_operand:SWI48 1 "register_operand" "=r") > + (umod:SWI48 (match_dup 2) (match_dup 3))) > + (clobber (reg:CC FLAGS_REG))] > + "UINTVAL (operands[3]) - 2 < <MODE_SIZE> * BITS_PER_UNIT > + && (UINTVAL (operands[3]) & (UINTVAL (operands[3]) - 1)) == 0" > + "#" > + "reload_completed" > + [(set (match_dup 1) (match_dup 2)) > + (parallel [(set (match_dup 0) (lshiftrt:<MODE> (match_dup 2) (match_dup 4))) > + (clobber (reg:CC FLAGS_REG))]) > + (parallel [(set (match_dup 1) (and:<MODE> (match_dup 1) (match_dup 5))) > + (clobber (reg:CC FLAGS_REG))])] > +{ Do you actually need the reload completed? Couldn't this be legitimately split before reload (and then parts of it DCE'd as necessary)? Otherwise, the split condition does need "&&" as mentioned by Uros. r~
On 03/02/2015 09:55 AM, Richard Henderson wrote: > On 02/26/2015 05:46 PM, Aldy Hernandez wrote: >> +;; Optimize division or modulo by constant power of 2, if the constant >> +;; materializes only after expansion. >> +(define_insn_and_split "*udivmod<mode>4_pow2" >> + [(set (match_operand:SWI48 0 "register_operand" "=r") >> + (udiv:SWI48 (match_operand:SWI48 2 "register_operand" "0") >> + (match_operand:SWI48 3 "const_int_operand" "n"))) >> + (set (match_operand:SWI48 1 "register_operand" "=r") >> + (umod:SWI48 (match_dup 2) (match_dup 3))) >> + (clobber (reg:CC FLAGS_REG))] >> + "UINTVAL (operands[3]) - 2 < <MODE_SIZE> * BITS_PER_UNIT >> + && (UINTVAL (operands[3]) & (UINTVAL (operands[3]) - 1)) == 0" >> + "#" >> + "reload_completed" >> + [(set (match_dup 1) (match_dup 2)) >> + (parallel [(set (match_dup 0) (lshiftrt:<MODE> (match_dup 2) (match_dup 4))) >> + (clobber (reg:CC FLAGS_REG))]) >> + (parallel [(set (match_dup 1) (and:<MODE> (match_dup 1) (match_dup 5))) >> + (clobber (reg:CC FLAGS_REG))])] >> +{ > > Do you actually need the reload completed? Couldn't this be legitimately split > before reload (and then parts of it DCE'd as necessary)? I think Jakub mentioned we needed it, otherwise we needed to check for a division by zero for the INTVAL. But I could be misunderstanding. Aldy
On 03/02/2015 10:25 AM, Aldy Hernandez wrote: > On 03/02/2015 09:55 AM, Richard Henderson wrote: >> On 02/26/2015 05:46 PM, Aldy Hernandez wrote: >>> +;; Optimize division or modulo by constant power of 2, if the constant >>> +;; materializes only after expansion. >>> +(define_insn_and_split "*udivmod<mode>4_pow2" >>> + [(set (match_operand:SWI48 0 "register_operand" "=r") >>> + (udiv:SWI48 (match_operand:SWI48 2 "register_operand" "0") >>> + (match_operand:SWI48 3 "const_int_operand" "n"))) >>> + (set (match_operand:SWI48 1 "register_operand" "=r") >>> + (umod:SWI48 (match_dup 2) (match_dup 3))) >>> + (clobber (reg:CC FLAGS_REG))] >>> + "UINTVAL (operands[3]) - 2 < <MODE_SIZE> * BITS_PER_UNIT >>> + && (UINTVAL (operands[3]) & (UINTVAL (operands[3]) - 1)) == 0" >>> + "#" >>> + "reload_completed" >>> + [(set (match_dup 1) (match_dup 2)) >>> + (parallel [(set (match_dup 0) (lshiftrt:<MODE> (match_dup 2) (match_dup >>> 4))) >>> + (clobber (reg:CC FLAGS_REG))]) >>> + (parallel [(set (match_dup 1) (and:<MODE> (match_dup 1) (match_dup 5))) >>> + (clobber (reg:CC FLAGS_REG))])] >>> +{ >> >> Do you actually need the reload completed? Couldn't this be legitimately split >> before reload (and then parts of it DCE'd as necessary)? > > I think Jakub mentioned we needed it, otherwise we needed to check for a > division by zero for the INTVAL. But I could be misunderstanding. The UINTVAL - 2 < MODE_BIT_SIZE condition means that 0 & 1 are excluded. So I don't see how division by zero applies here. r~
On 03/02/2015 09:55 AM, Richard Henderson wrote: > On 02/26/2015 05:46 PM, Aldy Hernandez wrote: >> +;; Optimize division or modulo by constant power of 2, if the constant >> +;; materializes only after expansion. >> +(define_insn_and_split "*udivmod<mode>4_pow2" >> + [(set (match_operand:SWI48 0 "register_operand" "=r") >> + (udiv:SWI48 (match_operand:SWI48 2 "register_operand" "0") >> + (match_operand:SWI48 3 "const_int_operand" "n"))) >> + (set (match_operand:SWI48 1 "register_operand" "=r") >> + (umod:SWI48 (match_dup 2) (match_dup 3))) >> + (clobber (reg:CC FLAGS_REG))] >> + "UINTVAL (operands[3]) - 2 < <MODE_SIZE> * BITS_PER_UNIT >> + && (UINTVAL (operands[3]) & (UINTVAL (operands[3]) - 1)) == 0" >> + "#" >> + "reload_completed" >> + [(set (match_dup 1) (match_dup 2)) >> + (parallel [(set (match_dup 0) (lshiftrt:<MODE> (match_dup 2) (match_dup 4))) >> + (clobber (reg:CC FLAGS_REG))]) >> + (parallel [(set (match_dup 1) (and:<MODE> (match_dup 1) (match_dup 5))) >> + (clobber (reg:CC FLAGS_REG))])] >> +{ > > Do you actually need the reload completed? Couldn't this be legitimately split > before reload (and then parts of it DCE'd as necessary)? > > Otherwise, the split condition does need "&&" as mentioned by Uros. Oh, I forgot -- even if you can exclude reload_completed, you'd then need to use "&& 1" to effectively copy the first condition. r~
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 2d3d075..ad02156 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -7331,6 +7331,32 @@ [(set_attr "type" "multi") (set_attr "mode" "<MODE>")]) +;; Optimize division or modulo by constant power of 2, if the constant +;; materializes only after expansion. +(define_insn_and_split "*udivmod<mode>4_pow2" + [(set (match_operand:SWI48 0 "register_operand" "=r") + (udiv:SWI48 (match_operand:SWI48 2 "register_operand" "0") + (match_operand:SWI48 3 "const_int_operand" "n"))) + (set (match_operand:SWI48 1 "register_operand" "=r") + (umod:SWI48 (match_dup 2) (match_dup 3))) + (clobber (reg:CC FLAGS_REG))] + "UINTVAL (operands[3]) - 2 < <MODE_SIZE> * BITS_PER_UNIT + && (UINTVAL (operands[3]) & (UINTVAL (operands[3]) - 1)) == 0" + "#" + "reload_completed" + [(set (match_dup 1) (match_dup 2)) + (parallel [(set (match_dup 0) (lshiftrt:<MODE> (match_dup 2) (match_dup 4))) + (clobber (reg:CC FLAGS_REG))]) + (parallel [(set (match_dup 1) (and:<MODE> (match_dup 1) (match_dup 5))) + (clobber (reg:CC FLAGS_REG))])] +{ + int v = exact_log2 (UINTVAL (operands[3])); + operands[4] = GEN_INT (v); + operands[5] = GEN_INT ((HOST_WIDE_INT_1U << v) - 1); +} + [(set_attr "type" "multi") + (set_attr "mode" "<MODE>")]) + (define_insn "*udivmod<mode>4_noext" [(set (match_operand:SWIM248 0 "register_operand" "=a") (udiv:SWIM248 (match_operand:SWIM248 2 "register_operand" "0") diff --git a/gcc/testsuite/gcc.target/i386/pr65520.c b/gcc/testsuite/gcc.target/i386/pr65520.c new file mode 100644 index 0000000..8a62c39 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr65520.c @@ -0,0 +1,20 @@ +/* PR target/65520 */ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +int foo (void *); + +void +bar (void) +{ + unsigned s = 128; + while (1) + { + unsigned b[s]; + if (foo (b)) + break; + s *= 2; + } +} + +/* { dg-final { scan-assembler-not "div\[^\n\r]*%" } } */