i386: do not use SImode mul-highpart on 64-bit

Message ID alpine.LNX.2.20.13.1808091740340.7591@monopod.intra.ispras.ru
State New
Headers show
Series
  • i386: do not use SImode mul-highpart on 64-bit
Related show

Commit Message

Alexander Monakov Aug. 9, 2018, 3 p.m.
Hello,

on x86-64, 32-bit division by constants uses mulsi3_highpart pattern that
turns into 'mull <reg>' instruction with source implicitly in eax and
result in edx:eax.  However, using 64-bit multiplication with zero-extended
source would be preferable, as the imulq instruction accepts the magic
multiplier as immediate if it fits into 31 bits, has fewer register allocation
constraints, and typically has better latency/throughput.

Perhaps ideally we'd want expand_divmod to automatically choose this cheaper
variant on x86, but changing that appears to be too complicated. On the other
hand, we don't use mul_highpart patterns for anything else, so we can simply
expose only the DImode mul_highpart on x86-64.

This patch does that by changing mode iterator so we offer only SImode
mul-highpart on 32-bit x86 and only DImode on 64-bit.

Bootstrapped/regtested on x86-64, OK for trunk?

Alexander

	PR target/82418
	* config/i386/i386.md (<s>mul<mode>3_highpart): Use DWIH mode iterator
	instead of SWI48.

testsuite/
	* gcc.target/i386/pr82418.c: New test.

Comments

Uros Bizjak Aug. 10, 2018, 5:53 a.m. | #1
On Thu, Aug 9, 2018 at 5:00 PM, Alexander Monakov <amonakov@ispras.ru> wrote:
> Hello,
>
> on x86-64, 32-bit division by constants uses mulsi3_highpart pattern that
> turns into 'mull <reg>' instruction with source implicitly in eax and
> result in edx:eax.  However, using 64-bit multiplication with zero-extended
> source would be preferable, as the imulq instruction accepts the magic
> multiplier as immediate if it fits into 31 bits, has fewer register allocation
> constraints, and typically has better latency/throughput.
>
> Perhaps ideally we'd want expand_divmod to automatically choose this cheaper
> variant on x86, but changing that appears to be too complicated. On the other
> hand, we don't use mul_highpart patterns for anything else, so we can simply
> expose only the DImode mul_highpart on x86-64.
>
> This patch does that by changing mode iterator so we offer only SImode
> mul-highpart on 32-bit x86 and only DImode on 64-bit.
>
> Bootstrapped/regtested on x86-64, OK for trunk?
>
> Alexander
>
>         PR target/82418
>         * config/i386/i386.md (<s>mul<mode>3_highpart): Use DWIH mode iterator
>         instead of SWI48.
>
> testsuite/
>         * gcc.target/i386/pr82418.c: New test.

OK.

Thanks,
Uros.

> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 73948c12618..10783d305d2 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -7792,16 +7792,16 @@ (define_insn "*<u>mulqihi3_1"
>     (set_attr "mode" "QI")])
>
>  (define_expand "<s>mul<mode>3_highpart"
> -  [(parallel [(set (match_operand:SWI48 0 "register_operand")
> -                  (truncate:SWI48
> +  [(parallel [(set (match_operand:DWIH 0 "register_operand")
> +                  (truncate:DWIH
>                      (lshiftrt:<DWI>
>                        (mult:<DWI>
>                          (any_extend:<DWI>
> -                          (match_operand:SWI48 1 "nonimmediate_operand"))
> +                          (match_operand:DWIH 1 "nonimmediate_operand"))
>                          (any_extend:<DWI>
> -                          (match_operand:SWI48 2 "register_operand")))
> +                          (match_operand:DWIH 2 "register_operand")))
>                        (match_dup 3))))
> -             (clobber (match_scratch:SWI48 4))
> +             (clobber (match_scratch:DWIH 4))
>               (clobber (reg:CC FLAGS_REG))])]
>    ""
>    "operands[3] = GEN_INT (GET_MODE_BITSIZE (<MODE>mode));")
> diff --git a/gcc/testsuite/gcc.target/i386/pr82418.c b/gcc/testsuite/gcc.target/i386/pr82418.c
> index e69de29bb2d..95a506d5ccd 100644
> --- a/gcc/testsuite/gcc.target/i386/pr82418.c
> +++ b/gcc/testsuite/gcc.target/i386/pr82418.c
> @@ -0,0 +1,10 @@
> +/* PR target/82418 */
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-O2" } */
> +/* { dg-final { scan-assembler "imul\[^\n\r]*1374389535" } } */
> +
> +unsigned
> +f1(unsigned x)
> +{
> +  return x / 100;
> +}

Patch

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 73948c12618..10783d305d2 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -7792,16 +7792,16 @@  (define_insn "*<u>mulqihi3_1"
    (set_attr "mode" "QI")])
 
 (define_expand "<s>mul<mode>3_highpart"
-  [(parallel [(set (match_operand:SWI48 0 "register_operand")
-		   (truncate:SWI48
+  [(parallel [(set (match_operand:DWIH 0 "register_operand")
+		   (truncate:DWIH
 		     (lshiftrt:<DWI>
 		       (mult:<DWI>
 			 (any_extend:<DWI>
-			   (match_operand:SWI48 1 "nonimmediate_operand"))
+			   (match_operand:DWIH 1 "nonimmediate_operand"))
 			 (any_extend:<DWI>
-			   (match_operand:SWI48 2 "register_operand")))
+			   (match_operand:DWIH 2 "register_operand")))
 		       (match_dup 3))))
-	      (clobber (match_scratch:SWI48 4))
+	      (clobber (match_scratch:DWIH 4))
 	      (clobber (reg:CC FLAGS_REG))])]
   ""
   "operands[3] = GEN_INT (GET_MODE_BITSIZE (<MODE>mode));")
diff --git a/gcc/testsuite/gcc.target/i386/pr82418.c b/gcc/testsuite/gcc.target/i386/pr82418.c
index e69de29bb2d..95a506d5ccd 100644
--- a/gcc/testsuite/gcc.target/i386/pr82418.c
+++ b/gcc/testsuite/gcc.target/i386/pr82418.c
@@ -0,0 +1,10 @@ 
+/* PR target/82418 */
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler "imul\[^\n\r]*1374389535" } } */
+
+unsigned
+f1(unsigned x)
+{
+  return x / 100;
+}