diff mbox

[RFC,i386] : Enable post-reload compare elimination pass

Message ID CAFULd4beOEsXre1y+nz5c0iOUMVDmLYLO9yUiizHCtYw9bOWmg@mail.gmail.com
State New
Headers show

Commit Message

Uros Bizjak May 9, 2017, 4:06 p.m. UTC
Hello!

Attached patch enables post-reload compare elimination pass by
providing expected patterns (duplicates of existing patterns with
setters of reg and flags switched in the parallel) for flag setting
arithmetic instructions.

The merge triggers more than 3000 times during the gcc bootstrap,
mostly in cases where intervening memory load or store prevents
combine from merging the arithmetic insn and the following compare.

Also, some recent linux x86_64 defconfig build results in ~200 merges,
removing ~200 test/cmp insns. Not much, but I think the results still
warrant the pass to be enabled.

2017-05-09  Uros Bizjak  <ubizjak@gmail.com>

    * config/i386/i386-protos.h (ix86_match_ccmode_last): New prototype.
    * config/i386/i386.c (ix86_match_ccmode_1): Rename from
    ix86_match_ccmode.  Add "last" argument.  Make function static inline.
    (ix86_match_ccmode): New function.
    (ix86_match_ccmode_last): Ditto.
    (TARGET_FLAGS_REGNUM): Define.
    * config/i386/i386.md (*add<mode>_2b): New insn pattern.
    (*sub<mode>_2b): Ditto.
    (*and<mode>_2b): Ditto.
    (*<any_or:code><mode>_2b): Ditto.

Patch was bootstrapped and regression tested on x86_64-linux-gnu.

Uros.

Comments

Jakub Jelinek May 10, 2017, 2:27 p.m. UTC | #1
On Tue, May 09, 2017 at 06:06:47PM +0200, Uros Bizjak wrote:
> Attached patch enables post-reload compare elimination pass by
> providing expected patterns (duplicates of existing patterns with
> setters of reg and flags switched in the parallel) for flag setting
> arithmetic instructions.
> 
> The merge triggers more than 3000 times during the gcc bootstrap,
> mostly in cases where intervening memory load or store prevents
> combine from merging the arithmetic insn and the following compare.
> 
> Also, some recent linux x86_64 defconfig build results in ~200 merges,
> removing ~200 test/cmp insns. Not much, but I think the results still
> warrant the pass to be enabled.

Isn't the right fix instead to change the compare-elim.c pass to either
accept both reg vs. flags orderings in parallel, or both depending
on some target hook, or change it to the order i386.md and most other
major targets use and just fix up mn10300/rx (and aarch64?) to use the same
order?
I think this has been discussed before already several times.

> 
> 2017-05-09  Uros Bizjak  <ubizjak@gmail.com>
> 
>     * config/i386/i386-protos.h (ix86_match_ccmode_last): New prototype.
>     * config/i386/i386.c (ix86_match_ccmode_1): Rename from
>     ix86_match_ccmode.  Add "last" argument.  Make function static inline.
>     (ix86_match_ccmode): New function.
>     (ix86_match_ccmode_last): Ditto.
>     (TARGET_FLAGS_REGNUM): Define.
>     * config/i386/i386.md (*add<mode>_2b): New insn pattern.
>     (*sub<mode>_2b): Ditto.
>     (*and<mode>_2b): Ditto.
>     (*<any_or:code><mode>_2b): Ditto.

	Jakub
Uros Bizjak May 10, 2017, 3:18 p.m. UTC | #2
On Wed, May 10, 2017 at 4:27 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Tue, May 09, 2017 at 06:06:47PM +0200, Uros Bizjak wrote:
>> Attached patch enables post-reload compare elimination pass by
>> providing expected patterns (duplicates of existing patterns with
>> setters of reg and flags switched in the parallel) for flag setting
>> arithmetic instructions.
>>
>> The merge triggers more than 3000 times during the gcc bootstrap,
>> mostly in cases where intervening memory load or store prevents
>> combine from merging the arithmetic insn and the following compare.
>>
>> Also, some recent linux x86_64 defconfig build results in ~200 merges,
>> removing ~200 test/cmp insns. Not much, but I think the results still
>> warrant the pass to be enabled.
>
> Isn't the right fix instead to change the compare-elim.c pass to either
> accept both reg vs. flags orderings in parallel, or both depending
> on some target hook, or change it to the order i386.md and most other
> major targets use and just fix up mn10300/rx (and aarch64?) to use the same
> order?

I was looking at compare-elim.c, where in line 675 function
try_eliminate_compare simply substitutes clobber of CC-reg with a new
compare RTX through a validate_change call. I'm not sure, what would
be the best way to handle both insn variants here. I was hoping
perhaps Jeff would help with the correct approach. Additional copy of
several patterns indeed seems heavily counter-productive.

> I think this has been discussed before already several times.

True, but there was no resolution. As an experiment, I was surprised,
how many cases patched compiler caught, even with the limited set of
additional patterns. So, the postreload cmpelim pass certainly brings
some benefits, in the same sense postreload ree pass does, to catch
additional opportunities that combine pass wasn't able to merge.

Uros.
diff mbox

Patch

Index: i386-protos.h
===================================================================
--- i386-protos.h	(revision 247793)
+++ i386-protos.h	(working copy)
@@ -125,6 +125,7 @@  extern void ix86_split_copysign_const (rtx []);
 extern void ix86_split_copysign_var (rtx []);
 extern bool ix86_unary_operator_ok (enum rtx_code, machine_mode, rtx[]);
 extern bool ix86_match_ccmode (rtx, machine_mode);
+extern bool ix86_match_ccmode_last (rtx, machine_mode);
 extern void ix86_expand_branch (enum rtx_code, rtx, rtx, rtx);
 extern void ix86_expand_setcc (rtx, enum rtx_code, rtx, rtx);
 extern bool ix86_expand_int_movcc (rtx[]);
Index: i386.c
===================================================================
--- i386.c	(revision 247793)
+++ i386.c	(working copy)
@@ -22337,12 +22337,8 @@  ix86_split_copysign_var (rtx operands[])
   emit_insn (gen_rtx_SET (dest, x));
 }
 
-/* Return TRUE or FALSE depending on whether the first SET in INSN
-   has source and destination with matching CC modes, and that the
-   CC mode is at least as constrained as REQ_MODE.  */
-
-bool
-ix86_match_ccmode (rtx insn, machine_mode req_mode)
+static inline bool
+ix86_match_ccmode_1 (rtx insn, machine_mode req_mode, bool last)
 {
   rtx set;
   machine_mode set_mode;
@@ -22349,7 +22345,7 @@  ix86_split_copysign_var (rtx operands[])
 
   set = PATTERN (insn);
   if (GET_CODE (set) == PARALLEL)
-    set = XVECEXP (set, 0, 0);
+    set = XVECEXP (set, 0, last ? XVECLEN (set, 0) - 1 : 0);
   gcc_assert (GET_CODE (set) == SET);
   gcc_assert (GET_CODE (SET_SRC (set)) == COMPARE);
 
@@ -22393,6 +22389,26 @@  ix86_split_copysign_var (rtx operands[])
   return GET_MODE (SET_SRC (set)) == set_mode;
 }
 
+/* Return TRUE or FALSE depending on whether the first SET in INSN
+   has source and destination with matching CC modes, and that the
+   CC mode is at least as constrained as REQ_MODE.  */
+
+bool
+ix86_match_ccmode (rtx insn, machine_mode req_mode)
+{
+  return ix86_match_ccmode_1 (insn, req_mode, false);
+}
+
+/* Return TRUE or FALSE depending on whether the last SET in INSN
+   has source and destination with matching CC modes, and that the
+   CC mode is at least as constrained as REQ_MODE.  */
+
+bool
+ix86_match_ccmode_last (rtx insn, machine_mode req_mode)
+{
+  return ix86_match_ccmode_1 (insn, req_mode, true);
+}
+
 /* Generate insn patterns to do an integer compare of OPERANDS.  */
 
 static rtx
@@ -52043,6 +52059,8 @@  ix86_run_selftests (void)
 #undef TARGET_ADDRESS_COST
 #define TARGET_ADDRESS_COST ix86_address_cost
 
+#undef TARGET_FLAGS_REGNUM
+#define TARGET_FLAGS_REGNUM FLAGS_REG
 #undef TARGET_FIXED_CONDITION_CODE_REGS
 #define TARGET_FIXED_CONDITION_CODE_REGS ix86_fixed_condition_code_regs
 #undef TARGET_CC_MODES_COMPATIBLE
Index: i386.md
===================================================================
--- i386.md	(revision 247793)
+++ i386.md	(working copy)
@@ -5917,6 +5917,52 @@ 
 	(const_string "*")))
    (set_attr "mode" "<MODE>")])
 
+(define_insn "*add<mode>_2b"
+  [(set (match_operand:SWI 0 "nonimmediate_operand" "=<r>,<r>m,<r>")
+	(plus:SWI
+	  (match_operand:SWI 1 "nonimmediate_operand" "%0,0,<r>")
+	  (match_operand:SWI 2 "<general_operand>" "<g>,<r><i>,0")))
+   (set (reg FLAGS_REG)
+	(compare
+	  (plus:SWI (match_dup 1) (match_dup 2))
+	  (const_int 0)))]
+  "ix86_match_ccmode_last (insn, CCGOCmode)
+   && ix86_binary_operator_ok (PLUS, <MODE>mode, operands)
+   && reload_completed"
+{
+  switch (get_attr_type (insn))
+    {
+    case TYPE_INCDEC:
+      if (operands[2] == const1_rtx)
+	return "inc{<imodesuffix>}\t%0";
+      else
+	{
+	  gcc_assert (operands[2] == constm1_rtx);
+	  return "dec{<imodesuffix>}\t%0";
+	}
+
+    default:
+      if (which_alternative == 2)
+	std::swap (operands[1], operands[2]);
+        
+      gcc_assert (rtx_equal_p (operands[0], operands[1]));
+      if (x86_maybe_negate_const_int (&operands[2], <MODE>mode))
+	return "sub{<imodesuffix>}\t{%2, %0|%0, %2}";
+
+      return "add{<imodesuffix>}\t{%2, %0|%0, %2}";
+    }
+}
+  [(set (attr "type")
+     (if_then_else (match_operand:SWI 2 "incdec_operand")
+	(const_string "incdec")
+	(const_string "alu")))
+   (set (attr "length_immediate")
+      (if_then_else
+	(and (eq_attr "type" "alu") (match_operand 2 "const128_operand"))
+	(const_string "1")
+	(const_string "*")))
+   (set_attr "mode" "<MODE>")])
+
 ;; See comment for addsi_1_zext why we do use nonimmediate_operand
 (define_insn "*addsi_2_zext"
   [(set (reg FLAGS_REG)
@@ -6568,6 +6614,22 @@ 
   [(set_attr "type" "alu")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "*sub<mode>_2b"
+  [(set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m,<r>")
+	(minus:SWI
+	  (match_operand:SWI 1 "nonimmediate_operand" "0,0")
+	  (match_operand:SWI 2 "<general_operand>" "<r><i>,<r>m")))
+   (set (reg FLAGS_REG)
+	(compare
+	  (minus:SWI (match_dup 1) (match_dup 2))
+	  (const_int 0)))]
+  "ix86_match_ccmode_last (insn, CCGOCmode)
+   && ix86_binary_operator_ok (MINUS, <MODE>mode, operands)
+   && reload_completed"
+  "sub{<imodesuffix>}\t{%2, %0|%0, %2}"
+  [(set_attr "type" "alu")
+   (set_attr "mode" "<MODE>")])
+
 (define_insn "*subsi_2_zext"
   [(set (reg FLAGS_REG)
 	(compare
@@ -8476,6 +8538,21 @@ 
   [(set_attr "type" "alu")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "*and<mode>_2b"
+  [(set (match_operand:SWI124 0 "nonimmediate_operand" "=<r>,<r>m")
+	(and:SWI124
+	  (match_operand:SWI124 1 "nonimmediate_operand" "%0,0")
+	  (match_operand:SWI124 2 "<general_operand>" "<g>,<r><i>")))
+   (set (reg FLAGS_REG)
+	(compare (and:SWI124 (match_dup 1) (match_dup 2))
+		 (const_int 0)))]
+  "ix86_match_ccmode_last (insn, CCNOmode)
+   && ix86_binary_operator_ok (AND, <MODE>mode, operands)
+   && reload_completed"
+  "and{<imodesuffix>}\t{%2, %0|%0, %2}"
+  [(set_attr "type" "alu")
+   (set_attr "mode" "<MODE>")])
+
 (define_insn "*andqi_2_slp"
   [(set (reg FLAGS_REG)
 	(compare (and:QI
@@ -8822,6 +8899,21 @@ 
   [(set_attr "type" "alu")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "*<code><mode>_2b"
+  [(set (match_operand:SWI124 0 "nonimmediate_operand" "=<r>,<r>m")
+	(any_or:SWI124
+	  (match_operand:SWI124 1 "nonimmediate_operand" "%0,0")
+	  (match_operand:SWI124 2 "<general_operand>" "<g>,<r><i>")))
+   (set (reg FLAGS_REG)
+	(compare (any_or:SWI124 (match_dup 1) (match_dup 2))
+		 (const_int 0)))]
+  "ix86_match_ccmode_last (insn, CCNOmode)
+   && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)
+   && reload_completed"
+  "<logic>{<imodesuffix>}\t{%2, %0|%0, %2}"
+  [(set_attr "type" "alu")
+   (set_attr "mode" "<MODE>")])
+
 ;; See comment for addsi_1_zext why we do use nonimmediate_operand
 ;; ??? Special case for immediate operand is missing - it is tricky.
 (define_insn "*<code>si_2_zext"