diff mbox

[i386] : Avoid partial reg stall with arith insn + setCC + movzbl sequence

Message ID CAFULd4Ztr=OOOc8DknWzY0FqXDW_OZKKkuSsO8oO0z2vBayEPA@mail.gmail.com
State New
Headers show

Commit Message

Uros Bizjak March 12, 2012, 8:52 a.m. UTC
Hello!

Attached patch improves setCC + movzbl to xor + setcc peephole2 to
also handle CC setting arithmetic instructions.

2012-03-12  Uros Bizjak  <ubizjak@gmail.com>

	* config/i386/i386.md (setcc + movzbl to xor + setcc peephole2):
	Also convert sequences with CC setting arithmetic instruction.

Tested on x86_64-pc-linux-gnu {, -m32}, committed to mainline SVN.

Uros.

Comments

Paolo Bonzini March 12, 2012, 10:13 a.m. UTC | #1
Il 12/03/2012 09:52, Uros Bizjak ha scritto:
> +(define_peephole2
> +  [(parallel [(set (reg FLAGS_REG) (match_operand 0 "" ""))
> +	      (match_operand 4 "" "")])
> +   (set (match_operand:QI 1 "register_operand" "")
> +	(match_operator:QI 2 "ix86_comparison_operator"
> +	  [(reg FLAGS_REG) (const_int 0)]))
> +   (set (match_operand 3 "q_regs_operand" "")
> +	(zero_extend (match_dup 1)))]
> +  "(peep2_reg_dead_p (3, operands[1])
> +    || operands_match_p (operands[1], operands[3]))
> +   && ! reg_overlap_mentioned_p (operands[3], operands[0])"

I understand that you're assuming the shape of operands[4] to be the
same as operands[3], but would it be preferrable to add another overlap
check on operands[4]?

For example the transformation is invalid if you had an overlap between
operands[3] and the destination of operands[4].

Paolo
Uros Bizjak March 12, 2012, 10:49 a.m. UTC | #2
On Mon, Mar 12, 2012 at 11:13 AM, Paolo Bonzini <bonzini@gnu.org> wrote:
> Il 12/03/2012 09:52, Uros Bizjak ha scritto:
>> +(define_peephole2
>> +  [(parallel [(set (reg FLAGS_REG) (match_operand 0 "" ""))
>> +           (match_operand 4 "" "")])
>> +   (set (match_operand:QI 1 "register_operand" "")
>> +     (match_operator:QI 2 "ix86_comparison_operator"
>> +       [(reg FLAGS_REG) (const_int 0)]))
>> +   (set (match_operand 3 "q_regs_operand" "")
>> +     (zero_extend (match_dup 1)))]
>> +  "(peep2_reg_dead_p (3, operands[1])
>> +    || operands_match_p (operands[1], operands[3]))
>> +   && ! reg_overlap_mentioned_p (operands[3], operands[0])"
>
> I understand that you're assuming the shape of operands[4] to be the
> same as operands[3], but would it be preferrable to add another overlap
> check on operands[4]?
>
> For example the transformation is invalid if you had an overlap between
> operands[3] and the destination of operands[4].

The destination of operands[4] _always_ matches one of operands inside
operand[0]. All arithmetic insn that set flags are destructive on x86.

Uros.
diff mbox

Patch

Index: i386.md
===================================================================
--- i386.md	(revision 185201)
+++ i386.md	(working copy)
@@ -11170,6 +11170,27 @@ 
   ix86_expand_clear (operands[3]);
 })
 
+(define_peephole2
+  [(parallel [(set (reg FLAGS_REG) (match_operand 0 "" ""))
+	      (match_operand 4 "" "")])
+   (set (match_operand:QI 1 "register_operand" "")
+	(match_operator:QI 2 "ix86_comparison_operator"
+	  [(reg FLAGS_REG) (const_int 0)]))
+   (set (match_operand 3 "q_regs_operand" "")
+	(zero_extend (match_dup 1)))]
+  "(peep2_reg_dead_p (3, operands[1])
+    || operands_match_p (operands[1], operands[3]))
+   && ! reg_overlap_mentioned_p (operands[3], operands[0])"
+  [(parallel [(set (match_dup 5) (match_dup 0))
+	      (match_dup 4)])
+   (set (strict_low_part (match_dup 6))
+	(match_dup 2))]
+{
+  operands[5] = gen_rtx_REG (GET_MODE (operands[0]), FLAGS_REG);
+  operands[6] = gen_lowpart (QImode, operands[3]);
+  ix86_expand_clear (operands[3]);
+})
+
 ;; Similar, but match zero extend with andsi3.
 
 (define_peephole2
@@ -11190,6 +11211,28 @@ 
   operands[5] = gen_lowpart (QImode, operands[3]);
   ix86_expand_clear (operands[3]);
 })
+
+(define_peephole2
+  [(parallel [(set (reg FLAGS_REG) (match_operand 0 "" ""))
+	      (match_operand 4 "" "")])
+   (set (match_operand:QI 1 "register_operand" "")
+	(match_operator:QI 2 "ix86_comparison_operator"
+	  [(reg FLAGS_REG) (const_int 0)]))
+   (parallel [(set (match_operand 3 "q_regs_operand" "")
+		   (zero_extend (match_dup 1)))
+	      (clobber (reg:CC FLAGS_REG))])]
+  "(peep2_reg_dead_p (3, operands[1])
+    || operands_match_p (operands[1], operands[3]))
+   && ! reg_overlap_mentioned_p (operands[3], operands[0])"
+  [(parallel [(set (match_dup 5) (match_dup 0))
+	      (match_dup 4)])
+   (set (strict_low_part (match_dup 6))
+	(match_dup 2))]
+{
+  operands[5] = gen_rtx_REG (GET_MODE (operands[0]), FLAGS_REG);
+  operands[6] = gen_lowpart (QImode, operands[3]);
+  ix86_expand_clear (operands[3]);
+})
 
 ;; Call instructions.