Patchwork x86: emit tzcnt unconditionally

login
register
mail settings
Submitter Uros Bizjak
Date May 6, 2012, 11:04 p.m.
Message ID <CAFULd4aVBXRK1+SWQdX=4iwg_7XYYJabYam_zU3knkBiPQHWow@mail.gmail.com>
Download mbox | patch
Permalink /patch/157197/
State New
Headers show

Comments

Uros Bizjak - May 6, 2012, 11:04 p.m.
On Mon, Apr 30, 2012 at 10:09 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Fri, Apr 27, 2012 at 3:30 PM, Paolo Bonzini <bonzini@gnu.org> wrote:
>> tzcnt is encoded as "rep;bsf" and unlike lzcnt is a drop-in replacement
>> if we don't care about the flags (it has the same semantics for non-zero
>> values).
>>
>> Since bsf is usually slower, just emit tzcnt unconditionally.  However,
>> write it as rep;bsf unless -mbmi is in use, to cater for old assemblers.
>
> Please emit "rep;bsf" when optimize_insn_for_speed_p () is true.
>
>> Bootstrapped on a non-BMI x86_64-linux host, regtest in progress.
>> Ok for mainline?
>
> OK with the optimize_insn_for_speed_p conditional.

I have committed similar patch, where we emit bsf when optimizing for
size (saving a whopping one byte) and rep;bsf for !TARGET_BMI. The
same functionality can be added to *ffs<mode>_1, since we don't care
what ends in the register for input operand == 0 (this is the key
difference between tzcnt and bsf).

2012-05-06  Uros Bizjak  <ubizjak@gmail.com>
	    Paolo Bonzini  <bonzini@gnu.org>

	* config/i386/i386.md (ctz<mode>2): Emit rep;bsf even for
	!TARGET_BMI and bsf when optimizing for size.
	(*ffs<mode>_1): Ditto.

Tested on x86_64-pc-linux-gnu {,-m32}, committed to mainline SVN.

Uros.
Jakub Jelinek - May 7, 2012, 5:43 a.m.
On Mon, May 07, 2012 at 01:04:33AM +0200, Uros Bizjak wrote:
> Index: i386.md
> ===================================================================
> --- i386.md	(revision 187217)
> +++ i386.md	(working copy)
> @@ -12112,9 +12112,22 @@
>     (set (match_operand:SWI48 0 "register_operand" "=r")
>  	(ctz:SWI48 (match_dup 1)))]
>    ""
> -  "bsf{<imodesuffix>}\t{%1, %0|%0, %1}"
> +{
> +  if (optimize_function_for_size_p (cfun))
> +    return "bsf{<imodesuffix>}\t{%1, %0|%0, %1}";
> +  else if (TARGET_BMI)
> +    return "tzcnt{<imodesuffix>}\t{%1, %0|%0, %1}";
> +  else 
> +    /* tzcnt expands to rep;bsf and we can use it even if !TARGET_BMI.  */
> +    return "rep; bsf{<imodesuffix>}\t{%1, %0|%0, %1}";
> +}

Shouldn't that be done only for generic tuning?  If somebody uses
-mtune=native, then emitting rep; bsf is overkill, the code is intended
to be run on a CPU without (or with TARGET_BMI with) tzcnt insn support.

	Jakub

Patch

Index: i386.md
===================================================================
--- i386.md	(revision 187217)
+++ i386.md	(working copy)
@@ -12112,9 +12112,22 @@ 
    (set (match_operand:SWI48 0 "register_operand" "=r")
 	(ctz:SWI48 (match_dup 1)))]
   ""
-  "bsf{<imodesuffix>}\t{%1, %0|%0, %1}"
+{
+  if (optimize_function_for_size_p (cfun))
+    return "bsf{<imodesuffix>}\t{%1, %0|%0, %1}";
+  else if (TARGET_BMI)
+    return "tzcnt{<imodesuffix>}\t{%1, %0|%0, %1}";
+  else 
+    /* tzcnt expands to rep;bsf and we can use it even if !TARGET_BMI.  */
+    return "rep; bsf{<imodesuffix>}\t{%1, %0|%0, %1}";
+}
   [(set_attr "type" "alu1")
    (set_attr "prefix_0f" "1")
+   (set (attr "prefix_rep")
+     (if_then_else
+       (match_test "optimize_function_for_size_p (cfun)")
+       (const_string "0")
+       (const_string "1")))
    (set_attr "mode" "<MODE>")])
 
 (define_insn "ctz<mode>2"
@@ -12123,14 +12136,21 @@ 
    (clobber (reg:CC FLAGS_REG))]
   ""
 {
-  if (TARGET_BMI)
+  if (optimize_function_for_size_p (cfun))
+    return "bsf{<imodesuffix>}\t{%1, %0|%0, %1}";
+  else if (TARGET_BMI)
     return "tzcnt{<imodesuffix>}\t{%1, %0|%0, %1}";
-  else
-    return "bsf{<imodesuffix>}\t{%1, %0|%0, %1}";
+  else 
+    /* tzcnt expands to rep;bsf and we can use it even if !TARGET_BMI.  */
+    return "rep; bsf{<imodesuffix>}\t{%1, %0|%0, %1}";
 }
   [(set_attr "type" "alu1")
    (set_attr "prefix_0f" "1")
-   (set (attr "prefix_rep") (symbol_ref "TARGET_BMI"))
+   (set (attr "prefix_rep")
+     (if_then_else
+       (match_test "optimize_function_for_size_p (cfun)")
+       (const_string "0")
+       (const_string "1")))
    (set_attr "mode" "<MODE>")])
 
 (define_expand "clz<mode>2"