From patchwork Sat Aug 20 21:16:04 2011
X-Patchwork-Submitter: Uros Bizjak
X-Patchwork-Id: 110788
References: <20110819132236.GD2687@tyan-ft48-01.lab.bos.redhat.com>
Date: Sat, 20 Aug 2011 23:16:04 +0200
Subject: Re: [PATCH, testsuite, i386] BMI2 support for GCC
From: Uros Bizjak
To: Kirill Yukhin
Cc: "H.J. Lu", Jakub Jelinek, gcc-patches List

On Sat, Aug 20, 2011 at 2:09 PM, Uros Bizjak wrote:

> Don't expand RORX through ix86_expand_binary_operator, generate it
> directly from expander. You are complicating things with splitters too
> much!
>
> I will rewrite this part of i386.md.

So, attached RFC patch handles BMI2 mul, shift and ror stuff.

Some remarks:

- M and N register modifiers are added to print low and high register
  of a double word register pair. This is needed for mulx insn.
- ishiftx and rotatex instruction type attributes are added.
- "w" mode attribute is added to add register prefix for word mode.
  This is needed to output QImode count register of shift insns.
- mulx is expanded directly from expander, IMO it is always a win to
  generate this insn if available.
- Yb register constraint is added to conditionally enable generation
  of BMI alternatives in generic shift and rotate patterns. The BMI
  variant is generated only if RA chooses it as the most profitable
  alternative.
- shift and rotate instructions are split post-reload from generic
  patterns to strip flags clobber.
- zero-extended 64bit variants are also handled for shift and rotate
  insns.
- rotate right AND rotate left instructions are handled through rorx.

2011-08-20  Uros Bizjak

        * config/i386/i386.md (type): Add ishiftx and rotatex.
        (length_immediate): Handle ishiftx and rotatex.
        (imm_disp): Ditto.
        (w): New mode attribute.
        (mul3): Split from mul3.
        (umul3): Ditto.  Generate bmi2_umul3_1 pattern for TARGET_BMI2.
        (bmi2_umul3_1): New insn pattern.
        (*bmi2_ashl3_1): New insn pattern.
        (*ashl3_1): Add ishiftx BMI2 alternative.
        (*ashl3_1 splitter): New splitter to avoid flags dependency.
        (*bmi2_ashlsi3_1_zext): New insn pattern.
        (*ashlsi3_1_zext): Add ishiftx BMI2 alternative.
        (*ashlsi3_1_zext splitter): New splitter to avoid flags dependency.
        (*bmi2_3_1): New insn pattern.
        (*3_1): Add ishiftx BMI2 alternative.
        (*3_1 splitter): New splitter to avoid flags dependency.
        (*bmi2_si3_1_zext): New insn pattern.
        (*si3_1_zext): Add ishiftx BMI2 alternative.
        (*si3_1_zext splitter): New splitter to avoid flags dependency.
        (*bmi2_rorx3_1): New insn pattern.
        (*3_1): Add rotatex BMI2 alternative.
        (*rotate3_1 splitter): New splitter to avoid flags dependency.
        (*rotatert3_1 splitter): Ditto.
        (*bmi2_rorxsi3_1_zext): New insn pattern.
        (*si3_1_zext): Add rotatex BMI2 alternative.
        (*rotatesi3_1_zext splitter): New splitter to avoid flags dependency.
        (*rotatertsi3_1_zext splitter): Ditto.
        * config/i386/constraints.md (Yb): New register constraint.
        * config/i386/i386.c (print_reg): Handle 'M' and 'N' modifiers.
        (print_operand): Ditto.

The patch is currently in RFC/RFT state, since I have no way to
properly test it. The patch bootstraps OK and regression test is clean
on x86_64-pc-linux-gnu {,-m32}. I tested the patch lightly on provided
testcases, so expected patterns are generated.

Oh, and all insn constraints should be changed from TARGET_BMI to
TARGET_BMI2.

Uros.

Index: i386.md
===================================================================
--- i386.md (revision 177925)
+++ i386.md (working copy)
@@ -50,6 +50,8 @@
 ;; t -- likewise, print the V8SFmode name of the register.
 ;; h -- print the QImode name for a "high" register, either ah, bh, ch or dh.
 ;; y -- print "st(0)" instead of "st" as a register.
+;; M -- print the low register of a double word register pair.
+;; N -- print the high register of a double word register pair.
 ;; d -- print duplicated register operand for AVX instruction.
 ;; D -- print condition for SSE cmp instruction.
 ;; P -- if PIC, print an @PLT suffix.
@@ -377,7 +379,7 @@
 (define_attr "type"
   "other,multi,
    alu,alu1,negnot,imov,imovx,lea,
-   incdec,ishift,ishift1,rotate,rotate1,imul,idiv,
+   incdec,ishift,ishiftx,ishift1,rotate,rotatex,rotate1,imul,idiv,
    icmp,test,ibr,setcc,icmov,
    push,pop,call,callv,leave,
    str,bitmanip,
@@ -414,8 +416,8 @@
            (const_int 0)
          (eq_attr "unit" "i387,sse,mmx")
            (const_int 0)
-         (eq_attr "type" "alu,alu1,negnot,imovx,ishift,rotate,ishift1,rotate1,
-                          imul,icmp,push,pop")
+         (eq_attr "type" "alu,alu1,negnot,imovx,ishift,ishiftx,ishift1,
+                          rotate,rotatex,rotate1,imul,icmp,push,pop")
            (symbol_ref "ix86_attr_length_immediate_default (insn, true)")
          (eq_attr "type" "imov,test")
            (symbol_ref "ix86_attr_length_immediate_default (insn, false)")
@@ -675,7 +677,7 @@
               (and (match_operand 0 "memory_displacement_operand" "")
                    (match_operand 1 "immediate_operand" "")))
            (const_string "true")
-         (and (eq_attr "type" "alu,ishift,rotate,imul,idiv")
+         (and (eq_attr "type" "alu,ishift,ishiftx,rotate,rotatex,imul,idiv")
               (and (match_operand 0 "memory_displacement_operand" "")
                    (match_operand 2 "immediate_operand" "")))
            (const_string "true")
@@ -947,6 +949,9 @@
 ;; Instruction suffix for REX 64bit operators.
 (define_mode_attr rex64suffix [(SI "") (DI "{q}")])

+;; Register prefix for word mode.
+(define_mode_attr w [(SI "k") (DI "q")])
+
 ;; This mode iterator allows :P to be used for patterns that operate on
 ;; pointer-sized quantities. Exactly one of the two alternatives will match.
 (define_mode_iterator P [(SI "Pmode == SImode") (DI "Pmode == DImode")])
@@ -6830,15 +6835,34 @@
    (set_attr "bdver1_decode" "direct")
    (set_attr "mode" "QI")])

-(define_expand "mul3"
+(define_expand "mul3"
   [(parallel [(set (match_operand: 0 "register_operand" "")
                    (mult:
-                     (any_extend:
+                     (sign_extend:
                        (match_operand:DWIH 1 "nonimmediate_operand" ""))
-                     (any_extend:
+                     (sign_extend:
                        (match_operand:DWIH 2 "register_operand" ""))))
               (clobber (reg:CC FLAGS_REG))])])

+(define_expand "umul3"
+  [(parallel [(set (match_operand: 0 "register_operand" "")
+                   (mult:
+                     (zero_extend:
+                       (match_operand:DWIH 1 "nonimmediate_operand" ""))
+                     (zero_extend:
+                       (match_operand:DWIH 2 "register_operand" ""))))
+              (clobber (reg:CC FLAGS_REG))])]
+  ""
+{
+  if (TARGET_BMI)
+    {
+      emit_insn (gen_bmi2_umul3_1 (operands[0],
+                                   operands[1],
+                                   operands[2]));
+      DONE;
+    }
+})
+
 (define_expand "mulqihi3"
   [(parallel [(set (match_operand:HI 0 "register_operand" "")
                    (mult:HI
@@ -6849,6 +6873,20 @@
               (clobber (reg:CC FLAGS_REG))])]
   "TARGET_QIMODE_MATH")

+(define_insn "bmi2_umul3_1"
+  [(set (match_operand: 0 "register_operand" "=r")
+        (mult:
+          (zero_extend:
+            (match_operand:DWIH 1 "nonimmediate_operand" "%d"))
+          (zero_extend:
+            (match_operand:DWIH 2 "nonimmediate_operand" "rm"))))]
+  "TARGET_BMI
+   && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
+  "mulx\t{%2, %M0, %N0|%N0, %M0, %2}"
+  [(set_attr "type" "imul")
+   (set_attr "prefix" "vex")
+   (set_attr "mode" "")])
+
 (define_insn "*mul3_1"
   [(set (match_operand: 0 "register_operand" "=A")
         (mult:
@@ -9056,16 +9094,26 @@
   [(set_attr "type" "ishift")
    (set_attr "mode" "")])

+(define_insn "*bmi2_ashl3_1"
+  [(set (match_operand:SWI48 0 "register_operand" "=r")
+        (ashift:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "rm")
+                      (match_operand:QI 2 "register_operand" "r")))]
+  "TARGET_BMI"
+  "salx\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "ishiftx")
+   (set_attr "mode" "")])
+
 (define_insn "*ashl3_1"
-  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r")
-        (ashift:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "0,l")
-                      (match_operand:QI 2 "nonmemory_operand" "c,M")))
+  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,Yb")
+        (ashift:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "0,l,mYb")
+                      (match_operand:QI 2 "nonmemory_operand" "c,M,Yb")))
    (clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (ASHIFT, mode, operands)"
 {
   switch (get_attr_type (insn))
     {
     case TYPE_LEA:
+    case TYPE_ISHIFTX:
       return "#";

     case TYPE_ALU:
@@ -9084,6 +9132,8 @@
   [(set (attr "type")
      (cond [(eq_attr "alternative" "1")
               (const_string "lea")
+            (eq_attr "alternative" "2")
+              (const_string "ishiftx")
             (and (and (ne (symbol_ref "TARGET_DOUBLE_WITH_ADD")
                           (const_int 0))
                       (match_operand 0 "register_operand" ""))
@@ -9102,17 +9152,39 @@
         (const_string "*")))
    (set_attr "mode" "")])

+;; Convert shift to the shiftx pattern to avoid flags dependency.
+(define_split
+  [(set (match_operand:SWI48 0 "register_operand" "")
+        (ashift:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "")
+                      (match_operand:QI 2 "register_operand" "")))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_BMI && reload_completed
+   && true_regnum (operands[0]) != true_regnum (operands[1])"
+  [(set (match_dup 0)
+        (ashift:SWI48 (match_dup 1) (match_dup 2)))])
+
+(define_insn "*bmi2_ashlsi3_1_zext"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+        (zero_extend:DI
+          (ashift:SI (match_operand:SI 1 "nonimmediate_operand" "rm")
+                     (match_operand:QI 2 "register_operand" "r"))))]
+  "TARGET_64BIT && TARGET_BMI"
+  "salx\t{%k2, %1, %k0|%k0, %1, %k2}"
+  [(set_attr "type" "ishiftx")
+   (set_attr "mode" "SI")])
+
 (define_insn "*ashlsi3_1_zext"
-  [(set (match_operand:DI 0 "register_operand" "=r,r")
+  [(set (match_operand:DI 0 "register_operand" "=r,r,Yb")
         (zero_extend:DI
-          (ashift:SI (match_operand:SI 1 "register_operand" "0,l")
-                     (match_operand:QI 2 "nonmemory_operand" "cI,M"))))
+          (ashift:SI (match_operand:SI 1 "nonimmediate_operand" "0,l,mYb")
+                     (match_operand:QI 2 "nonmemory_operand" "cI,M,Yb"))))
    (clobber (reg:CC FLAGS_REG))]
   "TARGET_64BIT && ix86_binary_operator_ok (ASHIFT, SImode, operands)"
 {
   switch (get_attr_type (insn))
     {
     case TYPE_LEA:
+    case TYPE_ISHIFTX:
       return "#";

     case TYPE_ALU:
@@ -9130,6 +9202,8 @@
   [(set (attr "type")
      (cond [(eq_attr "alternative" "1")
               (const_string "lea")
+            (eq_attr "alternative" "2")
+              (const_string "ishiftx")
             (and (ne (symbol_ref "TARGET_DOUBLE_WITH_ADD")
                      (const_int 0))
                  (match_operand 2 "const1_operand" ""))
@@ -9147,6 +9221,18 @@
         (const_string "*")))
    (set_attr "mode" "SI")])

+;; Convert shift to the shiftx pattern to avoid flags dependency.
+(define_split
+  [(set (match_operand:DI 0 "register_operand" "")
+        (zero_extend:DI
+          (ashift:SI (match_operand:SI 1 "nonimmediate_operand" "")
+                     (match_operand:QI 2 "register_operand" ""))))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_64BIT && TARGET_BMI && reload_completed
+   && true_regnum (operands[0]) != true_regnum (operands[1])"
+  [(set (match_dup 0)
+        (zero_extend:DI (ashift:SI (match_dup 1) (match_dup 2))))])
+
 (define_insn "*ashlhi3_1"
   [(set (match_operand:HI 0 "nonimmediate_operand" "=rm")
         (ashift:HI (match_operand:HI 1 "nonimmediate_operand" "0")
@@ -9763,20 +9849,37 @@
   DONE;
 })

+(define_insn "*bmi2_3_1"
+  [(set (match_operand:SWI48 0 "register_operand" "=r")
+        (any_shiftrt:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "rm")
+                           (match_operand:QI 2 "register_operand" "r")))]
+  "TARGET_BMI"
+  "x\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "ishiftx")
+   (set_attr "mode" "")])
+
 (define_insn "*3_1"
-  [(set (match_operand:SWI 0 "nonimmediate_operand" "=m")
-        (any_shiftrt:SWI (match_operand:SWI 1 "nonimmediate_operand" "0")
-                         (match_operand:QI 2 "nonmemory_operand" "c")))
+  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,Yb")
+        (any_shiftrt:SWI48
+          (match_operand:SWI48 1 "nonimmediate_operand" "0,mYb")
+          (match_operand:QI 2 "nonmemory_operand" "c,Yb")))
    (clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (, mode, operands)"
 {
-  if (operands[2] == const1_rtx
-      && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
-    return "{}\t%0";
-  else
-    return "{}\t{%2, %0|%0, %2}";
+  switch (get_attr_type (insn))
+    {
+    case TYPE_ISHIFTX:
+      return "#";
+
+    default:
+      if (operands[2] == const1_rtx
+          && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
+        return "{}\t%0";
+      else
+        return "{}\t{%2, %0|%0, %2}";
+    }
 }
-  [(set_attr "type" "ishift")
+  [(set_attr "type" "ishift,ishiftx")
    (set (attr "length_immediate")
      (if_then_else
        (and (match_operand 2 "const1_operand" "")
@@ -9786,19 +9889,83 @@
         (const_string "*")))
    (set_attr "mode" "")])

-(define_insn "*si3_1_zext"
+;; Convert shift to the shiftx pattern to avoid flags dependency.
+(define_split
+  [(set (match_operand:SWI48 0 "register_operand" "")
+        (any_shiftrt:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "")
+                           (match_operand:QI 2 "register_operand" "")))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_BMI && reload_completed
+   && true_regnum (operands[0]) != true_regnum (operands[1])"
+  [(set (match_dup 0)
+        (any_shiftrt:SWI48 (match_dup 1) (match_dup 2)))])
+
+(define_insn "*bmi2_si3_1_zext"
   [(set (match_operand:DI 0 "register_operand" "=r")
         (zero_extend:DI
-          (any_shiftrt:SI (match_operand:SI 1 "register_operand" "0")
-                          (match_operand:QI 2 "nonmemory_operand" "cI"))))
+          (any_shiftrt:SI (match_operand:SI 1 "nonimmediate_operand" "rm")
+                          (match_operand:QI 2 "register_operand" "r"))))]
+  "TARGET_64BIT && TARGET_BMI"
+  "x\t{%k2, %1, %k0|%k0, %1, %k2}"
+  [(set_attr "type" "ishiftx")
+   (set_attr "mode" "SI")])
+
+(define_insn "*si3_1_zext"
+  [(set (match_operand:DI 0 "register_operand" "=r,Yb")
+        (zero_extend:DI
+          (any_shiftrt:SI (match_operand:SI 1 "nonimmediate_operand" "0,mYb")
+                          (match_operand:QI 2 "nonmemory_operand" "cI,Yb"))))
    (clobber (reg:CC FLAGS_REG))]
   "TARGET_64BIT && ix86_binary_operator_ok (, SImode, operands)"
 {
+  switch (get_attr_type (insn))
+    {
+    case TYPE_ISHIFTX:
+      return "#";
+
+    default:
+      if (operands[2] == const1_rtx
+          && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
+        return "{l}\t%k0";
+      else
+        return "{l}\t{%2, %k0|%k0, %2}";
+    }
+}
+  [(set_attr "type" "ishift,ishiftx")
+   (set (attr "length_immediate")
+     (if_then_else
+       (and (match_operand 2 "const1_operand" "")
+            (ne (symbol_ref "TARGET_SHIFT1 || optimize_function_for_size_p (cfun)")
+                (const_int 0)))
+       (const_string "0")
+       (const_string "*")))
+   (set_attr "mode" "SI")])
+
+;; Convert shift to the shiftx pattern to avoid flags dependency.
+(define_split
+  [(set (match_operand:DI 0 "register_operand" "")
+        (zero_extend:DI
+          (any_shiftrt:SI (match_operand:SI 1 "nonimmediate_operand" "")
+                          (match_operand:QI 2 "register_operand" ""))))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_64BIT && TARGET_BMI && reload_completed
+   && true_regnum (operands[0]) != true_regnum (operands[1])"
+  [(set (match_dup 0)
+        (zero_extend:DI (any_shiftrt:SI (match_dup 1) (match_dup 2))))])
+
+(define_insn "*3_1"
+  [(set (match_operand:SWI12 0 "nonimmediate_operand" "=m")
+        (any_shiftrt:SWI12
+          (match_operand:SWI12 1 "nonimmediate_operand" "0")
+          (match_operand:QI 2 "nonmemory_operand" "c")))
+   (clobber (reg:CC FLAGS_REG))]
+  "ix86_binary_operator_ok (, mode, operands)"
+{
   if (operands[2] == const1_rtx
       && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
-    return "{l}\t%k0";
+    return "{}\t%0";
   else
-    return "{l}\t{%2, %k0|%k0, %2}";
+    return "{}\t{%2, %0|%0, %2}";
 }
   [(set_attr "type" "ishift")
    (set (attr "length_immediate")
@@ -9808,7 +9975,7 @@
             (const_int 0)))
        (const_string "0")
        (const_string "*")))
-   (set_attr "mode" "SI")])
+   (set_attr "mode" "")])

 (define_insn "*qi3_1_slp"
   [(set (strict_low_part (match_operand:QI 0 "nonimmediate_operand" "+qm"))
@@ -10060,42 +10227,153 @@
   split_double_mode (mode, &operands[0], 1, &operands[4], &operands[5]);
 })

+(define_insn "*bmi2_rorx3_1"
+  [(set (match_operand:SWI48 0 "register_operand" "=r")
+        (rotatert:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "rm")
+                        (match_operand:QI 2 "immediate_operand" "")))]
+  "TARGET_BMI"
+  "rorx\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "rotatex")
+   (set_attr "mode" "")])
+
 (define_insn "*3_1"
-  [(set (match_operand:SWI 0 "nonimmediate_operand" "=m")
-        (any_rotate:SWI (match_operand:SWI 1 "nonimmediate_operand" "0")
-                        (match_operand:QI 2 "nonmemory_operand" "c")))
+  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,Yb")
+        (any_rotate:SWI48
+          (match_operand:SWI48 1 "nonimmediate_operand" "0,mYb")
+          (match_operand:QI 2 "nonmemory_operand" "c,")))
    (clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (, mode, operands)"
 {
-  if (operands[2] == const1_rtx
-      && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
-    return "{}\t%0";
-  else
-    return "{}\t{%2, %0|%0, %2}";
+  switch (get_attr_type (insn))
+    {
+    case TYPE_ROTATEX:
+      return "#";
+
+    default:
+      if (operands[2] == const1_rtx
+          && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
+        return "{}\t%0";
+      else
+        return "{}\t{%2, %0|%0, %2}";
+    }
 }
-  [(set_attr "type" "rotate")
+  [(set_attr "type" "rotate,rotatex")
    (set (attr "length_immediate")
      (if_then_else
-       (and (match_operand 2 "const1_operand" "")
-            (ne (symbol_ref "TARGET_SHIFT1 || optimize_function_for_size_p (cfun)")
-                (const_int 0)))
+       (and (eq_attr "type" "rotate")
+            (and (match_operand 2 "const1_operand" "")
+                 (ne (symbol_ref "TARGET_SHIFT1 || optimize_function_for_size_p (cfun)")
+                     (const_int 0))))
        (const_string "0")
        (const_string "*")))
    (set_attr "mode" "")])

-(define_insn "*si3_1_zext"
+;; Convert rotate to the rotatex pattern to avoid flags dependency.
+(define_split
+  [(set (match_operand:SWI48 0 "register_operand" "")
+        (rotate:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "")
+                      (match_operand:QI 2 "immediate_operand" "")))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_BMI && reload_completed
+   && true_regnum (operands[0]) != true_regnum (operands[1])"
+  [(set (match_dup 0)
+        (rotatert:SWI48 (match_dup 1) (match_dup 2)))]
+{
+  operands[2]
+    = GEN_INT (GET_MODE_BITSIZE (mode) - INTVAL (operands[2]));
+})
+
+(define_split
+  [(set (match_operand:SWI48 0 "register_operand" "")
+        (rotatert:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "")
+                        (match_operand:QI 2 "immediate_operand" "")))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_BMI && reload_completed
+   && true_regnum (operands[0]) != true_regnum (operands[1])"
+  [(set (match_dup 0)
+        (rotatert:SWI48 (match_dup 1) (match_dup 2)))])
+
+(define_insn "*bmi2_rorxsi3_1_zext"
   [(set (match_operand:DI 0 "register_operand" "=r")
         (zero_extend:DI
-          (any_rotate:SI (match_operand:SI 1 "register_operand" "0")
-                         (match_operand:QI 2 "nonmemory_operand" "cI"))))
+          (rotatert:SI (match_operand:SI 1 "nonimmediate_operand" "rm")
+                       (match_operand:QI 2 "immediate_operand" "I"))))]
+  "TARGET_64BIT && TARGET_BMI"
+  "rorx\t{%2, %1, %k0|%k0, %1, %2}"
+  [(set_attr "type" "rotatex")
+   (set_attr "mode" "SI")])
+
+(define_insn "*si3_1_zext"
+  [(set (match_operand:DI 0 "register_operand" "=r,Yb")
+        (zero_extend:DI
+          (any_rotate:SI (match_operand:SI 1 "nonimmediate_operand" "0,mYb")
+                         (match_operand:QI 2 "nonmemory_operand" "cI,I"))))
    (clobber (reg:CC FLAGS_REG))]
   "TARGET_64BIT && ix86_binary_operator_ok (, SImode, operands)"
 {
-  if (operands[2] == const1_rtx
-      && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
-    return "{l}\t%k0";
+  switch (get_attr_type (insn))
+    {
+    case TYPE_ROTATEX:
+      return "#";
+
+    default:
+      if (operands[2] == const1_rtx
+          && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
+        return "{l}\t%k0";
+      else
+        return "{l}\t{%2, %k0|%k0, %2}";
+    }
+}
+  [(set_attr "type" "rotate,rotatex")
+   (set (attr "length_immediate")
+     (if_then_else
+       (and (eq_attr "type" "rotate")
+            (and (match_operand 2 "const1_operand" "")
+                 (ne (symbol_ref "TARGET_SHIFT1 || optimize_function_for_size_p (cfun)")
+                     (const_int 0))))
+       (const_string "0")
+       (const_string "*")))
+   (set_attr "mode" "SI")])
+
+;; Convert rotate to the rotatex pattern to avoid flags dependency.
+(define_split
+  [(set (match_operand:DI 0 "register_operand" "")
+        (zero_extend:DI
+          (rotate:SI (match_operand:SI 1 "nonimmediate_operand" "")
+                     (match_operand:QI 2 "immediate_operand" ""))))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_64BIT && TARGET_BMI && reload_completed
+   && true_regnum (operands[0]) != true_regnum (operands[1])"
+  [(set (match_dup 0)
+        (zero_extend:DI (rotatert:SI (match_dup 1) (match_dup 2))))]
+{
+  operands[2]
+    = GEN_INT (GET_MODE_BITSIZE (SImode) - INTVAL (operands[2]));
+})
+
+(define_split
+  [(set (match_operand:DI 0 "register_operand" "")
+        (zero_extend:DI
+          (rotatert:SI (match_operand:SI 1 "nonimmediate_operand" "")
+                       (match_operand:QI 2 "immediate_operand" ""))))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_64BIT && TARGET_BMI && reload_completed
+   && true_regnum (operands[0]) != true_regnum (operands[1])"
+  [(set (match_dup 0)
+        (zero_extend:DI (rotatert:SI (match_dup 1) (match_dup 2))))])
+
+(define_insn "*3_1"
+  [(set (match_operand:SWI12 0 "nonimmediate_operand" "=m")
+        (any_rotate:SWI12 (match_operand:SWI12 1 "nonimmediate_operand" "0")
+                          (match_operand:QI 2 "nonmemory_operand" "c")))
+   (clobber (reg:CC FLAGS_REG))]
+  "ix86_binary_operator_ok (, mode, operands)"
+{
+  if (operands[2] == const1_rtx
+      && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
+    return "{}\t%0";
   else
-    return "{l}\t{%2, %k0|%k0, %2}";
+    return "{}\t{%2, %0|%0, %2}";
 }
   [(set_attr "type" "rotate")
    (set (attr "length_immediate")
@@ -10105,7 +10383,7 @@
             (const_int 0)))
        (const_string "0")
        (const_string "*")))
-   (set_attr "mode" "SI")])
+   (set_attr "mode" "")])

 (define_insn "*qi3_1_slp"
   [(set (strict_low_part (match_operand:QI 0 "nonimmediate_operand" "+qm"))
Index: constraints.md
===================================================================
--- constraints.md (revision 177925)
+++ constraints.md (working copy)
@@ -92,6 +92,7 @@
 ;; m MMX inter-unit moves enabled
 ;; d Integer register when integer DFmode moves are enabled
 ;; x Integer register when integer XFmode moves are enabled
+;; b Integer register when BMI2 instructions are enabled

 (define_register_constraint "Yz" "TARGET_SSE ? SSE_FIRST_REG : NO_REGS"
  "First SSE register (@code{%xmm0}).")
@@ -123,6 +124,10 @@
   "optimize_function_for_speed_p (cfun) ? GENERAL_REGS : NO_REGS"
   "@internal Any integer register when integer XFmode moves are enabled.")

+(define_register_constraint "Yb"
+  "TARGET_BMI ? GENERAL_REGS : NO_REGS"
+  "@internal Any integer register, when BMI2 is enabled.")
+
 (define_constraint "z"
   "@internal Constant call address operand."
   (match_operand 0 "constant_call_address_operand"))
Index: i386.c
===================================================================
--- i386.c (revision 177928)
+++ i386.c (working copy)
@@ -13285,6 +13285,8 @@ put_condition_code (enum rtx_code code, enum machi
    If CODE is 't', pretend the mode is V8SFmode.
    If CODE is 'h', pretend the reg is the 'high' byte register.
    If CODE is 'y', print "st(0)" instead of "st", if the reg is stack op.
+   If CODE is 'M', print the low register of a double word register pair.
+   If CODE is 'N', print the high register of a double word register pair.
    If CODE is 'd', duplicate the operand for AVX instruction.
  */

@@ -13327,6 +13329,18 @@ print_reg (rtx x, int code, FILE *file)
     code = 16;
   else if (code == 't')
     code = 32;
+  else if (code == 'M')
+    {
+      gcc_assert (GET_MODE (x) == GET_MODE_WIDER_MODE (word_mode));
+      x = gen_lowpart (word_mode, x);
+      code = GET_MODE_SIZE (word_mode);
+    }
+  else if (code == 'N')
+    {
+      gcc_assert (GET_MODE (x) == GET_MODE_WIDER_MODE (word_mode));
+      x = gen_highpart (word_mode, x);
+      code = GET_MODE_SIZE (word_mode);
+    }
   else
     code = GET_MODE_SIZE (GET_MODE (x));

@@ -13472,6 +13486,8 @@ get_some_local_dynamic_name (void)
    t -- likewise, print the V8SFmode name of the register.
    h -- print the QImode name for a "high" register, either ah, bh, ch or dh.
    y -- print "st(0)" instead of "st" as a register.
+   M -- print the low register of a double word register pair.
+   N -- print the high register of a double word register pair.
    d -- print duplicated register operand for AVX instruction.
    D -- print condition for SSE cmp instruction.
    P -- if PIC, print an @PLT suffix.
@@ -13678,6 +13694,8 @@ ix86_print_operand (FILE *file, rtx x, int code)
         case 'h':
         case 't':
         case 'y':
+        case 'M':
+        case 'N':
         case 'x':
         case 'X':
         case 'P':
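
A note on the rotate splitters in the patch: they turn a rotate left by
an immediate into a rotate right by (bitsize - count), which is why rorx
alone can cover both directions. The following is only a minimal C sketch
of that identity, not code from the patch. The helper names (rotr32,
rotl32, rotl32_via_rotr, check_rotate_identity) are hypothetical, and the
sketch masks the count to 0..31, which the splitter itself does not do.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate right by COUNT bits; COUNT is reduced modulo 32. */
static uint32_t rotr32(uint32_t x, unsigned count)
{
    count &= 31;
    return count ? (x >> count) | (x << (32 - count)) : x;
}

/* Reference rotate left, written directly. */
static uint32_t rotl32(uint32_t x, unsigned count)
{
    count &= 31;
    return count ? (x << count) | (x >> (32 - count)) : x;
}

/* Rotate left expressed as a rotate right by (32 - count), mirroring
   the GEN_INT (GET_MODE_BITSIZE (SImode) - INTVAL (operands[2]))
   rewrite in the splitter.  The extra mask keeps count == 0 valid in
   this sketch; that handling is an assumption of the example.  */
static uint32_t rotl32_via_rotr(uint32_t x, unsigned count)
{
    return rotr32(x, (32 - (count & 31)) & 31);
}

/* Check the identity for every count on a sample value. */
static void check_rotate_identity(void)
{
    const uint32_t sample = 0x80000001u;
    for (unsigned count = 0; count < 32; count++)
        assert(rotl32(sample, count) == rotl32_via_rotr(sample, count));
}
```

Calling check_rotate_identity() from a test driver exercises all 32
counts; the 64-bit case is the same with 64 in place of 32.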