From patchwork Sat Apr 14 13:33:31 2012
X-Patchwork-Submitter: Uros Bizjak
X-Patchwork-Id: 152506
Date: Sat, 14 Apr 2012 15:33:31 +0200
Subject: [PATCH, i386]: Macroize horizontal add/sub and integer mac patterns
From: Uros Bizjak
To: gcc-patches@gcc.gnu.org

Hello!

Mechanical patch that removes nearly 900 lines of code from sse.md. No
functional changes.

2012-04-14  Uros Bizjak

        * config/i386/sse.md (ssse3_plusminus): New code iterator.
        (avx2_phwv16hi3): Macroize insn from
        avx2_ph{add,adds,sub,subs}wv16hi3 using ssse3_plusminus code iterator.
        (ssse3_phwv8hi3): Macroize insn from
        ssse3_ph{add,adds,sub,subs}wv8hi3 using ssse3_plusminus code iterator.
        (ssse3_phwv4hi3): Macroize insn from
        ssse3_ph{add,adds,sub,subs}wv4hi3 using ssse3_plusminus code iterator.
        (avx2_phdv8si3): Macroize insn from avx2_ph{add,sub}dv8si3 using
        plusminus code iterator.
        (ssse3_phdv4si3): Macroize insn from ssse3_ph{add,sub}dv4si3 using
        plusminus code iterator.
        (ssse3_phdv2si3): Macroize insn from ssse3_ph{add,sub}dv2si3 using
        plusminus code iterator.
        (xop_plus): New code iterator.
        (macs): New code attribute.
        (madcs): Ditto.
        (xop_p): Macroize insn from xop_pmacs{,s}{ww,dd} using xop_plus code
        iterator and VI24_128 mode iterator.
        (xop_pdql): Macroize insn from xop_pmacs{,s}dql using xop_plus
        code iterator.
        (xop_pdqh): Macroize insn from xop_pmacs{,s}dqh using xop_plus
        code iterator.
        (xop_pwd): Macroize insn from xop_pmacs{,s}wd using xop_plus
        code iterator.
        (xop_pwd): Macroize insn from xop_pmadcs{,s}wd using xop_plus
        code iterator.
        (xop_phaddbw): Macroize insn from xop_phadd{,u}bw using any_extend
        code iterator.
        (xop_phaddbd): Macroize insn from xop_phadd{,u}bd using any_extend
        code iterator.
        (xop_phaddbq): Macroize insn from xop_phadd{,u}bq using any_extend
        code iterator.
        (xop_phaddwd): Macroize insn from xop_phadd{,u}wd using any_extend
        code iterator.
        (xop_phaddwq): Macroize insn from xop_phadd{,u}wq using any_extend
        code iterator.
        (xop_phadddq): Macroize insn from xop_phadd{,u}dq using any_extend
        code iterator.

Tested on x86_64-pc-linux-gnu {,-m32}, committed to mainline SVN.

Uros.

Index: config/i386/sse.md
===================================================================
--- config/i386/sse.md (revision 186448)
+++ config/i386/sse.md (working copy)
@@ -8037,9 +8037,10 @@
 ;; surely not generally useful.
(define_insn "_psadbw" [(set (match_operand:VI8_AVX2 0 "register_operand" "=x,x") - (unspec:VI8_AVX2 [(match_operand: 1 "register_operand" "0,x") - (match_operand: 2 "nonimmediate_operand" "xm,xm")] - UNSPEC_PSADBW))] + (unspec:VI8_AVX2 + [(match_operand: 1 "register_operand" "0,x") + (match_operand: 2 "nonimmediate_operand" "xm,xm")] + UNSPEC_PSADBW))] "TARGET_SSE2" "@ psadbw\t{%2, %0|%0, %2} @@ -8175,375 +8176,125 @@ ;; ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; -(define_insn "avx2_phaddwv16hi3" - [(set (match_operand:V16HI 0 "register_operand" "=x") - (vec_concat:V16HI - (vec_concat:V8HI - (vec_concat:V4HI - (vec_concat:V2HI - (plus:HI - (vec_select:HI - (match_operand:V16HI 1 "register_operand" "x") - (parallel [(const_int 0)])) - (vec_select:HI (match_dup 1) (parallel [(const_int 1)]))) - (plus:HI - (vec_select:HI (match_dup 1) (parallel [(const_int 2)])) - (vec_select:HI (match_dup 1) (parallel [(const_int 3)])))) - (vec_concat:V2HI - (plus:HI - (vec_select:HI (match_dup 1) (parallel [(const_int 4)])) - (vec_select:HI (match_dup 1) (parallel [(const_int 5)]))) - (plus:HI - (vec_select:HI (match_dup 1) (parallel [(const_int 6)])) - (vec_select:HI (match_dup 1) (parallel [(const_int 7)]))))) - (vec_concat:V4HI - (vec_concat:V2HI - (plus:HI - (vec_select:HI (match_dup 1) (parallel [(const_int 8)])) - (vec_select:HI (match_dup 1) (parallel [(const_int 9)]))) - (plus:HI - (vec_select:HI (match_dup 1) (parallel [(const_int 10)])) - (vec_select:HI (match_dup 1) (parallel [(const_int 11)])))) - (vec_concat:V2HI - (plus:HI - (vec_select:HI (match_dup 1) (parallel [(const_int 12)])) - (vec_select:HI (match_dup 1) (parallel [(const_int 13)]))) - (plus:HI - (vec_select:HI (match_dup 1) (parallel [(const_int 14)])) - (vec_select:HI (match_dup 1) (parallel [(const_int 15)])))))) - (vec_concat:V8HI - (vec_concat:V4HI - (vec_concat:V2HI - (plus:HI - (vec_select:HI - (match_operand:V16HI 2 "nonimmediate_operand" "xm") - (parallel [(const_int 0)])) - 
(vec_select:HI (match_dup 2) (parallel [(const_int 1)]))) - (plus:HI - (vec_select:HI (match_dup 2) (parallel [(const_int 2)])) - (vec_select:HI (match_dup 2) (parallel [(const_int 3)])))) - (vec_concat:V2HI - (plus:HI - (vec_select:HI (match_dup 2) (parallel [(const_int 4)])) - (vec_select:HI (match_dup 2) (parallel [(const_int 5)]))) - (plus:HI - (vec_select:HI (match_dup 2) (parallel [(const_int 6)])) - (vec_select:HI (match_dup 2) (parallel [(const_int 7)]))))) - (vec_concat:V4HI - (vec_concat:V2HI - (plus:HI - (vec_select:HI (match_dup 2) (parallel [(const_int 8)])) - (vec_select:HI (match_dup 2) (parallel [(const_int 9)]))) - (plus:HI - (vec_select:HI (match_dup 2) (parallel [(const_int 10)])) - (vec_select:HI (match_dup 2) (parallel [(const_int 11)])))) - (vec_concat:V2HI - (plus:HI - (vec_select:HI (match_dup 2) (parallel [(const_int 12)])) - (vec_select:HI (match_dup 2) (parallel [(const_int 13)]))) - (plus:HI - (vec_select:HI (match_dup 2) (parallel [(const_int 14)])) - (vec_select:HI (match_dup 2) (parallel [(const_int 15)]))))))))] - "TARGET_AVX2" - "vphaddw\t{%2, %1, %0|%0, %1, %2}" - [(set_attr "type" "sseiadd") - (set_attr "prefix_extra" "1") - (set_attr "prefix" "vex") - (set_attr "mode" "OI")]) +(define_code_iterator ssse3_plusminus [plus ss_plus minus ss_minus]) -(define_insn "ssse3_phaddwv8hi3" - [(set (match_operand:V8HI 0 "register_operand" "=x,x") - (vec_concat:V8HI - (vec_concat:V4HI - (vec_concat:V2HI - (plus:HI - (vec_select:HI - (match_operand:V8HI 1 "register_operand" "0,x") - (parallel [(const_int 0)])) - (vec_select:HI (match_dup 1) (parallel [(const_int 1)]))) - (plus:HI - (vec_select:HI (match_dup 1) (parallel [(const_int 2)])) - (vec_select:HI (match_dup 1) (parallel [(const_int 3)])))) - (vec_concat:V2HI - (plus:HI - (vec_select:HI (match_dup 1) (parallel [(const_int 4)])) - (vec_select:HI (match_dup 1) (parallel [(const_int 5)]))) - (plus:HI - (vec_select:HI (match_dup 1) (parallel [(const_int 6)])) - (vec_select:HI (match_dup 1) 
(parallel [(const_int 7)]))))) - (vec_concat:V4HI - (vec_concat:V2HI - (plus:HI - (vec_select:HI - (match_operand:V8HI 2 "nonimmediate_operand" "xm,xm") - (parallel [(const_int 0)])) - (vec_select:HI (match_dup 2) (parallel [(const_int 1)]))) - (plus:HI - (vec_select:HI (match_dup 2) (parallel [(const_int 2)])) - (vec_select:HI (match_dup 2) (parallel [(const_int 3)])))) - (vec_concat:V2HI - (plus:HI - (vec_select:HI (match_dup 2) (parallel [(const_int 4)])) - (vec_select:HI (match_dup 2) (parallel [(const_int 5)]))) - (plus:HI - (vec_select:HI (match_dup 2) (parallel [(const_int 6)])) - (vec_select:HI (match_dup 2) (parallel [(const_int 7)])))))))] - "TARGET_SSSE3" - "@ - phaddw\t{%2, %0|%0, %2} - vphaddw\t{%2, %1, %0|%0, %1, %2}" - [(set_attr "isa" "noavx,avx") - (set_attr "type" "sseiadd") - (set_attr "atom_unit" "complex") - (set_attr "prefix_data16" "1,*") - (set_attr "prefix_extra" "1") - (set_attr "prefix" "orig,vex") - (set_attr "mode" "TI")]) - -(define_insn "ssse3_phaddwv4hi3" - [(set (match_operand:V4HI 0 "register_operand" "=y") - (vec_concat:V4HI - (vec_concat:V2HI - (plus:HI - (vec_select:HI - (match_operand:V4HI 1 "register_operand" "0") - (parallel [(const_int 0)])) - (vec_select:HI (match_dup 1) (parallel [(const_int 1)]))) - (plus:HI - (vec_select:HI (match_dup 1) (parallel [(const_int 2)])) - (vec_select:HI (match_dup 1) (parallel [(const_int 3)])))) - (vec_concat:V2HI - (plus:HI - (vec_select:HI - (match_operand:V4HI 2 "nonimmediate_operand" "ym") - (parallel [(const_int 0)])) - (vec_select:HI (match_dup 2) (parallel [(const_int 1)]))) - (plus:HI - (vec_select:HI (match_dup 2) (parallel [(const_int 2)])) - (vec_select:HI (match_dup 2) (parallel [(const_int 3)]))))))] - "TARGET_SSSE3" - "phaddw\t{%2, %0|%0, %2}" - [(set_attr "type" "sseiadd") - (set_attr "atom_unit" "complex") - (set_attr "prefix_extra" "1") - (set (attr "prefix_rex") (symbol_ref "x86_extended_reg_mentioned_p (insn)")) - (set_attr "mode" "DI")]) - -(define_insn "avx2_phadddv8si3" 
- [(set (match_operand:V8SI 0 "register_operand" "=x") - (vec_concat:V8SI - (vec_concat:V4SI - (vec_concat:V2SI - (plus:SI - (vec_select:SI - (match_operand:V8SI 1 "register_operand" "x") - (parallel [(const_int 0)])) - (vec_select:SI (match_dup 1) (parallel [(const_int 1)]))) - (plus:SI - (vec_select:SI (match_dup 1) (parallel [(const_int 2)])) - (vec_select:SI (match_dup 1) (parallel [(const_int 3)])))) - (vec_concat:V2SI - (plus:SI - (vec_select:SI (match_dup 1) (parallel [(const_int 4)])) - (vec_select:SI (match_dup 1) (parallel [(const_int 5)]))) - (plus:SI - (vec_select:SI (match_dup 1) (parallel [(const_int 6)])) - (vec_select:SI (match_dup 1) (parallel [(const_int 7)]))))) - (vec_concat:V4SI - (vec_concat:V2SI - (plus:SI - (vec_select:SI - (match_operand:V8SI 2 "nonimmediate_operand" "xm") - (parallel [(const_int 0)])) - (vec_select:SI (match_dup 2) (parallel [(const_int 1)]))) - (plus:SI - (vec_select:SI (match_dup 2) (parallel [(const_int 2)])) - (vec_select:SI (match_dup 2) (parallel [(const_int 3)])))) - (vec_concat:V2SI - (plus:SI - (vec_select:SI (match_dup 2) (parallel [(const_int 4)])) - (vec_select:SI (match_dup 2) (parallel [(const_int 5)]))) - (plus:SI - (vec_select:SI (match_dup 2) (parallel [(const_int 6)])) - (vec_select:SI (match_dup 2) (parallel [(const_int 7)])))))))] - "TARGET_AVX2" - "vphaddd\t{%2, %1, %0|%0, %1, %2}" - [(set_attr "type" "sseiadd") - (set_attr "prefix_extra" "1") - (set_attr "prefix" "vex") - (set_attr "mode" "OI")]) - -(define_insn "ssse3_phadddv4si3" - [(set (match_operand:V4SI 0 "register_operand" "=x,x") - (vec_concat:V4SI - (vec_concat:V2SI - (plus:SI - (vec_select:SI - (match_operand:V4SI 1 "register_operand" "0,x") - (parallel [(const_int 0)])) - (vec_select:SI (match_dup 1) (parallel [(const_int 1)]))) - (plus:SI - (vec_select:SI (match_dup 1) (parallel [(const_int 2)])) - (vec_select:SI (match_dup 1) (parallel [(const_int 3)])))) - (vec_concat:V2SI - (plus:SI - (vec_select:SI - (match_operand:V4SI 2 
"nonimmediate_operand" "xm,xm") - (parallel [(const_int 0)])) - (vec_select:SI (match_dup 2) (parallel [(const_int 1)]))) - (plus:SI - (vec_select:SI (match_dup 2) (parallel [(const_int 2)])) - (vec_select:SI (match_dup 2) (parallel [(const_int 3)]))))))] - "TARGET_SSSE3" - "@ - phaddd\t{%2, %0|%0, %2} - vphaddd\t{%2, %1, %0|%0, %1, %2}" - [(set_attr "isa" "noavx,avx") - (set_attr "type" "sseiadd") - (set_attr "atom_unit" "complex") - (set_attr "prefix_data16" "1,*") - (set_attr "prefix_extra" "1") - (set_attr "prefix" "orig,vex") - (set_attr "mode" "TI")]) - -(define_insn "ssse3_phadddv2si3" - [(set (match_operand:V2SI 0 "register_operand" "=y") - (vec_concat:V2SI - (plus:SI - (vec_select:SI - (match_operand:V2SI 1 "register_operand" "0") - (parallel [(const_int 0)])) - (vec_select:SI (match_dup 1) (parallel [(const_int 1)]))) - (plus:SI - (vec_select:SI - (match_operand:V2SI 2 "nonimmediate_operand" "ym") - (parallel [(const_int 0)])) - (vec_select:SI (match_dup 2) (parallel [(const_int 1)])))))] - "TARGET_SSSE3" - "phaddd\t{%2, %0|%0, %2}" - [(set_attr "type" "sseiadd") - (set_attr "atom_unit" "complex") - (set_attr "prefix_extra" "1") - (set (attr "prefix_rex") (symbol_ref "x86_extended_reg_mentioned_p (insn)")) - (set_attr "mode" "DI")]) - -(define_insn "avx2_phaddswv16hi3" +(define_insn "avx2_phwv16hi3" [(set (match_operand:V16HI 0 "register_operand" "=x") (vec_concat:V16HI (vec_concat:V8HI (vec_concat:V4HI (vec_concat:V2HI - (ss_plus:HI + (ssse3_plusminus:HI (vec_select:HI (match_operand:V16HI 1 "register_operand" "x") (parallel [(const_int 0)])) (vec_select:HI (match_dup 1) (parallel [(const_int 1)]))) - (ss_plus:HI + (ssse3_plusminus:HI (vec_select:HI (match_dup 1) (parallel [(const_int 2)])) (vec_select:HI (match_dup 1) (parallel [(const_int 3)])))) (vec_concat:V2HI - (ss_plus:HI + (ssse3_plusminus:HI (vec_select:HI (match_dup 1) (parallel [(const_int 4)])) (vec_select:HI (match_dup 1) (parallel [(const_int 5)]))) - (ss_plus:HI + (ssse3_plusminus:HI 
(vec_select:HI (match_dup 1) (parallel [(const_int 6)])) (vec_select:HI (match_dup 1) (parallel [(const_int 7)]))))) (vec_concat:V4HI (vec_concat:V2HI - (ss_plus:HI + (ssse3_plusminus:HI (vec_select:HI (match_dup 1) (parallel [(const_int 8)])) (vec_select:HI (match_dup 1) (parallel [(const_int 9)]))) - (ss_plus:HI + (ssse3_plusminus:HI (vec_select:HI (match_dup 1) (parallel [(const_int 10)])) (vec_select:HI (match_dup 1) (parallel [(const_int 11)])))) (vec_concat:V2HI - (ss_plus:HI + (ssse3_plusminus:HI (vec_select:HI (match_dup 1) (parallel [(const_int 12)])) (vec_select:HI (match_dup 1) (parallel [(const_int 13)]))) - (ss_plus:HI + (ssse3_plusminus:HI (vec_select:HI (match_dup 1) (parallel [(const_int 14)])) (vec_select:HI (match_dup 1) (parallel [(const_int 15)])))))) (vec_concat:V8HI (vec_concat:V4HI (vec_concat:V2HI - (ss_plus:HI + (ssse3_plusminus:HI (vec_select:HI (match_operand:V16HI 2 "nonimmediate_operand" "xm") (parallel [(const_int 0)])) (vec_select:HI (match_dup 2) (parallel [(const_int 1)]))) - (ss_plus:HI + (ssse3_plusminus:HI (vec_select:HI (match_dup 2) (parallel [(const_int 2)])) (vec_select:HI (match_dup 2) (parallel [(const_int 3)])))) (vec_concat:V2HI - (ss_plus:HI + (ssse3_plusminus:HI (vec_select:HI (match_dup 2) (parallel [(const_int 4)])) (vec_select:HI (match_dup 2) (parallel [(const_int 5)]))) - (ss_plus:HI + (ssse3_plusminus:HI (vec_select:HI (match_dup 2) (parallel [(const_int 6)])) (vec_select:HI (match_dup 2) (parallel [(const_int 7)]))))) (vec_concat:V4HI (vec_concat:V2HI - (ss_plus:HI + (ssse3_plusminus:HI (vec_select:HI (match_dup 2) (parallel [(const_int 8)])) (vec_select:HI (match_dup 2) (parallel [(const_int 9)]))) - (ss_plus:HI + (ssse3_plusminus:HI (vec_select:HI (match_dup 2) (parallel [(const_int 10)])) (vec_select:HI (match_dup 2) (parallel [(const_int 11)])))) (vec_concat:V2HI - (ss_plus:HI + (ssse3_plusminus:HI (vec_select:HI (match_dup 2) (parallel [(const_int 12)])) (vec_select:HI (match_dup 2) (parallel [(const_int 
13)]))) - (ss_plus:HI + (ssse3_plusminus:HI (vec_select:HI (match_dup 2) (parallel [(const_int 14)])) (vec_select:HI (match_dup 2) (parallel [(const_int 15)]))))))))] "TARGET_AVX2" - "vphaddsw\t{%2, %1, %0|%0, %1, %2}" + "vphw\t{%2, %1, %0|%0, %1, %2}" [(set_attr "type" "sseiadd") (set_attr "prefix_extra" "1") (set_attr "prefix" "vex") (set_attr "mode" "OI")]) -(define_insn "ssse3_phaddswv8hi3" +(define_insn "ssse3_phwv8hi3" [(set (match_operand:V8HI 0 "register_operand" "=x,x") (vec_concat:V8HI (vec_concat:V4HI (vec_concat:V2HI - (ss_plus:HI + (ssse3_plusminus:HI (vec_select:HI (match_operand:V8HI 1 "register_operand" "0,x") (parallel [(const_int 0)])) (vec_select:HI (match_dup 1) (parallel [(const_int 1)]))) - (ss_plus:HI + (ssse3_plusminus:HI (vec_select:HI (match_dup 1) (parallel [(const_int 2)])) (vec_select:HI (match_dup 1) (parallel [(const_int 3)])))) (vec_concat:V2HI - (ss_plus:HI + (ssse3_plusminus:HI (vec_select:HI (match_dup 1) (parallel [(const_int 4)])) (vec_select:HI (match_dup 1) (parallel [(const_int 5)]))) - (ss_plus:HI + (ssse3_plusminus:HI (vec_select:HI (match_dup 1) (parallel [(const_int 6)])) (vec_select:HI (match_dup 1) (parallel [(const_int 7)]))))) (vec_concat:V4HI (vec_concat:V2HI - (ss_plus:HI + (ssse3_plusminus:HI (vec_select:HI (match_operand:V8HI 2 "nonimmediate_operand" "xm,xm") (parallel [(const_int 0)])) (vec_select:HI (match_dup 2) (parallel [(const_int 1)]))) - (ss_plus:HI + (ssse3_plusminus:HI (vec_select:HI (match_dup 2) (parallel [(const_int 2)])) (vec_select:HI (match_dup 2) (parallel [(const_int 3)])))) (vec_concat:V2HI - (ss_plus:HI + (ssse3_plusminus:HI (vec_select:HI (match_dup 2) (parallel [(const_int 4)])) (vec_select:HI (match_dup 2) (parallel [(const_int 5)]))) - (ss_plus:HI + (ssse3_plusminus:HI (vec_select:HI (match_dup 2) (parallel [(const_int 6)])) (vec_select:HI (match_dup 2) (parallel [(const_int 7)])))))))] "TARGET_SSSE3" "@ - phaddsw\t{%2, %0|%0, %2} - vphaddsw\t{%2, %1, %0|%0, %1, %2}" + phw\t{%2, %0|%0, %2} 
+ vphw\t{%2, %1, %0|%0, %1, %2}" [(set_attr "isa" "noavx,avx") (set_attr "type" "sseiadd") (set_attr "atom_unit" "complex") @@ -8552,259 +8303,104 @@ (set_attr "prefix" "orig,vex") (set_attr "mode" "TI")]) -(define_insn "ssse3_phaddswv4hi3" +(define_insn "ssse3_phwv4hi3" [(set (match_operand:V4HI 0 "register_operand" "=y") (vec_concat:V4HI (vec_concat:V2HI - (ss_plus:HI + (ssse3_plusminus:HI (vec_select:HI (match_operand:V4HI 1 "register_operand" "0") (parallel [(const_int 0)])) (vec_select:HI (match_dup 1) (parallel [(const_int 1)]))) - (ss_plus:HI + (ssse3_plusminus:HI (vec_select:HI (match_dup 1) (parallel [(const_int 2)])) (vec_select:HI (match_dup 1) (parallel [(const_int 3)])))) (vec_concat:V2HI - (ss_plus:HI + (ssse3_plusminus:HI (vec_select:HI (match_operand:V4HI 2 "nonimmediate_operand" "ym") (parallel [(const_int 0)])) (vec_select:HI (match_dup 2) (parallel [(const_int 1)]))) - (ss_plus:HI + (ssse3_plusminus:HI (vec_select:HI (match_dup 2) (parallel [(const_int 2)])) (vec_select:HI (match_dup 2) (parallel [(const_int 3)]))))))] "TARGET_SSSE3" - "phaddsw\t{%2, %0|%0, %2}" + "phw\t{%2, %0|%0, %2}" [(set_attr "type" "sseiadd") (set_attr "atom_unit" "complex") (set_attr "prefix_extra" "1") (set (attr "prefix_rex") (symbol_ref "x86_extended_reg_mentioned_p (insn)")) (set_attr "mode" "DI")]) -(define_insn "avx2_phsubwv16hi3" - [(set (match_operand:V16HI 0 "register_operand" "=x") - (vec_concat:V16HI - (vec_concat:V8HI - (vec_concat:V4HI - (vec_concat:V2HI - (minus:HI - (vec_select:HI - (match_operand:V16HI 1 "register_operand" "x") - (parallel [(const_int 0)])) - (vec_select:HI (match_dup 1) (parallel [(const_int 1)]))) - (minus:HI - (vec_select:HI (match_dup 1) (parallel [(const_int 2)])) - (vec_select:HI (match_dup 1) (parallel [(const_int 3)])))) - (vec_concat:V2HI - (minus:HI - (vec_select:HI (match_dup 1) (parallel [(const_int 4)])) - (vec_select:HI (match_dup 1) (parallel [(const_int 5)]))) - (minus:HI - (vec_select:HI (match_dup 1) (parallel [(const_int 
6)])) - (vec_select:HI (match_dup 1) (parallel [(const_int 7)]))))) - (vec_concat:V4HI - (vec_concat:V2HI - (minus:HI - (vec_select:HI (match_dup 1) (parallel [(const_int 8)])) - (vec_select:HI (match_dup 1) (parallel [(const_int 9)]))) - (minus:HI - (vec_select:HI (match_dup 1) (parallel [(const_int 10)])) - (vec_select:HI (match_dup 1) (parallel [(const_int 11)])))) - (vec_concat:V2HI - (minus:HI - (vec_select:HI (match_dup 1) (parallel [(const_int 12)])) - (vec_select:HI (match_dup 1) (parallel [(const_int 13)]))) - (minus:HI - (vec_select:HI (match_dup 1) (parallel [(const_int 14)])) - (vec_select:HI (match_dup 1) (parallel [(const_int 15)])))))) - (vec_concat:V8HI - (vec_concat:V4HI - (vec_concat:V2HI - (minus:HI - (vec_select:HI - (match_operand:V16HI 2 "nonimmediate_operand" "xm") - (parallel [(const_int 0)])) - (vec_select:HI (match_dup 2) (parallel [(const_int 1)]))) - (minus:HI - (vec_select:HI (match_dup 2) (parallel [(const_int 2)])) - (vec_select:HI (match_dup 2) (parallel [(const_int 3)])))) - (vec_concat:V2HI - (minus:HI - (vec_select:HI (match_dup 2) (parallel [(const_int 4)])) - (vec_select:HI (match_dup 2) (parallel [(const_int 5)]))) - (minus:HI - (vec_select:HI (match_dup 2) (parallel [(const_int 6)])) - (vec_select:HI (match_dup 2) (parallel [(const_int 7)]))))) - (vec_concat:V4HI - (vec_concat:V2HI - (minus:HI - (vec_select:HI (match_dup 2) (parallel [(const_int 8)])) - (vec_select:HI (match_dup 2) (parallel [(const_int 9)]))) - (minus:HI - (vec_select:HI (match_dup 2) (parallel [(const_int 10)])) - (vec_select:HI (match_dup 2) (parallel [(const_int 11)])))) - (vec_concat:V2HI - (minus:HI - (vec_select:HI (match_dup 2) (parallel [(const_int 12)])) - (vec_select:HI (match_dup 2) (parallel [(const_int 13)]))) - (minus:HI - (vec_select:HI (match_dup 2) (parallel [(const_int 14)])) - (vec_select:HI (match_dup 2) (parallel [(const_int 15)]))))))))] - "TARGET_AVX2" - "vphsubw\t{%2, %1, %0|%0, %1, %2}" - [(set_attr "type" "sseiadd") - (set_attr 
"prefix_extra" "1") - (set_attr "prefix" "vex") - (set_attr "mode" "OI")]) - -(define_insn "ssse3_phsubwv8hi3" - [(set (match_operand:V8HI 0 "register_operand" "=x,x") - (vec_concat:V8HI - (vec_concat:V4HI - (vec_concat:V2HI - (minus:HI - (vec_select:HI - (match_operand:V8HI 1 "register_operand" "0,x") - (parallel [(const_int 0)])) - (vec_select:HI (match_dup 1) (parallel [(const_int 1)]))) - (minus:HI - (vec_select:HI (match_dup 1) (parallel [(const_int 2)])) - (vec_select:HI (match_dup 1) (parallel [(const_int 3)])))) - (vec_concat:V2HI - (minus:HI - (vec_select:HI (match_dup 1) (parallel [(const_int 4)])) - (vec_select:HI (match_dup 1) (parallel [(const_int 5)]))) - (minus:HI - (vec_select:HI (match_dup 1) (parallel [(const_int 6)])) - (vec_select:HI (match_dup 1) (parallel [(const_int 7)]))))) - (vec_concat:V4HI - (vec_concat:V2HI - (minus:HI - (vec_select:HI - (match_operand:V8HI 2 "nonimmediate_operand" "xm,xm") - (parallel [(const_int 0)])) - (vec_select:HI (match_dup 2) (parallel [(const_int 1)]))) - (minus:HI - (vec_select:HI (match_dup 2) (parallel [(const_int 2)])) - (vec_select:HI (match_dup 2) (parallel [(const_int 3)])))) - (vec_concat:V2HI - (minus:HI - (vec_select:HI (match_dup 2) (parallel [(const_int 4)])) - (vec_select:HI (match_dup 2) (parallel [(const_int 5)]))) - (minus:HI - (vec_select:HI (match_dup 2) (parallel [(const_int 6)])) - (vec_select:HI (match_dup 2) (parallel [(const_int 7)])))))))] - "TARGET_SSSE3" - "@ - phsubw\t{%2, %0|%0, %2} - vphsubw\t{%2, %1, %0|%0, %1, %2}" - [(set_attr "isa" "noavx,avx") - (set_attr "type" "sseiadd") - (set_attr "atom_unit" "complex") - (set_attr "prefix_data16" "1,*") - (set_attr "prefix_extra" "1") - (set_attr "prefix" "orig,vex") - (set_attr "mode" "TI")]) - -(define_insn "ssse3_phsubwv4hi3" - [(set (match_operand:V4HI 0 "register_operand" "=y") - (vec_concat:V4HI - (vec_concat:V2HI - (minus:HI - (vec_select:HI - (match_operand:V4HI 1 "register_operand" "0") - (parallel [(const_int 0)])) - 
(vec_select:HI (match_dup 1) (parallel [(const_int 1)]))) - (minus:HI - (vec_select:HI (match_dup 1) (parallel [(const_int 2)])) - (vec_select:HI (match_dup 1) (parallel [(const_int 3)])))) - (vec_concat:V2HI - (minus:HI - (vec_select:HI - (match_operand:V4HI 2 "nonimmediate_operand" "ym") - (parallel [(const_int 0)])) - (vec_select:HI (match_dup 2) (parallel [(const_int 1)]))) - (minus:HI - (vec_select:HI (match_dup 2) (parallel [(const_int 2)])) - (vec_select:HI (match_dup 2) (parallel [(const_int 3)]))))))] - "TARGET_SSSE3" - "phsubw\t{%2, %0|%0, %2}" - [(set_attr "type" "sseiadd") - (set_attr "atom_unit" "complex") - (set_attr "prefix_extra" "1") - (set (attr "prefix_rex") (symbol_ref "x86_extended_reg_mentioned_p (insn)")) - (set_attr "mode" "DI")]) - -(define_insn "avx2_phsubdv8si3" +(define_insn "avx2_phdv8si3" [(set (match_operand:V8SI 0 "register_operand" "=x") (vec_concat:V8SI (vec_concat:V4SI (vec_concat:V2SI - (minus:SI + (plusminus:SI (vec_select:SI (match_operand:V8SI 1 "register_operand" "x") (parallel [(const_int 0)])) (vec_select:SI (match_dup 1) (parallel [(const_int 1)]))) - (minus:SI + (plusminus:SI (vec_select:SI (match_dup 1) (parallel [(const_int 2)])) (vec_select:SI (match_dup 1) (parallel [(const_int 3)])))) (vec_concat:V2SI - (minus:SI + (plusminus:SI (vec_select:SI (match_dup 1) (parallel [(const_int 4)])) (vec_select:SI (match_dup 1) (parallel [(const_int 5)]))) - (minus:SI + (plusminus:SI (vec_select:SI (match_dup 1) (parallel [(const_int 6)])) (vec_select:SI (match_dup 1) (parallel [(const_int 7)]))))) (vec_concat:V4SI (vec_concat:V2SI - (minus:SI + (plusminus:SI (vec_select:SI (match_operand:V8SI 2 "nonimmediate_operand" "xm") (parallel [(const_int 0)])) (vec_select:SI (match_dup 2) (parallel [(const_int 1)]))) - (minus:SI + (plusminus:SI (vec_select:SI (match_dup 2) (parallel [(const_int 2)])) (vec_select:SI (match_dup 2) (parallel [(const_int 3)])))) (vec_concat:V2SI - (minus:SI + (plusminus:SI (vec_select:SI (match_dup 2) (parallel 
[(const_int 4)])) (vec_select:SI (match_dup 2) (parallel [(const_int 5)]))) - (minus:SI + (plusminus:SI (vec_select:SI (match_dup 2) (parallel [(const_int 6)])) (vec_select:SI (match_dup 2) (parallel [(const_int 7)])))))))] "TARGET_AVX2" - "vphsubd\t{%2, %1, %0|%0, %1, %2}" + "vphd\t{%2, %1, %0|%0, %1, %2}" [(set_attr "type" "sseiadd") (set_attr "prefix_extra" "1") (set_attr "prefix" "vex") (set_attr "mode" "OI")]) -(define_insn "ssse3_phsubdv4si3" +(define_insn "ssse3_phdv4si3" [(set (match_operand:V4SI 0 "register_operand" "=x,x") (vec_concat:V4SI (vec_concat:V2SI - (minus:SI + (plusminus:SI (vec_select:SI (match_operand:V4SI 1 "register_operand" "0,x") (parallel [(const_int 0)])) (vec_select:SI (match_dup 1) (parallel [(const_int 1)]))) - (minus:SI + (plusminus:SI (vec_select:SI (match_dup 1) (parallel [(const_int 2)])) (vec_select:SI (match_dup 1) (parallel [(const_int 3)])))) (vec_concat:V2SI - (minus:SI + (plusminus:SI (vec_select:SI (match_operand:V4SI 2 "nonimmediate_operand" "xm,xm") (parallel [(const_int 0)])) (vec_select:SI (match_dup 2) (parallel [(const_int 1)]))) - (minus:SI + (plusminus:SI (vec_select:SI (match_dup 2) (parallel [(const_int 2)])) (vec_select:SI (match_dup 2) (parallel [(const_int 3)]))))))] "TARGET_SSSE3" "@ - phsubd\t{%2, %0|%0, %2} - vphsubd\t{%2, %1, %0|%0, %1, %2}" - + phd\t{%2, %0|%0, %2} + vphd\t{%2, %1, %0|%0, %1, %2}" [(set_attr "isa" "noavx,avx") (set_attr "type" "sseiadd") (set_attr "atom_unit" "complex") @@ -8813,181 +8409,27 @@ (set_attr "prefix" "orig,vex") (set_attr "mode" "TI")]) -(define_insn "ssse3_phsubdv2si3" +(define_insn "ssse3_phdv2si3" [(set (match_operand:V2SI 0 "register_operand" "=y") (vec_concat:V2SI - (minus:SI + (plusminus:SI (vec_select:SI (match_operand:V2SI 1 "register_operand" "0") (parallel [(const_int 0)])) (vec_select:SI (match_dup 1) (parallel [(const_int 1)]))) - (minus:SI + (plusminus:SI (vec_select:SI (match_operand:V2SI 2 "nonimmediate_operand" "ym") (parallel [(const_int 0)])) (vec_select:SI 
(match_dup 2) (parallel [(const_int 1)])))))] "TARGET_SSSE3" - "phsubd\t{%2, %0|%0, %2}" + "phd\t{%2, %0|%0, %2}" [(set_attr "type" "sseiadd") (set_attr "atom_unit" "complex") (set_attr "prefix_extra" "1") (set (attr "prefix_rex") (symbol_ref "x86_extended_reg_mentioned_p (insn)")) (set_attr "mode" "DI")]) -(define_insn "avx2_phsubswv16hi3" - [(set (match_operand:V16HI 0 "register_operand" "=x") - (vec_concat:V16HI - (vec_concat:V8HI - (vec_concat:V4HI - (vec_concat:V2HI - (ss_minus:HI - (vec_select:HI - (match_operand:V16HI 1 "register_operand" "x") - (parallel [(const_int 0)])) - (vec_select:HI (match_dup 1) (parallel [(const_int 1)]))) - (ss_minus:HI - (vec_select:HI (match_dup 1) (parallel [(const_int 2)])) - (vec_select:HI (match_dup 1) (parallel [(const_int 3)])))) - (vec_concat:V2HI - (ss_minus:HI - (vec_select:HI (match_dup 1) (parallel [(const_int 4)])) - (vec_select:HI (match_dup 1) (parallel [(const_int 5)]))) - (ss_minus:HI - (vec_select:HI (match_dup 1) (parallel [(const_int 6)])) - (vec_select:HI (match_dup 1) (parallel [(const_int 7)]))))) - (vec_concat:V4HI - (vec_concat:V2HI - (ss_minus:HI - (vec_select:HI (match_dup 1) (parallel [(const_int 8)])) - (vec_select:HI (match_dup 1) (parallel [(const_int 9)]))) - (ss_minus:HI - (vec_select:HI (match_dup 1) (parallel [(const_int 10)])) - (vec_select:HI (match_dup 1) (parallel [(const_int 11)])))) - (vec_concat:V2HI - (ss_minus:HI - (vec_select:HI (match_dup 1) (parallel [(const_int 12)])) - (vec_select:HI (match_dup 1) (parallel [(const_int 13)]))) - (ss_minus:HI - (vec_select:HI (match_dup 1) (parallel [(const_int 14)])) - (vec_select:HI (match_dup 1) (parallel [(const_int 15)])))))) - (vec_concat:V8HI - (vec_concat:V4HI - (vec_concat:V2HI - (ss_minus:HI - (vec_select:HI - (match_operand:V16HI 2 "nonimmediate_operand" "xm") - (parallel [(const_int 0)])) - (vec_select:HI (match_dup 2) (parallel [(const_int 1)]))) - (ss_minus:HI - (vec_select:HI (match_dup 2) (parallel [(const_int 2)])) - (vec_select:HI 
(match_dup 2) (parallel [(const_int 3)])))) - (vec_concat:V2HI - (ss_minus:HI - (vec_select:HI (match_dup 2) (parallel [(const_int 4)])) - (vec_select:HI (match_dup 2) (parallel [(const_int 5)]))) - (ss_minus:HI - (vec_select:HI (match_dup 2) (parallel [(const_int 6)])) - (vec_select:HI (match_dup 2) (parallel [(const_int 7)]))))) - (vec_concat:V4HI - (vec_concat:V2HI - (ss_minus:HI - (vec_select:HI (match_dup 2) (parallel [(const_int 8)])) - (vec_select:HI (match_dup 2) (parallel [(const_int 9)]))) - (ss_minus:HI - (vec_select:HI (match_dup 2) (parallel [(const_int 10)])) - (vec_select:HI (match_dup 2) (parallel [(const_int 11)])))) - (vec_concat:V2HI - (ss_minus:HI - (vec_select:HI (match_dup 2) (parallel [(const_int 12)])) - (vec_select:HI (match_dup 2) (parallel [(const_int 13)]))) - (ss_minus:HI - (vec_select:HI (match_dup 2) (parallel [(const_int 14)])) - (vec_select:HI (match_dup 2) (parallel [(const_int 15)]))))))))] - "TARGET_AVX2" - "vphsubsw\t{%2, %1, %0|%0, %1, %2}" - [(set_attr "type" "sseiadd") - (set_attr "prefix_extra" "1") - (set_attr "prefix" "vex") - (set_attr "mode" "OI")]) - -(define_insn "ssse3_phsubswv8hi3" - [(set (match_operand:V8HI 0 "register_operand" "=x,x") - (vec_concat:V8HI - (vec_concat:V4HI - (vec_concat:V2HI - (ss_minus:HI - (vec_select:HI - (match_operand:V8HI 1 "register_operand" "0,x") - (parallel [(const_int 0)])) - (vec_select:HI (match_dup 1) (parallel [(const_int 1)]))) - (ss_minus:HI - (vec_select:HI (match_dup 1) (parallel [(const_int 2)])) - (vec_select:HI (match_dup 1) (parallel [(const_int 3)])))) - (vec_concat:V2HI - (ss_minus:HI - (vec_select:HI (match_dup 1) (parallel [(const_int 4)])) - (vec_select:HI (match_dup 1) (parallel [(const_int 5)]))) - (ss_minus:HI - (vec_select:HI (match_dup 1) (parallel [(const_int 6)])) - (vec_select:HI (match_dup 1) (parallel [(const_int 7)]))))) - (vec_concat:V4HI - (vec_concat:V2HI - (ss_minus:HI - (vec_select:HI - (match_operand:V8HI 2 "nonimmediate_operand" "xm,xm") - (parallel 
[(const_int 0)])) - (vec_select:HI (match_dup 2) (parallel [(const_int 1)]))) - (ss_minus:HI - (vec_select:HI (match_dup 2) (parallel [(const_int 2)])) - (vec_select:HI (match_dup 2) (parallel [(const_int 3)])))) - (vec_concat:V2HI - (ss_minus:HI - (vec_select:HI (match_dup 2) (parallel [(const_int 4)])) - (vec_select:HI (match_dup 2) (parallel [(const_int 5)]))) - (ss_minus:HI - (vec_select:HI (match_dup 2) (parallel [(const_int 6)])) - (vec_select:HI (match_dup 2) (parallel [(const_int 7)])))))))] - "TARGET_SSSE3" - "@ - phsubsw\t{%2, %0|%0, %2} - vphsubsw\t{%2, %1, %0|%0, %1, %2}" - [(set_attr "isa" "noavx,avx") - (set_attr "type" "sseiadd") - (set_attr "atom_unit" "complex") - (set_attr "prefix_data16" "1,*") - (set_attr "prefix_extra" "1") - (set_attr "prefix" "orig,vex") - (set_attr "mode" "TI")]) - -(define_insn "ssse3_phsubswv4hi3" - [(set (match_operand:V4HI 0 "register_operand" "=y") - (vec_concat:V4HI - (vec_concat:V2HI - (ss_minus:HI - (vec_select:HI - (match_operand:V4HI 1 "register_operand" "0") - (parallel [(const_int 0)])) - (vec_select:HI (match_dup 1) (parallel [(const_int 1)]))) - (ss_minus:HI - (vec_select:HI (match_dup 1) (parallel [(const_int 2)])) - (vec_select:HI (match_dup 1) (parallel [(const_int 3)])))) - (vec_concat:V2HI - (ss_minus:HI - (vec_select:HI - (match_operand:V4HI 2 "nonimmediate_operand" "ym") - (parallel [(const_int 0)])) - (vec_select:HI (match_dup 2) (parallel [(const_int 1)]))) - (ss_minus:HI - (vec_select:HI (match_dup 2) (parallel [(const_int 2)])) - (vec_select:HI (match_dup 2) (parallel [(const_int 3)]))))))] - "TARGET_SSSE3" - "phsubsw\t{%2, %0|%0, %2}" - [(set_attr "type" "sseiadd") - (set_attr "atom_unit" "complex") - (set_attr "prefix_extra" "1") - (set (attr "prefix_rex") (symbol_ref "x86_extended_reg_mentioned_p (insn)")) - (set_attr "mode" "DI")]) - (define_insn "avx2_pmaddubsw256" [(set (match_operand:V16HI 0 "register_operand" "=x") (ss_plus:V16HI @@ -9314,9 +8756,10 @@ (define_insn "_pshufb3" [(set 
(match_operand:VI1_AVX2 0 "register_operand" "=x,x") - (unspec:VI1_AVX2 [(match_operand:VI1_AVX2 1 "register_operand" "0,x") - (match_operand:VI1_AVX2 2 "nonimmediate_operand" "xm,xm")] - UNSPEC_PSHUFB))] + (unspec:VI1_AVX2 + [(match_operand:VI1_AVX2 1 "register_operand" "0,x") + (match_operand:VI1_AVX2 2 "nonimmediate_operand" "xm,xm")] + UNSPEC_PSHUFB))] "TARGET_SSSE3" "@ pshufb\t{%2, %0|%0, %2} @@ -9372,10 +8815,11 @@ (define_insn "_palignr" [(set (match_operand:SSESCALARMODE 0 "register_operand" "=x,x") - (unspec:SSESCALARMODE [(match_operand:SSESCALARMODE 1 "register_operand" "0,x") - (match_operand:SSESCALARMODE 2 "nonimmediate_operand" "xm,xm") - (match_operand:SI 3 "const_0_to_255_mul_8_operand" "n,n")] - UNSPEC_PALIGNR))] + (unspec:SSESCALARMODE + [(match_operand:SSESCALARMODE 1 "register_operand" "0,x") + (match_operand:SSESCALARMODE 2 "nonimmediate_operand" "xm,xm") + (match_operand:SI 3 "const_0_to_255_mul_8_operand" "n,n")] + UNSPEC_PALIGNR))] "TARGET_SSSE3" { operands[3] = GEN_INT (INTVAL (operands[3]) / 8); @@ -9595,10 +9039,11 @@ (define_insn "_mpsadbw" [(set (match_operand:VI1_AVX2 0 "register_operand" "=x,x") - (unspec:VI1_AVX2 [(match_operand:VI1_AVX2 1 "register_operand" "0,x") - (match_operand:VI1_AVX2 2 "nonimmediate_operand" "xm,xm") - (match_operand:SI 3 "const_0_to_255_operand" "n,n")] - UNSPEC_MPSADBW))] + (unspec:VI1_AVX2 + [(match_operand:VI1_AVX2 1 "register_operand" "0,x") + (match_operand:VI1_AVX2 2 "nonimmediate_operand" "xm,xm") + (match_operand:SI 3 "const_0_to_255_operand" "n,n")] + UNSPEC_MPSADBW))] "TARGET_SSE4_1" "@ mpsadbw\t{%3, %2, %0|%0, %2, %3} @@ -10396,78 +9841,51 @@ ;; ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +(define_code_iterator xop_plus [plus ss_plus]) + +(define_code_attr macs [(plus "macs") (ss_plus "macss")]) +(define_code_attr madcs [(plus "madcs") (ss_plus "madcss")]) + ;; XOP parallel integer multiply/add instructions. 
;; Note the XOP multiply/add instructions ;; a[i] = b[i] * c[i] + d[i]; ;; do not allow the value being added to be a memory operation. -(define_insn "xop_pmacsww" - [(set (match_operand:V8HI 0 "register_operand" "=x") - (plus:V8HI - (mult:V8HI - (match_operand:V8HI 1 "nonimmediate_operand" "%x") - (match_operand:V8HI 2 "nonimmediate_operand" "xm")) - (match_operand:V8HI 3 "nonimmediate_operand" "x")))] - "TARGET_XOP" - "vpmacsww\t{%3, %2, %1, %0|%0, %1, %2, %3}" - [(set_attr "type" "ssemuladd") - (set_attr "mode" "TI")]) -(define_insn "xop_pmacssww" - [(set (match_operand:V8HI 0 "register_operand" "=x") - (ss_plus:V8HI - (mult:V8HI (match_operand:V8HI 1 "nonimmediate_operand" "%x") - (match_operand:V8HI 2 "nonimmediate_operand" "xm")) - (match_operand:V8HI 3 "nonimmediate_operand" "x")))] +(define_insn "xop_p" + [(set (match_operand:VI24_128 0 "register_operand" "=x") + (xop_plus:VI24_128 + (mult:VI24_128 + (match_operand:VI24_128 1 "nonimmediate_operand" "%x") + (match_operand:VI24_128 2 "nonimmediate_operand" "xm")) + (match_operand:VI24_128 3 "nonimmediate_operand" "x")))] "TARGET_XOP" - "vpmacssww\t{%3, %2, %1, %0|%0, %1, %2, %3}" + "vp\t{%3, %2, %1, %0|%0, %1, %2, %3}" [(set_attr "type" "ssemuladd") (set_attr "mode" "TI")]) -(define_insn "xop_pmacsdd" - [(set (match_operand:V4SI 0 "register_operand" "=x") - (plus:V4SI - (mult:V4SI - (match_operand:V4SI 1 "nonimmediate_operand" "%x") - (match_operand:V4SI 2 "nonimmediate_operand" "xm")) - (match_operand:V4SI 3 "nonimmediate_operand" "x")))] - "TARGET_XOP" - "vpmacsdd\t{%3, %2, %1, %0|%0, %1, %2, %3}" - [(set_attr "type" "ssemuladd") - (set_attr "mode" "TI")]) - -(define_insn "xop_pmacssdd" - [(set (match_operand:V4SI 0 "register_operand" "=x") - (ss_plus:V4SI - (mult:V4SI (match_operand:V4SI 1 "nonimmediate_operand" "%x") - (match_operand:V4SI 2 "nonimmediate_operand" "xm")) - (match_operand:V4SI 3 "nonimmediate_operand" "x")))] - "TARGET_XOP" - "vpmacssdd\t{%3, %2, %1, %0|%0, %1, %2, %3}" - [(set_attr "type" 
"ssemuladd") - (set_attr "mode" "TI")]) - -(define_insn "xop_pmacssdql" +(define_insn "xop_pdql" [(set (match_operand:V2DI 0 "register_operand" "=x") - (ss_plus:V2DI + (xop_plus:V2DI (mult:V2DI (sign_extend:V2DI (vec_select:V2SI (match_operand:V4SI 1 "nonimmediate_operand" "%x") (parallel [(const_int 1) (const_int 3)]))) - (vec_select:V2SI - (match_operand:V4SI 2 "nonimmediate_operand" "xm") - (parallel [(const_int 1) - (const_int 3)]))) + (sign_extend:V2DI + (vec_select:V2SI + (match_operand:V4SI 2 "nonimmediate_operand" "xm") + (parallel [(const_int 1) + (const_int 3)])))) (match_operand:V2DI 3 "nonimmediate_operand" "x")))] "TARGET_XOP" - "vpmacssdql\t{%3, %2, %1, %0|%0, %1, %2, %3}" + "vpdql\t{%3, %2, %1, %0|%0, %1, %2, %3}" [(set_attr "type" "ssemuladd") (set_attr "mode" "TI")]) -(define_insn "xop_pmacssdqh" +(define_insn "xop_pdqh" [(set (match_operand:V2DI 0 "register_operand" "=x") - (ss_plus:V2DI + (xop_plus:V2DI (mult:V2DI (sign_extend:V2DI (vec_select:V2SI @@ -10481,30 +9899,10 @@ (const_int 2)])))) (match_operand:V2DI 3 "nonimmediate_operand" "x")))] "TARGET_XOP" - "vpmacssdqh\t{%3, %2, %1, %0|%0, %1, %2, %3}" + "vpdqh\t{%3, %2, %1, %0|%0, %1, %2, %3}" [(set_attr "type" "ssemuladd") (set_attr "mode" "TI")]) -(define_insn "xop_pmacsdql" - [(set (match_operand:V2DI 0 "register_operand" "=x") - (plus:V2DI - (mult:V2DI - (sign_extend:V2DI - (vec_select:V2SI - (match_operand:V4SI 1 "nonimmediate_operand" "%x") - (parallel [(const_int 1) - (const_int 3)]))) - (sign_extend:V2DI - (vec_select:V2SI - (match_operand:V4SI 2 "nonimmediate_operand" "xm") - (parallel [(const_int 1) - (const_int 3)])))) - (match_operand:V2DI 3 "nonimmediate_operand" "x")))] - "TARGET_XOP" - "vpmacsdql\t{%3, %2, %1, %0|%0, %1, %2, %3}" - [(set_attr "type" "ssemuladd") - (set_attr "mode" "TI")]) - ;; We don't have a straight 32-bit parallel multiply and extend on XOP, so ;; fake it with a multiply/add. 
In general, we expect the define_split to ;; occur before register allocation, so we have to handle the corner case where @@ -10547,26 +9945,6 @@ [(set_attr "type" "ssemul") (set_attr "mode" "TI")]) -(define_insn "xop_pmacsdqh" - [(set (match_operand:V2DI 0 "register_operand" "=x") - (plus:V2DI - (mult:V2DI - (sign_extend:V2DI - (vec_select:V2SI - (match_operand:V4SI 1 "nonimmediate_operand" "%x") - (parallel [(const_int 0) - (const_int 2)]))) - (sign_extend:V2DI - (vec_select:V2SI - (match_operand:V4SI 2 "nonimmediate_operand" "xm") - (parallel [(const_int 0) - (const_int 2)])))) - (match_operand:V2DI 3 "nonimmediate_operand" "x")))] - "TARGET_XOP" - "vpmacsdqh\t{%3, %2, %1, %0|%0, %1, %2, %3}" - [(set_attr "type" "ssemuladd") - (set_attr "mode" "TI")]) - ;; We don't have a straight 32-bit parallel multiply and extend on XOP, so ;; fake it with a multiply/add. In general, we expect the define_split to ;; occur before register allocation, so we have to handle the corner case where @@ -10610,9 +9988,9 @@ (set_attr "mode" "TI")]) ;; XOP parallel integer multiply/add instructions for the intrinisics -(define_insn "xop_pmacsswd" +(define_insn "xop_pwd" [(set (match_operand:V4SI 0 "register_operand" "=x") - (ss_plus:V4SI + (xop_plus:V4SI (mult:V4SI (sign_extend:V4SI (vec_select:V4HI @@ -10630,37 +10008,13 @@ (const_int 7)])))) (match_operand:V4SI 3 "nonimmediate_operand" "x")))] "TARGET_XOP" - "vpmacsswd\t{%3, %2, %1, %0|%0, %1, %2, %3}" + "vpwd\t{%3, %2, %1, %0|%0, %1, %2, %3}" [(set_attr "type" "ssemuladd") (set_attr "mode" "TI")]) -(define_insn "xop_pmacswd" +(define_insn "xop_pwd" [(set (match_operand:V4SI 0 "register_operand" "=x") - (plus:V4SI - (mult:V4SI - (sign_extend:V4SI - (vec_select:V4HI - (match_operand:V8HI 1 "nonimmediate_operand" "%x") - (parallel [(const_int 1) - (const_int 3) - (const_int 5) - (const_int 7)]))) - (sign_extend:V4SI - (vec_select:V4HI - (match_operand:V8HI 2 "nonimmediate_operand" "xm") - (parallel [(const_int 1) - (const_int 3) - 
(const_int 5) - (const_int 7)])))) - (match_operand:V4SI 3 "nonimmediate_operand" "x")))] - "TARGET_XOP" - "vpmacswd\t{%3, %2, %1, %0|%0, %1, %2, %3}" - [(set_attr "type" "ssemuladd") - (set_attr "mode" "TI")]) - -(define_insn "xop_pmadcsswd" - [(set (match_operand:V4SI 0 "register_operand" "=x") - (ss_plus:V4SI + (xop_plus:V4SI (plus:V4SI (mult:V4SI (sign_extend:V4SI @@ -10694,50 +10048,10 @@ (const_int 7)]))))) (match_operand:V4SI 3 "nonimmediate_operand" "x")))] "TARGET_XOP" - "vpmadcsswd\t{%3, %2, %1, %0|%0, %1, %2, %3}" + "vpwd\t{%3, %2, %1, %0|%0, %1, %2, %3}" [(set_attr "type" "ssemuladd") (set_attr "mode" "TI")]) -(define_insn "xop_pmadcswd" - [(set (match_operand:V4SI 0 "register_operand" "=x") - (plus:V4SI - (plus:V4SI - (mult:V4SI - (sign_extend:V4SI - (vec_select:V4HI - (match_operand:V8HI 1 "nonimmediate_operand" "%x") - (parallel [(const_int 0) - (const_int 2) - (const_int 4) - (const_int 6)]))) - (sign_extend:V4SI - (vec_select:V4HI - (match_operand:V8HI 2 "nonimmediate_operand" "xm") - (parallel [(const_int 0) - (const_int 2) - (const_int 4) - (const_int 6)])))) - (mult:V4SI - (sign_extend:V4SI - (vec_select:V4HI - (match_dup 1) - (parallel [(const_int 1) - (const_int 3) - (const_int 5) - (const_int 7)]))) - (sign_extend:V4SI - (vec_select:V4HI - (match_dup 2) - (parallel [(const_int 1) - (const_int 3) - (const_int 5) - (const_int 7)]))))) - (match_operand:V4SI 3 "nonimmediate_operand" "x")))] - "TARGET_XOP" - "vpmadcswd\t{%3, %2, %1, %0|%0, %1, %2, %3}" - [(set_attr "type" "ssemuladd") - (set_attr "mode" "TI")]) - ;; XOP parallel XMM conditional moves (define_insn "xop_pcmov_" [(set (match_operand:V 0 "register_operand" "=x,x") @@ -10750,10 +10064,10 @@ [(set_attr "type" "sse4arg")]) ;; XOP horizontal add/subtract instructions -(define_insn "xop_phaddbw" +(define_insn "xop_phaddbw" [(set (match_operand:V8HI 0 "register_operand" "=x") (plus:V8HI - (sign_extend:V8HI + (any_extend:V8HI (vec_select:V8QI (match_operand:V16QI 1 "nonimmediate_operand" 
"xm") (parallel [(const_int 0) @@ -10764,7 +10078,7 @@ (const_int 10) (const_int 12) (const_int 14)]))) - (sign_extend:V8HI + (any_extend:V8HI (vec_select:V8QI (match_dup 1) (parallel [(const_int 1) @@ -10776,21 +10090,21 @@ (const_int 13) (const_int 15)])))))] "TARGET_XOP" - "vphaddbw\t{%1, %0|%0, %1}" + "vphaddbw\t{%1, %0|%0, %1}" [(set_attr "type" "sseiadd1")]) -(define_insn "xop_phaddbd" +(define_insn "xop_phaddbd" [(set (match_operand:V4SI 0 "register_operand" "=x") (plus:V4SI (plus:V4SI - (sign_extend:V4SI + (any_extend:V4SI (vec_select:V4QI (match_operand:V16QI 1 "nonimmediate_operand" "xm") (parallel [(const_int 0) (const_int 4) (const_int 8) (const_int 12)]))) - (sign_extend:V4SI + (any_extend:V4SI (vec_select:V4QI (match_dup 1) (parallel [(const_int 1) @@ -10798,14 +10112,14 @@ (const_int 9) (const_int 13)])))) (plus:V4SI - (sign_extend:V4SI + (any_extend:V4SI (vec_select:V4QI (match_dup 1) (parallel [(const_int 2) (const_int 6) (const_int 10) (const_int 14)]))) - (sign_extend:V4SI + (any_extend:V4SI (vec_select:V4QI (match_dup 1) (parallel [(const_int 3) @@ -10813,73 +10127,73 @@ (const_int 11) (const_int 15)]))))))] "TARGET_XOP" - "vphaddbd\t{%1, %0|%0, %1}" + "vphaddbd\t{%1, %0|%0, %1}" [(set_attr "type" "sseiadd1")]) -(define_insn "xop_phaddbq" +(define_insn "xop_phaddbq" [(set (match_operand:V2DI 0 "register_operand" "=x") (plus:V2DI (plus:V2DI (plus:V2DI - (sign_extend:V2DI + (any_extend:V2DI (vec_select:V2QI (match_operand:V16QI 1 "nonimmediate_operand" "xm") (parallel [(const_int 0) (const_int 4)]))) - (sign_extend:V2DI + (any_extend:V2DI (vec_select:V2QI (match_dup 1) (parallel [(const_int 1) (const_int 5)])))) (plus:V2DI - (sign_extend:V2DI + (any_extend:V2DI (vec_select:V2QI (match_dup 1) (parallel [(const_int 2) (const_int 6)]))) - (sign_extend:V2DI + (any_extend:V2DI (vec_select:V2QI (match_dup 1) (parallel [(const_int 3) (const_int 7)]))))) (plus:V2DI (plus:V2DI - (sign_extend:V2DI + (any_extend:V2DI (vec_select:V2QI (match_dup 1) (parallel 
[(const_int 8) (const_int 12)]))) - (sign_extend:V2DI + (any_extend:V2DI (vec_select:V2QI (match_dup 1) (parallel [(const_int 9) (const_int 13)])))) (plus:V2DI - (sign_extend:V2DI + (any_extend:V2DI (vec_select:V2QI (match_dup 1) (parallel [(const_int 10) (const_int 14)]))) - (sign_extend:V2DI + (any_extend:V2DI (vec_select:V2QI (match_dup 1) (parallel [(const_int 11) (const_int 15)])))))))] "TARGET_XOP" - "vphaddbq\t{%1, %0|%0, %1}" + "vphaddbq\t{%1, %0|%0, %1}" [(set_attr "type" "sseiadd1")]) -(define_insn "xop_phaddwd" +(define_insn "xop_phaddwd" [(set (match_operand:V4SI 0 "register_operand" "=x") (plus:V4SI - (sign_extend:V4SI + (any_extend:V4SI (vec_select:V4HI (match_operand:V8HI 1 "nonimmediate_operand" "xm") (parallel [(const_int 0) (const_int 2) (const_int 4) (const_int 6)]))) - (sign_extend:V4SI + (any_extend:V4SI (vec_select:V4HI (match_dup 1) (parallel [(const_int 1) @@ -10887,241 +10201,55 @@ (const_int 5) (const_int 7)])))))] "TARGET_XOP" - "vphaddwd\t{%1, %0|%0, %1}" + "vphaddwd\t{%1, %0|%0, %1}" [(set_attr "type" "sseiadd1")]) -(define_insn "xop_phaddwq" +(define_insn "xop_phaddwq" [(set (match_operand:V2DI 0 "register_operand" "=x") (plus:V2DI (plus:V2DI - (sign_extend:V2DI + (any_extend:V2DI (vec_select:V2HI (match_operand:V8HI 1 "nonimmediate_operand" "xm") (parallel [(const_int 0) (const_int 4)]))) - (sign_extend:V2DI + (any_extend:V2DI (vec_select:V2HI (match_dup 1) (parallel [(const_int 1) (const_int 5)])))) (plus:V2DI - (sign_extend:V2DI + (any_extend:V2DI (vec_select:V2HI (match_dup 1) (parallel [(const_int 2) (const_int 6)]))) - (sign_extend:V2DI + (any_extend:V2DI (vec_select:V2HI (match_dup 1) (parallel [(const_int 3) (const_int 7)]))))))] "TARGET_XOP" - "vphaddwq\t{%1, %0|%0, %1}" + "vphaddwq\t{%1, %0|%0, %1}" [(set_attr "type" "sseiadd1")]) -(define_insn "xop_phadddq" +(define_insn "xop_phadddq" [(set (match_operand:V2DI 0 "register_operand" "=x") (plus:V2DI - (sign_extend:V2DI + (any_extend:V2DI (vec_select:V2SI (match_operand:V4SI 1 
"nonimmediate_operand" "xm") (parallel [(const_int 0) (const_int 2)]))) - (sign_extend:V2DI + (any_extend:V2DI (vec_select:V2SI (match_dup 1) (parallel [(const_int 1) (const_int 3)])))))] "TARGET_XOP" - "vphadddq\t{%1, %0|%0, %1}" + "vphadddq\t{%1, %0|%0, %1}" [(set_attr "type" "sseiadd1")]) -(define_insn "xop_phaddubw" - [(set (match_operand:V8HI 0 "register_operand" "=x") - (plus:V8HI - (zero_extend:V8HI - (vec_select:V8QI - (match_operand:V16QI 1 "nonimmediate_operand" "xm") - (parallel [(const_int 0) - (const_int 2) - (const_int 4) - (const_int 6) - (const_int 8) - (const_int 10) - (const_int 12) - (const_int 14)]))) - (zero_extend:V8HI - (vec_select:V8QI - (match_dup 1) - (parallel [(const_int 1) - (const_int 3) - (const_int 5) - (const_int 7) - (const_int 9) - (const_int 11) - (const_int 13) - (const_int 15)])))))] - "TARGET_XOP" - "vphaddubw\t{%1, %0|%0, %1}" - [(set_attr "type" "sseiadd1")]) - -(define_insn "xop_phaddubd" - [(set (match_operand:V4SI 0 "register_operand" "=x") - (plus:V4SI - (plus:V4SI - (zero_extend:V4SI - (vec_select:V4QI - (match_operand:V16QI 1 "nonimmediate_operand" "xm") - (parallel [(const_int 0) - (const_int 4) - (const_int 8) - (const_int 12)]))) - (zero_extend:V4SI - (vec_select:V4QI - (match_dup 1) - (parallel [(const_int 1) - (const_int 5) - (const_int 9) - (const_int 13)])))) - (plus:V4SI - (zero_extend:V4SI - (vec_select:V4QI - (match_dup 1) - (parallel [(const_int 2) - (const_int 6) - (const_int 10) - (const_int 14)]))) - (zero_extend:V4SI - (vec_select:V4QI - (match_dup 1) - (parallel [(const_int 3) - (const_int 7) - (const_int 11) - (const_int 15)]))))))] - "TARGET_XOP" - "vphaddubd\t{%1, %0|%0, %1}" - [(set_attr "type" "sseiadd1")]) - -(define_insn "xop_phaddubq" - [(set (match_operand:V2DI 0 "register_operand" "=x") - (plus:V2DI - (plus:V2DI - (plus:V2DI - (zero_extend:V2DI - (vec_select:V2QI - (match_operand:V16QI 1 "nonimmediate_operand" "xm") - (parallel [(const_int 0) - (const_int 4)]))) - (sign_extend:V2DI - 
(vec_select:V2QI - (match_dup 1) - (parallel [(const_int 1) - (const_int 5)])))) - (plus:V2DI - (zero_extend:V2DI - (vec_select:V2QI - (match_dup 1) - (parallel [(const_int 2) - (const_int 6)]))) - (zero_extend:V2DI - (vec_select:V2QI - (match_dup 1) - (parallel [(const_int 3) - (const_int 7)]))))) - (plus:V2DI - (plus:V2DI - (zero_extend:V2DI - (vec_select:V2QI - (match_dup 1) - (parallel [(const_int 8) - (const_int 12)]))) - (sign_extend:V2DI - (vec_select:V2QI - (match_dup 1) - (parallel [(const_int 9) - (const_int 13)])))) - (plus:V2DI - (zero_extend:V2DI - (vec_select:V2QI - (match_dup 1) - (parallel [(const_int 10) - (const_int 14)]))) - (zero_extend:V2DI - (vec_select:V2QI - (match_dup 1) - (parallel [(const_int 11) - (const_int 15)])))))))] - "TARGET_XOP" - "vphaddubq\t{%1, %0|%0, %1}" - [(set_attr "type" "sseiadd1")]) - -(define_insn "xop_phadduwd" - [(set (match_operand:V4SI 0 "register_operand" "=x") - (plus:V4SI - (zero_extend:V4SI - (vec_select:V4HI - (match_operand:V8HI 1 "nonimmediate_operand" "xm") - (parallel [(const_int 0) - (const_int 2) - (const_int 4) - (const_int 6)]))) - (zero_extend:V4SI - (vec_select:V4HI - (match_dup 1) - (parallel [(const_int 1) - (const_int 3) - (const_int 5) - (const_int 7)])))))] - "TARGET_XOP" - "vphadduwd\t{%1, %0|%0, %1}" - [(set_attr "type" "sseiadd1")]) - -(define_insn "xop_phadduwq" - [(set (match_operand:V2DI 0 "register_operand" "=x") - (plus:V2DI - (plus:V2DI - (zero_extend:V2DI - (vec_select:V2HI - (match_operand:V8HI 1 "nonimmediate_operand" "xm") - (parallel [(const_int 0) - (const_int 4)]))) - (zero_extend:V2DI - (vec_select:V2HI - (match_dup 1) - (parallel [(const_int 1) - (const_int 5)])))) - (plus:V2DI - (zero_extend:V2DI - (vec_select:V2HI - (match_dup 1) - (parallel [(const_int 2) - (const_int 6)]))) - (zero_extend:V2DI - (vec_select:V2HI - (match_dup 1) - (parallel [(const_int 3) - (const_int 7)]))))))] - "TARGET_XOP" - "vphadduwq\t{%1, %0|%0, %1}" - [(set_attr "type" "sseiadd1")]) - -(define_insn 
"xop_phaddudq" - [(set (match_operand:V2DI 0 "register_operand" "=x") - (plus:V2DI - (zero_extend:V2DI - (vec_select:V2SI - (match_operand:V4SI 1 "nonimmediate_operand" "xm") - (parallel [(const_int 0) - (const_int 2)]))) - (zero_extend:V2DI - (vec_select:V2SI - (match_dup 1) - (parallel [(const_int 1) - (const_int 3)])))))] - "TARGET_XOP" - "vphaddudq\t{%1, %0|%0, %1}" - [(set_attr "type" "sseiadd1")]) - (define_insn "xop_phsubbw" [(set (match_operand:V8HI 0 "register_operand" "=x") (minus:V8HI