From patchwork Sat Feb 9 13:23:11 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "H.J. Lu" X-Patchwork-Id: 1039198 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-495576-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="sU5DSRFu"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 43xXsL5DqFz9sMp for ; Sun, 10 Feb 2019 00:27:18 +1100 (AEDT) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; q=dns; s=default; b=DHL FAZV83h0t8tY7RbDmyqYorAcm/J12tTNrPTT+neS1TuzYFv0jUVCRu5HdFR/ZsEB OaeOqqmSifaVhP7+W1vK3y5IXfBGLY3P7T7sreDbYSTWqWA2308FievGahQlidsK n3REPpBONkwE1leVkaNIKj/8TdD3StBvua19SpL4= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; s=default; bh=GmxfD73bw qdx+M9+f21F0HQ6U7o=; b=sU5DSRFu0wFUMXIISPbj1ovQY9cgc3nMVevnfgCM4 m74H48eC/5+O9x7jdqopo1ZJluMdcAOlH26rjJThsQIFMdv4tJ87ThGBBbowH0eQ zA33AHolV5zYeAN3DCmHmyJmMr6SYLOTTYdp/NyD4iDxs2PRE9pV5DLk9dPLWT7s W4= Received: (qmail 89982 invoked by alias); 9 Feb 2019 13:24:04 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 89424 invoked by uid 89); 9 Feb 2019 13:24:01 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-25.7 required=5.0 tests=BAYES_00, FREEMAIL_FROM, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, KAM_NUMSUBJECT, SPF_SOFTFAIL autolearn=ham version=3.3.2 spammy= X-HELO: mga06.intel.com Received: from mga06.intel.com (HELO mga06.intel.com) (134.134.136.31) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Sat, 09 Feb 2019 13:23:58 +0000 Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by orsmga104.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 09 Feb 2019 05:23:53 -0800 Received: from gnu-cfl-1.sc.intel.com ([172.25.70.237]) by fmsmga004.fm.intel.com with ESMTP; 09 Feb 2019 05:23:52 -0800 From: "H.J. Lu" To: gcc-patches@gcc.gnu.org Cc: Uros Bizjak Subject: [PATCH 02/43] i386: Emulate MMX packsswb/packssdw/packuswb with SSE2 Date: Sat, 9 Feb 2019 05:23:11 -0800 Message-Id: <20190209132352.1828-3-hjl.tools@gmail.com> In-Reply-To: <20190209132352.1828-1-hjl.tools@gmail.com> References: <20190209132352.1828-1-hjl.tools@gmail.com> MIME-Version: 1.0 X-IsSubscribed: yes Emulate MMX packsswb/packssdw/packuswb with SSE packsswb/packssdw/packuswb plus moving bits 64:95 to bits 32:63 in SSE register. Only SSE register source operand is allowed. 2019-02-08 H.J. Lu Uros Bizjak PR target/89021 * config/i386/constraints.md (Yx): Any SSE register if MMX is disabled in 64-bit mode. (Yy): Any EVEX encodable SSE register for AVX512VL target, otherwise any SSE register if MMX is disabled in 64-bit mode. * config/i386/i386-protos.h (ix86_move_vector_high_sse_to_mmx): New prototype. (ix86_split_mmx_pack): Likewise. * config/i386/i386.c (ix86_move_vector_high_sse_to_mmx): New function. (ix86_split_mmx_pack): Likewise. * config/i386/i386.md (mmx_isa): New. (enabled): Also check mmx_isa. * config/i386/mmx.md (any_s_truncate): New code iterator. (s_trunsuffix): New code attr. (mmx_packsswb): Removed. (mmx_packssdw): Likewise. (mmx_packuswb): Likewise. (mmx_packswb): New define_insn_and_split to emulate MMX packsswb/packuswb with SSE2. (mmx_packssdw): Likewise. --- gcc/config/i386/constraints.md | 10 +++++ gcc/config/i386/i386-protos.h | 3 ++ gcc/config/i386/i386.c | 54 +++++++++++++++++++++++++++ gcc/config/i386/i386.md | 12 ++++++ gcc/config/i386/mmx.md | 67 +++++++++++++++++++--------------- 5 files changed, 116 insertions(+), 30 deletions(-) diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md index 33921aea267..6e9244ad77f 100644 --- a/gcc/config/i386/constraints.md +++ b/gcc/config/i386/constraints.md @@ -110,6 +110,9 @@ ;; v any EVEX encodable SSE register for AVX512VL target, ;; otherwise any SSE register ;; h EVEX encodable SSE register with number factor of four +;; x SSE register if MMX is disabled in 64-bit mode +;; y any EVEX encodable SSE register for AVX512VL target, otherwise +;; any SSE register if MMX is disabled in 64-bit mode (define_register_constraint "Yz" "TARGET_SSE ? SSE_FIRST_REG : NO_REGS" "First SSE register (@code{%xmm0}).") @@ -146,6 +149,13 @@ "TARGET_AVX512VL ? ALL_SSE_REGS : TARGET_SSE ? SSE_REGS : NO_REGS" "@internal For AVX512VL, any EVEX encodable SSE register (@code{%xmm0-%xmm31}), otherwise any SSE register.") +(define_register_constraint "Yx" "TARGET_MMX_WITH_SSE ? SSE_REGS : NO_REGS" + "@internal Any SSE register if MMX is disabled in 64-bit mode.") + +(define_register_constraint "Yy" + "TARGET_MMX_WITH_SSE ? (TARGET_AVX512VL ? ALL_SSE_REGS : TARGET_SSE ? SSE_REGS : NO_REGS) : NO_REGS" + "@internal Any EVEX encodable SSE register for AVX512VL target, otherwise any SSE register if MMX is disabled in 64-bit mode.") + ;; We use the B prefix to denote any number of internal operands: ;; f FLAGS_REG ;; g GOT memory operand. diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h index 2d600173917..bb96a420a85 100644 --- a/gcc/config/i386/i386-protos.h +++ b/gcc/config/i386/i386-protos.h @@ -200,6 +200,9 @@ extern void ix86_expand_vecop_qihi (enum rtx_code, rtx, rtx, rtx); extern rtx ix86_split_stack_guard (void); +extern void ix86_move_vector_high_sse_to_mmx (rtx); +extern void ix86_split_mmx_pack (rtx[], enum rtx_code); + #ifdef TREE_CODE extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree, int); #endif /* TREE_CODE */ diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 12bc7926f86..cab35bb2242 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -19955,6 +19955,60 @@ ix86_expand_vector_move_misalign (machine_mode mode, rtx operands[]) gcc_unreachable (); } +/* Move bits 64:95 to bits 32:63. */ + +void +ix86_move_vector_high_sse_to_mmx (rtx op) +{ + rtx mask = gen_rtx_PARALLEL (VOIDmode, + gen_rtvec (4, GEN_INT (0), GEN_INT (2), + GEN_INT (0), GEN_INT (0))); + rtx dest = gen_rtx_REG (V4SImode, REGNO (op)); + op = gen_rtx_VEC_SELECT (V4SImode, dest, mask); + rtx insn = gen_rtx_SET (dest, op); + emit_insn (insn); +} + +/* Split MMX pack with signed/unsigned saturation with SSE/SSE2. */ + +void +ix86_split_mmx_pack (rtx operands[], enum rtx_code code) +{ + rtx op0 = operands[0]; + rtx op1 = operands[1]; + rtx op2 = operands[2]; + + machine_mode dmode = GET_MODE (op0); + machine_mode smode = GET_MODE (op1); + machine_mode inner_dmode = GET_MODE_INNER (dmode); + machine_mode inner_smode = GET_MODE_INNER (smode); + + /* Get the corresponding SSE mode for destination. */ + int nunits = 16 / GET_MODE_SIZE (inner_dmode); + machine_mode sse_dmode = mode_for_vector (GET_MODE_INNER (dmode), + nunits).require (); + machine_mode sse_half_dmode = mode_for_vector (GET_MODE_INNER (dmode), + nunits / 2).require (); + + /* Get the corresponding SSE mode for source. */ + nunits = 16 / GET_MODE_SIZE (inner_smode); + machine_mode sse_smode = mode_for_vector (GET_MODE_INNER (smode), + nunits).require (); + + /* Generate SSE pack with signed/unsigned saturation. */ + rtx dest = gen_rtx_REG (sse_dmode, REGNO (op0)); + op1 = gen_rtx_REG (sse_smode, REGNO (op1)); + op2 = gen_rtx_REG (sse_smode, REGNO (op2)); + + op1 = gen_rtx_fmt_e (code, sse_half_dmode, op1); + op2 = gen_rtx_fmt_e (code, sse_half_dmode, op2); + rtx insn = gen_rtx_SET (dest, gen_rtx_VEC_CONCAT (sse_dmode, + op1, op2)); + emit_insn (insn); + + ix86_move_vector_high_sse_to_mmx (op0); +} + /* Helper function of ix86_fixup_binary_operands to canonicalize operand order. Returns true if the operands should be swapped. */ diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 4a32144a71a..72685107fc0 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -792,6 +792,9 @@ avx512vl,noavx512vl,x64_avx512dq,x64_avx512bw" (const_string "base")) +;; Define instruction set of MMX instructions +(define_attr "mmx_isa" "base,native,x64,x64_noavx,x64_avx" (const_string "base")) + (define_attr "enabled" "" (cond [(eq_attr "isa" "x64") (symbol_ref "TARGET_64BIT") (eq_attr "isa" "x64_sse2") @@ -830,6 +833,15 @@ (eq_attr "isa" "noavx512dq") (symbol_ref "!TARGET_AVX512DQ") (eq_attr "isa" "avx512vl") (symbol_ref "TARGET_AVX512VL") (eq_attr "isa" "noavx512vl") (symbol_ref "!TARGET_AVX512VL") + + (eq_attr "mmx_isa" "native") + (symbol_ref "!TARGET_MMX_WITH_SSE") + (eq_attr "mmx_isa" "x64") + (symbol_ref "TARGET_MMX_WITH_SSE") + (eq_attr "mmx_isa" "x64_avx") + (symbol_ref "TARGET_MMX_WITH_SSE && TARGET_AVX") + (eq_attr "mmx_isa" "x64_noavx") + (symbol_ref "TARGET_MMX_WITH_SSE && !TARGET_AVX") ] (const_int 1))) diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md index c1e0f2c411e..5c28d935e82 100644 --- a/gcc/config/i386/mmx.md +++ b/gcc/config/i386/mmx.md @@ -58,6 +58,11 @@ ;; Mapping from integer vector mode to mnemonic suffix (define_mode_attr mmxvecsize [(V8QI "b") (V4HI "w") (V2SI "d") (V1DI "q")]) +;; Used in signed and unsigned truncations with saturation. +(define_code_iterator any_s_truncate [ss_truncate us_truncate]) +;; Instruction suffix for truncations with saturation. +(define_code_attr s_trunsuffix [(ss_truncate "s") (us_truncate "u")]) + ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; ;; Move patterns @@ -1046,41 +1051,43 @@ ;; ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; -(define_insn "mmx_packsswb" - [(set (match_operand:V8QI 0 "register_operand" "=y") +(define_insn_and_split "mmx_packswb" + [(set (match_operand:V8QI 0 "register_operand" "=y,Yx,Yy") (vec_concat:V8QI - (ss_truncate:V4QI - (match_operand:V4HI 1 "register_operand" "0")) - (ss_truncate:V4QI - (match_operand:V4HI 2 "nonimmediate_operand" "ym"))))] - "TARGET_MMX" - "packsswb\t{%2, %0|%0, %2}" - [(set_attr "type" "mmxshft") - (set_attr "mode" "DI")]) + (any_s_truncate:V4QI + (match_operand:V4HI 1 "register_operand" "0,0,Yy")) + (any_s_truncate:V4QI + (match_operand:V4HI 2 "nonimmediate_operand" "ym,Yx,Yy"))))] + "TARGET_MMX || TARGET_MMX_WITH_SSE" + "@ + packswb\t{%2, %0|%0, %2} + # + #" + "&& reload_completed && TARGET_MMX_WITH_SSE" + [(const_int 0)] + "ix86_split_mmx_pack (operands, );" + [(set_attr "mmx_isa" "native,x64_noavx,x64_avx") + (set_attr "type" "mmxshft,sselog,sselog") + (set_attr "mode" "DI,TI,TI")]) -(define_insn "mmx_packssdw" - [(set (match_operand:V4HI 0 "register_operand" "=y") +(define_insn_and_split "mmx_packssdw" + [(set (match_operand:V4HI 0 "register_operand" "=y,Yx,Yy") (vec_concat:V4HI (ss_truncate:V2HI - (match_operand:V2SI 1 "register_operand" "0")) + (match_operand:V2SI 1 "register_operand" "0,0,Yy")) (ss_truncate:V2HI - (match_operand:V2SI 2 "nonimmediate_operand" "ym"))))] - "TARGET_MMX" - "packssdw\t{%2, %0|%0, %2}" - [(set_attr "type" "mmxshft") - (set_attr "mode" "DI")]) - -(define_insn "mmx_packuswb" - [(set (match_operand:V8QI 0 "register_operand" "=y") - (vec_concat:V8QI - (us_truncate:V4QI - (match_operand:V4HI 1 "register_operand" "0")) - (us_truncate:V4QI - (match_operand:V4HI 2 "nonimmediate_operand" "ym"))))] - "TARGET_MMX" - "packuswb\t{%2, %0|%0, %2}" - [(set_attr "type" "mmxshft") - (set_attr "mode" "DI")]) + (match_operand:V2SI 2 "nonimmediate_operand" "ym,Yx,Yy"))))] + "TARGET_MMX || TARGET_MMX_WITH_SSE" + "@ + packssdw\t{%2, %0|%0, %2} + # + #" + "&& reload_completed && TARGET_MMX_WITH_SSE" + [(const_int 0)] + "ix86_split_mmx_pack (operands, SS_TRUNCATE);" + [(set_attr "mmx_isa" "native,x64_noavx,x64_avx") + (set_attr "type" "mmxshft,sselog,sselog") + (set_attr "mode" "DI,TI,TI")]) (define_insn "mmx_punpckhbw" [(set (match_operand:V8QI 0 "register_operand" "=y")