From patchwork Fri Jan 31 14:12:54 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Uros Bizjak
X-Patchwork-Id: 1231923
From: Uros Bizjak
Date: Fri, 31 Jan 2020 15:12:54 +0100
Subject: [PATCH, i386]: Fix TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL handling.
To: "gcc-patches@gcc.gnu.org"

The reason for TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL on AMD target is
only insn size, as advised in e.g.
Software Optimization Guide for the AMD Family 15h Processors [1],
section 7.1.2, where it is said:

--quote--
7.1.2 Reduce Instruction Size

Optimization

Reduce the size of instructions when possible.

Rationale

Using smaller instruction sizes improves instruction fetch throughput.
Specific examples include the following:

* In SIMD code, use the single-precision (PS) form of instructions
  instead of the double-precision (PD) form. For example, for register
  to register moves, MOVAPS achieves the same result as MOVAPD, but
  uses one less byte to encode the instruction and has no prefix byte.
  Other examples in which single-precision forms can be substituted for
  double-precision forms include MOVUPS, MOVNTPS, XORPS, ORPS, ANDPS,
  and SHUFPS.
...
--/quote--

Please note that this optimization applies only to non-AVX forms, as
demonstrated by:

   0:   0f 28 c8                movaps %xmm0,%xmm1
   3:   66 0f 28 c8             movapd %xmm0,%xmm1
   7:   c5 f8 28 d1             vmovaps %xmm1,%xmm2
   b:   c5 f9 28 d1             vmovapd %xmm1,%xmm2

Also note that MOVDQA is missing in the above optimization. It is
harmful to substitute MOVDQA with MOVAPS, as it can (and does) introduce
+1 cycle forwarding penalty between FLT (FPA/FPM) and INT (VALU) FP
clusters.

[1] https://www.amd.com/system/files/TechDocs/47414_15h_sw_opt_guide.pdf

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Will be committed to mainline soon.

Uros.

2020-01-31  Uroš Bizjak

	* config/i386/i386.md (*movoi_internal_avx): Do not check for
	TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL.  Remove MODE_V8SF handling.
	(*movti_internal): Do not check for
	TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL.
	(*movtf_internal): Move check for TARGET_SSE2 and size optimization
	just after check for TARGET_AVX.
	(*movdf_internal): Ditto.
	* config/i386/mmx.md (*mov_internal): Do not check for
	TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL.
	* config/i386/sse.md (mov_internal): Only check
	TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL with V2DFmode.
	Move check for TARGET_SSE2 and size optimization just after check
	for TARGET_AVX.
	(_andnot3): Move check for TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL
	after check for TARGET_AVX.
	(3): Ditto.
	(*andnot3): Ditto.
	(*andnottf3): Ditto.
	(*3): Ditto.
	(*tf3): Ditto.
	(*andnot3): Remove TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL handling.
	(3): Ditto.
	(*3): Ditto.
	(sse4_1_blendv): Ditto.
	* config/i386/x86-tune.def (X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL):
	Explain that tune applies to 128bit instructions only.

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 9f0077d59a97..e2675da24c17 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -1949,18 +1949,14 @@
       if (misaligned_operand (operands[0], OImode)
           || misaligned_operand (operands[1], OImode))
         {
-          if (get_attr_mode (insn) == MODE_V8SF)
-            return "vmovups\t{%1, %0|%0, %1}";
-          else if (get_attr_mode (insn) == MODE_XI)
+          if (get_attr_mode (insn) == MODE_XI)
             return "vmovdqu32\t{%1, %0|%0, %1}";
           else
             return "vmovdqu\t{%1, %0|%0, %1}";
         }
       else
         {
-          if (get_attr_mode (insn) == MODE_V8SF)
-            return "vmovaps\t{%1, %0|%0, %1}";
-          else if (get_attr_mode (insn) == MODE_XI)
+          if (get_attr_mode (insn) == MODE_XI)
             return "vmovdqa32\t{%1, %0|%0, %1}";
           else
             return "vmovdqa\t{%1, %0|%0, %1}";
@@ -1980,8 +1976,6 @@
                (and (eq_attr "alternative" "1")
                     (match_test "TARGET_AVX512VL"))
                  (const_string "XI")
-               (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL")
-                 (const_string "V8SF")
               ]
               (const_string "OI")))])

@@ -2060,11 +2054,10 @@
                (match_test "TARGET_AVX")
                  (const_string "TI")
                (ior (not (match_test "TARGET_SSE2"))
-                    (ior (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL")
-                         (and (eq_attr "alternative" "5")
-                              (match_test "TARGET_SSE_TYPELESS_STORES"))))
+                    (match_test "optimize_function_for_size_p (cfun)"))
                  (const_string "V4SF")
-               (match_test "optimize_function_for_size_p (cfun)")
+               (and (eq_attr "alternative" "5")
+                    (match_test "TARGET_SSE_TYPELESS_STORES"))
                  (const_string "V4SF")
               ]
               (const_string "TI")))
@@ -2243,12 +2236,10 @@
          (cond [(ior (match_operand 0 "ext_sse_reg_operand")
                      (match_operand 1 "ext_sse_reg_operand"))
                   (const_string "TI")
-                (ior (not (match_test "TARGET_SSE2"))
-                     (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL"))
-                  (const_string "V4SF")
                 (match_test "TARGET_AVX")
                   (const_string "TI")
-                (match_test "optimize_function_for_size_p (cfun)")
+                (ior (not (match_test "TARGET_SSE2"))
+                     (match_test "optimize_function_for_size_p (cfun)"))
                   (const_string "V4SF")
                ]
                (const_string "TI"))
@@ -2453,12 +2444,10 @@
          (cond [(ior (match_operand 0 "ext_sse_reg_operand")
                      (match_operand 1 "ext_sse_reg_operand"))
                   (const_string "XI")
-                (ior (not (match_test "TARGET_SSE2"))
-                     (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL"))
-                  (const_string "V4SF")
                 (match_test "TARGET_AVX")
                   (const_string "TI")
-                (match_test "optimize_function_for_size_p (cfun)")
+                (ior (not (match_test "TARGET_SSE2"))
+                     (match_test "optimize_function_for_size_p (cfun)"))
                   (const_string "V4SF")
                ]
                (const_string "TI"))
@@ -3324,14 +3313,14 @@
                  (const_string "DI")
                (match_test "TARGET_AVX")
                  (const_string "TI")
+               (ior (not (match_test "TARGET_SSE2"))
+                    (match_test "optimize_function_for_size_p (cfun)"))
+                 (const_string "V4SF")
                (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL")
                  (const_string "V4SF")
                (and (eq_attr "alternative" "2")
                     (match_test "TARGET_SSE_TYPELESS_STORES"))
                  (const_string "V4SF")
-               (ior (not (match_test "TARGET_SSE2"))
-                    (match_test "optimize_function_for_size_p (cfun)"))
-                 (const_string "V4SF")
               ]
               (const_string "TI")))])

@@ -3541,14 +3530,13 @@

                /* xorps is one byte shorter for non-AVX targets.  */
                (eq_attr "alternative" "12,16")
-                 (cond [(not (match_test "TARGET_SSE2"))
-                          (const_string "V4SF")
-                        (and (match_test "TARGET_AVX512F")
-                             (not (match_test "TARGET_PREFER_AVX256")))
+                 (cond [(and (match_test "TARGET_AVX512F")
+                             (not (match_test "TARGET_PREFER_AVX256")))
                           (const_string "XI")
                         (match_test "TARGET_AVX")
                           (const_string "V2DF")
-                        (match_test "optimize_function_for_size_p (cfun)")
+                        (ior (not (match_test "TARGET_SSE2"))
+                             (match_test "optimize_function_for_size_p (cfun)"))
                           (const_string "V4SF")
                         (match_test "TARGET_SSE_LOAD0_BY_PXOR")
                           (const_string "TI")
@@ -3566,15 +3554,15 @@
                             (ior (match_operand 0 "ext_sse_reg_operand")
                                  (match_operand 1 "ext_sse_reg_operand")))
                           (const_string "V8DF")
+                        (match_test "TARGET_AVX")
+                          (const_string "DF")
                         (ior (not (match_test "TARGET_SSE2"))
-                             (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL"))
+                             (match_test "optimize_function_for_size_p (cfun)"))
+                          (const_string "V4SF")
+                        (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL")
                           (const_string "V4SF")
                         (match_test "TARGET_SSE_PARTIAL_REG_DEPENDENCY")
                           (const_string "V2DF")
-                        (match_test "TARGET_AVX")
-                          (const_string "DF")
-                        (match_test "optimize_function_for_size_p (cfun)")
-                          (const_string "V4SF")
                        ]
                        (const_string "DF"))

@@ -3723,16 +3711,15 @@
                (eq_attr "alternative" "11")
                  (const_string "DI")
                (eq_attr "alternative" "5")
-                 (cond [(not (match_test "TARGET_SSE2"))
-                          (const_string "V4SF")
-                        (and (match_test "TARGET_AVX512F")
-                             (not (match_test "TARGET_PREFER_AVX256")))
+                 (cond [(and (match_test "TARGET_AVX512F")
+                             (not (match_test "TARGET_PREFER_AVX256")))
                           (const_string "V16SF")
                         (match_test "TARGET_AVX")
                           (const_string "V4SF")
-                        (match_test "optimize_function_for_size_p (cfun)")
+                        (ior (not (match_test "TARGET_SSE2"))
+                             (match_test "optimize_function_for_size_p (cfun)"))
                           (const_string "V4SF")
-                        (match_test "TARGET_SSE_LOAD0_BY_PXOR")
+                        (match_test "TARGET_SSE_LOAD0_BY_PXOR")
                           (const_string "TI")
                        ]
                        (const_string "V4SF"))
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index d06e3b1fcb49..f695831b5b90 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -195,11 +195,7 @@
                         (match_test "mode == V2SFmode")
                           (const_string "V4SF")
                         (ior (not (match_test "TARGET_SSE2"))
-                             (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL"))
-                          (const_string "V4SF")
-                        (match_test "TARGET_AVX")
-                          (const_string "TI")
-                        (match_test "optimize_function_for_size_p (cfun)")
+                             (match_test "optimize_function_for_size_p (cfun)"))
                           (const_string "V4SF")
                        ]
                        (const_string "TI"))
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index abbd879aab35..91ef86ae274b 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1118,13 +1118,15 @@
                  (const_string "")
                (match_test "TARGET_AVX")
                  (const_string "")
-               (ior (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL")
-                    (and (eq_attr "alternative" "3")
-                         (match_test "TARGET_SSE_TYPELESS_STORES")))
-                 (const_string "")
                (ior (not (match_test "TARGET_SSE2"))
                     (match_test "optimize_function_for_size_p (cfun)"))
                  (const_string "V4SF")
+               (and (match_test "mode == V2DFmode")
+                    (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL"))
+                 (const_string "V4SF")
+               (and (eq_attr "alternative" "3")
+                    (match_test "TARGET_SSE_TYPELESS_STORES"))
+                 (const_string "V4SF")
                (and (eq_attr "alternative" "0")
                     (match_test "TARGET_SSE_LOAD0_BY_PXOR"))
                  (const_string "TI")
@@ -3555,16 +3557,14 @@
                  (const_string "")
                (eq_attr "alternative" "3")
                  (const_string "")
-               (and (match_test " == 16")
-                    (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL"))
-                 (const_string "")
                (match_test "TARGET_AVX")
                  (const_string "")
                (match_test "optimize_function_for_size_p (cfun)")
                  (const_string "V4SF")
-               ]
-               (const_string "")))])
-
+               (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL")
+                 (const_string "V4SF")
+              ]
+              (const_string "")))])

 (define_insn "_andnot3"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
@@ -3673,15 +3673,14 @@
                  (const_string "")
                (eq_attr "alternative" "3")
                  (const_string "")
-               (and (match_test " == 16")
-                    (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL"))
-                 (const_string "")
                (match_test "TARGET_AVX")
                  (const_string "")
                (match_test "optimize_function_for_size_p (cfun)")
                  (const_string "V4SF")
-               ]
-               (const_string "")))])
+               (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL")
+                 (const_string "V4SF")
+              ]
+              (const_string "")))])

 (define_insn "*3"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
@@ -3822,15 +3821,14 @@
                  (if_then_else (match_test "TARGET_AVX512DQ")
                    (const_string "")
                    (const_string "XI"))
-               (and (match_test " == 16")
-                    (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL"))
-                 (const_string "V4SF")
                (match_test "TARGET_AVX")
                  (const_string "")
                (match_test "optimize_function_for_size_p (cfun)")
                  (const_string "V4SF")
-               ]
-               (const_string "")))])
+               (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL")
+                 (const_string "V4SF")
+              ]
+              (const_string "")))])

 (define_insn "*andnottf3"
   [(set (match_operand:TF 0 "register_operand" "=x,x,v,v")
@@ -3879,15 +3877,15 @@
                  (const_string "TI")
                (eq_attr "alternative" "3")
                  (const_string "XI")
-               (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL")
-                 (const_string "V4SF")
                (match_test "TARGET_AVX")
                  (const_string "TI")
                (ior (not (match_test "TARGET_SSE2"))
                     (match_test "optimize_function_for_size_p (cfun)"))
                  (const_string "V4SF")
-               ]
-               (const_string "TI")))])
+               (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL")
+                 (const_string "V4SF")
+              ]
+              (const_string "TI")))])

 (define_insn "*3"
   [(set (match_operand:MODEF 0 "register_operand" "=x,x,v,v")
@@ -3946,15 +3944,14 @@
                  (if_then_else (match_test "TARGET_AVX512DQ")
                    (const_string "")
                    (const_string "XI"))
-               (and (match_test " == 16")
-                    (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL"))
-                 (const_string "V4SF")
                (match_test "TARGET_AVX")
                  (const_string "")
                (match_test "optimize_function_for_size_p (cfun)")
                  (const_string "V4SF")
-               ]
-               (const_string "")))])
+               (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL")
+                 (const_string "V4SF")
+              ]
+              (const_string "")))])

 (define_expand "tf3"
   [(set (match_operand:TF 0 "register_operand")
@@ -4011,15 +4008,15 @@
                  (const_string "TI")
                (eq_attr "alternative" "3")
                  (const_string "QI")
-               (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL")
-                 (const_string "V4SF")
                (match_test "TARGET_AVX")
                  (const_string "TI")
                (ior (not (match_test "TARGET_SSE2"))
                     (match_test "optimize_function_for_size_p (cfun)"))
                  (const_string "V4SF")
-               ]
-               (const_string "TI")))])
+               (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL")
+                 (const_string "V4SF")
+              ]
+              (const_string "TI")))])

 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;
@@ -13007,10 +13004,7 @@
        (const_string "*")))
    (set_attr "prefix" "orig,vex,evex")
    (set (attr "mode")
-        (cond [(and (match_test " == 16")
-                    (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL"))
-                 (const_string "")
-               (match_test "TARGET_AVX2")
+        (cond [(match_test "TARGET_AVX2")
                  (const_string "")
                (match_test "TARGET_AVX")
                  (if_then_else
@@ -13148,10 +13142,7 @@
        (const_string "*")))
    (set_attr "prefix" ",evex")
    (set (attr "mode")
-        (cond [(and (match_test " == 16")
-                    (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL"))
-                 (const_string "")
-               (match_test "TARGET_AVX2")
+        (cond [(match_test "TARGET_AVX2")
                  (const_string "")
                (match_test "TARGET_AVX")
                  (if_then_else
@@ -13244,10 +13235,7 @@
        (const_string "*")))
    (set_attr "prefix" "orig,vex,evex")
    (set (attr "mode")
-        (cond [(and (match_test " == 16")
-                    (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL"))
-                 (const_string "")
-               (match_test "TARGET_AVX2")
+        (cond [(match_test "TARGET_AVX2")
                  (const_string "")
                (match_test "TARGET_AVX")
                  (if_then_else
@@ -17151,14 +17139,14 @@
    (set_attr "prefix" "orig,orig,vex")
    (set_attr "btver2_decode" "vector,vector,vector")
    (set (attr "mode")
-        (cond [(match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL")
-                 (const_string "V4SF")
-               (match_test "TARGET_AVX")
+        (cond [(match_test "TARGET_AVX")
                  (const_string "")
                (match_test "optimize_function_for_size_p (cfun)")
                  (const_string "V4SF")
-               ]
-               (const_string "")))])
+               (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL")
+                 (const_string "V4SF")
+              ]
+              (const_string "")))])

 (define_insn_and_split "*_blendv_lt"
   [(set (match_operand:VF_128_256 0 "register_operand" "=Yr,*x,x")
diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
index 212547405a3f..d55687aa9e27 100644
--- a/gcc/config/i386/x86-tune.def
+++ b/gcc/config/i386/x86-tune.def
@@ -373,8 +373,8 @@ DEF_TUNE (X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL, "sse_unaligned_store_optimal",
           | m_INTEL | m_GOLDMONT | m_GOLDMONT_PLUS
           | m_TREMONT | m_BDVER | m_ZNVER | m_GENERIC)

-/* Use packed single precision instructions where posisble.  I.e. movups instead
-   of movupd.  */
+/* X86_TUNE_SSE_PACKED_SINGLE_INSN_OPTIMAL: Use packed single
+   precision 128bit instructions instead of double where possible.  */
 DEF_TUNE (X86_TUNE_SSE_PACKED_SINGLE_INSN_OPTIMAL, "sse_packed_single_insn_optimal",
           m_BDVER | m_ZNVER)