From patchwork Wed Oct 17 15:02:31 2018
X-Patchwork-Submitter: "H.J. Lu"
X-Patchwork-Id: 985431
From: "H.J. Lu"
To: gcc-patches@gcc.gnu.org
Cc: Uros Bizjak
Subject: [PATCH 1/2] i386: Use scalar operand in floating point vec_dup patterns
Date: Wed, 17 Oct 2018 08:02:31 -0700
Message-Id: <20181017150232.32208-1-hjl.tools@gmail.com>

Since vector registers are also used for scalar floating-point values,
we can use a scalar operand in the floating-point vec_dup patterns,
which enables the combiner to generate

	(set (reg:V8SF 84)
	     (vec_duplicate:V8SF (mem/c:SF (symbol_ref:DI ("y")))))

For AVX512 broadcast instructions from an integer register operand, we
only need to broadcast integers to integer vectors.

gcc/

	PR target/87537
	* config/i386/i386-builtin.def: Replace CODE_FOR_avx2_vec_dupv4sf,
	CODE_FOR_avx2_vec_dupv8sf and CODE_FOR_avx2_vec_dupv4df with
	CODE_FOR_vec_dupv4sf, CODE_FOR_vec_dupv8sf and
	CODE_FOR_vec_dupv4df, respectively.
	* config/i386/i386.c (expand_vec_perm_1): Replace
	gen_avx512f_vec_dupv16sf_1, gen_avx2_vec_dupv8sf_1 and
	gen_avx512f_vec_dupv8df_1 with gen_avx512f_vec_dupv16sf,
	gen_vec_dupv8sf and gen_avx512f_vec_dupv8df, respectively.
	Duplicate them from a scalar operand.
	* config/i386/i386.md (SF to DF splitter): Replace
	gen_avx512f_vec_dupv16sf_1 with gen_avx512f_vec_dupv16sf.
	* config/i386/sse.md (VF48_AVX512VL): New.
	(avx2_vec_dup): Removed.
	(avx2_vec_dupv8sf_1): Likewise.
	(avx512f_vec_dup_1): Likewise.
	(avx2_vec_dupv4df): Likewise.
	(_vec_dup:V48_AVX512VL): Likewise.
	(_vec_dup:VF48_AVX512VL): New.
	(_vec_dup:VI48_AVX512VL): Likewise.
	(_vec_dup_gpr): Replace V48_AVX512VL with VI48_AVX512VL.
	(*avx_vperm_broadcast_): Replace gen_avx2_vec_dupv8sf with
	gen_vec_dupv8sf.

gcc/testsuite/

	PR target/87537
	* gcc.target/i386/avx2-vbroadcastss_ps256-1.c: Updated.
	* gcc.target/i386/avx512vl-vbroadcast-3.c: Likewise.
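The combiner pattern above corresponds to C source that broadcasts a global `float` into every lane of an eight-float vector. The minimal sketch below shows such source. It uses the GCC/Clang `vector_size` extension instead of AVX intrinsics, so it compiles on any target. The names `v8sf`, `y`, `broadcast_y` and `all_lanes_equal` are illustrative and do not come from the patch or its testsuite. Whether `-O2 -mavx2` on x86-64 turns `broadcast_y` into a single `vbroadcastss` from memory depends on the compiler version and flags, so the comment in the code is an expectation to check with `-S`, not a guarantee.

```c
#include <stdbool.h>

/* GCC/Clang vector extension: eight floats in one 32-byte vector,
   the shape GCC's i386 port calls V8SFmode.  */
typedef float v8sf __attribute__ ((vector_size (32)));

/* Illustrative global scalar, playing the role of "y" in the RTL above.  */
float y = 2.5f;

/* Build a vector whose lanes all hold the scalar Y.  With -O2 -mavx2 on
   x86-64 this may become (vec_duplicate:V8SF (mem:SF ...)), i.e. one
   vbroadcastss straight from memory; check the -S output to confirm.  */
v8sf
broadcast_y (void)
{
  return (v8sf) { y, y, y, y, y, y, y, y };
}

/* Return true if every lane of V equals X.  */
bool
all_lanes_equal (v8sf v, float x)
{
  for (int i = 0; i < 8; i++)
    if (v[i] != x)
      return false;
  return true;
}
```

Without `-mavx`, GCC may warn that returning a 32-byte vector changes the ABI. That warning does not affect the result.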
---
 gcc/config/i386/i386-builtin.def            |  6 +-
 gcc/config/i386/i386.c                      | 28 ++++++-
 gcc/config/i386/i386.md                     |  2 +-
 gcc/config/i386/sse.md                      | 82 ++++++-------------
 .../i386/avx2-vbroadcastss_ps256-1.c        |  3 +-
 .../gcc.target/i386/avx512vl-vbroadcast-3.c |  5 +-
 6 files changed, 56 insertions(+), 70 deletions(-)

diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index dc4c70c7ea3..922f9ea2544 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -1194,9 +1194,9 @@ BDESC (OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_interleave_lowv16hi, "__builtin_ia32_
 BDESC (OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_interleave_lowv8si, "__builtin_ia32_punpckldq256", IX86_BUILTIN_PUNPCKLDQ256, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI)
 BDESC (OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_interleave_lowv4di, "__builtin_ia32_punpcklqdq256", IX86_BUILTIN_PUNPCKLQDQ256, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI)
 BDESC (OPTION_MASK_ISA_AVX2, CODE_FOR_xorv4di3, "__builtin_ia32_pxor256", IX86_BUILTIN_PXOR256, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI)
-BDESC (OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_vec_dupv4sf, "__builtin_ia32_vbroadcastss_ps", IX86_BUILTIN_VBROADCASTSS_PS, UNKNOWN, (int) V4SF_FTYPE_V4SF)
-BDESC (OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_vec_dupv8sf, "__builtin_ia32_vbroadcastss_ps256", IX86_BUILTIN_VBROADCASTSS_PS256, UNKNOWN, (int) V8SF_FTYPE_V4SF)
-BDESC (OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_vec_dupv4df, "__builtin_ia32_vbroadcastsd_pd256", IX86_BUILTIN_VBROADCASTSD_PD256, UNKNOWN, (int) V4DF_FTYPE_V2DF)
+BDESC (OPTION_MASK_ISA_AVX2, CODE_FOR_vec_dupv4sf, "__builtin_ia32_vbroadcastss_ps", IX86_BUILTIN_VBROADCASTSS_PS, UNKNOWN, (int) V4SF_FTYPE_V4SF)
+BDESC (OPTION_MASK_ISA_AVX2, CODE_FOR_vec_dupv8sf, "__builtin_ia32_vbroadcastss_ps256", IX86_BUILTIN_VBROADCASTSS_PS256, UNKNOWN, (int) V8SF_FTYPE_V4SF)
+BDESC (OPTION_MASK_ISA_AVX2, CODE_FOR_vec_dupv4df, "__builtin_ia32_vbroadcastsd_pd256", IX86_BUILTIN_VBROADCASTSD_PD256, UNKNOWN, (int) V4DF_FTYPE_V2DF)
 BDESC (OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_vbroadcasti128_v4di, "__builtin_ia32_vbroadcastsi256", IX86_BUILTIN_VBROADCASTSI256, UNKNOWN, (int) V4DI_FTYPE_V2DI)
 BDESC (OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_pblenddv4si, "__builtin_ia32_pblendd128", IX86_BUILTIN_PBLENDD128, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_INT)
 BDESC (OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_pblenddv8si, "__builtin_ia32_pblendd256", IX86_BUILTIN_PBLENDD256, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_INT)
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 3ab6b205eb6..efddcbdcc24 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -45980,6 +45980,7 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d)
     {
       /* Use vpbroadcast{b,w,d}.  */
       rtx (*gen) (rtx, rtx) = NULL;
+      machine_mode scalar_mode = VOIDmode;
       switch (d->vmode)
 	{
 	case E_V64QImode:
@@ -46010,15 +46011,18 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d)
 	  gen = gen_avx2_pbroadcastv8hi;
 	  break;
 	case E_V16SFmode:
+	  scalar_mode = SFmode;
 	  if (TARGET_AVX512F)
-	    gen = gen_avx512f_vec_dupv16sf_1;
+	    gen = gen_avx512f_vec_dupv16sf;
 	  break;
 	case E_V8SFmode:
-	  gen = gen_avx2_vec_dupv8sf_1;
+	  scalar_mode = SFmode;
+	  gen = gen_vec_dupv8sf;
 	  break;
 	case E_V8DFmode:
+	  scalar_mode = DFmode;
 	  if (TARGET_AVX512F)
-	    gen = gen_avx512f_vec_dupv8df_1;
+	    gen = gen_avx512f_vec_dupv8df;
 	  break;
 	case E_V8DImode:
 	  if (TARGET_AVX512F)
@@ -46030,7 +46034,23 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d)
       if (gen != NULL)
 	{
 	  if (!d->testing_p)
-	    emit_insn (gen (d->target, d->op0));
+	    {
+	      if (scalar_mode == VOIDmode)
+		emit_insn (gen (d->target, d->op0));
+	      else
+		{
+		  rtx op = d->op0;
+		  unsigned int oppos = 0;
+		  if (SUBREG_P (op))
+		    {
+		      oppos = SUBREG_BYTE (op);
+		      op = SUBREG_REG (op);
+		    }
+		  emit_insn (gen (d->target,
+				  gen_rtx_SUBREG (scalar_mode,
+						  op, oppos)));
+		}
+	    }
 	  return true;
 	}
     }
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 7fb2b144f47..4a6fa077db5 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -4399,7 +4399,7 @@
 	  else
 	    {
 	      rtx tmp = lowpart_subreg (V16SFmode, operands[3], V4SFmode);
-	      emit_insn (gen_avx512f_vec_dupv16sf_1 (tmp, tmp));
+	      emit_insn (gen_avx512f_vec_dupv16sf (tmp, tmp));
 	    }
 	}
       else
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index ff9f81535a9..13dc7370fd3 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -304,6 +304,10 @@
 (define_mode_iterator VF_512
   [V16SF V8DF])
 
+(define_mode_iterator VF48_AVX512VL
+  [V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
+   V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
+
 (define_mode_iterator VI48_AVX512VL
   [V16SI (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")
    V8DI (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")])
@@ -6776,42 +6780,6 @@
    (set_attr "prefix" "orig,maybe_evex")
    (set_attr "mode" "SF")])
 
-(define_insn "avx2_vec_dup"
-  [(set (match_operand:VF1_128_256 0 "register_operand" "=v")
-	(vec_duplicate:VF1_128_256
-	  (vec_select:SF
-	    (match_operand:V4SF 1 "register_operand" "v")
-	    (parallel [(const_int 0)]))))]
-  "TARGET_AVX2"
-  "vbroadcastss\t{%1, %0|%0, %1}"
-  [(set_attr "type" "sselog1")
-   (set_attr "prefix" "maybe_evex")
-   (set_attr "mode" "")])
-
-(define_insn "avx2_vec_dupv8sf_1"
-  [(set (match_operand:V8SF 0 "register_operand" "=v")
-	(vec_duplicate:V8SF
-	  (vec_select:SF
-	    (match_operand:V8SF 1 "register_operand" "v")
-	    (parallel [(const_int 0)]))))]
-  "TARGET_AVX2"
-  "vbroadcastss\t{%x1, %0|%0, %x1}"
-  [(set_attr "type" "sselog1")
-   (set_attr "prefix" "maybe_evex")
-   (set_attr "mode" "V8SF")])
-
-(define_insn "avx512f_vec_dup_1"
-  [(set (match_operand:VF_512 0 "register_operand" "=v")
-	(vec_duplicate:VF_512
-	  (vec_select:
-	    (match_operand:VF_512 1 "register_operand" "v")
-	    (parallel [(const_int 0)]))))]
-  "TARGET_AVX512F"
-  "vbroadcast\t{%x1, %0|%0, %x1}"
-  [(set_attr "type" "sselog1")
-   (set_attr "prefix" "evex")
-   (set_attr "mode" "")])
-
 ;; Although insertps takes register source, we prefer
 ;; unpcklps with register source since it is shorter.
 (define_insn "*vec_concatv2sf_sse4_1"
@@ -17721,18 +17689,6 @@
    (set_attr "prefix" "vex")
    (set_attr "mode" "OI")])
 
-(define_insn "avx2_vec_dupv4df"
-  [(set (match_operand:V4DF 0 "register_operand" "=v")
-	(vec_duplicate:V4DF
-	  (vec_select:DF
-	    (match_operand:V2DF 1 "register_operand" "v")
-	    (parallel [(const_int 0)]))))]
-  "TARGET_AVX2"
-  "vbroadcastsd\t{%1, %0|%0, %1}"
-  [(set_attr "type" "sselog1")
-   (set_attr "prefix" "maybe_evex")
-   (set_attr "mode" "V4DF")])
-
 (define_insn "_vec_dup_1"
   [(set (match_operand:VI_AVX512BW 0 "register_operand" "=v,v")
 	(vec_duplicate:VI_AVX512BW
@@ -17748,11 +17704,9 @@
    (set_attr "mode" "")])
 
 (define_insn "_vec_dup"
-  [(set (match_operand:V48_AVX512VL 0 "register_operand" "=v")
-	(vec_duplicate:V48_AVX512VL
-	  (vec_select:
-	    (match_operand: 1 "nonimmediate_operand" "vm")
-	    (parallel [(const_int 0)]))))]
+  [(set (match_operand:VF48_AVX512VL 0 "register_operand" "=v")
+	(vec_duplicate:VF48_AVX512VL
+	  (match_operand: 1 "nonimmediate_operand" "vm")))]
   "TARGET_AVX512F"
 {
   /* There is no DF broadcast (in AVX-512*) to 128b register.
@@ -17766,6 +17720,18 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "")])
 
+(define_insn "_vec_dup"
+  [(set (match_operand:VI48_AVX512VL 0 "register_operand" "=v")
+	(vec_duplicate:VI48_AVX512VL
+	  (vec_select:
+	    (match_operand: 1 "nonimmediate_operand" "vm")
+	    (parallel [(const_int 0)]))))]
+  "TARGET_AVX512F"
+  "vbroadcast\t{%1, %0|%0, %1}"
+  [(set_attr "type" "ssemov")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "")])
+
 (define_insn "_vec_dup"
   [(set (match_operand:VI12_AVX512VL 0 "register_operand" "=v")
 	(vec_duplicate:VI12_AVX512VL
@@ -17815,8 +17781,8 @@
    (set_attr "mode" "")])
 
 (define_insn "_vec_dup_gpr"
-  [(set (match_operand:V48_AVX512VL 0 "register_operand" "=v,v")
-	(vec_duplicate:V48_AVX512VL
+  [(set (match_operand:VI48_AVX512VL 0 "register_operand" "=v,v")
+	(vec_duplicate:VI48_AVX512VL
 	  (match_operand: 1 "nonimmediate_operand" "vm,r")))]
   "TARGET_AVX512F"
   "vbroadcast\t{%1, %0|%0, %1}"
@@ -17825,8 +17791,7 @@
    (set_attr "mode" "")
    (set (attr "enabled")
      (if_then_else (eq_attr "alternative" "1")
-	(symbol_ref "GET_MODE_CLASS (mode) == MODE_INT
-		     && (mode != DImode || TARGET_64BIT)")
+	(symbol_ref "mode != DImode || TARGET_64BIT")
 	(const_int 1)))])
 
 (define_insn "vec_dupv4sf"
@@ -18155,8 +18120,7 @@
 	     or VSHUFF128.  */
 	  gcc_assert (mode == V8SFmode);
 	  if ((mask & 1) == 0)
-	    emit_insn (gen_avx2_vec_dupv8sf (op0,
-					     gen_lowpart (V4SFmode, op0)));
+	    emit_insn (gen_vec_dupv8sf (op0, gen_lowpart (V4SFmode, op0)));
 	  else
 	    emit_insn (gen_avx512vl_shuf_f32x4_1 (op0, op0, op0,
 						  GEN_INT (4), GEN_INT (5),
diff --git a/gcc/testsuite/gcc.target/i386/avx2-vbroadcastss_ps256-1.c b/gcc/testsuite/gcc.target/i386/avx2-vbroadcastss_ps256-1.c
index dfac3916b08..3ff7497aa21 100644
--- a/gcc/testsuite/gcc.target/i386/avx2-vbroadcastss_ps256-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx2-vbroadcastss_ps256-1.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-mavx2 -O2" } */
-/* { dg-final { scan-assembler "vbroadcastss\[ \\t\]+\[^\n\]*%xmm\[0-9\]" } } */
+/* { dg-final { scan-assembler "vbroadcastss\[ \\t\]+\[^\n\]*%ymm\[0-9\]" } } */
+/* { dg-final { scan-assembler-not "vmovaps\[\t \]*\[^,\]*,%xmm\[0-9\]" } } */
 
 #include
 
diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-vbroadcast-3.c b/gcc/testsuite/gcc.target/i386/avx512vl-vbroadcast-3.c
index 7233398cd64..1c62364dac4 100644
--- a/gcc/testsuite/gcc.target/i386/avx512vl-vbroadcast-3.c
+++ b/gcc/testsuite/gcc.target/i386/avx512vl-vbroadcast-3.c
@@ -151,8 +151,8 @@ f16 (V2 *x)
 }
 
 /* { dg-final { scan-assembler-times "vbroadcastss\[^\n\r]*%\[re\]di\[^\n\r]*%xmm16" 4 } } */
-/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\r]*%xmm16\[^\n\r]*%ymm16" 3 } } */
-/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\r]*%\[re\]di\[^\n\r]*%ymm16" 3 } } */
+/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\r]*%xmm16\[^\n\r]*%ymm16" 1 } } */
+/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\r]*%\[re\]di\[^\n\r]*%ymm16" 4 } } */
 /* { dg-final { scan-assembler-times "vpermilps\[^\n\r]*\\\$0\[^\n\r]*%xmm16\[^\n\r]*%xmm16" 1 } } */
 /* { dg-final { scan-assembler-times "vpermilps\[^\n\r]*\\\$85\[^\n\r]*%xmm16\[^\n\r]*%xmm16" 1 } } */
 /* { dg-final { scan-assembler-times "vpermilps\[^\n\r]*\\\$170\[^\n\r]*%xmm16\[^\n\r]*%xmm16" 1 } } */
@@ -160,3 +160,4 @@ f16 (V2 *x)
 /* { dg-final { scan-assembler-times "vpermilps\[^\n\r]*\\\$0\[^\n\r]*%ymm16\[^\n\r]*%ymm16" 1 } } */
 /* { dg-final { scan-assembler-times "vpermilps\[^\n\r]*\\\$85\[^\n\r]*%ymm16\[^\n\r]*%ymm16" 2 } } */
 /* { dg-final { scan-assembler-times "vshuff32x4\[^\n\r]*\\\$3\[^\n\r]*%ymm16\[^\n\r]*%ymm16\[^\n\r]*%ymm16" 2 } } */
+/* { dg-final { scan-assembler-times "vshuff32x4\[^\n\r]*\\\$0\[^\n\r]*%ymm16\[^\n\r]*%ymm16\[^\n\r]*%ymm16" 1 } } */
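The `_vec_dup_gpr` hunk in sse.md now iterates over VI48_AVX512VL only. This matches the commit message's point that broadcasts from a general-purpose register are needed only for integer vectors. The minimal C sketch below shows source in that category, again using the `vector_size` extension. The names `v16si`, `broadcast_int` and `all_lanes_equal_si` are illustrative. The `vpbroadcastd` mentioned in the code comment is what one might expect from `-O2 -mavx512f` on x86-64; it has not been verified against this patch.

```c
#include <stdbool.h>

/* Sixteen 32-bit ints in one 64-byte vector (V16SImode in GCC's i386 port).  */
typedef int v16si __attribute__ ((vector_size (64)));

/* On x86-64, X arrives in a general-purpose register.  Splatting it into
   every lane is an integer broadcast from a GPR; with -O2 -mavx512f this
   may be a single vpbroadcastd from %edi (check the -S output).  */
v16si
broadcast_int (int x)
{
  return (v16si) { x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x };
}

/* Return true if every lane of V equals X.  */
bool
all_lanes_equal_si (v16si v, int x)
{
  for (int i = 0; i < 16; i++)
    if (v[i] != x)
      return false;
  return true;
}
```

A floating-point splat from a register no longer needs the GPR form, because after this patch the VF48_AVX512VL `_vec_dup` pattern accepts the scalar directly from a vector register or memory.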