From: Jakub Jelinek
Date: Mon, 30 Dec 2019 00:46:22 +0100
To: Uros Bizjak, Jeff Law, hjl.tools@gmail.com
Cc: gcc-patches@gcc.gnu.org
Subject: [PATCH] Fix vextract* masked patterns (PR target/93069)
Message-ID: <20191229234622.GT10088@tucnak>
X-Patchwork-Id: 1216194

Hi!

The AVX512F documentation clearly states that in instructions where the
destination is memory, only merging-masking is possible, not zero-masking,
and the assembler enforces that.  The testcase in this patch fails to
assemble because of
Error: unsupported masking for `vextracti32x8'
on
vextracti32x8 $0x0, %zmm1, -64(%rsp){%k1}{z}

For the vector extraction patterns, we apparently have 7 *_maskm patterns
that only accept memory destinations and an rtx_equal_p merge-masking source
for it, 7 * corresponding patterns that allow memory destination only for
the non-masked cases (through ), then 2 * patterns (lo ssehalf V16FI and
lo ssehalf VI8F_256 ones) which do allow memory destination even for masked
cases and are the cause of the testsuite failure, because we must not allow
the C constraint if the destination is m, and finally one pair of patterns
(separate * and *_mask, hi ssehalf VI4F_256), which has another issue (for
which I don't have a testcase though): if it matched zero-masking with a
register destination, it wouldn't emit the needed {z} into assembly.
The attached patch fixes those 3 issues only, perhaps more suitable for
backporting.
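To make the merging-masking vs. zero-masking distinction above concrete, here is a small C model of a masked extract of the low 8 x 32-bit half.  It is only a sketch of the semantics as described above: the names masking_encodable and masked_extract_lo are made up for illustration and are neither GCC nor assembler code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model, not GCC or assembler code.  */
enum masking { MERGE_MASKING, ZERO_MASKING };

/* Zero-masking ({z}) is only encodable with a register destination;
   merging-masking is fine for both register and memory destinations.  */
static bool
masking_encodable (bool dest_is_mem, enum masking kind)
{
  return !(dest_is_mem && kind == ZERO_MASKING);
}

/* Model of a masked extract of the low 8 elements of a 16-element
   source.  Bit i of K selects whether element i is written.  */
static void
masked_extract_lo (int32_t dest[8], const int32_t src[16], uint8_t k,
                   enum masking kind)
{
  assert (dest != NULL && src != NULL);
  for (size_t i = 0; i < 8; i++)
    {
      if (k & (1u << i))
        dest[i] = src[i];       /* Selected element is copied.  */
      else if (kind == ZERO_MASKING)
        dest[i] = 0;            /* Masked-out element is cleared.  */
      /* MERGE_MASKING: masked-out element keeps its old value, which
         is why the merge source must equal the destination when the
         destination is memory.  */
    }
}
```

In this model the failing instruction corresponds to masking_encodable (true, ZERO_MASKING), which returns false.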

But, even with that fixed, we are missing 3 further *_maskm patterns and,
more importantly, I find the split into 3 separate patterns after subst
(*_maskm for masking with memory destination, *_mask for masking with
register destination and * for non-masking) unnecessarily complex and
harder for reload.  So the included patch below (non-attached) instead
kills all *_maskm patterns and splits the * patterns into * and *_mask by
hand instead of subst, where the *_mask ones make sure that with v
destination they use 0C, while with m destination they use 0, and as
condition enforce that either the destination is not MEM, or rtx_equal_p
holds between the destination and the corresponding merging-masking operand
source.
If we had those 3 missing *_maskm patterns, this patch would actually
result in both shorter sse.md and shorter machine description after subst
(e.g. length of tmp-mddump.md); as we don't have them, the patch actually
makes sse.md 16 lines longer, but tmp-mddump.md is still shorter.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk (and is
the shorter patch ok for backports)?

2019-12-30  Jakub Jelinek

	PR target/93069
	* config/i386/subst.md (store_mask_constraint, store_mask_predicate):
	Remove.
	(avx512dq_vextract64x2_1_maskm, avx512f_vextract32x4_1_maskm,
	vec_extract_lo__maskm, vec_extract_hi__maskm): Remove.
	(avx512dq_vextract64x2_1): Split into ...
	(*avx512dq_vextract64x2_1, avx512dq_vextract64x2_1_mask): ... these
	new define_insns.  Even in the masked variant allow memory output but
	in that case use 0 rather than 0C constraint on the source of
	masked-out elts.
	(avx512f_vextract32x4_1): Split into ...
	(*avx512f_vextract32x4_1, avx512f_vextract32x4_1_mask): ... these new
	define_insns.  Even in the masked variant allow memory output but in
	that case use 0 rather than 0C constraint on the source of masked-out
	elts.
	(vec_extract_lo_): Split into ...
	(vec_extract_lo_, vec_extract_lo__mask): ... these new define_insns.
	Even in the masked variant allow memory output but in that case use
	0 rather than 0C constraint on the source of masked-out elts.
	(vec_extract_hi_): Split into ...
	(vec_extract_hi_, vec_extract_hi__mask): ... these new define_insns.
	Even in the masked variant allow memory output but in that case use
	0 rather than 0C constraint on the source of masked-out elts.

	* gcc.target/i386/avx512vl-pr93069.c: New test.
	* gcc.dg/vect/pr93069.c: New test.

	Jakub

2019-12-30  Jakub Jelinek

	PR target/93069
	* config/i386/sse.md (vec_extract_lo_): Use instead of m in output
	operand constraint.
	(vec_extract_hi_): Use instead of %{%3%}.

	* gcc.target/i386/avx512vl-pr93069.c: New test.
	* gcc.dg/vect/pr93069.c: New test.

--- gcc/config/i386/sse.md.jj	2019-12-27 18:16:48.146431083 +0100
+++ gcc/config/i386/sse.md	2019-12-28 14:43:29.181456611 +0100
@@ -8782,7 +8782,8 @@
 })

 (define_insn "vec_extract_lo_"
-  [(set (match_operand: 0 "nonimmediate_operand" "=v,v,m")
+  [(set (match_operand: 0 ""
+                          "=v,v,")
         (vec_select:
           (match_operand:V16FI 1 ""
           "v,,v")
@@ -8834,7 +8835,8 @@
 })

 (define_insn "vec_extract_lo_"
-  [(set (match_operand: 0 "" "=v,v,m")
+  [(set (match_operand: 0 ""
+                          "=v,v,")
         (vec_select:
           (match_operand:VI8F_256 1 ""
           "v,,v")
@@ -8844,7 +8846,7 @@
    && ( || !(MEM_P (operands[0]) && MEM_P (operands[1])))"
 {
   if ()
-    return "vextract64x2\t{$0x0, %1, %0%{%3%}|%0%{%3%}, %1, 0x0}";
+    return "vextract64x2\t{$0x0, %1, %0|%0, %1, 0x0}";
   else
     return "#";
 }
--- gcc/testsuite/gcc.target/i386/avx512vl-pr93069.c.jj	2019-12-28 16:31:30.118695074 +0100
+++ gcc/testsuite/gcc.target/i386/avx512vl-pr93069.c	2019-12-28 16:32:16.920990539 +0100
@@ -0,0 +1,12 @@
+/* PR target/93069 */
+/* { dg-do assemble { target vect_simd_clones } } */
+/* { dg-options "-O2 -fopenmp-simd -mtune=skylake-avx512" } */
+/* { dg-additional-options "-mavx512vl" { target avx512vl } } */
+/* { dg-additional-options "-mavx512dq" { target avx512dq } } */
+
+#pragma omp declare simd
+int
+foo (int x, int y)
+{
+  return x == 0 ? x : y;
+}
--- gcc/testsuite/gcc.dg/vect/pr93069.c.jj	2019-12-28 16:31:01.822121036 +0100
+++ gcc/testsuite/gcc.dg/vect/pr93069.c	2019-12-28 16:30:35.503517205 +0100
@@ -0,0 +1,10 @@
+/* PR target/93069 */
+/* { dg-do assemble { target vect_simd_clones } } */
+/* { dg-options "-O2 -fopenmp-simd" } */
+
+#pragma omp declare simd
+int
+foo (int x, int y)
+{
+  return x == 0 ? x : y;
+}

--- gcc/config/i386/subst.md.jj	2019-10-28 22:16:14.651007061 +0100
+++ gcc/config/i386/subst.md	2019-12-28 14:43:56.654042070 +0100
@@ -57,8 +57,6 @@ (define_subst_attr "mask_mode512bit_cond
 (define_subst_attr "mask_avx512vl_condition" "mask" "1" "TARGET_AVX512VL")
 (define_subst_attr "mask_avx512bw_condition" "mask" "1" "TARGET_AVX512BW")
 (define_subst_attr "mask_avx512dq_condition" "mask" "1" "TARGET_AVX512DQ")
-(define_subst_attr "store_mask_constraint" "mask" "vm" "v")
-(define_subst_attr "store_mask_predicate" "mask" "nonimmediate_operand" "register_operand")
 (define_subst_attr "mask_prefix" "mask" "vex" "evex")
 (define_subst_attr "mask_prefix2" "mask" "maybe_vex" "evex")
 (define_subst_attr "mask_prefix3" "mask" "orig,vex" "evex,evex")
--- gcc/config/i386/sse.md.jj	2019-12-27 18:16:48.146431083 +0100
+++ gcc/config/i386/sse.md	2019-12-29 12:36:33.232414154 +0100
@@ -8415,60 +8415,31 @@ (define_expand "_vextract<
   DONE;
 })

-(define_insn "avx512dq_vextract64x2_1_maskm"
-  [(set (match_operand: 0 "memory_operand" "=m")
+(define_insn "avx512dq_vextract64x2_1_mask"
+  [(set (match_operand: 0 "nonimmediate_operand" "=v,m")
         (vec_merge:
           (vec_select:
-            (match_operand:V8FI 1 "register_operand" "v")
-            (parallel [(match_operand 2 "const_0_to_7_operand")
-              (match_operand 3 "const_0_to_7_operand")]))
-          (match_operand: 4 "memory_operand" "0")
-          (match_operand:QI 5 "register_operand" "Yk")))]
+            (match_operand:V8FI 1 "register_operand" "v,v")
+            (parallel [(match_operand 2 "const_0_to_7_operand")
+                       (match_operand 3 "const_0_to_7_operand")]))
+          (match_operand: 4 "nonimm_or_0_operand" "0C,0")
+          (match_operand:QI 5 "register_operand" "Yk,Yk")))]
   "TARGET_AVX512DQ
    && INTVAL (operands[2]) % 2 == 0
    && INTVAL (operands[2]) == INTVAL (operands[3]) - 1
-   && rtx_equal_p (operands[4], operands[0])"
+   && (!MEM_P (operands[0]) || rtx_equal_p (operands[0], operands[4]))"
 {
-  operands[2] = GEN_INT ((INTVAL (operands[2])) >> 1);
-  return "vextract64x2\t{%2, %1, %0%{%5%}|%0%{%5%}, %1, %2}";
-}
-  [(set_attr "type" "sselog")
-   (set_attr "prefix_extra" "1")
-   (set_attr "length_immediate" "1")
-   (set_attr "memory" "store")
-   (set_attr "prefix" "evex")
-   (set_attr "mode" "")])
-
-(define_insn "avx512f_vextract32x4_1_maskm"
-  [(set (match_operand: 0 "memory_operand" "=m")
-        (vec_merge:
-          (vec_select:
-            (match_operand:V16FI 1 "register_operand" "v")
-            (parallel [(match_operand 2 "const_0_to_15_operand")
-              (match_operand 3 "const_0_to_15_operand")
-              (match_operand 4 "const_0_to_15_operand")
-              (match_operand 5 "const_0_to_15_operand")]))
-          (match_operand: 6 "memory_operand" "0")
-          (match_operand:QI 7 "register_operand" "Yk")))]
-  "TARGET_AVX512F
-   && INTVAL (operands[2]) % 4 == 0
-   && INTVAL (operands[2]) == INTVAL (operands[3]) - 1
-   && INTVAL (operands[3]) == INTVAL (operands[4]) - 1
-   && INTVAL (operands[4]) == INTVAL (operands[5]) - 1
-   && rtx_equal_p (operands[6], operands[0])"
-{
-  operands[2] = GEN_INT (INTVAL (operands[2]) >> 2);
-  return "vextract32x4\t{%2, %1, %0%{%7%}|%0%{%7%}, %1, %2}";
+  operands[2] = GEN_INT (INTVAL (operands[2]) >> 1);
+  return "vextract64x2\t{%2, %1, %0%{%5%}%N4|%0%{%5%}%N4, %1, %2}";
 }
-  [(set_attr "type" "sselog")
+  [(set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
-   (set_attr "memory" "store")
    (set_attr "prefix" "evex")
    (set_attr "mode" "")])

-(define_insn "avx512dq_vextract64x2_1"
-  [(set (match_operand: 0 "" "=")
+(define_insn "*avx512dq_vextract64x2_1"
+  [(set (match_operand: 0 "nonimmediate_operand" "=vm")
         (vec_select:
           (match_operand:V8FI 1 "register_operand" "v")
           (parallel [(match_operand 2 "const_0_to_7_operand")
@@ -8478,7 +8449,7 @@ (define_insn "avx512dq_vex
    && INTVAL (operands[2]) == INTVAL (operands[3]) - 1"
 {
   operands[2] = GEN_INT (INTVAL (operands[2]) >> 1);
-  return "vextract64x2\t{%2, %1, %0|%0, %1, %2}";
+  return "vextract64x2\t{%2, %1, %0|%0, %1, %2}";
 }
   [(set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
@@ -8507,14 +8478,41 @@ (define_split
   operands[1] = gen_lowpart (mode, operands[1]);
 })

-(define_insn "avx512f_vextract32x4_1"
-  [(set (match_operand: 0 "" "=")
+(define_insn "avx512f_vextract32x4_1_mask"
+  [(set (match_operand: 0 "nonimmediate_operand" "=v,m")
+        (vec_merge:
+          (vec_select:
+            (match_operand:V16FI 1 "register_operand" "v,v")
+            (parallel [(match_operand 2 "const_0_to_15_operand")
+                       (match_operand 3 "const_0_to_15_operand")
+                       (match_operand 4 "const_0_to_15_operand")
+                       (match_operand 5 "const_0_to_15_operand")]))
+          (match_operand: 6 "nonimm_or_0_operand" "0C,0")
+          (match_operand:QI 7 "register_operand" "Yk,Yk")))]
+  "TARGET_AVX512F
+   && INTVAL (operands[2]) % 4 == 0
+   && INTVAL (operands[2]) == INTVAL (operands[3]) - 1
+   && INTVAL (operands[3]) == INTVAL (operands[4]) - 1
+   && INTVAL (operands[4]) == INTVAL (operands[5]) - 1
+   && (!MEM_P (operands[0]) || rtx_equal_p (operands[0], operands[6]))"
+{
+  operands[2] = GEN_INT (INTVAL (operands[2]) >> 2);
+  return "vextract32x4\t{%2, %1, %0%{%7%}%N6|%0%{%7%}%N6, %1, %2}";
+}
+  [(set_attr "type" "sselog1")
+   (set_attr "prefix_extra" "1")
+   (set_attr "length_immediate" "1")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "")])
+
+(define_insn "*avx512f_vextract32x4_1"
+  [(set (match_operand: 0 "nonimmediate_operand" "=vm")
         (vec_select:
           (match_operand:V16FI 1 "register_operand" "v")
-          (parallel [(match_operand 2 "const_0_to_15_operand")
-            (match_operand 3 "const_0_to_15_operand")
-            (match_operand 4 "const_0_to_15_operand")
-            (match_operand 5 "const_0_to_15_operand")])))]
+          (parallel [(match_operand 2 "const_0_to_15_operand")
+                     (match_operand 3 "const_0_to_15_operand")
+                     (match_operand 4 "const_0_to_15_operand")
+                     (match_operand 5 "const_0_to_15_operand")])))]
   "TARGET_AVX512F
    && INTVAL (operands[2]) % 4 == 0
    && INTVAL (operands[2]) == INTVAL (operands[3]) - 1
@@ -8522,7 +8520,7 @@ (define_insn "avx512f_vext
    && INTVAL (operands[4]) == INTVAL (operands[5]) - 1"
 {
   operands[2] = GEN_INT (INTVAL (operands[2]) >> 2);
-  return "vextract32x4\t{%2, %1, %0|%0, %1, %2}";
+  return "vextract32x4\t{%2, %1, %0|%0, %1, %2}";
 }
   [(set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
@@ -8606,35 +8604,35 @@ (define_split
   [(set (match_dup 0) (match_dup 1))]
   "operands[1] = gen_lowpart (mode, operands[1]);")

-(define_insn "vec_extract_lo__maskm"
-  [(set (match_operand: 0 "memory_operand" "=m")
+(define_insn "vec_extract_lo__mask"
+  [(set (match_operand: 0 "nonimmediate_operand" "=v,m")
         (vec_merge:
           (vec_select:
-            (match_operand:V8FI 1 "register_operand" "v")
+            (match_operand:V8FI 1 "register_operand" "v,v")
             (parallel [(const_int 0) (const_int 1)
-              (const_int 2) (const_int 3)]))
-          (match_operand: 2 "memory_operand" "0")
-          (match_operand:QI 3 "register_operand" "Yk")))]
+                       (const_int 2) (const_int 3)]))
+          (match_operand: 2 "nonimm_or_0_operand" "0C,0")
+          (match_operand:QI 3 "register_operand" "Yk,Yk")))]
   "TARGET_AVX512F
-   && rtx_equal_p (operands[2], operands[0])"
-  "vextract64x4\t{$0x0, %1, %0%{%3%}|%0%{%3%}, %1, 0x0}"
+   && (!MEM_P (operands[0]) || rtx_equal_p (operands[0], operands[2]))"
+  "vextract64x4\t{$0x0, %1, %0%{%3%}%N2|%0%{%3%}%N2, %1, 0x0}"
   [(set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
+   (set_attr "memory" "none,store")
    (set_attr "prefix" "evex")
    (set_attr "mode" "")])

-(define_insn "vec_extract_lo_"
-  [(set (match_operand: 0 "" "=v,,v")
+(define_insn "vec_extract_lo_"
+  [(set (match_operand: 0 "nonimmediate_operand" "=v,vm,v")
         (vec_select:
-          (match_operand:V8FI 1 "" "v,v,")
+          (match_operand:V8FI 1 "nonimmediate_operand" "v,v,vm")
           (parallel [(const_int 0) (const_int 1)
-            (const_int 2) (const_int 3)])))]
-  "TARGET_AVX512F
-   && ( || !(MEM_P (operands[0]) && MEM_P (operands[1])))"
+                     (const_int 2) (const_int 3)])))]
+  "TARGET_AVX512F && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
 {
-  if ( || (!TARGET_AVX512VL && !MEM_P (operands[1])))
-    return "vextract64x4\t{$0x0, %1, %0|%0, %1, 0x0}";
+  if (!TARGET_AVX512VL && !MEM_P (operands[1]))
+    return "vextract64x4\t{$0x0, %1, %0|%0, %1, 0x0}";
   else
     return "#";
 }
@@ -8645,70 +8643,69 @@ (define_insn "vec_extract_lo_")])

-(define_insn "vec_extract_hi__maskm"
-  [(set (match_operand: 0 "memory_operand" "=m")
+(define_insn "vec_extract_hi__mask"
+  [(set (match_operand: 0 "nonimmediate_operand" "=v,m")
         (vec_merge:
           (vec_select:
-            (match_operand:V8FI 1 "register_operand" "v")
+            (match_operand:V8FI 1 "register_operand" "v,v")
             (parallel [(const_int 4) (const_int 5)
-              (const_int 6) (const_int 7)]))
-          (match_operand: 2 "memory_operand" "0")
-          (match_operand:QI 3 "register_operand" "Yk")))]
+                       (const_int 6) (const_int 7)]))
+          (match_operand: 2 "nonimm_or_0_operand" "0C,0")
+          (match_operand:QI 3 "register_operand" "Yk,Yk")))]
   "TARGET_AVX512F
-   && rtx_equal_p (operands[2], operands[0])"
-  "vextract64x4\t{$0x1, %1, %0%{%3%}|%0%{%3%}, %1, 0x1}"
-  [(set_attr "type" "sselog")
+   && (!MEM_P (operands[0]) || rtx_equal_p (operands[0], operands[2]))"
+  "vextract64x4\t{$0x1, %1, %0%{%3%}%N2|%0%{%3%}%N2, %1, 0x1}"
+  [(set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
-   (set_attr "memory" "store")
    (set_attr "prefix" "evex")
    (set_attr "mode" "")])

-(define_insn "vec_extract_hi_"
-  [(set (match_operand: 0 "" "=")
+(define_insn "vec_extract_hi_"
+  [(set (match_operand: 0 "nonimmediate_operand" "=vm")
         (vec_select:
           (match_operand:V8FI 1 "register_operand" "v")
           (parallel [(const_int 4) (const_int 5)
-            (const_int 6) (const_int 7)])))]
+                     (const_int 6) (const_int 7)])))]
   "TARGET_AVX512F"
-  "vextract64x4\t{$0x1, %1, %0|%0, %1, 0x1}"
+  "vextract64x4\t{$0x1, %1, %0|%0, %1, 0x1}"
   [(set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "evex")
    (set_attr "mode" "")])

-(define_insn "vec_extract_hi__maskm"
-  [(set (match_operand: 0 "memory_operand" "=m")
+(define_insn "vec_extract_hi__mask"
+  [(set (match_operand: 0 "nonimmediate_operand" "=v,m")
         (vec_merge:
           (vec_select:
-            (match_operand:V16FI 1 "register_operand" "v")
+            (match_operand:V16FI 1 "register_operand" "v,v")
             (parallel [(const_int 8) (const_int 9)
-              (const_int 10) (const_int 11)
-              (const_int 12) (const_int 13)
-              (const_int 14) (const_int 15)]))
-          (match_operand: 2 "memory_operand" "0")
-          (match_operand:QI 3 "register_operand" "Yk")))]
+                       (const_int 10) (const_int 11)
+                       (const_int 12) (const_int 13)
+                       (const_int 14) (const_int 15)]))
+          (match_operand: 2 "nonimm_or_0_operand" "0C,0")
+          (match_operand:QI 3 "register_operand" "Yk,Yk")))]
   "TARGET_AVX512DQ
-   && rtx_equal_p (operands[2], operands[0])"
-  "vextract32x8\t{$0x1, %1, %0%{%3%}|%0%{%3%}, %1, 0x1}"
+   && (!MEM_P (operands[0]) || rtx_equal_p (operands[0], operands[2]))"
+  "vextract32x8\t{$0x1, %1, %0%{%3%}%N2|%0%{%3%}%N2, %1, 0x1}"
   [(set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "evex")
    (set_attr "mode" "")])

-(define_insn "vec_extract_hi_"
-  [(set (match_operand: 0 "" "=,vm")
+(define_insn "vec_extract_hi_"
+  [(set (match_operand: 0 "nonimmediate_operand" "=vm,vm")
         (vec_select:
           (match_operand:V16FI 1 "register_operand" "v,v")
           (parallel [(const_int 8) (const_int 9)
-            (const_int 10) (const_int 11)
-            (const_int 12) (const_int 13)
-            (const_int 14) (const_int 15)])))]
-  "TARGET_AVX512F && "
+                     (const_int 10) (const_int 11)
+                     (const_int 12) (const_int 13)
+                     (const_int 14) (const_int 15)])))]
+  "TARGET_AVX512F"
   "@
-   vextract32x8\t{$0x1, %1, %0|%0, %1, 0x1}
+   vextract32x8\t{$0x1, %1, %0|%0, %1, 0x1}
    vextracti64x4\t{$0x1, %1, %0|%0, %1, 0x1}"
   [(set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
@@ -8781,24 +8778,42 @@ (define_expand "avx_vextractf128"
   DONE;
 })

-(define_insn "vec_extract_lo_"
+(define_insn "vec_extract_lo__mask"
+  [(set (match_operand: 0 "nonimmediate_operand" "=v,m")
+        (vec_merge:
+          (vec_select:
+            (match_operand:V16FI 1 "register_operand" "v,v")
+            (parallel [(const_int 0) (const_int 1)
+                       (const_int 2) (const_int 3)
+                       (const_int 4) (const_int 5)
+                       (const_int 6) (const_int 7)]))
+          (match_operand: 2 "nonimm_or_0_operand" "0C,0")
+          (match_operand:QI 3 "register_operand" "Yk,Yk")))]
+  "TARGET_AVX512F
+   && (!MEM_P (operands[0]) || rtx_equal_p (operands[0], operands[2]))"
+  "vextract32x8\t{$0x0, %1, %0%{%3%}%N2|%0%{%3%}%N2, %1, 0x0}"
+  [(set_attr "type" "sselog1")
+   (set_attr "prefix_extra" "1")
+   (set_attr "length_immediate" "1")
+   (set_attr "memory" "none,store")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "")])
+
+(define_insn "vec_extract_lo_"
   [(set (match_operand: 0 "nonimmediate_operand" "=v,v,m")
         (vec_select:
-          (match_operand:V16FI 1 ""
-          "v,,v")
+          (match_operand:V16FI 1 "nonimmediate_operand" "v,m,v")
           (parallel [(const_int 0) (const_int 1)
-            (const_int 2) (const_int 3)
-            (const_int 4) (const_int 5)
-            (const_int 6) (const_int 7)])))]
+                     (const_int 2) (const_int 3)
+                     (const_int 4) (const_int 5)
+                     (const_int 6) (const_int 7)])))]
   "TARGET_AVX512F
-   &&
-   && ( || !(MEM_P (operands[0]) && MEM_P (operands[1])))"
+   && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
 {
-  if (
-      || (!TARGET_AVX512VL
-          && !REG_P (operands[0])
-          && EXT_REX_SSE_REG_P (operands[1])))
-    return "vextract32x8\t{$0x0, %1, %0|%0, %1, 0x0}";
+  if (!TARGET_AVX512VL
+      && !REG_P (operands[0])
+      && EXT_REX_SSE_REG_P (operands[1]))
+    return "vextract32x8\t{$0x0, %1, %0|%0, %1, 0x0}";
   else
     return "#";
 }
@@ -8833,28 +8848,34 @@ (define_split
   operands[1] = gen_lowpart (mode, operands[1]);
 })

-(define_insn "vec_extract_lo_"
-  [(set (match_operand: 0 "" "=v,v,m")
-        (vec_select:
-          (match_operand:VI8F_256 1 ""
-          "v,,v")
-          (parallel [(const_int 0) (const_int 1)])))]
-  "TARGET_AVX
-   && &&
-   && ( || !(MEM_P (operands[0]) && MEM_P (operands[1])))"
-{
-  if ()
-    return "vextract64x2\t{$0x0, %1, %0%{%3%}|%0%{%3%}, %1, 0x0}";
-  else
-    return "#";
-}
+(define_insn "vec_extract_lo__mask"
+  [(set (match_operand: 0 "nonimmediate_operand" "=v,m")
+        (vec_merge:
+          (vec_select:
+            (match_operand:VI8F_256 1 "register_operand" "v,v")
+            (parallel [(const_int 0) (const_int 1)]))
+          (match_operand: 2 "nonimm_or_0_operand" "0C,0")
+          (match_operand:QI 3 "register_operand" "Yk,Yk")))]
+  "TARGET_AVX512DQ
+   && TARGET_AVX512VL
+   && (!MEM_P (operands[0]) || rtx_equal_p (operands[0], operands[2]))"
+  "vextract64x2\t{$0x0, %1, %0%{%3%}%N2|%0%{%3%}%N2, %1, 0x0}"
   [(set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
-   (set_attr "memory" "none,load,store")
+   (set_attr "memory" "none,store")
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])

+(define_insn "vec_extract_lo_"
+  [(set (match_operand: 0 "nonimmediate_operand" "=vm,v")
+        (vec_select:
+          (match_operand:VI8F_256 1 "nonimmediate_operand" "v,vm")
+          (parallel [(const_int 0) (const_int 1)])))]
+  "TARGET_AVX
+   && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
+  "#")
+
 (define_split
   [(set (match_operand: 0 "nonimmediate_operand")
         (vec_select:
@@ -8865,20 +8886,38 @@ (define_split
   [(set (match_dup 0) (match_dup 1))]
   "operands[1] = gen_lowpart (mode, operands[1]);")

-(define_insn "vec_extract_hi_"
-  [(set (match_operand: 0 "" "=v,")
+(define_insn "vec_extract_hi__mask"
+  [(set (match_operand: 0 "nonimmediate_operand" "=v,m")
+        (vec_merge:
+          (vec_select:
+            (match_operand:VI8F_256 1 "register_operand" "v,v")
+            (parallel [(const_int 2) (const_int 3)]))
+          (match_operand: 2 "nonimm_or_0_operand" "0C,0")
+          (match_operand:QI 3 "register_operand" "Yk,Yk")))]
+  "TARGET_AVX512DQ
+   && TARGET_AVX512VL
+   && (!MEM_P (operands[0]) || rtx_equal_p (operands[0], operands[2]))"
+  "vextract64x2\t{$0x1, %1, %0%{%3%}%N2|%0%{%3%}%N2, %1, 0x1}"
+  [(set_attr "type" "sselog1")
+   (set_attr "prefix_extra" "1")
+   (set_attr "length_immediate" "1")
+   (set_attr "prefix" "vex")
+   (set_attr "mode" "")])
+
+(define_insn "vec_extract_hi_"
+  [(set (match_operand: 0 "nonimmediate_operand" "=vm")
         (vec_select:
-          (match_operand:VI8F_256 1 "register_operand" "v,v")
+          (match_operand:VI8F_256 1 "register_operand" "v")
           (parallel [(const_int 2) (const_int 3)])))]
-  "TARGET_AVX && && "
+  "TARGET_AVX"
 {
   if (TARGET_AVX512VL)
-  {
-    if (TARGET_AVX512DQ)
-      return "vextract64x2\t{$0x1, %1, %0|%0, %1, 0x1}";
-    else
-      return "vextract32x4\t{$0x1, %1, %0|%0, %1, 0x1}";
-  }
+    {
+      if (TARGET_AVX512DQ)
+        return "vextract64x2\t{$0x1, %1, %0|%0, %1, 0x1}";
+      else
+        return "vextract32x4\t{$0x1, %1, %0|%0, %1, 0x1}";
+    }
   else
     return "vextract\t{$0x1, %1, %0|%0, %1, 0x1}";
 }
@@ -8899,74 +8938,51 @@ (define_split
   [(set (match_dup 0) (match_dup 1))]
   "operands[1] = gen_lowpart (mode, operands[1]);")

-(define_insn "vec_extract_lo_"
-  [(set (match_operand: 0 ""
-                          "=,v")
-        (vec_select:
-          (match_operand:VI4F_256 1 ""
-                                  "v,")
-          (parallel [(const_int 0) (const_int 1)
-                     (const_int 2) (const_int 3)])))]
-  "TARGET_AVX
-   && &&
-   && ( || !(MEM_P (operands[0]) && MEM_P (operands[1])))"
-{
-  if ()
-    return "vextract32x4\t{$0x0, %1, %0|%0, %1, 0x0}";
-  else
-    return "#";
-}
-  [(set_attr "type" "sselog1")
-   (set_attr "prefix_extra" "1")
-   (set_attr "length_immediate" "1")
-   (set_attr "prefix" "evex")
-   (set_attr "mode" "")])
-
-(define_insn "vec_extract_lo__maskm"
-  [(set (match_operand: 0 "memory_operand" "=m")
+(define_insn "vec_extract_lo__mask"
+  [(set (match_operand: 0 "nonimmediate_operand" "=v,m")
         (vec_merge:
           (vec_select:
-            (match_operand:VI4F_256 1 "register_operand" "v")
+            (match_operand:VI4F_256 1 "register_operand" "v,v")
             (parallel [(const_int 0) (const_int 1)
-              (const_int 2) (const_int 3)]))
-          (match_operand: 2 "memory_operand" "0")
-          (match_operand:QI 3 "register_operand" "Yk")))]
-  "TARGET_AVX512VL && TARGET_AVX512F
-   && rtx_equal_p (operands[2], operands[0])"
-  "vextract32x4\t{$0x0, %1, %0%{%3%}|%0%{%3%}, %1, 0x0}"
+                       (const_int 2) (const_int 3)]))
+          (match_operand: 2 "nonimm_or_0_operand" "0C,0")
+          (match_operand:QI 3 "register_operand" "Yk,Yk")))]
+  "TARGET_AVX512DQ
+   && TARGET_AVX512VL
+   && (!MEM_P (operands[0]) || rtx_equal_p (operands[0], operands[2]))"
+  "vextract32x4\t{$0x0, %1, %0%{%3%}%N2|%0%{%3%}%N2, %1, 0x0}"
   [(set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "evex")
    (set_attr "mode" "")])

-(define_insn "vec_extract_hi__maskm"
-  [(set (match_operand: 0 "memory_operand" "=m")
-        (vec_merge:
-          (vec_select:
-            (match_operand:VI4F_256 1 "register_operand" "v")
-            (parallel [(const_int 4) (const_int 5)
-                       (const_int 6) (const_int 7)]))
-          (match_operand: 2 "memory_operand" "0")
-          (match_operand: 3 "register_operand" "Yk")))]
-  "TARGET_AVX512F && TARGET_AVX512VL
-   && rtx_equal_p (operands[2], operands[0])"
-  "vextract32x4\t{$0x1, %1, %0%{%3%}|%0%{%3%}, %1, 0x1}"
+(define_insn "vec_extract_lo_"
+  [(set (match_operand: 0 "nonimmediate_operand" "=vm,v")
+        (vec_select:
+          (match_operand:VI4F_256 1 "nonimmediate_operand" "v,vm")
+          (parallel [(const_int 0) (const_int 1)
+                     (const_int 2) (const_int 3)])))]
+  "TARGET_AVX
+   && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
+  "#"
   [(set_attr "type" "sselog1")
+   (set_attr "prefix_extra" "1")
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "evex")
    (set_attr "mode" "")])

 (define_insn "vec_extract_hi__mask"
-  [(set (match_operand: 0 "register_operand" "=v")
+  [(set (match_operand: 0 "register_operand" "=v,m")
         (vec_merge:
           (vec_select:
-            (match_operand:VI4F_256 1 "register_operand" "v")
+            (match_operand:VI4F_256 1 "register_operand" "v,v")
             (parallel [(const_int 4) (const_int 5)
                        (const_int 6) (const_int 7)]))
-          (match_operand: 2 "nonimm_or_0_operand" "0C")
-          (match_operand: 3 "register_operand" "Yk")))]
-  "TARGET_AVX512VL"
+          (match_operand: 2 "nonimm_or_0_operand" "0C,0")
+          (match_operand: 3 "register_operand" "Yk,Yk")))]
+  "TARGET_AVX512VL
+   && (!MEM_P (operands[0]) || rtx_equal_p (operands[0], operands[2]))"
   "vextract32x4\t{$0x1, %1, %0%{%3%}%N2|%0%{%3%}%N2, %1, 0x1}"
   [(set_attr "type" "sselog1")
    (set_attr "length_immediate" "1")
--- gcc/testsuite/gcc.target/i386/avx512vl-pr93069.c.jj	2019-12-28 16:31:30.118695074 +0100
+++ gcc/testsuite/gcc.target/i386/avx512vl-pr93069.c	2019-12-28 16:32:16.920990539 +0100
@@ -0,0 +1,12 @@
+/* PR target/93069 */
+/* { dg-do assemble { target vect_simd_clones } } */
+/* { dg-options "-O2 -fopenmp-simd -mtune=skylake-avx512" } */
+/* { dg-additional-options "-mavx512vl" { target avx512vl } } */
+/* { dg-additional-options "-mavx512dq" { target avx512dq } } */
+
+#pragma omp declare simd
+int
+foo (int x, int y)
+{
+  return x == 0 ? x : y;
+}
--- gcc/testsuite/gcc.dg/vect/pr93069.c.jj	2019-12-28 16:31:01.822121036 +0100
+++ gcc/testsuite/gcc.dg/vect/pr93069.c	2019-12-28 16:30:35.503517205 +0100
@@ -0,0 +1,10 @@
+/* PR target/93069 */
+/* { dg-do assemble { target vect_simd_clones } } */
+/* { dg-options "-O2 -fopenmp-simd" } */
+
+#pragma omp declare simd
+int
+foo (int x, int y)
+{
+  return x == 0 ? x : y;
+}