From patchwork Wed Feb 12 09:26:40 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jakub Jelinek X-Patchwork-Id: 1236712 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-519408-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha1 header.s=default header.b=M5N3lTrp; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=XekMbUCr; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 48HZ7Q1S82z9sPJ for ; Wed, 12 Feb 2020 20:28:01 +1100 (AEDT) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :from:to:cc:subject:message-id:reply-to:mime-version :content-type:content-transfer-encoding; q=dns; s=default; b=xkt mY1VbwaGhO1POU+a87nsXIvvZhzwqBHAO6s14y5dK0JdDzSwBj/PdiAGbeyFCvm9 g2DFivOWi93SDWBn2XSKOPGpYShH1kr+y5pGy4XYyduedZiCcmGJDze1MoiC/m/7 WE/HJ5+aZzv4Pd/6Z8WkJnPxJJO/FgZk/t1ZvUL0= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :from:to:cc:subject:message-id:reply-to:mime-version :content-type:content-transfer-encoding; s=default; bh=iz6jgdKfP 6XO3KXwcE/pC0jbhMs=; b=M5N3lTrp93a+8m0ma/aw7wNXQ5F4YSWgf++kShAS4 1ZNngSZjAhaWJOwtNi0CG7F5TxQj/uew4X2EIGxGaQia8sT+mN/eVbKbVbB6myl/ XJjmz77+1FmAPPiThZihTyUYWj5zJh+4M7oDLNngREg96tkE8PoUkVpFizU11ZMb 5E= Received: (qmail 88869 invoked by alias); 12 Feb 2020 09:27:53 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 88859 invoked by uid 89); 12 Feb 2020 09:27:52 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-8.0 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_2, GIT_PATCH_3, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 spammy=ATT X-HELO: us-smtp-delivery-1.mimecast.com Received: from us-smtp-2.mimecast.com (HELO us-smtp-delivery-1.mimecast.com) (205.139.110.61) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 12 Feb 2020 09:27:50 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1581499669; h=from:from:reply-to:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding; bh=SmVTZ0RaTaLnAKaB2qSWpzwRyS963e2XhXPXHsoygrk=; b=XekMbUCr/7rP9M6o5nswKrCDN9PDtQw5fKAhDrwywkb4g/NzjVNX5HLy6Rj2ReIGsJw5a7 +cT9rej3Brf7t6r6vZcdIMhFYN5DJO7ZO0mQrsJ7mxFuoHn9b/kaa5zfgTEr6aiXQtlT7a stYUE9cegAUG7qXlpn6r+2VwsyQ3X00= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-438-4W6shfhLNWCeohLMloM_kA-1; Wed, 12 Feb 2020 04:27:43 -0500 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id ABAED19057A7; Wed, 12 Feb 2020 09:27:42 +0000 (UTC) Received: from tucnak.zalov.cz (ovpn-116-51.ams2.redhat.com [10.36.116.51]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 3D8CA1001B07; Wed, 12 Feb 2020 09:27:42 +0000 (UTC) Received: from tucnak.zalov.cz (localhost [127.0.0.1]) by tucnak.zalov.cz (8.15.2/8.15.2) with ESMTP id 01C9R52X016649; Wed, 12 Feb 2020 10:27:25 +0100 Received: (from jakub@localhost) by tucnak.zalov.cz (8.15.2/8.15.2/Submit) id 01C9Qe5D016647; Wed, 12 Feb 2020 10:26:40 +0100 Date: Wed, 12 Feb 2020 10:26:40 +0100 From: Jakub Jelinek To: Jeff Law , Uros Bizjak Cc: gcc-patches@gcc.gnu.org Subject: [PATCH] i386: Fix up vec_extract_lo* patterns [PR93670] Message-ID: <20200212092640.GX17695@tucnak> Reply-To: Jakub Jelinek MIME-Version: 1.0 User-Agent: Mutt/1.11.3 (2019-02-01) X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Disposition: inline X-IsSubscribed: yes Hi! The VEXTRACT* insns have way too many different CPUID feature flags (ATT syntax) vextractf128 $imm, %ymm, %xmm/mem AVX vextracti128 $imm, %ymm, %xmm/mem AVX2 vextract{f,i}32x4 $imm, %ymm, %xmm/mem {k}{z} AVX512VL+AVX512F vextract{f,i}32x4 $imm, %zmm, %xmm/mem {k}{z} AVX512F vextract{f,i}64x2 $imm, %ymm, %xmm/mem {k}{z} AVX512VL+AVX512DQ vextract{f,i}64x2 $imm, %zmm, %xmm/mem {k}{z} AVX512DQ vextract{f,i}32x8 $imm, %zmm, %ymm/mem {k}{z} AVX512DQ vextract{f,i}64x4 $imm, %zmm, %ymm/mem {k}{z} AVX512F As the testcase shows and the patch too, we didn't get it right in all cases. The first hunk is about avx512vl_vextractf128v8s[if] incorrectly requiring TARGET_AVX512DQ. The corresponding insn is the first vextract{f,i}32x4 above, so it requires VL+F, and the builtins have it correct (TARGET_AVX512VL implies TARGET_AVX512F): BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_vextractf128v8sf, "__builtin_ia32_extractf32x4_256_mask", IX86_BUILTIN_EXTRACTF32X4_256, UNKNOWN, (int) V4SF_FTYPE_V8SF_INT_V4SF_UQI) BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_vextractf128v8si, "__builtin_ia32_extracti32x4_256_mask", IX86_BUILTIN_EXTRACTI32X4_256, UNKNOWN, (int) V4SI_FTYPE_V8SI_INT_V4SI_UQI) We only need TARGET_AVX512DQ for avx512vl_vextractf128v4d[if]. The second hunk is about vec_extract_lo_v16s[if]{,_mask}. These are using the vextract{f,i}32x8 insns (AVX512DQ above), but we weren't requiring that, but instead incorrectly && 1 for non-masked and && (64 == 64 && TARGET_AVX512VL) for masked insns. This is extraction from ZMM, so it doesn't need VL for anything. The hunk actually only requires TARGET_AVX512DQ when the insn is masked, if it is not masked, when TARGET_AVX512DQ isn't available we can use vextract{f,i}64x4 instead which is available already in TARGET_AVX512F and does the same thing, extracts the low 256 bits from 512 bits vector (often we split it into just nothing, but there are some special cases like when using xmm16+ when we can't without AVX512VL). The last hunk is about vec_extract_lo_v8s[if]{,_mask}. The non-_mask suffixed ones are ok already and just split into nothing (lowpart subreg). The masked ones were incorrectly requiring TARGET_AVX512VL and TARGET_AVX512DQ, when we only need TARGET_AVX512VL. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2020-02-12 Jakub Jelinek PR target/93670 * config/i386/sse.md (VI48F_256_DQ): New mode iterator. (avx512vl_vextractf128): Use it instead of VI48F_256. Remove TARGET_AVX512DQ from condition. (vec_extract_lo_): Use instead of in condition. If TARGET_AVX512DQ is false, emit vextract*64x4 instead of vextract*32x8. (vec_extract_lo_): Drop from condition. * gcc.target/i386/avx512vl-pr93670.c: New test. Jakub --- gcc/config/i386/sse.md.jj 2020-02-11 14:54:38.017593464 +0100 +++ gcc/config/i386/sse.md 2020-02-11 15:50:59.629130828 +0100 @@ -8719,13 +8719,16 @@ (define_insn "vec_extract_hi_")]) +(define_mode_iterator VI48F_256_DQ + [V8SI V8SF (V4DI "TARGET_AVX512DQ") (V4DF "TARGET_AVX512DQ")]) + (define_expand "avx512vl_vextractf128" [(match_operand: 0 "nonimmediate_operand") - (match_operand:VI48F_256 1 "register_operand") + (match_operand:VI48F_256_DQ 1 "register_operand") (match_operand:SI 2 "const_0_to_1_operand") (match_operand: 3 "nonimm_or_0_operand") (match_operand:QI 4 "register_operand")] - "TARGET_AVX512DQ && TARGET_AVX512VL" + "TARGET_AVX512VL" { rtx (*insn)(rtx, rtx, rtx, rtx); rtx dest = operands[0]; @@ -8793,14 +8796,19 @@ (define_insn "vec_extract_lo_ + && && ( || !(MEM_P (operands[0]) && MEM_P (operands[1])))" { if ( || (!TARGET_AVX512VL && !REG_P (operands[0]) && EXT_REX_SSE_REG_P (operands[1]))) - return "vextract32x8\t{$0x0, %1, %0|%0, %1, 0x0}"; + { + if (TARGET_AVX512DQ) + return "vextract32x8\t{$0x0, %1, %0|%0, %1, 0x0}"; + else + return "vextract64x4\t{$0x0, %1, %0|%0, %1, 0x0}"; + } else return "#"; } @@ -8910,7 +8918,7 @@ (define_insn "vec_extract_lo_ && + && && ( || !(MEM_P (operands[0]) && MEM_P (operands[1])))" { if () --- gcc/testsuite/gcc.target/i386/avx512vl-pr93670.c.jj 2020-02-11 16:00:14.874930873 +0100 +++ gcc/testsuite/gcc.target/i386/avx512vl-pr93670.c 2020-02-11 15:59:01.252019025 +0100 @@ -0,0 +1,77 @@ +/* PR target/93670 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx512vl -mno-avx512dq" } */ + +#include + +__m128i +f1 (__m256i x) +{ + return _mm256_extracti32x4_epi32 (x, 0); +} + +__m128i +f2 (__m256i x, __m128i w, __mmask8 m) +{ + return _mm256_mask_extracti32x4_epi32 (w, m, x, 0); +} + +__m128i +f3 (__m256i x, __mmask8 m) +{ + return _mm256_maskz_extracti32x4_epi32 (m, x, 0); +} + +__m128 +f4 (__m256 x) +{ + return _mm256_extractf32x4_ps (x, 0); +} + +__m128 +f5 (__m256 x, __m128 w, __mmask8 m) +{ + return _mm256_mask_extractf32x4_ps (w, m, x, 0); +} + +__m128 +f6 (__m256 x, __mmask8 m) +{ + return _mm256_maskz_extractf32x4_ps (m, x, 0); +} + +__m128i +f7 (__m256i x) +{ + return _mm256_extracti32x4_epi32 (x, 1); +} + +__m128i +f8 (__m256i x, __m128i w, __mmask8 m) +{ + return _mm256_mask_extracti32x4_epi32 (w, m, x, 1); +} + +__m128i +f9 (__m256i x, __mmask8 m) +{ + return _mm256_maskz_extracti32x4_epi32 (m, x, 1); +} + +__m128 +f10 (__m256 x) +{ + return _mm256_extractf32x4_ps (x, 1); +} + +__m128 +f11 (__m256 x, __m128 w, __mmask8 m) +{ + return _mm256_mask_extractf32x4_ps (w, m, x, 1); +} + +__m128 +f12 (__m256 x, __mmask8 m) +{ + return _mm256_maskz_extractf32x4_ps (m, x, 1); +}