From patchwork Wed Apr 11 18:59:45 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jakub Jelinek X-Patchwork-Id: 897352 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-476235-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="GbOHRZS3"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 40Ltdf1ZdFz9s1X for ; Thu, 12 Apr 2018 05:00:05 +1000 (AEST) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :from:to:cc:subject:message-id:reply-to:references:mime-version :content-type:in-reply-to; q=dns; s=default; b=pTNZISvUVyOklc2gh 5Yg6TDkwHO5kil12pZRfNDJgREQsJLTLLtelImANW6PIuzBAWEklmyYRNMaKnsGN CcQo2TdBNdfHDUvfqPFWqvbENa++147ejb0RxOQUPBnqK8pCu99tP115/F+1XsI8 k/q8AtltXID3tOMgiF3lGh+YVk= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :from:to:cc:subject:message-id:reply-to:references:mime-version :content-type:in-reply-to; s=default; bh=Ezl5lHItQYvO0AVmaDSc7m5 0uzk=; b=GbOHRZS3xYSYCBw3Nq9ef7ocIoBS+CKPnSCDQNtiIgktquf7Euf1ZMK YdpcBq987ZCF03OfgVWtGBUyYQ1q/oAKwKmf4q2ZCd8M1bo0vSUnmwBQK81kX9bW zALkSjU4JbIjxCUH5qJaahaRq3a5OSk1JkxjyrkQ80K/4qpdEQq4= Received: (qmail 78704 invoked by alias); 11 Apr 2018 18:59:57 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 78693 invoked by uid 89); 11 Apr 2018 18:59:57 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-11.6 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_2, GIT_PATCH_3 autolearn=ham version=3.3.2 spammy= X-HELO: mx1.redhat.com Received: from mx3-rdu2.redhat.com (HELO mx1.redhat.com) (66.187.233.73) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 11 Apr 2018 18:59:55 +0000 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 42DF24023141; Wed, 11 Apr 2018 18:59:50 +0000 (UTC) Received: from tucnak.zalov.cz (unknown [10.36.118.110]) by smtp.corp.redhat.com (Postfix) with ESMTPS id F401DDEEC6; Wed, 11 Apr 2018 18:59:49 +0000 (UTC) Received: from tucnak.zalov.cz (localhost [127.0.0.1]) by tucnak.zalov.cz (8.15.2/8.15.2) with ESMTP id w3BIxlUI023776; Wed, 11 Apr 2018 20:59:47 +0200 Received: (from jakub@localhost) by tucnak.zalov.cz (8.15.2/8.15.2/Submit) id w3BIxk9I023775; Wed, 11 Apr 2018 20:59:46 +0200 Date: Wed, 11 Apr 2018 20:59:45 +0200 From: Jakub Jelinek To: Kirill Yukhin , Uros Bizjak Cc: gcc-patches@gcc.gnu.org Subject: [PATCH] Fix non-AVX512VL handling of lo extraction from AVX512F xmm16+ (PR target/85328, take 2) Message-ID: <20180411185945.GB8577@tucnak> Reply-To: Jakub Jelinek References: <20180411132728.GS8577@tucnak> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20180411132728.GS8577@tucnak> User-Agent: Mutt/1.9.2 (2017-12-15) X-IsSubscribed: yes On Wed, Apr 11, 2018 at 03:27:28PM +0200, Jakub Jelinek wrote: > In lots of patterns we assume that we never see xmm16+ hard registers > with 128-bit and 256-bit vector modes when not -mavx512vl, because > HARD_REGNO_MODE_OK refuses those. > Unfortunately, as this testcase and patch shows, the vec_extract_lo* > splitters work as a loophole around this, we happily create instructions > like (set (reg:V32QI xmm5) (reg:V32QI xmm16)) and then hard register > propagation can propagate the V32QI xmm16 into other insns like vpand. > > The following patch fixes it by making sure we never create such registers, > just emit (set (reg:V64QI xmm5) (reg:V64QI xmm16)) instead, which by copying > all the 512 bits also copies the low bits, and as the destination is > originally V32QI which is not HARD_REGNO_MODE_OK in xmm16+, this should be > fine. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Actually, thinking about it more (not that I have managed to come up with a testcase), if output is a MEM and input is xmm16+, then we really need to give up in the splitters and instead emit the v*extract* instructions, because simple vmovdqa and vmovap[sd] require AVX512VL for the EVEX encodings. So, here is an updated patch, bootstrapped/regtested on x86_64-linux and i686-linux, is this one ok for trunk instead? Tried e.g. #include __m256d f1 (__m512d x) { register __m512d a __asm ("zmm16"); __asm ("" : "=v" (a) : "0" (x)); return _mm512_extractf64x4_pd (a, 0); } void f2 (__m256d *p, __m512d x) { register __m512d a __asm ("zmm16"); __asm ("" : "=v" (a) : "0" (x)); *p = _mm512_extractf64x4_pd (a, 0); } __m256d f3 (__m512d x, __m256d y) { register __m512d a __asm ("zmm16"); __asm ("" : "=v" (a) : "0" (x)); return y + _mm512_extractf64x4_pd (a, 0); } __m128 f4 (__m512 x) { register __m512 a __asm ("zmm16"); __asm ("" : "=v" (a) : "0" (x)); return _mm512_extractf32x4_ps (a, 0); } void f5 (__m128 *p, __m512 x) { register __m512 a __asm ("zmm16"); __asm ("" : "=v" (a) : "0" (x)); *p = _mm512_extractf32x4_ps (a, 0); } __m128 f6 (__m512 x, __m128 y) { register __m512 a __asm ("zmm16"); __asm ("" : "=v" (a) : "0" (x)); return y + _mm512_extractf32x4_ps (a, 0); } __m256i f7 (__m512i x) { register __m512i a __asm ("zmm16"); __asm ("" : "=v" (a) : "0" (x)); return _mm512_extracti64x4_epi64 (a, 0); } void f8 (__m256i *p, __m512i x) { register __m512i a __asm ("zmm16"); __asm ("" : "=v" (a) : "0" (x)); *p = _mm512_extracti64x4_epi64 (a, 0); } __m256i f9 (__m512i x, __m256i y) { register __m512i a __asm ("zmm16"); __asm ("" : "=v" (a) : "0" (x)); return y + _mm512_extracti64x4_epi64 (a, 0); } __m128i f10 (__m512i x) { register __m512i a __asm ("zmm16"); __asm ("" : "=v" (a) : "0" (x)); return _mm512_extracti32x4_epi32 (a, 0); } void f11 (__m128i *p, __m512i x) { register __m512i a __asm ("zmm16"); __asm ("" : "=v" (a) : "0" (x)); *p = _mm512_extracti32x4_epi32 (a, 0); } __m128i f12 (__m512i x, __m128i y) { register __m512i a __asm ("zmm16"); __asm ("" : "=v" (a) : "0" (x)); return y + _mm512_extracti32x4_epi32 (a, 0); } but couldn't reproduce though. 2018-04-11 Jakub Jelinek PR target/85328 * config/i386/sse.md (avx512dq_vextract64x2_1 split, avx512f_vextract32x4_1 split, vec_extract_lo_ split, vec_extract_lo_v32hi, vec_extract_lo_v64qi): For non-AVX512VL if input is xmm16+ reg and output is a reg, avoid creating invalid lowpart subreg, but instead split into a 512-bit move. Don't split if not AVX512VL, input is xmm16+ reg and output is a mem. (vec_extract_lo_, vec_extract_lo_v32hi, vec_extract_lo_v64qi): Don't require split if not AVX512VL, input is xmm16+ reg and output is a mem. * gcc.target/i386/pr85328.c: New test. Jakub --- gcc/config/i386/sse.md.jj 2018-04-11 13:36:29.368015262 +0200 +++ gcc/config/i386/sse.md 2018-04-11 17:15:56.175746606 +0200 @@ -7361,9 +7361,21 @@ (define_split (vec_select: (match_operand:V8FI 1 "register_operand") (parallel [(const_int 0) (const_int 1)])))] - "TARGET_AVX512DQ && reload_completed" + "TARGET_AVX512DQ + && reload_completed + && (TARGET_AVX512VL + || REG_P (operands[0]) + || !EXT_REX_SSE_REG_P (operands[1]))" [(set (match_dup 0) (match_dup 1))] - "operands[1] = gen_lowpart (mode, operands[1]);") +{ + if (!TARGET_AVX512VL + && REG_P (operands[0]) + && EXT_REX_SSE_REG_P (operands[1])) + operands[0] + = lowpart_subreg (mode, operands[0], mode); + else + operands[1] = gen_lowpart (mode, operands[1]); +}) (define_insn "avx512f_vextract32x4_1" [(set (match_operand: 0 "" "=") @@ -7394,9 +7406,21 @@ (define_split (match_operand:V16FI 1 "register_operand") (parallel [(const_int 0) (const_int 1) (const_int 2) (const_int 3)])))] - "TARGET_AVX512F && reload_completed" + "TARGET_AVX512F + && reload_completed + && (TARGET_AVX512VL + || REG_P (operands[0]) + || !EXT_REX_SSE_REG_P (operands[1]))" [(set (match_dup 0) (match_dup 1))] - "operands[1] = gen_lowpart (mode, operands[1]);") +{ + if (!TARGET_AVX512VL + && REG_P (operands[0]) + && EXT_REX_SSE_REG_P (operands[1])) + operands[0] + = lowpart_subreg (mode, operands[0], mode); + else + operands[1] = gen_lowpart (mode, operands[1]); +}) (define_mode_attr extract_type_2 [(V16SF "avx512dq") (V16SI "avx512dq") (V8DF "avx512f") (V8DI "avx512f")]) @@ -7639,7 +7663,10 @@ (define_insn "vec_extract_lo_ && ( || !(MEM_P (operands[0]) && MEM_P (operands[1])))" { - if () + if ( + || (!TARGET_AVX512VL + && !REG_P (operands[0]) + && EXT_REX_SSE_REG_P (operands[1]))) return "vextract32x8\t{$0x0, %1, %0|%0, %1, 0x0}"; else return "#"; @@ -7654,9 +7681,20 @@ (define_split (const_int 4) (const_int 5) (const_int 6) (const_int 7)])))] "TARGET_AVX512F && !(MEM_P (operands[0]) && MEM_P (operands[1])) - && reload_completed" + && reload_completed + && (TARGET_AVX512VL + || REG_P (operands[0]) + || !EXT_REX_SSE_REG_P (operands[1]))" [(set (match_dup 0) (match_dup 1))] - "operands[1] = gen_lowpart (mode, operands[1]);") +{ + if (!TARGET_AVX512VL + && REG_P (operands[0]) + && EXT_REX_SSE_REG_P (operands[1])) + operands[0] + = lowpart_subreg (mode, operands[0], mode); + else + operands[1] = gen_lowpart (mode, operands[1]); +}) (define_insn "vec_extract_lo_" [(set (match_operand: 0 "" "=v,m") @@ -7828,10 +7866,27 @@ (define_insn_and_split "vec_extract_lo_v (const_int 12) (const_int 13) (const_int 14) (const_int 15)])))] "TARGET_AVX512F && !(MEM_P (operands[0]) && MEM_P (operands[1]))" - "#" - "&& reload_completed" +{ + if (TARGET_AVX512VL + || REG_P (operands[0]) + || !EXT_REX_SSE_REG_P (operands[1])) + return "#"; + else + return "vextracti64x4\t{$0x0, %1, %0|%0, %1, 0x0}"; +} + "&& reload_completed + && (TARGET_AVX512VL + || REG_P (operands[0]) + || !EXT_REX_SSE_REG_P (operands[1]))" [(set (match_dup 0) (match_dup 1))] - "operands[1] = gen_lowpart (V16HImode, operands[1]);") +{ + if (!TARGET_AVX512VL + && REG_P (operands[0]) + && EXT_REX_SSE_REG_P (operands[1])) + operands[0] = lowpart_subreg (V32HImode, operands[0], V16HImode); + else + operands[1] = gen_lowpart (V16HImode, operands[1]); +}) (define_insn "vec_extract_hi_v32hi" [(set (match_operand:V16HI 0 "nonimmediate_operand" "=v,m") @@ -7913,10 +7968,27 @@ (define_insn_and_split "vec_extract_lo_v (const_int 28) (const_int 29) (const_int 30) (const_int 31)])))] "TARGET_AVX512F && !(MEM_P (operands[0]) && MEM_P (operands[1]))" - "#" - "&& reload_completed" +{ + if (TARGET_AVX512VL + || REG_P (operands[0]) + || !EXT_REX_SSE_REG_P (operands[1])) + return "#"; + else + return "vextracti64x4\t{$0x0, %1, %0|%0, %1, 0x0}"; +} + "&& reload_completed + && (TARGET_AVX512VL + || REG_P (operands[0]) + || !EXT_REX_SSE_REG_P (operands[1]))" [(set (match_dup 0) (match_dup 1))] - "operands[1] = gen_lowpart (V32QImode, operands[1]);") +{ + if (!TARGET_AVX512VL + && REG_P (operands[0]) + && EXT_REX_SSE_REG_P (operands[1])) + operands[0] = lowpart_subreg (V64QImode, operands[0], V32QImode); + else + operands[1] = gen_lowpart (V32QImode, operands[1]); +}) (define_insn "vec_extract_hi_v64qi" [(set (match_operand:V32QI 0 "nonimmediate_operand" "=v,m") --- gcc/testsuite/gcc.target/i386/pr85328.c.jj 2018-04-11 16:41:49.769327148 +0200 +++ gcc/testsuite/gcc.target/i386/pr85328.c 2018-04-11 16:41:49.769327148 +0200 @@ -0,0 +1,18 @@ +/* PR target/85328 */ +/* { dg-do assemble { target avx512f } } */ +/* { dg-options "-O3 -fno-caller-saves -mavx512f" } */ + +typedef char U __attribute__((vector_size (64))); +typedef int V __attribute__((vector_size (64))); +U a, b; + +extern void bar (void); + +V +foo (V f) +{ + b <<= (U){(V){}[63]} & 7; + bar (); + a = (U)f & 7; + return (V)b; +}