From patchwork Wed Mar 29 22:36:20 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jakub Jelinek X-Patchwork-Id: 744985 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3vtjKy6DcKz9ryZ for ; Thu, 30 Mar 2017 09:36:37 +1100 (AEDT) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="kOWqpd7J"; dkim-atps=neutral DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :from:to:cc:subject:message-id:reply-to:mime-version :content-type; q=dns; s=default; b=uxGCzAOFdj7yrOfUs3t7l22w+lJGu SGGNtbM9e1pgIEo8PI2relNMhCOiiWPFUmTFWX3A2FDx+zCiWUM7UtGXt580ndZs ueE59KLgEJiMrLaN+9cL9xhY+qbC4xjXzEGeREpyFpIyogfJ/QfQ97COkxEqfI8H 8jwU/iGtgDBbmQ= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :from:to:cc:subject:message-id:reply-to:mime-version :content-type; s=default; bh=1nGfVCLQp/mX3LLCHt36pnsU/dc=; b=kOW qpd7JJ6Z/NEdlvKsALVWtu7IZW8M/V8TbMPWrst2R99910GN1nh9CYwTiuCDqmaP B807zCkXh1pRp1zB5uV49uLf13kPGYF3zAOIJxPbx7TmeZB3+WebXKXn5SYo4Qse zDAuOaKRP98RJmhPOfny9SJ48osaH1ZFUml9fYlQ= Received: (qmail 103982 invoked by alias); 29 Mar 2017 22:36:28 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 103962 invoked by uid 89); 29 Mar 2017 22:36:27 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-11.9 required=5.0 tests=BAYES_00, GIT_PATCH_2, GIT_PATCH_3, RP_MATCHES_RCVD, SPF_HELO_PASS autolearn=ham version=3.3.2 spammy= X-HELO: mx1.redhat.com Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 29 Mar 2017 22:36:25 +0000 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 69EF380F8E; Wed, 29 Mar 2017 22:36:25 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 69EF380F8E Authentication-Results: ext-mx03.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com Authentication-Results: ext-mx03.extmail.prod.ext.phx2.redhat.com; spf=pass smtp.mailfrom=jakub@redhat.com DKIM-Filter: OpenDKIM Filter v2.11.0 mx1.redhat.com 69EF380F8E Received: from tucnak.zalov.cz (ovpn-116-72.ams2.redhat.com [10.36.116.72]) by smtp.corp.redhat.com (Postfix) with ESMTPS id ED9D060F8B; Wed, 29 Mar 2017 22:36:24 +0000 (UTC) Received: from tucnak.zalov.cz (localhost [127.0.0.1]) by tucnak.zalov.cz (8.15.2/8.15.2) with ESMTP id v2TMaMrh015637; Thu, 30 Mar 2017 00:36:22 +0200 Received: (from jakub@localhost) by tucnak.zalov.cz (8.15.2/8.15.2/Submit) id v2TMaKjs015636; Thu, 30 Mar 2017 00:36:20 +0200 Date: Thu, 30 Mar 2017 00:36:20 +0200 From: Jakub Jelinek To: Uros Bizjak , Kirill Yukhin Cc: gcc-patches@gcc.gnu.org Subject: [PATCH] Fix various avx512 extraction issues (PR target/80206) Message-ID: <20170329223620.GI17461@tucnak> Reply-To: Jakub Jelinek MIME-Version: 1.0 Content-Disposition: inline User-Agent: Mutt/1.7.1 (2016-10-04) X-IsSubscribed: yes Hi! As the testcase shows, we ICE with -mavx512f -ffloat-store, because at -O0 during expansion the destination is MEM, and the corresponding dup operand is some pseudo. There are *_mask patterns that have just register_operand / =v for the desination and vector_move_operand / 0C for the corresponding dup operand (but this doesn't apply when the destination is MEM), and then *_maskm patterns, that have memory_operand / =m and corresponding dup operand memory_operand / 0, but also requires rtx_equal_p between them in the condition, so that doesn't match either. The expanders have weirdo: if (MEM_P (operands[0]) && GET_CODE (operands[3]) == CONST_VECTOR) operands[0] = force_reg (mode, operands[0]); which can't really ever work, because the expander's caller expects the output to be stored in the original operands[0], but that is not where it stores it. Furthermore, force_reg makes no sense for the output operand. The following patch should fix that, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? There are still some remaining issues that can perhaps be resolved incrementally, e.g. some insns use: (define_insn "vec_extract_hi_" [(set (match_operand: 0 "" "=,vm") If , is register_operand, so having vm constraint for it is strange. Not really sure how well it can work with vector_move_operand and 0C constraint, what will LRA do with it if the input isn't in memory but dest is, or if both are memory, but not the same one. 2017-03-28 Jakub Jelinek PR target/80206 * config/i386/sse.md (_vextract_mask): Force dest into register whenever it is a MEM not rtx_equal_p to the corresponding dup operand, and when forcing into reg move the reg into the memory afterwards. (_vextract_mask): Likewise. Use instead of for the force_reg mode. (avx512vl_vextractf128): Force dest into register either always when a MEM, or when it is a MEM not rtx_equal_p to the corresponding dup operand, or even not when it is a CONST_VECTOR depending on the mode and lo vs. hi. (avx512dq_vextract64x2_1_maskm): Remove extraneous parens. (avx512f_vextract32x4_1_maskm): Likewise. (avx512dq_vextract64x2_1): Likewise. Require that operands[2] is even. (avx512f_vextract32x4_1): Remove extraneous parens. Require that operands[2] is a multiple of 4. (vec_extract_lo_): Don't bother testing if operands[0] is a MEM if , the predicates/constraints disallow memory then. * gcc.target/i386/pr80206.c: New test. Jakub --- gcc/config/i386/sse.md.jj 2017-03-07 09:10:56.946428168 +0100 +++ gcc/config/i386/sse.md 2017-03-29 19:22:37.394215557 +0200 @@ -7135,19 +7135,22 @@ (define_expand "_vextract< { int mask; mask = INTVAL (operands[2]); + rtx dest = operands[0]; - if (MEM_P (operands[0]) && GET_CODE (operands[3]) == CONST_VECTOR) - operands[0] = force_reg (mode, operands[0]); + if (MEM_P (operands[0]) && !rtx_equal_p (operands[0], operands[3])) + dest = force_reg (mode, dest); if (mode == V16SImode || mode == V16SFmode) - emit_insn (gen_avx512f_vextract32x4_1_mask (operands[0], + emit_insn (gen_avx512f_vextract32x4_1_mask (dest, operands[1], GEN_INT (mask * 4), GEN_INT (mask * 4 + 1), GEN_INT (mask * 4 + 2), GEN_INT (mask * 4 + 3), operands[3], operands[4])); else - emit_insn (gen_avx512dq_vextract64x2_1_mask (operands[0], + emit_insn (gen_avx512dq_vextract64x2_1_mask (dest, operands[1], GEN_INT (mask * 2), GEN_INT (mask * 2 + 1), operands[3], operands[4])); + if (dest != operands[0]) + emit_move_insn (operands[0], dest); DONE; }) @@ -7161,8 +7164,8 @@ (define_insn "avx512dq_vextract 4 "memory_operand" "0") (match_operand:QI 5 "register_operand" "Yk")))] "TARGET_AVX512DQ - && (INTVAL (operands[2]) % 2 == 0) - && (INTVAL (operands[2]) == INTVAL (operands[3]) - 1) + && INTVAL (operands[2]) % 2 == 0 + && INTVAL (operands[2]) == INTVAL (operands[3]) - 1 && rtx_equal_p (operands[4], operands[0])" { operands[2] = GEN_INT ((INTVAL (operands[2])) >> 1); @@ -7187,13 +7190,13 @@ (define_insn "avx512f_vextract 6 "memory_operand" "0") (match_operand:QI 7 "register_operand" "Yk")))] "TARGET_AVX512F - && ((INTVAL (operands[2]) % 4 == 0) - && INTVAL (operands[2]) == (INTVAL (operands[3]) - 1) - && INTVAL (operands[3]) == (INTVAL (operands[4]) - 1) - && INTVAL (operands[4]) == (INTVAL (operands[5]) - 1)) + && INTVAL (operands[2]) % 4 == 0 + && INTVAL (operands[2]) == INTVAL (operands[3]) - 1 + && INTVAL (operands[3]) == INTVAL (operands[4]) - 1 + && INTVAL (operands[4]) == INTVAL (operands[5]) - 1 && rtx_equal_p (operands[6], operands[0])" { - operands[2] = GEN_INT ((INTVAL (operands[2])) >> 2); + operands[2] = GEN_INT (INTVAL (operands[2]) >> 2); return "vextract32x4\t{%2, %1, %0%{%7%}|%0%{%7%}, %1, %2}"; } [(set_attr "type" "sselog") @@ -7209,9 +7212,11 @@ (define_insn "avx512dq_vex (match_operand:V8FI 1 "register_operand" "v") (parallel [(match_operand 2 "const_0_to_7_operand") (match_operand 3 "const_0_to_7_operand")])))] - "TARGET_AVX512DQ && (INTVAL (operands[2]) == INTVAL (operands[3]) - 1)" + "TARGET_AVX512DQ + && INTVAL (operands[2]) % 2 == 0 + && INTVAL (operands[2]) == INTVAL (operands[3]) - 1" { - operands[2] = GEN_INT ((INTVAL (operands[2])) >> 1); + operands[2] = GEN_INT (INTVAL (operands[2]) >> 1); return "vextract64x2\t{%2, %1, %0|%0, %1, %2}"; } [(set_attr "type" "sselog1") @@ -7229,11 +7234,12 @@ (define_insn "avx512f_vext (match_operand 4 "const_0_to_15_operand") (match_operand 5 "const_0_to_15_operand")])))] "TARGET_AVX512F - && (INTVAL (operands[2]) == (INTVAL (operands[3]) - 1) - && INTVAL (operands[3]) == (INTVAL (operands[4]) - 1) - && INTVAL (operands[4]) == (INTVAL (operands[5]) - 1))" + && INTVAL (operands[2]) % 4 == 0 + && INTVAL (operands[2]) == INTVAL (operands[3]) - 1 + && INTVAL (operands[3]) == INTVAL (operands[4]) - 1 + && INTVAL (operands[4]) == INTVAL (operands[5]) - 1" { - operands[2] = GEN_INT ((INTVAL (operands[2])) >> 2); + operands[2] = GEN_INT (INTVAL (operands[2]) >> 2); return "vextract32x4\t{%2, %1, %0|%0, %1, %2}"; } [(set_attr "type" "sselog1") @@ -7260,9 +7266,10 @@ (define_expand "_vextrac "TARGET_AVX512F" { rtx (*insn)(rtx, rtx, rtx, rtx); + rtx dest = operands[0]; - if (MEM_P (operands[0]) && GET_CODE (operands[3]) == CONST_VECTOR) - operands[0] = force_reg (mode, operands[0]); + if (MEM_P (dest) && !rtx_equal_p (dest, operands[3])) + dest = force_reg (mode, dest); switch (INTVAL (operands[2])) { @@ -7276,7 +7283,9 @@ (define_expand "_vextrac gcc_unreachable (); } - emit_insn (insn (operands[0], operands[1], operands[3], operands[4])); + emit_insn (insn (dest, operands[1], operands[3], operands[4])); + if (dest != operands[0]) + emit_move_insn (operands[0], dest); DONE; }) @@ -7317,7 +7326,8 @@ (define_insn "vec_extract_lo_ || !(MEM_P (operands[0]) && MEM_P (operands[1])))" { if ( || !TARGET_AVX512VL) return "vextract64x4\t{$0x0, %1, %0|%0, %1, 0x0}"; @@ -7411,10 +7421,19 @@ (define_expand "avx512vl_vextractf128mode, operands[0]); - + if (MEM_P (dest) + && (GET_MODE_SIZE (GET_MODE_INNER (mode)) == 4 + /* For V8S[IF]mode there are maskm insns with =m and 0 + constraints. */ + ? !rtx_equal_p (dest, operands[3]) + /* For V4D[IF]mode, hi insns don't allow memory, and + lo insns have =m and 0C constraints. */ + : (operands[2] != const0_rtx + || (!rtx_equal_p (dest, operands[3]) + && GET_CODE (operands[3]) != CONST_VECTOR)))) + dest = force_reg (mode, dest); switch (INTVAL (operands[2])) { case 0: @@ -7427,7 +7446,9 @@ (define_expand "avx512vl_vextractf128 + +__m512d a; +__m256d b; + +void +foo (__m256d *p) +{ + *p = _mm512_mask_extractf64x4_pd (b, 1, a, 1); +}