From patchwork Sun Feb 3 16:47:10 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Uros Bizjak
X-Patchwork-Id: 1035622
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
From: Uros Bizjak
Date: Sun, 3 Feb 2019 17:47:10 +0100
Subject: [PATCH, i386]: (Partially) fix PR89074, break SSE reg dependency for a few scalar insns
To: "gcc-patches@gcc.gnu.org"
Cc: "H. J. Lu"

The following patch may help with partial SSE reg dependencies for {R,}SQRTS{S,D}, RCPS{S,D} and ROUNDS{S,D} instructions. It takes the same strategy as both ICC and clang take, that is:

a) load from memory with MOVS{S,D} and
b) in case of SSE, match input and output register.
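
As a rough illustration of the selection rule the patch encodes for the three operand alternatives (matched register, unmatched register, memory), the following C sketch models when each alternative stays eligible in code optimized for speed. It is not GCC code: the enum, the function name and the boolean parameters are hypothetical stand-ins for the alternative number, TARGET_AVX and TARGET_SSE_PARTIAL_REG_DEPENDENCY.

```c
#include <stdbool.h>

/* Hypothetical model, not GCC source: operand alternatives as numbered
   in the patched insn patterns.  */
enum alternative {
    ALT_MATCHED_REG = 0,   /* input tied to the output register ("0") */
    ALT_UNMATCHED_REG = 1, /* input in some other SSE register ("v") */
    ALT_MEMORY = 2         /* input read directly from memory ("m") */
};

/* Returns true when the alternative may be used in code optimized for
   speed.  target_avx and partial_reg_dependency are assumed stand-ins
   for TARGET_AVX and TARGET_SSE_PARTIAL_REG_DEPENDENCY.  */
bool preferred_for_speed(enum alternative alt, bool target_avx,
                         bool partial_reg_dependency)
{
    switch (alt) {
    case ALT_UNMATCHED_REG:
        /* AVX has a three-operand form, so no matching is needed.  */
        return target_avx || !partial_reg_dependency;
    case ALT_MEMORY:
        /* Prefer a separate MOVS{S,D} load over a direct memory operand.  */
        return !partial_reg_dependency;
    default:
        /* The matched-register alternative is always acceptable.  */
        return true;
    }
}
```

Under this model, an SSE-only target with the tuning flag set keeps only the matched-register alternative for hot code, while an AVX target additionally keeps the unmatched-register alternative.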

The implementation uses preferred_for_speed attribute, so in cold sections or when compiled with -Os, the compiler is still able to create direct load from memory (SSE, AVX) and use unmatched registers for SSE targets.

The sqrt from memory is now compiled to:

    movsd z(%rip), %xmm0
    sqrtsd %xmm0, %xmm0

(SSE) or

    vmovsd z(%rip), %xmm1
    vsqrtsd %xmm1, %xmm1, %xmm0

(AVX).

And sqrt from unmatched input register will compile to:

    sqrtsd %xmm1, %xmm1
    movapd %xmm1, %xmm0

(SSE) or

    vsqrtsd %xmm1, %xmm1, %xmm0

(AVX).

The patch doesn't touch conversion instructions, where XOR clearing is preferred (pending patch for PR 87007).

2019-02-03  Uroš Bizjak

    PR target/89071
    * config/i386/i386.md (*sqrt2_sse): Add (v,0) alternative.  Do not
    prefer (v,v) alternative for non-AVX targets and (m,v) alternative
    for speed when TARGET_SSE_PARTIAL_REG_DEPENDENCY is set.
    (*rcpsf2_sse): Ditto.
    (*rsqrtsf2_sse): Ditto.
    (sse4_1_round2_sse"
-  [(set (match_operand:MODEF 0 "register_operand" "=v,v")
+  [(set (match_operand:MODEF 0 "register_operand" "=v,v,v")
        (sqrt:MODEF
-         (match_operand:MODEF 1 "nonimmediate_operand" "v,m")))]
+         (match_operand:MODEF 1 "nonimmediate_operand" "0,v,m")))]
   "SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH"
   "@
+   %vsqrt\t{%d1, %0|%0, %d1}
    %vsqrt\t{%d1, %0|%0, %d1}
    %vsqrt\t{%1, %d0|%d0, %1}"
   [(set_attr "type" "sse")
@@ -15039,9 +15056,13 @@
    (set_attr "btver2_sse_attr" "sqrt")
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "")
-   (set_attr "athlon_decode" "*")
-   (set_attr "amdfam10_decode" "*")
-   (set_attr "bdver1_decode" "*")])
+   (set (attr "preferred_for_speed")
+     (cond [(eq_attr "alternative" "1")
+              (symbol_ref "TARGET_AVX || !TARGET_SSE_PARTIAL_REG_DEPENDENCY")
+            (eq_attr "alternative" "2")
+              (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
+           ]
+           (symbol_ref "true")))])
 
 (define_expand "sqrt2"
   [(set (match_operand:MODEF 0 "register_operand")
@@ -16175,21 +16196,30 @@
 
 (define_insn "sse4_1_round2"
-  [(set (match_operand:MODEF 0 "register_operand" "=x,x,v")
-       (unspec:MODEF
-         [(match_operand:MODEF 1 "nonimmediate_operand" "x,m,vm")
-          (match_operand:SI 2 "const_0_to_15_operand" "n,n,n")]
-         UNSPEC_ROUND))]
+  [(set (match_operand:MODEF 0 "register_operand" "=x,x,x,v")
+       (unspec:MODEF
+         [(match_operand:MODEF 1 "nonimmediate_operand" "0,x,m,vm")
+          (match_operand:SI 2 "const_0_to_15_operand" "n,n,n,n")]
+         UNSPEC_ROUND))]
   "TARGET_SSE4_1"
   "@
+   %vround\t{%2, %d1, %0|%0, %d1, %2}
    %vround\t{%2, %d1, %0|%0, %d1, %2}
    %vround\t{%2, %1, %d0|%d0, %1, %2}
    vrndscale\t{%2, %1, %d0|%d0, %1, %2}"
   [(set_attr "type" "ssecvt")
-   (set_attr "prefix_extra" "1,1,*")
-   (set_attr "length_immediate" "*,*,1")
-   (set_attr "prefix" "maybe_vex,maybe_vex,evex")
-   (set_attr "isa" "noavx512f,noavx512f,avx512f")
-   (set_attr "mode" "")])
+   (set_attr "prefix_extra" "1,1,1,*")
+   (set_attr "length_immediate" "*,*,*,1")
+   (set_attr "prefix" "maybe_vex,maybe_vex,maybe_vex,evex")
+   (set_attr "isa" "noavx512f,noavx512f,noavx512f,avx512f")
+   (set_attr "mode" "")
+   (set (attr "preferred_for_speed")
+     (cond [(eq_attr "alternative" "1")
+              (symbol_ref "TARGET_AVX || !TARGET_SSE_PARTIAL_REG_DEPENDENCY")
+            (eq_attr "alternative" "2")
+              (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
+           ]
+           (symbol_ref "true")))])
 
 (define_insn "rintxf2"
   [(set (match_operand:XF 0 "register_operand" "=f")