From patchwork Mon Nov 12 03:10:15 2018
X-Patchwork-Submitter: Peter Bergner
X-Patchwork-Id: 996252
To: "ian@airs.com", Richard Henderson
Cc: Segher Boessenkool, GCC Patches
From: Peter Bergner
Subject: [PATCH][lower-subreg] Fix PR87507
Date: Sun, 11 Nov 2018 21:10:15 -0600
Message-Id: <27077db7-b4bf-689c-2438-a5bab4753871@linux.ibm.com>

PR87507 shows a problem where IRA assigns a non-volatile TImode reg pair
to a pseudo when there is a volatile reg pair available to use.  This then
causes us to emit save/restore code for the non-volatile reg usage.

My first attempt at solving this was to adjust the costs used by
non-volatile registers, but Vlad was hesitant to accept the patch since
this is a sensitive area that can cause performance issues:

  https://gcc.gnu.org/ml/gcc-patches/2018-10/msg00460.html

Since we're late in stage1, I decided to try another solution that doesn't
involve RA at all.  The only reason the register pairs exist at the moment
is that lower-subreg could not decompose them when compiling on ppc in LE
mode.  The reason is that for POWER8, some of the vector loads and stores
do not properly byte swap the values being loaded/stored, so our mov
patterns add an explicit rotate to the rtl insns.  So the following:

  (insn (set (mem:TI (reg/v/f:DI 122))
             (reg/v:TI 123)))

is replaced with:

  (insn (set (reg:TI 129)
             (rotate:TI (reg/v:TI 123)
                        (const_int 64))))
  (insn (set (mem:TI (reg/v/f:DI 122))
             (rotate:TI (reg:TI 129)
                        (const_int 64))))

On BE, we are able to correctly decompose the TImode access into two DImode
accesses, but on LE, lower-subreg doesn't see the accesses as simple moves
anymore, and so fails to decompose them.  However, if we look at what
lower-subreg tries, it sees:

  (insn (set (concatn/v:TI [(reg:DI 131 [src])
                            (reg:DI 132 [src+8])])
             (rotate:TI (concatn:TI [(reg:DI 133)
                                     (reg:DI 134 [+8])])
                        (const_int 64))))

This is just a word-sized rotate of a double-word register pair, which is
equivalent to stripping the rotate and swapping the two registers, like so:

  (insn (set (concatn/v:TI [(reg:DI 131 [src])
                            (reg:DI 132 [src+8])])
             (concatn:TI [(reg:DI 134 [+8])
                          (reg:DI 133)])))

The following patch extends lower-subreg to recognize these word-sized
rotates as simple moves so that it can replace the rotates with swapped
decomposed registers.

This has passed bootstrap and regtesting on powerpc64le-linux with no
regressions.  Is this ok for mainline?

Peter

gcc/
	PR rtl-optimization/87507
	* lower-subreg.c (simple_move_operator): New function.
	(simple_move): Strip simple operators.
	(find_pseudo_copy): Likewise.
	(resolve_simple_move): Strip simple operators and swap operands.

gcc/testsuite/
	PR rtl-optimization/87507
	* gcc.target/powerpc/pr87507.c: New test.
	* gcc.target/powerpc/pr68805.c: Update expected results.

Index: gcc/lower-subreg.c
===================================================================
--- gcc/lower-subreg.c	(revision 265971)
+++ gcc/lower-subreg.c	(working copy)
@@ -320,6 +320,24 @@ simple_move_operand (rtx x)
   return true;
 }
 
+/* If X is an operator that can be treated as a simple move that we
+   can split, then return the operand that is operated on.  */
+
+static rtx
+simple_move_operator (rtx x)
+{
+  /* A word sized rotate of a register pair is equivalent to swapping
+     the registers in the register pair.  */
+  if (GET_CODE (x) == ROTATE
+      && GET_MODE (x) == twice_word_mode
+      && simple_move_operand (XEXP (x, 0))
+      && CONST_INT_P (XEXP (x, 1))
+      && INTVAL (XEXP (x, 1)) == BITS_PER_WORD)
+    return XEXP (x, 0);
+
+  return NULL_RTX;
+}
+
 /* If INSN is a single set between two objects that we want to split,
    return the single set.  SPEED_P says whether we are optimizing
    INSN for speed or size.
@@ -330,7 +348,7 @@ simple_move_operand (rtx x)
 static rtx
 simple_move (rtx_insn *insn, bool speed_p)
 {
-  rtx x;
+  rtx x, op;
   rtx set;
   machine_mode mode;
 
@@ -348,6 +366,9 @@ simple_move (rtx_insn *insn, bool speed_
     return NULL_RTX;
 
   x = SET_SRC (set);
+  if ((op = simple_move_operator (x)) != NULL_RTX)
+    x = op;
+
   if (x != recog_data.operand[0] && x != recog_data.operand[1])
     return NULL_RTX;
   /* For the src we can handle ASM_OPERANDS, and it is beneficial for
@@ -386,9 +407,13 @@ find_pseudo_copy (rtx set)
 {
   rtx dest = SET_DEST (set);
   rtx src = SET_SRC (set);
+  rtx op;
   unsigned int rd, rs;
   bitmap b;
 
+  if ((op = simple_move_operator (src)) != NULL_RTX)
+    src = op;
+
   if (!REG_P (dest) || !REG_P (src))
     return false;
 
@@ -853,7 +878,7 @@ can_decompose_p (rtx x)
 static rtx_insn *
 resolve_simple_move (rtx set, rtx_insn *insn)
 {
-  rtx src, dest, real_dest;
+  rtx src, dest, real_dest, src_op;
   rtx_insn *insns;
   machine_mode orig_mode, dest_mode;
   unsigned int orig_size, words;
@@ -876,6 +901,33 @@ resolve_simple_move (rtx set, rtx_insn *
 
   real_dest = NULL_RTX;
 
+  if ((src_op = simple_move_operator (src)) != NULL_RTX)
+    {
+      if (resolve_reg_p (dest))
+	{
+	  /* DEST is a CONCATN, so swap its operands and strip
+	     SRC's operator.  */
+	  rtx concatn = copy_rtx (dest);
+	  rtx op0 = XVECEXP (concatn, 0, 0);
+	  rtx op1 = XVECEXP (concatn, 0, 1);
+	  XVECEXP (concatn, 0, 0) = op1;
+	  XVECEXP (concatn, 0, 1) = op0;
+	  dest = concatn;
+	  src = src_op;
+	}
+      else if (resolve_reg_p (src_op))
+	{
+	  /* SRC is an operation on a CONCATN, so strip the operator and
+	     swap the CONCATN's operands.  */
+	  rtx concatn = copy_rtx (src_op);
+	  rtx op0 = XVECEXP (concatn, 0, 0);
+	  rtx op1 = XVECEXP (concatn, 0, 1);
+	  XVECEXP (concatn, 0, 0) = op1;
+	  XVECEXP (concatn, 0, 1) = op0;
+	  src = concatn;
+	}
+    }
+
   if (GET_CODE (src) == SUBREG
       && resolve_reg_p (SUBREG_REG (src))
       && (maybe_ne (SUBREG_BYTE (src), 0)
Index: gcc/testsuite/gcc.target/powerpc/pr87507.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr87507.c	(nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/pr87507.c	(working copy)
@@ -0,0 +1,22 @@
+/* { dg-do compile { target powerpc64le-*-* } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
+/* { dg-options "-O2 -mcpu=power8" } */
+
+typedef struct
+{
+  __int128_t x;
+  __int128_t y;
+} foo_t;
+
+void
+foo (long cond, foo_t *dst, __int128_t src)
+{
+  if (cond)
+    {
+      dst->x = src;
+      dst->y = src;
+    }
+}
+
+/* { dg-final { scan-assembler-times {\mstd\M} 4 } } */
+/* { dg-final { scan-assembler-not {\mld\M} } } */
Index: gcc/testsuite/gcc.target/powerpc/pr68805.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr68805.c	(revision 265971)
+++ gcc/testsuite/gcc.target/powerpc/pr68805.c	(working copy)
@@ -9,7 +9,7 @@ typedef struct bar {
 
 void foo (TYPE *p, TYPE *q) { *p = *q; }
 
-/* { dg-final { scan-assembler "lxvd2x" } } */
-/* { dg-final { scan-assembler "stxvd2x" } } */
+/* { dg-final { scan-assembler-times {\mld\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mstd\M} 2 } } */
 /* { dg-final { scan-assembler-not "xxpermdi" } } */
 
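
For reference, the equivalence this patch relies on (rotating a double-word
value by exactly one word is the same as swapping its two words) can be
checked outside of GCC with a small standalone C sketch.  Everything below
is illustrative only: pair128, rotl128, swap_words, pair_equal and
check_word_rotate_is_swap are hypothetical names, not GCC types or
functions, and a 64-bit word is assumed to match the TImode/DImode case
above.  The rotate is deliberately done one bit at a time so that it does
not assume the property being checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a double-word register pair: word[0] is the
   low word and word[1] is the high word.  Not a GCC type.  */
typedef struct
{
  uint64_t word[2];
} pair128;

/* Rotate the 128-bit value V left by one bit.  The bit shifted out of
   each word becomes the low bit of the other word.  */
static pair128
rotl128_by_1 (pair128 v)
{
  pair128 r;
  r.word[0] = (v.word[0] << 1) | (v.word[1] >> 63);
  r.word[1] = (v.word[1] << 1) | (v.word[0] >> 63);
  return r;
}

/* Rotate V left by N bits (N is taken modulo 128), one bit at a time.  */
static pair128
rotl128 (pair128 v, unsigned int n)
{
  for (n %= 128; n > 0; n--)
    v = rotl128_by_1 (v);
  return v;
}

/* Return V with its two words exchanged.  */
static pair128
swap_words (pair128 v)
{
  pair128 r = { { v.word[1], v.word[0] } };
  return r;
}

/* Return true if A and B hold the same 128-bit value.  */
static bool
pair_equal (pair128 a, pair128 b)
{
  return a.word[0] == b.word[0] && a.word[1] == b.word[1];
}

/* Check that a word-sized rotate of V equals a word swap, and that
   applying it twice gives back V.  */
static void
check_word_rotate_is_swap (pair128 v)
{
  assert (pair_equal (rotl128 (v, 64), swap_words (v)));
  assert (pair_equal (rotl128 (rotl128 (v, 64), 64), v));
}
```

Calling check_word_rotate_is_swap on any value should pass, whereas a
rotate by a count other than the word size (32, say) is not a plain swap
in general, which is why simple_move_operator only accepts a count equal
to BITS_PER_WORD.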