
[committed,PR,rtl-optimization/87761] Improve MIPS splitters to sometimes avoid unnecessary cross-unit register copies

Message ID b2864cfb-1b41-59e3-59cf-e5baf13ad5ad@redhat.com

Commit Message

Jeff Law March 22, 2019, 6:16 p.m. UTC
I cry uncle!

So I finally started looking at the fpr-moves regression in this BZ.  No
surprise this is further fallout from the combiner changes.

Going into register allocation we have something like this:

(insn 13 7 14 2 (set (reg:TF 196)
        (reg:TF 44 $f12 [ d ])) "j.c":7:1 376 {*movtf}
     (expr_list:REG_DEAD (reg:TF 44 $f12 [ d ])
        (nil)))
(insn 14 13 10 2 (set (reg:DI 197)
        (reg:DI 6 $6 [ x ])) "j.c":7:1 313 {*movdi_64bit}
     (expr_list:REG_DEAD (reg:DI 6 $6 [ x ])
        (nil)))
(insn 10 14 15 2 (set (mem:TF (reg:DI 197) [1 *x_2(D)+0 S16 A128])
        (reg:TF 196)) "j.c":8:6 376 {*movtf}
     (expr_list:REG_DEAD (reg:DI 197)
        (expr_list:REG_DEAD (reg:TF 196)
            (nil))))


Prior to the combine changes, all of that would have been squashed into
a single nice insn.  One might reasonably ask if combine's avoidance of
hard regs should be loosened for consecutive insns -- combining
consecutive insns isn't going to increase the live range of these hard
regs.  I wandered around combine a bit and decided it wasn't really
feasible to restrict things that way.

So as a starting point, assume the RTL above is what we're going to have
to deal with.

We end up allocating (reg 196) into a GPR.  It's "cheaper".  So
post-reload it looks like:

> (insn 13 7 10 2 (set (reg:TF 2 $2 [196])
>         (reg:TF 44 $f12 [ d ])) "j.c":7:1 376 {*movtf}
>      (nil))
> (insn 10 13 15 2 (set (mem:TF (reg:DI 6 $6 [197]) [1 *x_2(D)+0 S16 A128])
>         (reg:TF 2 $2 [196])) "j.c":8:6 376 {*movtf}
>      (nil))

The MIPS splitter turns that into this mess:

> (insn 17 7 18 2 (set (reg:DI 2 $2 [196])
>         (unspec:DI [
>                 (reg:TF 44 $f12 [ d ])
>                 (const_int 0 [0])
>             ] UNSPEC_STORE_WORD)) "j.c":7:1 407 {store_wordtf}
>      (nil))
> (insn 18 17 19 2 (set (reg:DI 3 $3 [+8 ])
>         (unspec:DI [
>                 (reg:TF 44 $f12 [ d ])
>                 (const_int 1 [0x1])
>             ] UNSPEC_STORE_WORD)) "j.c":7:1 407 {store_wordtf}
>      (nil))
> (insn 19 18 20 2 (set (mem:DI (reg:DI 6 $6 [197]) [1 *x_2(D)+0 S8 A128])
>         (reg:DI 2 $2 [196])) "j.c":8:6 313 {*movdi_64bit}
>      (nil))
> (insn 20 19 15 2 (set (mem:DI (plus:DI (reg:DI 6 $6 [197])
>                 (const_int 8 [0x8])) [1 *x_2(D)+8 S8 A64])
>         (reg:DI 3 $3 [+8 ])) "j.c":8:6 313 {*movdi_64bit}
>      (nil))


Ugh.  Note the unspecs in the first two insns.  They're going to prevent
regcprop from doing its job with this mess.  I have no idea why we use
UNSPECs here rather than representing this properly in RTL, but I'm
willing to assume it's for a good reason.

So one thought was to have a pass of regcprop earlier.  That fixes this
MIPS issue nicely *and* simplifies the solution to the fix-r4000
regression!  But it regresses x86 in a couple of ways.  The x86
regressions can then be fixed by moving the REE pass to just before the
early regcprop pass.  That's really promising, so I throw it into the
tester and look for fallout.

Example fallout on visium where we have this after reload:

> (insn 2 5 11 2 (set (reg/v:QI 10 r10 [orig:66 a ] [66])
>         (reg:QI 1 r1 [74])) "k.c":7:1 1 {*movqi_insn}
>      (nil))
> (insn 11 2 10 2 (set (reg:QI 9 r9 [71])
>         (plus:QI (reg/v:QI 10 r10 [orig:66 a ] [66])
>             (reg/v:QI 2 r2 [orig:67 b ] [67]))) "k.c":8:10 28 {*addqi3_insn}
>      (nil))
> (insn 10 11 12 2 (set (reg:QI 1 r1 [orig:64 _7+1 ] [64])
>         (const_int 0 [0])) "k.c":8:10 1 {*movqi_insn}
>      (expr_list:REG_EQUAL (const_int 0 [0])
>         (nil)))
> (jump_insn 12 10 30 2 (set (pc)
>         (if_then_else (eq (reg:QI 9 r9 [71])
>                 (unspec:QI [
>                         (reg/v:QI 10 r10 [orig:66 a ] [66])
>                         (reg/v:QI 2 r2 [orig:67 b ] [67])
>                     ] UNSPEC_ADDV))
>             (label_ref:SI 17)
>             (pc))) "k.c":8:10 234 {*cbranchqi4_addv_insn}
>      (int_list:REG_BR_PROB 1073204964 (nil)))

The early regcprop pass will propagate r1 for r10 in insn 11, but it
can't propagate into the use in insn 12 (because of the set of r1 in
insn 10).  So we regress visium in unpleasant ways (the operands of the
plus apparently need to match the jump at insn 12 to get the code we
want).  We can't move the cmpelim pass earlier because it depends on
splitting, and the point of this approach is to run hard cprop before
splitting.  Note this was just one regression; rl78 regressed too, and
likely others would have as well if they had stronger target-specific
testsuites.

Anyway, we could look at doing more pass juggling, but I suspect we'd
just be permuting the problems.  We could perhaps do some pass
duplication, but that seems so wasteful of compile-time given the cases
we're looking at.

So I'm just punting.  In the MIPS splitter, we can peek ahead one real
insn and try to forward propagate the source operand at split time.  We
still generate the split insns as well.  So after post-reload splitting
we have something like this:

> (insn 17 7 18 2 (set (reg:DI 2 $2 [196])
>         (unspec:DI [
>                 (reg:TF 44 $f12 [ d ])
>                 (const_int 0 [0])
>             ] UNSPEC_STORE_WORD)) "j.c":7:1 407 {store_wordtf}
>      (nil))
> (insn 18 17 19 2 (set (reg:DI 3 $3 [+8 ])
>         (unspec:DI [
>                 (reg:TF 44 $f12 [ d ])
>                 (const_int 1 [0x1])
>             ] UNSPEC_STORE_WORD)) "j.c":7:1 407 {store_wordtf}
>      (nil))
> (insn 19 18 20 2 (set (mem:DI (reg:DI 6 $6 [197]) [1 *x_2(D)+0 S8 A128])
>         (unspec:DI [
>                 (reg:TF 44 $f12 [ d ])
>                 (const_int 0 [0])
>             ] UNSPEC_STORE_WORD)) "j.c":8:6 -1
>      (nil))
> (insn 20 19 15 2 (set (mem:DI (plus:DI (reg:DI 6 $6 [197])
>                 (const_int 8 [0x8])) [1 *x_2(D)+8 S8 A64])
>         (unspec:DI [
>                 (reg:TF 44 $f12 [ d ])
>                 (const_int 1 [0x1])
>             ] UNSPEC_STORE_WORD)) "j.c":8:6 -1
>      (nil))

Note how we split the second move too, and the source operand is now
$f12.  Insns 17 and 18 will get removed as dead by DCE and we
ultimately get the desired code.

But just to be clear, this is a gross hack to avoid playing whack-a-mole
with pass ordering and duplication.  If it triggers in real-world code,
it should ultimately improve performance, as it can make the cross-unit
copies dead and gives the scheduler additional freedom.  But I don't
really expect it to trigger much in real-world code.

It fixes fpr-moves-5 (noted as a regression in the BZ) and fpr-moves-6
(not noted as a regression in the BZ) on mips64-linux-gnu and
mips64el-linux-gnu.  Bootstrapped on mipsisa32r2-linux-gnu as well.


Installing on the trunk.

Jeff

ps.  Now back to the fix-r4000 regression :-)

	PR rtl-optimization/87761
	* config/mips/mips-protos.h (mips_split_move): Add new argument.
	* config/mips/mips.c (mips_emit_move_or_split): Pass NULL for INSN
	into mips_split_move.
	(mips_split_move): Accept new INSN argument.  Try to forward SRC
	into the next instruction.
	(mips_split_move_insn): Pass INSN through to mips_split_move.

Comments

Segher Boessenkool March 24, 2019, 5:42 p.m. UTC | #1
Hi!

On Fri, Mar 22, 2019 at 12:16:30PM -0600, Jeff Law wrote:
> So I finally started looking at the fpr-moves regression in this BZ.  No
> surprise this is further fallout from the combiner changes.

It *exposed* the problem, yes :-)

> One might reasonably ask if combine's avoidance of
> hard regs should be loosened for consecutive insns -- combining
> consecutive insns isn't going to increase the live range of these hard
> regs.

But that is not the only reason we don't want to forward hard registers.
One of the important reasons is correctness: we could create situations
that reload/LRA cannot handle (cannot *possibly* handle).  Another reason
is that combine should not try to do register allocation's job: we on
average get better code after the combine hard reg change.

(And what are consecutive insns, anyway?  This is before scheduling.)

> We end up allocating (reg 196) into a GPR.  It's "cheaper".

Is this a problem in itself already?  (I don't know MIPS terribly well).

> So one thought was to have a pass of regcprop earlier.  That fixes this
> MIPS issue nicely *and* simplifies the solution to the fix-r4000
> regression!  But regresses x86 in a couple ways.  The x86 regressions
> can then be fixed by moving the REE pass to just before the early
> regcprop pass.  That's real promising, so I throw it into the tester and
> look for fallout.
> 
> Example fallout on visium where we have this after reload:

Yeah, reordering or duplicating late passes is something for stage 1, and
even then it might just not work out.

> So I'm just punting.  In the MIPS splitter, we can peek ahead one real
> insn and try to forward propagate the source operand at split time.  We
> still generate the split insns as well.  So after post-reload splitting
> we have something like this:

> Note how we split the second move too and the source operand is now
> $f12.    insns 17 and 18 will get removed as dead by DCE and we
> ultimately get the desired code.

For splitters after reload you have to do a lot of work manually, to get
reasonable code.  This is a problem everywhere :-(

An obvious fix is to split *before* reload, too, but that doesn't work for
those few splitters that depend on RA results.


Segher
Jeff Law March 25, 2019, 5:39 p.m. UTC | #2
On 3/24/19 11:42 AM, Segher Boessenkool wrote:
> Hi!
> 
> On Fri, Mar 22, 2019 at 12:16:30PM -0600, Jeff Law wrote:
>> So I finally started looking at the fpr-moves regression in this BZ.  No
>> surprise this is further fallout from the combiner changes.
> 
> It *exposed* the problem, yes :-)
Well, it was one of the regressions that resulted from that patch.  I'm
a bit disappointed in how many were just punted rather than really
analyzed and either mitigated or explicitly moved out to gcc-10 because
mitigation was too painful, with marginal benefit, to do at this stage.


> 
>> One might reasonably ask if combine's avoidance of
>> hard regs should be loosened for consecutive insns -- combining
>> consecutive insns isn't going to increase the live range of these hard
>> regs.
> 
> But that is not the only reason we don't want to forward hard registers.
> One of the important reasons is correctness: we could create situations
> that reload/LRA cannot handle (cannot *possibly* handle).  Another reason
> is that combine should not try to do register allocation's job: we on
> average get better code after the combine hard reg change.
No doubt.  My point was that we can and should have been looking for
ways to mitigate the fallout.  Which in my mind includes further
investigation of whether or not we've hit the right balance for what
should and should not be combined.

> 
> (And what are consecutive insns, anyway?  This is before scheduling.)
The insns at combine time looked like:

> (insn 13 8 5 2 (set (reg:TF 196)
>         (reg:TF 44 $f12 [ d ])) "j.c":7:1 -1
>      (expr_list:REG_DEAD (reg:TF 44 $f12 [ d ])
>         (nil)))
> (note 5 13 14 2 NOTE_INSN_DELETED)
> (insn 14 5 6 2 (set (reg:DI 197)
>         (reg:DI 6 $6 [ x ])) "j.c":7:1 -1
>      (expr_list:REG_DEAD (reg:DI 6 $6 [ x ])
>         (nil)))
> (note 6 14 7 2 NOTE_INSN_DELETED)
> (note 7 6 10 2 NOTE_INSN_FUNCTION_BEG)
> (insn 10 7 0 2 (set (mem:TF (reg:DI 197) [1 *x_2(D)+0 S16 A128])
>         (reg:TF 196)) "j.c":8:6 376 {*movtf}
>      (expr_list:REG_DEAD (reg:DI 197)
>         (expr_list:REG_DEAD (reg:TF 196)
>             (nil))))


Without reviewing the thread which led to the combine.c change I can't
say offhand if there's something we can/should relax in combine to
handle this better.  My point was that it should have been more
thoroughly investigated and mitigations attempted once the regressions
were noted.


> 
>> We end up allocating (reg 196) into a GPR.  It's "cheaper".
> 
> Is this a problem in itself already?  (I don't know MIPS terribly well).
It may be.  While I've done extensive MIPS work in the past, it
predates the TF/TI support there, so I don't have a sense of the
relative costs for TF/TI.  The costs certainly looked fishy when I
first looked at them.  But I wasn't up for tackling that right now
either :-)  It's an area Vlad continues to investigate (hard reg
costing is a known weak spot for IRA/LRA).


>> So I'm just punting.  In the MIPS splitter, we can peek ahead one real
>> insn and try to forward propagate the source operand at split time.  We
>> still generate the split insns as well.  So after post-reload splitting
>> we have something like this:
> 
>> Note how we split the second move too and the source operand is now
>> $f12.    insns 17 and 18 will get removed as dead by DCE and we
>> ultimately get the desired code.
> 
> For splitters after reload you have to do a lot of work manually, to get
> reasonable code.  This is a problem everywhere :-(
Yup.  In fact, we've seen multiple BZs this cycle where a DCE pass right
before/after splitting would help.  In both cases we ended up hacking up
the splitters a bit.  It's one of the items I think we'll want to
revisit during gcc-10 stage1 development.

> 
> An obvious fix is to split *before* reload, too, but that doesn't work for
> those few splitters that depend on RA results.
Right.

jeff
Segher Boessenkool March 26, 2019, 1:32 a.m. UTC | #3
On Mon, Mar 25, 2019 at 11:39:27AM -0600, Jeff Law wrote:
> On 3/24/19 11:42 AM, Segher Boessenkool wrote:
> > On Fri, Mar 22, 2019 at 12:16:30PM -0600, Jeff Law wrote:
> >> So I finally started looking at the fpr-moves regression in this BZ.  No
> >> surprise this is further fallout from the combiner changes.
> > 
> > It *exposed* the problem, yes :-)
> Well, it was one of the regressions that resulted from that patch.  I'm
> a bit disappointed in how many were just punted rather than really
> analyzed and either mitigated or explicitly moved out to gcc-10 because
> mitigation was too painful, with marginal benefit, to do at this stage.
> 
> >> One might reasonably ask if combine's avoidance of
> >> hard regs should be loosened for consecutive insns -- combining
> >> consecutive insns isn't going to increase the live range of these hard
> >> regs.
> > 
> > But that is not the only reason we don't want to forward hard registers.
> > One of the important reasons is correctness: we could create situations
> > that reload/LRA cannot handle (cannot *possibly* handle).  Another reason
> > is that combine should not try to do register allocation's job: we on
> > average get better code after the combine hard reg change.
> No doubt.  My point was that we can and should have been looking for
> ways to mitigate the fallout.  Which in my mind includes further
> investigation of whether or not we've hit the right balance for what
> should and should not be combined.

Do you have some example of something that happened after the change, that
did not happen before the change (when using a var instead of a function
argument, say)?

> > For splitters after reload you have to do a lot of work manually, to get
> > reasonable code.  This is a problem everywhere :-(
> Yup.  In fact, we've seen multiple BZs this cycle where a DCE pass right
> before/after splitting would help.  In both cases we ended up hacking up
> the splitters a bit.  It's one of the items I think we'll want to
> revisit during gcc-10 stage1 development.

I would like to even see something CSE-like, cprop-like, and combine-like.

But most of all, people should stop doing splitters after reload if there
is any way to avoid it!


Segher

Patch

diff --git a/gcc/config/mips/mips-protos.h b/gcc/config/mips/mips-protos.h
index 64afb350fb7..32070fdb8c9 100644
--- a/gcc/config/mips/mips-protos.h
+++ b/gcc/config/mips/mips-protos.h
@@ -214,7 +214,7 @@  extern bool mips_legitimize_move (machine_mode, rtx, rtx);
 
 extern rtx mips_subword (rtx, bool);
 extern bool mips_split_move_p (rtx, rtx, enum mips_split_type);
-extern void mips_split_move (rtx, rtx, enum mips_split_type);
+extern void mips_split_move (rtx, rtx, enum mips_split_type, rtx);
 extern bool mips_split_move_insn_p (rtx, rtx, rtx);
 extern void mips_split_move_insn (rtx, rtx, rtx);
 extern void mips_split_128bit_move (rtx, rtx);
diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 48f324410b9..1de33b28c38 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -3031,7 +3031,7 @@  static void
 mips_emit_move_or_split (rtx dest, rtx src, enum mips_split_type split_type)
 {
   if (mips_split_move_p (dest, src, split_type))
-    mips_split_move (dest, src, split_type);
+    mips_split_move (dest, src, split_type, NULL);
   else
     mips_emit_move (dest, src);
 }
@@ -4780,10 +4780,11 @@  mips_split_move_p (rtx dest, rtx src, enum mips_split_type split_type)
 }
 
 /* Split a move from SRC to DEST, given that mips_split_move_p holds.
-   SPLIT_TYPE describes the split condition.  */
+   SPLIT_TYPE describes the split condition.  INSN is the insn being
+   split, if we know it, NULL otherwise.  */
 
 void
-mips_split_move (rtx dest, rtx src, enum mips_split_type split_type)
+mips_split_move (rtx dest, rtx src, enum mips_split_type split_type, rtx insn_)
 {
   rtx low_dest;
 
@@ -4843,6 +4844,21 @@  mips_split_move (rtx dest, rtx src, enum mips_split_type split_type)
 	  mips_emit_move (mips_subword (dest, true), mips_subword (src, true));
 	}
     }
+
+  /* This is a hack.  See if the next insn uses DEST and if so, see if we
+     can forward SRC for DEST.  This is most useful if the next insn is a
+     simple store.   */
+  rtx_insn *insn = (rtx_insn *)insn_;
+  if (insn)
+    {
+      rtx_insn *next = next_nonnote_nondebug_insn_bb (insn);
+      if (next)
+	{
+	  rtx set = single_set (next);
+	  if (set && SET_SRC (set) == dest)
+	    validate_change (next, &SET_SRC (set), src, false);
+	}
+    }
 }
 
 /* Return the split type for instruction INSN.  */
@@ -5070,7 +5086,7 @@  mips_split_move_insn_p (rtx dest, rtx src, rtx insn)
 void
 mips_split_move_insn (rtx dest, rtx src, rtx insn)
 {
-  mips_split_move (dest, src, mips_insn_split_type (insn));
+  mips_split_move (dest, src, mips_insn_split_type (insn), insn);
 }
 
 /* Return the appropriate instructions to move SRC into DEST.  Assume