[05/31] VAX: Rationalize expression and address costs

Expression costs are required to be given in terms of COSTS_N_INSNS (n),
which is defined to stand for the count of single fast instructions, and
actually returns `n * 4'.  The VAX backend, however, operates on bare
numbers, causing an anomaly for the integer const zero rtx, where the
effective cost is 4, as opposed to 1 for integers in the [1:63] range
and for -1 in comparisons.  This is because the value of 0 returned by
`vax_rtx_costs' is converted to COSTS_N_INSNS (1) in `pattern_cost':

  return cost > 0 ? cost : COSTS_N_INSNS (1);
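
For illustration only, the anomaly can be reproduced with a minimal
sketch.  It assumes nothing beyond what is stated above: the macro
expands to `n * 4' and the raw values are 0, 1 and 1.  The enumerator
names and the `effective_cost' helper are made up here and are not GCC
identifiers.

```c
#include <assert.h>

/* As stated above, COSTS_N_INSNS (n) evaluates to n * 4.  */
#define COSTS_N_INSNS(n) ((n) * 4)

/* Bare values the old backend returned, per the description above.
   The enumerator names are hypothetical.  */
enum
{
  OLD_RAW_COST_ZERO = 0,	/* integer const zero */
  OLD_RAW_COST_1_TO_63 = 1,	/* integers in the [1:63] range */
  OLD_RAW_COST_M1_COMPARE = 1	/* -1 used in a comparison */
};

/* Hypothetical stand-in for the quoted `pattern_cost' return.  */
static int
effective_cost (int cost)
{
  return cost > 0 ? cost : COSTS_N_INSNS (1);
}

/* Zero ends up four times as expensive as a small nonzero constant.  */
static void
check_old_costs (void)
{
  assert (effective_cost (OLD_RAW_COST_ZERO) == 4);
  assert (effective_cost (OLD_RAW_COST_1_TO_63) == 1);
  assert (effective_cost (OLD_RAW_COST_M1_COMPARE) == 1);
}
```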

Consequently, where feasible, the middle end prefers 1 or -1 over 0,
causing code pessimization, e.g. rather than producing this:

	subl2 $4,%sp
	movl 4(%ap),%r0
	jgtr .L2
	addl2 $2,%r0
.L2:
	ret

or this:

	subl2 $4,%sp
	addl3 4(%ap),8(%ap),%r0
	jlss .L6
	addl2 $2,%r0
.L6:
	ret

code is produced like this:

	subl2 $4,%sp
	movl 4(%ap),%r0
	cmpl %r0,$1
	jgeq .L2
	addl2 $2,%r0
.L2:
	ret

or this:

	subl2 $4,%sp
	addl3 4(%ap),8(%ap),%r0
	cmpl %r0,$-1
	jleq .L6
	addl2 $2,%r0
.L6:
	ret

from this:

int
compare_mov (int x)
{
  if (x > 0)
    return x;
  else
    return x + 2;
}

and this:

int
compare_add (int x, int y)
{
  int z;

  z = x + y;
  if (z < 0)
    return z;
  else
    return z + 2;
}

respectively, which is both slower and larger.
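
The reason 1 or -1 can stand in for 0 at all is that, for integers,
`x > 0' and `x >= 1' select the same values, as do `z < 0' and
`z <= -1'.  These are the forms visible in the `cmpl'/`jgeq' and
`cmpl'/`jleq' sequences above.  A small sketch of the equivalence, with
helper names invented for this example:

```c
#include <assert.h>

/* The comparisons as written in the C sources above.  */
static int gt_zero (int x) { return x > 0; }
static int lt_zero (int z) { return z < 0; }

/* The equivalent forms that compare against 1 and -1 instead.  */
static int ge_one (int x) { return x >= 1; }
static int le_minus_one (int z) { return z <= -1; }

/* Return 1 if both pairs agree for every value in [LO, HI].  */
static int
forms_agree (int lo, int hi)
{
  /* Iterate in a wider type so that HI == INT_MAX cannot overflow.  */
  for (long long v = lo; v <= hi; v++)
    {
      int x = (int) v;
      if (gt_zero (x) != ge_one (x) || lt_zero (x) != le_minus_one (x))
	return 0;
    }
  return 1;
}

static void
check_forms (void)
{
  assert (forms_agree (-1000, 1000));
}
```

Since the forms are interchangeable, which one gets emitted is decided
by the relative cost of the constants alone.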

Furthermore, once the backend has been converted to MODE_CC, this
anomaly usually makes it impossible to remove redundant comparisons in
the comparison elimination pass, because most VAX instructions set the
condition codes as per the relation of the instruction's result to 0
and not -1.

The middle end also makes other assumptions about rtx costs being given
in terms of COSTS_N_INSNS, so wrap all the VAX rtx costs as they stand
in COSTS_N_INSNS invocations, effectively scaling the costs by 4 while
preserving their relative values, except for the integer const zero rtx,
which is given the value of `COSTS_N_INSNS (1) / 2', half of a fast
instruction (this can be further halved if needed in the future).
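
As a rough sketch of the resulting arithmetic, and not the actual
`vax_rtx_costs' code: with the macro expanding to `n * 4', zero now
costs 2, which is nonzero and so no longer hits the `pattern_cost'
fallback, and is lower than the 4 of a small nonzero constant.  The
helper names below are assumptions made for this example.

```c
#include <assert.h>

/* As stated above, COSTS_N_INSNS (n) evaluates to n * 4.  */
#define COSTS_N_INSNS(n) ((n) * 4)

/* New cost of anything the old code expressed as the bare number OLD
   (hypothetical helper).  */
static int
scaled_cost (int old)
{
  return COSTS_N_INSNS (old);
}

/* New cost of the integer const zero rtx: half of a fast instruction.  */
static int
const_zero_cost (void)
{
  return COSTS_N_INSNS (1) / 2;
}

/* Hypothetical stand-in for the `pattern_cost' return quoted earlier.  */
static int
effective_cost (int cost)
{
  return cost > 0 ? cost : COSTS_N_INSNS (1);
}

static void
check_new_costs (void)
{
  /* Zero is now nonzero in cost terms, so the fallback leaves it alone.  */
  assert (const_zero_cost () == 2);
  assert (effective_cost (const_zero_cost ()) == 2);
  /* Zero is now cheaper than a constant in the [1:63] range.  */
  assert (const_zero_cost () < scaled_cost (1));
  /* Relative values of the remaining costs are preserved.  */
  assert (scaled_cost (2) == 2 * scaled_cost (1));
}
```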

Adjust address costs likewise so that they remain proportional to the
new absolute values of rtx costs.

Code size stats are as follows, collected from 17639 executables built
in `check-c' GCC testing:

              samples average  median
--------------------------------------
regressions      1420  0.400%  0.195%
unchanged       13811  0.000%  0.000%
progressions     2408 -0.504% -0.201%
--------------------------------------
total           17639 -0.037%  0.000%

with a small number of outliers only (over 5% size change):

old     new     change  %change filename
----------------------------------------------------
4991    5249     258     5.1693 981001-1.exe
2637    2777     140     5.3090 interchange-6.exe
2187    2307     120     5.4869 sprintf.x7
3969    4197     228     5.7445 pr28982a.exe
8264    8816     552     6.6795 vector-compare-1.exe
5199    5575     376     7.2321 pr28982b.exe
2113    2411     298    14.1031 20030323-1.exe
2113    2411     298    14.1031 20030323-1.exe
2113    2411     298    14.1031 20030323-1.exe

so this looks good, and there are complementary reductions to
compensate:

old     new     change  %change filename
----------------------------------------------------
2919    2631    -288    -9.8663 pr57521.exe
3427    3167    -260    -7.5868 sabd_1.exe
2985    2765    -220    -7.3701 ssad-run.exe
2985    2765    -220    -7.3701 ssad-run.exe
2985    2765    -220    -7.3701 usad-run.exe
2985    2765    -220    -7.3701 usad-run.exe
4509    4253    -256    -5.6775 vshuf-v2sf.exe
4541    4285    -256    -5.6375 vshuf-v2si.exe
4673    4417    -256    -5.4782 vshuf-v2df.exe
2993    2841    -152    -5.0785 abs-2.x4
2993    2841    -152    -5.0785 abs-3.x4
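
As a sanity check on the figures above, the total row of the summary
can be recomputed from the other three rows, assuming each average is
the mean over that row's samples, and a %change entry can be recomputed
as `(new - old) / old'.  The structure and function names are made up
for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* One row of the summary table above.  */
struct row
{
  const char *label;
  int samples;
  double average_pct;
};

static const struct row rows[] =
{
  { "regressions", 1420, 0.400 },
  { "unchanged", 13811, 0.000 },
  { "progressions", 2408, -0.504 },
};

static int
total_samples (void)
{
  int n = 0;
  for (size_t i = 0; i < sizeof rows / sizeof rows[0]; i++)
    n += rows[i].samples;
  return n;
}

/* Sample-weighted mean of the per-row averages.  */
static double
weighted_average_pct (void)
{
  double sum = 0.0;
  for (size_t i = 0; i < sizeof rows / sizeof rows[0]; i++)
    sum += rows[i].samples * rows[i].average_pct;
  return sum / total_samples ();
}

/* Size change in percent relative to the old size.  */
static double
percent_change (int old_size, int new_size)
{
  return 100.0 * (new_size - old_size) / old_size;
}

static void
check_stats (void)
{
  assert (total_samples () == 17639);
  /* About -0.0366, in line with the -0.037% total shown.  */
  assert (weighted_average_pct () > -0.0375);
  assert (weighted_average_pct () < -0.0365);
  /* 981001-1.exe: 4991 -> 5249 is about 5.1693%.  */
  assert (percent_change (4991, 5249) > 5.1692);
  assert (percent_change (4991, 5249) < 5.1694);
}
```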

This actually causes `loop-8.c' to regress:

FAIL: gcc.dg/loop-8.c scan-rtl-dump-times loop2_invariant "Decided" 1
FAIL: gcc.dg/loop-8.c scan-rtl-dump-not loop2_invariant "without introducing a new temporary register"

but upon closer inspection this is a red herring.  The old code looks
as follows:

	.file	"loop-8.c"
	.text
	.align 1
.globl f
	.type	f, @function
f:
	.word 0
	subl2 $4,%sp
	movl 4(%ap),%r2
	movl 8(%ap),%r3
	movl $42,(%r2)
	clrl %r0
	movl $42,%r1
	movl %r1,%r4
	jbr .L2
.L5:
	movl %r4,%r1
.L2:
	movl %r1,(%r3)[%r0]
	incl %r0
	cmpl %r0,$100
	jeql .L6
	movl $42,(%r2)[%r0]
	bicl3 $-2,%r0,%r1
	jeql .L5
	movl %r0,%r1
	jbr .L2
.L6:
	ret
	.size	f, .-f

while the new code is as below:

	.file	"loop-8.c"
	.text
	.align 1
.globl f
	.type	f, @function
f:
	.word 0
	subl2 $4,%sp
	movl 4(%ap),%r2
	movl $42,(%r2)+
	movl 8(%ap),%r1
	clrl %r0
	movl $42,%r3
	movzbl $100,%r4
	movl %r3,%r5
	jbr .L2
.L5:
	movl %r5,%r3
.L2:
	movl %r3,(%r1)+
	incl %r0
	cmpl %r0,%r4
	jeql .L6
	movl $42,(%r2)+
	bicl3 $-2,%r0,%r3
	jeql .L5
	movl %r0,%r3
	jbr .L2
.L6:
	ret
	.size	f, .-f

and is clearly better: not only is it smaller, but it also uses the
post-increment rather than the indexed addressing mode in the loop.  The
former comes for free in terms of both performance and code size, while
the latter causes an extra byte per operand to be produced for the
index register and also incurs an execution penalty for the extra
address calculation.
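
In C terms the two listings correspond to the two access patterns
sketched below.  The loop body is reconstructed from the assembly above
and is an assumption, not a copy of the `loop-8.c' source; the function
names are invented.  The sketch only shows that walking the arrays by
index and by bumping the pointers stores the same values; which form
the compiler emits is down to its cost model.

```c
#include <assert.h>

enum { N = 100 };

/* Indexed accesses, as in the old listing's `(%r2)[%r0]' operands.  */
static void
fill_indexed (int *a, int *b)
{
  for (int i = 0; i < N; i++)
    {
      int d = 42;
      a[i] = d;
      if (i % 2)
	d = i;
      b[i] = d;
    }
}

/* Pointer-bumping accesses, as in the new listing's `(%r2)+' operands.  */
static void
fill_postinc (int *a, int *b)
{
  for (int i = 0; i < N; i++)
    {
      int d = 42;
      *a++ = d;
      if (i % 2)
	d = i;
      *b++ = d;
    }
}

/* Return 1 if both variants store identical values.  */
static int
access_patterns_agree (void)
{
  int a1[N], b1[N], a2[N], b2[N];

  fill_indexed (a1, b1);
  fill_postinc (a2, b2);
  for (int i = 0; i < N; i++)
    if (a1[i] != a2[i] || b1[i] != b2[i])
      return 0;
  return 1;
}

static void
check_loops (void)
{
  int a[N], b[N];

  fill_indexed (a, b);
  assert (a[7] == 42 && b[7] == 7 && b[8] == 42);
  assert (access_patterns_agree ());
}
```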

Exclude the case from VAX testing then, as already done for some other
targets and discussed with commit d242fdaec186 ("gcc.dg/loop-8.c: Skip
for mmix.").

	gcc/
	* config/vax/vax.c (vax_address_cost): Express the cost in terms
	of COSTS_N_INSNS.
	(vax_rtx_costs): Likewise.

	gcc/testsuite/
	* gcc.dg/loop-8.c: Exclude for `vax-*-*'.
	* gcc.target/vax/compare-add-zero.c: New test.
	* gcc.target/vax/compare-mov-zero.c: New test.
---
 gcc/config/vax/vax.c                            | 110 ++++++++++++------------
 gcc/testsuite/gcc.dg/loop-8.c                   |   2 +-
 gcc/testsuite/gcc.target/vax/compare-add-zero.c |  27 ++++++
 gcc/testsuite/gcc.target/vax/compare-mov-zero.c |  24 ++++++
 4 files changed, 109 insertions(+), 54 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/vax/compare-add-zero.c
 create mode 100644 gcc/testsuite/gcc.target/vax/compare-mov-zero.c

Message ID	alpine.LFD.2.21.2011200235010.656242@eddie.linux-mips.org
State	Accepted
From	"Maciej W. Rozycki" <macro@linux-mips.org>
Date	Fri, 20 Nov 2020 03:34:33 +0000 (GMT)
To	gcc-patches@gcc.gnu.org
Cc	Anders Magnusson <ragge@tethuvudet.se>
Series	VAX: Bring the port up to date (yes, MODE_CC conversion is included)