[GCC8,01/33] Handle TRUNCATE between tieable modes in rtx_cost

Message ID VI1PR0802MB21763B21D53FFD66F71885F3E7190@VI1PR0802MB2176.eurprd08.prod.outlook.com
State New

Commit Message

Bin Cheng April 18, 2017, 10:37 a.m. UTC
Hi,
This patch series rewrites parts of IVOPTs.  The change consists of the parts described below:
  A) New cost computation model.  Currently, there is a large amount of code trying to
     understand tree expressions and estimate their computation cost.  The model was designed
     long ago for generic tree expressions.  In order to process generic expressions (even
     address expressions of array/memory references), it has code for too many corner cases.
     The first problem is that it's practically impossible to handle all complicated
     expressions, even with complicated logic in functions like get_computation_cost_at,
     difference_cost, ptr_difference_cost, get_address_cost and so on.  The second problem
     is that it's hard to keep the cost model consistent among special cases.  As special
     cases have been added from time to time, the model is no longer unified.  There are
     cases where the right cost results in bad code or, vice versa, the wrong cost results
     in good code.  Finally, it's also difficult to add code for new cases.
     This patch introduces a new cost computation model based on tree affine.  Tree exprs
     are lowered to aff_tree, which usually consists of simple arithmetic operations.  Code
     handling special cases is no longer necessary, which simplifies things considerably.
     It is also easier to compute consistent costs among different expressions using tree
     affine, which gives us a unified cost model.
     This change is implemented in [PATCH rewrite-cost-computation-*.txt].
  B) When rewriting both nonlinear iv_use and address iv_use, the current code associates
     badly by mixing the computation of the invariant and induction parts.  This introduces
     inconsistency between cost computation and code generation, because the costs of the
     invariant and induction parts are computed separately.  It also prevents loop
     invariants from being hoisted out of the loop.
     This change fixes the issue by re-associating the invariant and induction parts
     separately for both nonlinear and address iv_use.
     This change is implemented in two patches:
     [PATCH nonlinear-iv_use-rewrite-*.txt]
     [PATCH address-iv_use-rewrite-*.txt]
  C) The current implementation shares the same register pressure computation with the RTL
     loop invariant pass.  It has difficulty handling loop nests (especially large ones),
     and quite often generates too many candidates (especially for outer loops).  This
     change introduces a new register pressure estimation.  The basic idea is to
     differentiate (hot) innermost loops from outer loops.  For a (possibly hot) innermost
     loop, more registers are allowed as long as the overall register pressure stays within
     the number of registers available on the target.
     This change is implemented in the patches below:
     [PATCH record-newly-used-inv_var-*.txt]
     [PATCH skip-non_int-phi-reg-pressure-*.txt]
     [PATCH ivopt-reg_pressure-model-*.txt]
  D) Other small refactors and improvements.  These will be described in each patch's review
     message.
  E) Patches that allow better induction variable optimizations for vectorized loops.  These
     patches are blocked at the moment because the current IVOPTs implementation can generate
     worse code on targets with limited addressing mode support.
     [PATCH range_info-for-vect_loop-niters-*.txt]
     [PATCH pr69710-*.txt]
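
To illustrate the idea in A), here is a minimal C sketch of costing expressions once they
are in affine form (offset + sum of coef * var).  It is not GCC's aff_tree nor the code in
this series: struct affine, affine_add_term, affine_add_scaled and affine_cost are
hypothetical names, variables are plain integer ids, and the add/multiply costs are made-up
parameters.  The point it tries to show is that once both a use and a candidate are affine,
their difference and its cost fall out of one uniform routine rather than per-case logic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified affine combination: offset + sum (coef[i] * var[i]).
   This is an illustration only, not GCC's aff_tree.  */
#define MAX_TERMS 8

struct affine {
  long offset;
  size_t n;                 /* number of live terms */
  int var[MAX_TERMS];       /* symbolic variable ids */
  long coef[MAX_TERMS];     /* coefficient of each variable */
};

/* Add COEF * VAR to COMB, merging with an existing term for VAR and
   dropping terms whose coefficient cancels to zero.
   Returns 0 on success, -1 if there is no room for a new term.  */
static int
affine_add_term (struct affine *comb, int var, long coef)
{
  assert (comb != NULL);
  if (coef == 0)
    return 0;
  for (size_t i = 0; i < comb->n; i++)
    if (comb->var[i] == var)
      {
        comb->coef[i] += coef;
        if (comb->coef[i] == 0)
          {
            /* Remove the cancelled term by moving the last one into its slot.  */
            comb->n--;
            comb->var[i] = comb->var[comb->n];
            comb->coef[i] = comb->coef[comb->n];
          }
        return 0;
      }
  if (comb->n == MAX_TERMS)
    return -1;
  comb->var[comb->n] = var;
  comb->coef[comb->n] = coef;
  comb->n++;
  return 0;
}

/* DST += SCALE * SRC.  DST and SRC must be distinct objects.
   On failure DST may be left partially updated.  */
static int
affine_add_scaled (struct affine *dst, const struct affine *src, long scale)
{
  assert (dst != NULL && src != NULL && dst != src);
  dst->offset += scale * src->offset;
  for (size_t i = 0; i < src->n; i++)
    if (affine_add_term (dst, src->var[i], scale * src->coef[i]) != 0)
      return -1;
  return 0;
}

/* Toy cost: one multiply per coefficient other than +1/-1, plus one
   add for each operand beyond the first (a nonzero offset counts as
   an operand).  ADD_COST and MULT_COST are assumed target costs.  */
static unsigned
affine_cost (const struct affine *comb, unsigned add_cost, unsigned mult_cost)
{
  assert (comb != NULL);
  unsigned cost = 0;
  size_t operands = comb->n + (comb->offset != 0 ? 1 : 0);
  for (size_t i = 0; i < comb->n; i++)
    if (comb->coef[i] != 1 && comb->coef[i] != -1)
      cost += mult_cost;
  if (operands > 1)
    cost += (unsigned) (operands - 1) * add_cost;
  return cost;
}
```

For example, with use = base + 4*i + 8 and candidate = base + 4*i, computing
use - candidate through affine_add_scaled leaves only the constant 8, so under this toy
model expressing the use from the candidate needs no extra multiply and no extra term.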

As a bonus, issues like PR53090/PR71361 are now fixed with better code generation than what
the two PRs were expecting.

I collected spec2k6 data on my local AArch64 and X86_64 machines.  Overall, FP is improved
by 1% on both machines, while INT remains mostly neutral.  I think part of the improvement
comes from IVOPTs itself, and the rest comes from opportunities enabled as described in E).
It would also be great if other targets could run some benchmarks with this patch series to
catch any performance breakage.

The patch series was bootstrapped and tested on X86_64 and AArch64; no real regressions were
found, though some tests do need further adjustment.

To start, this is the first patch of the series.  It simply handles TRUNCATE between
tieable modes in rtx_cost.  Since no additional instruction is needed for such a truncate,
it simply returns a cost of 0.

Is it OK?

Thanks,
bin

2017-04-11  Bin Cheng  <bin.cheng@arm.com>

	* rtlanal.c (rtx_cost): Handle TRUNCATE between tieable modes.
From d9b17e5d303d5fb1c75f489753b4578f8c907453 Mon Sep 17 00:00:00 2001
From: Bin Cheng <binche01@e108451-lin.cambridge.arm.com>
Date: Mon, 27 Feb 2017 14:51:56 +0000
Subject: [PATCH 01/33] no_cost-for-tieable-type-truncate-20170220.txt

---
 gcc/rtlanal.c | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Jeff Law April 24, 2017, 9:20 p.m. UTC | #1
On 04/18/2017 04:37 AM, Bin Cheng wrote:
> Is it OK?
> 
> Thanks,
> bin
> 
> 2017-04-11  Bin Cheng  <bin.cheng@arm.com>
> 
> 	* rtlanal.c (rtx_cost): Handle TRUNCATE between tieable modes.
This is fine.  You might consider adding tests for this kind of change, 
but I also realize they could end up being pretty fragile.  Hmm, maybe 
they would be better as unit tests of rtx_cost?

jeff
Eric Botcazou May 3, 2017, 6:17 a.m. UTC | #2
> 2017-04-11  Bin Cheng  <bin.cheng@arm.com>
> 
> 	* rtlanal.c (rtx_cost): Handle TRUNCATE between tieable modes.

This breaks bootstrap with RTL checking:

/home/eric/build/gcc/native/./gcc/xgcc -B/home/eric/build/gcc/native/./gcc/ -nostdinc -x c /dev/null -S -o /dev/null -fself-test=/home/eric/svn/gcc/gcc/testsuite/selftests
cc1: internal compiler error: RTL check: expected code 'subreg', have 'truncate' in rtx_cost, at rtlanal.c:4169
0xbae338 rtl_check_failed_code1(rtx_def const*, rtx_code, char const*, int, char const*)
        /home/eric/svn/gcc/gcc/rtl.c:829
0xbbc9b4 rtx_cost(rtx_def*, machine_mode, rtx_code, int, bool)
        /home/eric/svn/gcc/gcc/rtlanal.c:4169
0x8517e6 set_src_cost
        /home/eric/svn/gcc/gcc/rtl.h:2685
0x8517e6 init_expmed_one_conv
        /home/eric/svn/gcc/gcc/expmed.c:142
0x8517e6 init_expmed_one_mode
        /home/eric/svn/gcc/gcc/expmed.c:209
0x853fb2 init_expmed()
        /home/eric/svn/gcc/gcc/expmed.c:270
0xc45974 backend_init_target
        /home/eric/svn/gcc/gcc/toplev.c:1665
0xc45974 initialize_rtl()
Bin.Cheng May 3, 2017, 8:38 a.m. UTC | #3
On Wed, May 3, 2017 at 7:17 AM, Eric Botcazou <ebotcazou@adacore.com> wrote:
>> 2017-04-11  Bin Cheng  <bin.cheng@arm.com>
>>
>>       * rtlanal.c (rtx_cost): Handle TRUNCATE between tieable modes.
>
> This breaks bootstrap with RTL checking:
>
> /home/eric/build/gcc/native/./gcc/xgcc -B/home/eric/build/gcc/native/./gcc/ -nostdinc -x c /dev/null -S -o /dev/null -fself-test=/home/eric/svn/gcc/gcc/testsuite/selftests
> cc1: internal compiler error: RTL check: expected code 'subreg', have 'truncate' in rtx_cost, at rtlanal.c:4169
> 0xbae338 rtl_check_failed_code1(rtx_def const*, rtx_code, char const*, int, char const*)
>         /home/eric/svn/gcc/gcc/rtl.c:829
> 0xbbc9b4 rtx_cost(rtx_def*, machine_mode, rtx_code, int, bool)
>         /home/eric/svn/gcc/gcc/rtlanal.c:4169
> 0x8517e6 set_src_cost
>         /home/eric/svn/gcc/gcc/rtl.h:2685
> 0x8517e6 init_expmed_one_conv
>         /home/eric/svn/gcc/gcc/expmed.c:142
> 0x8517e6 init_expmed_one_mode
>         /home/eric/svn/gcc/gcc/expmed.c:209
> 0x853fb2 init_expmed()
>         /home/eric/svn/gcc/gcc/expmed.c:270
> 0xc45974 backend_init_target
>         /home/eric/svn/gcc/gcc/toplev.c:1665
> 0xc45974 initialize_rtl()
>
Sorry for the breakage; I will revert this if I can't fix it today.

Thanks,
bin
> --
> Eric Botcazou
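
A note on the failure reported above: the patch reads the operand of a TRUNCATE through
SUBREG_REG, and with RTL checking enabled that accessor verifies that the rtx code really is
SUBREG, which matches the "expected code 'subreg', have 'truncate'" message.  The following
self-contained C sketch models the situation with toy types.  Everything in it (toy_rtx,
toy_subreg_reg, toy_xexp0, toy_modes_tieable_p, toy_truncate_cost, and the tieable rule
itself) is hypothetical and only stands in for the GCC macros; it suggests that a generic
operand accessor, in the role of XEXP (x, 0), would avoid the check while reading the same
operand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature of RTL, for illustration only.  */
enum toy_code { TOY_REG, TOY_SUBREG, TOY_TRUNCATE };
enum toy_mode { TOY_QI, TOY_HI, TOY_SI, TOY_DI };

struct toy_rtx {
  enum toy_code code;
  enum toy_mode mode;
  const struct toy_rtx *op0;   /* operand 0; NULL for a REG */
};

/* Generic access to operand 0, valid for any code that has one
   (plays the role of XEXP (x, 0)).  */
static const struct toy_rtx *
toy_xexp0 (const struct toy_rtx *x)
{
  return x->op0;
}

/* Code-specific accessor (plays the role of SUBREG_REG under RTL
   checking).  Returns NULL where a checking build would report an
   internal compiler error.  */
static const struct toy_rtx *
toy_subreg_reg (const struct toy_rtx *x)
{
  if (x->code != TOY_SUBREG)
    return NULL;
  return x->op0;
}

/* Made-up tieable rule: identical modes are tieable, and all modes
   narrower than DI are tieable with each other.  Real targets define
   their own rule.  */
static bool
toy_modes_tieable_p (enum toy_mode a, enum toy_mode b)
{
  return a == b || (a != TOY_DI && b != TOY_DI);
}

/* Cost of a TRUNCATE: 0 when the outer and inner modes are tieable,
   otherwise FALLBACK (standing in for the target's own cost).  */
static int
toy_truncate_cost (const struct toy_rtx *x, int fallback)
{
  assert (x != NULL && x->code == TOY_TRUNCATE);
  const struct toy_rtx *inner = toy_xexp0 (x);
  assert (inner != NULL);
  if (toy_modes_tieable_p (x->mode, inner->mode))
    return 0;
  return fallback;
}
```

In this model, calling toy_subreg_reg on a TRUNCATE yields NULL (the analogue of the ICE),
while toy_xexp0 returns the operand regardless of the code.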

Patch

diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index acb4230..6019c3e 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -4146,6 +4146,14 @@  rtx_cost (rtx x, machine_mode mode, enum rtx_code outer_code,
 	return COSTS_N_INSNS (2 + factor);
       break;
 
+    case TRUNCATE:
+      /* If we can tie these modes, make this cheap.  */
+      if (MODES_TIEABLE_P (mode, GET_MODE (SUBREG_REG (x))))
+	{
+	  total = 0;
+	  break;
+	}
+      /* FALLTHRU */
     default:
       if (targetm.rtx_costs (x, mode, outer_code, opno, &total, speed))
 	return total;