
[PR66388] Add sizetype cand for BIV of smaller type if it's used as index of memory ref

Message ID 000401d0e52f$2c1917d0$844b4770$@arm.com
State New

Commit Message

Bin Cheng Sept. 2, 2015, 3:26 a.m. UTC
Hi,
This patch is a new approach to fixing PR66388.  IVO today computes an
iv_use with an iv_cand that has at least the same type precision as the
use.  On 64-bit platforms like AArch64, this results in a different
iv_cand being created for each address type iv_use, and increased
register pressure.  As a matter of fact, the BIV should be used for all
iv_uses in some of these cases.  It is a latent bug that has recently
been getting worse because of the overflow changes.
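For illustration, a hypothetical reduced loop of the affected kind
(invented here, not the actual PR66388 testcase):

/* The BIV 'i' is 32-bit and only feeds address computations.  On a
   64-bit target, IVO used to create a separate 64-bit candidate for
   each of the three address uses instead of reusing the BIV, raising
   register pressure.  */
void
foo (short *a, short *b, short *c, unsigned int n)
{
  unsigned int i;

  for (i = 0; i < n; i++)
    c[i] = a[i] + b[i];
}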

The original approach at
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01484.html can fix the
issue, except that it conflicts with IV elimination.  It seems to me
impossible to reconcile the two.

This new approach fixes the issue by adding a sizetype iv_cand for BIVs
directly.  In cases where the original BIV is preferred, the sizetype
iv_cand will be chosen.  For code generation, the sizetype iv_cand has
the same effect as the original BIV.  Actually, it's better, because on
most targets the BIV needs to be explicitly extended to sizetype to be
used in an address expression.
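To illustrate (hypothetical code with invented names, taking unsigned
long as a stand-in for sizetype on an LP64 target): both loops below
perform the same accesses, and the rewrite is valid because the BIV
does not wrap, but only the narrow form needs a widening at every use.

/* Original 32-bit BIV: each address use first widens the index,
   i.e. the access is a + 4 * (sizetype) i.  */
void
clear_biv (int *a, unsigned int n)
{
  unsigned int i;

  for (i = 0; i != n; i++)
    a[i] = 0;
}

/* sizetype candidate: the extension disappears from the loop body and
   the access is a + 4 * i_sz directly.  */
void
clear_sizetype (int *a, unsigned int n)
{
  unsigned long i_sz;

  for (i_sz = 0; i_sz != n; i_sz++)
    a[i_sz] = 0;
}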

One shortcoming of this approach is that it may introduce more iv
candidates.  To minimize the impact, this patch does sophisticated code
analysis and adds a sizetype candidate for a BIV only if it is used as
an index.  Moreover, it avoids adding a candidate of the original type
if the BIV is only used as an index.  Statistics from compiling spec2k6
show that the increase in candidate number is modest and can be
ignored.

There are two more patches following to fix corner cases revealed by
this one.  Together they bring an obvious performance improvement for
spec2k6/int on aarch64.
Spec2k6/int
400.perlbench	3.44%
445.gobmk	-0.86%
456.hmmer	14.83%
458.sjeng	2.49%
462.libquantum	-0.79%
GEOMEAN         1.68%

There is also about a 0.36% improvement for spec2k6/fp, mostly because
of 436.cactusADM.  I believe it can be further improved, but that
should be another patch.

I also collected benchmark data for x86_64.  Spec2k6/fp is not
affected.  As for spec2k6/int, though the geomean is improved slightly,
400.perlbench is regressed by ~3%.  I can see that BIVs are chosen for
some loops instead of address candidates.  Generally, the loop header
is simplified because iv elimination with a BIV is simpler; the number
of instructions in the loop body isn't changed.  I suspect the
regression comes from different addressing modes.  With the BIV, a
complex addressing mode like [base + index << scale + disp] is used,
rather than [base + disp].  I guess the former takes more micro-ops and
is thus more expensive.  This guess can be confirmed by manually
suppressing the complex addressing mode with a higher address cost.

Now the problem becomes why the overall cost of the BIV is computed as
lower while the actual cost is higher.  I noticed that for most
affected loops, the loop header is bloated by iv elimination using the
old address candidate.  The bloated loop header results in a much
higher cost than the BIV, so the BIV is preferred.  I also noticed the
bloated loop header can generally be simplified (I have a following
patch for this).  After applying that local patch, the old address
candidate is chosen, and most of the regression is recovered.  My
conclusion is that the loop-header bloat should be blamed for the
regression, and it can be resolved.
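For illustration, a hypothetical pair of loops (invented, not taken
from 400.perlbench) showing the two code shapes on x86_64:

/* BIV retained: the load uses a scaled-index mode such as
   movq (%rdi,%rcx,8), %rdx, i.e. [base + index << scale + disp].  */
long
sum_biv (long *a, long n)
{
  long i, s = 0;

  for (i = 0; i < n; i++)
    s += a[i];
  return s;
}

/* Address candidate chosen: the pointer itself is incremented and a
   simple [base + disp] mode such as movq (%rax), %rdx is used.  */
long
sum_ptr (long *a, long n)
{
  long *p, *end = a + n;
  long s = 0;

  for (p = a; p != end; p++)
    s += *p;
  return s;
}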

Bootstrapped and tested on x86_64 and aarch64.  It fixes the failure of
gcc.target/i386/pr49781-1.c, without new breakage.

So what do you think?

Thanks,
bin

2015-08-31  Bin Cheng  <bin.cheng@arm.com>

	* tree-affine.c (aff_combination_expand): New parameters.
	(tree_to_aff_combination_expand): Ditto.
	* tree-affine.h (aff_combination_expand): New declaration.
	(tree_to_aff_combination_expand): Ditto.
	* tree-ssa-loop-ivopts.c (struct iv, iv_cand): New fields.
	(dump_iv): Dump no_overflow information.
	(alloc_iv): Initialize new field for struct iv.
	(struct expand_data): New struct for affine combination expanding.
	(stop_expand): New callback func for affine combination expanding.
	(find_deriving_biv_for_iv, record_biv_for_address_use): New
functions.
	(idx_find_step): Call new functions above.
	(find_depends, add_candidate): New parameter.
	(add_iv_candidate_for_biv): Add sizetype cand for BIV.
	(get_computation_aff): Simplify conversion of cand for BIV.
	(get_computation_cost_at): Step cand's base if necessary.

Comments

Richard Biener Sept. 2, 2015, 2:12 p.m. UTC | #1
On Wed, Sep 2, 2015 at 5:26 AM, Bin Cheng <bin.cheng@arm.com> wrote:
> Hi,
> This patch is a new approach to fixing PR66388.  IVO today computes an
> iv_use with an iv_cand that has at least the same type precision as the
> use.  On 64-bit platforms like AArch64, this results in a different
> iv_cand being created for each address type iv_use, and increased
> register pressure.  As a matter of fact, the BIV should be used for all
> iv_uses in some of these cases.  It is a latent bug that has recently
> been getting worse because of the overflow changes.
>
> The original approach at
> https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01484.html can fix the
> issue, except that it conflicts with IV elimination.  It seems to me
> impossible to reconcile the two.
>
> This new approach fixes the issue by adding a sizetype iv_cand for BIVs
> directly.  In cases where the original BIV is preferred, the sizetype
> iv_cand will be chosen.  For code generation, the sizetype iv_cand has
> the same effect as the original BIV.  Actually, it's better, because on
> most targets the BIV needs to be explicitly extended to sizetype to be
> used in an address expression.
>
> One shortcoming of this approach is that it may introduce more iv
> candidates.  To minimize the impact, this patch does sophisticated code
> analysis and adds a sizetype candidate for a BIV only if it is used as
> an index.  Moreover, it avoids adding a candidate of the original type
> if the BIV is only used as an index.  Statistics from compiling spec2k6
> show that the increase in candidate number is modest and can be
> ignored.
>
> There are two more patches following to fix corner cases revealed by
> this one.  Together they bring an obvious performance improvement for
> spec2k6/int on aarch64.
> Spec2k6/int
> 400.perlbench   3.44%
> 445.gobmk       -0.86%
> 456.hmmer       14.83%
> 458.sjeng       2.49%
> 462.libquantum  -0.79%
> GEOMEAN         1.68%
>
> There is also about a 0.36% improvement for spec2k6/fp, mostly because
> of 436.cactusADM.  I believe it can be further improved, but that
> should be another patch.
>
> I also collected benchmark data for x86_64.  Spec2k6/fp is not
> affected.  As for spec2k6/int, though the geomean is improved slightly,
> 400.perlbench is regressed by ~3%.  I can see that BIVs are chosen for
> some loops instead of address candidates.  Generally, the loop header
> is simplified because iv elimination with a BIV is simpler; the number
> of instructions in the loop body isn't changed.  I suspect the
> regression comes from different addressing modes.  With the BIV, a
> complex addressing mode like [base + index << scale + disp] is used,
> rather than [base + disp].  I guess the former takes more micro-ops and
> is thus more expensive.  This guess can be confirmed by manually
> suppressing the complex addressing mode with a higher address cost.
>
> Now the problem becomes why the overall cost of the BIV is computed as
> lower while the actual cost is higher.  I noticed that for most
> affected loops, the loop header is bloated by iv elimination using the
> old address candidate.  The bloated loop header results in a much
> higher cost than the BIV, so the BIV is preferred.  I also noticed the
> bloated loop header can generally be simplified (I have a following
> patch for this).  After applying that local patch, the old address
> candidate is chosen, and most of the regression is recovered.  My
> conclusion is that the loop-header bloat should be blamed for the
> regression, and it can be resolved.
>
> Bootstrapped and tested on x86_64 and aarch64.  It fixes the failure of
> gcc.target/i386/pr49781-1.c, without new breakage.
>
> So what do you think?

The data above looks ok to me.

+static struct iv *
+find_deriving_biv_for_iv (struct ivopts_data *data, struct iv *iv)
+{
+  aff_tree aff;
+  struct expand_data exp_data;
+
+  if (!iv->ssa_name || TREE_CODE (iv->ssa_name) != SSA_NAME)
+    return iv;
+
+  /* Expand IV's ssa_name till the deriving biv is found.  */
+  exp_data.data = data;
+  exp_data.biv = NULL;
+  tree_to_aff_combination_expand (iv->ssa_name, TREE_TYPE (iv->ssa_name),
+                                 &aff, &data->name_expansion_cache,
+                                 stop_expand, &exp_data);
+  return exp_data.biv;

that's actually "abusing" tree_to_aff_combination_expand for simply walking
SSA uses and their defs' uses recursively until you hit "stop".  ISTR past
discussion about adding a generic walk_ssa_use interface for that.  Not sure
if it materialized under a name I can't remember or whether it didn't.
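For illustration, such an interface might look roughly like the sketch
below (hypothetical: walk_ssa_use names no existing function, and the
sketch assumes the usual GCC tree-ssa internal environment):

/* Sketch of a generic def-chain walker.  Walk upward from NAME through
   defining assignments, calling VISIT on every SSA name seen, and stop
   as soon as VISIT returns true.  PHI definitions end the walk, which
   also guarantees termination.  */

static bool
walk_ssa_defs (tree name, bool (*visit) (tree, void *), void *data)
{
  gimple stmt;
  ssa_op_iter iter;
  tree op;

  if (TREE_CODE (name) != SSA_NAME)
    return false;
  if (visit (name, data))
    return true;

  stmt = SSA_NAME_DEF_STMT (name);
  if (!is_gimple_assign (stmt))
    return false;

  /* Recurse into the SSA uses of the defining statement.  */
  FOR_EACH_SSA_TREE_OPERAND (op, stmt, iter, SSA_OP_USE)
    if (walk_ssa_defs (op, visit, data))
      return true;

  return false;
}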

-  add_candidate (data, iv->base, iv->step, true, NULL);
+  /* Check if this biv is used in address type use.  */
+  if (iv->no_overflow  && iv->have_address_use
+      && INTEGRAL_TYPE_P (TREE_TYPE (iv->base))
+      && TYPE_PRECISION (TREE_TYPE (iv->base)) < TYPE_PRECISION (sizetype))
+    {
+      tree type = unsigned_type_for (sizetype);

sizetype is unsigned.

the rest looks ok to me but I really don't like the abuse of
tree_to_aff_combination_expand...

Thanks,
Richard.

> Thanks,
> bin
>
> 2015-08-31  Bin Cheng  <bin.cheng@arm.com>
>
>         * tree-affine.c (aff_combination_expand): New parameters.
>         (tree_to_aff_combination_expand): Ditto.
>         * tree-affine.h (aff_combination_expand): New declaration.
>         (tree_to_aff_combination_expand): Ditto.
>         * tree-ssa-loop-ivopts.c (struct iv, iv_cand): New fields.
>         (dump_iv): Dump no_overflow information.
>         (alloc_iv): Initialize new field for struct iv.
>         (struct expand_data): New struct for affine combination expanding.
>         (stop_expand): New callback func for affine combination expanding.
>         (find_deriving_biv_for_iv, record_biv_for_address_use): New
> functions.
>         (idx_find_step): Call new functions above.
>         (find_depends, add_candidate): New parameter.
>         (add_iv_candidate_for_biv): Add sizetype cand for BIV.
>         (get_computation_aff): Simplify conversion of cand for BIV.
>         (get_computation_cost_at): Step cand's base if necessary.
>
Bin.Cheng Sept. 8, 2015, 10:06 a.m. UTC | #2
On Wed, Sep 2, 2015 at 10:12 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Wed, Sep 2, 2015 at 5:26 AM, Bin Cheng <bin.cheng@arm.com> wrote:
>> Hi,
>> This patch is a new approach to fixing PR66388.  IVO today computes an
>> iv_use with an iv_cand that has at least the same type precision as the
>> use.  On 64-bit platforms like AArch64, this results in a different
>> iv_cand being created for each address type iv_use, and increased
>> register pressure.  As a matter of fact, the BIV should be used for all
>> iv_uses in some of these cases.  It is a latent bug that has recently
>> been getting worse because of the overflow changes.
>>
>> The original approach at
>> https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01484.html can fix the
>> issue, except that it conflicts with IV elimination.  It seems to me
>> impossible to reconcile the two.
>>
>> This new approach fixes the issue by adding a sizetype iv_cand for BIVs
>> directly.  In cases where the original BIV is preferred, the sizetype
>> iv_cand will be chosen.  For code generation, the sizetype iv_cand has
>> the same effect as the original BIV.  Actually, it's better, because on
>> most targets the BIV needs to be explicitly extended to sizetype to be
>> used in an address expression.
>>
>> One shortcoming of this approach is that it may introduce more iv
>> candidates.  To minimize the impact, this patch does sophisticated code
>> analysis and adds a sizetype candidate for a BIV only if it is used as
>> an index.  Moreover, it avoids adding a candidate of the original type
>> if the BIV is only used as an index.  Statistics from compiling spec2k6
>> show that the increase in candidate number is modest and can be
>> ignored.
>>
>> There are two more patches following to fix corner cases revealed by
>> this one.  Together they bring an obvious performance improvement for
>> spec2k6/int on aarch64.
>> Spec2k6/int
>> 400.perlbench   3.44%
>> 445.gobmk       -0.86%
>> 456.hmmer       14.83%
>> 458.sjeng       2.49%
>> 462.libquantum  -0.79%
>> GEOMEAN         1.68%
>>
>> There is also about a 0.36% improvement for spec2k6/fp, mostly because
>> of 436.cactusADM.  I believe it can be further improved, but that
>> should be another patch.
>>
>> I also collected benchmark data for x86_64.  Spec2k6/fp is not
>> affected.  As for spec2k6/int, though the geomean is improved slightly,
>> 400.perlbench is regressed by ~3%.  I can see that BIVs are chosen for
>> some loops instead of address candidates.  Generally, the loop header
>> is simplified because iv elimination with a BIV is simpler; the number
>> of instructions in the loop body isn't changed.  I suspect the
>> regression comes from different addressing modes.  With the BIV, a
>> complex addressing mode like [base + index << scale + disp] is used,
>> rather than [base + disp].  I guess the former takes more micro-ops and
>> is thus more expensive.  This guess can be confirmed by manually
>> suppressing the complex addressing mode with a higher address cost.
>>
>> Now the problem becomes why the overall cost of the BIV is computed as
>> lower while the actual cost is higher.  I noticed that for most
>> affected loops, the loop header is bloated by iv elimination using the
>> old address candidate.  The bloated loop header results in a much
>> higher cost than the BIV, so the BIV is preferred.  I also noticed the
>> bloated loop header can generally be simplified (I have a following
>> patch for this).  After applying that local patch, the old address
>> candidate is chosen, and most of the regression is recovered.  My
>> conclusion is that the loop-header bloat should be blamed for the
>> regression, and it can be resolved.
>>
>> Bootstrapped and tested on x86_64 and aarch64.  It fixes the failure of
>> gcc.target/i386/pr49781-1.c, without new breakage.
>>
>> So what do you think?
>
> The data above looks ok to me.
>
> +static struct iv *
> +find_deriving_biv_for_iv (struct ivopts_data *data, struct iv *iv)
> +{
> +  aff_tree aff;
> +  struct expand_data exp_data;
> +
> +  if (!iv->ssa_name || TREE_CODE (iv->ssa_name) != SSA_NAME)
> +    return iv;
> +
> +  /* Expand IV's ssa_name till the deriving biv is found.  */
> +  exp_data.data = data;
> +  exp_data.biv = NULL;
> +  tree_to_aff_combination_expand (iv->ssa_name, TREE_TYPE (iv->ssa_name),
> +                                 &aff, &data->name_expansion_cache,
> +                                 stop_expand, &exp_data);
> +  return exp_data.biv;
>
> that's actually "abusing" tree_to_aff_combination_expand for simply walking
> SSA uses and their defs' uses recursively until you hit "stop".  ISTR past
> discussion about adding a generic walk_ssa_use interface for that.  Not sure
> if it materialized under a name I can't remember or whether it didn't.
Thanks for reviewing.  I didn't find an existing interface to walk up
the definition chains of SSA vars.  In this updated patch, I
implemented a simple function which meets the minimal requirement of
walking up the definition chains of BIV variables.  I also counted the
number of no_overflow BIVs that are not used in address type uses.
Since generally there are only two BIVs in a loop, this can prevent us
from visiting definition chains most of the time.  Statistics show that
the number of calls to find_deriving_biv_for_expr plummets.
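A minimal sketch of the shape such a walker can take (hypothetical;
the real find_deriving_biv_for_expr in the updated patch differs in
detail and assumes the surrounding context of tree-ssa-loop-ivopts.c):

/* Walk up the definition chain of EXPR through conversions and simple
   arithmetic until a biv is found; return NULL if there is none.  */

static struct iv *
find_deriving_biv_sketch (struct ivopts_data *data, tree expr)
{
  gimple stmt;
  struct iv *iv, *biv;

  if (TREE_CODE (expr) != SSA_NAME)
    return NULL;

  iv = get_iv (data, expr);
  if (iv == NULL || integer_zerop (iv->step))
    return NULL;
  if (iv->biv_p)
    return iv;

  stmt = SSA_NAME_DEF_STMT (expr);
  if (!is_gimple_assign (stmt))
    return NULL;

  switch (gimple_assign_rhs_code (stmt))
    {
    CASE_CONVERT:
    case PLUS_EXPR:
    case MINUS_EXPR:
    case POINTER_PLUS_EXPR:
      /* Try the first operand; for binary codes also the second.  */
      biv = find_deriving_biv_sketch (data, gimple_assign_rhs1 (stmt));
      if (biv == NULL && gimple_num_ops (stmt) > 2)
	biv = find_deriving_biv_sketch (data, gimple_assign_rhs2 (stmt));
      return biv;

    default:
      return NULL;
    }
}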

>
> -  add_candidate (data, iv->base, iv->step, true, NULL);
> +  /* Check if this biv is used in address type use.  */
> +  if (iv->no_overflow  && iv->have_address_use
> +      && INTEGRAL_TYPE_P (TREE_TYPE (iv->base))
> +      && TYPE_PRECISION (TREE_TYPE (iv->base)) < TYPE_PRECISION (sizetype))
> +    {
> +      tree type = unsigned_type_for (sizetype);
>
> sizetype is unsigned.
Fixed.

Bootstrapped and tested on x86_64; is this OK?

Thanks,
bin
>
> the rest looks ok to me but I really don't like the abuse of
> tree_to_aff_combination_expand...
>
> Thanks,
> Richard.
>
>> Thanks,
>> bin
>>
>> 2015-08-31  Bin Cheng  <bin.cheng@arm.com>
>>
>>         * tree-affine.c (aff_combination_expand): New parameters.
>>         (tree_to_aff_combination_expand): Ditto.
>>         * tree-affine.h (aff_combination_expand): New declaration.
>>         (tree_to_aff_combination_expand): Ditto.
>>         * tree-ssa-loop-ivopts.c (struct iv, iv_cand): New fields.
>>         (dump_iv): Dump no_overflow information.
>>         (alloc_iv): Initialize new field for struct iv.
>>         (struct expand_data): New struct for affine combination expanding.
>>         (stop_expand): New callback func for affine combination expanding.
>>         (find_deriving_biv_for_iv, record_biv_for_address_use): New
>> functions.
>>         (idx_find_step): Call new functions above.
>>         (find_depends, add_candidate): New parameter.
>>         (add_iv_candidate_for_biv): Add sizetype cand for BIV.
>>         (get_computation_aff): Simplify conversion of cand for BIV.
>>         (get_computation_cost_at): Step cand's base if necessary.
>>
Bin.Cheng Sept. 8, 2015, 10:07 a.m. UTC | #3
On Tue, Sep 8, 2015 at 6:06 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Wed, Sep 2, 2015 at 10:12 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Wed, Sep 2, 2015 at 5:26 AM, Bin Cheng <bin.cheng@arm.com> wrote:
>>> Hi,
>>> This patch is a new approach to fixing PR66388.  IVO today computes an
>>> iv_use with an iv_cand that has at least the same type precision as the
>>> use.  On 64-bit platforms like AArch64, this results in a different
>>> iv_cand being created for each address type iv_use, and increased
>>> register pressure.  As a matter of fact, the BIV should be used for all
>>> iv_uses in some of these cases.  It is a latent bug that has recently
>>> been getting worse because of the overflow changes.
>>>
>>> The original approach at
>>> https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01484.html can fix the
>>> issue, except that it conflicts with IV elimination.  It seems to me
>>> impossible to reconcile the two.
>>>
>>> This new approach fixes the issue by adding a sizetype iv_cand for BIVs
>>> directly.  In cases where the original BIV is preferred, the sizetype
>>> iv_cand will be chosen.  For code generation, the sizetype iv_cand has
>>> the same effect as the original BIV.  Actually, it's better, because on
>>> most targets the BIV needs to be explicitly extended to sizetype to be
>>> used in an address expression.
>>>
>>> One shortcoming of this approach is that it may introduce more iv
>>> candidates.  To minimize the impact, this patch does sophisticated code
>>> analysis and adds a sizetype candidate for a BIV only if it is used as
>>> an index.  Moreover, it avoids adding a candidate of the original type
>>> if the BIV is only used as an index.  Statistics from compiling spec2k6
>>> show that the increase in candidate number is modest and can be
>>> ignored.
>>>
>>> There are two more patches following to fix corner cases revealed by
>>> this one.  Together they bring an obvious performance improvement for
>>> spec2k6/int on aarch64.
>>> Spec2k6/int
>>> 400.perlbench   3.44%
>>> 445.gobmk       -0.86%
>>> 456.hmmer       14.83%
>>> 458.sjeng       2.49%
>>> 462.libquantum  -0.79%
>>> GEOMEAN         1.68%
>>>
>>> There is also about a 0.36% improvement for spec2k6/fp, mostly because
>>> of 436.cactusADM.  I believe it can be further improved, but that
>>> should be another patch.
>>>
>>> I also collected benchmark data for x86_64.  Spec2k6/fp is not
>>> affected.  As for spec2k6/int, though the geomean is improved slightly,
>>> 400.perlbench is regressed by ~3%.  I can see that BIVs are chosen for
>>> some loops instead of address candidates.  Generally, the loop header
>>> is simplified because iv elimination with a BIV is simpler; the number
>>> of instructions in the loop body isn't changed.  I suspect the
>>> regression comes from different addressing modes.  With the BIV, a
>>> complex addressing mode like [base + index << scale + disp] is used,
>>> rather than [base + disp].  I guess the former takes more micro-ops and
>>> is thus more expensive.  This guess can be confirmed by manually
>>> suppressing the complex addressing mode with a higher address cost.
>>>
>>> Now the problem becomes why the overall cost of the BIV is computed as
>>> lower while the actual cost is higher.  I noticed that for most
>>> affected loops, the loop header is bloated by iv elimination using the
>>> old address candidate.  The bloated loop header results in a much
>>> higher cost than the BIV, so the BIV is preferred.  I also noticed the
>>> bloated loop header can generally be simplified (I have a following
>>> patch for this).  After applying that local patch, the old address
>>> candidate is chosen, and most of the regression is recovered.  My
>>> conclusion is that the loop-header bloat should be blamed for the
>>> regression, and it can be resolved.
>>>
>>> Bootstrapped and tested on x86_64 and aarch64.  It fixes the failure of
>>> gcc.target/i386/pr49781-1.c, without new breakage.
>>>
>>> So what do you think?
>>
>> The data above looks ok to me.
>>
>> +static struct iv *
>> +find_deriving_biv_for_iv (struct ivopts_data *data, struct iv *iv)
>> +{
>> +  aff_tree aff;
>> +  struct expand_data exp_data;
>> +
>> +  if (!iv->ssa_name || TREE_CODE (iv->ssa_name) != SSA_NAME)
>> +    return iv;
>> +
>> +  /* Expand IV's ssa_name till the deriving biv is found.  */
>> +  exp_data.data = data;
>> +  exp_data.biv = NULL;
>> +  tree_to_aff_combination_expand (iv->ssa_name, TREE_TYPE (iv->ssa_name),
>> +                                 &aff, &data->name_expansion_cache,
>> +                                 stop_expand, &exp_data);
>> +  return exp_data.biv;
>>
>> that's actually "abusing" tree_to_aff_combination_expand for simply walking
>> SSA uses and their defs' uses recursively until you hit "stop".  ISTR past
>> discussion about adding a generic walk_ssa_use interface for that.  Not sure
>> if it materialized under a name I can't remember or whether it didn't.
> Thanks for reviewing.  I didn't find an existing interface to walk up
> the definition chains of SSA vars.  In this updated patch, I
> implemented a simple function which meets the minimal requirement of
> walking up the definition chains of BIV variables.  I also counted the
> number of no_overflow BIVs that are not used in address type uses.
> Since generally there are only two BIVs in a loop, this can prevent us
> from visiting definition chains most of the time.  Statistics show that
> the number of calls to find_deriving_biv_for_expr plummets.
>
>>
>> -  add_candidate (data, iv->base, iv->step, true, NULL);
>> +  /* Check if this biv is used in address type use.  */
>> +  if (iv->no_overflow  && iv->have_address_use
>> +      && INTEGRAL_TYPE_P (TREE_TYPE (iv->base))
>> +      && TYPE_PRECISION (TREE_TYPE (iv->base)) < TYPE_PRECISION (sizetype))
>> +    {
>> +      tree type = unsigned_type_for (sizetype);
>>
>> sizetype is unsigned.
> Fixed.
>
> Bootstrapped and tested on x86_64; is this OK?
>
> Thanks,
> bin

And here is the updated ChangeLog entry.


2015-09-08  Bin Cheng  <bin.cheng@arm.com>

    PR tree-optimization/66388
    * tree-ssa-loop-ivopts.c (struct iv, iv_cand, ivopts_data): New
    fields.
    (dump_iv): Dump no_overflow information.
    (alloc_iv): Initialize new field for struct iv.
    (mark_bivs): Count number of no_overflow bivs.
    (find_deriving_biv_for_expr, record_biv_for_address_use): New
    functions.
    (idx_find_step): Call new functions above.
    (add_candidate_1, add_candidate): New parameter.
    (add_iv_candidate_for_biv): Add sizetype cand for BIV.
    (get_computation_aff): Simplify conversion of cand for BIV.
    (get_computation_cost_at): Step cand's base if necessary.

Patch

Index: gcc/tree-affine.c
===================================================================
--- gcc/tree-affine.c	(revision 227163)
+++ gcc/tree-affine.c	(working copy)
@@ -625,11 +656,14 @@  struct name_expansion
 };
 
 /* Expands SSA names in COMB recursively.  CACHE is used to cache the
-   results.  */
+   results.  If callback function STOP is specified, this function stops
+   expanding when it returns TRUE.  DATA points to private structure
+   used by the callback function.  */
 
 void
 aff_combination_expand (aff_tree *comb ATTRIBUTE_UNUSED,
-			hash_map<tree, name_expansion *> **cache)
+			hash_map<tree, name_expansion *> **cache,
+			bool (*stop) (void *, void *), void *data)
 {
   unsigned i;
   aff_tree to_add, current, curre;
@@ -654,6 +688,11 @@  aff_combination_expand (aff_tree *comb ATTRIBUTE_U
 	name = TREE_OPERAND (e, 0);
       if (TREE_CODE (name) != SSA_NAME)
 	continue;
+
+      /* Don't expand further if STOP returns TRUE.  */
+      if (stop != NULL && (*stop) (data, name))
+	continue;
+
       def = SSA_NAME_DEF_STMT (name);
       if (!is_gimple_assign (def) || gimple_assign_lhs (def) != name)
 	continue;
@@ -702,7 +741,8 @@  aff_combination_expand (aff_tree *comb ATTRIBUTE_U
 	      if (e != name)
 		rhs = fold_convert (type, rhs);
 	    }
-	  tree_to_aff_combination_expand (rhs, comb->type, &current, cache);
+	  tree_to_aff_combination_expand (rhs, comb->type,
+					  &current, cache, stop, data);
 	  exp->expansion = current;
 	  exp->in_progress = 0;
 	}
@@ -735,14 +775,19 @@  aff_combination_expand (aff_tree *comb ATTRIBUTE_U
    a1 = a0 + a0;
    a2 = a1 + a1;
    a3 = a2 + a2;
-   ...  */
+   ...
+
+   If callback function STOP is specified, this function stops expanding
+   when it returns TRUE.  DATA points to private structure used by the
+   callback function.  */
 
 void
 tree_to_aff_combination_expand (tree expr, tree type, aff_tree *comb,
-				hash_map<tree, name_expansion *> **cache)
+				hash_map<tree, name_expansion *> **cache,
+				bool (*stop) (void *, void *), void *data)
 {
   tree_to_aff_combination (expr, type, comb);
-  aff_combination_expand (comb, cache);
+  aff_combination_expand (comb, cache, stop, data);
 }
 
 /* Frees memory occupied by struct name_expansion in *VALUE.  Callback for
Index: gcc/tree-affine.h
===================================================================
--- gcc/tree-affine.h	(revision 227163)
+++ gcc/tree-affine.h	(working copy)
@@ -77,9 +77,12 @@  void tree_to_aff_combination (tree, tree, aff_tree
 tree aff_combination_to_tree (aff_tree *);
 void unshare_aff_combination (aff_tree *);
 bool aff_combination_constant_multiple_p (aff_tree *, aff_tree *, widest_int *);
-void aff_combination_expand (aff_tree *, hash_map<tree, name_expansion *> **);
+void aff_combination_expand (aff_tree *, hash_map<tree, name_expansion *> **,
+			     bool (*)(void *, void *) = NULL, void * = NULL);
 void tree_to_aff_combination_expand (tree, tree, aff_tree *,
-				     hash_map<tree, name_expansion *> **);
+				     hash_map<tree, name_expansion *> **,
+				     bool (*)(void *, void *) = NULL,
+				     void * = NULL);
 tree get_inner_reference_aff (tree, aff_tree *, widest_int *);
 void free_affine_expand_cache (hash_map<tree, name_expansion *> **);
 bool aff_comb_cannot_overlap_p (aff_tree *, const widest_int &,
Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 227163)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -147,6 +147,8 @@  struct iv
   bool biv_p;		/* Is it a biv?  */
   bool have_use_for;	/* Do we already have a use for it?  */
   bool no_overflow;	/* True if the iv doesn't overflow.  */
+  bool have_address_use;/* For biv, indicates if it's used in any address
+			   type use.  */
 };
 
 /* Per-ssa version information (induction variable descriptions, etc.).  */
@@ -251,6 +253,8 @@  struct iv_cand
 			      where it is incremented.  */
   bitmap depends_on;	/* The list of invariants that are used in step of the
 			   biv.  */
+  struct iv *orig_iv;	/* The original iv if this cand is added from biv with
+			   smaller type.  */
 };
 
 /* Loop invariant expression hashtable entry.  */
@@ -529,6 +533,9 @@  dump_iv (FILE *file, struct iv *iv, bool dump_name
 
   if (iv->biv_p)
     fprintf (file, "  is a biv\n");
+
+  if (iv->no_overflow)
+    fprintf (file, "  iv doesn't overflow w.r.t. loop niter\n");
 }
 
 /* Dumps information about the USE to FILE.  */
@@ -1013,6 +1020,7 @@  alloc_iv (struct ivopts_data *data, tree base, tre
   iv->use_id = 0;
   iv->ssa_name = NULL_TREE;
   iv->no_overflow = no_overflow;
+  iv->have_address_use = false;
 
   return iv;
 }
@@ -1621,6 +1629,106 @@  expr_invariant_in_loop_p (struct loop *loop, tree
   return true;
 }
 
+/* Local data for affine combination expanding.  */
+
+struct expand_data
+{
+  struct ivopts_data *data;
+  struct iv *biv;
+};
+
+/* Callback function to tree_to_aff_combination_expand.
+
+   Return true if a biv or loop invariant is encountered, false otherwise.  */
+
+static bool
+stop_expand (void *d1, void *d2)
+{
+  struct expand_data *exp_data = (struct expand_data *)d1;
+  tree ssa_name = (tree)d2;
+  struct ivopts_data *data;
+  struct iv *iv;
+
+  if (!exp_data || !ssa_name || TREE_CODE (ssa_name) != SSA_NAME)
+    return false;
+
+  data = exp_data->data;
+  gcc_assert (data != NULL);
+  iv = get_iv (data, ssa_name);
+  if (!iv || integer_zerop (iv->step))
+    return true;
+
+  if (iv->biv_p)
+    {
+      exp_data->biv = iv;
+      return true;
+    }
+
+  return false;
+}
+
+/* Return biv from which the IV is derived.  */
+
+static struct iv *
+find_deriving_biv_for_iv (struct ivopts_data *data, struct iv *iv)
+{
+  aff_tree aff;
+  struct expand_data exp_data;
+
+  if (!iv->ssa_name || TREE_CODE (iv->ssa_name) != SSA_NAME)
+    return iv;
+
+  /* Expand IV's ssa_name till the deriving biv is found.  */
+  exp_data.data = data;
+  exp_data.biv = NULL;
+  tree_to_aff_combination_expand (iv->ssa_name, TREE_TYPE (iv->ssa_name),
+				  &aff, &data->name_expansion_cache,
+				  stop_expand, &exp_data);
+  return exp_data.biv;
+}
+
+/* Record that BIV and any biv that is its predecessor or successor are
+   used in address type uses.  */
+
+static void
+record_biv_for_address_use (struct ivopts_data *data, struct iv *biv)
+{
+  unsigned i;
+  tree type, base_1, base_2;
+  bitmap_iterator bi;
+
+  if (!biv || !biv->biv_p || integer_zerop (biv->step)
+      || biv->have_address_use || !biv->no_overflow)
+    return;
+
+  type = TREE_TYPE (biv->base);
+  if (!INTEGRAL_TYPE_P (type))
+    return;
+
+  biv->have_address_use = true;
+  base_1 = fold_build2 (PLUS_EXPR, type, biv->base, biv->step);
+  EXECUTE_IF_SET_IN_BITMAP (data->relevant, 0, i, bi)
+    {
+      struct iv *iv = ver_info (data, i)->iv;
+
+      if (!iv || !iv->biv_p || integer_zerop (iv->step)
+	  || iv->have_address_use || !iv->no_overflow)
+	continue;
+
+      if (type != TREE_TYPE (iv->base)
+	  || !INTEGRAL_TYPE_P (TREE_TYPE (iv->base)))
+	continue;
+
+      if (!operand_equal_p (biv->step, iv->step, 0))
+	continue;
+
+      base_2 = fold_build2 (PLUS_EXPR, type, iv->base, iv->step);
+      if (operand_equal_p (base_1, iv->base, 0)
+	  || operand_equal_p (base_2, biv->base, 0))
+	iv->have_address_use = true;
+    }
+}
+
 /* Cumulates the steps of indices into DATA and replaces their values with the
    initial ones.  Returns false when the value of the index cannot be determined.
    Callback for for_each_index.  */
@@ -1711,6 +1819,10 @@  idx_find_step (tree base, tree *idx, void *data)
   step = fold_build2 (MULT_EXPR, sizetype, step, iv_step);
   dta->step = fold_build2 (PLUS_EXPR, sizetype, dta->step, step);
 
+  if (!iv->biv_p)
+    iv = find_deriving_biv_for_iv (dta->ivopts_data, iv);
+
+  record_biv_for_address_use (dta->ivopts_data, iv);
   return true;
 }
 
@@ -2603,7 +2715,8 @@  find_depends (tree *expr_p, int *ws ATTRIBUTE_UNUS
 static struct iv_cand *
 add_candidate_1 (struct ivopts_data *data,
 		 tree base, tree step, bool important, enum iv_position pos,
-		 struct iv_use *use, gimple incremented_at)
+		 struct iv_use *use, gimple incremented_at,
+		 struct iv *orig_iv = NULL)
 {
   unsigned i;
   struct iv_cand *cand = NULL;
@@ -2685,6 +2798,7 @@  add_candidate_1 (struct ivopts_data *data,
       else
 	cand->ainc_use = NULL;
 
+      cand->orig_iv = orig_iv;
       if (dump_file && (dump_flags & TDF_DETAILS))
 	dump_cand (dump_file, cand);
     }
@@ -2793,15 +2907,17 @@  add_autoinc_candidates (struct ivopts_data *data,
 
 static void
 add_candidate (struct ivopts_data *data,
-	       tree base, tree step, bool important, struct iv_use *use)
+	       tree base, tree step, bool important, struct iv_use *use,
+	       struct iv *orig_iv = NULL)
 {
   gcc_assert (use == NULL || use->sub_id == 0);
 
   if (ip_normal_pos (data->current_loop))
-    add_candidate_1 (data, base, step, important, IP_NORMAL, use, NULL);
+    add_candidate_1 (data, base, step, important,
+		     IP_NORMAL, use, NULL, orig_iv);
   if (ip_end_pos (data->current_loop)
       && allow_ip_end_pos_p (data->current_loop))
-    add_candidate_1 (data, base, step, important, IP_END, use, NULL);
+    add_candidate_1 (data, base, step, important, IP_END, use, NULL, orig_iv);
 }
 
 /* Adds standard iv candidates.  */
@@ -2836,8 +2952,24 @@  add_iv_candidate_for_biv (struct ivopts_data *data
   tree def;
   struct iv_cand *cand;
 
-  add_candidate (data, iv->base, iv->step, true, NULL);
+  /* Check if this biv is used in address type use.  */
+  if (iv->no_overflow  && iv->have_address_use
+      && INTEGRAL_TYPE_P (TREE_TYPE (iv->base))
+      && TYPE_PRECISION (TREE_TYPE (iv->base)) < TYPE_PRECISION (sizetype))
+    {
+      tree type = unsigned_type_for (sizetype);
+      tree base = fold_convert (type, iv->base);
+      tree step = fold_convert (type, iv->step);
 
+      /* Add iv cand of same precision as index part in TARGET_MEM_REF.  */
+      add_candidate (data, base, step, true, NULL, iv);
+      /* Add iv cand of the original type only if it has nonlinear use.  */
+      if (iv->have_use_for)
+	add_candidate (data, iv->base, iv->step, true, NULL);
+    }
+  else
+    add_candidate (data, iv->base, iv->step, true, NULL);
+
   /* The same, but with initial value zero.  */
   if (POINTER_TYPE_P (TREE_TYPE (iv->base)))
     add_candidate (data, size_int (0), iv->step, true, NULL);
@@ -3358,6 +3490,28 @@  get_computation_aff (struct loop *loop,
   /* If the conversion is not noop, perform it.  */
   if (TYPE_PRECISION (utype) < TYPE_PRECISION (ctype))
     {
+      if (cand->orig_iv != NULL && CONVERT_EXPR_P (cbase)
+	  && (CONVERT_EXPR_P (cstep) || TREE_CODE (cstep) == INTEGER_CST))
+	{
+	  tree inner_base, inner_step, inner_type;
+	  inner_base = TREE_OPERAND (cbase, 0);
+	  if (CONVERT_EXPR_P (cstep))
+	    inner_step = TREE_OPERAND (cstep, 0);
+	  else
+	    inner_step = cstep;
+
+	  inner_type = TREE_TYPE (inner_base);
+	  /* If candidate is added from a biv whose type is smaller than
+	     ctype, we know both candidate and the biv won't overflow.
+	     In this case, it's safe to skip the conversion in candidate.
+	     As an example, (unsigned short)((unsigned long)A) equals
+	     (unsigned short)A, if A has a type no larger than short.  */
+	  if (TYPE_PRECISION (inner_type) <= TYPE_PRECISION (uutype))
+	    {
+	      cbase = inner_base;
+	      cstep = inner_step;
+	    }
+	}
       cstep = fold_convert (uutype, cstep);
       cbase = fold_convert (uutype, cbase);
       var = fold_convert (uutype, var);
@@ -4525,6 +4681,13 @@  get_computation_cost_at (struct ivopts_data *data,
 		(ratio, mem_mode,
 			TYPE_ADDR_SPACE (TREE_TYPE (utype))))
     {
+      if (cstepi == 0 && stmt_is_after_inc)
+	{
+	  if (POINTER_TYPE_P (ctype))
+	    cbase = fold_build2 (POINTER_PLUS_EXPR, ctype, cbase, cstep);
+	  else
+	    cbase = fold_build2 (PLUS_EXPR, ctype, cbase, cstep);
+	}
       cbase
 	= fold_build2 (MULT_EXPR, ctype, cbase, build_int_cst (ctype, ratio));
       cost = difference_cost (data,