
PATCH: Add vzeroupper optimization for AVX

Message ID 20101025085724.GA17893@intel.com
State New

Commit Message

H.J. Lu Oct. 25, 2010, 8:57 a.m. UTC
Hi,

This patch adds the vzeroupper optimization for AVX, which is very
important for 256bit AVX instructions; otherwise the AVX-SSE transition
penalty may kill 256bit AVX vector performance. I am enclosing the
improvement from the vzeroupper optimization on SPEC CPU 2K/2006 when
the 256bit AVX vectorizer is enabled. The data shows that tonto is
improved by 59%, wrf by 25%, sphinx3 by 19%, GemsFDTD by 11% and
gamess by 11%.

At RTL expansion time, the vzeroupper optimization generates a
vzeroupper_nop before each function call and function return if 256bit
AVX instructions are used. The vzeroupper pass is run before the final pass.
It scans all reachable blocks:

1. Remove vzeroupper_nop when:
    a. The upper 128bits of the AVX registers are known dead.
    b. The upper 128bits of the AVX registers are live and used for
    parameter passing. We need to know whether the callee returns a
    256bit AVX register to decide whether the upper 128bits of the
    AVX registers are live after the callee returns.
2. Move vzeroupper_nop to right before the function call/return. This
is needed since various passes may move 256bit vector instructions
across vzeroupper_nop, and I can't find a way to describe
vzeroupper_nop other than as an UNSPECV.  I can't say it clobbers all
256bit AVX registers since that isn't true.  I can't describe it as
clearing the upper 128bits of all AVX registers since the register
allocator would then try to allocate all AVX registers for
vzeroupper_nop.

OK for trunk?

Thanks.

H.J.
---
gcc/

2010-10-22  H.J. Lu  <hongjiu.lu@intel.com>

	* config/i386/i386-protos.h (init_cumulative_args): Add an int.

	* config/i386/i386.c: Include "tree-pass.h".
	(block_info): New.
	(BLOCK_INFO): Likewise.
	(RTX_VZEROUPPER_CALLEE_RETURN_AVX256): Likewise.
	(RTX_VZEROUPPER_CALLEE_RETURN_PASS_AVX256): Likewise.
	(RTX_VZEROUPPER_CALLEE_PASS_AVX256): Likewise.
	(RTX_VZEROUPPER_NO_AVX256): Likewise.
	(check_avx256_stores): Likewise.
	(move_or_delete_vzeroupper_2): Likewise.
	(move_or_delete_vzeroupper_1): Likewise.
	(move_or_delete_vzeroupper): Likewise.
	(rest_of_handle_vzeroupper): Likewise.
	(gate_handle_vzeroupper): Likewise.
	(pass_vzeroupper): Likewise.
	(use_avx256_p): Likewise.
	(function_pass_avx256_p): Likewise.
	(flag_opts): Add -mvzeroupper.
	(ix86_option_override_internal): Turn on MASK_VZEROUPPER by
	default for TARGET_AVX.  Turn off MASK_VZEROUPPER if TARGET_AVX
	is disabled.  Register pass_vzeroupper for TARGET_VZEROUPPER.
	(ix86_function_ok_for_sibcall): Disable sibcall if we need to
	generate vzeroupper.
	(init_cumulative_args): Add an int to indicate caller.  Set
	use_avx256_p, callee_return_avx256_p and caller_use_avx256_p
	based on return type.
	(ix86_function_arg): Set use_avx256_p, callee_pass_avx256_p and
	caller_pass_avx256_p based on argument type.
	(ix86_expand_epilogue): Emit vzeroupper if 256bit AVX register
	is used, but not returned by caller.
	(ix86_expand_call): Emit vzeroupper if 256bit AVX register is
	used.
	(ix86_local_alignment): Set use_avx256_p if 256bit AVX register
	is used.
	(ix86_minimum_alignment): Likewise.

	* config/i386/i386.h (ix86_args): Add caller.
	(INIT_CUMULATIVE_ARGS): Updated.
	(machine_function): Add use_vzeroupper_p, use_avx256_p,
	caller_pass_avx256_p, caller_return_avx256_p,
	callee_pass_avx256_p and callee_return_avx256_p.

	* config/i386/i386.md (UNSPECV_VZEROUPPER_NOP): New.
	* config/i386/sse.md (avx_vzeroupper_nop): Likewise.

	* config/i386/i386.opt (-mvzeroupper): New.

	* doc/invoke.texi: Document -mvzeroupper.

	* timevar.def (TV_VZEROUPPER): New.

gcc/testsuite/

2010-10-22  H.J. Lu  <hongjiu.lu@intel.com>

	* gcc.target/i386/avx-vzeroupper-1.c: Add -mtune=generic.
	* gcc.target/i386/avx-vzeroupper-2.c: Likewise.

	* gcc.target/i386/avx-vzeroupper-3.c: New.
	* gcc.target/i386/avx-vzeroupper-4.c: Likewise.
	* gcc.target/i386/avx-vzeroupper-5.c: Likewise.
	* gcc.target/i386/avx-vzeroupper-6.c: Likewise.
	* gcc.target/i386/avx-vzeroupper-7.c: Likewise.
	* gcc.target/i386/avx-vzeroupper-8.c: Likewise.
	* gcc.target/i386/avx-vzeroupper-9.c: Likewise.
	* gcc.target/i386/avx-vzeroupper-10.c: Likewise.
	* gcc.target/i386/avx-vzeroupper-11.c: Likewise.
	* gcc.target/i386/avx-vzeroupper-12.c: Likewise.
	* gcc.target/i386/avx-vzeroupper-13.c: Likewise.
	* gcc.target/i386/avx-vzeroupper-14.c: Likewise.

Comments

Jakub Jelinek Oct. 25, 2010, 11:38 a.m. UTC | #1
On Mon, Oct 25, 2010 at 01:57:24AM -0700, H.J. Lu wrote:
> At RTL expansion time, the vzeroupper optimization generates a
> vzeroupper_nop before function call and functin return if 256bit AVX
> instructions are used. The vzeroupper pass is run before final pass.

Can't you run it at the end of machine_reorg instead?

	Jakub
H.J. Lu Oct. 25, 2010, 11:52 a.m. UTC | #2
On Mon, Oct 25, 2010 at 4:38 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Mon, Oct 25, 2010 at 01:57:24AM -0700, H.J. Lu wrote:
>> At RTL expansion time, the vzeroupper optimization generates a
>> vzeroupper_nop before function call and functin return if 256bit AVX
>> instructions are used. The vzeroupper pass is run before final pass.
>
> Can't you run it at the end of machine_reorg instead?
>

I tried it at different places, but probably not at the end of machine_reorg.
The main issue is that unspec_volatile doesn't guarantee that a pass won't
move instructions across it:

http://gcc.gnu.org/ml/gcc/2010-05/msg00653.html

I want to avoid any potential problems. That is why I put it before the
final pass.
Andi Kleen Oct. 25, 2010, 1:33 p.m. UTC | #3
"H.J. Lu" <hongjiu.lu@intel.com> writes:
>
> At RTL expansion time, the vzeroupper optimization generates a
> vzeroupper_nop before function call and functin return if 256bit AVX
> instructions are used. The vzeroupper pass is run before final pass.
> It scans all reachable blocks:
>
> 1. Remove vzeroupper_nop when:
>     a. The upper 128bits of AVX regiters are known dead.

Is dead the correct term here? It would seem to need "unused" 

-Andi
H.J. Lu Oct. 25, 2010, 5:03 p.m. UTC | #4
On Mon, Oct 25, 2010 at 6:33 AM, Andi Kleen <andi@firstfloor.org> wrote:
> "H.J. Lu" <hongjiu.lu@intel.com> writes:
>>
>> At RTL expansion time, the vzeroupper optimization generates a
>> vzeroupper_nop before function call and functin return if 256bit AVX
>> instructions are used. The vzeroupper pass is run before final pass.
>> It scans all reachable blocks:
>>
>> 1. Remove vzeroupper_nop when:
>>     a. The upper 128bits of AVX regiters are known dead.
>
> Is dead the correct term here? It would seem to need "unused"
>

It has to be cleared to zero, not unused. Maybe I should use zero.

Patch

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 9c10103..02c2a90 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -177,7 +177,7 @@  extern void ix86_expand_trunc (rtx, rtx);
 extern void ix86_expand_truncdf_32 (rtx, rtx);
 
 #ifdef TREE_CODE
-extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree);
+extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree, int);
 #endif	/* TREE_CODE  */
 
 #endif	/* RTX_CODE  */
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 199550d..afc8e08 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -56,6 +56,305 @@  along with GCC; see the file COPYING3.  If not see
 #include "debug.h"
 #include "dwarf2out.h"
 #include "sched-int.h"
+#include "tree-pass.h"
+
+typedef struct block_info_def
+{
+  /* TRUE if the upper 128bits of any AVX registers are live at exit.  */
+  bool upper_128bits_live;
+  /* TRUE if block has been processed.  */
+  bool done;
+} *block_info;
+
+#define BLOCK_INFO(B)   ((block_info) (B)->aux)
+
+/* Callee returns 256bit AVX register.  */
+#define RTX_VZEROUPPER_CALLEE_RETURN_AVX256		const1_rtx
+/* Callee returns and passes 256bit AVX register.  */
+#define RTX_VZEROUPPER_CALLEE_RETURN_PASS_AVX256	constm1_rtx
+/* Callee passes 256bit AVX register.  */
+#define RTX_VZEROUPPER_CALLEE_PASS_AVX256		const0_rtx
+/* Callee neither returns nor passes a 256bit AVX register, or no
+   256bit AVX register in the function return.  */
+#define RTX_VZEROUPPER_NO_AVX256			const2_rtx
+
+/* Check if a 256bit AVX register is referenced in stores.   */
+
+static void
+check_avx256_stores (rtx dest, const_rtx set, void *data)
+{
+  if ((REG_P (dest)
+       && VALID_AVX256_REG_MODE (GET_MODE (dest)))
+      || (GET_CODE (set) == SET
+	  && REG_P (SET_SRC (set))
+	  && VALID_AVX256_REG_MODE (GET_MODE (SET_SRC (set)))))
+    {
+      bool *upper_128bits_live = (bool *) data;
+      *upper_128bits_live = true;
+    }
+}
+
+/* Helper function for move_or_delete_vzeroupper_1.  Look for vzeroupper
+   in CURR_BLOCK.  Delete it if upper 128bit AVX registers are unused.
+   If it isn't deleted, move it to just before a jump insn.
+   
+   UPPER_128BITS_LIVE is TRUE if the upper 128bits of any AVX registers
+   are live at entry.  */
+
+static void
+move_or_delete_vzeroupper_2 (basic_block curr_block,
+			     bool upper_128bits_live)
+{
+  rtx curr_insn, next_insn, prev_insn, insn;
+
+  if (dump_file)
+    fprintf (dump_file, " BB [%i] entry: upper 128bits: %d\n",
+	     curr_block->index, upper_128bits_live);
+
+  for (curr_insn = BB_HEAD (curr_block);
+       curr_insn && curr_insn != NEXT_INSN (BB_END (curr_block));
+       curr_insn = next_insn)
+    {
+      rtx avx256;
+
+      next_insn = NEXT_INSN (curr_insn);
+
+      if (!NONDEBUG_INSN_P (curr_insn))
+	continue;
+
+      /* Search for vzeroupper.  */
+      insn = PATTERN (curr_insn);
+      if (GET_CODE (insn) != UNSPEC_VOLATILE
+	  || XINT (insn, 1) != UNSPECV_VZEROUPPER_NOP)
+	{
+	  /* Check vzeroall/zeroupper intrinsics.  */
+	  if (GET_CODE (insn) == PARALLEL
+	      && GET_CODE (XVECEXP (insn, 0, 0)) == UNSPEC_VOLATILE
+	      && (XINT (XVECEXP (insn, 0, 0), 1) == UNSPECV_VZEROUPPER
+		  || XINT (XVECEXP (insn, 0, 0), 1) == UNSPECV_VZEROALL))
+	    {
+	      if (upper_128bits_live
+		  || XINT (XVECEXP (insn, 0, 0), 1) == UNSPECV_VZEROALL)
+		upper_128bits_live = false;
+	      else
+		{
+		  /* Remove zeroupper intrinsic if upper 128bits are
+		     known dead.  */
+		  if (dump_file)
+		    {
+		      fprintf (dump_file,
+			       "Delete redundant vzeroupper intrinsic:\n");
+		      print_rtl_single (dump_file, curr_insn);
+		    }
+		  delete_insn (curr_insn);
+		}
+	    }
+	  else if (!upper_128bits_live)
+	    {
+	      /* Check if upper 128bits of AVX registers are used.  */
+	      note_stores (insn, check_avx256_stores,
+			   &upper_128bits_live);
+	    }
+	  continue;
+	}
+
+      if (dump_file)
+	{
+	  fprintf (dump_file, "Found vzeroupper:\n");
+	  print_rtl_single (dump_file, curr_insn);
+	}
+
+      avx256 = XVECEXP (insn, 0, 0);
+
+      if (!upper_128bits_live)
+	{
+	  /* Since the upper 128bits are dead, callee must not pass
+	     256bit AVX register.  We only need to check if callee
+	     returns 256bit AVX register.  */
+	  upper_128bits_live
+	    = avx256 == RTX_VZEROUPPER_CALLEE_RETURN_AVX256;
+
+	  /* Remove unnecessary vzeroupper since upper 128bits are
+	     dead.  */
+	  if (dump_file)
+	    {
+	      fprintf (dump_file, "Delete redundant vzeroupper:\n");
+	      print_rtl_single (dump_file, curr_insn);
+	    }
+	  delete_insn (curr_insn);
+	  continue;
+	}
+      else if (avx256 == RTX_VZEROUPPER_CALLEE_RETURN_PASS_AVX256
+	       || avx256 == RTX_VZEROUPPER_CALLEE_PASS_AVX256)
+	{
+	  /* Callee passes 256bit AVX register.  Check if callee
+	     returns 256bit AVX register.  */
+	  upper_128bits_live
+	    = avx256 == RTX_VZEROUPPER_CALLEE_RETURN_PASS_AVX256;
+
+	  /* Must remove vzeroupper since callee passes 256bit AVX
+	     register.  */
+	  if (dump_file)
+	    {
+	      fprintf (dump_file, "Delete callee pass vzeroupper:\n");
+	      print_rtl_single (dump_file, curr_insn);
+	    }
+	  delete_insn (curr_insn);
+	  continue;
+	}
+
+      /* Keep vzeroupper.  */
+      upper_128bits_live = false;
+
+      /* Find the jump after vzeroupper.  */
+      prev_insn = curr_insn;
+      for (insn = NEXT_INSN (curr_insn);
+	   insn && insn != NEXT_INSN (BB_END (curr_block));
+	   insn = NEXT_INSN (insn))
+	{
+	  if (!NONDEBUG_INSN_P (insn))
+	    continue;
+	  if (!NONJUMP_INSN_P (insn))
+	    break;
+	  prev_insn = insn;
+	}
+
+      if (!insn || insn == NEXT_INSN (BB_END (curr_block)))
+	{
+	  /* Move vzeroupper before label if needed.  */
+	  if (LABEL_P (insn))
+	    prev_insn = PREV_INSN (insn);
+	  else
+	    gcc_unreachable();
+	}
+
+      /* Move vzeroupper before jump if needed.  */
+      if (curr_insn != prev_insn)
+	{
+	  reorder_insns_nobb (curr_insn, curr_insn, prev_insn);
+	  if (dump_file)
+	    {
+	      fprintf (dump_file, "Move vzeroupper after:\n");
+	      print_rtl_single (dump_file, prev_insn);
+	      fprintf (dump_file, "before:\n");
+	      print_rtl_single (dump_file, insn);
+	    }
+	}
+
+      next_insn = NEXT_INSN (insn);
+    }
+
+  BLOCK_INFO (curr_block)->upper_128bits_live = upper_128bits_live;
+
+  if (dump_file)
+    fprintf (dump_file, " BB [%i] exit: upper 128bits: %d\n",
+	     curr_block->index, upper_128bits_live);
+}
+
+/* Helper function for move_or_delete_vzeroupper.  Process vzeroupper
+   in BLOCK and its predecessor blocks recursively.  */
+
+static void
+move_or_delete_vzeroupper_1 (basic_block block)
+{
+  edge e;
+  edge_iterator ei;
+  bool upper_128bits_live;
+
+  if (dump_file)
+    fprintf (dump_file, " Process BB [%i]: status: %d\n",
+	     block->index, BLOCK_INFO (block)->done);
+
+  if (BLOCK_INFO (block)->done)
+    return;
+
+  BLOCK_INFO (block)->done = true;
+
+  upper_128bits_live = false;
+
+  /* Process all predecessor edges of this block.  */
+  FOR_EACH_EDGE (e, ei, block->preds)
+    {
+      if (e->src == block)
+	continue;
+      move_or_delete_vzeroupper_1 (e->src);
+      if (BLOCK_INFO (e->src)->upper_128bits_live)
+	upper_128bits_live = true;
+    }
+
+  /* Process this block.  */
+  move_or_delete_vzeroupper_2 (block, upper_128bits_live);
+}
+
+/* Go through the instruction stream looking for vzeroupper.  Delete
+   it if upper 128bit AVX registers are unused.  If it isn't deleted,
+   move it to just before a jump insn.  */
+
+static void
+move_or_delete_vzeroupper (void)
+{
+  edge e;
+  edge_iterator ei;
+
+  /* Set up block info for each basic block.  */
+  alloc_aux_for_blocks (sizeof (struct block_info_def));
+
+  /* Process successor blocks of all entry points.  */
+  if (dump_file)
+    fprintf (dump_file, "Process all entry points\n");
+
+  FOR_EACH_EDGE (e, ei, ENTRY_BLOCK_PTR->succs)
+    {
+      move_or_delete_vzeroupper_2 (e->dest,
+				   cfun->machine->caller_pass_avx256_p);
+      BLOCK_INFO (e->dest)->done = true;
+    }
+
+  /* Process predecessor blocks of all exit points.  */
+  if (dump_file)
+    fprintf (dump_file, "Process all exit points\n");
+
+  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR->preds)
+    move_or_delete_vzeroupper_1 (e->src);
+
+  free_aux_for_blocks ();
+}
+
+static unsigned int
+rest_of_handle_vzeroupper (void)
+{
+  timevar_push (TV_VZEROUPPER);
+  move_or_delete_vzeroupper ();
+  timevar_pop (TV_VZEROUPPER);
+  return 0;
+}
+
+static bool
+gate_handle_vzeroupper (void)
+{
+  /* Run the vzeroupper pass if needed.  */
+  return cfun->machine->use_vzeroupper_p;
+}
+
+static struct rtl_opt_pass pass_vzeroupper =
+{
+ {
+  RTL_PASS,
+  "vzeroupper",				/* name */
+  gate_handle_vzeroupper,		/* gate */
+  rest_of_handle_vzeroupper,		/* execute */
+  NULL,					/* sub */
+  NULL,					/* next */
+  0,					/* static_pass_number */
+  TV_VZEROUPPER,			/* tv_id */
+  0,					/* properties_required */
+  0,					/* properties_provided */
+  0,					/* properties_destroyed */
+  0,					/* todo_flags_start */
+  TODO_dump_func			/* todo_flags_finish */
+ }
+};
+
 static rtx legitimize_dllimport_symbol (rtx, bool);
 
 #ifndef CHECK_STACK_LIMIT
@@ -2633,6 +2932,7 @@  ix86_target_string (int isa, int flags, const char *arch, const char *tune,
     { "-mtls-direct-seg-refs",		MASK_TLS_DIRECT_SEG_REFS },
     { "-mvect8-ret-in-mem",		MASK_VECT8_RETURNS },
     { "-m8bit-idiv",			MASK_USE_8BIT_IDIV },
+    { "-mvzeroupper",			MASK_VZEROUPPER },
   };
 
   const char *opts[ARRAY_SIZE (isa_opts) + ARRAY_SIZE (flag_opts) + 6][2];
@@ -3712,6 +4012,73 @@  ix86_option_override_internal (bool main_args_p)
   if (main_args_p)
     target_option_default_node = target_option_current_node
       = build_target_option_node ();
+
+  if (TARGET_AVX)
+    {
+      /* Enable vzeroupper pass by default for TARGET_AVX.  */
+      if (!(target_flags_explicit & MASK_VZEROUPPER))
+	target_flags |= MASK_VZEROUPPER;
+    }
+  else 
+    {
+      /* Disable vzeroupper pass if TARGET_AVX is disabled.  */
+      target_flags &= ~MASK_VZEROUPPER;
+    }
+
+  /* Register the vzeroupper pass.  */
+  if (TARGET_VZEROUPPER)
+    {
+      struct register_pass_info vzeroupper_pass_info;
+
+      vzeroupper_pass_info.pass = &pass_vzeroupper.pass;
+      vzeroupper_pass_info.reference_pass_name
+	= pass_final.pass.name; 
+      vzeroupper_pass_info.ref_pass_instance_number = 1;
+      vzeroupper_pass_info.pos_op = PASS_POS_INSERT_BEFORE;
+      register_pass (&vzeroupper_pass_info);
+    }
+}
+
+/* Return TRUE if type TYPE and mode MODE use 256bit AVX modes.  */
+
+static bool
+use_avx256_p (enum machine_mode mode, const_tree type)
+{
+  return (VALID_AVX256_REG_MODE (mode)
+	  || (type
+	      && TREE_CODE (type) == VECTOR_TYPE
+	      && int_size_in_bytes (type) == 32));
+}
+
+/* Return TRUE if VAL is passed in register with 256bit AVX modes.  */
+
+static bool
+function_pass_avx256_p (const_rtx val)
+{
+  if (!val)
+    return false;
+
+  if (REG_P (val) && VALID_AVX256_REG_MODE (GET_MODE (val)))
+    return true;
+
+  if (GET_CODE (val) == PARALLEL)
+    {
+      int i;
+      rtx r;
+
+      for (i = XVECLEN (val, 0) - 1; i >= 0; i--)
+	{
+	  r = XVECEXP (val, 0, i);
+	  if (GET_CODE (r) == EXPR_LIST
+	      && XEXP (r, 0)
+	      && REG_P (XEXP (r, 0))
+	      && (GET_MODE (XEXP (r, 0)) == OImode
+		  || VALID_AVX256_REG_MODE (GET_MODE (XEXP (r, 0)))))
+	    return true;
+	}
+    }
+
+  return false;
 }
 
 /* Implement the TARGET_OPTION_OVERRIDE hook.  */
@@ -4626,7 +4993,14 @@  ix86_function_ok_for_sibcall (tree decl, tree exp)
 	return false;
     }
   else if (VOID_TYPE_P (TREE_TYPE (DECL_RESULT (cfun->decl))))
-    ;
+    {
+      /* Disable sibcall if we need to generate vzeroupper after
+	 callee returns.  */
+      if (TARGET_VZEROUPPER
+	  && cfun->machine->callee_return_avx256_p
+	  && !cfun->machine->caller_return_avx256_p)
+	return false;
+    }
   else if (!rtx_equal_p (a, b))
     return false;
 
@@ -5243,15 +5617,54 @@  void
 init_cumulative_args (CUMULATIVE_ARGS *cum,  /* Argument info to initialize */
 		      tree fntype,	/* tree ptr for function decl */
 		      rtx libname,	/* SYMBOL_REF of library name or 0 */
-		      tree fndecl)
+		      tree fndecl,
+		      int caller)
 {
-  struct cgraph_local_info *i = fndecl ? cgraph_local_info (fndecl) : NULL;
+  struct cgraph_local_info *i;
+  tree fnret_type;
+
   memset (cum, 0, sizeof (*cum));
 
+  /* Initialize for the current callee.  */
+  if (caller)
+    {
+      cfun->machine->callee_pass_avx256_p = false;
+      cfun->machine->callee_return_avx256_p = false;
+    }
+
   if (fndecl)
-   cum->call_abi = ix86_function_abi (fndecl);
+    {
+      i = cgraph_local_info (fndecl);
+      cum->call_abi = ix86_function_abi (fndecl);
+      fnret_type = TREE_TYPE (TREE_TYPE (fndecl));
+    }
   else
-   cum->call_abi = ix86_function_type_abi (fntype);
+    {
+      i = NULL;
+      cum->call_abi = ix86_function_type_abi (fntype);
+      if (fntype)
+	fnret_type = TREE_TYPE (fntype);
+      else
+	fnret_type = NULL;
+    }
+
+  if (TARGET_VZEROUPPER && fnret_type)
+    {
+      rtx fnret_value = ix86_function_value (fnret_type, fntype,
+					     false);
+      if (function_pass_avx256_p (fnret_value))
+	{
+	  /* The return value of this function uses 256bit AVX modes.  */
+	  cfun->machine->use_avx256_p = true;
+	  if (caller)
+	    cfun->machine->callee_return_avx256_p = true;
+	  else
+	    cfun->machine->caller_return_avx256_p = true;
+	}
+    }
+
+  cum->caller = caller;
+
   /* Set up the number of registers to use for passing arguments.  */
 
   if (cum->call_abi == MS_ABI && !ACCUMULATE_OUTGOING_ARGS)
@@ -6488,6 +6901,7 @@  ix86_function_arg (CUMULATIVE_ARGS *cum, enum machine_mode omode,
 {
   enum machine_mode mode = omode;
   HOST_WIDE_INT bytes, words;
+  rtx arg;
 
   if (mode == BLKmode)
     bytes = int_size_in_bytes (type);
@@ -6501,11 +6915,23 @@  ix86_function_arg (CUMULATIVE_ARGS *cum, enum machine_mode omode,
     mode = type_natural_mode (type, cum);
 
   if (TARGET_64BIT && (cum ? cum->call_abi : ix86_abi) == MS_ABI)
-    return function_arg_ms_64 (cum, mode, omode, named, bytes);
+    arg = function_arg_ms_64 (cum, mode, omode, named, bytes);
   else if (TARGET_64BIT)
-    return function_arg_64 (cum, mode, omode, type, named);
+    arg = function_arg_64 (cum, mode, omode, type, named);
   else
-    return function_arg_32 (cum, mode, omode, type, bytes, words);
+    arg = function_arg_32 (cum, mode, omode, type, bytes, words);
+
+  if (TARGET_VZEROUPPER && function_pass_avx256_p (arg))
+    {
+      /* This argument uses 256bit AVX modes.  */
+      cfun->machine->use_avx256_p = true;
+      if (cum->caller)
+	cfun->machine->callee_pass_avx256_p = true;
+      else
+	cfun->machine->caller_pass_avx256_p = true;
+    }
+
+  return arg;
 }
 
 /* A C expression that indicates when an argument must be passed by
@@ -10326,6 +10752,15 @@  ix86_expand_epilogue (int style)
       return;
     }
 
+  /* Emit vzeroupper if needed.  */
+  if (TARGET_VZEROUPPER
+      && cfun->machine->use_avx256_p
+      && !cfun->machine->caller_return_avx256_p)
+    {
+      cfun->machine->use_vzeroupper_p = 1;
+      emit_insn (gen_avx_vzeroupper_nop (RTX_VZEROUPPER_NO_AVX256)); 
+    }
+
   if (crtl->args.pops_args && crtl->args.size)
     {
       rtx popc = GEN_INT (crtl->args.pops_args);
@@ -20883,6 +21318,25 @@  ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
 			       + 2, vec));
     }
 
+  /* Emit vzeroupper if needed.  */
+  if (TARGET_VZEROUPPER && cfun->machine->use_avx256_p)
+    {
+      rtx avx256;
+      cfun->machine->use_vzeroupper_p = 1;
+      if (cfun->machine->callee_pass_avx256_p)
+	{
+	  if (cfun->machine->callee_return_avx256_p)
+	    avx256 = RTX_VZEROUPPER_CALLEE_RETURN_PASS_AVX256;
+	  else
+	    avx256 = RTX_VZEROUPPER_CALLEE_PASS_AVX256;
+	}
+      else if (cfun->machine->callee_return_avx256_p)
+	avx256 = RTX_VZEROUPPER_CALLEE_RETURN_AVX256;
+      else
+	avx256 = RTX_VZEROUPPER_NO_AVX256;
+      emit_insn (gen_avx_vzeroupper_nop (avx256)); 
+    }
+
   call = emit_call_insn (call);
   if (use)
     CALL_INSN_FUNCTION_USAGE (call) = use;
@@ -21626,6 +22080,9 @@  ix86_local_alignment (tree exp, enum machine_mode mode,
       decl = NULL;
     }
 
+  if (use_avx256_p (mode, type))
+    cfun->machine->use_avx256_p = true;
+
   /* Don't do dynamic stack realignment for long long objects with
      -mpreferred-stack-boundary=2.  */
   if (!TARGET_64BIT
@@ -21721,9 +22178,6 @@  ix86_minimum_alignment (tree exp, enum machine_mode mode,
 {
   tree type, decl;
 
-  if (TARGET_64BIT || align != 64 || ix86_preferred_stack_boundary >= 64)
-    return align;
-
   if (exp && DECL_P (exp))
     {
       type = TREE_TYPE (exp);
@@ -21735,6 +22189,12 @@  ix86_minimum_alignment (tree exp, enum machine_mode mode,
       decl = NULL;
     }
 
+  if (use_avx256_p (mode, type))
+    cfun->machine->use_avx256_p = true;
+
+  if (TARGET_64BIT || align != 64 || ix86_preferred_stack_boundary >= 64)
+    return align;
+
   /* Don't do dynamic stack realignment for long long objects with
      -mpreferred-stack-boundary=2.  */
   if ((mode == DImode || (type && TYPE_MODE (type) == DImode))
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 25463a5..5474048 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1507,6 +1507,7 @@  typedef struct ix86_args {
   int mmx_nregs;		/* # mmx registers available for passing */
   int mmx_regno;		/* next available mmx register number */
   int maybe_vaarg;		/* true for calls to possibly vardic fncts.  */
+  int caller;			/* true if it is caller.  */
   int float_in_sse;		/* Set to 1 or 2 for 32bit targets if
 				   SFmode/DFmode arguments should be passed
 				   in SSE registers.  Otherwise 0.  */
@@ -1519,7 +1520,8 @@  typedef struct ix86_args {
    For a library call, FNTYPE is 0.  */
 
 #define INIT_CUMULATIVE_ARGS(CUM, FNTYPE, LIBNAME, FNDECL, N_NAMED_ARGS) \
-  init_cumulative_args (&(CUM), (FNTYPE), (LIBNAME), (FNDECL))
+  init_cumulative_args (&(CUM), (FNTYPE), (LIBNAME), (FNDECL), \
+			(N_NAMED_ARGS) != -1)
 
 /* Output assembler code to FILE to increment profiler label # LABELNO
    for profiling a function entry.  */
@@ -2289,6 +2291,24 @@  struct GTY(()) machine_function {
      stack below the return address.  */
   BOOL_BITFIELD static_chain_on_stack : 1;
 
+  /* Nonzero if the current function uses vzeroupper.  */
+  BOOL_BITFIELD use_vzeroupper_p : 1;
+
+  /* Nonzero if the current function uses 256bit AVX registers.  */
+  BOOL_BITFIELD use_avx256_p : 1;
+
+  /* Nonzero if caller passes 256bit AVX modes.  */
+  BOOL_BITFIELD caller_pass_avx256_p : 1;
+
+  /* Nonzero if caller returns 256bit AVX modes.  */
+  BOOL_BITFIELD caller_return_avx256_p : 1;
+
+  /* Nonzero if the current callee passes 256bit AVX modes.  */
+  BOOL_BITFIELD callee_pass_avx256_p : 1;
+
+  /* Nonzero if the current callee returns 256bit AVX modes.  */
+  BOOL_BITFIELD callee_return_avx256_p : 1;
+
   /* During prologue/epilogue generation, the current frame state.
      Otherwise, the frame state at the end of the prologue.  */
   struct machine_frame_state fs;
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index d97e96f..ca9dbb9 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -249,6 +249,7 @@ 
   UNSPECV_NOPS
   UNSPECV_VZEROALL
   UNSPECV_VZEROUPPER
+  UNSPECV_VZEROUPPER_NOP
   UNSPECV_RDTSC
   UNSPECV_RDTSCP
   UNSPECV_RDPMC
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 9c1fe1f..28a921f 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -256,6 +256,11 @@  mcld
 Target Report Mask(CLD) Save
 Generate cld instruction in the function prologue.
 
+mvzeroupper
+Target Report Mask(VZEROUPPER) Save
+Generate vzeroupper instruction before a transfer of control flow out of
+the function.
+
 mfused-madd
 Target Report Mask(FUSED_MADD) Save
 Enable automatic generation of fused floating point multiply-add instructions
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 635a460..64622b2 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -11429,6 +11429,19 @@ 
    (set_attr "prefix" "vex")
    (set_attr "mode" "OI")])
 
+;; Clear the upper 128bits of AVX registers, equivalent to a NOP.
+;; This should be used only when the upper 128bits are unused.
+(define_insn "avx_vzeroupper_nop"
+  [(unspec_volatile [(match_operand 0 "const_int_operand" "")]
+		    UNSPECV_VZEROUPPER_NOP)]
+  "TARGET_AVX"
+  "vzeroupper"
+  [(set_attr "type" "sse")
+   (set_attr "modrm" "0")
+   (set_attr "memory" "none")
+   (set_attr "prefix" "vex")
+   (set_attr "mode" "OI")])
+
 (define_insn_and_split "vec_dup<mode>"
   [(set (match_operand:AVX256MODE24P 0 "register_operand" "=x,x")
 	(vec_duplicate:AVX256MODE24P
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index ee68454..110211f 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -594,7 +594,7 @@  Objective-C and Objective-C++ Dialects}.
 -mno-wide-multiply  -mrtd  -malign-double @gol
 -mpreferred-stack-boundary=@var{num}
 -mincoming-stack-boundary=@var{num} @gol
--mcld -mcx16 -msahf -mmovbe -mcrc32 -mrecip @gol
+-mcld -mcx16 -msahf -mmovbe -mcrc32 -mrecip -mvzeroupper @gol
 -mmmx  -msse  -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4 -mavx @gol
 -maes -mpclmul -mfsgsbase -mrdrnd -mf16c -mfused-madd @gol
 -msse4a -m3dnow -mpopcnt -mabm -mfma4 -mxop -mlwp @gol
@@ -12466,6 +12466,13 @@  GCC with the @option{--enable-cld} configure option.  Generation of @code{cld}
 instructions can be suppressed with the @option{-mno-cld} compiler option
 in this case.
 
+@item -mvzeroupper
+@opindex mvzeroupper
+This option instructs GCC to emit a @code{vzeroupper} instruction
+before a transfer of control flow out of the function to minimize
+the AVX to SSE transition penalty as well as remove unnecessary
+@code{zeroupper} intrinsics.
+
 @item -mcx16
 @opindex mcx16
 This option will enable GCC to use CMPXCHG16B instruction in generated code.
diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-1.c b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-1.c
index 2137c25..73ce795 100644
--- a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-1.c
@@ -1,6 +1,6 @@ 
 /* { dg-do run } */
 /* { dg-require-effective-target avx } */
-/* { dg-options "-O2 -mavx" } */
+/* { dg-options "-O2 -mavx -mtune=generic" } */
 
 #include "avx-check.h"
 
diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-10.c b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-10.c
new file mode 100644
index 0000000..80244fc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-10.c
@@ -0,0 +1,18 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O0 -mavx -mvzeroupper -dp" } */
+
+#include <immintrin.h>
+
+extern float x, y;
+
+void
+foo ()
+{
+  x = y;
+  _mm256_zeroupper ();
+  _mm256_zeroupper ();
+  _mm256_zeroupper ();
+}
+
+/* { dg-final { scan-assembler-times "\\*avx_vzeroupper" 3 } } */
+/* { dg-final { scan-assembler-not "avx_vzeroupper_nop" } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-11.c b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-11.c
new file mode 100644
index 0000000..3f44ff0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-11.c
@@ -0,0 +1,20 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O0 -mavx -mvzeroupper -dp" } */
+
+#include <immintrin.h>
+
+extern float x, y;
+
+void
+foo ()
+{
+  x = y;
+  _mm256_zeroall ();
+  _mm256_zeroupper ();
+  _mm256_zeroupper ();
+  _mm256_zeroupper ();
+}
+
+/* { dg-final { scan-assembler-times "\\*avx_vzeroall" 1 } } */
+/* { dg-final { scan-assembler-times "\\*avx_vzeroupper" 3 } } */
+/* { dg-final { scan-assembler-not "avx_vzeroupper_nop" } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-12.c b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-12.c
new file mode 100644
index 0000000..4b7a8ad
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-12.c
@@ -0,0 +1,21 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O0 -mavx -mvzeroupper -dp" } */
+
+#include <immintrin.h>
+
+extern __m256 x, y;
+
+void
+foo ()
+{
+  _mm256_zeroall ();
+  _mm256_zeroupper ();
+  x = y;
+  _mm256_zeroupper ();
+  _mm256_zeroupper ();
+  _mm256_zeroupper ();
+}
+
+/* { dg-final { scan-assembler-times "\\*avx_vzeroupper" 1 } } */
+/* { dg-final { scan-assembler-times "\\*avx_vzeroall" 1 } } */
+/* { dg-final { scan-assembler-not "avx_vzeroupper_nop" } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-13.c b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-13.c
new file mode 100644
index 0000000..45122e4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-13.c
@@ -0,0 +1,14 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O0 -mavx -mno-vzeroupper -dp" } */
+
+#include <immintrin.h>
+
+extern __m256 x, y;
+
+void
+foo ()
+{
+  x = y;
+}
+
+/* { dg-final { scan-assembler-not "avx_vzeroupper_nop" } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-14.c b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-14.c
new file mode 100644
index 0000000..c8aac4c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-14.c
@@ -0,0 +1,14 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O0 -mavx -mtune=generic -dp" } */
+
+#include <immintrin.h>
+
+extern __m256 x, y;
+
+void
+foo ()
+{
+  x = y;
+}
+
+/* { dg-final { scan-assembler-times "avx_vzeroupper_nop" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-2.c b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-2.c
index 9771e6c..66df90f 100644
--- a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-2.c
@@ -1,6 +1,6 @@ 
 /* { dg-do run } */
 /* { dg-require-effective-target avx } */
-/* { dg-options "-O2 -mavx" } */
+/* { dg-options "-O2 -mavx -mtune=generic" } */
 
 #include "avx-check.h"
 
diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-3.c b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-3.c
new file mode 100644
index 0000000..8053d78
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-3.c
@@ -0,0 +1,34 @@ 
+/* { dg-do run } */
+/* { dg-require-effective-target avx } */
+/* { dg-options "-O2 -mavx -mvzeroupper" } */
+
+#include "avx-check.h"
+
+int s[8] = {1, 2, 3, 4, 5, 6, 7, 8};
+int d[8] = {11, 22, 33, 44, 55, 66, 77, 88};
+
+void
+__attribute__((noinline))
+foo ()
+{
+  int i;
+  for (i = 0; i < ARRAY_SIZE (d); i++)
+    d[i] = s[i] + 0x1000;
+}
+
+static void
+__attribute__((noinline))
+bar (__m256i src)
+{
+  foo ();
+  _mm256_storeu_si256 ((__m256i*) d, src);
+  if (__builtin_memcmp (d, s, sizeof (d)))
+    abort ();
+}
+
+static void
+avx_test (void)
+{
+  __m256i src = _mm256_loadu_si256 ((__m256i*) s);
+  bar (src);
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-4.c b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-4.c
new file mode 100644
index 0000000..209c9a8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-4.c
@@ -0,0 +1,15 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O0 -mavx -mvzeroupper -dp" } */
+
+typedef float __m256 __attribute__ ((__vector_size__ (32), __may_alias__));
+
+extern void bar2 (__m256);
+extern __m256 y;
+
+void
+foo ()
+{
+  bar2 (y);
+}
+
+/* { dg-final { scan-assembler-not "avx_vzeroupper_nop" } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-5.c b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-5.c
new file mode 100644
index 0000000..a14460c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-5.c
@@ -0,0 +1,16 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O0 -mavx -mvzeroupper -dp" } */
+
+#include <immintrin.h>
+
+extern void bar2 (__m256);
+extern __m256 y;
+
+void
+foo ()
+{
+  bar2 (y);
+  _mm256_zeroupper ();
+}
+
+/* { dg-final { scan-assembler-not "avx_vzeroupper" } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-6.c b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-6.c
new file mode 100644
index 0000000..bad872c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-6.c
@@ -0,0 +1,15 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O0 -mavx -mvzeroupper -dp" } */
+
+#include <immintrin.h>
+
+extern __m256 x, y;
+
+void
+foo ()
+{
+  x = y;
+  _mm256_zeroall ();
+}
+
+/* { dg-final { scan-assembler-not "avx_vzeroupper_nop" } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-7.c b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-7.c
new file mode 100644
index 0000000..926a02b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-7.c
@@ -0,0 +1,16 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O0 -mavx -mvzeroupper -dp" } */
+
+#include <immintrin.h>
+
+extern __m256 x, y;
+
+void
+foo ()
+{
+  x = y;
+  _mm256_zeroupper ();
+}
+
+/* { dg-final { scan-assembler-times "\\*avx_vzeroupper" 1 } } */
+/* { dg-final { scan-assembler-not "avx_vzeroupper_nop" } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-8.c b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-8.c
new file mode 100644
index 0000000..1a34681
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-8.c
@@ -0,0 +1,17 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O0 -mavx -mvzeroupper -dp" } */
+
+#include <immintrin.h>
+
+extern __m256 x, y;
+
+void
+foo ()
+{
+  x = y;
+  _mm256_zeroall ();
+  _mm256_zeroupper ();
+}
+
+/* { dg-final { scan-assembler-not "\\*avx_vzeroupper" } } */
+/* { dg-final { scan-assembler-not "avx_vzeroupper_nop" } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-9.c b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-9.c
new file mode 100644
index 0000000..81f17f8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-9.c
@@ -0,0 +1,19 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O0 -mavx -mvzeroupper -dp" } */
+
+#include <immintrin.h>
+
+extern __m256 x, y;
+
+void
+foo ()
+{
+  _mm256_zeroupper ();
+  x = y;
+  _mm256_zeroupper ();
+  _mm256_zeroupper ();
+  _mm256_zeroupper ();
+}
+
+/* { dg-final { scan-assembler-times "\\*avx_vzeroupper" 1 } } */
+/* { dg-final { scan-assembler-not "avx_vzeroupper_nop" } } */
diff --git a/gcc/timevar.def b/gcc/timevar.def
index 86e2999..f965bfd 100644
--- a/gcc/timevar.def
+++ b/gcc/timevar.def
@@ -235,6 +235,7 @@  DEFTIMEVAR (TV_TREE_IFCOMBINE        , "tree if-combine")
 DEFTIMEVAR (TV_TREE_UNINIT           , "uninit var anaysis")
 DEFTIMEVAR (TV_PLUGIN_INIT           , "plugin initialization")
 DEFTIMEVAR (TV_PLUGIN_RUN            , "plugin execution")
+DEFTIMEVAR (TV_VZEROUPPER	     , "vzeroupper")
 
 /* Everything else in rest_of_compilation not included above.  */
 DEFTIMEVAR (TV_REST_OF_COMPILATION   , "rest of compilation")