diff mbox series

[middle-end,i386,Version,4] Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-arg|used-gpr|all-gpr|used|all]

Message ID 9A5D01AB-BCE8-45E7-B08C-69F5E2E3B421@ORACLE.COM
State New
Headers show
Series [middle-end,i386,Version,4] Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-arg|used-gpr|all-gpr|used|all] | expand

Commit Message

Qing Zhao Oct. 24, 2020, 4:04 p.m. UTC
Hi, 

This is the 4th version of the implementation of patch -fzero-call-used-regs.

The major change compared to the previous version are:

1.  Documentation change per Richard’s suggestion;
2.  Command sub options handling per Richard’s suggestion;
3.  I386 part, clearing ST registers per Uros’s suggestion;
4. Some minor spelling and style fix.

I have tested this new GCC on both x86 and arm64, no regression. 

Please let me know whether it’s ready for stage 1 gcc11?

Thanks.


******The changelog:

gcc/ChangeLog:

2020-10-24  Qing Zhao  <qing.zhao@oracle.com>
	    H.J.Lu  <hjl.tools@gmail.com>

	* common.opt: Add new option -fzero-call-used-regs
	* config/i386/i386.c (zero_call_used_regno_p): New function.
	(zero_call_used_regno_mode): Likewise.
	(zero_all_vector_registers): Likewise.
	(zero_all_st_mm_registers): Likewise.
	(ix86_zero_call_used_regs): Likewise.
	(TARGET_ZERO_CALL_USED_REGS): Define.
	* df-scan.c (df_epilogue_uses_p): New function.
	(df_get_exit_block_use_set): Replace EPILOGUE_USES with
	df_epilogue_uses_p.
	* df.h (df_epilogue_uses_p): Declare.
	* doc/extend.texi: Document the new zero_call_used_regs attribute.
	* doc/invoke.texi: Document the new -fzero-call-used-regs option.
	* doc/tm.texi: Regenerate.
	* doc/tm.texi.in (TARGET_ZERO_CALL_USED_REGS): New hook. 
	* emit-rtl.h (struct rtl_data): New fields zero_call_used_regs
	and must_be_zero_on_return.
	* flag-types.h (enum  zero_call_used_regs_code): New type.
	* function.c (gen_call_used_regs_seq): New function.
	(rest_of_zero_call_used_regs): Likewise.
	(class pass_zero_call_used_regs): New class.
	(pass_zero_call_used_regs::gate): New function.
	(make_pass_zero_call_used_regs): New function.
	* optabs.c (expand_asm_reg_clobber_mem_blockage): New function.
	* optabs.h (expand_asm_reg_clobber_mem_blockage): Declare.
	* opts.c (zero_call_used_regs_opts): New structure array 
	initialization.  
        (parse_zero_call_used_regs_options): New function.
        (common_handle_option): Handle fzero_call_used_regs_.
	* opts.h (zero_call_used_regs_opts): New structure array.
	* passes.def: Add new pass pass_zero_call_used_regs.
	* resource.c (init_resource_info): Replace EPILOGUE_USES with
	df_epilogue_uses_p.
	* target.def (zero_call_used_regs): New hook.
	* targhooks.c (default_zero_call_used_regs): New function.
	* targhooks.h (default_zero_call_used_regs): Declare.
	* tree-pass.h (make_pass_zero_call_used_regs): Declare.

gcc/c-family/ChangeLog:

2020-10-24  Qing Zhao  <qing.zhao@oracle.com>
	    H.J.Lu  <hjl.tools@gmail.com>

	* c-attribs.c (c_common_attribute_table): Add new attribute
	zero_call_used_regs.
	(handle_zero_call_used_regs_attribute): New function.

gcc/testsuite/ChangeLog:

2020-10-24  Qing Zhao  <qing.zhao@oracle.com>
	    H.J.Lu  <hjl.tools@gmail.com>

	* gcc.target/i386/zero-scratch-regs-1.c: New test.
	* gcc.target/i386/zero-scratch-regs-10.c: New test.
	* gcc.target/i386/zero-scratch-regs-11.c: New test.
	* gcc.target/i386/zero-scratch-regs-12.c: New test.
	* gcc.target/i386/zero-scratch-regs-13.c: New test.
	* gcc.target/i386/zero-scratch-regs-14.c: New test.
	* gcc.target/i386/zero-scratch-regs-15.c: New test.
	* gcc.target/i386/zero-scratch-regs-16.c: New test.
	* gcc.target/i386/zero-scratch-regs-17.c: New test.
	* gcc.target/i386/zero-scratch-regs-18.c: New test.
	* gcc.target/i386/zero-scratch-regs-19.c: New test.
	* gcc.target/i386/zero-scratch-regs-2.c: New test.
	* gcc.target/i386/zero-scratch-regs-20.c: New test.
	* gcc.target/i386/zero-scratch-regs-21.c: New test.
	* gcc.target/i386/zero-scratch-regs-22.c: New test.
	* gcc.target/i386/zero-scratch-regs-23.c: New test.
	* gcc.target/i386/zero-scratch-regs-24.c: New test.
	* gcc.target/i386/zero-scratch-regs-25.c: New test.
	* gcc.target/i386/zero-scratch-regs-26.c: New test.
	* gcc.target/i386/zero-scratch-regs-27.c: New test.
	* gcc.target/i386/zero-scratch-regs-3.c: New test.
	* gcc.target/i386/zero-scratch-regs-4.c: New test.
	* gcc.target/i386/zero-scratch-regs-5.c: New test.
	* gcc.target/i386/zero-scratch-regs-6.c: New test.
	* gcc.target/i386/zero-scratch-regs-7.c: New test.
	* gcc.target/i386/zero-scratch-regs-8.c: New test.
	* gcc.target/i386/zero-scratch-regs-9.c: New test.

******the patch:

From 0da0f0f2203b1e3bb2be8c12a0a1addd3c20a534 Mon Sep 17 00:00:00 2001
From: qing zhao <qing.zhao@oracle.com>
Date: Thu, 1 Oct 2020 00:27:57 +0000
Subject: [PATCH] The 4th version of -fzero-call-used-regs
We will provide a new feature into GCC:

Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-gpr-arg
			   |all-arg|used-gpr|all-gpr|used|all] 
command-line option
and
zero_call_used_regs("skip|used-gpr-arg|used-arg|all-gpr-arg
		     |all-arg|used-gpr|all-gpr|used|all") 
function attribues:

  1. -fzero-call-used-regs=skip and zero_call_used_regs("skip")
  2. -fzero-call-used-regs=used-gpr-arg and zero_call_used_regs("used-gpr-arg")
  3. -fzero-call-used-regs=used-arg and zero_call_used_regs("used-arg")
  4. -fzero-call-used-regs=all-gpr-arg and zero_call_used_regs("all-gpr-arg")
  5. -fzero-call-used-regs=all-arg and zero_call_used_regs("all-arg")
  6. -fzero-call-used-regs=used-gpr and zero_call_used_regs("used-gpr")
  7. -fzero-call-used-regs=all-gpr and zero_call_used_regs("all-gpr")
  8. -fzero-call-used-regs=used and zero_call_used_regs("used")
  9. -fzero-call-used-regs=all and zero_call_used_regs("all")

---
 gcc/c-family/c-attribs.c                           |  31 +++
 gcc/common.opt                                     |   8 +
 gcc/config/i386/i386.c                             | 186 ++++++++++++++++++
 gcc/df-scan.c                                      |  12 +-
 gcc/df.h                                           |   1 +
 gcc/doc/extend.texi                                |  43 +++++
 gcc/doc/invoke.texi                                |  42 +++-
 gcc/doc/tm.texi                                    |  12 ++
 gcc/doc/tm.texi.in                                 |   2 +
 gcc/emit-rtl.h                                     |   6 +
 gcc/flag-types.h                                   |   9 +
 gcc/function.c                                     | 213 ++++++++++++++++++++-
 gcc/optabs.c                                       |  42 ++++
 gcc/optabs.h                                       |   2 +
 gcc/opts.c                                         |  47 +++++
 gcc/opts.h                                         |   6 +
 gcc/passes.def                                     |   1 +
 gcc/recog.c                                        |  16 ++
 gcc/recog.h                                        |   1 +
 gcc/resource.c                                     |   2 +-
 gcc/target.def                                     |  15 ++
 gcc/targhooks.c                                    |  32 ++++
 gcc/targhooks.h                                    |   1 +
 gcc/testsuite/c-c++-common/zero-scratch-regs-1.c   |  15 ++
 gcc/testsuite/c-c++-common/zero-scratch-regs-2.c   |  16 ++
 .../gcc.target/i386/zero-scratch-regs-1.c          |  12 ++
 .../gcc.target/i386/zero-scratch-regs-10.c         |  21 ++
 .../gcc.target/i386/zero-scratch-regs-11.c         |  39 ++++
 .../gcc.target/i386/zero-scratch-regs-12.c         |  39 ++++
 .../gcc.target/i386/zero-scratch-regs-13.c         |  21 ++
 .../gcc.target/i386/zero-scratch-regs-14.c         |  19 ++
 .../gcc.target/i386/zero-scratch-regs-15.c         |  14 ++
 .../gcc.target/i386/zero-scratch-regs-16.c         |  14 ++
 .../gcc.target/i386/zero-scratch-regs-17.c         |  13 ++
 .../gcc.target/i386/zero-scratch-regs-18.c         |  13 ++
 .../gcc.target/i386/zero-scratch-regs-19.c         |  12 ++
 .../gcc.target/i386/zero-scratch-regs-2.c          |  19 ++
 .../gcc.target/i386/zero-scratch-regs-20.c         |  23 +++
 .../gcc.target/i386/zero-scratch-regs-21.c         |  14 ++
 .../gcc.target/i386/zero-scratch-regs-22.c         |  21 ++
 .../gcc.target/i386/zero-scratch-regs-23.c         |  29 +++
 .../gcc.target/i386/zero-scratch-regs-24.c         |  10 +
 .../gcc.target/i386/zero-scratch-regs-25.c         |  10 +
 .../gcc.target/i386/zero-scratch-regs-26.c         |  23 +++
 .../gcc.target/i386/zero-scratch-regs-27.c         |  15 ++
 .../gcc.target/i386/zero-scratch-regs-3.c          |  12 ++
 .../gcc.target/i386/zero-scratch-regs-4.c          |  14 ++
 .../gcc.target/i386/zero-scratch-regs-5.c          |  20 ++
 .../gcc.target/i386/zero-scratch-regs-6.c          |  14 ++
 .../gcc.target/i386/zero-scratch-regs-7.c          |  13 ++
 .../gcc.target/i386/zero-scratch-regs-8.c          |  19 ++
 .../gcc.target/i386/zero-scratch-regs-9.c          |  15 ++
 gcc/tree-pass.h                                    |   1 +
 53 files changed, 1244 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
 create mode 100644 gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-24.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-25.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-26.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-27.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c

Comments

Uros Bizjak Oct. 24, 2020, 4:52 p.m. UTC | #1
On Sat, Oct 24, 2020 at 6:05 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>
> Hi,
>
> This is the 4th version of the implementation of patch -fzero-call-used-regs.
>
> The major change compared to the previous version are:
>
> 1.  Documentation change per Richard’s suggestion;
> 2.  Command sub options handling per Richard’s suggestion;
> 3.  I386 part, clearing ST registers per Uros’s suggestion;
> 4. Some minor spelling and style fix.
>
> I have tested this new GCC on both x86 and arm64, no regression.
>
> Please let me know whether it’s ready for stage 1 gcc11?
>
> Thanks.
>
>
> ******The changelog:
>
> gcc/ChangeLog:
>
> 2020-10-24  Qing Zhao  <qing.zhao@oracle.com>
>             H.J.Lu  <hjl.tools@gmail.com>
>
>         * common.opt: Add new option -fzero-call-used-regs
>         * config/i386/i386.c (zero_call_used_regno_p): New function.
>         (zero_call_used_regno_mode): Likewise.
>         (zero_all_vector_registers): Likewise.
>         (zero_all_st_mm_registers): Likewise.
>         (ix86_zero_call_used_regs): Likewise.
>         (TARGET_ZERO_CALL_USED_REGS): Define.
>         * df-scan.c (df_epilogue_uses_p): New function.
>         (df_get_exit_block_use_set): Replace EPILOGUE_USES with
>         df_epilogue_uses_p.
>         * df.h (df_epilogue_uses_p): Declare.
>         * doc/extend.texi: Document the new zero_call_used_regs attribute.
>         * doc/invoke.texi: Document the new -fzero-call-used-regs option.
>         * doc/tm.texi: Regenerate.
>         * doc/tm.texi.in (TARGET_ZERO_CALL_USED_REGS): New hook.
>         * emit-rtl.h (struct rtl_data): New fields zero_call_used_regs
>         and must_be_zero_on_return.
>         * flag-types.h (enum  zero_call_used_regs_code): New type.
>         * function.c (gen_call_used_regs_seq): New function.
>         (rest_of_zero_call_used_regs): Likewise.
>         (class pass_zero_call_used_regs): New class.
>         (pass_zero_call_used_regs::gate): New function.
>         (make_pass_zero_call_used_regs): New function.
>         * optabs.c (expand_asm_reg_clobber_mem_blockage): New function.
>         * optabs.h (expand_asm_reg_clobber_mem_blockage): Declare.
>         * opts.c (zero_call_used_regs_opts): New structure array
>         initialization.
>         (parse_zero_call_used_regs_options): New function.
>         (common_handle_option): Handle fzero_call_used_regs_.
>         * opts.h (zero_call_used_regs_opts): New structure array.
>         * passes.def: Add new pass pass_zero_call_used_regs.
>         * resource.c (init_resource_info): Replace EPILOGUE_USES with
>         df_epilogue_uses_p.
>         * target.def (zero_call_used_regs): New hook.
>         * targhooks.c (default_zero_call_used_regs): New function.
>         * targhooks.h (default_zero_call_used_regs): Declare.
>         * tree-pass.h (make_pass_zero_call_used_regs): Declare.
>
> gcc/c-family/ChangeLog:
>
> 2020-10-24  Qing Zhao  <qing.zhao@oracle.com>
>             H.J.Lu  <hjl.tools@gmail.com>
>
>         * c-attribs.c (c_common_attribute_table): Add new attribute
>         zero_call_used_regs.
>         (handle_zero_call_used_regs_attribute): New function.
>
> gcc/testsuite/ChangeLog:
>
> 2020-10-24  Qing Zhao  <qing.zhao@oracle.com>
>             H.J.Lu  <hjl.tools@gmail.com>
>
>         * gcc.target/i386/zero-scratch-regs-1.c: New test.
>         * gcc.target/i386/zero-scratch-regs-10.c: New test.
>         * gcc.target/i386/zero-scratch-regs-11.c: New test.
>         * gcc.target/i386/zero-scratch-regs-12.c: New test.
>         * gcc.target/i386/zero-scratch-regs-13.c: New test.
>         * gcc.target/i386/zero-scratch-regs-14.c: New test.
>         * gcc.target/i386/zero-scratch-regs-15.c: New test.
>         * gcc.target/i386/zero-scratch-regs-16.c: New test.
>         * gcc.target/i386/zero-scratch-regs-17.c: New test.
>         * gcc.target/i386/zero-scratch-regs-18.c: New test.
>         * gcc.target/i386/zero-scratch-regs-19.c: New test.
>         * gcc.target/i386/zero-scratch-regs-2.c: New test.
>         * gcc.target/i386/zero-scratch-regs-20.c: New test.
>         * gcc.target/i386/zero-scratch-regs-21.c: New test.
>         * gcc.target/i386/zero-scratch-regs-22.c: New test.
>         * gcc.target/i386/zero-scratch-regs-23.c: New test.
>         * gcc.target/i386/zero-scratch-regs-24.c: New test.
>         * gcc.target/i386/zero-scratch-regs-25.c: New test.
>         * gcc.target/i386/zero-scratch-regs-26.c: New test.
>         * gcc.target/i386/zero-scratch-regs-27.c: New test.
>         * gcc.target/i386/zero-scratch-regs-3.c: New test.
>         * gcc.target/i386/zero-scratch-regs-4.c: New test.
>         * gcc.target/i386/zero-scratch-regs-5.c: New test.
>         * gcc.target/i386/zero-scratch-regs-6.c: New test.
>         * gcc.target/i386/zero-scratch-regs-7.c: New test.
>         * gcc.target/i386/zero-scratch-regs-8.c: New test.
>         * gcc.target/i386/zero-scratch-regs-9.c: New test.
>
> ******the patch:
>
> From 0da0f0f2203b1e3bb2be8c12a0a1addd3c20a534 Mon Sep 17 00:00:00 2001
> From: qing zhao <qing.zhao@oracle.com>
> Date: Thu, 1 Oct 2020 00:27:57 +0000
> Subject: [PATCH] The 4th version of -fzero-call-used-regs
> We will provide a new feature into GCC:
>
> Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-gpr-arg
>                            |all-arg|used-gpr|all-gpr|used|all]
> command-line option
> and
> zero_call_used_regs("skip|used-gpr-arg|used-arg|all-gpr-arg
>                      |all-arg|used-gpr|all-gpr|used|all")
> function attribues:
>
>   1. -fzero-call-used-regs=skip and zero_call_used_regs("skip")
>   2. -fzero-call-used-regs=used-gpr-arg and zero_call_used_regs("used-gpr-arg")
>   3. -fzero-call-used-regs=used-arg and zero_call_used_regs("used-arg")
>   4. -fzero-call-used-regs=all-gpr-arg and zero_call_used_regs("all-gpr-arg")
>   5. -fzero-call-used-regs=all-arg and zero_call_used_regs("all-arg")
>   6. -fzero-call-used-regs=used-gpr and zero_call_used_regs("used-gpr")
>   7. -fzero-call-used-regs=all-gpr and zero_call_used_regs("all-gpr")
>   8. -fzero-call-used-regs=used and zero_call_used_regs("used")
>   9. -fzero-call-used-regs=all and zero_call_used_regs("all")
>
> ---
>  gcc/c-family/c-attribs.c                           |  31 +++
>  gcc/common.opt                                     |   8 +
>  gcc/config/i386/i386.c                             | 186 ++++++++++++++++++
>  gcc/df-scan.c                                      |  12 +-
>  gcc/df.h                                           |   1 +
>  gcc/doc/extend.texi                                |  43 +++++
>  gcc/doc/invoke.texi                                |  42 +++-
>  gcc/doc/tm.texi                                    |  12 ++
>  gcc/doc/tm.texi.in                                 |   2 +
>  gcc/emit-rtl.h                                     |   6 +
>  gcc/flag-types.h                                   |   9 +
>  gcc/function.c                                     | 213 ++++++++++++++++++++-
>  gcc/optabs.c                                       |  42 ++++
>  gcc/optabs.h                                       |   2 +
>  gcc/opts.c                                         |  47 +++++
>  gcc/opts.h                                         |   6 +
>  gcc/passes.def                                     |   1 +
>  gcc/recog.c                                        |  16 ++
>  gcc/recog.h                                        |   1 +
>  gcc/resource.c                                     |   2 +-
>  gcc/target.def                                     |  15 ++
>  gcc/targhooks.c                                    |  32 ++++
>  gcc/targhooks.h                                    |   1 +
>  gcc/testsuite/c-c++-common/zero-scratch-regs-1.c   |  15 ++
>  gcc/testsuite/c-c++-common/zero-scratch-regs-2.c   |  16 ++
>  .../gcc.target/i386/zero-scratch-regs-1.c          |  12 ++
>  .../gcc.target/i386/zero-scratch-regs-10.c         |  21 ++
>  .../gcc.target/i386/zero-scratch-regs-11.c         |  39 ++++
>  .../gcc.target/i386/zero-scratch-regs-12.c         |  39 ++++
>  .../gcc.target/i386/zero-scratch-regs-13.c         |  21 ++
>  .../gcc.target/i386/zero-scratch-regs-14.c         |  19 ++
>  .../gcc.target/i386/zero-scratch-regs-15.c         |  14 ++
>  .../gcc.target/i386/zero-scratch-regs-16.c         |  14 ++
>  .../gcc.target/i386/zero-scratch-regs-17.c         |  13 ++
>  .../gcc.target/i386/zero-scratch-regs-18.c         |  13 ++
>  .../gcc.target/i386/zero-scratch-regs-19.c         |  12 ++
>  .../gcc.target/i386/zero-scratch-regs-2.c          |  19 ++
>  .../gcc.target/i386/zero-scratch-regs-20.c         |  23 +++
>  .../gcc.target/i386/zero-scratch-regs-21.c         |  14 ++
>  .../gcc.target/i386/zero-scratch-regs-22.c         |  21 ++
>  .../gcc.target/i386/zero-scratch-regs-23.c         |  29 +++
>  .../gcc.target/i386/zero-scratch-regs-24.c         |  10 +
>  .../gcc.target/i386/zero-scratch-regs-25.c         |  10 +
>  .../gcc.target/i386/zero-scratch-regs-26.c         |  23 +++
>  .../gcc.target/i386/zero-scratch-regs-27.c         |  15 ++
>  .../gcc.target/i386/zero-scratch-regs-3.c          |  12 ++
>  .../gcc.target/i386/zero-scratch-regs-4.c          |  14 ++
>  .../gcc.target/i386/zero-scratch-regs-5.c          |  20 ++
>  .../gcc.target/i386/zero-scratch-regs-6.c          |  14 ++
>  .../gcc.target/i386/zero-scratch-regs-7.c          |  13 ++
>  .../gcc.target/i386/zero-scratch-regs-8.c          |  19 ++
>  .../gcc.target/i386/zero-scratch-regs-9.c          |  15 ++
>  gcc/tree-pass.h                                    |   1 +
>  53 files changed, 1244 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
>  create mode 100644 gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-24.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-25.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-26.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-27.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
>
> diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
> index c779d13..979b6a7 100644
> --- a/gcc/c-family/c-attribs.c
> +++ b/gcc/c-family/c-attribs.c
> @@ -138,6 +138,8 @@ static tree handle_target_clones_attribute (tree *, tree, tree, int, bool *);
>  static tree handle_optimize_attribute (tree *, tree, tree, int, bool *);
>  static tree ignore_attribute (tree *, tree, tree, int, bool *);
>  static tree handle_no_split_stack_attribute (tree *, tree, tree, int, bool *);
> +static tree handle_zero_call_used_regs_attribute (tree *, tree, tree, int,
> +                                                 bool *);
>  static tree handle_argspec_attribute (tree *, tree, tree, int, bool *);
>  static tree handle_fnspec_attribute (tree *, tree, tree, int, bool *);
>  static tree handle_warn_unused_attribute (tree *, tree, tree, int, bool *);
> @@ -437,6 +439,8 @@ const struct attribute_spec c_common_attribute_table[] =
>                               ignore_attribute, NULL },
>    { "no_split_stack",        0, 0, true,  false, false, false,
>                               handle_no_split_stack_attribute, NULL },
> +  { "zero_call_used_regs",    1, 1, true, false, false, false,
> +                             handle_zero_call_used_regs_attribute, NULL },
>    /* For internal use only (marking of function arguments).
>       The name contains a space to prevent its usage in source code.  */
>    { "arg spec",                      1, -1, true, false, false, false,
> @@ -4959,6 +4963,33 @@ handle_no_split_stack_attribute (tree *node, tree name,
>    return NULL_TREE;
>  }
>
> +/* Handle a "zero_call_used_regs" attribute; arguments as in
> +   struct attribute_spec.handler.  */
> +
> +static tree
> +handle_zero_call_used_regs_attribute (tree *node, tree name, tree args,
> +                                     int ARG_UNUSED (flags),
> +                                     bool *no_add_attrs)
> +{
> +  tree decl = *node;
> +  tree id = TREE_VALUE (args);
> +
> +  if (TREE_CODE (decl) != FUNCTION_DECL)
> +    {
> +      error_at (DECL_SOURCE_LOCATION (decl),
> +               "%qE attribute applies only to functions", name);
> +      *no_add_attrs = true;
> +    }
> +
> +  if (TREE_CODE (id) != STRING_CST)
> +    {
> +      error ("attribute %qE arguments not a string", name);
> +      *no_add_attrs = true;
> +    }
> +
> +  return NULL_TREE;
> +}
> +
>  /* Handle a "returns_nonnull" attribute; arguments as in
>     struct attribute_spec.handler.  */
>
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 292c2de..4a13f32 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -228,6 +228,10 @@ unsigned int flag_sanitize_coverage
>  Variable
>  bool dump_base_name_prefixed = false
>
> +; What subset of registers should be zeroed
> +Variable
> +unsigned int flag_zero_call_used_regs
> +
>  ###
>  Driver
>
> @@ -3111,6 +3115,10 @@ fzero-initialized-in-bss
>  Common Report Var(flag_zero_initialized_in_bss) Init(1)
>  Put zero initialized data in the bss section.
>
> +fzero-call-used-regs=
> +Common Report RejectNegative Joined
> +Clear call-used registers upon function return.
> +
>  g
>  Common Driver RejectNegative JoinedOrMissing
>  Generate debug information in default format.
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index f684954..e66dcf0 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -3551,6 +3551,189 @@ ix86_function_value_regno_p (const unsigned int regno)
>    return false;
>  }
>
> +/* Check whether the register REGNO should be zeroed on X86.
> +   When ALL_SSE_ZEROED is true, all SSE registers have been zeroed
> +   together, no need to zero it again.
> +   Stack registers (st0-st7) and mm0-mm7 are aliased with each other.
> +   very hard to be zeroed individually, don't zero individual st or
> +   mm registgers.  */
> +
> +static bool
> +zero_call_used_regno_p (const unsigned int regno,
> +                       bool all_sse_zeroed)
> +{
> +  return GENERAL_REGNO_P (regno)
> +        || (!all_sse_zeroed && SSE_REGNO_P (regno))
> +        || MASK_REGNO_P (regno);
> +}
> +
> +/* Return the machine_mode that is used to zero register REGNO.  */
> +
> +static machine_mode
> +zero_call_used_regno_mode (const unsigned int regno)
> +{
> +  /* NB: We only need to zero the lower 32 bits for integer registers
> +     and the lower 128 bits for vector registers since destination are
> +     zero-extended to the full register width.  */
> +  if (GENERAL_REGNO_P (regno))
> +    return SImode;
> +  else if (SSE_REGNO_P (regno))
> +    return V4SFmode;
> +  else
> +    return HImode;
> +}
> +
> +/* Generate a rtx to zero all vector registers together if possible,
> +   otherwise, return NULL.  */
> +
> +static rtx
> +zero_all_vector_registers (HARD_REG_SET need_zeroed_hardregs)
> +{
> +  if (!TARGET_AVX)
> +    return NULL;
> +
> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> +    if ((IN_RANGE (regno, FIRST_SSE_REG, LAST_SSE_REG)
> +        || (TARGET_64BIT
> +            && (REX_SSE_REGNO_P (regno)
> +                || (TARGET_AVX512F && EXT_REX_SSE_REGNO_P (regno)))))
> +       && !TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
> +      return NULL;
> +
> +  return gen_avx_vzeroall ();
> +}
> +
> +/* Generate insns to zero all st/mm registers together.
> +   Return true when zeroing instructions are generated.
> +   Assume the number of st registers that are zeroed is num_of_st,
> +   we will emit the following sequence to zero them together:
> +                 fldz;         \
> +                 fldz;         \
> +                 ...
> +                 fldz;         \
> +                 fstp %%st(0); \
> +                 fstp %%st(0); \
> +                 ...
> +                 fstp %%st(0);
> +   i.e., num_of_st fldz followed by num_of_st fstp to clear the stack
> +   mark stack slots empty.  */
> +
> +static bool
> +zero_all_st_mm_registers (HARD_REG_SET need_zeroed_hardregs)
> +{
> +  unsigned int num_of_st = 0;
> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> +    if (STACK_REGNO_P (regno)
> +       && TEST_HARD_REG_BIT (need_zeroed_hardregs, regno)
> +       /* When the corresponding mm register also need to be cleared too.  */
> +       && TEST_HARD_REG_BIT (need_zeroed_hardregs,
> +                             (regno - FIRST_STACK_REG + FIRST_MMX_REG)))
> +      num_of_st++;

I don't think the above logic is correct. It should go like this:

- If the function is returning an MMX register, then the function
exits in MMX mode, and MMX registers should be cleared in the same way
as XMM registers. Otherwise the ABI specifies that the function exits
in x87 mode and x87 stack should be cleared (but see below).

- There is no direct mapping of stack registers to hard register
numbers. If a stack register is used, we don't know where in the stack
the value remains. So, if _any_ stack register is touched, the whole
stack should be cleared (value, returning in x87 stack register should
obviously be excluded).

- There is no x87 argument register. 32bit targets use MMX0-3 argument
registers and return value in the XMM register. Please also note that
complex values take two stack slots in x87 stack.

Uros.

> +
> +  if (num_of_st == 0)
> +    return false;
> +
> +  rtx st_reg = gen_rtx_REG (XFmode, FIRST_STACK_REG);
> +  for (unsigned int i = 0; i < num_of_st; i++)
> +    emit_insn (gen_rtx_SET (st_reg, CONST0_RTX (XFmode)));
> +
> +  for (unsigned int i = 0; i < num_of_st; i++)
> +    {
> +      rtx insn;
> +      insn = emit_insn (gen_rtx_SET (st_reg, st_reg));
> +      add_reg_note (insn, REG_DEAD, st_reg);
> +    }
> +  return true;
> +}
> +
> +/* TARGET_ZERO_CALL_USED_REGS.  */
> +/* Generate a sequence of instructions that zero registers specified by
> +   NEED_ZEROED_HARDREGS.  Return the ZEROED_HARDREGS that are actually
> +   zeroed.  */
> +static HARD_REG_SET
> +ix86_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
> +{
> +  HARD_REG_SET zeroed_hardregs;
> +  bool all_sse_zeroed = false;
> +  bool st_zeroed = false;
> +
> +  /* first, let's see whether we can zero all vector registers together.  */
> +  rtx zero_all_vec_insn = zero_all_vector_registers (need_zeroed_hardregs);
> +  if (zero_all_vec_insn)
> +    {
> +      emit_insn (zero_all_vec_insn);
> +      all_sse_zeroed = true;
> +    }
> +
> +  /* then, let's see whether we can zero all st+mm registers togeter.  */
> +  st_zeroed = zero_all_st_mm_registers (need_zeroed_hardregs);
> +
> +  /* Now, generate instructions to zero all the registers.  */
> +
> +  CLEAR_HARD_REG_SET (zeroed_hardregs);
> +  if (st_zeroed)
> +    SET_HARD_REG_BIT (zeroed_hardregs, FIRST_STACK_REG);
> +
> +  rtx zero_gpr = NULL_RTX;
> +  rtx zero_vector = NULL_RTX;
> +  rtx zero_mask = NULL_RTX;
> +
> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> +    {
> +      if (!TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
> +       continue;
> +      if (!zero_call_used_regno_p (regno, all_sse_zeroed))
> +       continue;
> +
> +      SET_HARD_REG_BIT (zeroed_hardregs, regno);
> +
> +      rtx reg, tmp;
> +      machine_mode mode = zero_call_used_regno_mode (regno);
> +
> +      reg = gen_rtx_REG (mode, regno);
> +
> +      if (mode == SImode)
> +       if (zero_gpr == NULL_RTX)
> +         {
> +           zero_gpr = reg;
> +           tmp = gen_rtx_SET (reg, const0_rtx);
> +           if (!TARGET_USE_MOV0 || optimize_insn_for_size_p ())
> +             {
> +               rtx clob = gen_rtx_CLOBBER (VOIDmode,
> +                                           gen_rtx_REG (CCmode,
> +                                                        FLAGS_REG));
> +               tmp = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2,
> +                                                            tmp,
> +                                                            clob));
> +             }
> +           emit_insn (tmp);
> +         }
> +       else
> +         emit_move_insn (reg, zero_gpr);
> +      else if (mode == V4SFmode)
> +       if (zero_vector == NULL_RTX)
> +         {
> +           zero_vector = reg;
> +           tmp = gen_rtx_SET (reg, const0_rtx);
> +           emit_insn (tmp);
> +         }
> +       else
> +         emit_move_insn (reg, zero_vector);
> +      else if (mode == HImode)
> +       if (zero_mask == NULL_RTX)
> +         {
> +           zero_mask = reg;
> +           tmp = gen_rtx_SET (reg, const0_rtx);
> +           emit_insn (tmp);
> +         }
> +       else
> +         emit_move_insn (reg, zero_mask);
> +      else
> +       gcc_unreachable ();
> +    }
> +  return zeroed_hardregs;
> +}
> +
>  /* Define how to find the value returned by a function.
>     VALTYPE is the data type of the value (as a tree).
>     If the precise function being called is known, FUNC is its FUNCTION_DECL;
> @@ -23229,6 +23412,9 @@ ix86_run_selftests (void)
>  #undef TARGET_FUNCTION_VALUE_REGNO_P
>  #define TARGET_FUNCTION_VALUE_REGNO_P ix86_function_value_regno_p
>
> +#undef TARGET_ZERO_CALL_USED_REGS
> +#define TARGET_ZERO_CALL_USED_REGS ix86_zero_call_used_regs
> +
>  #undef TARGET_PROMOTE_FUNCTION_MODE
>  #define TARGET_PROMOTE_FUNCTION_MODE ix86_promote_function_mode
>
> diff --git a/gcc/df-scan.c b/gcc/df-scan.c
> index 93b060f..9e75c13 100644
> --- a/gcc/df-scan.c
> +++ b/gcc/df-scan.c
> @@ -3614,6 +3614,14 @@ df_update_entry_block_defs (void)
>  }
>
>
> +/* Return true if REGNO is used by the epilogue.  */
> +bool
> +df_epilogue_uses_p (unsigned int regno)
> +{
> +  return (EPILOGUE_USES (regno)
> +         || TEST_HARD_REG_BIT (crtl->must_be_zero_on_return, regno));
> +}
> +
>  /* Set the bit for regs that are considered being used at the exit. */
>
>  static void
> @@ -3661,7 +3669,7 @@ df_get_exit_block_use_set (bitmap exit_block_uses)
>       epilogue as being live at the end of the function since they
>       may be referenced by our caller.  */
>    for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
> -    if (global_regs[i] || EPILOGUE_USES (i))
> +    if (global_regs[i] || df_epilogue_uses_p (i))
>        bitmap_set_bit (exit_block_uses, i);
>
>    if (targetm.have_epilogue () && epilogue_completed)
> @@ -3802,7 +3810,6 @@ df_hard_reg_init (void)
>    initialized = true;
>  }
>
> -
>  /* Recompute the parts of scanning that are based on regs_ever_live
>     because something changed in that array.  */
>
> @@ -3862,7 +3869,6 @@ df_regs_ever_live_p (unsigned int regno)
>    return regs_ever_live[regno];
>  }
>
> -
>  /* Set regs_ever_live[REGNO] to VALUE.  If this cause regs_ever_live
>     to change, schedule that change for the next update.  */
>
> diff --git a/gcc/df.h b/gcc/df.h
> index 8b6ca8c..0f098d7 100644
> --- a/gcc/df.h
> +++ b/gcc/df.h
> @@ -1085,6 +1085,7 @@ extern void df_update_entry_exit_and_calls (void);
>  extern bool df_hard_reg_used_p (unsigned int);
>  extern unsigned int df_hard_reg_used_count (unsigned int);
>  extern bool df_regs_ever_live_p (unsigned int);
> +extern bool df_epilogue_uses_p (unsigned int);
>  extern void df_set_regs_ever_live (unsigned int, bool);
>  extern void df_compute_regs_ever_live (bool);
>  extern void df_scan_verify (void);
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index c9f7299..3a884e1 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -3992,6 +3992,49 @@ performing a link with relocatable output (i.e.@: @code{ld -r}) on them.
>  A declaration to which @code{weakref} is attached and that is associated
>  with a named @code{target} must be @code{static}.
>
> +@item zero_call_used_regs ("@var{choice}")
> +@cindex @code{zero_call_used_regs} function attribute
> +
> +The @code{zero_call_used_regs} attribute causes the compiler to zero
> +a subset of all call-used registers at function return according to
> +@var{choice}.
> +This is used to increase the program security by either mitigating
> +Return-Oriented Programming (ROP) or preventing information leak
> +through registers.
> +
> +A "call-used" register is a register that is clobbered by function calls,
> +as a result, the caller has to save and restore it before or after a
> +function call.  It is also called as "call-clobbered", "caller-saved", or
> +"volatile".
> +
> +In order to satisfy users with different security needs and control the
> +run-time overhead at the same time,  GCC provides a flexible way to choose
> +the subset of the call-used registers to be zeroed.
> +
> +@samp{skip} doesn't zero any call-used registers.
> +@samp{used} zeros call-used registers which are used in the function.  A "used"
> +register is one whose content has been set or referenced in the function.
> +@samp{all} zeros all call-used registers.
> +
> +In addition to the above three basic choices, the register set can be further
> +limited by adding "-gpr" (i.e., general purpose register), "-arg" (i.e.,
> +argument register), or both as following:
> +
> +@samp{used-gpr-arg} zeros used call-used general purpose registers that
> +pass parameters.
> +@samp{used-arg} zeros used call-used registers that pass parameters.
> +@samp{all-gpr-arg} zeros all call-used general purpose registers that pass
> +parameters.
> +@samp{all-arg} zeros all call-used registers that pass parameters.
> +@samp{used-gpr} zeros call-used general purpose registers which are used in the
> +function.
> +@samp{all-gpr} zeros all call-used general purpose registers.
> +
> +Among this list, "used-gpr-arg", "used-arg", "all-gpr-arg", and "all-arg" are
> +mainly used for ROP mitigation.
> +
> +The default for the attribute is controlled by @option{-fzero-call-used-regs}.
> +
>  @end table
>
>  @c This is the end of the target-independent attribute table
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index c049932..c6837d7 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -550,7 +550,7 @@ Objective-C and Objective-C++ Dialects}.
>  -funit-at-a-time  -funroll-all-loops  -funroll-loops @gol
>  -funsafe-math-optimizations  -funswitch-loops @gol
>  -fipa-ra  -fvariable-expansion-in-unroller  -fvect-cost-model  -fvpt @gol
> --fweb  -fwhole-program  -fwpa  -fuse-linker-plugin @gol
> +-fweb  -fwhole-program  -fwpa  -fuse-linker-plugin -fzero-call-used-regs @gol
>  --param @var{name}=@var{value}
>  -O  -O0  -O1  -O2  -O3  -Os  -Ofast  -Og}
>
> @@ -12550,6 +12550,46 @@ int foo (void)
>
>  Not all targets support this option.
>
> +@item -fzero-call-used-regs=@var{choice}
> +@opindex fzero-call-used-regs
> +Zero call-used registers at function return to increase the program
> +security by either mitigating Return-Oriented Programming (ROP) or
> +preventing information leak through registers.
> +
> +A "call-used" register is a register that is clobbered by function calls,
> +as a result, the caller has to save and restore it before or after a
> +function call.  It is also called as "call-clobbered", "caller-saved", or
> +"volatile".
> +
> +In order to satisfy users with different security needs and control the
> +run-time overhead at the same time,  GCC provides a flexible way to choose
> +the subset of the call-used registers to be zeroed.
> +
> +@samp{skip}, which is the default, doesn't zero any call-used registers.
> +@samp{used} zeros call-used registers which are used in the function.  A "used"
> +register is one whose content has been set or referenced in the function.
> +@samp{all} zeros all call-used registers.
> +
> +In addition to the above three basic choices, the register set can be further
> +limited by adding "-gpr" (i.e., general purpose register), "-arg" (i.e.,
> +argument register), or both as following:
> +
> +@samp{used-gpr-arg} zeros used call-used general purpose registers that
> +pass parameters.
> +@samp{used-arg} zeros used call-used registers that pass parameters.
> +@samp{all-gpr-arg} zeros all call-used general purpose registers that pass
> +parameters.
> +@samp{all-arg} zeros all call-used registers that pass parameters.
> +@samp{used-gpr} zeros call-used general purpose registers which are used in the
> +function.
> +@samp{all-gpr} zeros all call-used general purpose registers.
> +
> +Among this list, "used-gpr-arg", "used-arg", "all-gpr-arg", and "all-arg" are
> +mainly used for ROP mitigation.
> +
> +You can control this behavior for a specific function by using the function
> +attribute @code{zero_call_used_regs}.  @xref{Function Attributes}.
> +
>  @item --param @var{name}=@var{value}
>  @opindex param
>  In some places, GCC uses various constants to control the amount of
> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> index 97437e8..3b75c46 100644
> --- a/gcc/doc/tm.texi
> +++ b/gcc/doc/tm.texi
> @@ -12053,6 +12053,18 @@ argument list due to stack realignment.  Return @code{NULL} if no DRAP
>  is needed.
>  @end deftypefn
>
> +@deftypefn {Target Hook} HARD_REG_SET TARGET_ZERO_CALL_USED_REGS (HARD_REG_SET @var{selected_regs})
> +This target hook emits instructions to zero subset of @var{selected_regs}
> +that could conceivably contain values that are useful to an attacker.
> +Return the set of registers that were actually cleared.
> +
> +The default implementation uses normal move instructions to zero
> +all the registers in @var{selected_regs}.  Define this hook if the
> +target has more efficient ways of zeroing certain registers,
> +or if you believe that certain registers would never contain
> +values that are useful to an attacker.
> +@end deftypefn
> +
>  @deftypefn {Target Hook} bool TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS (void)
>  When optimization is disabled, this hook indicates whether or not
>  arguments should be allocated to stack slots.  Normally, GCC allocates
> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> index 412e22c..a67dbea 100644
> --- a/gcc/doc/tm.texi.in
> +++ b/gcc/doc/tm.texi.in
> @@ -8111,6 +8111,8 @@ and the associated definitions of those functions.
>
>  @hook TARGET_GET_DRAP_RTX
>
> +@hook TARGET_ZERO_CALL_USED_REGS
> +
>  @hook TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS
>
>  @hook TARGET_CONST_ANCHOR
> diff --git a/gcc/emit-rtl.h b/gcc/emit-rtl.h
> index 92ad0dd6..d7bdb66 100644
> --- a/gcc/emit-rtl.h
> +++ b/gcc/emit-rtl.h
> @@ -173,6 +173,9 @@ struct GTY(()) rtl_data {
>          local stack.  */
>    unsigned int stack_alignment_estimated;
>
> +  /* How to zero call-used regsiters for this routine.  */
> +  unsigned int zero_call_used_regs;
> +
>    /* How many NOP insns to place at each function entry by default.  */
>    unsigned short patch_area_size;
>
> @@ -310,6 +313,9 @@ struct GTY(()) rtl_data {
>       sets them.  */
>    HARD_REG_SET asm_clobbers;
>
> +  /* All hard registers that need to be zeroed at the return of the routine.  */
> +  HARD_REG_SET must_be_zero_on_return;
> +
>    /* The highest address seen during shorten_branches.  */
>    int max_insn_address;
>  };
> diff --git a/gcc/flag-types.h b/gcc/flag-types.h
> index 852ea76..0f7e503 100644
> --- a/gcc/flag-types.h
> +++ b/gcc/flag-types.h
> @@ -285,6 +285,15 @@ enum sanitize_code {
>                                   | SANITIZE_BOUNDS_STRICT
>  };
>
> +enum  zero_call_used_regs_code {
> +  UNSET = 0,
> +  SKIP = 1UL << 0,
> +  ONLY_USED = 1UL << 1,
> +  ONLY_GPR = 1UL << 2,
> +  ONLY_ARG = 1UL << 3,
> +  ALL = 1UL << 4
> +};
> +
>  /* Settings of flag_incremental_link.  */
>  enum incremental_link {
>    INCREMENTAL_LINK_NONE,
> diff --git a/gcc/function.c b/gcc/function.c
> index c612959..56e9997 100644
> --- a/gcc/function.c
> +++ b/gcc/function.c
> @@ -46,10 +46,12 @@ along with GCC; see the file COPYING3.  If not see
>  #include "stringpool.h"
>  #include "expmed.h"
>  #include "optabs.h"
> +#include "opts.h"
>  #include "regs.h"
>  #include "emit-rtl.h"
>  #include "recog.h"
>  #include "rtl-error.h"
> +#include "hard-reg-set.h"
>  #include "alias.h"
>  #include "fold-const.h"
>  #include "stor-layout.h"
> @@ -5815,6 +5817,102 @@ make_prologue_seq (void)
>    return seq;
>  }
>
> +/* Emit a sequence of insns to zero the call-used-registers before RET.  */
> +
> +static void
> +gen_call_used_regs_seq (rtx_insn *ret)
> +{
> +  bool gpr_only = true;
> +  bool used_only = true;
> +  bool arg_only = true;
> +
> +  /* No need to zero call-used-regs in main ().  */
> +  if (MAIN_NAME_P (DECL_NAME (current_function_decl)))
> +    return;
> +
> +  /* No need to zero call-used-regs if __builtin_eh_return is called
> +     since it isn't a normal function return.  */
> +  if (crtl->calls_eh_return)
> +    return;
> +
> +  /* If gpr_only is true, only zero call-used-registers that are
> +     general-purpose registers; if used_only is true, only zero
> +     call-used-registers that are used in the current function.  */
> +
> +  gpr_only = crtl->zero_call_used_regs & ONLY_GPR;
> +  used_only = crtl->zero_call_used_regs & ONLY_USED;
> +  arg_only = crtl->zero_call_used_regs & ONLY_ARG;
> +
> +  /* For each of the hard registers, check to see whether we should zero it if:
> +     1. it is a call-used-registers;
> + and 2. it is not a fixed-registers;
> + and 3. it is not live at the return of the routine;
> + and 4. it is general registor if gpr_only is true;
> + and 5. it is used in the routine if used_only is true;
> + and 6. it is a register that passes parameter if arg_only is true;
> +   */
> +
> +  /* First, prepare the data flow information.  */
> +  basic_block bb = BLOCK_FOR_INSN (ret);
> +  bitmap live_out;
> +  live_out = BITMAP_ALLOC (NULL);
> +  bitmap_copy (live_out, df_get_live_out (bb));
> +  df_simulate_initialize_backwards (bb, live_out);
> +  df_simulate_one_insn_backwards (bb, ret, live_out);
> +
> +  HARD_REG_SET need_zeroed_hardregs;
> +  CLEAR_HARD_REG_SET (need_zeroed_hardregs);
> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> +    {
> +      if (!crtl->abi->clobbers_full_reg_p (regno))
> +       continue;
> +      if (fixed_regs[regno])
> +       continue;
> +      if (REGNO_REG_SET_P (live_out, regno))
> +       continue;
> +      if (gpr_only
> +         && !TEST_HARD_REG_BIT (reg_class_contents[GENERAL_REGS], regno))
> +       continue;
> +      if (used_only && !df_regs_ever_live_p (regno))
> +       continue;
> +      if (arg_only && !FUNCTION_ARG_REGNO_P (regno))
> +       continue;
> +
> +      /* Now this is a register that we might want to zero.  */
> +      SET_HARD_REG_BIT (need_zeroed_hardregs, regno);
> +    }
> +
> +  BITMAP_FREE (live_out);
> +
> +  if (hard_reg_set_empty_p (need_zeroed_hardregs))
> +    return;
> +
> +  /* Now we get a hard register set that need to be zeroed, pass it to
> +     target to generate zeroing sequence.  */
> +  HARD_REG_SET zeroed_hardregs;
> +  start_sequence ();
> +  zeroed_hardregs = targetm.calls.zero_call_used_regs (need_zeroed_hardregs);
> +  rtx_insn *seq = get_insns ();
> +  end_sequence ();
> +  if (seq)
> +    {
> +      /* Emit the memory blockage and register clobber asm volatile before
> +        the whole sequence.  */
> +      start_sequence ();
> +      expand_asm_reg_clobber_mem_blockage (zeroed_hardregs);
> +      rtx_insn *seq_barrier = get_insns ();
> +      end_sequence ();
> +
> +      emit_insn_before (seq_barrier, ret);
> +      emit_insn_before (seq, ret);
> +
> +      /* Update the data flow information.  */
> +      crtl->must_be_zero_on_return |= zeroed_hardregs;
> +      df_set_bb_dirty (EXIT_BLOCK_PTR_FOR_FN (cfun));
> +    }
> +}
> +
> +
>  /* Return a sequence to be used as the epilogue for the current function,
>     or NULL.  */
>
> @@ -6486,7 +6584,120 @@ make_pass_thread_prologue_and_epilogue (gcc::context *ctxt)
>  {
>    return new pass_thread_prologue_and_epilogue (ctxt);
>  }
> -
>
> +
> +static unsigned int
> +rest_of_zero_call_used_regs (void)
> +{
> +  edge_iterator ei;
> +  edge e;
> +  rtx_insn *insn;
> +
> +  /* This pass needs data flow information.  */
> +  df_analyze ();
> +
> +  /* Search all the "return"s in the routine, and insert instruction sequence to
> +     zero the call used registers.  */
> +  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
> +    {
> +      insn = BB_END (e->src);
> +      if (JUMP_P (insn) && ANY_RETURN_P (JUMP_LABEL (insn)))
> +       gen_call_used_regs_seq (insn);
> +    }
> +
> +  return 0;
> +}
> +
> +namespace {
> +
> +const pass_data pass_data_zero_call_used_regs =
> +{
> +  RTL_PASS, /* type */
> +  "zero_call_used_regs", /* name */
> +  OPTGROUP_NONE, /* optinfo_flags */
> +  TV_NONE, /* tv_id */
> +  0, /* properties_required */
> +  0, /* properties_provided */
> +  0, /* properties_destroyed */
> +  0, /* todo_flags_start */
> +  0, /* todo_flags_finish */
> +};
> +
> +class pass_zero_call_used_regs: public rtl_opt_pass
> +{
> +public:
> +  pass_zero_call_used_regs (gcc::context *ctxt)
> +    : rtl_opt_pass (pass_data_zero_call_used_regs, ctxt)
> +  {}
> +
> +  /* opt_pass methods: */
> +  virtual bool gate (function *);
> +
> +  virtual unsigned int execute (function *)
> +    {
> +      return rest_of_zero_call_used_regs ();
> +    }
> +
> +}; // class pass_zero_call_used_regs
> +
> +bool
> +pass_zero_call_used_regs::gate (function *fun)
> +{
> +  unsigned int zero_regs_type = UNSET;
> +  unsigned int attr_zero_regs_type = UNSET;
> +
> +  tree attr_zero_regs
> +       = lookup_attribute ("zero_call_used_regs",
> +                           DECL_ATTRIBUTES (fun->decl));
> +
> +  /* Get the type of zero_call_used_regs from function attribute.  */
> +  if (attr_zero_regs)
> +    {
> +      bool found = false;
> +      unsigned int i;
> +
> +      /* The TREE_VALUE of an attribute is a TREE_LIST whose TREE_VALUE
> +        is the attribute argument's value.  */
> +      attr_zero_regs = TREE_VALUE (attr_zero_regs);
> +      gcc_assert (TREE_CODE (attr_zero_regs) == TREE_LIST);
> +      attr_zero_regs = TREE_VALUE (attr_zero_regs);
> +      gcc_assert (TREE_CODE (attr_zero_regs) == STRING_CST);
> +
> +      for (i = 0; zero_call_used_regs_opts[i].name != NULL; ++i)
> +       if (strcmp (TREE_STRING_POINTER (attr_zero_regs),
> +                    zero_call_used_regs_opts[i].name) == 0)
> +         {
> +           attr_zero_regs_type |= zero_call_used_regs_opts[i].flag;
> +           found = true;
> +           break;
> +         }
> +
> +      if (!found)
> +       warning_at (DECL_SOURCE_LOCATION (fun->decl), 0,
> +                   "unrecognized zero_call_used_regs attribute: %qs",
> +                   TREE_STRING_POINTER (attr_zero_regs));
> +    }
> +
> +  if (flag_zero_call_used_regs)
> +    if (!attr_zero_regs)
> +      zero_regs_type = flag_zero_call_used_regs;
> +    else
> +      zero_regs_type = attr_zero_regs_type;
> +  else
> +    zero_regs_type = attr_zero_regs_type;
> +
> +  crtl->zero_call_used_regs = zero_regs_type;
> +
> +  /* No need to zero call-used-regs when no user request is present.  */
> +  return zero_regs_type > SKIP;
> +}
> +
> +} // anon namespace
> +
> +rtl_opt_pass *
> +make_pass_zero_call_used_regs (gcc::context *ctxt)
> +{
> +  return new pass_zero_call_used_regs (ctxt);
> +}
>
>  /* If CONSTRAINT is a matching constraint, then return its number.
>     Otherwise, return -1.  */
> diff --git a/gcc/optabs.c b/gcc/optabs.c
> index 8ad7f4b..bd64af0 100644
> --- a/gcc/optabs.c
> +++ b/gcc/optabs.c
> @@ -6484,6 +6484,48 @@ expand_memory_blockage (void)
>      expand_asm_memory_blockage ();
>  }
>
> +/* Generate asm volatile("" : : : "memory") as a memory blockage, at the
> +   same time clobbering the register set specified by REGS.  */
> +
> +void
> +expand_asm_reg_clobber_mem_blockage (HARD_REG_SET regs)
> +{
> +  rtx asm_op, clob_mem;
> +
> +  unsigned int num_of_regs = 0;
> +  for (unsigned int i = 0; i < FIRST_PSEUDO_REGISTER; i++)
> +    if (TEST_HARD_REG_BIT (regs, i))
> +      num_of_regs++;
> +
> +  asm_op = gen_rtx_ASM_OPERANDS (VOIDmode, "", "", 0,
> +                                rtvec_alloc (0), rtvec_alloc (0),
> +                                rtvec_alloc (0), UNKNOWN_LOCATION);
> +  MEM_VOLATILE_P (asm_op) = 1;
> +
> +  rtvec v = rtvec_alloc (num_of_regs + 2);
> +
> +  clob_mem = gen_rtx_SCRATCH (VOIDmode);
> +  clob_mem = gen_rtx_MEM (BLKmode, clob_mem);
> +  clob_mem = gen_rtx_CLOBBER (VOIDmode, clob_mem);
> +
> +  RTVEC_ELT (v,0) = asm_op;
> +  RTVEC_ELT (v,1) = clob_mem;
> +
> +  if (num_of_regs > 0)
> +    {
> +      unsigned int j = 2;
> +      for (unsigned int i = 0; i < FIRST_PSEUDO_REGISTER; i++)
> +       if (TEST_HARD_REG_BIT (regs, i))
> +         {
> +           RTVEC_ELT (v,j) = gen_rtx_CLOBBER (VOIDmode, regno_reg_rtx[i]);
> +           j++;
> +         }
> +      gcc_assert (j == (num_of_regs + 2));
> +    }
> +
> +  emit_insn (gen_rtx_PARALLEL (VOIDmode, v));
> +}
> +
>  /* This routine will either emit the mem_thread_fence pattern or issue a
>     sync_synchronize to generate a fence for memory model MEMMODEL.  */
>
> diff --git a/gcc/optabs.h b/gcc/optabs.h
> index 0b14700..bfa10c8 100644
> --- a/gcc/optabs.h
> +++ b/gcc/optabs.h
> @@ -345,6 +345,8 @@ rtx expand_atomic_store (rtx, rtx, enum memmodel, bool);
>  rtx expand_atomic_fetch_op (rtx, rtx, rtx, enum rtx_code, enum memmodel,
>                               bool);
>
> +extern void expand_asm_reg_clobber_mem_blockage (HARD_REG_SET);
> +
>  extern bool insn_operand_matches (enum insn_code icode, unsigned int opno,
>                                   rtx operand);
>  extern bool valid_multiword_target_p (rtx);
> diff --git a/gcc/opts.c b/gcc/opts.c
> index 3bda59a..f95a1f0 100644
> --- a/gcc/opts.c
> +++ b/gcc/opts.c
> @@ -1776,6 +1776,24 @@ const struct sanitizer_opts_s coverage_sanitizer_opts[] =
>    { NULL, 0U, 0UL, false }
>  };
>
> +/* -fzero-call-used-regs= suboptions.  */
> +const struct zero_call_used_regs_opts_s zero_call_used_regs_opts[] =
> +{
> +#define ZERO_CALL_USED_REGS_OPT(name, flags) \
> +    { #name, flags }
> +  ZERO_CALL_USED_REGS_OPT (skip, SKIP),
> +  ZERO_CALL_USED_REGS_OPT (used-gpr-arg, (ONLY_USED | ONLY_GPR | ONLY_ARG)),
> +  ZERO_CALL_USED_REGS_OPT (used-arg, (ONLY_USED | ONLY_ARG)),
> +  ZERO_CALL_USED_REGS_OPT (all-gpr-arg, (ONLY_GPR | ONLY_ARG)),
> +  ZERO_CALL_USED_REGS_OPT (all-arg, ONLY_ARG),
> +  ZERO_CALL_USED_REGS_OPT (used-gpr, (ONLY_USED | ONLY_GPR)),
> +  ZERO_CALL_USED_REGS_OPT (all-gpr, ONLY_GPR),
> +  ZERO_CALL_USED_REGS_OPT (used, ONLY_USED),
> +  ZERO_CALL_USED_REGS_OPT (all, ALL),
> +#undef ZERO_CALL_USED_REGS_OPT
> +  {NULL, 0U}
> +};
> +
>  /* A struct for describing a run of chars within a string.  */
>
>  class string_fragment
> @@ -1970,6 +1988,30 @@ parse_no_sanitize_attribute (char *value)
>    return flags;
>  }
>
> +/* Parse -fzero-call-used-regs suboptions from ARG, return the FLAGS.  */
> +
> +unsigned int
> +parse_zero_call_used_regs_options (const char *arg)
> +{
> +  bool found = false;
> +  unsigned int flags = 0;
> +  unsigned int i;
> +
> +  /* Check to see if the string matches a sub-option name.  */
> +  for (i = 0; zero_call_used_regs_opts[i].name != NULL; ++i)
> +    if (strcmp (arg, zero_call_used_regs_opts[i].name) == 0)
> +      {
> +       flags |= zero_call_used_regs_opts[i].flag;
> +       found = true;
> +       break;
> +      }
> +
> +  if (!found)
> +    error ("unrecognized argument to %<-fzero-call-used-regs=%>: %qs", arg);
> +
> +  return flags;
> +}
> +
>  /* Parse -falign-NAME format for a FLAG value.  Return individual
>     parsed integer values into RESULT_VALUES array.  If REPORT_ERROR is
>     set, print error message at LOC location.  */
> @@ -2601,6 +2643,11 @@ common_handle_option (struct gcc_options *opts,
>        /* Automatically sets -ftree-loop-vectorize and
>          -ftree-slp-vectorize.  Nothing more to do here.  */
>        break;
> +    case OPT_fzero_call_used_regs_:
> +      opts->x_flag_zero_call_used_regs
> +       = parse_zero_call_used_regs_options (arg);
> +      break;
> +
>      case OPT_fshow_column:
>        dc->show_column = value;
>        break;
> diff --git a/gcc/opts.h b/gcc/opts.h
> index 8f594b4..7d1e126 100644
> --- a/gcc/opts.h
> +++ b/gcc/opts.h
> @@ -444,6 +444,12 @@ extern const struct sanitizer_opts_s
>    bool can_recover;
>  } sanitizer_opts[];
>
> +extern const struct zero_call_used_regs_opts_s
> +{
> +  const char *const name;
> +  unsigned int flag;
> +} zero_call_used_regs_opts[];
> +
>  extern vec<const char *> help_option_arguments;
>
>  extern void add_misspelling_candidates (auto_vec<char *> *candidates,
> diff --git a/gcc/passes.def b/gcc/passes.def
> index f865bdc..77d4676 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -492,6 +492,7 @@ along with GCC; see the file COPYING3.  If not see
>        POP_INSERT_PASSES ()
>        NEXT_PASS (pass_late_compilation);
>        PUSH_INSERT_PASSES_WITHIN (pass_late_compilation)
> +         NEXT_PASS (pass_zero_call_used_regs);
>           NEXT_PASS (pass_compute_alignments);
>           NEXT_PASS (pass_variable_tracking);
>           NEXT_PASS (pass_free_cfg);
> diff --git a/gcc/recog.c b/gcc/recog.c
> index ce83b7f..e231b5d 100644
> --- a/gcc/recog.c
> +++ b/gcc/recog.c
> @@ -923,6 +923,22 @@ validate_simplify_insn (rtx_insn *insn)
>    return ((num_changes_pending () > 0) && (apply_change_group () > 0));
>  }
>
>
> +
> +/* Check whether INSN matches a specific alternative of an .md pattern.  */
> +bool
> +valid_insn_p (rtx_insn *insn)
> +{
> +  recog_memoized (insn);
> +  if (INSN_CODE (insn) < 0)
> +    return false;
> +  extract_insn (insn);
> +  /* We don't know whether the insn will be in code that is optimized
> +     for size or speed, so consider all enabled alternatives.  */
> +  if (!constrain_operands (1, get_enabled_alternatives (insn)))
> +    return false;
> +  return true;
> +}
> +
>  /* Return 1 if OP is a valid general operand for machine mode MODE.
>     This is either a register reference, a memory reference,
>     or a constant.  In the case of a memory reference, the address
> diff --git a/gcc/recog.h b/gcc/recog.h
> index ae3675f..d87456c 100644
> --- a/gcc/recog.h
> +++ b/gcc/recog.h
> @@ -113,6 +113,7 @@ extern void validate_replace_src_group (rtx, rtx, rtx_insn *);
>  extern bool validate_simplify_insn (rtx_insn *insn);
>  extern int num_changes_pending (void);
>  extern bool reg_fits_class_p (const_rtx, reg_class_t, int, machine_mode);
> +extern bool valid_insn_p (rtx_insn *);
>
>  extern int offsettable_memref_p (rtx);
>  extern int offsettable_nonstrict_memref_p (rtx);
> diff --git a/gcc/resource.c b/gcc/resource.c
> index 0a9d594..90cf091 100644
> --- a/gcc/resource.c
> +++ b/gcc/resource.c
> @@ -1186,7 +1186,7 @@ init_resource_info (rtx_insn *epilogue_insn)
>                                &end_of_function_needs, true);
>
>    for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
> -    if (global_regs[i] || EPILOGUE_USES (i))
> +    if (global_regs[i] || df_epilogue_uses_p (i))
>        SET_HARD_REG_BIT (end_of_function_needs.regs, i);
>
>    /* The registers required to be live at the end of the function are
> diff --git a/gcc/target.def b/gcc/target.def
> index ed2da15..20e7f81 100644
> --- a/gcc/target.def
> +++ b/gcc/target.def
> @@ -5080,6 +5080,21 @@ argument list due to stack realignment.  Return @code{NULL} if no DRAP\n\
>  is needed.",
>   rtx, (void), NULL)
>
> +/* Generate instruction sequence to zero call used registers.  */
> +DEFHOOK
> +(zero_call_used_regs,
> + "This target hook emits instructions to zero subset of @var{selected_regs}\n\
> +that could conceivably contain values that are useful to an attacker.\n\
> +Return the set of registers that were actually cleared.\n\
> +\n\
> +The default implementation uses normal move instructions to zero\n\
> +all the registers in @var{selected_regs}.  Define this hook if the\n\
> +target has more efficient ways of zeroing certain registers,\n\
> +or if you believe that certain registers would never contain\n\
> +values that are useful to an attacker.",
> + HARD_REG_SET, (HARD_REG_SET selected_regs),
> +default_zero_call_used_regs)
> +
>  /* Return true if all function parameters should be spilled to the
>     stack.  */
>  DEFHOOK
> diff --git a/gcc/targhooks.c b/gcc/targhooks.c
> index 5d94fce..88eef00 100644
> --- a/gcc/targhooks.c
> +++ b/gcc/targhooks.c
> @@ -56,6 +56,9 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-ssa-alias.h"
>  #include "gimple-expr.h"
>  #include "memmodel.h"
> +#include "backend.h"
> +#include "emit-rtl.h"
> +#include "df.h"
>  #include "tm_p.h"
>  #include "stringpool.h"
>  #include "tree-vrp.h"
> @@ -987,6 +990,35 @@ default_function_value_regno_p (const unsigned int regno ATTRIBUTE_UNUSED)
>  #endif
>  }
>
> +/* The default hook for TARGET_ZERO_CALL_USED_REGS.  */
> +
> +HARD_REG_SET
> +default_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
> +{
> +  gcc_assert (!hard_reg_set_empty_p (need_zeroed_hardregs));
> +
> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> +    if (TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
> +      {
> +       rtx_insn *last_insn = get_last_insn ();
> +       machine_mode mode = GET_MODE (regno_reg_rtx[regno]);
> +       rtx zero = CONST0_RTX (mode);
> +       rtx_insn *insn = emit_move_insn (regno_reg_rtx[regno], zero);
> +       if (!valid_insn_p (insn))
> +         {
> +           static bool issued_error;
> +           if (!issued_error)
> +             {
> +               issued_error = true;
> +               sorry ("%qs not supported on this target",
> +                       "fzero-call-used_regs");
> +             }
> +           delete_insns_since (last_insn);
> +         }
> +      }
> +  return need_zeroed_hardregs;
> +}
> +
>  rtx
>  default_internal_arg_pointer (void)
>  {
> diff --git a/gcc/targhooks.h b/gcc/targhooks.h
> index 44ab926..e0a925f 100644
> --- a/gcc/targhooks.h
> +++ b/gcc/targhooks.h
> @@ -160,6 +160,7 @@ extern unsigned int default_function_arg_round_boundary (machine_mode,
>                                                          const_tree);
>  extern bool hook_bool_const_rtx_commutative_p (const_rtx, int);
>  extern rtx default_function_value (const_tree, const_tree, bool);
> +extern HARD_REG_SET default_zero_call_used_regs (HARD_REG_SET);
>  extern rtx default_libcall_value (machine_mode, const_rtx);
>  extern bool default_function_value_regno_p (const unsigned int);
>  extern rtx default_internal_arg_pointer (void);
> diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c b/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
> new file mode 100644
> index 0000000..f44add9
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
> @@ -0,0 +1,15 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all" } */
> +
> +volatile int result = 0;
> +int
> +__attribute__((noinline))
> +foo (int x)
> +{
> +  return x;
> +}
> +int main()
> +{
> +  result = foo (2);
> +  return 0;
> +}
> diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c b/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
> new file mode 100644
> index 0000000..7c8350b
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
> @@ -0,0 +1,16 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2" } */
> +
> +volatile int result = 0;
> +int
> +__attribute__((noinline))
> +__attribute__ ((zero_call_used_regs("all")))
> +foo (int x)
> +{
> +  return x;
> +}
> +int main()
> +{
> +  result = foo (2);
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
> new file mode 100644
> index 0000000..9f61dc4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=used" } */
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
> new file mode 100644
> index 0000000..09048e5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> +
> +extern int foo (int) __attribute__ ((zero_call_used_regs("all-gpr")));
> +
> +int
> +foo (int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
> new file mode 100644
> index 0000000..4862688
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
> @@ -0,0 +1,39 @@
> +/* { dg-do run { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=used-gpr" } */
> +
> +struct S { int i; };
> +__attribute__((const, noinline, noclone))
> +struct S foo (int x)
> +{
> +  struct S s;
> +  s.i = x;
> +  return s;
> +}
> +
> +int a[2048], b[2048], c[2048], d[2048];
> +struct S e[2048];
> +
> +__attribute__((noinline, noclone)) void
> +bar (void)
> +{
> +  int i;
> +  for (i = 0; i < 1024; i++)
> +    {
> +      e[i] = foo (i);
> +      a[i+2] = a[i] + a[i+1];
> +      b[10] = b[10] + i;
> +      c[i] = c[2047 - i];
> +      d[i] = d[i + 1];
> +    }
> +}
> +
> +int
> +main ()
> +{
> +  int i;
> +  bar ();
> +  for (i = 0; i < 1024; i++)
> +    if (e[i].i != i)
> +      __builtin_abort ();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
> new file mode 100644
> index 0000000..500251b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
> @@ -0,0 +1,39 @@
> +/* { dg-do run { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> +
> +struct S { int i; };
> +__attribute__((const, noinline, noclone))
> +struct S foo (int x)
> +{
> +  struct S s;
> +  s.i = x;
> +  return s;
> +}
> +
> +int a[2048], b[2048], c[2048], d[2048];
> +struct S e[2048];
> +
> +__attribute__((noinline, noclone)) void
> +bar (void)
> +{
> +  int i;
> +  for (i = 0; i < 1024; i++)
> +    {
> +      e[i] = foo (i);
> +      a[i+2] = a[i] + a[i+1];
> +      b[10] = b[10] + i;
> +      c[i] = c[2047 - i];
> +      d[i] = d[i + 1];
> +    }
> +}
> +
> +int
> +main ()
> +{
> +  int i;
> +  bar ();
> +  for (i = 0; i < 1024; i++)
> +    if (e[i].i != i)
> +      __builtin_abort ();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
> new file mode 100644
> index 0000000..8b058e3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7" } */
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm0, %xmm0" } } */
> +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 7 { target { ia32 } } } } */
> +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 15 { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
> new file mode 100644
> index 0000000..d4eaaf7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx" } */
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-times "vzeroall" 1 } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
> new file mode 100644
> index 0000000..dd3bb90
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> +
> +extern void foo (void) __attribute__ ((zero_call_used_regs("used")));
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
> new file mode 100644
> index 0000000..e2274f6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all" } */
> +
> +extern void foo (void) __attribute__ ((zero_call_used_regs("skip")));
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
> new file mode 100644
> index 0000000..7f5d153
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=used" } */
> +
> +int
> +foo (int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
> new file mode 100644
> index 0000000..fe13d2b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=used -march=corei7" } */
> +
> +float
> +foo (float z, float y, float x)
> +{
> +  return x + y;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm1, %xmm2" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
> new file mode 100644
> index 0000000..205a532
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=used -march=corei7" } */
> +
> +float
> +foo (float z, float y, float x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm2, %xmm2" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
> new file mode 100644
> index 0000000..e046684
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
> new file mode 100644
> index 0000000..4be8ff6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7" } */
> +
> +float
> +foo (float z, float y, float x)
> +{
> +  return x + y;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm0, %xmm0" { target { ia32 } } } } */
> +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 7 { target { ia32 } } } } */
> +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm1, %xmm\[0-9\]+" 14 { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
> new file mode 100644
> index 0000000..0eb34e0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=skip -march=corei7" } */
> +
> +__attribute__ ((zero_call_used_regs("used")))
> +float
> +foo (float z, float y, float x)
> +{
> +  return x + y;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm1, %xmm2" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
> new file mode 100644
> index 0000000..0258c70
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx" } */
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler "vzeroall" } } */
> +/* { dg-final { scan-assembler-times "fldz" 8 } } */
> +/* { dg-final { scan-assembler-times "fstp" 8 } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
> new file mode 100644
> index 0000000..0625eb5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
> @@ -0,0 +1,29 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx512f" } */
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler "vzeroall" } } */
> +/* { dg-final { scan-assembler-times "fldz" 8 } } */
> +/* { dg-final { scan-assembler-times "fstp" 8 } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "kxorw\[ \t\]*%k0, %k0, %k0" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "kmovw\[ \t\]*%k0, %k1" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "kmovw\[ \t\]*%k0, %k2" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "kmovw\[ \t\]*%k0, %k3" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "kmovw\[ \t\]*%k0, %k4" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "kmovw\[ \t\]*%k0, %k5" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "kmovw\[ \t\]*%k0, %k6" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "kmovw\[ \t\]*%k0, %k7" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-24.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-24.c
> new file mode 100644
> index 0000000..208633e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-24.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=used-gpr-arg" } */
> +
> +int
> +foo (int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-25.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-25.c
> new file mode 100644
> index 0000000..21e82c6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-25.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=used-arg" } */
> +
> +int
> +foo (int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-26.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-26.c
> new file mode 100644
> index 0000000..293d2fe
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-26.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all-arg" } */
> +
> +int
> +foo (int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" } } */
> +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm0, %xmm0" } } */
> +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm0, %xmm1" } } */
> +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm0, %xmm2" } } */
> +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm0, %xmm3" } } */
> +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm0, %xmm4" } } */
> +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm0, %xmm5" } } */
> +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm0, %xmm6" } } */
> +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm0, %xmm7" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-27.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-27.c
> new file mode 100644
> index 0000000..c34e6af
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-27.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr-arg" } */
> +
> +int
> +foo (int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
> new file mode 100644
> index 0000000..de71223
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
> new file mode 100644
> index 0000000..ccfa441
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> +
> +extern void foo (void) __attribute__ ((zero_call_used_regs("used-gpr")));
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
> new file mode 100644
> index 0000000..6b46ca3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> +
> +__attribute__ ((zero_call_used_regs("all-gpr")))
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
> new file mode 100644
> index 0000000..0680f38
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> +
> +extern void foo (void) __attribute__ ((zero_call_used_regs("skip")));
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
> new file mode 100644
> index 0000000..534defa
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=used-gpr" } */
> +
> +int
> +foo (int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
> new file mode 100644
> index 0000000..477bb19
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> +
> +int
> +foo (int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
> new file mode 100644
> index 0000000..a305a60
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> +
> +extern int foo (int) __attribute__ ((zero_call_used_regs("used-gpr")));
> +
> +int
> +foo (int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
> diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> index 62e5b69..8afe8ee 100644
> --- a/gcc/tree-pass.h
> +++ b/gcc/tree-pass.h
> @@ -592,6 +592,7 @@ extern rtl_opt_pass *make_pass_gcse2 (gcc::context *ctxt);
>  extern rtl_opt_pass *make_pass_split_after_reload (gcc::context *ctxt);
>  extern rtl_opt_pass *make_pass_thread_prologue_and_epilogue (gcc::context
>                                                              *ctxt);
> +extern rtl_opt_pass *make_pass_zero_call_used_regs (gcc::context *ctxt);
>  extern rtl_opt_pass *make_pass_stack_adjustments (gcc::context *ctxt);
>  extern rtl_opt_pass *make_pass_sched_fusion (gcc::context *ctxt);
>  extern rtl_opt_pass *make_pass_peephole2 (gcc::context *ctxt);
> --
> 1.8.3.1
>
Qing Zhao Oct. 26, 2020, 2:45 p.m. UTC | #2
>> 
>> +/* Generate insns to zero all st/mm registers together.
>> +   Return true when zeroing instructions are generated.
>> +   Assume the number of st registers that are zeroed is num_of_st,
>> +   we will emit the following sequence to zero them together:
>> +                 fldz;         \
>> +                 fldz;         \
>> +                 ...
>> +                 fldz;         \
>> +                 fstp %%st(0); \
>> +                 fstp %%st(0); \
>> +                 ...
>> +                 fstp %%st(0);
>> +   i.e., num_of_st fldz followed by num_of_st fstp to clear the stack
>> +   mark stack slots empty.  */
>> +
>> +static bool
>> +zero_all_st_mm_registers (HARD_REG_SET need_zeroed_hardregs)
>> +{
>> +  unsigned int num_of_st = 0;
>> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>> +    if (STACK_REGNO_P (regno)
>> +       && TEST_HARD_REG_BIT (need_zeroed_hardregs, regno)
>> +       /* When the corresponding mm register also need to be cleared too.  */
>> +       && TEST_HARD_REG_BIT (need_zeroed_hardregs,
>> +                             (regno - FIRST_STACK_REG + FIRST_MMX_REG)))
>> +      num_of_st++;
> 
> I don't think the above logic is correct. It should go like this:
> 
> - If the function is returning an MMX register,

How to check on this? Is the following correct?

If (GET_CODE(crtl->return_rtx) == REG 
    && (MMX_REG_P (REGNO (crtl->return_rtx)))

   The function is returning an MMX register.


> then the function
> exits in MMX mode, and MMX registers should be cleared in the same way
> as XMM registers.

When clearing XMM registers, we used V4SFmode, what’s the mode we should use to clearing
mmx registers?

> Otherwise the ABI specifies that the function exits
> in x87 mode and x87 stack should be cleared (but see below).
> 
> - There is no direct mapping of stack registers to hard register
> numbers. If a stack register is used, we don't know where in the stack
> the value remains. So, if _any_ stack register is touched, the whole
> stack should be cleared (value, returning in x87 stack register should
> obviously be excluded).

Then, how to exclude the x87 stack register that returns the function return value when we need to 
Clear the whole stack? 
I am a little confused here? Could you explain a little more details?
> 
> - There is no x87 argument register. 32bit targets use MMX0-3 argument
> registers and return value in the XMM register. Please also note that
> complex values take two stack slots in x87 stack.

You mean the complex return value will be returned in two  x87 registers? 

thanks.

Qing
> 
> Uros.
> 
>> +
>> +  if (num_of_st == 0)
Uros Bizjak Oct. 26, 2020, 4:13 p.m. UTC | #3
On Mon, Oct 26, 2020 at 3:45 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>
>
> +/* Generate insns to zero all st/mm registers together.
> +   Return true when zeroing instructions are generated.
> +   Assume the number of st registers that are zeroed is num_of_st,
> +   we will emit the following sequence to zero them together:
> +                 fldz;         \
> +                 fldz;         \
> +                 ...
> +                 fldz;         \
> +                 fstp %%st(0); \
> +                 fstp %%st(0); \
> +                 ...
> +                 fstp %%st(0);
> +   i.e., num_of_st fldz followed by num_of_st fstp to clear the stack
> +   mark stack slots empty.  */
> +
> +static bool
> +zero_all_st_mm_registers (HARD_REG_SET need_zeroed_hardregs)
> +{
> +  unsigned int num_of_st = 0;
> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> +    if (STACK_REGNO_P (regno)
> +       && TEST_HARD_REG_BIT (need_zeroed_hardregs, regno)
> +       /* When the corresponding mm register also need to be cleared too.  */
> +       && TEST_HARD_REG_BIT (need_zeroed_hardregs,
> +                             (regno - FIRST_STACK_REG + FIRST_MMX_REG)))
> +      num_of_st++;
>
>
> I don't think the above logic is correct. It should go like this:
>
> - If the function is returning an MMX register,
>
>
> How to check on this? Is the following correct?
>
> If (GET_CODE(crtl->return_rtx) == REG
>     && (MMX_REG_P (REGNO (crtl->return_rtx)))

Yes, but please use

if (MMX_REG_P (crtl->return_rtx))

>
>    The function is returning an MMX register.
>
>
> then the function
> exits in MMX mode, and MMX registers should be cleared in the same way
> as XMM registers.
>
>
> When clearing XMM registers, we used V4SFmode, what’s the mode we should use to clearing
> mmx registers?

It doesn't matter that much, any 8byte vector mode will do (including
DImode). Let's use V4HImode.

> Otherwise the ABI specifies that the function exits
> in x87 mode and x87 stack should be cleared (but see below).
>
> - There is no direct mapping of stack registers to hard register
> numbers. If a stack register is used, we don't know where in the stack
> the value remains. So, if _any_ stack register is touched, the whole
> stack should be cleared (value, returning in x87 stack register should
> obviously be excluded).
>
>
> Then, how to exclude the x87 stack register that returns the function return value when we need to
> Clear the whole stack?
> I am a little confused here? Could you explain a little more details?

x87 returns in the top (two for complex values) register, so simply
load 7 zeros (and 7 corresponding pops). This will preserve the return
value but clear the whole remaining stack.

> - There is no x87 argument register. 32bit targets use MMX0-3 argument
> registers and return value in the XMM register. Please also note that
> complex values take two stack slots in x87 stack.
>
>
> You mean the complex return value will be returned in two  x87 registers?

Yes, please see ix86_class_max_nregs. Please note that in case of
complex return value, only 6 zeros should be loaded to avoid
clobbering the complex return value.

Uros.
Qing Zhao Oct. 26, 2020, 4:26 p.m. UTC | #4
> On Oct 26, 2020, at 11:13 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> 
> On Mon, Oct 26, 2020 at 3:45 PM Qing Zhao <QING.ZHAO@oracle.com <mailto:QING.ZHAO@oracle.com>> wrote:
>> 
>> 
>> +/* Generate insns to zero all st/mm registers together.
>> +   Return true when zeroing instructions are generated.
>> +   Assume the number of st registers that are zeroed is num_of_st,
>> +   we will emit the following sequence to zero them together:
>> +                 fldz;         \
>> +                 fldz;         \
>> +                 ...
>> +                 fldz;         \
>> +                 fstp %%st(0); \
>> +                 fstp %%st(0); \
>> +                 ...
>> +                 fstp %%st(0);
>> +   i.e., num_of_st fldz followed by num_of_st fstp to clear the stack
>> +   mark stack slots empty.  */
>> +
>> +static bool
>> +zero_all_st_mm_registers (HARD_REG_SET need_zeroed_hardregs)
>> +{
>> +  unsigned int num_of_st = 0;
>> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>> +    if (STACK_REGNO_P (regno)
>> +       && TEST_HARD_REG_BIT (need_zeroed_hardregs, regno)
>> +       /* When the corresponding mm register also need to be cleared too.  */
>> +       && TEST_HARD_REG_BIT (need_zeroed_hardregs,
>> +                             (regno - FIRST_STACK_REG + FIRST_MMX_REG)))
>> +      num_of_st++;
>> 
>> 
>> I don't think the above logic is correct. It should go like this:
>> 
>> - If the function is returning an MMX register,
>> 
>> 
>> How to check on this? Is the following correct?
>> 
>> If (GET_CODE(crtl->return_rtx) == REG
>>    && (MMX_REG_P (REGNO (crtl->return_rtx)))
> 
> Yes, but please use
> 
> if (MMX_REG_P (crtl->return_rtx))

Okay.
> 
>> 
>>   The function is returning an MMX register.
>> 
>> 
>> then the function
>> exits in MMX mode, and MMX registers should be cleared in the same way
>> as XMM registers.
>> 
>> 
>> When clearing XMM registers, we used V4SFmode, what’s the mode we should use to clearing
>> mmx registers?
> 
> It doesn't matter that much, any 8byte vector mode will do (including
> DImode). Let's use V4HImode.
Okay.

> 
>> Otherwise the ABI specifies that the function exits
>> in x87 mode and x87 stack should be cleared (but see below).
>> 
>> - There is no direct mapping of stack registers to hard register
>> numbers. If a stack register is used, we don't know where in the stack
>> the value remains. So, if _any_ stack register is touched, the whole
>> stack should be cleared (value, returning in x87 stack register should
>> obviously be excluded).
>> 
>> 
>> Then, how to exclude the x87 stack register that returns the function return value when we need to
>> Clear the whole stack?
>> I am a little confused here? Could you explain a little more details?
> 
> x87 returns in the top (two for complex values) register, so simply
> load 7 zeros (and 7 corresponding pops). This will preserve the return
> value but clear the whole remaining stack.

I see. 
> 
>> - There is no x87 argument register. 32bit targets use MMX0-3 argument
>> registers and return value in the XMM register. Please also note that
>> complex values take two stack slots in x87 stack.
>> 
>> 
>> You mean the complex return value will be returned in two  x87 registers?
> 
> Yes, please see ix86_class_max_nregs. Please note that in case of
> complex return value, only 6 zeros should be loaded to avoid
> clobbering the complex return value.

Okay, I see. 

thanks.

Qing
> 
> Uros.
Qing Zhao Oct. 26, 2020, 5:30 p.m. UTC | #5
The following is the current change in i386.c, could you check whether the logic is good?

thanks.

Qing 

/* Check whether the register REGNO should be zeroed on X86.
   When ALL_SSE_ZEROED is true, all SSE registers have been zeroed
   together, no need to zero it again.
   When EXIT_WITH_MMX_MODE is true, MMX registers should be cleared.  */

static bool
zero_call_used_regno_p (const unsigned int regno,
                        bool all_sse_zeroed,
                        bool exit_with_mmx_mode)
{
  return GENERAL_REGNO_P (regno)
         || (!all_sse_zeroed && SSE_REGNO_P (regno))
         || MASK_REGNO_P (regno)
         || exit_with_mmx_mode && MMX_REGNO_P (regno);
}

/* Return the machine_mode that is used to zero register REGNO.  */

static machine_mode
zero_call_used_regno_mode (const unsigned int regno)
{
  /* NB: We only need to zero the lower 32 bits for integer registers
     and the lower 128 bits for vector registers since destination are
     zero-extended to the full register width.  */
  if (GENERAL_REGNO_P (regno))
    return SImode;
  else if (SSE_REGNO_P (regno))
    return V4SFmode;
  else if (MASK_REGNO_P (regno))
    return HImode;
  else if (MMX_REGNO_P (regno))
    return V4HImode;
  else
    gcc_unreachable ();
}

/* Generate a rtx to zero all vector registers together if possible,
   otherwise, return NULL.  */

static rtx
zero_all_vector_registers (HARD_REG_SET need_zeroed_hardregs)
{
  if (!TARGET_AVX)
    return NULL;

  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
    if ((IN_RANGE (regno, FIRST_SSE_REG, LAST_SSE_REG)
         || (TARGET_64BIT
             && (REX_SSE_REGNO_P (regno)
                 || (TARGET_AVX512F && EXT_REX_SSE_REGNO_P (regno)))))
        && !TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
      return NULL;

  return gen_avx_vzeroall ();
}


/* Generate insns to zero all st registers together.
   Return true when zeroing instructions are generated.
   Assume the number of st registers that are zeroed is num_of_st,
   we will emit the following sequence to zero them together:
                  fldz;         \
                  fldz;         \
                  ...
                  fldz;         \
                  fstp %%st(0); \
                  fstp %%st(0); \
                  ...
                  fstp %%st(0);
   i.e., num_of_st fldz followed by num_of_st fstp to clear the stack
   mark stack slots empty.

   How to compute the num_of_st?
   There is no direct mapping from stack registers to hard register
   numbers.  If one stack register need to be cleared, we don't know
   where in the stack the value remains.  So, if any stack register
   need to be cleared, the whole stack should be cleared.  However,
   x87 stack registers that hold the return value should be excluded.
   x87 returns in the top (two for complex values) register, so
   num_of_st should be 7/6 when x87 returns, otherwise it will be 8.  */


static bool
zero_all_st_registers (HARD_REG_SET need_zeroed_hardregs)
{
  unsigned int num_of_st = 0;
  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
    if (STACK_REGNO_P (regno)
        && TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
      {
        num_of_st++;
        break;
      }

  if (num_of_st == 0)
    return false;

  bool return_with_x87 = false;
  return_with_x87 = ((GET_CODE (crtl->return_rtx) == REG)
                      && (STACK_REG_P (crtl->return_rtx)));

  bool complex_return = false;
  complex_return = (COMPLEX_MODE_P (GET_MODE (crtl->return_rtx)));

  if (return_with_x87)
    if (complex_return)
      num_of_st = 6;
    else
      num_of_st = 7;
  else
    num_of_st = 8;

  rtx st_reg = gen_rtx_REG (XFmode, FIRST_STACK_REG);

  for (unsigned int i = 0; i < num_of_st; i++)
    emit_insn (gen_rtx_SET (st_reg, CONST0_RTX (XFmode)));

  for (unsigned int i = 0; i < num_of_st; i++)
    {
      rtx insn;
      insn = emit_insn (gen_rtx_SET (st_reg, st_reg));
      add_reg_note (insn, REG_DEAD, st_reg);
    }
  return true;
}

/* TARGET_ZERO_CALL_USED_REGS.  */
/* Generate a sequence of instructions that zero registers specified by
   NEED_ZEROED_HARDREGS.  Return the ZEROED_HARDREGS that are actually
   zeroed.  */
static HARD_REG_SET
ix86_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
{
  HARD_REG_SET zeroed_hardregs;
  bool all_sse_zeroed = false;
  bool st_zeroed = false;

  /* first, let's see whether we can zero all vector registers together.  */
  rtx zero_all_vec_insn = zero_all_vector_registers (need_zeroed_hardregs);
  if (zero_all_vec_insn)
    {
      emit_insn (zero_all_vec_insn);
      all_sse_zeroed = true;
    }

  /* Then, decide which mode (MMX mode or x87 mode) the function exit with.
     In order to decide whether we need to clear the MMX registers or the
     stack registers.  */
  bool exit_with_mmx_mode = false;

  exit_with_mmx_mode = ((GET_CODE (crtl->return_rtx) == REG)
                        && (MMX_REG_P (crtl->return_rtx)));

  /* then, let's see whether we can zero all st registers togeter.  */
  if (!exit_with_mmx_mode)
    st_zeroed = zero_all_st_registers (need_zeroed_hardregs);

  /* Now, generate instructions to zero all the registers.  */

  CLEAR_HARD_REG_SET (zeroed_hardregs);
  if (st_zeroed)
    SET_HARD_REG_BIT (zeroed_hardregs, FIRST_STACK_REG);

  rtx zero_gpr = NULL_RTX;
  rtx zero_vector = NULL_RTX;
  rtx zero_mask = NULL_RTX;

  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
    {
      if (!TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
        continue;
      if (!zero_call_used_regno_p (regno, all_sse_zeroed, exit_with_mmx_mode))
        continue;

      SET_HARD_REG_BIT (zeroed_hardregs, regno);

      rtx reg, tmp;
      machine_mode mode = zero_call_used_regno_mode (regno);

      reg = gen_rtx_REG (mode, regno);

      if (mode == SImode)
        if (zero_gpr == NULL_RTX)
          {
            zero_gpr = reg;
            tmp = gen_rtx_SET (reg, const0_rtx);
            if (!TARGET_USE_MOV0 || optimize_insn_for_size_p ())
              {
                rtx clob = gen_rtx_CLOBBER (VOIDmode,
                                            gen_rtx_REG (CCmode,
                                                         FLAGS_REG));
                tmp = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2,
                                                             tmp,
                                                             clob));
              }
            emit_insn (tmp);
          }
        else
          emit_move_insn (reg, zero_gpr);
      else if (mode == V4SFmode)
        if (zero_vector == NULL_RTX)
          {
            zero_vector = reg;
            tmp = gen_rtx_SET (reg, const0_rtx);
            emit_insn (tmp);
          }
        else
          emit_move_insn (reg, zero_vector);
      else if (mode == HImode)
        if (zero_mask == NULL_RTX)
          {
            zero_mask = reg;
            tmp = gen_rtx_SET (reg, const0_rtx);
            emit_insn (tmp);
          }
        else
          emit_move_insn (reg, zero_mask);
      else
        gcc_unreachable ();
    }
  return zeroed_hardregs;
}

> On Oct 26, 2020, at 11:13 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> 
> On Mon, Oct 26, 2020 at 3:45 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>> 
>> 
>> +/* Generate insns to zero all st/mm registers together.
>> +   Return true when zeroing instructions are generated.
>> +   Assume the number of st registers that are zeroed is num_of_st,
>> +   we will emit the following sequence to zero them together:
>> +                 fldz;         \
>> +                 fldz;         \
>> +                 ...
>> +                 fldz;         \
>> +                 fstp %%st(0); \
>> +                 fstp %%st(0); \
>> +                 ...
>> +                 fstp %%st(0);
>> +   i.e., num_of_st fldz followed by num_of_st fstp to clear the stack
>> +   mark stack slots empty.  */
>> +
>> +static bool
>> +zero_all_st_mm_registers (HARD_REG_SET need_zeroed_hardregs)
>> +{
>> +  unsigned int num_of_st = 0;
>> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>> +    if (STACK_REGNO_P (regno)
>> +       && TEST_HARD_REG_BIT (need_zeroed_hardregs, regno)
>> +       /* When the corresponding mm register also need to be cleared too.  */
>> +       && TEST_HARD_REG_BIT (need_zeroed_hardregs,
>> +                             (regno - FIRST_STACK_REG + FIRST_MMX_REG)))
>> +      num_of_st++;
>> 
>> 
>> I don't think the above logic is correct. It should go like this:
>> 
>> - If the function is returning an MMX register,
>> 
>> 
>> How to check on this? Is the following correct?
>> 
>> If (GET_CODE(crtl->return_rtx) == REG
>>    && (MMX_REG_P (REGNO (crtl->return_rtx)))
> 
> Yes, but please use
> 
> if (MMX_REG_P (crtl->return_rtx))
> 
>> 
>>   The function is returning an MMX register.
>> 
>> 
>> then the function
>> exits in MMX mode, and MMX registers should be cleared in the same way
>> as XMM registers.
>> 
>> 
>> When clearing XMM registers, we used V4SFmode, what’s the mode we should use to clearing
>> mmx registers?
> 
> It doesn't matter that much, any 8byte vector mode will do (including
> DImode). Let's use V4HImode.
> 
>> Otherwise the ABI specifies that the function exits
>> in x87 mode and x87 stack should be cleared (but see below).
>> 
>> - There is no direct mapping of stack registers to hard register
>> numbers. If a stack register is used, we don't know where in the stack
>> the value remains. So, if _any_ stack register is touched, the whole
>> stack should be cleared (value, returning in x87 stack register should
>> obviously be excluded).
>> 
>> 
>> Then, how to exclude the x87 stack register that returns the function return value when we need to
>> Clear the whole stack?
>> I am a little confused here? Could you explain a little more details?
> 
> x87 returns in the top (two for complex values) register, so simply
> load 7 zeros (and 7 corresponding pops). This will preserve the return
> value but clear the whole remaining stack.
> 
>> - There is no x87 argument register. 32bit targets use MMX0-3 argument
>> registers and return value in the XMM register. Please also note that
>> complex values take two stack slots in x87 stack.
>> 
>> 
>> You mean the complex return value will be returned in two  x87 registers?
> 
> Yes, please see ix86_class_max_nregs. Please note that in case of
> complex return value, only 6 zeros should be loaded to avoid
> clobbering the complex return value.
> 
> Uros.
Uros Bizjak Oct. 26, 2020, 6:42 p.m. UTC | #6
On Mon, Oct 26, 2020 at 6:30 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>
>
> The following is the current change in i386.c, could you check whether the logic is good?

x87 handling looks good to me.

One remaining question: If the function uses MMX regs (either
internally or as an argument register), but exits in x87 mode, does
your logic clear the x87 stack?

(The ABI in the above case requires EMMS before exit, but the values
from XMM regs still remain as their aliases in x87 stack.)

Uros.

> thanks.
>
> Qing
>
> /* Check whether the register REGNO should be zeroed on X86.
>    When ALL_SSE_ZEROED is true, all SSE registers have been zeroed
>    together, no need to zero it again.
>    When EXIT_WITH_MMX_MODE is true, MMX registers should be cleared.  */
>
> static bool
> zero_call_used_regno_p (const unsigned int regno,
>                         bool all_sse_zeroed,
>                         bool exit_with_mmx_mode)
> {
>   return GENERAL_REGNO_P (regno)
>          || (!all_sse_zeroed && SSE_REGNO_P (regno))
>          || MASK_REGNO_P (regno)
>          || exit_with_mmx_mode && MMX_REGNO_P (regno);
> }
>
> /* Return the machine_mode that is used to zero register REGNO.  */
>
> static machine_mode
> zero_call_used_regno_mode (const unsigned int regno)
> {
>   /* NB: We only need to zero the lower 32 bits for integer registers
>      and the lower 128 bits for vector registers since destination are
>      zero-extended to the full register width.  */
>   if (GENERAL_REGNO_P (regno))
>     return SImode;
>   else if (SSE_REGNO_P (regno))
>     return V4SFmode;
>   else if (MASK_REGNO_P (regno))
>     return HImode;
>   else if (MMX_REGNO_P (regno))
>     return V4HImode;
>   else
>     gcc_unreachable ();
> }
>
> /* Generate a rtx to zero all vector registers together if possible,
>    otherwise, return NULL.  */
>
> static rtx
> zero_all_vector_registers (HARD_REG_SET need_zeroed_hardregs)
> {
>   if (!TARGET_AVX)
>     return NULL;
>
>   for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>     if ((IN_RANGE (regno, FIRST_SSE_REG, LAST_SSE_REG)
>          || (TARGET_64BIT
>              && (REX_SSE_REGNO_P (regno)
>                  || (TARGET_AVX512F && EXT_REX_SSE_REGNO_P (regno)))))
>         && !TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
>       return NULL;
>
>   return gen_avx_vzeroall ();
> }
>
>
> /* Generate insns to zero all st registers together.
>    Return true when zeroing instructions are generated.
>    Assume the number of st registers that are zeroed is num_of_st,
>    we will emit the following sequence to zero them together:
>                   fldz;         \
>                   fldz;         \
>                   ...
>                   fldz;         \
>                   fstp %%st(0); \
>                   fstp %%st(0); \
>                   ...
>                   fstp %%st(0);
>    i.e., num_of_st fldz followed by num_of_st fstp to clear the stack
>    mark stack slots empty.
>
>    How to compute the num_of_st?
>    There is no direct mapping from stack registers to hard register
>    numbers.  If one stack register need to be cleared, we don't know
>    where in the stack the value remains.  So, if any stack register
>    need to be cleared, the whole stack should be cleared.  However,
>    x87 stack registers that hold the return value should be excluded.
>    x87 returns in the top (two for complex values) register, so
>    num_of_st should be 7/6 when x87 returns, otherwise it will be 8.  */
>
>
> static bool
> zero_all_st_registers (HARD_REG_SET need_zeroed_hardregs)
> {
>   unsigned int num_of_st = 0;
>   for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>     if (STACK_REGNO_P (regno)
>         && TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
>       {
>         num_of_st++;
>         break;
>       }
>
>   if (num_of_st == 0)
>     return false;
>
>   bool return_with_x87 = false;
>   return_with_x87 = ((GET_CODE (crtl->return_rtx) == REG)
>                       && (STACK_REG_P (crtl->return_rtx)));
>
>   bool complex_return = false;
>   complex_return = (COMPLEX_MODE_P (GET_MODE (crtl->return_rtx)));
>
>   if (return_with_x87)
>     if (complex_return)
>       num_of_st = 6;
>     else
>       num_of_st = 7;
>   else
>     num_of_st = 8;
>
>   rtx st_reg = gen_rtx_REG (XFmode, FIRST_STACK_REG);
>
>   for (unsigned int i = 0; i < num_of_st; i++)
>     emit_insn (gen_rtx_SET (st_reg, CONST0_RTX (XFmode)));
>
>   for (unsigned int i = 0; i < num_of_st; i++)
>     {
>       rtx insn;
>       insn = emit_insn (gen_rtx_SET (st_reg, st_reg));
>       add_reg_note (insn, REG_DEAD, st_reg);
>     }
>   return true;
> }
>
> /* TARGET_ZERO_CALL_USED_REGS.  */
> /* Generate a sequence of instructions that zero registers specified by
>    NEED_ZEROED_HARDREGS.  Return the ZEROED_HARDREGS that are actually
>    zeroed.  */
> static HARD_REG_SET
> ix86_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
> {
>   HARD_REG_SET zeroed_hardregs;
>   bool all_sse_zeroed = false;
>   bool st_zeroed = false;
>
>   /* first, let's see whether we can zero all vector registers together.  */
>   rtx zero_all_vec_insn = zero_all_vector_registers (need_zeroed_hardregs);
>   if (zero_all_vec_insn)
>     {
>       emit_insn (zero_all_vec_insn);
>       all_sse_zeroed = true;
>     }
>
>   /* Then, decide which mode (MMX mode or x87 mode) the function exit with.
>      In order to decide whether we need to clear the MMX registers or the
>      stack registers.  */
>   bool exit_with_mmx_mode = false;
>
>   exit_with_mmx_mode = ((GET_CODE (crtl->return_rtx) == REG)
>                         && (MMX_REG_P (crtl->return_rtx)));
>
>   /* then, let's see whether we can zero all st registers togeter.  */
>   if (!exit_with_mmx_mode)
>     st_zeroed = zero_all_st_registers (need_zeroed_hardregs);
>
>   /* Now, generate instructions to zero all the registers.  */
>
>   CLEAR_HARD_REG_SET (zeroed_hardregs);
>   if (st_zeroed)
>     SET_HARD_REG_BIT (zeroed_hardregs, FIRST_STACK_REG);
>
>   rtx zero_gpr = NULL_RTX;
>   rtx zero_vector = NULL_RTX;
>   rtx zero_mask = NULL_RTX;
>
>   for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>     {
>       if (!TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
>         continue;
>       if (!zero_call_used_regno_p (regno, all_sse_zeroed, exit_with_mmx_mode))
>         continue;
>
>       SET_HARD_REG_BIT (zeroed_hardregs, regno);
>
>       rtx reg, tmp;
>       machine_mode mode = zero_call_used_regno_mode (regno);
>
>       reg = gen_rtx_REG (mode, regno);
>
>       if (mode == SImode)
>         if (zero_gpr == NULL_RTX)
>           {
>             zero_gpr = reg;
>             tmp = gen_rtx_SET (reg, const0_rtx);
>             if (!TARGET_USE_MOV0 || optimize_insn_for_size_p ())
>               {
>                 rtx clob = gen_rtx_CLOBBER (VOIDmode,
>                                             gen_rtx_REG (CCmode,
>                                                          FLAGS_REG));
>                 tmp = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2,
>                                                              tmp,
>                                                              clob));
>               }
>             emit_insn (tmp);
>           }
>         else
>           emit_move_insn (reg, zero_gpr);
>       else if (mode == V4SFmode)
>         if (zero_vector == NULL_RTX)
>           {
>             zero_vector = reg;
>             tmp = gen_rtx_SET (reg, const0_rtx);
>             emit_insn (tmp);
>           }
>         else
>           emit_move_insn (reg, zero_vector);
>       else if (mode == HImode)
>         if (zero_mask == NULL_RTX)
>           {
>             zero_mask = reg;
>             tmp = gen_rtx_SET (reg, const0_rtx);
>             emit_insn (tmp);
>           }
>         else
>           emit_move_insn (reg, zero_mask);
>       else
>         gcc_unreachable ();
>     }
>   return zeroed_hardregs;
> }
>
> On Oct 26, 2020, at 11:13 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Mon, Oct 26, 2020 at 3:45 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>
>
>
> +/* Generate insns to zero all st/mm registers together.
> +   Return true when zeroing instructions are generated.
> +   Assume the number of st registers that are zeroed is num_of_st,
> +   we will emit the following sequence to zero them together:
> +                 fldz;         \
> +                 fldz;         \
> +                 ...
> +                 fldz;         \
> +                 fstp %%st(0); \
> +                 fstp %%st(0); \
> +                 ...
> +                 fstp %%st(0);
> +   i.e., num_of_st fldz followed by num_of_st fstp to clear the stack
> +   mark stack slots empty.  */
> +
> +static bool
> +zero_all_st_mm_registers (HARD_REG_SET need_zeroed_hardregs)
> +{
> +  unsigned int num_of_st = 0;
> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> +    if (STACK_REGNO_P (regno)
> +       && TEST_HARD_REG_BIT (need_zeroed_hardregs, regno)
> +       /* When the corresponding mm register also need to be cleared too.  */
> +       && TEST_HARD_REG_BIT (need_zeroed_hardregs,
> +                             (regno - FIRST_STACK_REG + FIRST_MMX_REG)))
> +      num_of_st++;
>
>
> I don't think the above logic is correct. It should go like this:
>
> - If the function is returning an MMX register,
>
>
> How to check on this? Is the following correct?
>
> If (GET_CODE(crtl->return_rtx) == REG
>    && (MMX_REG_P (REGNO (crtl->return_rtx)))
>
>
> Yes, but please use
>
> if (MMX_REG_P (crtl->return_rtx))
>
>
>   The function is returning an MMX register.
>
>
> then the function
> exits in MMX mode, and MMX registers should be cleared in the same way
> as XMM registers.
>
>
> When clearing XMM registers, we used V4SFmode, what’s the mode we should use to clearing
> mmx registers?
>
>
> It doesn't matter that much, any 8byte vector mode will do (including
> DImode). Let's use V4HImode.
>
> Otherwise the ABI specifies that the function exits
> in x87 mode and x87 stack should be cleared (but see below).
>
> - There is no direct mapping of stack registers to hard register
> numbers. If a stack register is used, we don't know where in the stack
> the value remains. So, if _any_ stack register is touched, the whole
> stack should be cleared (value, returning in x87 stack register should
> obviously be excluded).
>
>
> Then, how to exclude the x87 stack register that returns the function return value when we need to
> Clear the whole stack?
> I am a little confused here? Could you explain a little more details?
>
>
> x87 returns in the top (two for complex values) register, so simply
> load 7 zeros (and 7 corresponding pops). This will preserve the return
> value but clear the whole remaining stack.
>
> - There is no x87 argument register. 32bit targets use MMX0-3 argument
> registers and return value in the XMM register. Please also note that
> complex values take two stack slots in x87 stack.
>
>
> You mean the complex return value will be returned in two  x87 registers?
>
>
> Yes, please see ix86_class_max_nregs. Please note that in case of
> complex return value, only 6 zeros should be loaded to avoid
> clobbering the complex return value.
>
> Uros.
>
>
Richard Sandiford Oct. 26, 2020, 6:47 p.m. UTC | #7
Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index c9f7299..3a884e1 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -3992,6 +3992,49 @@ performing a link with relocatable output (i.e.@: @code{ld -r}) on them.
>  A declaration to which @code{weakref} is attached and that is associated
>  with a named @code{target} must be @code{static}.
>  
> +@item zero_call_used_regs ("@var{choice}")
> +@cindex @code{zero_call_used_regs} function attribute
> +
> +The @code{zero_call_used_regs} attribute causes the compiler to zero
> +a subset of all call-used registers at function return according to
> +@var{choice}.
> +This is used to increase the program security by either mitigating
> +Return-Oriented Programming (ROP) or preventing information leak
> +through registers.
> +
> +A "call-used" register is a register that is clobbered by function calls,
> +as a result, the caller has to save and restore it before or after a
> +function call.  It is also called as "call-clobbered", "caller-saved", or
> +"volatile".

texinfo quoting is to use ``…'' rather than "…".  So maybe:

-------------------------------------------------------------------
A ``call-used'' register is a register whose contents can be changed by
a function call; therefore, a caller cannot assume that the register has
the same contents on return from the function as it had before calling
the function.  Such registers are also called ``call-clobbered'',
``caller-saved'', or ``volatile''.
-------------------------------------------------------------------

> +In order to satisfy users with different security needs and control the
> +run-time overhead at the same time,  GCC provides a flexible way to choose

nit: should only be one space after the comma

> +the subset of the call-used registers to be zeroed.

Maybe add “The three basic values of @var{choice} are:”

> +
> +@samp{skip} doesn't zero any call-used registers.
> +@samp{used} zeros call-used registers which are used in the function.  A "used"

Maybe s/zeros/only zeros/?

s/which/that/

> +register is one whose content has been set or referenced in the function.
> +@samp{all} zeros all call-used registers.

I think this would be better formatted using a @table.

> +In addition to the above three basic choices, the register set can be further
> +limited by adding "-gpr" (i.e., general purpose register), "-arg" (i.e.,
> +argument register), or both as following:

How about:

-------------------------------------------------------------------
In addition to these three basic choices, it is possible to modify
@samp{used} or @samp{all} as follows:

@itemize @bullet
@item
Adding @samp{-gpr} restricts the zeroing to general-purpose registers.

@item
Adding @samp{-arg} restricts the zeroing to registers that are used
to pass parameters.  When applied to @samp{all}, this includes all
parameter registers defined by the platform's calling convention,
regardless of whether the function uses those parameter registers.
@end @itemize

The modifiers can be used individually or together.  If they are used
together, they must appear in the order above.

The full list of @var{choice}s is therefore:
-------------------------------------------------------------------

with the list repeating @var{skip}, @var{used} and @var{all}.

(untested)

> +@samp{used-gpr-arg} zeros used call-used general purpose registers that
> +pass parameters.
> +@samp{used-arg} zeros used call-used registers that pass parameters.
> +@samp{all-gpr-arg} zeros all call-used general purpose registers that pass
> +parameters.
> +@samp{all-arg} zeros all call-used registers that pass parameters.
> +@samp{used-gpr} zeros call-used general purpose registers which are used in the
> +function.
> +@samp{all-gpr} zeros all call-used general purpose registers.

I think this too should be a @table.

> +
> +Among this list, "used-gpr-arg", "used-arg", "all-gpr-arg", and "all-arg" are
> +mainly used for ROP mitigation.

Should be quoted using @samp rather than ".

> +@item -fzero-call-used-regs=@var{choice}
> +@opindex fzero-call-used-regs
> +Zero call-used registers at function return to increase the program
> +security by either mitigating Return-Oriented Programming (ROP) or
> +preventing information leak through registers.

After this, we should probably say something like:

-------------------------------------------------------------------
The possible values of @var{choice} are the same as for the
@samp{zero_call_used_regs} attribute (@pxref{…}).  The default
is @samp{skip}.
-------------------------------------------------------------------

(with the xref filled in)

> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> index 97437e8..3b75c46 100644
> --- a/gcc/doc/tm.texi
> +++ b/gcc/doc/tm.texi
> @@ -12053,6 +12053,18 @@ argument list due to stack realignment.  Return @code{NULL} if no DRAP
>  is needed.
>  @end deftypefn
>  
> +@deftypefn {Target Hook} HARD_REG_SET TARGET_ZERO_CALL_USED_REGS (HARD_REG_SET @var{selected_regs})
> +This target hook emits instructions to zero subset of @var{selected_regs}

…to zero the subset…
(probably my mistake, sorry)

> diff --git a/gcc/flag-types.h b/gcc/flag-types.h
> index 852ea76..0f7e503 100644
> --- a/gcc/flag-types.h
> +++ b/gcc/flag-types.h
> @@ -285,6 +285,15 @@ enum sanitize_code {
>  				  | SANITIZE_BOUNDS_STRICT
>  };
>  
> +enum  zero_call_used_regs_code {
> +  UNSET = 0,
> +  SKIP = 1UL << 0,
> +  ONLY_USED = 1UL << 1,
> +  ONLY_GPR = 1UL << 2,
> +  ONLY_ARG = 1UL << 3,
> +  ALL = 1UL << 4
> +};

I'd suggested these names on the assumption that we'd be using
a C++ enum class, so that the enum would be referenced as
name::ALL, name::SKIP, etc.  But I guess using a C++ enum class
doesn't work well with bitfields after all.

These names are too generic without the name:: scoping though.
Perhaps we should put them in a namespace:

  namespace zero_regs_flags {
    const unsigned int UNSET = 0;
    …etc…
  }

(call-used probably doesn't need to be part of the flag names,
since the concept is more general than that and call-usedness
is really a filter that's being applied on top.  Although I guess
the same is true of “zero”. ;-))

I don't think we should have ALL as a separate flag: ALL is the absence
of ONLY_*.  Maybe we should have an ENABLED flag that all non-skip
combinations use?

If it makes things easier, I think it would be good to have e.g.:

  unsigned int USED_GPR = ENABLED | ONLY_USED | ONLY_GPR;

inside the namespace, to reduce the verbosity in the option table.

> +  /* If gpr_only is true, only zero call-used-registers that are
> +     general-purpose registers; if used_only is true, only zero
> +     call-used-registers that are used in the current function.  */
> +
> +  gpr_only = crtl->zero_call_used_regs & ONLY_GPR;
> +  used_only = crtl->zero_call_used_regs & ONLY_USED;
> +  arg_only = crtl->zero_call_used_regs & ONLY_ARG;
> +
> +  /* For each of the hard registers, check to see whether we should zero it if:

s/check to see whether //

> +     1. it is a call-used-registers;

s/call-used-registers/call-used register/

> + and 2. it is not a fixed-registers;

s/fixed-registers/fixed register/

> + and 3. it is not live at the return of the routine;
> + and 4. it is general registor if gpr_only is true;
> + and 5. it is used in the routine if used_only is true;
> + and 6. it is a register that passes parameter if arg_only is true;
> +   */

Under GCC formatting, the “and” lines need to be indented under “For each”.
Maybe indent the “1.” line a bit more if you think it looks nicer with the
numbers lined up (it probably does).

Similarly, the last bit of text should end with “.  */”, rather than
with the “;\n  */” above.

(Sorry that the rules are so picky about this.)

> +  /* First, prepare the data flow information.  */
> +  basic_block bb = BLOCK_FOR_INSN (ret);
> +  bitmap live_out;
> +  live_out = BITMAP_ALLOC (NULL);

Should just use auto_bitmap here, which will also handle the freeing.

> +  bitmap_copy (live_out, df_get_live_out (bb));
> +  df_simulate_initialize_backwards (bb, live_out);
> +  df_simulate_one_insn_backwards (bb, ret, live_out);
> +
> +  HARD_REG_SET need_zeroed_hardregs;
> +  CLEAR_HARD_REG_SET (need_zeroed_hardregs);

Maybe s/need_zeroed/selected/?  Similarly to the target hook comment
in the previous review, I think “need” makes it sound like the target
has no freedom to decline.

> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> +    {
> +      if (!crtl->abi->clobbers_full_reg_p (regno))
> +	continue;
> +      if (fixed_regs[regno])
> +	continue;
> +      if (REGNO_REG_SET_P (live_out, regno))
> +	continue;
> +      if (gpr_only
> +	  && !TEST_HARD_REG_BIT (reg_class_contents[GENERAL_REGS], regno))
> +	continue;
> +      if (used_only && !df_regs_ever_live_p (regno))
> +	continue;
> +      if (arg_only && !FUNCTION_ARG_REGNO_P (regno))
> +	continue;
> +
> +      /* Now this is a register that we might want to zero.  */
> +      SET_HARD_REG_BIT (need_zeroed_hardregs, regno);
> +    }
> +
> +  BITMAP_FREE (live_out);
> +
> +  if (hard_reg_set_empty_p (need_zeroed_hardregs))
> +    return;
> +
> +  /* Now we get a hard register set that need to be zeroed, pass it to
> +     target to generate zeroing sequence.  */

/* Now that we have a hard register set that needs to be zeroed, pass it
   to the target to generate the zeroing sequence.  */

> +  HARD_REG_SET zeroed_hardregs;
> +  start_sequence ();
> +  zeroed_hardregs = targetm.calls.zero_call_used_regs (need_zeroed_hardregs);
> +  rtx_insn *seq = get_insns ();
> +  end_sequence ();
> +  if (seq)
> +    {
> +      /* Emit the memory blockage and register clobber asm volatile before
> +	 the whole sequence.  */
> +      start_sequence ();
> +      expand_asm_reg_clobber_mem_blockage (zeroed_hardregs);
> +      rtx_insn *seq_barrier = get_insns ();
> +      end_sequence ();
> +
> +      emit_insn_before (seq_barrier, ret);
> +      emit_insn_before (seq, ret);
> +
> +      /* Update the data flow information.  */
> +      crtl->must_be_zero_on_return |= zeroed_hardregs;
> +      df_set_bb_dirty (EXIT_BLOCK_PTR_FOR_FN (cfun));
> +    }
> +}
> +
> +
>  /* Return a sequence to be used as the epilogue for the current function,
>     or NULL.  */
>  
> @@ -6486,7 +6584,120 @@ make_pass_thread_prologue_and_epilogue (gcc::context *ctxt)
>  {
>    return new pass_thread_prologue_and_epilogue (ctxt);
>  }
> -
>
> +
> +static unsigned int
> +rest_of_zero_call_used_regs (void)

This needs a function comment.  Maybe:

/* Iterate over the function's return instructions and insert any
   register zeroing required by the -fzero-call-used-regs command-line
   option or the "zero_call_used_regs" function attribute.  */

Also, we might as well make it:

pass_zero_call_used_regs::execute

rather than a separate function.  The “rest_of_…” stuff is mostly legacy.

> +{
> +  edge_iterator ei;
> +  edge e;
> +  rtx_insn *insn;
> +
> +  /* This pass needs data flow information.  */
> +  df_analyze ();
> +
> +  /* Search all the "return"s in the routine, and insert instruction sequence to
> +     zero the call used registers.  */
> +  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
> +    {
> +      insn = BB_END (e->src);

Modern style would be to declare insn here rather than above.

> +      if (JUMP_P (insn) && ANY_RETURN_P (JUMP_LABEL (insn)))
> +	gen_call_used_regs_seq (insn);
> +    }
> +
> +  return 0;
> +}
> +
> +namespace {
> +
> +const pass_data pass_data_zero_call_used_regs =
> +{
> +  RTL_PASS, /* type */
> +  "zero_call_used_regs", /* name */
> +  OPTGROUP_NONE, /* optinfo_flags */
> +  TV_NONE, /* tv_id */
> +  0, /* properties_required */
> +  0, /* properties_provided */
> +  0, /* properties_destroyed */
> +  0, /* todo_flags_start */
> +  0, /* todo_flags_finish */
> +};
> +
> +class pass_zero_call_used_regs: public rtl_opt_pass
> +{
> +public:
> +  pass_zero_call_used_regs (gcc::context *ctxt)
> +    : rtl_opt_pass (pass_data_zero_call_used_regs, ctxt)
> +  {}
> +
> +  /* opt_pass methods: */
> +  virtual bool gate (function *);
> +
> +  virtual unsigned int execute (function *)
> +    {
> +      return rest_of_zero_call_used_regs ();
> +    }
> +
> +}; // class pass_zero_call_used_regs
> +
> +bool
> +pass_zero_call_used_regs::gate (function *fun)
> +{
> +  unsigned int zero_regs_type = UNSET;
> +  unsigned int attr_zero_regs_type = UNSET;
> +
> +  tree attr_zero_regs
> +	= lookup_attribute ("zero_call_used_regs",
> +			    DECL_ATTRIBUTES (fun->decl));
> +
> +  /* Get the type of zero_call_used_regs from function attribute.  */
> +  if (attr_zero_regs)
> +    {
> +      bool found = false;
> +      unsigned int i;
> +
> +      /* The TREE_VALUE of an attribute is a TREE_LIST whose TREE_VALUE
> +	 is the attribute argument's value.  */
> +      attr_zero_regs = TREE_VALUE (attr_zero_regs);
> +      gcc_assert (TREE_CODE (attr_zero_regs) == TREE_LIST);
> +      attr_zero_regs = TREE_VALUE (attr_zero_regs);
> +      gcc_assert (TREE_CODE (attr_zero_regs) == STRING_CST);
> +
> +      for (i = 0; zero_call_used_regs_opts[i].name != NULL; ++i)
> +	if (strcmp (TREE_STRING_POINTER (attr_zero_regs),
> +		     zero_call_used_regs_opts[i].name) == 0)
> +	  {
> +	    attr_zero_regs_type |= zero_call_used_regs_opts[i].flag;

Think = is less surprising than |= here.

> +	    found = true;

All valid values are nonzero, so we don't need a separate boolean.

> + 	    break;
> +	  }
> +
> +      if (!found)
> +	warning_at (DECL_SOURCE_LOCATION (fun->decl), 0,
> +		    "unrecognized zero_call_used_regs attribute: %qs",
> +		    TREE_STRING_POINTER (attr_zero_regs));

I think we should warn when handling the attribute in c-attribs.c
(as before, IIRC), and make it silent here.

> +    }
> +
> +  if (flag_zero_call_used_regs)
> +    if (!attr_zero_regs)
> +      zero_regs_type = flag_zero_call_used_regs;
> +    else
> +      zero_regs_type = attr_zero_regs_type;
> +  else
> +    zero_regs_type = attr_zero_regs_type;

Seems easier to make the attribute code set zero_regs_type directly,
then have:

  if (!zero_regs_type)
    zero_regs_type = flag_zero_call_used_regs;

> +
> +  crtl->zero_call_used_regs = zero_regs_type;
> +
> +  /* No need to zero call-used-regs when no user request is present.  */
> +  return zero_regs_type > SKIP;

Think testing for skip using & SKIP or ==/!= SKIP is more obvious.

This is too much for a gate function, which should be a simple
side-effect-free function that tests whether the pass should run.
Perhaps we should just make the pass unconditional and do the above
in ::execute.  The pass is very cheap, so gating probably isn't
worthwhile.

> +}
> +
> +} // anon namespace
> +
> +rtl_opt_pass *
> +make_pass_zero_call_used_regs (gcc::context *ctxt)
> +{
> +  return new pass_zero_call_used_regs (ctxt);
> +}
>  
>  /* If CONSTRAINT is a matching constraint, then return its number.
>     Otherwise, return -1.  */
> diff --git a/gcc/optabs.c b/gcc/optabs.c
> index 8ad7f4b..bd64af0 100644
> --- a/gcc/optabs.c
> +++ b/gcc/optabs.c
> @@ -6484,6 +6484,48 @@ expand_memory_blockage (void)
>      expand_asm_memory_blockage ();
>  }
>  
> +/* Generate asm volatile("" : : : "memory") as a memory blockage, at the
> +   same time clobbering the register set specified by REGS.  */
> +
> +void
> +expand_asm_reg_clobber_mem_blockage (HARD_REG_SET regs)
> +{
> +  rtx asm_op, clob_mem;
> +
> +  unsigned int num_of_regs = 0;
> +  for (unsigned int i = 0; i < FIRST_PSEUDO_REGISTER; i++)
> +    if (TEST_HARD_REG_BIT (regs, i))
> +      num_of_regs++;
> +
> +  asm_op = gen_rtx_ASM_OPERANDS (VOIDmode, "", "", 0,
> +				 rtvec_alloc (0), rtvec_alloc (0),
> +				 rtvec_alloc (0), UNKNOWN_LOCATION);
> +  MEM_VOLATILE_P (asm_op) = 1;
> +
> +  rtvec v = rtvec_alloc (num_of_regs + 2);
> +
> +  clob_mem = gen_rtx_SCRATCH (VOIDmode);
> +  clob_mem = gen_rtx_MEM (BLKmode, clob_mem);
> +  clob_mem = gen_rtx_CLOBBER (VOIDmode, clob_mem);
> +
> +  RTVEC_ELT (v,0) = asm_op;
> +  RTVEC_ELT (v,1) = clob_mem;

nit: should be a space before the comma, here and below.

> +
> +  if (num_of_regs > 0)
> +    {
> +      unsigned int j = 2;
> +      for (unsigned int i = 0; i < FIRST_PSEUDO_REGISTER; i++)
> +	if (TEST_HARD_REG_BIT (regs, i))
> +	  {
> +	    RTVEC_ELT (v,j) = gen_rtx_CLOBBER (VOIDmode, regno_reg_rtx[i]);
> + 	    j++;
> +	  }
> +      gcc_assert (j == (num_of_regs + 2));
> +    }
> +
> +  emit_insn (gen_rtx_PARALLEL (VOIDmode, v));
> +}
> +
>  /* This routine will either emit the mem_thread_fence pattern or issue a 
>     sync_synchronize to generate a fence for memory model MEMMODEL.  */
>  
> diff --git a/gcc/optabs.h b/gcc/optabs.h
> index 0b14700..bfa10c8 100644
> --- a/gcc/optabs.h
> +++ b/gcc/optabs.h
> @@ -345,6 +345,8 @@ rtx expand_atomic_store (rtx, rtx, enum memmodel, bool);
>  rtx expand_atomic_fetch_op (rtx, rtx, rtx, enum rtx_code, enum memmodel, 
>  			      bool);
>  
> +extern void expand_asm_reg_clobber_mem_blockage (HARD_REG_SET);
> +
>  extern bool insn_operand_matches (enum insn_code icode, unsigned int opno,
>  				  rtx operand);
>  extern bool valid_multiword_target_p (rtx);
> diff --git a/gcc/opts.c b/gcc/opts.c
> index 3bda59a..f95a1f0 100644
> --- a/gcc/opts.c
> +++ b/gcc/opts.c
> @@ -1776,6 +1776,24 @@ const struct sanitizer_opts_s coverage_sanitizer_opts[] =
>    { NULL, 0U, 0UL, false }
>  };
>  
> +/* -fzero-call-used-regs= suboptions.  */
> +const struct zero_call_used_regs_opts_s zero_call_used_regs_opts[] =
> +{
> +#define ZERO_CALL_USED_REGS_OPT(name, flags) \
> +    { #name, flags }
> +  ZERO_CALL_USED_REGS_OPT (skip, SKIP),
> +  ZERO_CALL_USED_REGS_OPT (used-gpr-arg, (ONLY_USED | ONLY_GPR | ONLY_ARG)),
> +  ZERO_CALL_USED_REGS_OPT (used-arg, (ONLY_USED | ONLY_ARG)),
> +  ZERO_CALL_USED_REGS_OPT (all-gpr-arg, (ONLY_GPR | ONLY_ARG)),
> +  ZERO_CALL_USED_REGS_OPT (all-arg, ONLY_ARG),
> +  ZERO_CALL_USED_REGS_OPT (used-gpr, (ONLY_USED | ONLY_GPR)),
> +  ZERO_CALL_USED_REGS_OPT (all-gpr, ONLY_GPR),
> +  ZERO_CALL_USED_REGS_OPT (used, ONLY_USED),
> +  ZERO_CALL_USED_REGS_OPT (all, ALL),
> +#undef ZERO_CALL_USED_REGS_OPT
> +  {NULL, 0U}
> +};
> +
>  /* A struct for describing a run of chars within a string.  */
>  
>  class string_fragment
> @@ -1970,6 +1988,30 @@ parse_no_sanitize_attribute (char *value)
>    return flags;
>  }
>  
> +/* Parse -fzero-call-used-regs suboptions from ARG, return the FLAGS.  */
> +
> +unsigned int
> +parse_zero_call_used_regs_options (const char *arg)
> +{
> +  bool found = false;
> +  unsigned int flags = 0;
> +  unsigned int i;
> +
> +  /* Check to see if the string matches a sub-option name.  */
> +  for (i = 0; zero_call_used_regs_opts[i].name != NULL; ++i)
> +    if (strcmp (arg, zero_call_used_regs_opts[i].name) == 0)
> +      {
> +	flags |= zero_call_used_regs_opts[i].flag;
> + 	found = true;

Same comments as above.

> +	break;
> +      }
> +
> +  if (!found)
> +    error ("unrecognized argument to %<-fzero-call-used-regs=%>: %qs", arg);

Think we should use %qs for the option name too, to reduce the number
of translation strings.

> diff --git a/gcc/recog.c b/gcc/recog.c
> index ce83b7f..e231b5d 100644
> --- a/gcc/recog.c
> +++ b/gcc/recog.c
> @@ -923,6 +923,22 @@ validate_simplify_insn (rtx_insn *insn)
>    return ((num_changes_pending () > 0) && (apply_change_group () > 0));
>  }
>  
>
> +
> +/* Check whether INSN matches a specific alternative of an .md pattern.  */
> +bool
> +valid_insn_p (rtx_insn *insn)

Very minor nit, but it's unusual to have three blank lines before
the comment and none afterwards.  The codebase isn't very consistent
about this, but local style seems mostly to be one blank line before
the comment and one afterwards.

> diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c b/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
> new file mode 100644
> index 0000000..f44add9
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
> @@ -0,0 +1,15 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all" } */
> +
> +volatile int result = 0;
> +int 
> +__attribute__((noinline))

“noipa” is stronger.  Same for all the tests.

The i386 tests are Uros's domain, but I think it would be good to have
generic tests for all the variants.  E.g.:

(1) one test per -fzero-call-used-regs option (including skip)
(2) one test that tries all valid attribute values (including skip),
    compiled without -fzero-call-used-regs
(3) one test that #includes (2) but is compiled with an arbitrarily-chosen
    -fzero-call-used-regs (say =all).
(4) one test that tries invalid uses of the attribute, e.g.:
    - one use of the attribute on a variable
    - one use of the attribute on a function, but with an obviously-wrong
      value
    - one use of the attribute on a function, but with -gpr and -arg the
      wrong way around

(Sorry for not getting to the tests last time.)

Thanks,
Richard
Qing Zhao Oct. 26, 2020, 7:10 p.m. UTC | #8
> On Oct 26, 2020, at 1:42 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> 
> On Mon, Oct 26, 2020 at 6:30 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>> 
>> 
>> The following is the current change in i386.c, could you check whether the logic is good?
> 
> x87 handling looks good to me.
> 
> One remaining question: If the function uses MMX regs (either
> internally or as an argument register), but exits in x87 mode, does
> your logic clear the x87 stack?

Yes but not completely yes. 

FIRST, As following:

  /* Then, decide which mode (MMX mode or x87 mode) the function exit with.
     In order to decide whether we need to clear the MMX registers or the
     stack registers.  */
  bool exit_with_mmx_mode = false;

  exit_with_mmx_mode = ((GET_CODE (crtl->return_rtx) == REG) 
                        && (MMX_REG_P (crtl->return_rtx)));

  /* then, let's see whether we can zero all st registers togeter.  */
  if (!exit_with_mmx_mode)
    st_zeroed = zero_all_st_registers (need_zeroed_hardregs);


We first check whether this routine exit with mmx mode, if Not then it’s X87 mode 
(at exit, “EMMS” should already been called per ABI), then 
The st/mm registers will be cleared as x87 stack registers. 

However, within the routine “zero_all_st_registers”:

static bool
zero_all_st_registers (HARD_REG_SET need_zeroed_hardregs)
{
  unsigned int num_of_st = 0;
  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
    if (STACK_REGNO_P (regno)
        && TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
      {
        num_of_st++;
        break;
      }

  if (num_of_st == 0)
    return false;


In the above, I currently only check whether any “Stack” registers need to be zeroed or not.
But looks like we should also check any “MMX” register need to be zeroed or not too. If there is any
“MMX” register need to be zeroed, we still need to clear the whole X87 stack? 


BTW, is it convenient for you to provide me 3 small testing cases for the following situation: 

1. Return with MMX register;
2. Return with  x87 stack register;
3. Return with 2 x87 stack register (i.e the complex value).

Then it will be much easy for me to verify my implementation is good or not at my side.

Thanks a lot for your help.

Qing

> 
> (The ABI in the above case requires EMMS before exit, but the values
> from XMM regs still remain as their aliases in x87 stack.)
> 
> Uros.
> 
>> thanks.
>> 
>> Qing
>> 
>> /* Check whether the register REGNO should be zeroed on X86.
>>   When ALL_SSE_ZEROED is true, all SSE registers have been zeroed
>>   together, no need to zero it again.
>>   When EXIT_WITH_MMX_MODE is true, MMX registers should be cleared.  */
>> 
>> static bool
>> zero_call_used_regno_p (const unsigned int regno,
>>                        bool all_sse_zeroed,
>>                        bool exit_with_mmx_mode)
>> {
>>  return GENERAL_REGNO_P (regno)
>>         || (!all_sse_zeroed && SSE_REGNO_P (regno))
>>         || MASK_REGNO_P (regno)
>>         || exit_with_mmx_mode && MMX_REGNO_P (regno);
>> }
>> 
>> /* Return the machine_mode that is used to zero register REGNO.  */
>> 
>> static machine_mode
>> zero_call_used_regno_mode (const unsigned int regno)
>> {
>>  /* NB: We only need to zero the lower 32 bits for integer registers
>>     and the lower 128 bits for vector registers since destination are
>>     zero-extended to the full register width.  */
>>  if (GENERAL_REGNO_P (regno))
>>    return SImode;
>>  else if (SSE_REGNO_P (regno))
>>    return V4SFmode;
>>  else if (MASK_REGNO_P (regno))
>>    return HImode;
>>  else if (MMX_REGNO_P (regno))
>>    return V4HImode;
>>  else
>>    gcc_unreachable ();
>> }
>> 
>> /* Generate a rtx to zero all vector registers together if possible,
>>   otherwise, return NULL.  */
>> 
>> static rtx
>> zero_all_vector_registers (HARD_REG_SET need_zeroed_hardregs)
>> {
>>  if (!TARGET_AVX)
>>    return NULL;
>> 
>>  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>>    if ((IN_RANGE (regno, FIRST_SSE_REG, LAST_SSE_REG)
>>         || (TARGET_64BIT
>>             && (REX_SSE_REGNO_P (regno)
>>                 || (TARGET_AVX512F && EXT_REX_SSE_REGNO_P (regno)))))
>>        && !TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
>>      return NULL;
>> 
>>  return gen_avx_vzeroall ();
>> }
>> 
>> 
>> /* Generate insns to zero all st registers together.
>>   Return true when zeroing instructions are generated.
>>   Assume the number of st registers that are zeroed is num_of_st,
>>   we will emit the following sequence to zero them together:
>>                  fldz;         \
>>                  fldz;         \
>>                  ...
>>                  fldz;         \
>>                  fstp %%st(0); \
>>                  fstp %%st(0); \
>>                  ...
>>                  fstp %%st(0);
>>   i.e., num_of_st fldz followed by num_of_st fstp to clear the stack
>>   mark stack slots empty.
>> 
>>   How to compute the num_of_st?
>>   There is no direct mapping from stack registers to hard register
>>   numbers.  If one stack register need to be cleared, we don't know
>>   where in the stack the value remains.  So, if any stack register
>>   need to be cleared, the whole stack should be cleared.  However,
>>   x87 stack registers that hold the return value should be excluded.
>>   x87 returns in the top (two for complex values) register, so
>>   num_of_st should be 7/6 when x87 returns, otherwise it will be 8.  */
>> 
>> 
>> static bool
>> zero_all_st_registers (HARD_REG_SET need_zeroed_hardregs)
>> {
>>  unsigned int num_of_st = 0;
>>  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>>    if (STACK_REGNO_P (regno)
>>        && TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
>>      {
>>        num_of_st++;
>>        break;
>>      }
>> 
>>  if (num_of_st == 0)
>>    return false;
>> 
>>  bool return_with_x87 = false;
>>  return_with_x87 = ((GET_CODE (crtl->return_rtx) == REG)
>>                      && (STACK_REG_P (crtl->return_rtx)));
>> 
>>  bool complex_return = false;
>>  complex_return = (COMPLEX_MODE_P (GET_MODE (crtl->return_rtx)));
>> 
>>  if (return_with_x87)
>>    if (complex_return)
>>      num_of_st = 6;
>>    else
>>      num_of_st = 7;
>>  else
>>    num_of_st = 8;
>> 
>>  rtx st_reg = gen_rtx_REG (XFmode, FIRST_STACK_REG);
>> 
>>  for (unsigned int i = 0; i < num_of_st; i++)
>>    emit_insn (gen_rtx_SET (st_reg, CONST0_RTX (XFmode)));
>> 
>>  for (unsigned int i = 0; i < num_of_st; i++)
>>    {
>>      rtx insn;
>>      insn = emit_insn (gen_rtx_SET (st_reg, st_reg));
>>      add_reg_note (insn, REG_DEAD, st_reg);
>>    }
>>  return true;
>> }
>> 
>> /* TARGET_ZERO_CALL_USED_REGS.  */
>> /* Generate a sequence of instructions that zero registers specified by
>>   NEED_ZEROED_HARDREGS.  Return the ZEROED_HARDREGS that are actually
>>   zeroed.  */
>> static HARD_REG_SET
>> ix86_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
>> {
>>  HARD_REG_SET zeroed_hardregs;
>>  bool all_sse_zeroed = false;
>>  bool st_zeroed = false;
>> 
>>  /* first, let's see whether we can zero all vector registers together.  */
>>  rtx zero_all_vec_insn = zero_all_vector_registers (need_zeroed_hardregs);
>>  if (zero_all_vec_insn)
>>    {
>>      emit_insn (zero_all_vec_insn);
>>      all_sse_zeroed = true;
>>    }
>> 
>>  /* Then, decide which mode (MMX mode or x87 mode) the function exit with.
>>     In order to decide whether we need to clear the MMX registers or the
>>     stack registers.  */
>>  bool exit_with_mmx_mode = false;
>> 
>>  exit_with_mmx_mode = ((GET_CODE (crtl->return_rtx) == REG)
>>                        && (MMX_REG_P (crtl->return_rtx)));
>> 
>>  /* then, let's see whether we can zero all st registers togeter.  */
>>  if (!exit_with_mmx_mode)
>>    st_zeroed = zero_all_st_registers (need_zeroed_hardregs);
>> 
>>  /* Now, generate instructions to zero all the registers.  */
>> 
>>  CLEAR_HARD_REG_SET (zeroed_hardregs);
>>  if (st_zeroed)
>>    SET_HARD_REG_BIT (zeroed_hardregs, FIRST_STACK_REG);
>> 
>>  rtx zero_gpr = NULL_RTX;
>>  rtx zero_vector = NULL_RTX;
>>  rtx zero_mask = NULL_RTX;
>> 
>>  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>>    {
>>      if (!TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
>>        continue;
>>      if (!zero_call_used_regno_p (regno, all_sse_zeroed, exit_with_mmx_mode))
>>        continue;
>> 
>>      SET_HARD_REG_BIT (zeroed_hardregs, regno);
>> 
>>      rtx reg, tmp;
>>      machine_mode mode = zero_call_used_regno_mode (regno);
>> 
>>      reg = gen_rtx_REG (mode, regno);
>> 
>>      if (mode == SImode)
>>        if (zero_gpr == NULL_RTX)
>>          {
>>            zero_gpr = reg;
>>            tmp = gen_rtx_SET (reg, const0_rtx);
>>            if (!TARGET_USE_MOV0 || optimize_insn_for_size_p ())
>>              {
>>                rtx clob = gen_rtx_CLOBBER (VOIDmode,
>>                                            gen_rtx_REG (CCmode,
>>                                                         FLAGS_REG));
>>                tmp = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2,
>>                                                             tmp,
>>                                                             clob));
>>              }
>>            emit_insn (tmp);
>>          }
>>        else
>>          emit_move_insn (reg, zero_gpr);
>>      else if (mode == V4SFmode)
>>        if (zero_vector == NULL_RTX)
>>          {
>>            zero_vector = reg;
>>            tmp = gen_rtx_SET (reg, const0_rtx);
>>            emit_insn (tmp);
>>          }
>>        else
>>          emit_move_insn (reg, zero_vector);
>>      else if (mode == HImode)
>>        if (zero_mask == NULL_RTX)
>>          {
>>            zero_mask = reg;
>>            tmp = gen_rtx_SET (reg, const0_rtx);
>>            emit_insn (tmp);
>>          }
>>        else
>>          emit_move_insn (reg, zero_mask);
>>      else
>>        gcc_unreachable ();
>>    }
>>  return zeroed_hardregs;
>> }
>> 
>> On Oct 26, 2020, at 11:13 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>> 
>> On Mon, Oct 26, 2020 at 3:45 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>> 
>> 
>> 
>> +/* Generate insns to zero all st/mm registers together.
>> +   Return true when zeroing instructions are generated.
>> +   Assume the number of st registers that are zeroed is num_of_st,
>> +   we will emit the following sequence to zero them together:
>> +                 fldz;         \
>> +                 fldz;         \
>> +                 ...
>> +                 fldz;         \
>> +                 fstp %%st(0); \
>> +                 fstp %%st(0); \
>> +                 ...
>> +                 fstp %%st(0);
>> +   i.e., num_of_st fldz followed by num_of_st fstp to clear the stack
>> +   mark stack slots empty.  */
>> +
>> +static bool
>> +zero_all_st_mm_registers (HARD_REG_SET need_zeroed_hardregs)
>> +{
>> +  unsigned int num_of_st = 0;
>> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>> +    if (STACK_REGNO_P (regno)
>> +       && TEST_HARD_REG_BIT (need_zeroed_hardregs, regno)
>> +       /* When the corresponding mm register also need to be cleared too.  */
>> +       && TEST_HARD_REG_BIT (need_zeroed_hardregs,
>> +                             (regno - FIRST_STACK_REG + FIRST_MMX_REG)))
>> +      num_of_st++;
>> 
>> 
>> I don't think the above logic is correct. It should go like this:
>> 
>> - If the function is returning an MMX register,
>> 
>> 
>> How to check on this? Is the following correct?
>> 
>> If (GET_CODE(crtl->return_rtx) == REG
>>   && (MMX_REG_P (REGNO (crtl->return_rtx)))
>> 
>> 
>> Yes, but please use
>> 
>> if (MMX_REG_P (crtl->return_rtx))
>> 
>> 
>>  The function is returning an MMX register.
>> 
>> 
>> then the function
>> exits in MMX mode, and MMX registers should be cleared in the same way
>> as XMM registers.
>> 
>> 
>> When clearing XMM registers, we used V4SFmode, what’s the mode we should use to clearing
>> mmx registers?
>> 
>> 
>> It doesn't matter that much, any 8byte vector mode will do (including
>> DImode). Let's use V4HImode.
>> 
>> Otherwise the ABI specifies that the function exits
>> in x87 mode and x87 stack should be cleared (but see below).
>> 
>> - There is no direct mapping of stack registers to hard register
>> numbers. If a stack register is used, we don't know where in the stack
>> the value remains. So, if _any_ stack register is touched, the whole
>> stack should be cleared (value, returning in x87 stack register should
>> obviously be excluded).
>> 
>> 
>> Then, how to exclude the x87 stack register that returns the function return value when we need to
>> Clear the whole stack?
>> I am a little confused here? Could you explain a little more details?
>> 
>> 
>> x87 returns in the top (two for complex values) register, so simply
>> load 7 zeros (and 7 corresponding pops). This will preserve the return
>> value but clear the whole remaining stack.
>> 
>> - There is no x87 argument register. 32bit targets use MMX0-3 argument
>> registers and return value in the XMM register. Please also note that
>> complex values take two stack slots in x87 stack.
>> 
>> 
>> You mean the complex return value will be returned in two  x87 registers?
>> 
>> 
>> Yes, please see ix86_class_max_nregs. Please note that in case of
>> complex return value, only 6 zeros should be loaded to avoid
>> clobbering the complex return value.
>> 
>> Uros.
>> 
>>
Uros Bizjak Oct. 26, 2020, 8:05 p.m. UTC | #9
On Mon, Oct 26, 2020 at 8:10 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>
>
>
> > On Oct 26, 2020, at 1:42 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > On Mon, Oct 26, 2020 at 6:30 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
> >>
> >>
> >> The following is the current change in i386.c, could you check whether the logic is good?
> >
> > x87 handling looks good to me.
> >
> > One remaining question: If the function uses MMX regs (either
> > internally or as an argument register), but exits in x87 mode, does
> > your logic clear the x87 stack?
>
> Yes but not completely yes.
>
> FIRST, As following:
>
>   /* Then, decide which mode (MMX mode or x87 mode) the function exit with.
>      In order to decide whether we need to clear the MMX registers or the
>      stack registers.  */
>   bool exit_with_mmx_mode = false;
>
>   exit_with_mmx_mode = ((GET_CODE (crtl->return_rtx) == REG)
>                         && (MMX_REG_P (crtl->return_rtx)));
>
>   /* then, let's see whether we can zero all st registers togeter.  */
>   if (!exit_with_mmx_mode)
>     st_zeroed = zero_all_st_registers (need_zeroed_hardregs);
>
>
> We first check whether this routine exit with mmx mode, if Not then it’s X87 mode
> (at exit, “EMMS” should already been called per ABI), then
> The st/mm registers will be cleared as x87 stack registers.
>
> However, within the routine “zero_all_st_registers”:
>
> static bool
> zero_all_st_registers (HARD_REG_SET need_zeroed_hardregs)
> {
>   unsigned int num_of_st = 0;
>   for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>     if (STACK_REGNO_P (regno)
>         && TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
>       {
>         num_of_st++;
>         break;
>       }
>
>   if (num_of_st == 0)
>     return false;
>
>
> In the above, I currently only check whether any “Stack” registers need to be zeroed or not.
> But looks like we should also check any “MMX” register need to be zeroed or not too. If there is any
> “MMX” register need to be zeroed, we still need to clear the whole X87 stack?

I think so, but I have to check the details

> BTW, is it convenient for you to provide me 3 small testing cases for the following situation:
>
> 1. Return with MMX register;
> 2. Return with  x87 stack register;
> 3. Return with 2 x87 stack register (i.e the complex value).
>
> Then it will be much easy for me to verify my implementation is good or not at my side.

--cut here--
typedef int __v2si __attribute__ ((vector_size (8)));

__v2si ret_mmx (void)
{
  return (__v2si) { 123, 345 };
}

long double ret_x87 (void)
{
  return 1.1L;
}

_Complex long double ret_x87_cplx (void)
{
  return 1.1L + 1.2iL;
}
--cut here--

Please compile this with "-m32 -mmmx".

ret_mmx returns value in MMX register.
ret_x87 returns value in x87 register.
ret_x87_cplx returns value in memory.

"-m64"

ret_mmx returns value in XMM register.
ret_x87 returns value in x87 register.
ret_x87_cplx returns value in two x87 registers.

Uros.
Uros Bizjak Oct. 26, 2020, 8:33 p.m. UTC | #10
On Mon, Oct 26, 2020 at 9:05 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Mon, Oct 26, 2020 at 8:10 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
> >
> >
> >
> > > On Oct 26, 2020, at 1:42 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> > >
> > > On Mon, Oct 26, 2020 at 6:30 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
> > >>
> > >>
> > >> The following is the current change in i386.c, could you check whether the logic is good?
> > >
> > > x87 handling looks good to me.
> > >
> > > One remaining question: If the function uses MMX regs (either
> > > internally or as an argument register), but exits in x87 mode, does
> > > your logic clear the x87 stack?
> >
> > Yes but not completely yes.
> >
> > FIRST, As following:
> >
> >   /* Then, decide which mode (MMX mode or x87 mode) the function exit with.
> >      In order to decide whether we need to clear the MMX registers or the
> >      stack registers.  */
> >   bool exit_with_mmx_mode = false;
> >
> >   exit_with_mmx_mode = ((GET_CODE (crtl->return_rtx) == REG)
> >                         && (MMX_REG_P (crtl->return_rtx)));
> >
> >   /* then, let's see whether we can zero all st registers togeter.  */
> >   if (!exit_with_mmx_mode)
> >     st_zeroed = zero_all_st_registers (need_zeroed_hardregs);
> >
> >
> > We first check whether this routine exit with mmx mode, if Not then it’s X87 mode
> > (at exit, “EMMS” should already been called per ABI), then
> > The st/mm registers will be cleared as x87 stack registers.
> >
> > However, within the routine “zero_all_st_registers”:
> >
> > static bool
> > zero_all_st_registers (HARD_REG_SET need_zeroed_hardregs)
> > {
> >   unsigned int num_of_st = 0;
> >   for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> >     if (STACK_REGNO_P (regno)
> >         && TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
> >       {
> >         num_of_st++;
> >         break;
> >       }
> >
> >   if (num_of_st == 0)
> >     return false;
> >
> >
> > In the above, I currently only check whether any “Stack” registers need to be zeroed or not.
> > But looks like we should also check any “MMX” register need to be zeroed or not too. If there is any
> > “MMX” register need to be zeroed, we still need to clear the whole X87 stack?
>
> I think so, but I have to check the details.

Please compile the following testcase with "-m32 -mmmx":

--cut here--
#include <stdio.h>

typedef int __v2si __attribute__ ((vector_size (8)));

__v2si zzz;

void
__attribute__ ((noinline))
mmx (__v2si a, __v2si b, __v2si c)
{
  __v2si res;

  res = __builtin_ia32_paddd (a, b);
  zzz = __builtin_ia32_paddd (res, c);

  __builtin_ia32_emms ();
}


int main ()
{
  __v2si a = { 123, 345 };
  __v2si b = { 234, 456 };
  __v2si c = { 345, 567 };

  mmx (a, b, c);

  printf ("%i, %i\n", zzz[0], zzz[1]);

  return 0;
}
--cut here--

at the end of mmx() function:

0x080491ed in mmx ()
(gdb) disass
Dump of assembler code for function mmx:
  0x080491e0 <+0>:     paddd  %mm1,%mm0
  0x080491e3 <+3>:     paddd  %mm2,%mm0
  0x080491e6 <+6>:     movq   %mm0,0x804c020
=> 0x080491ed <+13>:    emms
  0x080491ef <+15>:    ret
End of assembler dump.
(gdb) i r flo
st0            <invalid float value> (raw 0xffff00000558000002be)
st1            <invalid float value> (raw 0xffff000001c8000000ea)
st2            <invalid float value> (raw 0xffff0000023700000159)
st3            0                   (raw 0x00000000000000000000)
st4            0                   (raw 0x00000000000000000000)
st5            0                   (raw 0x00000000000000000000)
st6            0                   (raw 0x00000000000000000000)
st7            0                   (raw 0x00000000000000000000)
fctrl          0x37f               895
fstat          0x0                 0
ftag           0x556a              21866
fiseg          0x0                 0
fioff          0x0                 0
foseg          0x0                 0
fooff          0x0                 0
fop            0x0                 0

There are still values in the MMX registers. However, we are in x87
mode, so the whole stack has to be cleared.

Now, what to do if the function uses x87 registers and exits in MMX
mode? I guess we have to clear all MMX registers (modulo return value
reg).

Uros.
Qing Zhao Oct. 26, 2020, 9:23 p.m. UTC | #11
> On Oct 26, 2020, at 3:33 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> 
> On Mon, Oct 26, 2020 at 9:05 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>> 
>> On Mon, Oct 26, 2020 at 8:10 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>>> 
>>> 
>>> 
>>>> On Oct 26, 2020, at 1:42 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>>> 
>>>> On Mon, Oct 26, 2020 at 6:30 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>>>>> 
>>>>> 
>>>>> The following is the current change in i386.c, could you check whether the logic is good?
>>>> 
>>>> x87 handling looks good to me.
>>>> 
>>>> One remaining question: If the function uses MMX regs (either
>>>> internally or as an argument register), but exits in x87 mode, does
>>>> your logic clear the x87 stack?
>>> 
>>> Yes but not completely yes.
>>> 
>>> FIRST, As following:
>>> 
>>>  /* Then, decide which mode (MMX mode or x87 mode) the function exit with.
>>>     In order to decide whether we need to clear the MMX registers or the
>>>     stack registers.  */
>>>  bool exit_with_mmx_mode = false;
>>> 
>>>  exit_with_mmx_mode = ((GET_CODE (crtl->return_rtx) == REG)
>>>                        && (MMX_REG_P (crtl->return_rtx)));
>>> 
>>>  /* then, let's see whether we can zero all st registers togeter.  */
>>>  if (!exit_with_mmx_mode)
>>>    st_zeroed = zero_all_st_registers (need_zeroed_hardregs);
>>> 
>>> 
>>> We first check whether this routine exit with mmx mode, if Not then it’s X87 mode
>>> (at exit, “EMMS” should already been called per ABI), then
>>> The st/mm registers will be cleared as x87 stack registers.
>>> 
>>> However, within the routine “zero_all_st_registers”:
>>> 
>>> static bool
>>> zero_all_st_registers (HARD_REG_SET need_zeroed_hardregs)
>>> {
>>>  unsigned int num_of_st = 0;
>>>  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>>>    if (STACK_REGNO_P (regno)
>>>        && TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
>>>      {
>>>        num_of_st++;
>>>        break;
>>>      }
>>> 
>>>  if (num_of_st == 0)
>>>    return false;
>>> 
>>> 
>>> In the above, I currently only check whether any “Stack” registers need to be zeroed or not.
>>> But looks like we should also check any “MMX” register need to be zeroed or not too. If there is any
>>> “MMX” register need to be zeroed, we still need to clear the whole X87 stack?
>> 
>> I think so, but I have to check the details.
> 
> Please compile the following testcase with "-m32 -mmmx":
> 
> --cut here--
> #include <stdio.h>
> 
> typedef int __v2si __attribute__ ((vector_size (8)));
> 
> __v2si zzz;
> 
> void
> __attribute__ ((noinline))
> mmx (__v2si a, __v2si b, __v2si c)
> {
>  __v2si res;
> 
>  res = __builtin_ia32_paddd (a, b);
>  zzz = __builtin_ia32_paddd (res, c);
> 
>  __builtin_ia32_emms ();
> }
> 
> 
> int main ()
> {
>  __v2si a = { 123, 345 };
>  __v2si b = { 234, 456 };
>  __v2si c = { 345, 567 };
> 
>  mmx (a, b, c);
> 
>  printf ("%i, %i\n", zzz[0], zzz[1]);
> 
>  return 0;
> }
> --cut here--
> 
> at the end of mmx() function:
> 
> 0x080491ed in mmx ()
> (gdb) disass
> Dump of assembler code for function mmx:
>  0x080491e0 <+0>:     paddd  %mm1,%mm0
>  0x080491e3 <+3>:     paddd  %mm2,%mm0
>  0x080491e6 <+6>:     movq   %mm0,0x804c020
> => 0x080491ed <+13>:    emms
>  0x080491ef <+15>:    ret
> End of assembler dump.
> (gdb) i r flo
> st0            <invalid float value> (raw 0xffff00000558000002be)
> st1            <invalid float value> (raw 0xffff000001c8000000ea)
> st2            <invalid float value> (raw 0xffff0000023700000159)
> st3            0                   (raw 0x00000000000000000000)
> st4            0                   (raw 0x00000000000000000000)
> st5            0                   (raw 0x00000000000000000000)
> st6            0                   (raw 0x00000000000000000000)
> st7            0                   (raw 0x00000000000000000000)
> fctrl          0x37f               895
> fstat          0x0                 0
> ftag           0x556a              21866
> fiseg          0x0                 0
> fioff          0x0                 0
> foseg          0x0                 0
> fooff          0x0                 0
> fop            0x0                 0
> 
> There are still values in the MMX registers. However, we are in x87
> mode, so the whole stack has to be cleared.

Yes. And I just tried, my current implementation behaved correctly. 
> 
> Now, what to do if the function uses x87 registers and exits in MMX
> mode? I guess we have to clear all MMX registers (modulo return value
> reg).

Need to add this part.

thanks.
Qing
> 
> Uros.
Qing Zhao Oct. 26, 2020, 11:06 p.m. UTC | #12
Hi, Uros,

Could you please check the change compared to the previous version for i386.c as following:
Let me know any issue there.

Thanks a lot.

Qing

---
 gcc/config/i386/i386.c                             | 136 ++++++++++++++++++---
 .../gcc.target/i386/zero-scratch-regs-28.c         |  17 +++
 .../gcc.target/i386/zero-scratch-regs-29.c         |  11 ++
 .../gcc.target/i386/zero-scratch-regs-30.c         |  11 ++
 4 files changed, 155 insertions(+), 20 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-28.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-29.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-30.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index e66dcf0d587..65f778112d9 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -3554,17 +3554,17 @@ ix86_function_value_regno_p (const unsigned int regno)
 /* Check whether the register REGNO should be zeroed on X86.
    When ALL_SSE_ZEROED is true, all SSE registers have been zeroed
    together, no need to zero it again.
-   Stack registers (st0-st7) and mm0-mm7 are aliased with each other.
-   very hard to be zeroed individually, don't zero individual st or
-   mm registgers.  */
+   When NEED_ZERO_MMX is true, MMX registers should be cleared.  */
 
 static bool
 zero_call_used_regno_p (const unsigned int regno,
-			bool all_sse_zeroed)
+			bool all_sse_zeroed,
+			bool need_zero_mmx)
 {
   return GENERAL_REGNO_P (regno)
 	 || (!all_sse_zeroed && SSE_REGNO_P (regno))
-	 || MASK_REGNO_P (regno);
+	 || MASK_REGNO_P (regno)
+	 || (need_zero_mmx && MMX_REGNO_P (regno));
 }
 
 /* Return the machine_mode that is used to zero register REGNO.  */
@@ -3579,8 +3579,12 @@ zero_call_used_regno_mode (const unsigned int regno)
     return SImode;
   else if (SSE_REGNO_P (regno))
     return V4SFmode;
-  else
+  else if (MASK_REGNO_P (regno))
     return HImode;
+  else if (MMX_REGNO_P (regno))
+    return DImode;
+  else
+    gcc_unreachable ();
 }
 
 /* Generate a rtx to zero all vector registers together if possible,
@@ -3603,7 +3607,7 @@ zero_all_vector_registers (HARD_REG_SET need_zeroed_hardregs)
   return gen_avx_vzeroall ();
 }
 
-/* Generate insns to zero all st/mm registers together.
+/* Generate insns to zero all st registers together.
    Return true when zeroing instructions are generated.
    Assume the number of st registers that are zeroed is num_of_st,
    we will emit the following sequence to zero them together:
@@ -3616,23 +3620,50 @@ zero_all_vector_registers (HARD_REG_SET need_zeroed_hardregs)
 		  ...
 		  fstp %%st(0);
    i.e., num_of_st fldz followed by num_of_st fstp to clear the stack
-   mark stack slots empty.  */
+   mark stack slots empty.
+
+   How to compute the num_of_st?
+   There is no direct mapping from stack registers to hard register
+   numbers.  If one stack register need to be cleared, we don't know
+   where in the stack the value remains.  So, if any stack register 
+   need to be cleared, the whole stack should be cleared.  However,
+   x87 stack registers that hold the return value should be excluded.
+   x87 returns in the top (two for complex values) register, so
+   num_of_st should be 7/6 when x87 returns, otherwise it will be 8.  */
+
 
 static bool
-zero_all_st_mm_registers (HARD_REG_SET need_zeroed_hardregs)
+zero_all_st_registers (HARD_REG_SET need_zeroed_hardregs)
 {
   unsigned int num_of_st = 0;
   for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
-    if (STACK_REGNO_P (regno)
-	&& TEST_HARD_REG_BIT (need_zeroed_hardregs, regno)
-	/* When the corresponding mm register also need to be cleared too.  */
-	&& TEST_HARD_REG_BIT (need_zeroed_hardregs,
-			      (regno - FIRST_STACK_REG + FIRST_MMX_REG)))
-      num_of_st++;
+    if ((STACK_REGNO_P (regno) || MMX_REGNO_P (regno))
+	&& TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
+      {
+	num_of_st++;
+	break;
+      }
 
   if (num_of_st == 0)
     return false;
 
+  bool return_with_x87 = false;
+  return_with_x87 = (crtl->return_rtx
+		     && (GET_CODE (crtl->return_rtx) == REG)
+		     && (STACK_REG_P (crtl->return_rtx)));
+
+  bool complex_return = false;
+  complex_return = (crtl->return_rtx
+		    && COMPLEX_MODE_P (GET_MODE (crtl->return_rtx)));
+
+  if (return_with_x87)
+    if (complex_return)
+      num_of_st = 6;
+    else
+      num_of_st = 7;
+  else
+    num_of_st = 8;
+
   rtx st_reg = gen_rtx_REG (XFmode, FIRST_STACK_REG);
   for (unsigned int i = 0; i < num_of_st; i++)
     emit_insn (gen_rtx_SET (st_reg, CONST0_RTX (XFmode)));
@@ -3646,6 +3677,43 @@ zero_all_st_mm_registers (HARD_REG_SET need_zeroed_hardregs)
   return true;
 }
 
+
+/* When the routine exit with MMX mode, if there is any ST registers
+   need to be zeroed, we should clear all MMX registers except the
+   one that holds the return value RET_MMX_REGNO.  */
+static bool
+zero_all_mm_registers (HARD_REG_SET need_zeroed_hardregs,
+		       unsigned int ret_mmx_regno)
+{
+  bool need_zero_all_mm = false;
+  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
+    if (STACK_REGNO_P (regno)
+	&& TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
+      {
+	need_zero_all_mm = true;
+	break;
+      }
+
+  if (!need_zero_all_mm)
+    return false;
+
+  rtx zero_mmx = NULL_RTX;
+  machine_mode mode = DImode;
+  for (unsigned int regno = FIRST_MMX_REG; regno <= LAST_MMX_REG; regno++)
+    if (regno != ret_mmx_regno)
+      {
+	rtx reg = gen_rtx_REG (mode, regno);
+	if (zero_mmx == NULL_RTX)
+	  {
+	    zero_mmx = reg;
+	    emit_insn (gen_rtx_SET (reg, const0_rtx));
+	  }
+	else
+	  emit_move_insn (reg, zero_mmx);
+      }
+  return true;
+}
+
 /* TARGET_ZERO_CALL_USED_REGS.  */
 /* Generate a sequence of instructions that zero registers specified by
    NEED_ZEROED_HARDREGS.  Return the ZEROED_HARDREGS that are actually
@@ -3655,7 +3723,8 @@ ix86_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
 {
   HARD_REG_SET zeroed_hardregs;
   bool all_sse_zeroed = false;
-  bool st_zeroed = false;
+  bool all_st_zeroed = false;
+  bool all_mm_zeroed = false;
 
   /* first, let's see whether we can zero all vector registers together.  */
   rtx zero_all_vec_insn = zero_all_vector_registers (need_zeroed_hardregs);
@@ -3665,24 +3734,42 @@ ix86_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
       all_sse_zeroed = true;
     }
 
-  /* then, let's see whether we can zero all st+mm registers togeter.  */
-  st_zeroed = zero_all_st_mm_registers (need_zeroed_hardregs);
+  /* Then, decide which mode (MMX mode or x87 mode) the function exit with.
+     In order to decide whether we need to clear the MMX registers or the
+     stack registers.  */
+
+  bool exit_with_mmx_mode = (crtl->return_rtx
+			     && (GET_CODE (crtl->return_rtx) == REG)
+			     && (MMX_REG_P (crtl->return_rtx)));
+
+  /* then, let's see whether we can zero all st registers together.  */
+  if (!exit_with_mmx_mode)
+    all_st_zeroed = zero_all_st_registers (need_zeroed_hardregs);
+  /* Or should we zero all MMX registers.  */
+  else 
+    {
+      unsigned int exit_mmx_regno = REGNO (crtl->return_rtx);
+      all_mm_zeroed = zero_all_mm_registers (need_zeroed_hardregs, 
+					     exit_mmx_regno);
+    }
 
   /* Now, generate instructions to zero all the registers.  */
 
   CLEAR_HARD_REG_SET (zeroed_hardregs);
-  if (st_zeroed)
+  if (all_st_zeroed)
     SET_HARD_REG_BIT (zeroed_hardregs, FIRST_STACK_REG);
 
   rtx zero_gpr = NULL_RTX;
   rtx zero_vector = NULL_RTX;
   rtx zero_mask = NULL_RTX;
+  rtx zero_mmx = NULL_RTX;
 
   for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
     {
       if (!TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
 	continue;
-      if (!zero_call_used_regno_p (regno, all_sse_zeroed))
+      if (!zero_call_used_regno_p (regno, all_sse_zeroed, 
+				   exit_with_mmx_mode && !all_mm_zeroed))
 	continue;
 
       SET_HARD_REG_BIT (zeroed_hardregs, regno);
@@ -3728,6 +3815,15 @@ ix86_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
 	  }
 	else
 	  emit_move_insn (reg, zero_mask);
+      else if (mode == DImode)
+	if (zero_mmx == NULL_RTX)
+	  {
+	    zero_mmx = reg;
+	    tmp = gen_rtx_SET (reg, const0_rtx);
+	    emit_insn (tmp);
+	  }
+	else
+	  emit_move_insn (reg, zero_mmx);
       else
 	gcc_unreachable ();
     }
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-28.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-28.c
new file mode 100644
index 00000000000..61c0bb7a35c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-28.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -m32 -mmmx -fzero-call-used-regs=all" } */
+
+typedef int __v2si __attribute__ ((vector_size (8)));
+
+__v2si ret_mmx (void)
+{
+  return (__v2si) { 123, 345 };
+}
+
+/* { dg-final { scan-assembler "pxor\[ \t\]*%mm1, %mm1" } } */
+/* { dg-final { scan-assembler "movq\[ \t\]*%mm1, %mm2" } } */
+/* { dg-final { scan-assembler "movq\[ \t\]*%mm1, %mm3" } } */
+/* { dg-final { scan-assembler "movq\[ \t\]*%mm1, %mm4" } } */
+/* { dg-final { scan-assembler "movq\[ \t\]*%mm1, %mm5" } } */
+/* { dg-final { scan-assembler "movq\[ \t\]*%mm1, %mm6" } } */
+/* { dg-final { scan-assembler "movq\[ \t\]*%mm1, %mm7" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-29.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-29.c
new file mode 100644
index 00000000000..db636654e70
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-29.c
@@ -0,0 +1,11 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -m32 -mmmx -fzero-call-used-regs=all" } */
+typedef int __v2si __attribute__ ((vector_size (8)));
+
+long double ret_x87 (void)
+{
+  return 1.1L;
+}
+
+/* { dg-final { scan-assembler-times "fldz" 7 } } */
+/* { dg-final { scan-assembler-times "fstp" 7 } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-30.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-30.c
new file mode 100644
index 00000000000..7c20b569bfa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-30.c
@@ -0,0 +1,11 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2  -fzero-call-used-regs=all" } */
+typedef int __v2si __attribute__ ((vector_size (8)));
+
+_Complex long double ret_x87_cplx (void)
+{
+  return 1.1L + 1.2iL;
+}
+
+/* { dg-final { scan-assembler-times "fldz" 6 } } */
+/* { dg-final { scan-assembler-times "fstp" 6 } } */
Uros Bizjak Oct. 27, 2020, 8:09 a.m. UTC | #13
On Tue, Oct 27, 2020 at 12:08 AM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>
> Hi, Uros,
>
> Could you please check the change compared to the previous version for i386.c as following:
> Let me know any issue there.

It looks that the combination when the function only touches MMX
registers (so, no x87 register is touched) and exits in MMX mode is
not handled in the optimal way. In this case, MMX registers should be
handled in the same way as XMM registers, where only used/arg/all regs
can be cleared.

                  MMX exit mode       x87 exit mode
-------------|----------------------|---------------
uses x87 reg | clear all MMX        | clear all x87
uses MMX reg | clear individual MMX | clear all x87
x87 + MMX    | clear all MMX        | clear all x87

IOW, if x87 is used, we don't know where in the stack (or in which MMX
"register") the value lies. But when the function uses only MMX
registers and exits in MMX mode, we know which register was used, and
we *can* access them individually.

Also, do we want to handle only arg/used registers? x87 has no arg
registers, so there is no need to clear anything. MMX has 3 argument
registers for 32bit targets, and is possible to clear them
individually when the function exits in MMX mode.

Please note review comments inline.

Uros.

> Thanks a lot.
>
> Qing
>
> ---
>  gcc/config/i386/i386.c                             | 136 ++++++++++++++++++---
>  .../gcc.target/i386/zero-scratch-regs-28.c         |  17 +++
>  .../gcc.target/i386/zero-scratch-regs-29.c         |  11 ++
>  .../gcc.target/i386/zero-scratch-regs-30.c         |  11 ++
>  4 files changed, 155 insertions(+), 20 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-28.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-29.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-30.c
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index e66dcf0d587..65f778112d9 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -3554,17 +3554,17 @@ ix86_function_value_regno_p (const unsigned int regno)
>  /* Check whether the register REGNO should be zeroed on X86.
>     When ALL_SSE_ZEROED is true, all SSE registers have been zeroed
>     together, no need to zero it again.
> -   Stack registers (st0-st7) and mm0-mm7 are aliased with each other.
> -   very hard to be zeroed individually, don't zero individual st or
> -   mm registgers.  */
> +   When NEED_ZERO_MMX is true, MMX registers should be cleared.  */
>
>  static bool
>  zero_call_used_regno_p (const unsigned int regno,
> - bool all_sse_zeroed)
> + bool all_sse_zeroed,
> + bool need_zero_mmx)
>  {
>    return GENERAL_REGNO_P (regno)
>    || (!all_sse_zeroed && SSE_REGNO_P (regno))
> -  || MASK_REGNO_P (regno);
> +  || MASK_REGNO_P (regno)
> +  || (need_zero_mmx && MMX_REGNO_P (regno));
>  }
>
>  /* Return the machine_mode that is used to zero register REGNO.  */
> @@ -3579,8 +3579,12 @@ zero_call_used_regno_mode (const unsigned int regno)
>      return SImode;
>    else if (SSE_REGNO_P (regno))
>      return V4SFmode;
> -  else
> +  else if (MASK_REGNO_P (regno))
>      return HImode;
> +  else if (MMX_REGNO_P (regno))
> +    return DImode;

Why DImode instead of V4HImode? DImode is "natural" for integer
registers, and we risk moves from integer to MMX regs.

> +  else
> +    gcc_unreachable ();
>  }
>
>  /* Generate a rtx to zero all vector registers together if possible,
> @@ -3603,7 +3607,7 @@ zero_all_vector_registers (HARD_REG_SET need_zeroed_hardregs)
>    return gen_avx_vzeroall ();
>  }
>
> -/* Generate insns to zero all st/mm registers together.
> +/* Generate insns to zero all st registers together.
>     Return true when zeroing instructions are generated.
>     Assume the number of st registers that are zeroed is num_of_st,
>     we will emit the following sequence to zero them together:
> @@ -3616,23 +3620,50 @@ zero_all_vector_registers (HARD_REG_SET need_zeroed_hardregs)
>     ...
>     fstp %%st(0);
>     i.e., num_of_st fldz followed by num_of_st fstp to clear the stack
> -   mark stack slots empty.  */
> +   mark stack slots empty.
> +
> +   How to compute the num_of_st?
> +   There is no direct mapping from stack registers to hard register
> +   numbers.  If one stack register need to be cleared, we don't know
> +   where in the stack the value remains.  So, if any stack register
> +   need to be cleared, the whole stack should be cleared.  However,
> +   x87 stack registers that hold the return value should be excluded.
> +   x87 returns in the top (two for complex values) register, so
> +   num_of_st should be 7/6 when x87 returns, otherwise it will be 8.  */
> +
>
>  static bool
> -zero_all_st_mm_registers (HARD_REG_SET need_zeroed_hardregs)
> +zero_all_st_registers (HARD_REG_SET need_zeroed_hardregs)
>  {
>    unsigned int num_of_st = 0;
>    for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> -    if (STACK_REGNO_P (regno)
> - && TEST_HARD_REG_BIT (need_zeroed_hardregs, regno)
> - /* When the corresponding mm register also need to be cleared too.  */
> - && TEST_HARD_REG_BIT (need_zeroed_hardregs,
> -       (regno - FIRST_STACK_REG + FIRST_MMX_REG)))
> -      num_of_st++;
> +    if ((STACK_REGNO_P (regno) || MMX_REGNO_P (regno))
> + && TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
> +      {
> + num_of_st++;
> + break;
> +      }
>
>    if (num_of_st == 0)
>      return false;
>
> +  bool return_with_x87 = false;
> +  return_with_x87 = (crtl->return_rtx
> +      && (GET_CODE (crtl->return_rtx) == REG)
> +      && (STACK_REG_P (crtl->return_rtx)));

STACK_REG_P already checks for REG, no need for separate check.

> +
> +  bool complex_return = false;
> +  complex_return = (crtl->return_rtx
> +     && COMPLEX_MODE_P (GET_MODE (crtl->return_rtx)));
> +
> +  if (return_with_x87)
> +    if (complex_return)
> +      num_of_st = 6;
> +    else
> +      num_of_st = 7;
> +  else
> +    num_of_st = 8;
> +
>    rtx st_reg = gen_rtx_REG (XFmode, FIRST_STACK_REG);
>    for (unsigned int i = 0; i < num_of_st; i++)
>      emit_insn (gen_rtx_SET (st_reg, CONST0_RTX (XFmode)));
> @@ -3646,6 +3677,43 @@ zero_all_st_mm_registers (HARD_REG_SET need_zeroed_hardregs)
>    return true;
>  }
>
> +
> +/* When the routine exit with MMX mode, if there is any ST registers
> +   need to be zeroed, we should clear all MMX registers except the
> +   one that holds the return value RET_MMX_REGNO.  */
> +static bool
> +zero_all_mm_registers (HARD_REG_SET need_zeroed_hardregs,
> +        unsigned int ret_mmx_regno)
> +{
> +  bool need_zero_all_mm = false;
> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> +    if (STACK_REGNO_P (regno)
> + && TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
> +      {
> + need_zero_all_mm = true;
> + break;
> +      }
> +
> +  if (!need_zero_all_mm)
> +    return false;
> +
> +  rtx zero_mmx = NULL_RTX;
> +  machine_mode mode = DImode;
> +  for (unsigned int regno = FIRST_MMX_REG; regno <= LAST_MMX_REG; regno++)
> +    if (regno != ret_mmx_regno)
> +      {
> + rtx reg = gen_rtx_REG (mode, regno);
> + if (zero_mmx == NULL_RTX)
> +   {
> +     zero_mmx = reg;
> +     emit_insn (gen_rtx_SET (reg, const0_rtx));

Use CONST0_RTX (mode), and you will be able to use V4HImode instead of DImode.

> +   }
> + else
> +   emit_move_insn (reg, zero_mmx);
> +      }
> +  return true;
> +}
> +
>  /* TARGET_ZERO_CALL_USED_REGS.  */
>  /* Generate a sequence of instructions that zero registers specified by
>     NEED_ZEROED_HARDREGS.  Return the ZEROED_HARDREGS that are actually
> @@ -3655,7 +3723,8 @@ ix86_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
>  {
>    HARD_REG_SET zeroed_hardregs;
>    bool all_sse_zeroed = false;
> -  bool st_zeroed = false;
> +  bool all_st_zeroed = false;
> +  bool all_mm_zeroed = false;
>
>    /* first, let's see whether we can zero all vector registers together.  */
>    rtx zero_all_vec_insn = zero_all_vector_registers (need_zeroed_hardregs);
> @@ -3665,24 +3734,42 @@ ix86_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
>        all_sse_zeroed = true;
>      }
>
> -  /* then, let's see whether we can zero all st+mm registers togeter.  */
> -  st_zeroed = zero_all_st_mm_registers (need_zeroed_hardregs);
> +  /* Then, decide which mode (MMX mode or x87 mode) the function exit with.
> +     In order to decide whether we need to clear the MMX registers or the
> +     stack registers.  */
> +
> +  bool exit_with_mmx_mode = (crtl->return_rtx
> +      && (GET_CODE (crtl->return_rtx) == REG)
> +      && (MMX_REG_P (crtl->return_rtx)));

MMX_REG_P also checks for REG internally.

> +
> +  /* then, let's see whether we can zero all st registers together.  */
> +  if (!exit_with_mmx_mode)
> +    all_st_zeroed = zero_all_st_registers (need_zeroed_hardregs);
> +  /* Or should we zero all MMX registers.  */
> +  else
> +    {
> +      unsigned int exit_mmx_regno = REGNO (crtl->return_rtx);
> +      all_mm_zeroed = zero_all_mm_registers (need_zeroed_hardregs,
> +      exit_mmx_regno);
> +    }
>
>    /* Now, generate instructions to zero all the registers.  */
>
>    CLEAR_HARD_REG_SET (zeroed_hardregs);
> -  if (st_zeroed)
> +  if (all_st_zeroed)
>      SET_HARD_REG_BIT (zeroed_hardregs, FIRST_STACK_REG);
>
>    rtx zero_gpr = NULL_RTX;
>    rtx zero_vector = NULL_RTX;
>    rtx zero_mask = NULL_RTX;
> +  rtx zero_mmx = NULL_RTX;
>
>    for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>      {
>        if (!TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
>   continue;
> -      if (!zero_call_used_regno_p (regno, all_sse_zeroed))
> +      if (!zero_call_used_regno_p (regno, all_sse_zeroed,
> +    exit_with_mmx_mode && !all_mm_zeroed))
>   continue;
>
>        SET_HARD_REG_BIT (zeroed_hardregs, regno);
> @@ -3728,6 +3815,15 @@ ix86_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
>     }
>   else
>     emit_move_insn (reg, zero_mask);
> +      else if (mode == DImode)
> + if (zero_mmx == NULL_RTX)
> +   {
> +     zero_mmx = reg;
> +     tmp = gen_rtx_SET (reg, const0_rtx);

CONST0_RTX (mode), and you will be able to use V4HImode.

> +     emit_insn (tmp);
> +   }
> + else
> +   emit_move_insn (reg, zero_mmx);
>        else
>   gcc_unreachable ();
>      }
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-28.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-28.c
> new file mode 100644
> index 00000000000..61c0bb7a35c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-28.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -m32 -mmmx -fzero-call-used-regs=all" } */

-m32 should not be used explicitly. Use:

/* { dg-require-effective-target ia32 } */

instead.

Also, can we test -fzero-call-used-regs=used and
-fzero-call-used-regs=arg with MMX regs? As said above, when function
exits in MMX mode, and no x87 is touched, we can clear separate MMX
registers.

> +
> +typedef int __v2si __attribute__ ((vector_size (8)));
> +
> +__v2si ret_mmx (void)
> +{
> +  return (__v2si) { 123, 345 };
> +}
> +
> +/* { dg-final { scan-assembler "pxor\[ \t\]*%mm1, %mm1" } } */
> +/* { dg-final { scan-assembler "movq\[ \t\]*%mm1, %mm2" } } */
> +/* { dg-final { scan-assembler "movq\[ \t\]*%mm1, %mm3" } } */
> +/* { dg-final { scan-assembler "movq\[ \t\]*%mm1, %mm4" } } */
> +/* { dg-final { scan-assembler "movq\[ \t\]*%mm1, %mm5" } } */
> +/* { dg-final { scan-assembler "movq\[ \t\]*%mm1, %mm6" } } */
> +/* { dg-final { scan-assembler "movq\[ \t\]*%mm1, %mm7" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-29.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-29.c
> new file mode 100644
> index 00000000000..db636654e70
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-29.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -m32 -mmmx -fzero-call-used-regs=all" } */

No need for "-m32 -mmmx", this test works the same for 32bit and 64bit targets..

> +typedef int __v2si __attribute__ ((vector_size (8)));

The above is not needed in this test.
> +
> +long double ret_x87 (void)
> +{
> +  return 1.1L;
> +}
> +
> +/* { dg-final { scan-assembler-times "fldz" 7 } } */
> +/* { dg-final { scan-assembler-times "fstp" 7 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-30.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-30.c
> new file mode 100644
> index 00000000000..7c20b569bfa
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-30.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2  -fzero-call-used-regs=all" } */
> +typedef int __v2si __attribute__ ((vector_size (8)));

The above line is not needed.

> +
> +_Complex long double ret_x87_cplx (void)
> +{
> +  return 1.1L + 1.2iL;
> +}
> +
> +/* { dg-final { scan-assembler-times "fldz" 6 } } */
> +/* { dg-final { scan-assembler-times "fstp" 6 } } */

The above applies only to 64bit target. 32bit targets pass complex
value via memory and should clear all 8 registers.
> --
> 2.11.0
>
>
>
>
>
> On Oct 26, 2020, at 4:23 PM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>
>
>
> On Oct 26, 2020, at 3:33 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Mon, Oct 26, 2020 at 9:05 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
>
> On Mon, Oct 26, 2020 at 8:10 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>
>
>
>
> On Oct 26, 2020, at 1:42 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Mon, Oct 26, 2020 at 6:30 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>
>
>
> The following is the current change in i386.c, could you check whether the logic is good?
>
>
> x87 handling looks good to me.
>
> One remaining question: If the function uses MMX regs (either
> internally or as an argument register), but exits in x87 mode, does
> your logic clear the x87 stack?
>
>
> Yes but not completely yes.
>
> FIRST, As following:
>
> /* Then, decide which mode (MMX mode or x87 mode) the function exit with.
>    In order to decide whether we need to clear the MMX registers or the
>    stack registers.  */
> bool exit_with_mmx_mode = false;
>
> exit_with_mmx_mode = ((GET_CODE (crtl->return_rtx) == REG)
>                       && (MMX_REG_P (crtl->return_rtx)));
>
> /* then, let's see whether we can zero all st registers togeter.  */
> if (!exit_with_mmx_mode)
>   st_zeroed = zero_all_st_registers (need_zeroed_hardregs);
>
>
> We first check whether this routine exit with mmx mode, if Not then it’s X87 mode
> (at exit, “EMMS” should already been called per ABI), then
> The st/mm registers will be cleared as x87 stack registers.
>
> However, within the routine “zero_all_st_registers”:
>
> static bool
> zero_all_st_registers (HARD_REG_SET need_zeroed_hardregs)
> {
> unsigned int num_of_st = 0;
> for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>   if (STACK_REGNO_P (regno)
>       && TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
>     {
>       num_of_st++;
>       break;
>     }
>
> if (num_of_st == 0)
>   return false;
>
>
> In the above, I currently only check whether any “Stack” registers need to be zeroed or not.
> But looks like we should also check any “MMX” register need to be zeroed or not too. If there is any
> “MMX” register need to be zeroed, we still need to clear the whole X87 stack?
>
>
> I think so, but I have to check the details.
>
>
> Please compile the following testcase with "-m32 -mmmx":
>
> --cut here--
> #include <stdio.h>
>
> typedef int __v2si __attribute__ ((vector_size (8)));
>
> __v2si zzz;
>
> void
> __attribute__ ((noinline))
> mmx (__v2si a, __v2si b, __v2si c)
> {
> __v2si res;
>
> res = __builtin_ia32_paddd (a, b);
> zzz = __builtin_ia32_paddd (res, c);
>
> __builtin_ia32_emms ();
> }
>
>
> int main ()
> {
> __v2si a = { 123, 345 };
> __v2si b = { 234, 456 };
> __v2si c = { 345, 567 };
>
> mmx (a, b, c);
>
> printf ("%i, %i\n", zzz[0], zzz[1]);
>
> return 0;
> }
> --cut here--
>
> at the end of mmx() function:
>
> 0x080491ed in mmx ()
> (gdb) disass
> Dump of assembler code for function mmx:
> 0x080491e0 <+0>:     paddd  %mm1,%mm0
> 0x080491e3 <+3>:     paddd  %mm2,%mm0
> 0x080491e6 <+6>:     movq   %mm0,0x804c020
> => 0x080491ed <+13>:    emms
> 0x080491ef <+15>:    ret
> End of assembler dump.
> (gdb) i r flo
> st0            <invalid float value> (raw 0xffff00000558000002be)
> st1            <invalid float value> (raw 0xffff000001c8000000ea)
> st2            <invalid float value> (raw 0xffff0000023700000159)
> st3            0                   (raw 0x00000000000000000000)
> st4            0                   (raw 0x00000000000000000000)
> st5            0                   (raw 0x00000000000000000000)
> st6            0                   (raw 0x00000000000000000000)
> st7            0                   (raw 0x00000000000000000000)
> fctrl          0x37f               895
> fstat          0x0                 0
> ftag           0x556a              21866
> fiseg          0x0                 0
> fioff          0x0                 0
> foseg          0x0                 0
> fooff          0x0                 0
> fop            0x0                 0
>
> There are still values in the MMX registers. However, we are in x87
> mode, so the whole stack has to be cleared.
>
>
> Yes. And I just tried, my current implementation behaved correctly.
>
>
> Now, what to do if the function uses x87 registers and exits in MMX
> mode? I guess we have to clear all MMX registers (modulo return value
> reg).
>
>
> Need to add this part.
>
> thanks.
> Qing
>
>
> Uros.
>
>
Richard Sandiford Oct. 27, 2020, 12:03 p.m. UTC | #14
To review my review…

Richard Sandiford via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> +In addition to the above three basic choices, the register set can be further
>> +limited by adding "-gpr" (i.e., general purpose register), "-arg" (i.e.,
>> +argument register), or both as following:
>
> How about:
>
> -------------------------------------------------------------------
> In addition to these three basic choices, it is possible to modify
> @samp{used} or @samp{all} as follows:
>
> @itemize @bullet
> @item
> Adding @samp{-gpr} restricts the zeroing to general-purpose registers.
>
> @item
> Adding @samp{-arg} restricts the zeroing to registers that are used
> to pass parameters.  When applied to @samp{all}, this includes all
> parameter registers defined by the platform's calling convention,
> regardless of whether the function uses those parameter registers.
> @end @itemize

Actually, I guess this applies to @samp{used} as well.  And I guess
using “argument” rather than “parameter” would be more consistent.
So how about:

-------------------------------------------------------------------
Adding @samp{-arg} restricts the zeroing to registers that can sometimes
be used to pass function arguments.  This includes all arguments registers
defined by the platform's calling convention, regardless of whether
the function uses those registers for function arguments or not.
-------------------------------------------------------------------

>> +@item -fzero-call-used-regs=@var{choice}
>> +@opindex fzero-call-used-regs
>> +Zero call-used registers at function return to increase the program
>> +security by either mitigating Return-Oriented Programming (ROP) or
>> +preventing information leak through registers.
>
> After this, we should probably say something like:
>
> -------------------------------------------------------------------
> The possible values of @var{choice} are the same as for the
> @samp{zero_call_used_regs} attribute (@pxref{…}).  The default
> is @samp{skip}.
> -------------------------------------------------------------------
>
> (with the xref filled in)

To be clearer, I meant to do this instead of repeating the description.

Thanks,
Richard
Qing Zhao Oct. 27, 2020, 1:55 p.m. UTC | #15
> On Oct 27, 2020, at 3:09 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> 
> On Tue, Oct 27, 2020 at 12:08 AM Qing Zhao <QING.ZHAO@oracle.com <mailto:QING.ZHAO@oracle.com>> wrote:
>> 
>> Hi, Uros,
>> 
>> Could you please check the change compared to the previous version for i386.c as following:
>> Let me know any issue there.
> 
> It looks that the combination when the function only touches MMX
> registers (so, no x87 register is touched) and exits in MMX mode is
> not handled in the optimal way.

My current code should handle this in the expected way already, as following:


  /* Then, decide which mode (MMX mode or x87 mode) the function exit with.
     In order to decide whether we need to clear the MMX registers or the
     stack registers.  */

  bool exit_with_mmx_mode = (crtl->return_rtx
                             && (GET_CODE (crtl->return_rtx) == REG)
                             && (MMX_REG_P (crtl->return_rtx)));

  /* then, let's see whether we can zero all st registers together.  */
  if (!exit_with_mmx_mode)
    all_st_zeroed = zero_all_st_registers (need_zeroed_hardregs);
  /* Or should we zero all MMX registers.  */
  else
    {
      unsigned int exit_mmx_regno = REGNO (crtl->return_rtx);
      all_mm_zeroed = zero_all_mm_registers (need_zeroed_hardregs,
                                             exit_mmx_regno);
    }


“Zero_all_mm_registers” only zero all MM registers when any ST register need to be cleared. Otherwise, it will not clear all MM registers.
And individual MM registers will be cleared in the regular loop as all other registers.

> In this case, MMX registers should be
> handled in the same way as XMM registers, where only used/arg/all regs
> can be cleared.
> 
>                  MMX exit mode       x87 exit mode
> -------------|----------------------|---------------
> uses x87 reg | clear all MMX        | clear all x87
> uses MMX reg | clear individual MMX | clear all x87
> x87 + MMX    | clear all MMX        | clear all x87
> 
> IOW, if x87 is used, we don't know where in the stack (or in which MMX
> "register") the value lies. But when the function uses only MMX
> registers and exits in MMX mode, we know which register was used, and
> we *can* access them individually.

I will add the above table to the comment part of the implementation. 
> 
> Also, do we want to handle only arg/used registers?

Yes.  Arg/used register handling has been done in middle end.  (In gcc/function.c) as following:

  /* For each of the hard registers, check to see whether we should zero it if:
     1. it is a call-used-registers;
 and 2. it is not a fixed-registers;
 and 3. it is not live at the return of the routine;
 and 4. it is general registor if gpr_only is true;
 and 5. it is used in the routine if used_only is true;
 and 6. it is a register that passes parameter if arg_only is true;
   */

The register set that i386 backend gets already satisfied all the above requirement. 

> x87 has no arg
> registers, so there is no need to clear anything. MMX has 3 argument
> registers for 32bit targets, and is possible to clear them
> individually when the function exits in MMX mode.

The above information should already been covered by :

     if (arg_only && !FUNCTION_ARG_REGNO_P (regno))

Right?


> 
> Please note review comments inline.
> 
> Uros.
> 
>> Thanks a lot.
>> 
>> Qing
>> 
>> ---
>> gcc/config/i386/i386.c                             | 136 ++++++++++++++++++---
>> .../gcc.target/i386/zero-scratch-regs-28.c         |  17 +++
>> .../gcc.target/i386/zero-scratch-regs-29.c         |  11 ++
>> .../gcc.target/i386/zero-scratch-regs-30.c         |  11 ++
>> 4 files changed, 155 insertions(+), 20 deletions(-)
>> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-28.c
>> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-29.c
>> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-30.c
>> 
>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>> index e66dcf0d587..65f778112d9 100644
>> --- a/gcc/config/i386/i386.c
>> +++ b/gcc/config/i386/i386.c
>> @@ -3554,17 +3554,17 @@ ix86_function_value_regno_p (const unsigned int regno)
>> /* Check whether the register REGNO should be zeroed on X86.
>>    When ALL_SSE_ZEROED is true, all SSE registers have been zeroed
>>    together, no need to zero it again.
>> -   Stack registers (st0-st7) and mm0-mm7 are aliased with each other.
>> -   very hard to be zeroed individually, don't zero individual st or
>> -   mm registgers.  */
>> +   When NEED_ZERO_MMX is true, MMX registers should be cleared.  */
>> 
>> static bool
>> zero_call_used_regno_p (const unsigned int regno,
>> - bool all_sse_zeroed)
>> + bool all_sse_zeroed,
>> + bool need_zero_mmx)
>> {
>>   return GENERAL_REGNO_P (regno)
>>   || (!all_sse_zeroed && SSE_REGNO_P (regno))
>> -  || MASK_REGNO_P (regno);
>> +  || MASK_REGNO_P (regno)
>> +  || (need_zero_mmx && MMX_REGNO_P (regno));
>> }
>> 
>> /* Return the machine_mode that is used to zero register REGNO.  */
>> @@ -3579,8 +3579,12 @@ zero_call_used_regno_mode (const unsigned int regno)
>>     return SImode;
>>   else if (SSE_REGNO_P (regno))
>>     return V4SFmode;
>> -  else
>> +  else if (MASK_REGNO_P (regno))
>>     return HImode;
>> +  else if (MMX_REGNO_P (regno))
>> +    return DImode;
> 
> Why DImode instead of V4HImode?

I tried  V4HImode, and V2SImode in the beginning, all failed during compilation time with “unrecognized inns” error, so, I have to use “DImode”. 

> DImode is "natural" for integer
> registers, and we risk moves from integer to MMX regs.

So, does this mean using DImode is not correct? 
> 
>> +  else
>> +    gcc_unreachable ();
>> }
>> 
>> /* Generate a rtx to zero all vector registers together if possible,
>> @@ -3603,7 +3607,7 @@ zero_all_vector_registers (HARD_REG_SET need_zeroed_hardregs)
>>   return gen_avx_vzeroall ();
>> }
>> 
>> -/* Generate insns to zero all st/mm registers together.
>> +/* Generate insns to zero all st registers together.
>>    Return true when zeroing instructions are generated.
>>    Assume the number of st registers that are zeroed is num_of_st,
>>    we will emit the following sequence to zero them together:
>> @@ -3616,23 +3620,50 @@ zero_all_vector_registers (HARD_REG_SET need_zeroed_hardregs)
>>    ...
>>    fstp %%st(0);
>>    i.e., num_of_st fldz followed by num_of_st fstp to clear the stack
>> -   mark stack slots empty.  */
>> +   mark stack slots empty.
>> +
>> +   How to compute the num_of_st?
>> +   There is no direct mapping from stack registers to hard register
>> +   numbers.  If one stack register need to be cleared, we don't know
>> +   where in the stack the value remains.  So, if any stack register
>> +   need to be cleared, the whole stack should be cleared.  However,
>> +   x87 stack registers that hold the return value should be excluded.
>> +   x87 returns in the top (two for complex values) register, so
>> +   num_of_st should be 7/6 when x87 returns, otherwise it will be 8.  */
>> +
>> 
>> static bool
>> -zero_all_st_mm_registers (HARD_REG_SET need_zeroed_hardregs)
>> +zero_all_st_registers (HARD_REG_SET need_zeroed_hardregs)
>> {
>>   unsigned int num_of_st = 0;
>>   for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>> -    if (STACK_REGNO_P (regno)
>> - && TEST_HARD_REG_BIT (need_zeroed_hardregs, regno)
>> - /* When the corresponding mm register also need to be cleared too.  */
>> - && TEST_HARD_REG_BIT (need_zeroed_hardregs,
>> -       (regno - FIRST_STACK_REG + FIRST_MMX_REG)))
>> -      num_of_st++;
>> +    if ((STACK_REGNO_P (regno) || MMX_REGNO_P (regno))
>> + && TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
>> +      {
>> + num_of_st++;
>> + break;
>> +      }
>> 
>>   if (num_of_st == 0)
>>     return false;
>> 
>> +  bool return_with_x87 = false;
>> +  return_with_x87 = (crtl->return_rtx
>> +      && (GET_CODE (crtl->return_rtx) == REG)
>> +      && (STACK_REG_P (crtl->return_rtx)));
> 
> STACK_REG_P already checks for REG, no need for separate check.

Okay.

> 
>> +
>> +  bool complex_return = false;
>> +  complex_return = (crtl->return_rtx
>> +     && COMPLEX_MODE_P (GET_MODE (crtl->return_rtx)));
>> +
>> +  if (return_with_x87)
>> +    if (complex_return)
>> +      num_of_st = 6;
>> +    else
>> +      num_of_st = 7;
>> +  else
>> +    num_of_st = 8;
>> +
>>   rtx st_reg = gen_rtx_REG (XFmode, FIRST_STACK_REG);
>>   for (unsigned int i = 0; i < num_of_st; i++)
>>     emit_insn (gen_rtx_SET (st_reg, CONST0_RTX (XFmode)));
>> @@ -3646,6 +3677,43 @@ zero_all_st_mm_registers (HARD_REG_SET need_zeroed_hardregs)
>>   return true;
>> }
>> 
>> +
>> +/* When the routine exit with MMX mode, if there is any ST registers
>> +   need to be zeroed, we should clear all MMX registers except the
>> +   one that holds the return value RET_MMX_REGNO.  */
>> +static bool
>> +zero_all_mm_registers (HARD_REG_SET need_zeroed_hardregs,
>> +        unsigned int ret_mmx_regno)
>> +{
>> +  bool need_zero_all_mm = false;
>> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>> +    if (STACK_REGNO_P (regno)
>> + && TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
>> +      {
>> + need_zero_all_mm = true;
>> + break;
>> +      }
>> +
>> +  if (!need_zero_all_mm)
>> +    return false;
>> +
>> +  rtx zero_mmx = NULL_RTX;
>> +  machine_mode mode = DImode;
>> +  for (unsigned int regno = FIRST_MMX_REG; regno <= LAST_MMX_REG; regno++)
>> +    if (regno != ret_mmx_regno)
>> +      {
>> + rtx reg = gen_rtx_REG (mode, regno);
>> + if (zero_mmx == NULL_RTX)
>> +   {
>> +     zero_mmx = reg;
>> +     emit_insn (gen_rtx_SET (reg, const0_rtx));
> 
> Use CONST0_RTX (mode), and you will be able to use V4HImode instead of DImode.
Will try this.
> 
>> +   }
>> + else
>> +   emit_move_insn (reg, zero_mmx);
>> +      }
>> +  return true;
>> +}
>> +
>> /* TARGET_ZERO_CALL_USED_REGS.  */
>> /* Generate a sequence of instructions that zero registers specified by
>>    NEED_ZEROED_HARDREGS.  Return the ZEROED_HARDREGS that are actually
>> @@ -3655,7 +3723,8 @@ ix86_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
>> {
>>   HARD_REG_SET zeroed_hardregs;
>>   bool all_sse_zeroed = false;
>> -  bool st_zeroed = false;
>> +  bool all_st_zeroed = false;
>> +  bool all_mm_zeroed = false;
>> 
>>   /* first, let's see whether we can zero all vector registers together.  */
>>   rtx zero_all_vec_insn = zero_all_vector_registers (need_zeroed_hardregs);
>> @@ -3665,24 +3734,42 @@ ix86_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
>>       all_sse_zeroed = true;
>>     }
>> 
>> -  /* then, let's see whether we can zero all st+mm registers togeter.  */
>> -  st_zeroed = zero_all_st_mm_registers (need_zeroed_hardregs);
>> +  /* Then, decide which mode (MMX mode or x87 mode) the function exit with.
>> +     In order to decide whether we need to clear the MMX registers or the
>> +     stack registers.  */
>> +
>> +  bool exit_with_mmx_mode = (crtl->return_rtx
>> +      && (GET_CODE (crtl->return_rtx) == REG)
>> +      && (MMX_REG_P (crtl->return_rtx)));
> 
> MMX_REG_P also checks for REG internally.

Okay, will update.
> 
>> +
>> +  /* then, let's see whether we can zero all st registers together.  */
>> +  if (!exit_with_mmx_mode)
>> +    all_st_zeroed = zero_all_st_registers (need_zeroed_hardregs);
>> +  /* Or should we zero all MMX registers.  */
>> +  else
>> +    {
>> +      unsigned int exit_mmx_regno = REGNO (crtl->return_rtx);
>> +      all_mm_zeroed = zero_all_mm_registers (need_zeroed_hardregs,
>> +      exit_mmx_regno);
>> +    }
>> 
>>   /* Now, generate instructions to zero all the registers.  */
>> 
>>   CLEAR_HARD_REG_SET (zeroed_hardregs);
>> -  if (st_zeroed)
>> +  if (all_st_zeroed)
>>     SET_HARD_REG_BIT (zeroed_hardregs, FIRST_STACK_REG);
>> 
>>   rtx zero_gpr = NULL_RTX;
>>   rtx zero_vector = NULL_RTX;
>>   rtx zero_mask = NULL_RTX;
>> +  rtx zero_mmx = NULL_RTX;
>> 
>>   for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>>     {
>>       if (!TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
>>  continue;
>> -      if (!zero_call_used_regno_p (regno, all_sse_zeroed))
>> +      if (!zero_call_used_regno_p (regno, all_sse_zeroed,
>> +    exit_with_mmx_mode && !all_mm_zeroed))
>>  continue;
>> 
>>       SET_HARD_REG_BIT (zeroed_hardregs, regno);
>> @@ -3728,6 +3815,15 @@ ix86_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
>>    }
>>  else
>>    emit_move_insn (reg, zero_mask);
>> +      else if (mode == DImode)
>> + if (zero_mmx == NULL_RTX)
>> +   {
>> +     zero_mmx = reg;
>> +     tmp = gen_rtx_SET (reg, const0_rtx);
> 
> CONST0_RTX (mode), and you will be able to use V4HImode.

Okay, will try this.
> 
>> +     emit_insn (tmp);
>> +   }
>> + else
>> +   emit_move_insn (reg, zero_mmx);
>>       else
>>  gcc_unreachable ();
>>     }
>> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-28.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-28.c
>> new file mode 100644
>> index 00000000000..61c0bb7a35c
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-28.c
>> @@ -0,0 +1,17 @@
>> +/* { dg-do compile { target *-*-linux* } } */
>> +/* { dg-options "-O2 -m32 -mmmx -fzero-call-used-regs=all" } */
> 
> -m32 should not be used explicitly. Use:
> 
> /* { dg-require-effective-target ia32 } */
> 
> instead.
> 
> Also, can we test -fzero-call-used-regs=used and
> -fzero-call-used-regs=arg with MMX regs? As said above, when function
> exits in MMX mode, and no x87 is touched, we can clear separate MMX
> registers.

I will try to add these new testing.
> 
>> +
>> +typedef int __v2si __attribute__ ((vector_size (8)));
>> +
>> +__v2si ret_mmx (void)
>> +{
>> +  return (__v2si) { 123, 345 };
>> +}
>> +
>> +/* { dg-final { scan-assembler "pxor\[ \t\]*%mm1, %mm1" } } */
>> +/* { dg-final { scan-assembler "movq\[ \t\]*%mm1, %mm2" } } */
>> +/* { dg-final { scan-assembler "movq\[ \t\]*%mm1, %mm3" } } */
>> +/* { dg-final { scan-assembler "movq\[ \t\]*%mm1, %mm4" } } */
>> +/* { dg-final { scan-assembler "movq\[ \t\]*%mm1, %mm5" } } */
>> +/* { dg-final { scan-assembler "movq\[ \t\]*%mm1, %mm6" } } */
>> +/* { dg-final { scan-assembler "movq\[ \t\]*%mm1, %mm7" } } */
>> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-29.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-29.c
>> new file mode 100644
>> index 00000000000..db636654e70
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-29.c
>> @@ -0,0 +1,11 @@
>> +/* { dg-do compile { target *-*-linux* } } */
>> +/* { dg-options "-O2 -m32 -mmmx -fzero-call-used-regs=all" } */
> 
> No need for "-m32 -mmmx", this test works the same for 32bit and 64bit targets..

Will fix this.
> 
>> +typedef int __v2si __attribute__ ((vector_size (8)));
> 
> The above is not needed in this test.

Okay. 
>> +
>> +long double ret_x87 (void)
>> +{
>> +  return 1.1L;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times "fldz" 7 } } */
>> +/* { dg-final { scan-assembler-times "fstp" 7 } } */
>> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-30.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-30.c
>> new file mode 100644
>> index 00000000000..7c20b569bfa
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-30.c
>> @@ -0,0 +1,11 @@
>> +/* { dg-do compile { target *-*-linux* } } */
>> +/* { dg-options "-O2  -fzero-call-used-regs=all" } */
>> +typedef int __v2si __attribute__ ((vector_size (8)));
> 
> The above line is not needed.
Okay.

> 
>> +
>> +_Complex long double ret_x87_cplx (void)
>> +{
>> +  return 1.1L + 1.2iL;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times "fldz" 6 } } */
>> +/* { dg-final { scan-assembler-times "fstp" 6 } } */
> 
> The above applies only to 64bit target. 32bit targets pass complex
> value via memory and should clear all 8 registers.

Will fix this.

Thanks.

Qing
Qing Zhao Oct. 27, 2020, 2:33 p.m. UTC | #16
> On Oct 26, 2020, at 1:47 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
>> index c9f7299..3a884e1 100644
>> --- a/gcc/doc/extend.texi
>> +++ b/gcc/doc/extend.texi
>> @@ -3992,6 +3992,49 @@ performing a link with relocatable output (i.e.@: @code{ld -r}) on them.
>> A declaration to which @code{weakref} is attached and that is associated
>> with a named @code{target} must be @code{static}.
>> 
>> +@item zero_call_used_regs ("@var{choice}")
>> +@cindex @code{zero_call_used_regs} function attribute
>> +
>> +The @code{zero_call_used_regs} attribute causes the compiler to zero
>> +a subset of all call-used registers at function return according to
>> +@var{choice}.
>> +This is used to increase the program security by either mitigating
>> +Return-Oriented Programming (ROP) or preventing information leak
>> +through registers.
>> +
>> +A "call-used" register is a register that is clobbered by function calls,
>> +as a result, the caller has to save and restore it before or after a
>> +function call.  It is also called as "call-clobbered", "caller-saved", or
>> +"volatile".
> 
> texinfo quoting is to use ``…'' rather than "…".  So maybe:
> 
> -------------------------------------------------------------------
> A ``call-used'' register is a register whose contents can be changed by
> a function call; therefore, a caller cannot assume that the register has
> the same contents on return from the function as it had before calling
> the function.  Such registers are also called ``call-clobbered'',
> ``caller-saved'', or ``volatile''.
> —————————————————————————————————

Okay.

> 
>> +In order to satisfy users with different security needs and control the
>> +run-time overhead at the same time,  GCC provides a flexible way to choose
> 
> nit: should only be one space after the comma
Okay.
> 
>> +the subset of the call-used registers to be zeroed.
> 
> Maybe add “The three basic values of @var{choice} are:”

Yes, This sounds better.
> 
>> +
>> +@samp{skip} doesn't zero any call-used registers.
>> +@samp{used} zeros call-used registers which are used in the function.  A "used"
> 
> Maybe s/zeros/only zeros/?
> 
> s/which/that/

okay.
> 
>> +register is one whose content has been set or referenced in the function.
>> +@samp{all} zeros all call-used registers.
> 
> I think this would be better formatted using a @table.
> 
>> +In addition to the above three basic choices, the register set can be further
>> +limited by adding "-gpr" (i.e., general purpose register), "-arg" (i.e.,
>> +argument register), or both as following:
> 
> How about:
> 
> -------------------------------------------------------------------
> In addition to these three basic choices, it is possible to modify
> @samp{used} or @samp{all} as follows:
> 
> @itemize @bullet
> @item
> Adding @samp{-gpr} restricts the zeroing to general-purpose registers.
> 
> @item
> Adding @samp{-arg} restricts the zeroing to registers that are used
> to pass parameters.  When applied to @samp{all}, this includes all
> parameter registers defined by the platform's calling convention,
> regardless of whether the function uses those parameter registers.
> @end @itemize
> 
> The modifiers can be used individually or together.  If they are used
> together, they must appear in the order above.
> 
> The full list of @var{choice}s is therefore:
> -------------------------------------------------------------------
> 
> with the list repeating @var{skip}, @var{used} and @var{all}.
> 
> (untested)

Okay, I will update and test this.
> 
>> +@samp{used-gpr-arg} zeros used call-used general purpose registers that
>> +pass parameters.
>> +@samp{used-arg} zeros used call-used registers that pass parameters.
>> +@samp{all-gpr-arg} zeros all call-used general purpose registers that pass
>> +parameters.
>> +@samp{all-arg} zeros all call-used registers that pass parameters.
>> +@samp{used-gpr} zeros call-used general purpose registers which are used in the
>> +function.
>> +@samp{all-gpr} zeros all call-used general purpose registers.
> 
> I think this too should be a @table.

Okay.
> 
>> +
>> +Among this list, "used-gpr-arg", "used-arg", "all-gpr-arg", and "all-arg" are
>> +mainly used for ROP mitigation.
> 
> Should be quoted using @samp rather than “.
Okay.
> 
>> +@item -fzero-call-used-regs=@var{choice}
>> +@opindex fzero-call-used-regs
>> +Zero call-used registers at function return to increase the program
>> +security by either mitigating Return-Oriented Programming (ROP) or
>> +preventing information leak through registers.
> 
> After this, we should probably say something like:
> 
> -------------------------------------------------------------------
> The possible values of @var{choice} are the same as for the
> @samp{zero_call_used_regs} attribute (@pxref{…}).  The default
> is @samp{skip}.
> -------------------------------------------------------------------
> 
> (with the xref filled in)

Okay.
> 
>> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
>> index 97437e8..3b75c46 100644
>> --- a/gcc/doc/tm.texi
>> +++ b/gcc/doc/tm.texi
>> @@ -12053,6 +12053,18 @@ argument list due to stack realignment.  Return @code{NULL} if no DRAP
>> is needed.
>> @end deftypefn
>> 
>> +@deftypefn {Target Hook} HARD_REG_SET TARGET_ZERO_CALL_USED_REGS (HARD_REG_SET @var{selected_regs})
>> +This target hook emits instructions to zero subset of @var{selected_regs}
> 
> …to zero the subset…
> (probably my mistake, sorry)
Okay. (My mistake, I should check it myself..)
> 
>> diff --git a/gcc/flag-types.h b/gcc/flag-types.h
>> index 852ea76..0f7e503 100644
>> --- a/gcc/flag-types.h
>> +++ b/gcc/flag-types.h
>> @@ -285,6 +285,15 @@ enum sanitize_code {
>> 				  | SANITIZE_BOUNDS_STRICT
>> };
>> 
>> +enum  zero_call_used_regs_code {
>> +  UNSET = 0,
>> +  SKIP = 1UL << 0,
>> +  ONLY_USED = 1UL << 1,
>> +  ONLY_GPR = 1UL << 2,
>> +  ONLY_ARG = 1UL << 3,
>> +  ALL = 1UL << 4
>> +};
> 
> I'd suggested these names on the assumption that we'd be using
> a C++ enum class, so that the enum would be referenced as
> name::ALL, name::SKIP, etc.  But I guess using a C++ enum class
> doesn't work well with bitfields after all.
> 
> These names are too generic without the name:: scoping though.
> Perhaps we should put them in a namespace:
> 
>  namespace zero_regs_flags {
>    const unsigned int UNSET = 0;
>    …etc…
>  }
> 
> (call-used probably doesn't need to be part of the flag names,
> since the concept is more general than that and call-usedness
> is really a filter that's being applied on top.  Although I guess
> the same is true of “zero”. ;-))
> 
> I don't think we should have ALL as a separate flag: ALL is the absence
> of ONLY_*.  Maybe we should have an ENABLED flag that all non-skip
> combinations use?
> 
> If it makes things easier, I think it would be good to have e.g.:
> 
>  unsigned int USED_GPR = ENABLED | ONLY_USED | ONLY_GPR;
> 
> inside the namespace, to reduce the verbosity in the option table.

Then, the final namespace will look like:

namespace zero_regs_flags {
  const unsigned int UNSET = 0;
  const unsigned int SKIP = 1UL << 0;
  const unsigned int ONLY_USED = 1UL << 1;
  const unsigned int ONLY_GPR = 1UL << 2;
  const unsigned int ONLY_ARG = 1UL << 3;
  const unsigned int ENABLED = 1UL << 4;
  const unsigned int USED_GPR_ARG = ONLY_USED | ONLY_GPR | ONLY_ARG;
  const unsigned int USED_GPR = ENABLED | ONLY_USED | ONLY_GPR;
  const unsigned int USED_ARG = ENABLED | ONLY_USED | ONLY_ARG;
  const unsigned int USED = ENABLED | ONLY_USED;
  const unsigned int ALL_GRP_ARG = ENABLED | ONLY_GPR | ONLY_ARG;
  const unsigned int ALL_GPR = ENABLED | ONLY_GPR;
  const unsigned int ALL_ARG = ENABLED | ONLY_ARG;
  const unsigned int ALL = ENABLED;
}

??

> 
>> +  /* If gpr_only is true, only zero call-used-registers that are
>> +     general-purpose registers; if used_only is true, only zero
>> +     call-used-registers that are used in the current function.  */
>> +
>> +  gpr_only = crtl->zero_call_used_regs & ONLY_GPR;
>> +  used_only = crtl->zero_call_used_regs & ONLY_USED;
>> +  arg_only = crtl->zero_call_used_regs & ONLY_ARG;
>> +
>> +  /* For each of the hard registers, check to see whether we should zero it if:
> 
> s/check to see whether //
Okay.
> 
>> +     1. it is a call-used-registers;
> 
> s/call-used-registers/call-used register/
Okay.
> 
>> + and 2. it is not a fixed-registers;
> 
> s/fixed-registers/fixed register/
Okay.
> 
>> + and 3. it is not live at the return of the routine;
>> + and 4. it is general registor if gpr_only is true;
>> + and 5. it is used in the routine if used_only is true;
>> + and 6. it is a register that passes parameter if arg_only is true;
>> +   */
> 
> Under GCC formatting, the “and” lines need to be indented under “For each”.
> Maybe indent the “1.” line a bit more if you think it looks nicer with the
> numbers lined up (it probably does).
> 
> Similarly, the last bit of text should end with “.  */”, rather than
> with the “;\n  */” above.
> 
> (Sorry that the rules are so picky about this.)

  /* For each of the hard registers, check to see whether we should zero it if:
            1. it is a call-used-registers;
     and 2. it is not a fixed-registers;
     and 3. it is not live at the return of the routine;
     and 4. it is general registor if gpr_only is true;
     and 5. it is used in the routine if used_only is true;
     and 6. it is a register that passes parameter if arg_only is true.  */

How about this?

> 
>> +  /* First, prepare the data flow information.  */
>> +  basic_block bb = BLOCK_FOR_INSN (ret);
>> +  bitmap live_out;
>> +  live_out = BITMAP_ALLOC (NULL);
> 
> Should just use auto_bitmap here, which will also handle the freeing.
I remembered that I used auto_bitmap initially, but it didn’t work, so I changed it like this.
I will try to see whether it work. 
> 
>> +  bitmap_copy (live_out, df_get_live_out (bb));
>> +  df_simulate_initialize_backwards (bb, live_out);
>> +  df_simulate_one_insn_backwards (bb, ret, live_out);
>> +
>> +  HARD_REG_SET need_zeroed_hardregs;
>> +  CLEAR_HARD_REG_SET (need_zeroed_hardregs);
> 
> Maybe s/need_zeroed/selected/?  Similarly to the target hook comment
> in the previous review, I think “need” makes it sound like the target
> has no freedom to decline.

Okay.
> 
>> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>> +    {
>> +      if (!crtl->abi->clobbers_full_reg_p (regno))
>> +	continue;
>> +      if (fixed_regs[regno])
>> +	continue;
>> +      if (REGNO_REG_SET_P (live_out, regno))
>> +	continue;
>> +      if (gpr_only
>> +	  && !TEST_HARD_REG_BIT (reg_class_contents[GENERAL_REGS], regno))
>> +	continue;
>> +      if (used_only && !df_regs_ever_live_p (regno))
>> +	continue;
>> +      if (arg_only && !FUNCTION_ARG_REGNO_P (regno))
>> +	continue;
>> +
>> +      /* Now this is a register that we might want to zero.  */
>> +      SET_HARD_REG_BIT (need_zeroed_hardregs, regno);
>> +    }
>> +
>> +  BITMAP_FREE (live_out);
>> +
>> +  if (hard_reg_set_empty_p (need_zeroed_hardregs))
>> +    return;
>> +
>> +  /* Now we get a hard register set that need to be zeroed, pass it to
>> +     target to generate zeroing sequence.  */
> 
> /* Now that we have a hard register set that needs to be zeroed, pass it
>   to the target to generate the zeroing sequence.  */

Okay.
> 
>> +  HARD_REG_SET zeroed_hardregs;
>> +  start_sequence ();
>> +  zeroed_hardregs = targetm.calls.zero_call_used_regs (need_zeroed_hardregs);
>> +  rtx_insn *seq = get_insns ();
>> +  end_sequence ();
>> +  if (seq)
>> +    {
>> +      /* Emit the memory blockage and register clobber asm volatile before
>> +	 the whole sequence.  */
>> +      start_sequence ();
>> +      expand_asm_reg_clobber_mem_blockage (zeroed_hardregs);
>> +      rtx_insn *seq_barrier = get_insns ();
>> +      end_sequence ();
>> +
>> +      emit_insn_before (seq_barrier, ret);
>> +      emit_insn_before (seq, ret);
>> +
>> +      /* Update the data flow information.  */
>> +      crtl->must_be_zero_on_return |= zeroed_hardregs;
>> +      df_set_bb_dirty (EXIT_BLOCK_PTR_FOR_FN (cfun));
>> +    }
>> +}
>> +
>> +
>> /* Return a sequence to be used as the epilogue for the current function,
>>    or NULL.  */
>> 
>> @@ -6486,7 +6584,120 @@ make_pass_thread_prologue_and_epilogue (gcc::context *ctxt)
>> {
>>   return new pass_thread_prologue_and_epilogue (ctxt);
>> }
>> -
>> 
>> +
>> +static unsigned int
>> +rest_of_zero_call_used_regs (void)
> 
> This needs a function comment.  Maybe:
> 
> /* Iterate over the function's return instructions and insert any
>   register zeroing required by the -fzero-call-used-regs command-line
>   option or the "zero_call_used_regs" function attribute.  */
> 
> Also, we might as well make it:
> 
> pass_zero_call_used_regs::execute
> 
> rather than a separate function.  The “rest_of_…” stuff is mostly legacy.

You mean to delete the “rest_of_zero_call_used_regs” function, and move its body to 
Pass_zero_call_used_regs::execute?


> 
>> +{
>> +  edge_iterator ei;
>> +  edge e;
>> +  rtx_insn *insn;
>> +
>> +  /* This pass needs data flow information.  */
>> +  df_analyze ();
>> +
>> +  /* Search all the "return"s in the routine, and insert instruction sequence to
>> +     zero the call used registers.  */
>> +  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
>> +    {
>> +      insn = BB_END (e->src);
> 
> Modern style would be to declare insn here rather than above.

Okay.
> 
>> +      if (JUMP_P (insn) && ANY_RETURN_P (JUMP_LABEL (insn)))
>> +	gen_call_used_regs_seq (insn);
>> +    }
>> +
>> +  return 0;
>> +}
>> +
>> +namespace {
>> +
>> +const pass_data pass_data_zero_call_used_regs =
>> +{
>> +  RTL_PASS, /* type */
>> +  "zero_call_used_regs", /* name */
>> +  OPTGROUP_NONE, /* optinfo_flags */
>> +  TV_NONE, /* tv_id */
>> +  0, /* properties_required */
>> +  0, /* properties_provided */
>> +  0, /* properties_destroyed */
>> +  0, /* todo_flags_start */
>> +  0, /* todo_flags_finish */
>> +};
>> +
>> +class pass_zero_call_used_regs: public rtl_opt_pass
>> +{
>> +public:
>> +  pass_zero_call_used_regs (gcc::context *ctxt)
>> +    : rtl_opt_pass (pass_data_zero_call_used_regs, ctxt)
>> +  {}
>> +
>> +  /* opt_pass methods: */
>> +  virtual bool gate (function *);
>> +
>> +  virtual unsigned int execute (function *)
>> +    {
>> +      return rest_of_zero_call_used_regs ();
>> +    }
>> +
>> +}; // class pass_zero_call_used_regs
>> +
>> +bool
>> +pass_zero_call_used_regs::gate (function *fun)
>> +{
>> +  unsigned int zero_regs_type = UNSET;
>> +  unsigned int attr_zero_regs_type = UNSET;
>> +
>> +  tree attr_zero_regs
>> +	= lookup_attribute ("zero_call_used_regs",
>> +			    DECL_ATTRIBUTES (fun->decl));
>> +
>> +  /* Get the type of zero_call_used_regs from function attribute.  */
>> +  if (attr_zero_regs)
>> +    {
>> +      bool found = false;
>> +      unsigned int i;
>> +
>> +      /* The TREE_VALUE of an attribute is a TREE_LIST whose TREE_VALUE
>> +	 is the attribute argument's value.  */
>> +      attr_zero_regs = TREE_VALUE (attr_zero_regs);
>> +      gcc_assert (TREE_CODE (attr_zero_regs) == TREE_LIST);
>> +      attr_zero_regs = TREE_VALUE (attr_zero_regs);
>> +      gcc_assert (TREE_CODE (attr_zero_regs) == STRING_CST);
>> +
>> +      for (i = 0; zero_call_used_regs_opts[i].name != NULL; ++i)
>> +	if (strcmp (TREE_STRING_POINTER (attr_zero_regs),
>> +		     zero_call_used_regs_opts[i].name) == 0)
>> +	  {
>> +	    attr_zero_regs_type |= zero_call_used_regs_opts[i].flag;
> 
> Think = is less surprising than |= here.
Yes.
> 
>> +	    found = true;
> 
> All valid values are nonzero, so we don't need a separate boolean.

Yes.
> 
>> + 	    break;
>> +	  }
>> +
>> +      if (!found)
>> +	warning_at (DECL_SOURCE_LOCATION (fun->decl), 0,
>> +		    "unrecognized zero_call_used_regs attribute: %qs",
>> +		    TREE_STRING_POINTER (attr_zero_regs));
> 
> I think we should warn when handling the attribute in c-attribs.c
> (as before, IIRC), and make it silent here.
Okay. Will do that.
> 
>> +    }
>> +
>> +  if (flag_zero_call_used_regs)
>> +    if (!attr_zero_regs)
>> +      zero_regs_type = flag_zero_call_used_regs;
>> +    else
>> +      zero_regs_type = attr_zero_regs_type;
>> +  else
>> +    zero_regs_type = attr_zero_regs_type;
> 
> Seems easier to make the attribute code set zero_regs_type directly,
> then have:
> 
>  if (!zero_regs_type)
>    zero_regs_type = flag_zero_call_used_regs;

okay.

> 
>> +
>> +  crtl->zero_call_used_regs = zero_regs_type;
>> +
>> +  /* No need to zero call-used-regs when no user request is present.  */
>> +  return zero_regs_type > SKIP;
> 
> Think testing for skip using & SKIP or ==/!= SKIP is more obvious.

Testing with & SKIP or ==/!= SKIP will not work if the flag is UNSET. 

> 
> This is too much for a gate function, which should be a simple
> side-effect-free function that tests whether the pass should run.
> Perhaps we should just make the pass unconditional and do the above
> in ::execute.  The pass is very cheap, so gating probably isn't
> worthwhile.

Okay, I can do that.
> 
>> +}
>> +
>> +} // anon namespace
>> +
>> +rtl_opt_pass *
>> +make_pass_zero_call_used_regs (gcc::context *ctxt)
>> +{
>> +  return new pass_zero_call_used_regs (ctxt);
>> +}
>> 
>> /* If CONSTRAINT is a matching constraint, then return its number.
>>    Otherwise, return -1.  */
>> diff --git a/gcc/optabs.c b/gcc/optabs.c
>> index 8ad7f4b..bd64af0 100644
>> --- a/gcc/optabs.c
>> +++ b/gcc/optabs.c
>> @@ -6484,6 +6484,48 @@ expand_memory_blockage (void)
>>     expand_asm_memory_blockage ();
>> }
>> 
>> +/* Generate asm volatile("" : : : "memory") as a memory blockage, at the
>> +   same time clobbering the register set specified by REGS.  */
>> +
>> +void
>> +expand_asm_reg_clobber_mem_blockage (HARD_REG_SET regs)
>> +{
>> +  rtx asm_op, clob_mem;
>> +
>> +  unsigned int num_of_regs = 0;
>> +  for (unsigned int i = 0; i < FIRST_PSEUDO_REGISTER; i++)
>> +    if (TEST_HARD_REG_BIT (regs, i))
>> +      num_of_regs++;
>> +
>> +  asm_op = gen_rtx_ASM_OPERANDS (VOIDmode, "", "", 0,
>> +				 rtvec_alloc (0), rtvec_alloc (0),
>> +				 rtvec_alloc (0), UNKNOWN_LOCATION);
>> +  MEM_VOLATILE_P (asm_op) = 1;
>> +
>> +  rtvec v = rtvec_alloc (num_of_regs + 2);
>> +
>> +  clob_mem = gen_rtx_SCRATCH (VOIDmode);
>> +  clob_mem = gen_rtx_MEM (BLKmode, clob_mem);
>> +  clob_mem = gen_rtx_CLOBBER (VOIDmode, clob_mem);
>> +
>> +  RTVEC_ELT (v,0) = asm_op;
>> +  RTVEC_ELT (v,1) = clob_mem;
> 
> nit: should be a space before the comma, here and below.
Okay.
> 
>> +
>> +  if (num_of_regs > 0)
>> +    {
>> +      unsigned int j = 2;
>> +      for (unsigned int i = 0; i < FIRST_PSEUDO_REGISTER; i++)
>> +	if (TEST_HARD_REG_BIT (regs, i))
>> +	  {
>> +	    RTVEC_ELT (v,j) = gen_rtx_CLOBBER (VOIDmode, regno_reg_rtx[i]);
>> + 	    j++;
>> +	  }
>> +      gcc_assert (j == (num_of_regs + 2));
>> +    }
>> +
>> +  emit_insn (gen_rtx_PARALLEL (VOIDmode, v));
>> +}
>> +
>> /* This routine will either emit the mem_thread_fence pattern or issue a 
>>    sync_synchronize to generate a fence for memory model MEMMODEL.  */
>> 
>> diff --git a/gcc/optabs.h b/gcc/optabs.h
>> index 0b14700..bfa10c8 100644
>> --- a/gcc/optabs.h
>> +++ b/gcc/optabs.h
>> @@ -345,6 +345,8 @@ rtx expand_atomic_store (rtx, rtx, enum memmodel, bool);
>> rtx expand_atomic_fetch_op (rtx, rtx, rtx, enum rtx_code, enum memmodel, 
>> 			      bool);
>> 
>> +extern void expand_asm_reg_clobber_mem_blockage (HARD_REG_SET);
>> +
>> extern bool insn_operand_matches (enum insn_code icode, unsigned int opno,
>> 				  rtx operand);
>> extern bool valid_multiword_target_p (rtx);
>> diff --git a/gcc/opts.c b/gcc/opts.c
>> index 3bda59a..f95a1f0 100644
>> --- a/gcc/opts.c
>> +++ b/gcc/opts.c
>> @@ -1776,6 +1776,24 @@ const struct sanitizer_opts_s coverage_sanitizer_opts[] =
>>   { NULL, 0U, 0UL, false }
>> };
>> 
>> +/* -fzero-call-used-regs= suboptions.  */
>> +const struct zero_call_used_regs_opts_s zero_call_used_regs_opts[] =
>> +{
>> +#define ZERO_CALL_USED_REGS_OPT(name, flags) \
>> +    { #name, flags }
>> +  ZERO_CALL_USED_REGS_OPT (skip, SKIP),
>> +  ZERO_CALL_USED_REGS_OPT (used-gpr-arg, (ONLY_USED | ONLY_GPR | ONLY_ARG)),
>> +  ZERO_CALL_USED_REGS_OPT (used-arg, (ONLY_USED | ONLY_ARG)),
>> +  ZERO_CALL_USED_REGS_OPT (all-gpr-arg, (ONLY_GPR | ONLY_ARG)),
>> +  ZERO_CALL_USED_REGS_OPT (all-arg, ONLY_ARG),
>> +  ZERO_CALL_USED_REGS_OPT (used-gpr, (ONLY_USED | ONLY_GPR)),
>> +  ZERO_CALL_USED_REGS_OPT (all-gpr, ONLY_GPR),
>> +  ZERO_CALL_USED_REGS_OPT (used, ONLY_USED),
>> +  ZERO_CALL_USED_REGS_OPT (all, ALL),
>> +#undef ZERO_CALL_USED_REGS_OPT
>> +  {NULL, 0U}
>> +};
>> +
>> /* A struct for describing a run of chars within a string.  */
>> 
>> class string_fragment
>> @@ -1970,6 +1988,30 @@ parse_no_sanitize_attribute (char *value)
>>   return flags;
>> }
>> 
>> +/* Parse -fzero-call-used-regs suboptions from ARG, return the FLAGS.  */
>> +
>> +unsigned int
>> +parse_zero_call_used_regs_options (const char *arg)
>> +{
>> +  bool found = false;
>> +  unsigned int flags = 0;
>> +  unsigned int i;
>> +
>> +  /* Check to see if the string matches a sub-option name.  */
>> +  for (i = 0; zero_call_used_regs_opts[i].name != NULL; ++i)
>> +    if (strcmp (arg, zero_call_used_regs_opts[i].name) == 0)
>> +      {
>> +	flags |= zero_call_used_regs_opts[i].flag;
>> + 	found = true;
> 
> Same comments as above.

Okay.
> 
>> +	break;
>> +      }
>> +
>> +  if (!found)
>> +    error ("unrecognized argument to %<-fzero-call-used-regs=%>: %qs", arg);
> 
> Think we should use %qs for the option name too, to reduce the number
> of translation strings.
I will try to do this.
> 
>> diff --git a/gcc/recog.c b/gcc/recog.c
>> index ce83b7f..e231b5d 100644
>> --- a/gcc/recog.c
>> +++ b/gcc/recog.c
>> @@ -923,6 +923,22 @@ validate_simplify_insn (rtx_insn *insn)
>>   return ((num_changes_pending () > 0) && (apply_change_group () > 0));
>> }
>> 
>> 
>> +
>> +/* Check whether INSN matches a specific alternative of an .md pattern.  */
>> +bool
>> +valid_insn_p (rtx_insn *insn)
> 
> Very minor nit, but it's unusual to have three blank lines before
> the comment and none afterwards.  The codebase isn't very consistent
> about this, but local style seems mostly to be one blank line before
> the comment and one afterwards.
Okay.
> 
>> diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c b/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
>> new file mode 100644
>> index 0000000..f44add9
>> --- /dev/null
>> +++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
>> @@ -0,0 +1,15 @@
>> +/* { dg-do run } */
>> +/* { dg-options "-O2 -fzero-call-used-regs=all" } */
>> +
>> +volatile int result = 0;
>> +int 
>> +__attribute__((noinline))
> 
> “noipa” is stronger.  Same for all the tests.

Okay.
> 
> The i386 tests are Uros's domain, but I think it would be good to have
> generic tests for all the variants.  E.g.:
> 
> (1) one test per -fzero-call-used-regs option (including skip)
> (2) one test that tries all valid attribute values (including skip),
>    compiled without -fzero-call-used-regs
> (3) one test that #includes (2) but is compiled with an arbitrarily-chosen
>    -fzero-call-used-regs (say =all).
> (4) one test that tries invalid uses of the attribute, e.g.:
>    - one use of the attribute on a variable
>    - one use of the attribute on a function, but with an obviously-wrong
>      value
>    - one use of the attribute on a function, but with -gpr and -arg the
>      wrong way around

You mean to add the above new testing cases to gcc/testsuite/c-c++-common
For all targets?

Then, we cannot test for the assembly matching, we can only testing for “dg-do run” right?

Thanks.

Qing

> 
> (Sorry for not getting to the tests last time.)
> 
> Thanks,
> Richard
Richard Sandiford Oct. 27, 2020, 3:36 p.m. UTC | #17
Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>> diff --git a/gcc/flag-types.h b/gcc/flag-types.h
>>> index 852ea76..0f7e503 100644
>>> --- a/gcc/flag-types.h
>>> +++ b/gcc/flag-types.h
>>> @@ -285,6 +285,15 @@ enum sanitize_code {
>>> 				  | SANITIZE_BOUNDS_STRICT
>>> };
>>> 
>>> +enum  zero_call_used_regs_code {
>>> +  UNSET = 0,
>>> +  SKIP = 1UL << 0,
>>> +  ONLY_USED = 1UL << 1,
>>> +  ONLY_GPR = 1UL << 2,
>>> +  ONLY_ARG = 1UL << 3,
>>> +  ALL = 1UL << 4
>>> +};
>> 
>> I'd suggested these names on the assumption that we'd be using
>> a C++ enum class, so that the enum would be referenced as
>> name::ALL, name::SKIP, etc.  But I guess using a C++ enum class
>> doesn't work well with bitfields after all.
>> 
>> These names are too generic without the name:: scoping though.
>> Perhaps we should put them in a namespace:
>> 
>>  namespace zero_regs_flags {
>>    const unsigned int UNSET = 0;
>>    …etc…
>>  }
>> 
>> (call-used probably doesn't need to be part of the flag names,
>> since the concept is more general than that and call-usedness
>> is really a filter that's being applied on top.  Although I guess
>> the same is true of “zero”. ;-))
>> 
>> I don't think we should have ALL as a separate flag: ALL is the absence
>> of ONLY_*.  Maybe we should have an ENABLED flag that all non-skip
>> combinations use?
>> 
>> If it makes things easier, I think it would be good to have e.g.:
>> 
>>  unsigned int USED_GPR = ENABLED | ONLY_USED | ONLY_GPR;
>> 
>> inside the namespace, to reduce the verbosity in the option table.
>
> Then, the final namespace will look like:
>
> namespace zero_regs_flags {
>   const unsigned int UNSET = 0;
>   const unsigned int SKIP = 1UL << 0;
>   const unsigned int ONLY_USED = 1UL << 1;
>   const unsigned int ONLY_GPR = 1UL << 2;
>   const unsigned int ONLY_ARG = 1UL << 3;
>   const unsigned int ENABLED = 1UL << 4;
>   const unsigned int USED_GPR_ARG = ONLY_USED | ONLY_GPR | ONLY_ARG;

“ENABLED |” here

>   const unsigned int USED_GPR = ENABLED | ONLY_USED | ONLY_GPR;
>   const unsigned int USED_ARG = ENABLED | ONLY_USED | ONLY_ARG;
>   const unsigned int USED = ENABLED | ONLY_USED;
>   const unsigned int ALL_GRP_ARG = ENABLED | ONLY_GPR | ONLY_ARG;

GPR

>   const unsigned int ALL_GPR = ENABLED | ONLY_GPR;
>   const unsigned int ALL_ARG = ENABLED | ONLY_ARG;
>   const unsigned int ALL = ENABLED;
> }
>
> ??

Yeah, looks right modulo the above.

>>> + and 3. it is not live at the return of the routine;
>>> + and 4. it is general registor if gpr_only is true;
>>> + and 5. it is used in the routine if used_only is true;
>>> + and 6. it is a register that passes parameter if arg_only is true;
>>> +   */
>> 
>> Under GCC formatting, the “and” lines need to be indented under “For each”.
>> Maybe indent the “1.” line a bit more if you think it looks nicer with the
>> numbers lined up (it probably does).
>> 
>> Similarly, the last bit of text should end with “.  */”, rather than
>> with the “;\n  */” above.
>> 
>> (Sorry that the rules are so picky about this.)
>
>   /* For each of the hard registers, check to see whether we should zero it if:
>             1. it is a call-used-registers;
>      and 2. it is not a fixed-registers;
>      and 3. it is not live at the return of the routine;
>      and 4. it is general registor if gpr_only is true;
>      and 5. it is used in the routine if used_only is true;
>      and 6. it is a register that passes parameter if arg_only is true.  */
>
> How about this?

The 1. line looks overindented now :-)  Was expecting it to line up
with "2.".

Otherwise looks good.

>>> +  HARD_REG_SET zeroed_hardregs;
>>> +  start_sequence ();
>>> +  zeroed_hardregs = targetm.calls.zero_call_used_regs (need_zeroed_hardregs);
>>> +  rtx_insn *seq = get_insns ();
>>> +  end_sequence ();
>>> +  if (seq)
>>> +    {
>>> +      /* Emit the memory blockage and register clobber asm volatile before
>>> +	 the whole sequence.  */
>>> +      start_sequence ();
>>> +      expand_asm_reg_clobber_mem_blockage (zeroed_hardregs);
>>> +      rtx_insn *seq_barrier = get_insns ();
>>> +      end_sequence ();
>>> +
>>> +      emit_insn_before (seq_barrier, ret);
>>> +      emit_insn_before (seq, ret);
>>> +
>>> +      /* Update the data flow information.  */
>>> +      crtl->must_be_zero_on_return |= zeroed_hardregs;
>>> +      df_set_bb_dirty (EXIT_BLOCK_PTR_FOR_FN (cfun));
>>> +    }
>>> +}
>>> +
>>> +
>>> /* Return a sequence to be used as the epilogue for the current function,
>>>    or NULL.  */
>>> 
>>> @@ -6486,7 +6584,120 @@ make_pass_thread_prologue_and_epilogue (gcc::context *ctxt)
>>> {
>>>   return new pass_thread_prologue_and_epilogue (ctxt);
>>> }
>>> -
>>> 
>>> +
>>> +static unsigned int
>>> +rest_of_zero_call_used_regs (void)
>> 
>> This needs a function comment.  Maybe:
>> 
>> /* Iterate over the function's return instructions and insert any
>>   register zeroing required by the -fzero-call-used-regs command-line
>>   option or the "zero_call_used_regs" function attribute.  */
>> 
>> Also, we might as well make it:
>> 
>> pass_zero_call_used_regs::execute
>> 
>> rather than a separate function.  The “rest_of_…” stuff is mostly legacy.
>
> You mean to delete the “rest_of_zero_call_used_regs” function, and move its body to 
> Pass_zero_call_used_regs::execute?

Yes.

>>> +
>>> +  crtl->zero_call_used_regs = zero_regs_type;
>>> +
>>> +  /* No need to zero call-used-regs when no user request is present.  */
>>> +  return zero_regs_type > SKIP;
>> 
>> Think testing for skip using & SKIP or ==/!= SKIP is more obvious.
>
> Testing with & SKIP or ==/!= SKIP will not work if the flag is UNSET. 

But can that happen?  I would have expected the command line to be
!= UNDEF at this stage.

If that's not true, then & ENABLED would also work.

>> The i386 tests are Uros's domain, but I think it would be good to have
>> generic tests for all the variants.  E.g.:
>> 
>> (1) one test per -fzero-call-used-regs option (including skip)
>> (2) one test that tries all valid attribute values (including skip),
>>    compiled without -fzero-call-used-regs
>> (3) one test that #includes (2) but is compiled with an arbitrarily-chosen
>>    -fzero-call-used-regs (say =all).
>> (4) one test that tries invalid uses of the attribute, e.g.:
>>    - one use of the attribute on a variable
>>    - one use of the attribute on a function, but with an obviously-wrong
>>      value
>>    - one use of the attribute on a function, but with -gpr and -arg the
>>      wrong way around
>
> You mean to add the above new testing cases to gcc/testsuite/c-c++-common
> For all targets?

Yes.

> Then, we cannot test for the assembly matching, we can only testing for “dg-do run” right?

Right.  This is in addition to target-specific tests rather than a
replacement for them.

Thanks,
Richard
Qing Zhao Oct. 27, 2020, 4:10 p.m. UTC | #18
Uros,

The following is the change compared to version 4 after fix all the issues you raised in the previous email.

Let me know if there is any other issue.

Thanks.

Qing

---
 gcc/config/i386/i386.c                             | 162 +++++++++++++++++----
 .../gcc.target/i386/zero-scratch-regs-28.c         |  16 ++
 .../gcc.target/i386/zero-scratch-regs-29.c         |  10 ++
 .../gcc.target/i386/zero-scratch-regs-30.c         |  11 ++
 .../gcc.target/i386/zero-scratch-regs-31.c         |  16 ++
 5 files changed, 188 insertions(+), 27 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-28.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-29.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-30.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-31.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index e66dcf0d587..e6c5001b11e 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -3554,17 +3554,17 @@ ix86_function_value_regno_p (const unsigned int regno)
 /* Check whether the register REGNO should be zeroed on X86.
    When ALL_SSE_ZEROED is true, all SSE registers have been zeroed
    together, no need to zero it again.
-   Stack registers (st0-st7) and mm0-mm7 are aliased with each other.
-   very hard to be zeroed individually, don't zero individual st or
-   mm registgers.  */
+   When NEED_ZERO_MMX is true, MMX registers should be cleared.  */
 
 static bool
 zero_call_used_regno_p (const unsigned int regno,
-			bool all_sse_zeroed)
+			bool all_sse_zeroed,
+			bool need_zero_mmx)
 {
   return GENERAL_REGNO_P (regno)
 	 || (!all_sse_zeroed && SSE_REGNO_P (regno))
-	 || MASK_REGNO_P (regno);
+	 || MASK_REGNO_P (regno)
+	 || (need_zero_mmx && MMX_REGNO_P (regno));
 }
 
 /* Return the machine_mode that is used to zero register REGNO.  */
@@ -3579,8 +3579,12 @@ zero_call_used_regno_mode (const unsigned int regno)
     return SImode;
   else if (SSE_REGNO_P (regno))
     return V4SFmode;
-  else
+  else if (MASK_REGNO_P (regno))
     return HImode;
+  else if (MMX_REGNO_P (regno))
+    return V4HImode;
+  else
+    gcc_unreachable ();
 }
 
 /* Generate a rtx to zero all vector registers together if possible,
@@ -3603,7 +3607,7 @@ zero_all_vector_registers (HARD_REG_SET need_zeroed_hardregs)
   return gen_avx_vzeroall ();
 }
 
-/* Generate insns to zero all st/mm registers together.
+/* Generate insns to zero all st registers together.
    Return true when zeroing instructions are generated.
    Assume the number of st registers that are zeroed is num_of_st,
    we will emit the following sequence to zero them together:
@@ -3616,23 +3620,49 @@ zero_all_vector_registers (HARD_REG_SET need_zeroed_hardregs)
 		  ...
 		  fstp %%st(0);
    i.e., num_of_st fldz followed by num_of_st fstp to clear the stack
-   mark stack slots empty.  */
+   mark stack slots empty.
+
+   How to compute the num_of_st?
+   There is no direct mapping from stack registers to hard register
+   numbers.  If one stack register need to be cleared, we don't know
+   where in the stack the value remains.  So, if any stack register 
+   need to be cleared, the whole stack should be cleared.  However,
+   x87 stack registers that hold the return value should be excluded.
+   x87 returns in the top (two for complex values) register, so
+   num_of_st should be 7/6 when x87 returns, otherwise it will be 8.  */
+
 
 static bool
-zero_all_st_mm_registers (HARD_REG_SET need_zeroed_hardregs)
+zero_all_st_registers (HARD_REG_SET need_zeroed_hardregs)
 {
   unsigned int num_of_st = 0;
   for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
-    if (STACK_REGNO_P (regno)
-	&& TEST_HARD_REG_BIT (need_zeroed_hardregs, regno)
-	/* When the corresponding mm register also need to be cleared too.  */
-	&& TEST_HARD_REG_BIT (need_zeroed_hardregs,
-			      (regno - FIRST_STACK_REG + FIRST_MMX_REG)))
-      num_of_st++;
+    if ((STACK_REGNO_P (regno) || MMX_REGNO_P (regno))
+	&& TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
+      {
+	num_of_st++;
+	break;
+      }
 
   if (num_of_st == 0)
     return false;
 
+  bool return_with_x87 = false;
+  return_with_x87 = (crtl->return_rtx
+		     && (STACK_REG_P (crtl->return_rtx)));
+
+  bool complex_return = false;
+  complex_return = (crtl->return_rtx
+		    && COMPLEX_MODE_P (GET_MODE (crtl->return_rtx)));
+
+  if (return_with_x87)
+    if (complex_return)
+      num_of_st = 6;
+    else
+      num_of_st = 7;
+  else
+    num_of_st = 8;
+
   rtx st_reg = gen_rtx_REG (XFmode, FIRST_STACK_REG);
   for (unsigned int i = 0; i < num_of_st; i++)
     emit_insn (gen_rtx_SET (st_reg, CONST0_RTX (XFmode)));
@@ -3646,6 +3676,43 @@ zero_all_st_mm_registers (HARD_REG_SET need_zeroed_hardregs)
   return true;
 }
 
+
+/* When the routine exit with MMX mode, if there is any ST registers
+   need to be zeroed, we should clear all MMX registers except the
+   one that holds the return value RET_MMX_REGNO.  */
+static bool
+zero_all_mm_registers (HARD_REG_SET need_zeroed_hardregs,
+		       unsigned int ret_mmx_regno)
+{
+  bool need_zero_all_mm = false;
+  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
+    if (STACK_REGNO_P (regno)
+	&& TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
+      {
+	need_zero_all_mm = true;
+	break;
+      }
+
+  if (!need_zero_all_mm)
+    return false;
+
+  rtx zero_mmx = NULL_RTX;
+  machine_mode mode = V4HImode;
+  for (unsigned int regno = FIRST_MMX_REG; regno <= LAST_MMX_REG; regno++)
+    if (regno != ret_mmx_regno)
+      {
+	rtx reg = gen_rtx_REG (mode, regno);
+	if (zero_mmx == NULL_RTX)
+	  {
+	    zero_mmx = reg;
+	    emit_insn (gen_rtx_SET (reg, CONST0_RTX(mode)));
+	  }
+	else
+	  emit_move_insn (reg, zero_mmx);
+      }
+  return true;
+}
+
 /* TARGET_ZERO_CALL_USED_REGS.  */
 /* Generate a sequence of instructions that zero registers specified by
    NEED_ZEROED_HARDREGS.  Return the ZEROED_HARDREGS that are actually
@@ -3655,7 +3722,10 @@ ix86_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
 {
   HARD_REG_SET zeroed_hardregs;
   bool all_sse_zeroed = false;
-  bool st_zeroed = false;
+  bool all_st_zeroed = false;
+  bool all_mm_zeroed = false;
+
+  CLEAR_HARD_REG_SET (zeroed_hardregs);
 
   /* first, let's see whether we can zero all vector registers together.  */
   rtx zero_all_vec_insn = zero_all_vector_registers (need_zeroed_hardregs);
@@ -3665,38 +3735,67 @@ ix86_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
       all_sse_zeroed = true;
     }
 
-  /* then, let's see whether we can zero all st+mm registers togeter.  */
-  st_zeroed = zero_all_st_mm_registers (need_zeroed_hardregs);
+  /* mm/st registers are shared registers set, we should follow the following
+     rules to clear them:
+			MMX exit mode       x87 exit mode
+	-------------|----------------------|---------------
+	uses x87 reg | clear all MMX        | clear all x87
+	uses MMX reg | clear individual MMX | clear all x87
+	x87 + MMX    | clear all MMX        | clear all x87
 
-  /* Now, generate instructions to zero all the registers.  */
+     first, we should decide which mode (MMX mode or x87 mode) the function 
+     exit with.  */
 
-  CLEAR_HARD_REG_SET (zeroed_hardregs);
-  if (st_zeroed)
-    SET_HARD_REG_BIT (zeroed_hardregs, FIRST_STACK_REG);
+  bool exit_with_mmx_mode = (crtl->return_rtx
+			     && (MMX_REG_P (crtl->return_rtx)));
+
+  if (!exit_with_mmx_mode)
+    /* x87 exit mode, we should zero all st registers together.  */
+    {
+      all_st_zeroed = zero_all_st_registers (need_zeroed_hardregs);
+      if (all_st_zeroed)
+	SET_HARD_REG_BIT (zeroed_hardregs, FIRST_STACK_REG);
+    }
+  else 
+    /* MMX exit mode, check whether we can zero all mm registers.  */
+    {
+      unsigned int exit_mmx_regno = REGNO (crtl->return_rtx);
+      all_mm_zeroed = zero_all_mm_registers (need_zeroed_hardregs, 
+					     exit_mmx_regno);
+      if (all_mm_zeroed)
+	for (unsigned int regno = FIRST_MMX_REG; regno <= LAST_MMX_REG; regno++)
+	  if (regno != exit_mmx_regno)
+	    SET_HARD_REG_BIT (zeroed_hardregs, regno);
+    }
+
+  /* Now, generate instructions to zero all the other registers.  */
 
   rtx zero_gpr = NULL_RTX;
   rtx zero_vector = NULL_RTX;
   rtx zero_mask = NULL_RTX;
+  rtx zero_mmx = NULL_RTX;
 
   for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
     {
       if (!TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
 	continue;
-      if (!zero_call_used_regno_p (regno, all_sse_zeroed))
+      if (!zero_call_used_regno_p (regno, all_sse_zeroed, 
+				   exit_with_mmx_mode && !all_mm_zeroed))
 	continue;
 
       SET_HARD_REG_BIT (zeroed_hardregs, regno);
 
-      rtx reg, tmp;
+      rtx reg, tmp, zero_rtx;
       machine_mode mode = zero_call_used_regno_mode (regno);
 
       reg = gen_rtx_REG (mode, regno);
+      zero_rtx = CONST0_RTX (mode);
 
       if (mode == SImode)
 	if (zero_gpr == NULL_RTX)
 	  {
 	    zero_gpr = reg;
-	    tmp = gen_rtx_SET (reg, const0_rtx);
+	    tmp = gen_rtx_SET (reg, zero_rtx);
 	    if (!TARGET_USE_MOV0 || optimize_insn_for_size_p ())
 	      {
 		rtx clob = gen_rtx_CLOBBER (VOIDmode,
@@ -3714,7 +3813,7 @@ ix86_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
 	if (zero_vector == NULL_RTX)
 	  {
 	    zero_vector = reg;
-	    tmp = gen_rtx_SET (reg, const0_rtx);
+	    tmp = gen_rtx_SET (reg, zero_rtx);
 	    emit_insn (tmp);
 	  }
 	else
@@ -3723,11 +3822,20 @@ ix86_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
 	if (zero_mask == NULL_RTX)
 	  {
 	    zero_mask = reg;
-	    tmp = gen_rtx_SET (reg, const0_rtx);
+	    tmp = gen_rtx_SET (reg, zero_rtx);
 	    emit_insn (tmp);
 	  }
 	else
 	  emit_move_insn (reg, zero_mask);
+      else if (mode == V4HImode)
+	if (zero_mmx == NULL_RTX)
+	  {
+	    zero_mmx = reg;
+	    tmp = gen_rtx_SET (reg, zero_rtx);
+	    emit_insn (tmp);
+	  }
+	else
+	  emit_move_insn (reg, zero_mmx);
       else
 	gcc_unreachable ();
     }
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-28.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-28.c
new file mode 100644
index 00000000000..48b1f019a28
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-28.c
@@ -0,0 +1,16 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -mmmx -fzero-call-used-regs=all" } */
+/* { dg-require-effective-target ia32 } */
+
+__v2si ret_mmx (void)
+{
+  return (__v2si) { 123, 345 };
+}
+
+/* { dg-final { scan-assembler "pxor\[ \t\]*%mm1, %mm1" } } */
+/* { dg-final { scan-assembler "movq\[ \t\]*%mm1, %mm2" } } */
+/* { dg-final { scan-assembler "movq\[ \t\]*%mm1, %mm3" } } */
+/* { dg-final { scan-assembler "movq\[ \t\]*%mm1, %mm4" } } */
+/* { dg-final { scan-assembler "movq\[ \t\]*%mm1, %mm5" } } */
+/* { dg-final { scan-assembler "movq\[ \t\]*%mm1, %mm6" } } */
+/* { dg-final { scan-assembler "movq\[ \t\]*%mm1, %mm7" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-29.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-29.c
new file mode 100644
index 00000000000..8b5e1cd1602
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-29.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=all" } */
+
+long double ret_x87 (void)
+{
+  return 1.1L;
+}
+
+/* { dg-final { scan-assembler-times "fldz" 7 } } */
+/* { dg-final { scan-assembler-times "fstp" 7 } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-30.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-30.c
new file mode 100644
index 00000000000..e6fb4acf0fa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-30.c
@@ -0,0 +1,11 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2  -fzero-call-used-regs=all" } */
+/* { dg-require-effective-target lp64} */
+
+_Complex long double ret_x87_cplx (void)
+{
+  return 1.1L + 1.2iL;
+}
+
+/* { dg-final { scan-assembler-times "fldz" 6 } } */
+/* { dg-final { scan-assembler-times "fstp" 6 } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-31.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-31.c
new file mode 100644
index 00000000000..943508d1d26
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-31.c
@@ -0,0 +1,16 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -mmmx -fzero-call-used-regs=all-arg" } */
+/* { dg-require-effective-target ia32 } */
+
+__v2si ret_mmx (void)
+{
+  return (__v2si) { 123, 345 };
+}
+
+/* { dg-final { scan-assembler "pxor\[ \t\]*%mm1, %mm1" } } */
+/* { dg-final { scan-assembler "movq\[ \t\]*%mm1, %mm2" } } */
+/* { dg-final { scan-assembler-not "movq\[ \t\]*%mm1, %mm3" } } */
+/* { dg-final { scan-assembler-not "movq\[ \t\]*%mm1, %mm4" } } */
+/* { dg-final { scan-assembler-not "movq\[ \t\]*%mm1, %mm5" } } */
+/* { dg-final { scan-assembler-not "movq\[ \t\]*%mm1, %mm6" } } */
+/* { dg-final { scan-assembler-not "movq\[ \t\]*%mm1, %mm7" } } */
Uros Bizjak Oct. 27, 2020, 5:34 p.m. UTC | #19
On Tue, Oct 27, 2020 at 5:10 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>
> Uros,
>
> The following is the change compared to version 4 after fix all the issues you raised in the previous email.
>
> Let me know if there is any other issue.

LGTM for x86 part, with a couple of small review comments inline.

Thanks,
Uros.

> Thanks.
>
> Qing
>
> ---
>  gcc/config/i386/i386.c                             | 162 +++++++++++++++++----
>  .../gcc.target/i386/zero-scratch-regs-28.c         |  16 ++
>  .../gcc.target/i386/zero-scratch-regs-29.c         |  10 ++
>  .../gcc.target/i386/zero-scratch-regs-30.c         |  11 ++
>  .../gcc.target/i386/zero-scratch-regs-31.c         |  16 ++
>  5 files changed, 188 insertions(+), 27 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-28.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-29.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratchregs-30.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-31.c
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index e66dcf0d587..e6c5001b11e 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -3554,17 +3554,17 @@ ix86_function_value_regno_p (const unsigned int regno)
>  /* Check whether the register REGNO should be zeroed on X86.
>     When ALL_SSE_ZEROED is true, all SSE registers have been zeroed
>     together, no need to zero it again.
> -   Stack registers (st0-st7) and mm0-mm7 are aliased with each other.
> -   very hard to be zeroed individually, don't zero individual st or
> -   mm registgers.  */
> +   When NEED_ZERO_MMX is true, MMX registers should be cleared.  */
>
>  static bool
>  zero_call_used_regno_p (const unsigned int regno,
> -                       bool all_sse_zeroed)
> +                       bool all_sse_zeroed,
> +                       bool need_zero_mmx)
>  {
>    return GENERAL_REGNO_P (regno)
>          || (!all_sse_zeroed && SSE_REGNO_P (regno))
> -        || MASK_REGNO_P (regno);
> +        || MASK_REGNO_P (regno)
> +        || (need_zero_mmx && MMX_REGNO_P (regno));
>  }
>
>  /* Return the machine_mode that is used to zero register REGNO.  */
> @@ -3579,8 +3579,12 @@ zero_call_used_regno_mode (const unsigned int regno)
>      return SImode;
>    else if (SSE_REGNO_P (regno))
>      return V4SFmode;
> -  else
> +  else if (MASK_REGNO_P (regno))
>      return HImode;
> +  else if (MMX_REGNO_P (regno))
> +    return V4HImode;
> +  else
> +    gcc_unreachable ();
>  }
>
>  /* Generate a rtx to zero all vector registers together if possible,
> @@ -3603,7 +3607,7 @@ zero_all_vector_registers (HARD_REG_SET need_zeroed_hardregs)
>    return gen_avx_vzeroall ();
>  }
>
> -/* Generate insns to zero all st/mm registers together.
> +/* Generate insns to zero all st registers together.
>     Return true when zeroing instructions are generated.
>     Assume the number of st registers that are zeroed is num_of_st,
>     we will emit the following sequence to zero them together:
> @@ -3616,23 +3620,49 @@ zero_all_vector_registers (HARD_REG_SET need_zeroed_hardregs)
>                   ...
>                   fstp %%st(0);
>     i.e., num_of_st fldz followed by num_of_st fstp to clear the stack
> -   mark stack slots empty.  */
> +   mark stack slots empty.
> +
> +   How to compute the num_of_st?
> +   There is no direct mapping from stack registers to hard register
> +   numbers.  If one stack register need to be cleared, we don't know

needs

> +   where in the stack the value remains.  So, if any stack register
> +   need to be cleared, the whole stack should be cleared.  However,

needs

> +   x87 stack registers that hold the return value should be excluded.
> +   x87 returns in the top (two for complex values) register, so
> +   num_of_st should be 7/6 when x87 returns, otherwise it will be 8.  */
> +
>
>  static bool
> -zero_all_st_mm_registers (HARD_REG_SET need_zeroed_hardregs)
> +zero_all_st_registers (HARD_REG_SET need_zeroed_hardregs)
>  {
>    unsigned int num_of_st = 0;
>    for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> -    if (STACK_REGNO_P (regno)
> -       && TEST_HARD_REG_BIT (need_zeroed_hardregs, regno)
> -       /* When the corresponding mm register also need to be cleared too.  */
> -       && TEST_HARD_REG_BIT (need_zeroed_hardregs,
> -                             (regno - FIRST_STACK_REG + FIRST_MMX_REG)))
> -      num_of_st++;
> +    if ((STACK_REGNO_P (regno) || MMX_REGNO_P (regno))
> +       && TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
> +      {
> +       num_of_st++;
> +       break;
> +      }
>
>    if (num_of_st == 0)
>      return false;
>
> +  bool return_with_x87 = false;
> +  return_with_x87 = (crtl->return_rtx
> +                    && (STACK_REG_P (crtl->return_rtx)));
> +
> +  bool complex_return = false;
> +  complex_return = (crtl->return_rtx
> +                   && COMPLEX_MODE_P (GET_MODE (crtl->return_rtx)));
> +
> +  if (return_with_x87)
> +    if (complex_return)
> +      num_of_st = 6;
> +    else
> +      num_of_st = 7;
> +  else
> +    num_of_st = 8;
> +
>    rtx st_reg = gen_rtx_REG (XFmode, FIRST_STACK_REG);
>    for (unsigned int i = 0; i < num_of_st; i++)
>      emit_insn (gen_rtx_SET (st_reg, CONST0_RTX (XFmode)));
> @@ -3646,6 +3676,43 @@ zero_all_st_mm_registers (HARD_REG_SET need_zeroed_hardregs)
>    return true;
>  }
>
> +
> +/* When the routine exit with MMX mode, if there is any ST registers

... exits in MMX mode, if any ST register needs ...

> +   need to be zeroed, we should clear all MMX registers except the
> +   one that holds the return value RET_MMX_REGNO.  */

... except the RET_MMX_REGNO that holds the return value.

> +static bool
> +zero_all_mm_registers (HARD_REG_SET need_zeroed_hardregs,
> +                      unsigned int ret_mmx_regno)
> +{
> +  bool need_zero_all_mm = false;
> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> +    if (STACK_REGNO_P (regno)
> +       && TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
> +      {
> +       need_zero_all_mm = true;
> +       break;
> +      }
> +
> +  if (!need_zero_all_mm)
> +    return false;
> +
> +  rtx zero_mmx = NULL_RTX;
> +  machine_mode mode = V4HImode;
> +  for (unsigned int regno = FIRST_MMX_REG; regno <= LAST_MMX_REG; regno++)
> +    if (regno != ret_mmx_regno)
> +      {
> +       rtx reg = gen_rtx_REG (mode, regno);
> +       if (zero_mmx == NULL_RTX)
> +         {
> +           zero_mmx = reg;
> +           emit_insn (gen_rtx_SET (reg, CONST0_RTX(mode)));

space after CONST0_RTX

> +         }
> +       else
> +         emit_move_insn (reg, zero_mmx);
> +      }
> +  return true;
> +}
> +
>  /* TARGET_ZERO_CALL_USED_REGS.  */
>  /* Generate a sequence of instructions that zero registers specified by
>     NEED_ZEROED_HARDREGS.  Return the ZEROED_HARDREGS that are actually
> @@ -3655,7 +3722,10 @@ ix86_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
>  {
>    HARD_REG_SET zeroed_hardregs;
>    bool all_sse_zeroed = false;
> -  bool st_zeroed = false;
> +  bool all_st_zeroed = false;
> +  bool all_mm_zeroed = false;
> +
> +  CLEAR_HARD_REG_SET (zeroed_hardregs);
>
>    /* first, let's see whether we can zero all vector registers together.  */
>    rtx zero_all_vec_insn = zero_all_vector_registers (need_zeroed_hardregs);
> @@ -3665,38 +3735,67 @@ ix86_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
>        all_sse_zeroed = true;
>      }
>
> -  /* then, let's see whether we can zero all st+mm registers togeter.  */
> -  st_zeroed = zero_all_st_mm_registers (need_zeroed_hardregs);
> +  /* mm/st registers are shared registers set, we should follow the following
> +     rules to clear them:
> +                       MMX exit mode       x87 exit mode
> +       -------------|----------------------|---------------
> +       uses x87 reg | clear all MMX        | clear all x87
> +       uses MMX reg | clear individual MMX | clear all x87
> +       x87 + MMX    | clear all MMX        | clear all x87
>
> -  /* Now, generate instructions to zero all the registers.  */
> +     first, we should decide which mode (MMX mode or x87 mode) the function
> +     exit with.  */
>
> -  CLEAR_HARD_REG_SET (zeroed_hardregs);
> -  if (st_zeroed)
> -    SET_HARD_REG_BIT (zeroed_hardregs, FIRST_STACK_REG);
> +  bool exit_with_mmx_mode = (crtl->return_rtx
> +                            && (MMX_REG_P (crtl->return_rtx)));
> +
> +  if (!exit_with_mmx_mode)
> +    /* x87 exit mode, we should zero all st registers together.  */
> +    {
> +      all_st_zeroed = zero_all_st_registers (need_zeroed_hardregs);
> +      if (all_st_zeroed)
> +       SET_HARD_REG_BIT (zeroed_hardregs, FIRST_STACK_REG);
> +    }
> +  else
> +    /* MMX exit mode, check whether we can zero all mm registers.  */
> +    {
> +      unsigned int exit_mmx_regno = REGNO (crtl->return_rtx);
> +      all_mm_zeroed = zero_all_mm_registers (need_zeroed_hardregs,
> +                                            exit_mmx_regno);
> +      if (all_mm_zeroed)
> +       for (unsigned int regno = FIRST_MMX_REG; regno <= LAST_MMX_REG; regno++)
> +         if (regno != exit_mmx_regno)
> +           SET_HARD_REG_BIT (zeroed_hardregs, regno);
> +    }
> +
> +  /* Now, generate instructions to zero all the other registers.  */
>
>    rtx zero_gpr = NULL_RTX;
>    rtx zero_vector = NULL_RTX;
>    rtx zero_mask = NULL_RTX;
> +  rtx zero_mmx = NULL_RTX;
>
>    for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>      {
>        if (!TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
>         continue;
> -      if (!zero_call_used_regno_p (regno, all_sse_zeroed))
> +      if (!zero_call_used_regno_p (regno, all_sse_zeroed,
> +                                  exit_with_mmx_mode && !all_mm_zeroed))
>         continue;
>
>        SET_HARD_REG_BIT (zeroed_hardregs, regno);
>
> -      rtx reg, tmp;
> +      rtx reg, tmp, zero_rtx;
>        machine_mode mode = zero_call_used_regno_mode (regno);
>
>        reg = gen_rtx_REG (mode, regno);
> +      zero_rtx = CONST0_RTX (mode);
>
>        if (mode == SImode)
>         if (zero_gpr == NULL_RTX)
>           {
>             zero_gpr = reg;
> -           tmp = gen_rtx_SET (reg, const0_rtx);
> +           tmp = gen_rtx_SET (reg, zero_rtx);
>             if (!TARGET_USE_MOV0 || optimize_insn_for_size_p ())
>               {
>                 rtx clob = gen_rtx_CLOBBER (VOIDmode,
> @@ -3714,7 +3813,7 @@ ix86_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
>         if (zero_vector == NULL_RTX)
>           {
>             zero_vector = reg;
> -           tmp = gen_rtx_SET (reg, const0_rtx);
> +           tmp = gen_rtx_SET (reg, zero_rtx);
>             emit_insn (tmp);
>           }
>         else
> @@ -3723,11 +3822,20 @@ ix86_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
>         if (zero_mask == NULL_RTX)
>           {
>             zero_mask = reg;
> -           tmp = gen_rtx_SET (reg, const0_rtx);
> +           tmp = gen_rtx_SET (reg, zero_rtx);
>             emit_insn (tmp);
>           }
>         else
>           emit_move_insn (reg, zero_mask);
> +      else if (mode == V4HImode)
> +       if (zero_mmx == NULL_RTX)
> +         {
> +           zero_mmx = reg;
> +           tmp = gen_rtx_SET (reg, zero_rtx);
> +           emit_insn (tmp);
> +         }
> +       else
> +         emit_move_insn (reg, zero_mmx);
>        else
>         gcc_unreachable ();
>      }
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-28.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-28.c
> new file mode 100644
> index 00000000000..48b1f019a28
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-28.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -mmmx -fzero-call-used-regs=all" } */
> +/* { dg-require-effective-target ia32 } */
> +
> +__v2si ret_mmx (void)
> +{
> +  return (__v2si) { 123, 345 };
> +}
> +
> +/* { dg-final { scan-assembler "pxor\[ \t\]*%mm1, %mm1" } } */
> +/* { dg-final { scan-assembler "movq\[ \t\]*%mm1, %mm2" } } */
> +/* { dg-final { scan-assembler "movq\[ \t\]*%mm1, %mm3" } } */
> +/* { dg-final { scan-assembler "movq\[ \t\]*%mm1, %mm4" } } */
> +/* { dg-final { scan-assembler "movq\[ \t\]*%mm1, %mm5" } } */
> +/* { dg-final { scan-assembler "movq\[ \t\]*%mm1, %mm6" } } */
> +/* { dg-final { scan-assembler "movq\[ \t\]*%mm1, %mm7" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-29.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-29.c
> new file mode 100644
> index 00000000000..8b5e1cd1602
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-29.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all" } */
> +
> +long double ret_x87 (void)
> +{
> +  return 1.1L;
> +}
> +
> +/* { dg-final { scan-assembler-times "fldz" 7 } } */
> +/* { dg-final { scan-assembler-times "fstp" 7 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-30.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-30.c
> new file mode 100644
> index 00000000000..e6fb4acf0fa
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-30.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2  -fzero-call-used-regs=all" } */
> +/* { dg-require-effective-target lp64} */

Please remove the above line, we can get better test coverage by
conditional scans below.

> +
> +_Complex long double ret_x87_cplx (void)
> +{
> +  return 1.1L + 1.2iL;
> +}
> +
> +/* { dg-final { scan-assembler-times "fldz" 6 } } */
> +/* { dg-final { scan-assembler-times "fstp" 6 } } */

/* { dg-final { scan-assembler-times "fldz" 8 { target ia32 } } } */
/* { dg-final { scan-assembler-times "fstp" 8 { target ia32 } } } */

/* { dg-final { scan-assembler-times "fldz" 6 { target { ! ia32 } } } } */
/* { dg-final { scan-assembler-times "fstp" 6 { target { ! ia32 } } } } */

> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-31.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-31.c
> new file mode 100644
> index 00000000000..943508d1d26
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-31.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -mmmx -fzero-call-used-regs=all-arg" } */
> +/* { dg-require-effective-target ia32 } */
> +
> +__v2si ret_mmx (void)
> +{
> +  return (__v2si) { 123, 345 };
> +}
> +
> +/* { dg-final { scan-assembler "pxor\[ \t\]*%mm1, %mm1" } } */
> +/* { dg-final { scan-assembler "movq\[ \t\]*%mm1, %mm2" } } */
> +/* { dg-final { scan-assembler-not "movq\[ \t\]*%mm1, %mm3" } } */

/* { dg-final { scan-assembler-not "movq\[ \t\]*%mm1, %mm\[34567\]" } } */

should achieve the same with one regexp.

> +/* { dg-final { scan-assembler-not "movq\[ \t\]*%mm1, %mm4" } } */
> +/* { dg-final { scan-assembler-not "movq\[ \t\]*%mm1, %mm5" } } */
> +/* { dg-final { scan-assembler-not "movq\[ \t\]*%mm1, %mm6" } } */
> +/* { dg-final { scan-assembler-not "movq\[ \t\]*%mm1, %mm7" } } */
> --
> 2.11.0
Qing Zhao Oct. 27, 2020, 7:02 p.m. UTC | #20
> On Oct 27, 2020, at 12:34 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> 
> On Tue, Oct 27, 2020 at 5:10 PM Qing Zhao <QING.ZHAO@oracle.com <mailto:QING.ZHAO@oracle.com>> wrote:
>> 
>> Uros,
>> 
>> The following is the change compared to version 4 after fix all the issues you raised in the previous email.
>> 
>> Let me know if there is any other issue.
> 
> LGTM for x86 part, with a couple of small review comments inline.

thanks.

I will fix those issues below and then include them in the next version 5.

Thanks a lot for all your help.

Qing

> 
> Thanks,
> Uros.
> 
>> Thanks.
>> 
>> Qing
>> 
>> ---
>> gcc/config/i386/i386.c                             | 162 +++++++++++++++++----
>> .../gcc.target/i386/zero-scratch-regs-28.c         |  16 ++
>> .../gcc.target/i386/zero-scratch-regs-29.c         |  10 ++
>> .../gcc.target/i386/zero-scratch-regs-30.c         |  11 ++
>> .../gcc.target/i386/zero-scratch-regs-31.c         |  16 ++
>> 5 files changed, 188 insertions(+), 27 deletions(-)
>> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-28.c
>> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-29.c
>> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratchregs-30.c
>> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-31.c
>> 
>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>> index e66dcf0d587..e6c5001b11e 100644
>> --- a/gcc/config/i386/i386.c
>> +++ b/gcc/config/i386/i386.c
>> @@ -3554,17 +3554,17 @@ ix86_function_value_regno_p (const unsigned int regno)
>> /* Check whether the register REGNO should be zeroed on X86.
>>    When ALL_SSE_ZEROED is true, all SSE registers have been zeroed
>>    together, no need to zero it again.
>> -   Stack registers (st0-st7) and mm0-mm7 are aliased with each other.
>> -   very hard to be zeroed individually, don't zero individual st or
>> -   mm registgers.  */
>> +   When NEED_ZERO_MMX is true, MMX registers should be cleared.  */
>> 
>> static bool
>> zero_call_used_regno_p (const unsigned int regno,
>> -                       bool all_sse_zeroed)
>> +                       bool all_sse_zeroed,
>> +                       bool need_zero_mmx)
>> {
>>   return GENERAL_REGNO_P (regno)
>>         || (!all_sse_zeroed && SSE_REGNO_P (regno))
>> -        || MASK_REGNO_P (regno);
>> +        || MASK_REGNO_P (regno)
>> +        || (need_zero_mmx && MMX_REGNO_P (regno));
>> }
>> 
>> /* Return the machine_mode that is used to zero register REGNO.  */
>> @@ -3579,8 +3579,12 @@ zero_call_used_regno_mode (const unsigned int regno)
>>     return SImode;
>>   else if (SSE_REGNO_P (regno))
>>     return V4SFmode;
>> -  else
>> +  else if (MASK_REGNO_P (regno))
>>     return HImode;
>> +  else if (MMX_REGNO_P (regno))
>> +    return V4HImode;
>> +  else
>> +    gcc_unreachable ();
>> }
>> 
>> /* Generate a rtx to zero all vector registers together if possible,
>> @@ -3603,7 +3607,7 @@ zero_all_vector_registers (HARD_REG_SET need_zeroed_hardregs)
>>   return gen_avx_vzeroall ();
>> }
>> 
>> -/* Generate insns to zero all st/mm registers together.
>> +/* Generate insns to zero all st registers together.
>>    Return true when zeroing instructions are generated.
>>    Assume the number of st registers that are zeroed is num_of_st,
>>    we will emit the following sequence to zero them together:
>> @@ -3616,23 +3620,49 @@ zero_all_vector_registers (HARD_REG_SET need_zeroed_hardregs)
>>                  ...
>>                  fstp %%st(0);
>>    i.e., num_of_st fldz followed by num_of_st fstp to clear the stack
>> -   mark stack slots empty.  */
>> +   mark stack slots empty.
>> +
>> +   How to compute the num_of_st?
>> +   There is no direct mapping from stack registers to hard register
>> +   numbers.  If one stack register need to be cleared, we don't know
> 
> needs
> 
>> +   where in the stack the value remains.  So, if any stack register
>> +   need to be cleared, the whole stack should be cleared.  However,
> 
> needs
> 
>> +   x87 stack registers that hold the return value should be excluded.
>> +   x87 returns in the top (two for complex values) register, so
>> +   num_of_st should be 7/6 when x87 returns, otherwise it will be 8.  */
>> +
>> 
>> static bool
>> -zero_all_st_mm_registers (HARD_REG_SET need_zeroed_hardregs)
>> +zero_all_st_registers (HARD_REG_SET need_zeroed_hardregs)
>> {
>>   unsigned int num_of_st = 0;
>>   for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>> -    if (STACK_REGNO_P (regno)
>> -       && TEST_HARD_REG_BIT (need_zeroed_hardregs, regno)
>> -       /* When the corresponding mm register also need to be cleared too.  */
>> -       && TEST_HARD_REG_BIT (need_zeroed_hardregs,
>> -                             (regno - FIRST_STACK_REG + FIRST_MMX_REG)))
>> -      num_of_st++;
>> +    if ((STACK_REGNO_P (regno) || MMX_REGNO_P (regno))
>> +       && TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
>> +      {
>> +       num_of_st++;
>> +       break;
>> +      }
>> 
>>   if (num_of_st == 0)
>>     return false;
>> 
>> +  bool return_with_x87 = false;
>> +  return_with_x87 = (crtl->return_rtx
>> +                    && (STACK_REG_P (crtl->return_rtx)));
>> +
>> +  bool complex_return = false;
>> +  complex_return = (crtl->return_rtx
>> +                   && COMPLEX_MODE_P (GET_MODE (crtl->return_rtx)));
>> +
>> +  if (return_with_x87)
>> +    if (complex_return)
>> +      num_of_st = 6;
>> +    else
>> +      num_of_st = 7;
>> +  else
>> +    num_of_st = 8;
>> +
>>   rtx st_reg = gen_rtx_REG (XFmode, FIRST_STACK_REG);
>>   for (unsigned int i = 0; i < num_of_st; i++)
>>     emit_insn (gen_rtx_SET (st_reg, CONST0_RTX (XFmode)));
>> @@ -3646,6 +3676,43 @@ zero_all_st_mm_registers (HARD_REG_SET need_zeroed_hardregs)
>>   return true;
>> }
>> 
>> +
>> +/* When the routine exit with MMX mode, if there is any ST registers
> 
> ... exits in MMX mode, if any ST register needs ...
> 
>> +   need to be zeroed, we should clear all MMX registers except the
>> +   one that holds the return value RET_MMX_REGNO.  */
> 
> ... except the RET_MMX_REGNO that holds the return value.
> 
>> +static bool
>> +zero_all_mm_registers (HARD_REG_SET need_zeroed_hardregs,
>> +                      unsigned int ret_mmx_regno)
>> +{
>> +  bool need_zero_all_mm = false;
>> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>> +    if (STACK_REGNO_P (regno)
>> +       && TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
>> +      {
>> +       need_zero_all_mm = true;
>> +       break;
>> +      }
>> +
>> +  if (!need_zero_all_mm)
>> +    return false;
>> +
>> +  rtx zero_mmx = NULL_RTX;
>> +  machine_mode mode = V4HImode;
>> +  for (unsigned int regno = FIRST_MMX_REG; regno <= LAST_MMX_REG; regno++)
>> +    if (regno != ret_mmx_regno)
>> +      {
>> +       rtx reg = gen_rtx_REG (mode, regno);
>> +       if (zero_mmx == NULL_RTX)
>> +         {
>> +           zero_mmx = reg;
>> +           emit_insn (gen_rtx_SET (reg, CONST0_RTX(mode)));
> 
> space after CONST0_RTX
> 
>> +         }
>> +       else
>> +         emit_move_insn (reg, zero_mmx);
>> +      }
>> +  return true;
>> +}
>> +
>> /* TARGET_ZERO_CALL_USED_REGS.  */
>> /* Generate a sequence of instructions that zero registers specified by
>>    NEED_ZEROED_HARDREGS.  Return the ZEROED_HARDREGS that are actually
>> @@ -3655,7 +3722,10 @@ ix86_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
>> {
>>   HARD_REG_SET zeroed_hardregs;
>>   bool all_sse_zeroed = false;
>> -  bool st_zeroed = false;
>> +  bool all_st_zeroed = false;
>> +  bool all_mm_zeroed = false;
>> +
>> +  CLEAR_HARD_REG_SET (zeroed_hardregs);
>> 
>>   /* first, let's see whether we can zero all vector registers together.  */
>>   rtx zero_all_vec_insn = zero_all_vector_registers (need_zeroed_hardregs);
>> @@ -3665,38 +3735,67 @@ ix86_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
>>       all_sse_zeroed = true;
>>     }
>> 
>> -  /* then, let's see whether we can zero all st+mm registers togeter.  */
>> -  st_zeroed = zero_all_st_mm_registers (need_zeroed_hardregs);
>> +  /* mm/st registers are shared registers set, we should follow the following
>> +     rules to clear them:
>> +                       MMX exit mode       x87 exit mode
>> +       -------------|----------------------|---------------
>> +       uses x87 reg | clear all MMX        | clear all x87
>> +       uses MMX reg | clear individual MMX | clear all x87
>> +       x87 + MMX    | clear all MMX        | clear all x87
>> 
>> -  /* Now, generate instructions to zero all the registers.  */
>> +     first, we should decide which mode (MMX mode or x87 mode) the function
>> +     exit with.  */
>> 
>> -  CLEAR_HARD_REG_SET (zeroed_hardregs);
>> -  if (st_zeroed)
>> -    SET_HARD_REG_BIT (zeroed_hardregs, FIRST_STACK_REG);
>> +  bool exit_with_mmx_mode = (crtl->return_rtx
>> +                            && (MMX_REG_P (crtl->return_rtx)));
>> +
>> +  if (!exit_with_mmx_mode)
>> +    /* x87 exit mode, we should zero all st registers together.  */
>> +    {
>> +      all_st_zeroed = zero_all_st_registers (need_zeroed_hardregs);
>> +      if (all_st_zeroed)
>> +       SET_HARD_REG_BIT (zeroed_hardregs, FIRST_STACK_REG);
>> +    }
>> +  else
>> +    /* MMX exit mode, check whether we can zero all mm registers.  */
>> +    {
>> +      unsigned int exit_mmx_regno = REGNO (crtl->return_rtx);
>> +      all_mm_zeroed = zero_all_mm_registers (need_zeroed_hardregs,
>> +                                            exit_mmx_regno);
>> +      if (all_mm_zeroed)
>> +       for (unsigned int regno = FIRST_MMX_REG; regno <= LAST_MMX_REG; regno++)
>> +         if (regno != exit_mmx_regno)
>> +           SET_HARD_REG_BIT (zeroed_hardregs, regno);
>> +    }
>> +
>> +  /* Now, generate instructions to zero all the other registers.  */
>> 
>>   rtx zero_gpr = NULL_RTX;
>>   rtx zero_vector = NULL_RTX;
>>   rtx zero_mask = NULL_RTX;
>> +  rtx zero_mmx = NULL_RTX;
>> 
>>   for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>>     {
>>       if (!TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
>>        continue;
>> -      if (!zero_call_used_regno_p (regno, all_sse_zeroed))
>> +      if (!zero_call_used_regno_p (regno, all_sse_zeroed,
>> +                                  exit_with_mmx_mode && !all_mm_zeroed))
>>        continue;
>> 
>>       SET_HARD_REG_BIT (zeroed_hardregs, regno);
>> 
>> -      rtx reg, tmp;
>> +      rtx reg, tmp, zero_rtx;
>>       machine_mode mode = zero_call_used_regno_mode (regno);
>> 
>>       reg = gen_rtx_REG (mode, regno);
>> +      zero_rtx = CONST0_RTX (mode);
>> 
>>       if (mode == SImode)
>>        if (zero_gpr == NULL_RTX)
>>          {
>>            zero_gpr = reg;
>> -           tmp = gen_rtx_SET (reg, const0_rtx);
>> +           tmp = gen_rtx_SET (reg, zero_rtx);
>>            if (!TARGET_USE_MOV0 || optimize_insn_for_size_p ())
>>              {
>>                rtx clob = gen_rtx_CLOBBER (VOIDmode,
>> @@ -3714,7 +3813,7 @@ ix86_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
>>        if (zero_vector == NULL_RTX)
>>          {
>>            zero_vector = reg;
>> -           tmp = gen_rtx_SET (reg, const0_rtx);
>> +           tmp = gen_rtx_SET (reg, zero_rtx);
>>            emit_insn (tmp);
>>          }
>>        else
>> @@ -3723,11 +3822,20 @@ ix86_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
>>        if (zero_mask == NULL_RTX)
>>          {
>>            zero_mask = reg;
>> -           tmp = gen_rtx_SET (reg, const0_rtx);
>> +           tmp = gen_rtx_SET (reg, zero_rtx);
>>            emit_insn (tmp);
>>          }
>>        else
>>          emit_move_insn (reg, zero_mask);
>> +      else if (mode == V4HImode)
>> +       if (zero_mmx == NULL_RTX)
>> +         {
>> +           zero_mmx = reg;
>> +           tmp = gen_rtx_SET (reg, zero_rtx);
>> +           emit_insn (tmp);
>> +         }
>> +       else
>> +         emit_move_insn (reg, zero_mmx);
>>       else
>>        gcc_unreachable ();
>>     }
>> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-28.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-28.c
>> new file mode 100644
>> index 00000000000..48b1f019a28
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-28.c
>> @@ -0,0 +1,16 @@
>> +/* { dg-do compile { target *-*-linux* } } */
>> +/* { dg-options "-O2 -mmmx -fzero-call-used-regs=all" } */
>> +/* { dg-require-effective-target ia32 } */
>> +
>> +__v2si ret_mmx (void)
>> +{
>> +  return (__v2si) { 123, 345 };
>> +}
>> +
>> +/* { dg-final { scan-assembler "pxor\[ \t\]*%mm1, %mm1" } } */
>> +/* { dg-final { scan-assembler "movq\[ \t\]*%mm1, %mm2" } } */
>> +/* { dg-final { scan-assembler "movq\[ \t\]*%mm1, %mm3" } } */
>> +/* { dg-final { scan-assembler "movq\[ \t\]*%mm1, %mm4" } } */
>> +/* { dg-final { scan-assembler "movq\[ \t\]*%mm1, %mm5" } } */
>> +/* { dg-final { scan-assembler "movq\[ \t\]*%mm1, %mm6" } } */
>> +/* { dg-final { scan-assembler "movq\[ \t\]*%mm1, %mm7" } } */
>> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-29.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-29.c
>> new file mode 100644
>> index 00000000000..8b5e1cd1602
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-29.c
>> @@ -0,0 +1,10 @@
>> +/* { dg-do compile { target *-*-linux* } } */
>> +/* { dg-options "-O2 -fzero-call-used-regs=all" } */
>> +
>> +long double ret_x87 (void)
>> +{
>> +  return 1.1L;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times "fldz" 7 } } */
>> +/* { dg-final { scan-assembler-times "fstp" 7 } } */
>> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-30.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-30.c
>> new file mode 100644
>> index 00000000000..e6fb4acf0fa
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-30.c
>> @@ -0,0 +1,11 @@
>> +/* { dg-do compile { target *-*-linux* } } */
>> +/* { dg-options "-O2  -fzero-call-used-regs=all" } */
>> +/* { dg-require-effective-target lp64} */
> 
> Please remove the above line, we can get better test coverage by
> conditional scans below.
> 
>> +
>> +_Complex long double ret_x87_cplx (void)
>> +{
>> +  return 1.1L + 1.2iL;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times "fldz" 6 } } */
>> +/* { dg-final { scan-assembler-times "fstp" 6 } } */
> 
> /* { dg-final { scan-assembler-times "fldz" 8 { target ia32 } } } */
> /* { dg-final { scan-assembler-times "fstp" 8 { target ia32 } } } */
> 
> /* { dg-final { scan-assembler-times "fldz" 6 { target { ! ia32 } } } } */
> /* { dg-final { scan-assembler-times "fstp" 6 { target { ! ia32 } } } } */
> 
>> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-31.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-31.c
>> new file mode 100644
>> index 00000000000..943508d1d26
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-31.c
>> @@ -0,0 +1,16 @@
>> +/* { dg-do compile { target *-*-linux* } } */
>> +/* { dg-options "-O2 -mmmx -fzero-call-used-regs=all-arg" } */
>> +/* { dg-require-effective-target ia32 } */
>> +
>> +__v2si ret_mmx (void)
>> +{
>> +  return (__v2si) { 123, 345 };
>> +}
>> +
>> +/* { dg-final { scan-assembler "pxor\[ \t\]*%mm1, %mm1" } } */
>> +/* { dg-final { scan-assembler "movq\[ \t\]*%mm1, %mm2" } } */
>> +/* { dg-final { scan-assembler-not "movq\[ \t\]*%mm1, %mm3" } } */
> 
> /* { dg-final { scan-assembler-not "movq\[ \t\]*%mm1, %mm\[34567\]" } } */
> 
> should achieve the same with one regexp.
> 
>> +/* { dg-final { scan-assembler-not "movq\[ \t\]*%mm1, %mm4" } } */
>> +/* { dg-final { scan-assembler-not "movq\[ \t\]*%mm1, %mm5" } } */
>> +/* { dg-final { scan-assembler-not "movq\[ \t\]*%mm1, %mm6" } } */
>> +/* { dg-final { scan-assembler-not "movq\[ \t\]*%mm1, %mm7" } } */
>> --
>> 2.11.0
Qing Zhao Oct. 28, 2020, 2:09 p.m. UTC | #21
Hi, Richard,

In order to be consistent with other flags in flag-types.h, for example, “sanitize_code”,
I didn’t use namespace, instead making the name more specific as following:

/* Different settings for zeroing subset of registers.  */
enum  zero_regs_flags {
  ZERO_REGS_UNSET = 0,
  ZERO_REGS_SKIP = 1UL << 0,
  ZERO_REGS_ONLY_USED = 1UL << 1,
  ZERO_REGS_ONLY_GPR = 1UL << 2,
  ZERO_REGS_ONLY_ARG = 1UL << 3,
  ZERO_REGS_ENABLED = 1UL << 4,
  ZERO_REGS_USED_GPR_ARG = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_USED
                           | ZERO_REGS_ONLY_GPR | ZERO_REGS_ONLY_ARG,
  ZERO_REGS_USED_GPR = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_USED
                       | ZERO_REGS_ONLY_GPR,
  ZERO_REGS_USED_ARG = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_USED
                       | ZERO_REGS_ONLY_ARG,
  ZERO_REGS_USED = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_USED,
  ZERO_REGS_ALL_GPR_ARG = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_GPR
                          | ZERO_REGS_ONLY_ARG,
  ZERO_REGS_ALL_GPR = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_GPR,
  ZERO_REGS_ALL_ARG = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_ARG,
  ZERO_REGS_ALL = ZERO_REGS_ENABLED
};

Is this good?

Or you still prefer namespace?

thanks.

Qing


> On Oct 27, 2020, at 10:36 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>> diff --git a/gcc/flag-types.h b/gcc/flag-types.h
>>>> index 852ea76..0f7e503 100644
>>>> --- a/gcc/flag-types.h
>>>> +++ b/gcc/flag-types.h
>>>> @@ -285,6 +285,15 @@ enum sanitize_code {
>>>> 				  | SANITIZE_BOUNDS_STRICT
>>>> };
>>>> 
>>>> +enum  zero_call_used_regs_code {
>>>> +  UNSET = 0,
>>>> +  SKIP = 1UL << 0,
>>>> +  ONLY_USED = 1UL << 1,
>>>> +  ONLY_GPR = 1UL << 2,
>>>> +  ONLY_ARG = 1UL << 3,
>>>> +  ALL = 1UL << 4
>>>> +};
>>> 
>>> I'd suggested these names on the assumption that we'd be using
>>> a C++ enum class, so that the enum would be referenced as
>>> name::ALL, name::SKIP, etc.  But I guess using a C++ enum class
>>> doesn't work well with bitfields after all.
>>> 
>>> These names are too generic without the name:: scoping though.
>>> Perhaps we should put them in a namespace:
>>> 
>>> namespace zero_regs_flags {
>>>   const unsigned int UNSET = 0;
>>>   …etc…
>>> }
>>> 
>>> (call-used probably doesn't need to be part of the flag names,
>>> since the concept is more general than that and call-usedness
>>> is really a filter that's being applied on top.  Although I guess
>>> the same is true of “zero”. ;-))
>>> 
>>> I don't think we should have ALL as a separate flag: ALL is the absence
>>> of ONLY_*.  Maybe we should have an ENABLED flag that all non-skip
>>> combinations use?
>>> 
>>> If it makes things easier, I think it would be good to have e.g.:
>>> 
>>> unsigned int USED_GPR = ENABLED | ONLY_USED | ONLY_GPR;
>>> 
>>> inside the namespace, to reduce the verbosity in the option table.
>> 
>> Then, the final namespace will look like:
>> 
>> namespace zero_regs_flags {
>>  const unsigned int UNSET = 0;
>>  const unsigned int SKIP = 1UL << 0;
>>  const unsigned int ONLY_USED = 1UL << 1;
>>  const unsigned int ONLY_GPR = 1UL << 2;
>>  const unsigned int ONLY_ARG = 1UL << 3;
>>  const unsigned int ENABLED = 1UL << 4;
>>  const unsigned int USED_GPR_ARG = ONLY_USED | ONLY_GPR | ONLY_ARG;
> 
> “ENABLED |” here
> 
>>  const unsigned int USED_GPR = ENABLED | ONLY_USED | ONLY_GPR;
>>  const unsigned int USED_ARG = ENABLED | ONLY_USED | ONLY_ARG;
>>  const unsigned int USED = ENABLED | ONLY_USED;
>>  const unsigned int ALL_GRP_ARG = ENABLED | ONLY_GPR | ONLY_ARG;
> 
> GPR
> 
>>  const unsigned int ALL_GPR = ENABLED | ONLY_GPR;
>>  const unsigned int ALL_ARG = ENABLED | ONLY_ARG;
>>  const unsigned int ALL = ENABLED;
>> }
>> 
>> ??
>
Richard Sandiford Oct. 28, 2020, 2:19 p.m. UTC | #22
Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
> Hi, Richard,
>
> In order to be consistent with other flags in flag-types.h, for example, “sanitize_code”,
> I didn’t use namespace, instead making the name more specific as following:
>
> /* Different settings for zeroing subset of registers.  */
> enum  zero_regs_flags {
>   ZERO_REGS_UNSET = 0,
>   ZERO_REGS_SKIP = 1UL << 0,
>   ZERO_REGS_ONLY_USED = 1UL << 1,
>   ZERO_REGS_ONLY_GPR = 1UL << 2,
>   ZERO_REGS_ONLY_ARG = 1UL << 3,
>   ZERO_REGS_ENABLED = 1UL << 4,
>   ZERO_REGS_USED_GPR_ARG = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_USED
>                            | ZERO_REGS_ONLY_GPR | ZERO_REGS_ONLY_ARG,
>   ZERO_REGS_USED_GPR = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_USED
>                        | ZERO_REGS_ONLY_GPR,
>   ZERO_REGS_USED_ARG = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_USED
>                        | ZERO_REGS_ONLY_ARG,
>   ZERO_REGS_USED = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_USED,
>   ZERO_REGS_ALL_GPR_ARG = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_GPR
>                           | ZERO_REGS_ONLY_ARG,
>   ZERO_REGS_ALL_GPR = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_GPR,
>   ZERO_REGS_ALL_ARG = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_ARG,
>   ZERO_REGS_ALL = ZERO_REGS_ENABLED
> };
>
> Is this good?
>
> Or you still prefer namespace?

I prefer the namespace.  I realise namespaces aren't used that much
in GCC yet, but they *are* used.

The advantage they have is that it's possible to do:

  using namespace ...;

in contexts where there's no ambiguity.  They also make lines like
the | ones above easier to read.

Thanks,
Richard
Qing Zhao Oct. 28, 2020, 2:24 p.m. UTC | #23
Okay, I will change it to namespace.

Qing

> On Oct 28, 2020, at 9:19 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Qing Zhao <QING.ZHAO@ORACLE.COM <mailto:QING.ZHAO@ORACLE.COM>> writes:
>> Hi, Richard,
>> 
>> In order to be consistent with other flags in flag-types.h, for example, “sanitize_code”,
>> I didn’t use namespace, instead making the name more specific as following:
>> 
>> /* Different settings for zeroing subset of registers.  */
>> enum  zero_regs_flags {
>>  ZERO_REGS_UNSET = 0,
>>  ZERO_REGS_SKIP = 1UL << 0,
>>  ZERO_REGS_ONLY_USED = 1UL << 1,
>>  ZERO_REGS_ONLY_GPR = 1UL << 2,
>>  ZERO_REGS_ONLY_ARG = 1UL << 3,
>>  ZERO_REGS_ENABLED = 1UL << 4,
>>  ZERO_REGS_USED_GPR_ARG = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_USED
>>                           | ZERO_REGS_ONLY_GPR | ZERO_REGS_ONLY_ARG,
>>  ZERO_REGS_USED_GPR = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_USED
>>                       | ZERO_REGS_ONLY_GPR,
>>  ZERO_REGS_USED_ARG = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_USED
>>                       | ZERO_REGS_ONLY_ARG,
>>  ZERO_REGS_USED = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_USED,
>>  ZERO_REGS_ALL_GPR_ARG = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_GPR
>>                          | ZERO_REGS_ONLY_ARG,
>>  ZERO_REGS_ALL_GPR = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_GPR,
>>  ZERO_REGS_ALL_ARG = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_ARG,
>>  ZERO_REGS_ALL = ZERO_REGS_ENABLED
>> };
>> 
>> Is this good?
>> 
>> Or you still prefer namespace?
> 
> I prefer the namespace.  I realise namespaces aren't used that much
> in GCC yet, but they *are* used.
> 
> The advantage they have is that it's possible to do:
> 
>  using namespace ...;
> 
> in contexts where there's no ambiguity.  They also make lines like
> the | ones above easier to read.
> 
> Thanks,
> Richard
Qing Zhao Oct. 28, 2020, 3:13 p.m. UTC | #24
Hi, Richard, 

I changed the “enum” to “namespace”.

There is no issue for C++ compilation. However, flag-types.h header file is also included by C modules and compiled with gcc, then I got a lot of following compilation errors:

make[4]: Entering directory '/home/qinzhao/Work/x86-build/x86_64-pc-linux-gnu/libgcc'
In file included from ../.././gcc/options.h:6,
                 from ../.././gcc/tm.h:22,
                 from ../../../x86-gcc/libgcc/libgcc2.c:29,
                 from ../../../x86-gcc/libgcc/config/i386/64/_multc3.c:6:
../../../x86-gcc/libgcc/../gcc/flag-types.h:289:1: error: unknown type name ‘namespace’
  289 | namespace  zero_regs_code {
      | ^~~~~~~~~

Looks like that I should not put this new namespace inside “flag-types.h”?  Which other header file I should put this namespace in? 

thanks.

Qing

> On Oct 28, 2020, at 9:24 AM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> 
> Okay, I will change it to namespace.
> 
> Qing
> 
>> On Oct 28, 2020, at 9:19 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
>> 
>> Qing Zhao <QING.ZHAO@ORACLE.COM <mailto:QING.ZHAO@ORACLE.COM>> writes:
>>> Hi, Richard,
>>> 
>>> In order to be consistent with other flags in flag-types.h, for example, “sanitize_code”,
>>> I didn’t use namespace, instead making the name more specific as following:
>>> 
>>> /* Different settings for zeroing subset of registers.  */
>>> enum  zero_regs_flags {
>>> ZERO_REGS_UNSET = 0,
>>> ZERO_REGS_SKIP = 1UL << 0,
>>> ZERO_REGS_ONLY_USED = 1UL << 1,
>>> ZERO_REGS_ONLY_GPR = 1UL << 2,
>>> ZERO_REGS_ONLY_ARG = 1UL << 3,
>>> ZERO_REGS_ENABLED = 1UL << 4,
>>> ZERO_REGS_USED_GPR_ARG = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_USED
>>>                          | ZERO_REGS_ONLY_GPR | ZERO_REGS_ONLY_ARG,
>>> ZERO_REGS_USED_GPR = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_USED
>>>                      | ZERO_REGS_ONLY_GPR,
>>> ZERO_REGS_USED_ARG = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_USED
>>>                      | ZERO_REGS_ONLY_ARG,
>>> ZERO_REGS_USED = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_USED,
>>> ZERO_REGS_ALL_GPR_ARG = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_GPR
>>>                         | ZERO_REGS_ONLY_ARG,
>>> ZERO_REGS_ALL_GPR = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_GPR,
>>> ZERO_REGS_ALL_ARG = ZERO_REGS_ENABLED | ZERO_REGS_ONLY_ARG,
>>> ZERO_REGS_ALL = ZERO_REGS_ENABLED
>>> };
>>> 
>>> Is this good?
>>> 
>>> Or you still prefer namespace?
>> 
>> I prefer the namespace.  I realise namespaces aren't used that much
>> in GCC yet, but they *are* used.
>> 
>> The advantage they have is that it's possible to do:
>> 
>> using namespace ...;
>> 
>> in contexts where there's no ambiguity.  They also make lines like
>> the | ones above easier to read.
>> 
>> Thanks,
>> Richard
>
Richard Sandiford Oct. 28, 2020, 3:36 p.m. UTC | #25
Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
> Hi, Richard, 
>
> I changed the “enum” to “namespace”.
>
> There is no issue for C++ compilation. However, flag-types.h header file is also included by C modules and compiled with gcc, then I got a lot of following compilation errors:
>
> make[4]: Entering directory '/home/qinzhao/Work/x86-build/x86_64-pc-linux-gnu/libgcc'
> In file included from ../.././gcc/options.h:6,
>                  from ../.././gcc/tm.h:22,
>                  from ../../../x86-gcc/libgcc/libgcc2.c:29,
>                  from ../../../x86-gcc/libgcc/config/i386/64/_multc3.c:6:
> ../../../x86-gcc/libgcc/../gcc/flag-types.h:289:1: error: unknown type name ‘namespace’
>   289 | namespace  zero_regs_code {
>       | ^~~~~~~~~
>
> Looks like that I should not put this new namespace inside “flag-types.h”?  Which other header file I should put this namespace in? 

I think we should just protect the contents of flag-types.h with:

#if !defined(IN_LIBGCC2) && !defined(IN_TARGET_LIBS) && !defined(IN_RTS)

similarly to what we do for flags.h.

Thanks,
Richard
diff mbox series

Patch

diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index c779d13..979b6a7 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -138,6 +138,8 @@  static tree handle_target_clones_attribute (tree *, tree, tree, int, bool *);
 static tree handle_optimize_attribute (tree *, tree, tree, int, bool *);
 static tree ignore_attribute (tree *, tree, tree, int, bool *);
 static tree handle_no_split_stack_attribute (tree *, tree, tree, int, bool *);
+static tree handle_zero_call_used_regs_attribute (tree *, tree, tree, int,
+						  bool *);
 static tree handle_argspec_attribute (tree *, tree, tree, int, bool *);
 static tree handle_fnspec_attribute (tree *, tree, tree, int, bool *);
 static tree handle_warn_unused_attribute (tree *, tree, tree, int, bool *);
@@ -437,6 +439,8 @@  const struct attribute_spec c_common_attribute_table[] =
 			      ignore_attribute, NULL },
   { "no_split_stack",	      0, 0, true,  false, false, false,
 			      handle_no_split_stack_attribute, NULL },
+  { "zero_call_used_regs",    1, 1, true, false, false, false,
+			      handle_zero_call_used_regs_attribute, NULL },
   /* For internal use only (marking of function arguments).
      The name contains a space to prevent its usage in source code.  */
   { "arg spec",		      1, -1, true, false, false, false,
@@ -4959,6 +4963,33 @@  handle_no_split_stack_attribute (tree *node, tree name,
   return NULL_TREE;
 }
 
+/* Handle a "zero_call_used_regs" attribute; arguments as in
+   struct attribute_spec.handler.  */
+
+static tree
+handle_zero_call_used_regs_attribute (tree *node, tree name, tree args,
+				      int ARG_UNUSED (flags),
+				      bool *no_add_attrs)
+{
+  tree decl = *node;
+  tree id = TREE_VALUE (args);
+
+  if (TREE_CODE (decl) != FUNCTION_DECL)
+    {
+      error_at (DECL_SOURCE_LOCATION (decl),
+		"%qE attribute applies only to functions", name);
+      *no_add_attrs = true;
+    }
+
+  if (TREE_CODE (id) != STRING_CST)
+    {
+      error ("attribute %qE arguments not a string", name);
+      *no_add_attrs = true;
+    }
+
+  return NULL_TREE;
+}
+
 /* Handle a "returns_nonnull" attribute; arguments as in
    struct attribute_spec.handler.  */
 
diff --git a/gcc/common.opt b/gcc/common.opt
index 292c2de..4a13f32 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -228,6 +228,10 @@  unsigned int flag_sanitize_coverage
 Variable
 bool dump_base_name_prefixed = false
 
+; What subset of registers should be zeroed
+Variable
+unsigned int flag_zero_call_used_regs
+
 ###
 Driver
 
@@ -3111,6 +3115,10 @@  fzero-initialized-in-bss
 Common Report Var(flag_zero_initialized_in_bss) Init(1)
 Put zero initialized data in the bss section.
 
+fzero-call-used-regs=
+Common Report RejectNegative Joined
+Clear call-used registers upon function return.
+
 g
 Common Driver RejectNegative JoinedOrMissing
 Generate debug information in default format.
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index f684954..e66dcf0 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -3551,6 +3551,189 @@  ix86_function_value_regno_p (const unsigned int regno)
   return false;
 }
 
+/* Check whether the register REGNO should be zeroed on X86.
+   When ALL_SSE_ZEROED is true, all SSE registers have been zeroed
+   together, no need to zero it again.
+   Stack registers (st0-st7) and mm0-mm7 are aliased with each other.
+   very hard to be zeroed individually, don't zero individual st or
+   mm registgers.  */
+
+static bool
+zero_call_used_regno_p (const unsigned int regno,
+			bool all_sse_zeroed)
+{
+  return GENERAL_REGNO_P (regno)
+	 || (!all_sse_zeroed && SSE_REGNO_P (regno))
+	 || MASK_REGNO_P (regno);
+}
+
+/* Return the machine_mode that is used to zero register REGNO.  */
+
+static machine_mode
+zero_call_used_regno_mode (const unsigned int regno)
+{
+  /* NB: We only need to zero the lower 32 bits for integer registers
+     and the lower 128 bits for vector registers since destination are
+     zero-extended to the full register width.  */
+  if (GENERAL_REGNO_P (regno))
+    return SImode;
+  else if (SSE_REGNO_P (regno))
+    return V4SFmode;
+  else
+    return HImode;
+}
+
+/* Generate a rtx to zero all vector registers together if possible,
+   otherwise, return NULL.  */
+
+static rtx
+zero_all_vector_registers (HARD_REG_SET need_zeroed_hardregs)
+{
+  if (!TARGET_AVX)
+    return NULL;
+
+  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
+    if ((IN_RANGE (regno, FIRST_SSE_REG, LAST_SSE_REG)
+	 || (TARGET_64BIT
+	     && (REX_SSE_REGNO_P (regno)
+		 || (TARGET_AVX512F && EXT_REX_SSE_REGNO_P (regno)))))
+	&& !TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
+      return NULL;
+
+  return gen_avx_vzeroall ();
+}
+
+/* Generate insns to zero all st/mm registers together.
+   Return true when zeroing instructions are generated.
+   Assume the number of st registers that are zeroed is num_of_st,
+   we will emit the following sequence to zero them together:
+		  fldz;		\
+		  fldz;		\
+		  ...
+		  fldz;		\
+		  fstp %%st(0);	\
+		  fstp %%st(0);	\
+		  ...
+		  fstp %%st(0);
+   i.e., num_of_st fldz followed by num_of_st fstp to clear the stack
+   mark stack slots empty.  */
+
+static bool
+zero_all_st_mm_registers (HARD_REG_SET need_zeroed_hardregs)
+{
+  unsigned int num_of_st = 0;
+  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
+    if (STACK_REGNO_P (regno)
+	&& TEST_HARD_REG_BIT (need_zeroed_hardregs, regno)
+	/* When the corresponding mm register also need to be cleared too.  */
+	&& TEST_HARD_REG_BIT (need_zeroed_hardregs,
+			      (regno - FIRST_STACK_REG + FIRST_MMX_REG)))
+      num_of_st++;
+
+  if (num_of_st == 0)
+    return false;
+
+  rtx st_reg = gen_rtx_REG (XFmode, FIRST_STACK_REG);
+  for (unsigned int i = 0; i < num_of_st; i++)
+    emit_insn (gen_rtx_SET (st_reg, CONST0_RTX (XFmode)));
+
+  for (unsigned int i = 0; i < num_of_st; i++)
+    {
+      rtx insn;
+      insn = emit_insn (gen_rtx_SET (st_reg, st_reg));
+      add_reg_note (insn, REG_DEAD, st_reg);
+    }
+  return true;
+}
+
+/* TARGET_ZERO_CALL_USED_REGS.  */
+/* Generate a sequence of instructions that zero registers specified by
+   NEED_ZEROED_HARDREGS.  Return the ZEROED_HARDREGS that are actually
+   zeroed.  */
+static HARD_REG_SET
+ix86_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
+{
+  HARD_REG_SET zeroed_hardregs;
+  bool all_sse_zeroed = false;
+  bool st_zeroed = false;
+
+  /* first, let's see whether we can zero all vector registers together.  */
+  rtx zero_all_vec_insn = zero_all_vector_registers (need_zeroed_hardregs);
+  if (zero_all_vec_insn)
+    {
+      emit_insn (zero_all_vec_insn);
+      all_sse_zeroed = true;
+    }
+
+  /* then, let's see whether we can zero all st+mm registers togeter.  */
+  st_zeroed = zero_all_st_mm_registers (need_zeroed_hardregs);
+
+  /* Now, generate instructions to zero all the registers.  */
+
+  CLEAR_HARD_REG_SET (zeroed_hardregs);
+  if (st_zeroed)
+    SET_HARD_REG_BIT (zeroed_hardregs, FIRST_STACK_REG);
+
+  rtx zero_gpr = NULL_RTX;
+  rtx zero_vector = NULL_RTX;
+  rtx zero_mask = NULL_RTX;
+
+  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
+    {
+      if (!TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
+	continue;
+      if (!zero_call_used_regno_p (regno, all_sse_zeroed))
+	continue;
+
+      SET_HARD_REG_BIT (zeroed_hardregs, regno);
+
+      rtx reg, tmp;
+      machine_mode mode = zero_call_used_regno_mode (regno);
+
+      reg = gen_rtx_REG (mode, regno);
+
+      if (mode == SImode)
+	if (zero_gpr == NULL_RTX)
+	  {
+	    zero_gpr = reg;
+	    tmp = gen_rtx_SET (reg, const0_rtx);
+	    if (!TARGET_USE_MOV0 || optimize_insn_for_size_p ())
+	      {
+		rtx clob = gen_rtx_CLOBBER (VOIDmode,
+					    gen_rtx_REG (CCmode,
+							 FLAGS_REG));
+		tmp = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2,
+							     tmp,
+							     clob));
+	      }
+	    emit_insn (tmp);
+	  }
+	else
+	  emit_move_insn (reg, zero_gpr);
+      else if (mode == V4SFmode)
+	if (zero_vector == NULL_RTX)
+	  {
+	    zero_vector = reg;
+	    tmp = gen_rtx_SET (reg, const0_rtx);
+	    emit_insn (tmp);
+	  }
+	else
+	  emit_move_insn (reg, zero_vector);
+      else if (mode == HImode)
+	if (zero_mask == NULL_RTX)
+	  {
+	    zero_mask = reg;
+	    tmp = gen_rtx_SET (reg, const0_rtx);
+	    emit_insn (tmp);
+	  }
+	else
+	  emit_move_insn (reg, zero_mask);
+      else
+	gcc_unreachable ();
+    }
+  return zeroed_hardregs;
+}
+
 /* Define how to find the value returned by a function.
    VALTYPE is the data type of the value (as a tree).
    If the precise function being called is known, FUNC is its FUNCTION_DECL;
@@ -23229,6 +23412,9 @@  ix86_run_selftests (void)
 #undef TARGET_FUNCTION_VALUE_REGNO_P
 #define TARGET_FUNCTION_VALUE_REGNO_P ix86_function_value_regno_p
 
+#undef TARGET_ZERO_CALL_USED_REGS
+#define TARGET_ZERO_CALL_USED_REGS ix86_zero_call_used_regs
+
 #undef TARGET_PROMOTE_FUNCTION_MODE
 #define TARGET_PROMOTE_FUNCTION_MODE ix86_promote_function_mode
 
diff --git a/gcc/df-scan.c b/gcc/df-scan.c
index 93b060f..9e75c13 100644
--- a/gcc/df-scan.c
+++ b/gcc/df-scan.c
@@ -3614,6 +3614,14 @@  df_update_entry_block_defs (void)
 }
 
 
+/* Return true if REGNO is used by the epilogue.  */
+bool
+df_epilogue_uses_p (unsigned int regno)
+{
+  return (EPILOGUE_USES (regno)
+	  || TEST_HARD_REG_BIT (crtl->must_be_zero_on_return, regno));
+}
+
 /* Set the bit for regs that are considered being used at the exit. */
 
 static void
@@ -3661,7 +3669,7 @@  df_get_exit_block_use_set (bitmap exit_block_uses)
      epilogue as being live at the end of the function since they
      may be referenced by our caller.  */
   for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
-    if (global_regs[i] || EPILOGUE_USES (i))
+    if (global_regs[i] || df_epilogue_uses_p (i))
       bitmap_set_bit (exit_block_uses, i);
 
   if (targetm.have_epilogue () && epilogue_completed)
@@ -3802,7 +3810,6 @@  df_hard_reg_init (void)
   initialized = true;
 }
 
-
 /* Recompute the parts of scanning that are based on regs_ever_live
    because something changed in that array.  */
 
@@ -3862,7 +3869,6 @@  df_regs_ever_live_p (unsigned int regno)
   return regs_ever_live[regno];
 }
 
-
 /* Set regs_ever_live[REGNO] to VALUE.  If this cause regs_ever_live
    to change, schedule that change for the next update.  */
 
diff --git a/gcc/df.h b/gcc/df.h
index 8b6ca8c..0f098d7 100644
--- a/gcc/df.h
+++ b/gcc/df.h
@@ -1085,6 +1085,7 @@  extern void df_update_entry_exit_and_calls (void);
 extern bool df_hard_reg_used_p (unsigned int);
 extern unsigned int df_hard_reg_used_count (unsigned int);
 extern bool df_regs_ever_live_p (unsigned int);
+extern bool df_epilogue_uses_p (unsigned int);
 extern void df_set_regs_ever_live (unsigned int, bool);
 extern void df_compute_regs_ever_live (bool);
 extern void df_scan_verify (void);
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index c9f7299..3a884e1 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -3992,6 +3992,49 @@  performing a link with relocatable output (i.e.@: @code{ld -r}) on them.
 A declaration to which @code{weakref} is attached and that is associated
 with a named @code{target} must be @code{static}.
 
+@item zero_call_used_regs ("@var{choice}")
+@cindex @code{zero_call_used_regs} function attribute
+
+The @code{zero_call_used_regs} attribute causes the compiler to zero
+a subset of all call-used registers at function return according to
+@var{choice}.
+This is used to increase the program security by either mitigating
+Return-Oriented Programming (ROP) or preventing information leak
+through registers.
+
+A "call-used" register is a register that is clobbered by function calls,
+as a result, the caller has to save and restore it before or after a
+function call.  It is also called as "call-clobbered", "caller-saved", or
+"volatile".
+
+In order to satisfy users with different security needs and control the
+run-time overhead at the same time,  GCC provides a flexible way to choose
+the subset of the call-used registers to be zeroed.
+
+@samp{skip} doesn't zero any call-used registers.
+@samp{used} zeros call-used registers which are used in the function.  A "used"
+register is one whose content has been set or referenced in the function.
+@samp{all} zeros all call-used registers.
+
+In addition to the above three basic choices, the register set can be further
+limited by adding "-gpr" (i.e., general purpose register), "-arg" (i.e.,
+argument register), or both as following:
+
+@samp{used-gpr-arg} zeros used call-used general purpose registers that
+pass parameters.
+@samp{used-arg} zeros used call-used registers that pass parameters.
+@samp{all-gpr-arg} zeros all call-used general purpose registers that pass
+parameters.
+@samp{all-arg} zeros all call-used registers that pass parameters.
+@samp{used-gpr} zeros call-used general purpose registers which are used in the
+function.
+@samp{all-gpr} zeros all call-used general purpose registers.
+
+Among this list, "used-gpr-arg", "used-arg", "all-gpr-arg", and "all-arg" are
+mainly used for ROP mitigation.
+
+The default for the attribute is controlled by @option{-fzero-call-used-regs}.
+
 @end table
 
 @c This is the end of the target-independent attribute table
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index c049932..c6837d7 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -550,7 +550,7 @@  Objective-C and Objective-C++ Dialects}.
 -funit-at-a-time  -funroll-all-loops  -funroll-loops @gol
 -funsafe-math-optimizations  -funswitch-loops @gol
 -fipa-ra  -fvariable-expansion-in-unroller  -fvect-cost-model  -fvpt @gol
--fweb  -fwhole-program  -fwpa  -fuse-linker-plugin @gol
+-fweb  -fwhole-program  -fwpa  -fuse-linker-plugin -fzero-call-used-regs @gol
 --param @var{name}=@var{value}
 -O  -O0  -O1  -O2  -O3  -Os  -Ofast  -Og}
 
@@ -12550,6 +12550,46 @@  int foo (void)
 
 Not all targets support this option.
 
+@item -fzero-call-used-regs=@var{choice}
+@opindex fzero-call-used-regs
+Zero call-used registers at function return to increase the program
+security by either mitigating Return-Oriented Programming (ROP) or
+preventing information leak through registers.
+
+A "call-used" register is a register that is clobbered by function calls,
+as a result, the caller has to save and restore it before or after a
+function call.  It is also called as "call-clobbered", "caller-saved", or
+"volatile".
+
+In order to satisfy users with different security needs and control the
+run-time overhead at the same time,  GCC provides a flexible way to choose
+the subset of the call-used registers to be zeroed.
+
+@samp{skip}, which is the default, doesn't zero any call-used registers.
+@samp{used} zeros call-used registers which are used in the function.  A "used"
+register is one whose content has been set or referenced in the function.
+@samp{all} zeros all call-used registers.
+
+In addition to the above three basic choices, the register set can be further
+limited by adding "-gpr" (i.e., general purpose register), "-arg" (i.e.,
+argument register), or both as following:
+
+@samp{used-gpr-arg} zeros used call-used general purpose registers that
+pass parameters.
+@samp{used-arg} zeros used call-used registers that pass parameters.
+@samp{all-gpr-arg} zeros all call-used general purpose registers that pass
+parameters.
+@samp{all-arg} zeros all call-used registers that pass parameters.
+@samp{used-gpr} zeros call-used general purpose registers which are used in the
+function.
+@samp{all-gpr} zeros all call-used general purpose registers.
+
+Among this list, "used-gpr-arg", "used-arg", "all-gpr-arg", and "all-arg" are
+mainly used for ROP mitigation.
+
+You can control this behavior for a specific function by using the function
+attribute @code{zero_call_used_regs}.  @xref{Function Attributes}.
+
 @item --param @var{name}=@var{value}
 @opindex param
 In some places, GCC uses various constants to control the amount of
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 97437e8..3b75c46 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -12053,6 +12053,18 @@  argument list due to stack realignment.  Return @code{NULL} if no DRAP
 is needed.
 @end deftypefn
 
+@deftypefn {Target Hook} HARD_REG_SET TARGET_ZERO_CALL_USED_REGS (HARD_REG_SET @var{selected_regs})
+This target hook emits instructions to zero subset of @var{selected_regs}
+that could conceivably contain values that are useful to an attacker.
+Return the set of registers that were actually cleared.
+
+The default implementation uses normal move instructions to zero
+all the registers in @var{selected_regs}.  Define this hook if the
+target has more efficient ways of zeroing certain registers,
+or if you believe that certain registers would never contain
+values that are useful to an attacker.
+@end deftypefn
+
 @deftypefn {Target Hook} bool TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS (void)
 When optimization is disabled, this hook indicates whether or not
 arguments should be allocated to stack slots.  Normally, GCC allocates
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 412e22c..a67dbea 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -8111,6 +8111,8 @@  and the associated definitions of those functions.
 
 @hook TARGET_GET_DRAP_RTX
 
+@hook TARGET_ZERO_CALL_USED_REGS
+
 @hook TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS
 
 @hook TARGET_CONST_ANCHOR
diff --git a/gcc/emit-rtl.h b/gcc/emit-rtl.h
index 92ad0dd6..d7bdb66 100644
--- a/gcc/emit-rtl.h
+++ b/gcc/emit-rtl.h
@@ -173,6 +173,9 @@  struct GTY(()) rtl_data {
         local stack.  */
   unsigned int stack_alignment_estimated;
 
+  /* How to zero call-used regsiters for this routine.  */
+  unsigned int zero_call_used_regs;
+
   /* How many NOP insns to place at each function entry by default.  */
   unsigned short patch_area_size;
 
@@ -310,6 +313,9 @@  struct GTY(()) rtl_data {
      sets them.  */
   HARD_REG_SET asm_clobbers;
 
+  /* All hard registers that need to be zeroed at the return of the routine.  */
+  HARD_REG_SET must_be_zero_on_return;
+
   /* The highest address seen during shorten_branches.  */
   int max_insn_address;
 };
diff --git a/gcc/flag-types.h b/gcc/flag-types.h
index 852ea76..0f7e503 100644
--- a/gcc/flag-types.h
+++ b/gcc/flag-types.h
@@ -285,6 +285,15 @@  enum sanitize_code {
 				  | SANITIZE_BOUNDS_STRICT
 };
 
+enum  zero_call_used_regs_code {
+  UNSET = 0,
+  SKIP = 1UL << 0,
+  ONLY_USED = 1UL << 1,
+  ONLY_GPR = 1UL << 2,
+  ONLY_ARG = 1UL << 3,
+  ALL = 1UL << 4
+};
+
 /* Settings of flag_incremental_link.  */
 enum incremental_link {
   INCREMENTAL_LINK_NONE,
diff --git a/gcc/function.c b/gcc/function.c
index c612959..56e9997 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -46,10 +46,12 @@  along with GCC; see the file COPYING3.  If not see
 #include "stringpool.h"
 #include "expmed.h"
 #include "optabs.h"
+#include "opts.h"
 #include "regs.h"
 #include "emit-rtl.h"
 #include "recog.h"
 #include "rtl-error.h"
+#include "hard-reg-set.h"
 #include "alias.h"
 #include "fold-const.h"
 #include "stor-layout.h"
@@ -5815,6 +5817,102 @@  make_prologue_seq (void)
   return seq;
 }
 
+/* Emit a sequence of insns to zero the call-used-registers before RET.  */
+
+static void
+gen_call_used_regs_seq (rtx_insn *ret)
+{
+  bool gpr_only = true;
+  bool used_only = true;
+  bool arg_only = true;
+
+  /* No need to zero call-used-regs in main ().  */
+  if (MAIN_NAME_P (DECL_NAME (current_function_decl)))
+    return;
+
+  /* No need to zero call-used-regs if __builtin_eh_return is called
+     since it isn't a normal function return.  */
+  if (crtl->calls_eh_return)
+    return;
+
+  /* If gpr_only is true, only zero call-used-registers that are
+     general-purpose registers; if used_only is true, only zero
+     call-used-registers that are used in the current function.  */
+
+  gpr_only = crtl->zero_call_used_regs & ONLY_GPR;
+  used_only = crtl->zero_call_used_regs & ONLY_USED;
+  arg_only = crtl->zero_call_used_regs & ONLY_ARG;
+
+  /* For each of the hard registers, check to see whether we should zero it if:
+     1. it is a call-used-registers;
+ and 2. it is not a fixed-registers;
+ and 3. it is not live at the return of the routine;
+ and 4. it is general registor if gpr_only is true;
+ and 5. it is used in the routine if used_only is true;
+ and 6. it is a register that passes parameter if arg_only is true;
+   */
+
+  /* First, prepare the data flow information.  */
+  basic_block bb = BLOCK_FOR_INSN (ret);
+  bitmap live_out;
+  live_out = BITMAP_ALLOC (NULL);
+  bitmap_copy (live_out, df_get_live_out (bb));
+  df_simulate_initialize_backwards (bb, live_out);
+  df_simulate_one_insn_backwards (bb, ret, live_out);
+
+  HARD_REG_SET need_zeroed_hardregs;
+  CLEAR_HARD_REG_SET (need_zeroed_hardregs);
+  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
+    {
+      if (!crtl->abi->clobbers_full_reg_p (regno))
+	continue;
+      if (fixed_regs[regno])
+	continue;
+      if (REGNO_REG_SET_P (live_out, regno))
+	continue;
+      if (gpr_only
+	  && !TEST_HARD_REG_BIT (reg_class_contents[GENERAL_REGS], regno))
+	continue;
+      if (used_only && !df_regs_ever_live_p (regno))
+	continue;
+      if (arg_only && !FUNCTION_ARG_REGNO_P (regno))
+	continue;
+
+      /* Now this is a register that we might want to zero.  */
+      SET_HARD_REG_BIT (need_zeroed_hardregs, regno);
+    }
+
+  BITMAP_FREE (live_out);
+
+  if (hard_reg_set_empty_p (need_zeroed_hardregs))
+    return;
+
+  /* Now we get a hard register set that need to be zeroed, pass it to
+     target to generate zeroing sequence.  */
+  HARD_REG_SET zeroed_hardregs;
+  start_sequence ();
+  zeroed_hardregs = targetm.calls.zero_call_used_regs (need_zeroed_hardregs);
+  rtx_insn *seq = get_insns ();
+  end_sequence ();
+  if (seq)
+    {
+      /* Emit the memory blockage and register clobber asm volatile before
+	 the whole sequence.  */
+      start_sequence ();
+      expand_asm_reg_clobber_mem_blockage (zeroed_hardregs);
+      rtx_insn *seq_barrier = get_insns ();
+      end_sequence ();
+
+      emit_insn_before (seq_barrier, ret);
+      emit_insn_before (seq, ret);
+
+      /* Update the data flow information.  */
+      crtl->must_be_zero_on_return |= zeroed_hardregs;
+      df_set_bb_dirty (EXIT_BLOCK_PTR_FOR_FN (cfun));
+    }
+}
+
+
 /* Return a sequence to be used as the epilogue for the current function,
    or NULL.  */
 
@@ -6486,7 +6584,120 @@  make_pass_thread_prologue_and_epilogue (gcc::context *ctxt)
 {
   return new pass_thread_prologue_and_epilogue (ctxt);
 }
-

+
+static unsigned int
+rest_of_zero_call_used_regs (void)
+{
+  edge_iterator ei;
+  edge e;
+  rtx_insn *insn;
+
+  /* This pass needs data flow information.  */
+  df_analyze ();
+
+  /* Search all the "return"s in the routine, and insert instruction sequence to
+     zero the call used registers.  */
+  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
+    {
+      insn = BB_END (e->src);
+      if (JUMP_P (insn) && ANY_RETURN_P (JUMP_LABEL (insn)))
+	gen_call_used_regs_seq (insn);
+    }
+
+  return 0;
+}
+
+namespace {
+
+const pass_data pass_data_zero_call_used_regs =
+{
+  RTL_PASS, /* type */
+  "zero_call_used_regs", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_NONE, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_zero_call_used_regs: public rtl_opt_pass
+{
+public:
+  pass_zero_call_used_regs (gcc::context *ctxt)
+    : rtl_opt_pass (pass_data_zero_call_used_regs, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *);
+
+  virtual unsigned int execute (function *)
+    {
+      return rest_of_zero_call_used_regs ();
+    }
+
+}; // class pass_zero_call_used_regs
+
+bool
+pass_zero_call_used_regs::gate (function *fun)
+{
+  unsigned int zero_regs_type = UNSET;
+  unsigned int attr_zero_regs_type = UNSET;
+
+  tree attr_zero_regs
+	= lookup_attribute ("zero_call_used_regs",
+			    DECL_ATTRIBUTES (fun->decl));
+
+  /* Get the type of zero_call_used_regs from function attribute.  */
+  if (attr_zero_regs)
+    {
+      bool found = false;
+      unsigned int i;
+
+      /* The TREE_VALUE of an attribute is a TREE_LIST whose TREE_VALUE
+	 is the attribute argument's value.  */
+      attr_zero_regs = TREE_VALUE (attr_zero_regs);
+      gcc_assert (TREE_CODE (attr_zero_regs) == TREE_LIST);
+      attr_zero_regs = TREE_VALUE (attr_zero_regs);
+      gcc_assert (TREE_CODE (attr_zero_regs) == STRING_CST);
+
+      for (i = 0; zero_call_used_regs_opts[i].name != NULL; ++i)
+	if (strcmp (TREE_STRING_POINTER (attr_zero_regs),
+		     zero_call_used_regs_opts[i].name) == 0)
+	  {
+	    attr_zero_regs_type |= zero_call_used_regs_opts[i].flag;
+	    found = true;
+ 	    break;
+	  }
+
+      if (!found)
+	warning_at (DECL_SOURCE_LOCATION (fun->decl), 0,
+		    "unrecognized zero_call_used_regs attribute: %qs",
+		    TREE_STRING_POINTER (attr_zero_regs));
+    }
+
+  if (flag_zero_call_used_regs)
+    if (!attr_zero_regs)
+      zero_regs_type = flag_zero_call_used_regs;
+    else
+      zero_regs_type = attr_zero_regs_type;
+  else
+    zero_regs_type = attr_zero_regs_type;
+
+  crtl->zero_call_used_regs = zero_regs_type;
+
+  /* No need to zero call-used-regs when no user request is present.  */
+  return zero_regs_type > SKIP;
+}
+
+} // anon namespace
+
+rtl_opt_pass *
+make_pass_zero_call_used_regs (gcc::context *ctxt)
+{
+  return new pass_zero_call_used_regs (ctxt);
+}
 
 /* If CONSTRAINT is a matching constraint, then return its number.
    Otherwise, return -1.  */
diff --git a/gcc/optabs.c b/gcc/optabs.c
index 8ad7f4b..bd64af0 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -6484,6 +6484,48 @@  expand_memory_blockage (void)
     expand_asm_memory_blockage ();
 }
 
+/* Generate asm volatile("" : : : "memory") as a memory blockage, at the
+   same time clobbering the register set specified by REGS.  */
+
+void
+expand_asm_reg_clobber_mem_blockage (HARD_REG_SET regs)
+{
+  rtx asm_op, clob_mem;
+
+  unsigned int num_of_regs = 0;
+  for (unsigned int i = 0; i < FIRST_PSEUDO_REGISTER; i++)
+    if (TEST_HARD_REG_BIT (regs, i))
+      num_of_regs++;
+
+  asm_op = gen_rtx_ASM_OPERANDS (VOIDmode, "", "", 0,
+				 rtvec_alloc (0), rtvec_alloc (0),
+				 rtvec_alloc (0), UNKNOWN_LOCATION);
+  MEM_VOLATILE_P (asm_op) = 1;
+
+  rtvec v = rtvec_alloc (num_of_regs + 2);
+
+  clob_mem = gen_rtx_SCRATCH (VOIDmode);
+  clob_mem = gen_rtx_MEM (BLKmode, clob_mem);
+  clob_mem = gen_rtx_CLOBBER (VOIDmode, clob_mem);
+
+  RTVEC_ELT (v,0) = asm_op;
+  RTVEC_ELT (v,1) = clob_mem;
+
+  if (num_of_regs > 0)
+    {
+      unsigned int j = 2;
+      for (unsigned int i = 0; i < FIRST_PSEUDO_REGISTER; i++)
+	if (TEST_HARD_REG_BIT (regs, i))
+	  {
+	    RTVEC_ELT (v,j) = gen_rtx_CLOBBER (VOIDmode, regno_reg_rtx[i]);
+ 	    j++;
+	  }
+      gcc_assert (j == (num_of_regs + 2));
+    }
+
+  emit_insn (gen_rtx_PARALLEL (VOIDmode, v));
+}
+
 /* This routine will either emit the mem_thread_fence pattern or issue a 
    sync_synchronize to generate a fence for memory model MEMMODEL.  */
 
diff --git a/gcc/optabs.h b/gcc/optabs.h
index 0b14700..bfa10c8 100644
--- a/gcc/optabs.h
+++ b/gcc/optabs.h
@@ -345,6 +345,8 @@  rtx expand_atomic_store (rtx, rtx, enum memmodel, bool);
 rtx expand_atomic_fetch_op (rtx, rtx, rtx, enum rtx_code, enum memmodel, 
 			      bool);
 
+extern void expand_asm_reg_clobber_mem_blockage (HARD_REG_SET);
+
 extern bool insn_operand_matches (enum insn_code icode, unsigned int opno,
 				  rtx operand);
 extern bool valid_multiword_target_p (rtx);
diff --git a/gcc/opts.c b/gcc/opts.c
index 3bda59a..f95a1f0 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -1776,6 +1776,24 @@  const struct sanitizer_opts_s coverage_sanitizer_opts[] =
   { NULL, 0U, 0UL, false }
 };
 
+/* -fzero-call-used-regs= suboptions.  */
+const struct zero_call_used_regs_opts_s zero_call_used_regs_opts[] =
+{
+#define ZERO_CALL_USED_REGS_OPT(name, flags) \
+    { #name, flags }
+  ZERO_CALL_USED_REGS_OPT (skip, SKIP),
+  ZERO_CALL_USED_REGS_OPT (used-gpr-arg, (ONLY_USED | ONLY_GPR | ONLY_ARG)),
+  ZERO_CALL_USED_REGS_OPT (used-arg, (ONLY_USED | ONLY_ARG)),
+  ZERO_CALL_USED_REGS_OPT (all-gpr-arg, (ONLY_GPR | ONLY_ARG)),
+  ZERO_CALL_USED_REGS_OPT (all-arg, ONLY_ARG),
+  ZERO_CALL_USED_REGS_OPT (used-gpr, (ONLY_USED | ONLY_GPR)),
+  ZERO_CALL_USED_REGS_OPT (all-gpr, ONLY_GPR),
+  ZERO_CALL_USED_REGS_OPT (used, ONLY_USED),
+  ZERO_CALL_USED_REGS_OPT (all, ALL),
+#undef ZERO_CALL_USED_REGS_OPT
+  {NULL, 0U}
+};
+
 /* A struct for describing a run of chars within a string.  */
 
 class string_fragment
@@ -1970,6 +1988,30 @@  parse_no_sanitize_attribute (char *value)
   return flags;
 }
 
+/* Parse -fzero-call-used-regs suboptions from ARG, return the FLAGS.  */
+
+unsigned int
+parse_zero_call_used_regs_options (const char *arg)
+{
+  bool found = false;
+  unsigned int flags = 0;
+  unsigned int i;
+
+  /* Check to see if the string matches a sub-option name.  */
+  for (i = 0; zero_call_used_regs_opts[i].name != NULL; ++i)
+    if (strcmp (arg, zero_call_used_regs_opts[i].name) == 0)
+      {
+	flags |= zero_call_used_regs_opts[i].flag;
+ 	found = true;
+	break;
+      }
+
+  if (!found)
+    error ("unrecognized argument to %<-fzero-call-used-regs=%>: %qs", arg);
+
+  return flags;
+}
+
 /* Parse -falign-NAME format for a FLAG value.  Return individual
    parsed integer values into RESULT_VALUES array.  If REPORT_ERROR is
    set, print error message at LOC location.  */
@@ -2601,6 +2643,11 @@  common_handle_option (struct gcc_options *opts,
       /* Automatically sets -ftree-loop-vectorize and
 	 -ftree-slp-vectorize.  Nothing more to do here.  */
       break;
+    case OPT_fzero_call_used_regs_:
+      opts->x_flag_zero_call_used_regs
+	= parse_zero_call_used_regs_options (arg);
+      break;
+
     case OPT_fshow_column:
       dc->show_column = value;
       break;
diff --git a/gcc/opts.h b/gcc/opts.h
index 8f594b4..7d1e126 100644
--- a/gcc/opts.h
+++ b/gcc/opts.h
@@ -444,6 +444,12 @@  extern const struct sanitizer_opts_s
   bool can_recover;
 } sanitizer_opts[];
 
+extern const struct zero_call_used_regs_opts_s
+{
+  const char *const name;
+  unsigned int flag;
+} zero_call_used_regs_opts[];
+
 extern vec<const char *> help_option_arguments;
 
 extern void add_misspelling_candidates (auto_vec<char *> *candidates,
diff --git a/gcc/passes.def b/gcc/passes.def
index f865bdc..77d4676 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -492,6 +492,7 @@  along with GCC; see the file COPYING3.  If not see
       POP_INSERT_PASSES ()
       NEXT_PASS (pass_late_compilation);
       PUSH_INSERT_PASSES_WITHIN (pass_late_compilation)
+	  NEXT_PASS (pass_zero_call_used_regs);
 	  NEXT_PASS (pass_compute_alignments);
 	  NEXT_PASS (pass_variable_tracking);
 	  NEXT_PASS (pass_free_cfg);
diff --git a/gcc/recog.c b/gcc/recog.c
index ce83b7f..e231b5d 100644
--- a/gcc/recog.c
+++ b/gcc/recog.c
@@ -923,6 +923,22 @@  validate_simplify_insn (rtx_insn *insn)
   return ((num_changes_pending () > 0) && (apply_change_group () > 0));
 }
 

+
+/* Check whether INSN matches a specific alternative of an .md pattern.  */
+bool
+valid_insn_p (rtx_insn *insn)
+{
+  recog_memoized (insn);
+  if (INSN_CODE (insn) < 0)
+    return false;
+  extract_insn (insn);
+  /* We don't know whether the insn will be in code that is optimized
+     for size or speed, so consider all enabled alternatives.  */
+  if (!constrain_operands (1, get_enabled_alternatives (insn)))
+    return false;
+  return true;
+}
+
 /* Return 1 if OP is a valid general operand for machine mode MODE.
    This is either a register reference, a memory reference,
    or a constant.  In the case of a memory reference, the address
diff --git a/gcc/recog.h b/gcc/recog.h
index ae3675f..d87456c 100644
--- a/gcc/recog.h
+++ b/gcc/recog.h
@@ -113,6 +113,7 @@  extern void validate_replace_src_group (rtx, rtx, rtx_insn *);
 extern bool validate_simplify_insn (rtx_insn *insn);
 extern int num_changes_pending (void);
 extern bool reg_fits_class_p (const_rtx, reg_class_t, int, machine_mode);
+extern bool valid_insn_p (rtx_insn *);
 
 extern int offsettable_memref_p (rtx);
 extern int offsettable_nonstrict_memref_p (rtx);
diff --git a/gcc/resource.c b/gcc/resource.c
index 0a9d594..90cf091 100644
--- a/gcc/resource.c
+++ b/gcc/resource.c
@@ -1186,7 +1186,7 @@  init_resource_info (rtx_insn *epilogue_insn)
 			       &end_of_function_needs, true);
 
   for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
-    if (global_regs[i] || EPILOGUE_USES (i))
+    if (global_regs[i] || df_epilogue_uses_p (i))
       SET_HARD_REG_BIT (end_of_function_needs.regs, i);
 
   /* The registers required to be live at the end of the function are
diff --git a/gcc/target.def b/gcc/target.def
index ed2da15..20e7f81 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -5080,6 +5080,21 @@  argument list due to stack realignment.  Return @code{NULL} if no DRAP\n\
 is needed.",
  rtx, (void), NULL)
 
+/* Generate instruction sequence to zero call used registers.  */
+DEFHOOK
+(zero_call_used_regs,
+ "This target hook emits instructions to zero subset of @var{selected_regs}\n\
+that could conceivably contain values that are useful to an attacker.\n\
+Return the set of registers that were actually cleared.\n\
+\n\
+The default implementation uses normal move instructions to zero\n\
+all the registers in @var{selected_regs}.  Define this hook if the\n\
+target has more efficient ways of zeroing certain registers,\n\
+or if you believe that certain registers would never contain\n\
+values that are useful to an attacker.",
+ HARD_REG_SET, (HARD_REG_SET selected_regs),
+default_zero_call_used_regs)
+
 /* Return true if all function parameters should be spilled to the
    stack.  */
 DEFHOOK
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 5d94fce..88eef00 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -56,6 +56,9 @@  along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-alias.h"
 #include "gimple-expr.h"
 #include "memmodel.h"
+#include "backend.h"
+#include "emit-rtl.h"
+#include "df.h"
 #include "tm_p.h"
 #include "stringpool.h"
 #include "tree-vrp.h"
@@ -987,6 +990,35 @@  default_function_value_regno_p (const unsigned int regno ATTRIBUTE_UNUSED)
 #endif
 }
 
+/* The default hook for TARGET_ZERO_CALL_USED_REGS.  */
+
+HARD_REG_SET
+default_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
+{
+  gcc_assert (!hard_reg_set_empty_p (need_zeroed_hardregs));
+
+  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
+    if (TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
+      {
+	rtx_insn *last_insn = get_last_insn ();
+	machine_mode mode = GET_MODE (regno_reg_rtx[regno]);
+	rtx zero = CONST0_RTX (mode);
+	rtx_insn *insn = emit_move_insn (regno_reg_rtx[regno], zero);
+	if (!valid_insn_p (insn))
+	  {
+	    static bool issued_error;
+	    if (!issued_error)
+	      {
+		issued_error = true;
+		sorry ("%qs not supported on this target",
+			"fzero-call-used_regs");
+	      }
+	    delete_insns_since (last_insn);
+	  }
+      }
+  return need_zeroed_hardregs;
+}
+
 rtx
 default_internal_arg_pointer (void)
 {
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 44ab926..e0a925f 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -160,6 +160,7 @@  extern unsigned int default_function_arg_round_boundary (machine_mode,
 							 const_tree);
 extern bool hook_bool_const_rtx_commutative_p (const_rtx, int);
 extern rtx default_function_value (const_tree, const_tree, bool);
+extern HARD_REG_SET default_zero_call_used_regs (HARD_REG_SET);
 extern rtx default_libcall_value (machine_mode, const_rtx);
 extern bool default_function_value_regno_p (const unsigned int);
 extern rtx default_internal_arg_pointer (void);
diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c b/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
new file mode 100644
index 0000000..f44add9
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
@@ -0,0 +1,15 @@ 
+/* { dg-do run } */
+/* { dg-options "-O2 -fzero-call-used-regs=all" } */
+
+volatile int result = 0;
+int 
+__attribute__((noinline))
+foo (int x)
+{
+  return x;
+}
+int main()
+{
+  result = foo (2);
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c b/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
new file mode 100644
index 0000000..7c8350b
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
@@ -0,0 +1,16 @@ 
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+volatile int result = 0;
+int 
+__attribute__((noinline))
+__attribute__ ((zero_call_used_regs("all")))
+foo (int x)
+{
+  return x;
+}
+int main()
+{
+  result = foo (2);
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
new file mode 100644
index 0000000..9f61dc4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
@@ -0,0 +1,12 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=used" } */
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
+/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
new file mode 100644
index 0000000..09048e5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
@@ -0,0 +1,21 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
+
+extern int foo (int) __attribute__ ((zero_call_used_regs("all-gpr")));
+
+int
+foo (int x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r10d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r11d" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
new file mode 100644
index 0000000..4862688
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
@@ -0,0 +1,39 @@ 
+/* { dg-do run { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=used-gpr" } */
+
+struct S { int i; };
+__attribute__((const, noinline, noclone))
+struct S foo (int x)
+{
+  struct S s;
+  s.i = x;
+  return s;
+}
+
+int a[2048], b[2048], c[2048], d[2048];
+struct S e[2048];
+
+__attribute__((noinline, noclone)) void
+bar (void)
+{
+  int i;
+  for (i = 0; i < 1024; i++)
+    {
+      e[i] = foo (i);
+      a[i+2] = a[i] + a[i+1];
+      b[10] = b[10] + i;
+      c[i] = c[2047 - i];
+      d[i] = d[i + 1];
+    }
+}
+
+int
+main ()
+{
+  int i;
+  bar ();
+  for (i = 0; i < 1024; i++)
+    if (e[i].i != i)
+      __builtin_abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
new file mode 100644
index 0000000..500251b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
@@ -0,0 +1,39 @@ 
+/* { dg-do run { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
+
+struct S { int i; };
+__attribute__((const, noinline, noclone))
+struct S foo (int x)
+{
+  struct S s;
+  s.i = x;
+  return s;
+}
+
+int a[2048], b[2048], c[2048], d[2048];
+struct S e[2048];
+
+__attribute__((noinline, noclone)) void
+bar (void)
+{
+  int i;
+  for (i = 0; i < 1024; i++)
+    {
+      e[i] = foo (i);
+      a[i+2] = a[i] + a[i+1];
+      b[10] = b[10] + i;
+      c[i] = c[2047 - i];
+      d[i] = d[i + 1];
+    }
+}
+
+int
+main ()
+{
+  int i;
+  bar ();
+  for (i = 0; i < 1024; i++)
+    if (e[i].i != i)
+      __builtin_abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
new file mode 100644
index 0000000..8b058e3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
@@ -0,0 +1,21 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7" } */
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm0, %xmm0" } } */
+/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 7 { target { ia32 } } } } */
+/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 15 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
new file mode 100644
index 0000000..d4eaaf7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
@@ -0,0 +1,19 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx" } */
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-times "vzeroall" 1 } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
new file mode 100644
index 0000000..dd3bb90
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
@@ -0,0 +1,14 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
+
+extern void foo (void) __attribute__ ((zero_call_used_regs("used")));
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
+/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
new file mode 100644
index 0000000..e2274f6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
@@ -0,0 +1,14 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=all" } */
+
+extern void foo (void) __attribute__ ((zero_call_used_regs("skip")));
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
+/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
new file mode 100644
index 0000000..7f5d153
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
@@ -0,0 +1,13 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=used" } */
+
+int
+foo (int x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
new file mode 100644
index 0000000..fe13d2b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
@@ -0,0 +1,13 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=used -march=corei7" } */
+
+float
+foo (float z, float y, float x)
+{
+  return x + y;
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm1, %xmm2" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
new file mode 100644
index 0000000..205a532
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
@@ -0,0 +1,12 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=used -march=corei7" } */
+
+float
+foo (float z, float y, float x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm2, %xmm2" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
new file mode 100644
index 0000000..e046684
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
@@ -0,0 +1,19 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
new file mode 100644
index 0000000..4be8ff6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
@@ -0,0 +1,23 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7" } */
+
+float
+foo (float z, float y, float x)
+{
+  return x + y;
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm0, %xmm0" { target { ia32 } } } } */
+/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 7 { target { ia32 } } } } */
+/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm1, %xmm\[0-9\]+" 14 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
new file mode 100644
index 0000000..0eb34e0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
@@ -0,0 +1,14 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=skip -march=corei7" } */
+
+__attribute__ ((zero_call_used_regs("used")))
+float
+foo (float z, float y, float x)
+{
+  return x + y;
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm1, %xmm2" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
new file mode 100644
index 0000000..0258c70
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
@@ -0,0 +1,21 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx" } */
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler "vzeroall" } } */
+/* { dg-final { scan-assembler-times "fldz" 8 } } */
+/* { dg-final { scan-assembler-times "fstp" 8 } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
new file mode 100644
index 0000000..0625eb5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
@@ -0,0 +1,29 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx512f" } */
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler "vzeroall" } } */
+/* { dg-final { scan-assembler-times "fldz" 8 } } */
+/* { dg-final { scan-assembler-times "fstp" 8 } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "kxorw\[ \t\]*%k0, %k0, %k0" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "kmovw\[ \t\]*%k0, %k1" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "kmovw\[ \t\]*%k0, %k2" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "kmovw\[ \t\]*%k0, %k3" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "kmovw\[ \t\]*%k0, %k4" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "kmovw\[ \t\]*%k0, %k5" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "kmovw\[ \t\]*%k0, %k6" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "kmovw\[ \t\]*%k0, %k7" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-24.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-24.c
new file mode 100644
index 0000000..208633e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-24.c
@@ -0,0 +1,10 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=used-gpr-arg" } */
+
+int 
+foo (int x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-25.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-25.c
new file mode 100644
index 0000000..21e82c6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-25.c
@@ -0,0 +1,10 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=used-arg" } */
+
+int 
+foo (int x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-26.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-26.c
new file mode 100644
index 0000000..293d2fe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-26.c
@@ -0,0 +1,23 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=all-arg" } */
+
+int 
+foo (int x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" } } */
+/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm0, %xmm0" } } */
+/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm0, %xmm1" } } */
+/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm0, %xmm2" } } */
+/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm0, %xmm3" } } */
+/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm0, %xmm4" } } */
+/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm0, %xmm5" } } */
+/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm0, %xmm6" } } */
+/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm0, %xmm7" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-27.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-27.c
new file mode 100644
index 0000000..c34e6af
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-27.c
@@ -0,0 +1,15 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=all-gpr-arg" } */
+
+int 
+foo (int x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
new file mode 100644
index 0000000..de71223
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
@@ -0,0 +1,12 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
+/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
new file mode 100644
index 0000000..ccfa441
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
@@ -0,0 +1,14 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
+
+extern void foo (void) __attribute__ ((zero_call_used_regs("used-gpr")));
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
+/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
new file mode 100644
index 0000000..6b46ca3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
@@ -0,0 +1,20 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
+
+__attribute__ ((zero_call_used_regs("all-gpr")))
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
new file mode 100644
index 0000000..0680f38
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
@@ -0,0 +1,14 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
+
+extern void foo (void) __attribute__ ((zero_call_used_regs("skip")));
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
+/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
new file mode 100644
index 0000000..534defa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
@@ -0,0 +1,13 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=used-gpr" } */
+
+int
+foo (int x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
new file mode 100644
index 0000000..477bb19
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
@@ -0,0 +1,19 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
+
+int
+foo (int x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r10d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r11d" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
new file mode 100644
index 0000000..a305a60
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
@@ -0,0 +1,15 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
+
+extern int foo (int) __attribute__ ((zero_call_used_regs("used-gpr")));
+
+int
+foo (int x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 62e5b69..8afe8ee 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -592,6 +592,7 @@  extern rtl_opt_pass *make_pass_gcse2 (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_split_after_reload (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_thread_prologue_and_epilogue (gcc::context
 							     *ctxt);
+extern rtl_opt_pass *make_pass_zero_call_used_regs (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_stack_adjustments (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_sched_fusion (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_peephole2 (gcc::context *ctxt);