Patchwork [1/4] Add the -ftree-loop-if-convert flag.

Submitter Sebastian Pop
Date July 7, 2010, 10:52 p.m.
Message ID <1278543168-11395-1-git-send-email-sebpop@gmail.com>
Permalink /patch/58203/
State New

Comments

Sebastian Pop - July 7, 2010, 10:52 p.m.
* common.opt (ftree-loop-if-convert): New flag.
	* doc/invoke.texi (ftree-loop-if-convert): Documented.
	* tree-if-conv.c (gate_tree_if_conversion): Enable if-conversion
	when flag_tree_loop_if_convert is set.
---
 gcc/common.opt      |    4 ++++
 gcc/doc/invoke.texi |   14 ++++++++++----
 gcc/tree-if-conv.c  |    6 +++++-
 3 files changed, 19 insertions(+), 5 deletions(-)
Richard Guenther - July 8, 2010, 9:01 a.m.
On Wed, 7 Jul 2010, Sebastian Pop wrote:

> 	* common.opt (ftree-loop-if-convert): New flag.
> 	* doc/invoke.texi (ftree-loop-if-convert): Documented.
> 	* tree-if-conv.c (gate_tree_if_conversion): Enable if-conversion
> 	when flag_tree_loop_if_convert is set.
> ---
>  gcc/common.opt      |    4 ++++
>  gcc/doc/invoke.texi |   14 ++++++++++----
>  gcc/tree-if-conv.c  |    6 +++++-
>  3 files changed, 19 insertions(+), 5 deletions(-)
> 
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 6ca787a..111d7b7 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -653,6 +653,10 @@ fif-conversion2
>  Common Report Var(flag_if_conversion2) Optimization
>  Perform conversion of conditional jumps to conditional execution
>  
> +ftree-loop-if-convert
> +Common Report Var(flag_tree_loop_if_convert) Init(-1) Optimization
> +Convert conditional jumps in innermost loops to branchless equivalents
> +
>  ; -finhibit-size-directive inhibits output of .size for ELF.
>  ; This is used only for compiling crtstuff.c,
>  ; and it may be extended to other effects
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index d70f130..0847e01 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -342,7 +342,7 @@ Objective-C and Objective-C++ Dialects}.
>  -fearly-inlining -fipa-sra -fexpensive-optimizations -ffast-math @gol
>  -ffinite-math-only -ffloat-store -fexcess-precision=@var{style} @gol
>  -fforward-propagate -ffunction-sections @gol
> --fgcse -fgcse-after-reload -fgcse-las -fgcse-lm @gol
> +-fgcse -fgcse-after-reload -fgcse-las -fgcse-lm -fgraphite-identity @gol
>  -fgcse-sm -fif-conversion -fif-conversion2 -findirect-inlining @gol
>  -finline-functions -finline-functions-called-once -finline-limit=@var{n} @gol
>  -finline-small-functions -fipa-cp -fipa-cp-clone -fipa-matrix-reorg -fipa-pta @gol
> @@ -352,7 +352,7 @@ Objective-C and Objective-C++ Dialects}.
>  -fira-loop-pressure -fno-ira-share-save-slots @gol
>  -fno-ira-share-spill-slots -fira-verbose=@var{n} @gol
>  -fivopts -fkeep-inline-functions -fkeep-static-consts @gol
> --floop-block -floop-interchange -floop-strip-mine -fgraphite-identity @gol
> +-floop-block -floop-interchange -floop-strip-mine @gol
>  -floop-parallelize-all -flto -flto-compression-level -flto-report -fltrans @gol
>  -fltrans-output-list -fmerge-all-constants -fmerge-constants -fmodulo-sched @gol
>  -fmodulo-sched-allow-regmoves -fmove-loop-invariants -fmudflap @gol
> @@ -382,8 +382,8 @@ Objective-C and Objective-C++ Dialects}.
>  -fsplit-wide-types -fstack-protector -fstack-protector-all @gol
>  -fstrict-aliasing -fstrict-overflow -fthread-jumps -ftracer @gol
>  -ftree-builtin-call-dce -ftree-ccp -ftree-ch -ftree-copy-prop @gol
> --ftree-copyrename -ftree-dce @gol
> --ftree-dominator-opts -ftree-dse -ftree-forwprop -ftree-fre -ftree-loop-im @gol
> +-ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse @gol
> +-ftree-forwprop -ftree-fre -ftree-loop-if-convert -ftree-loop-im @gol
>  -ftree-phiprop -ftree-loop-distribution @gol
>  -ftree-loop-ivcanon -ftree-loop-linear -ftree-loop-optimize @gol
>  -ftree-parallelize-loops=@var{n} -ftree-pre -ftree-pta -ftree-reassoc @gol
> @@ -6883,6 +6883,12 @@ profitable to parallelize the loops.
>  Compare the results of several data dependence analyzers.  This option
>  is used for debugging the data dependence analyzers.
>  
> +@item -ftree-loop-if-convert
> +Attempt to transform conditional jumps in the innermost loops to
> +branch-less equivalents.  The intent is to remove control-flow from
> +the innermost loops in order to improve the ability of the
> +auto-vectorization pass to handle these loops.
> +

Please state that this is enabled by default if vectorization is enabled.

>  @item -ftree-loop-distribution
>  Perform loop distribution.  This flag can improve cache performance on
>  big loop bodies and allow further loop optimizations, like
> diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
> index 8d5d226..873cd89 100644
> --- a/gcc/tree-if-conv.c
> +++ b/gcc/tree-if-conv.c
> @@ -1242,7 +1242,11 @@ main_tree_if_conversion (void)
>  static bool
>  gate_tree_if_conversion (void)
>  {
> -  return flag_tree_vectorize != 0;
> +  if (flag_tree_vectorize
> +      && flag_tree_loop_if_convert < 0)
> +    flag_tree_loop_if_convert = 1;

Err, no.  This should be

  return ((flag_tree_vectorize && flag_tree_loop_if_convert != 0)
          || flag_tree_loop_if_convert == 1);

The gate should not set flag_tree_loop_if_convert.

But on second thought, please follow what -ftree-cselim does: use
Init(2) (ISTR -1 is now problematic for some reason), and in
process_options () set flag_tree_loop_if_convert to the setting of
flag_tree_vectorize if it is equal to AUTODETECT_VALUE (2).

The gate function can then simply return flag_tree_loop_if_convert.

Ok with that change.

Thanks,
Richard.
Sebastian Pop - July 8, 2010, 4:39 p.m.
On Thu, Jul 8, 2010 at 04:01, Richard Guenther <rguenther@suse.de> wrote:
> Ok with that change.
>

Committed r161963.

Patch

diff --git a/gcc/common.opt b/gcc/common.opt
index 6ca787a..111d7b7 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -653,6 +653,10 @@  fif-conversion2
 Common Report Var(flag_if_conversion2) Optimization
 Perform conversion of conditional jumps to conditional execution
 
+ftree-loop-if-convert
+Common Report Var(flag_tree_loop_if_convert) Init(-1) Optimization
+Convert conditional jumps in innermost loops to branchless equivalents
+
 ; -finhibit-size-directive inhibits output of .size for ELF.
 ; This is used only for compiling crtstuff.c,
 ; and it may be extended to other effects
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index d70f130..0847e01 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -342,7 +342,7 @@  Objective-C and Objective-C++ Dialects}.
 -fearly-inlining -fipa-sra -fexpensive-optimizations -ffast-math @gol
 -ffinite-math-only -ffloat-store -fexcess-precision=@var{style} @gol
 -fforward-propagate -ffunction-sections @gol
--fgcse -fgcse-after-reload -fgcse-las -fgcse-lm @gol
+-fgcse -fgcse-after-reload -fgcse-las -fgcse-lm -fgraphite-identity @gol
 -fgcse-sm -fif-conversion -fif-conversion2 -findirect-inlining @gol
 -finline-functions -finline-functions-called-once -finline-limit=@var{n} @gol
 -finline-small-functions -fipa-cp -fipa-cp-clone -fipa-matrix-reorg -fipa-pta @gol
@@ -352,7 +352,7 @@  Objective-C and Objective-C++ Dialects}.
 -fira-loop-pressure -fno-ira-share-save-slots @gol
 -fno-ira-share-spill-slots -fira-verbose=@var{n} @gol
 -fivopts -fkeep-inline-functions -fkeep-static-consts @gol
--floop-block -floop-interchange -floop-strip-mine -fgraphite-identity @gol
+-floop-block -floop-interchange -floop-strip-mine @gol
 -floop-parallelize-all -flto -flto-compression-level -flto-report -fltrans @gol
 -fltrans-output-list -fmerge-all-constants -fmerge-constants -fmodulo-sched @gol
 -fmodulo-sched-allow-regmoves -fmove-loop-invariants -fmudflap @gol
@@ -382,8 +382,8 @@  Objective-C and Objective-C++ Dialects}.
 -fsplit-wide-types -fstack-protector -fstack-protector-all @gol
 -fstrict-aliasing -fstrict-overflow -fthread-jumps -ftracer @gol
 -ftree-builtin-call-dce -ftree-ccp -ftree-ch -ftree-copy-prop @gol
--ftree-copyrename -ftree-dce @gol
--ftree-dominator-opts -ftree-dse -ftree-forwprop -ftree-fre -ftree-loop-im @gol
+-ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse @gol
+-ftree-forwprop -ftree-fre -ftree-loop-if-convert -ftree-loop-im @gol
 -ftree-phiprop -ftree-loop-distribution @gol
 -ftree-loop-ivcanon -ftree-loop-linear -ftree-loop-optimize @gol
 -ftree-parallelize-loops=@var{n} -ftree-pre -ftree-pta -ftree-reassoc @gol
@@ -6883,6 +6883,12 @@  profitable to parallelize the loops.
 Compare the results of several data dependence analyzers.  This option
 is used for debugging the data dependence analyzers.
 
+@item -ftree-loop-if-convert
+Attempt to transform conditional jumps in the innermost loops to
+branch-less equivalents.  The intent is to remove control-flow from
+the innermost loops in order to improve the ability of the
+auto-vectorization pass to handle these loops.
+
 @item -ftree-loop-distribution
 Perform loop distribution.  This flag can improve cache performance on
 big loop bodies and allow further loop optimizations, like
diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index 8d5d226..873cd89 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -1242,7 +1242,11 @@  main_tree_if_conversion (void)
 static bool
 gate_tree_if_conversion (void)
 {
-  return flag_tree_vectorize != 0;
+  if (flag_tree_vectorize
+      && flag_tree_loop_if_convert < 0)
+    flag_tree_loop_if_convert = 1;
+
+  return flag_tree_loop_if_convert > 0;
 }
 
 struct gimple_opt_pass pass_if_conversion =