middle-end/113680 - Optimize (x - y) CMP 0 as x CMP y

Message ID	20240304203830.3740968-1-kmatsui@gcc.gnu.org
State	New
Headers	show Return-Path: <gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org> DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org B0EA03858D28 From: Ken Matsui <kmatsui@gcc.gnu.org> To: gcc-patches@gcc.gnu.org Cc: Ken Matsui <kmatsui@gcc.gnu.org> Subject: [PATCH] middle-end/113680 - Optimize (x - y) CMP 0 as x CMP y Date: Mon, 4 Mar 2024 12:38:30 -0800 Message-ID: <20240304203830.3740968-1-kmatsui@gcc.gnu.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: list Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org
Series	middle-end/113680 - Optimize (x - y) CMP 0 as x CMP y \| expand middle-end/113680 - Optimize (x - y) CMP 0 as x CMP y

Ken Matsui March 4, 2024, 8:38 p.m. UTC

(x - y) CMP 0 is equivalent to x CMP y where x and y are signed
integers and CMP is <, <=, >, or >=.  Similarly, 0 CMP (x - y) is
equivalent to y CMP x.  As reported in PR middle-end/113680, this
equivalence does not hold for types other than signed integers.  When
it comes to conditions, the former was translated to a combination of
sub and test, whereas the latter was translated to a single cmp.
Thus, this optimization pass tries to optimize the former to the
latter.

When `-fwrapv` is enabled, GCC treats the overflow of signed integers
as defined behavior, specifically, wrapping around according to two's
complement arithmetic.  This has implications for optimizations that
rely on the standard behavior of signed integers, where overflow is
undefined.  Consider the example given:

	long long llmax = __LONG_LONG_MAX__;
	long long llmin = -llmax - 1;

Here, `llmax - llmin` effectively becomes `llmax - (-llmax - 1)`, which
simplifies to `2 * llmax + 1`.  Given that `llmax` is the maximum value
for a `long long`, this calculation overflows in a defined manner
(wrapping around), which under `-fwrapv` is a legal operation that
produces a negative value due to two's complement wraparound.
Therefore, `llmax - llmin < 0` is true.

However, the direct comparison `llmax < llmin` is false since `llmax`
is the maximum possible value and `llmin` is the minimum.  Hence,
optimizations that rely on the equivalence of `(x - y) CMP 0` to
`x CMP y` (and vice versa) cannot be safely applied when `-fwrapv` is
enabled.  This is why this optimization pass is disabled under
`-fwrapv`.

This optimization pass must run before the Jump Threading pass and the
VRP pass, as it may modify conditions. For example, in the VRP pass:

	(1)
	  int diff = x - y;
	  if (diff > 0)
	    foo();
	  if (diff < 0)
	    bar();

The second condition would be converted to diff != 0 in the VRP pass
because we know the postcondition of the first condition is diff <= 0,
and then diff != 0 is cheaper than diff < 0. If we apply this pass
after this VRP, we get:

	(2)
	  int diff = x - y;
	  if (x > y)
	    foo();
	  if (diff != 0)
	    bar();

This generates sub and test for the second condition and cmp for the
first condition. However, if we apply this pass beforehand, we simply
get:

	(3)
	  int diff = x - y;
	  if (x > y)
	    foo();
	  if (x < y)
	    bar();

In this code, diff will be eliminated as a dead code, and sub and test
will not be generated, which is more efficient.

For the Jump Threading pass, without this optimization pass, (1) and
(3) above are recognized as different, which prevents TCO.

	PR middle-end/113680

gcc/ChangeLog:

	* Makefile.in: Add tree-ssa-cmp.o to OBJS.
	* common.opt: Define ftree-cmp
	* doc/invoke.texi: Document ftree-cmp.
	* opts.cc (default_options_table): Handle OPT_ftree_cmp.
	* passes.def (pass_cmp): New optimization pass.
	* timevar.def (TV_TREE_CMP): New variable for timing.
	* tree-pass.h (make_pass_cmp): New declaration.
	* tree-ssa-cmp.cc: New file.

gcc/testsuite/ChangeLog:

	* gcc.dg/pr113680.c: New test.

Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>
---
 gcc/Makefile.in                 |   1 +
 gcc/common.opt                  |   4 +
 gcc/doc/invoke.texi             |  11 +-
 gcc/opts.cc                     |   1 +
 gcc/passes.def                  |   3 +
 gcc/testsuite/gcc.dg/pr113680.c |  47 ++++++
 gcc/timevar.def                 |   1 +
 gcc/tree-pass.h                 |   1 +
 gcc/tree-ssa-cmp.cc             | 262 ++++++++++++++++++++++++++++++++
 9 files changed, 330 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr113680.c
 create mode 100644 gcc/tree-ssa-cmp.cc

Richard Biener March 5, 2024, 8:38 a.m. UTC | #1

On Mon, Mar 4, 2024 at 9:40 PM Ken Matsui <kmatsui@gcc.gnu.org> wrote:
>
> (x - y) CMP 0 is equivalent to x CMP y where x and y are signed
> integers and CMP is <, <=, >, or >=.  Similarly, 0 CMP (x - y) is
> equivalent to y CMP x.  As reported in PR middle-end/113680, this
> equivalence does not hold for types other than signed integers.  When
> it comes to conditions, the former was translated to a combination of
> sub and test, whereas the latter was translated to a single cmp.
> Thus, this optimization pass tries to optimize the former to the
> latter.
>
> When `-fwrapv` is enabled, GCC treats the overflow of signed integers
> as defined behavior, specifically, wrapping around according to two's
> complement arithmetic.  This has implications for optimizations that
> rely on the standard behavior of signed integers, where overflow is
> undefined.  Consider the example given:
>
>         long long llmax = __LONG_LONG_MAX__;
>         long long llmin = -llmax - 1;
>
> Here, `llmax - llmin` effectively becomes `llmax - (-llmax - 1)`, which
> simplifies to `2 * llmax + 1`.  Given that `llmax` is the maximum value
> for a `long long`, this calculation overflows in a defined manner
> (wrapping around), which under `-fwrapv` is a legal operation that
> produces a negative value due to two's complement wraparound.
> Therefore, `llmax - llmin < 0` is true.
>
> However, the direct comparison `llmax < llmin` is false since `llmax`
> is the maximum possible value and `llmin` is the minimum.  Hence,
> optimizations that rely on the equivalence of `(x - y) CMP 0` to
> `x CMP y` (and vice versa) cannot be safely applied when `-fwrapv` is
> enabled.  This is why this optimization pass is disabled under
> `-fwrapv`.
>
> This optimization pass must run before the Jump Threading pass and the
> VRP pass, as it may modify conditions. For example, in the VRP pass:
>
>         (1)
>           int diff = x - y;
>           if (diff > 0)
>             foo();
>           if (diff < 0)
>             bar();
>
> The second condition would be converted to diff != 0 in the VRP pass
> because we know the postcondition of the first condition is diff <= 0,
> and then diff != 0 is cheaper than diff < 0. If we apply this pass
> after this VRP, we get:
>
>         (2)
>           int diff = x - y;
>           if (x > y)
>             foo();
>           if (diff != 0)
>             bar();
>
> This generates sub and test for the second condition and cmp for the
> first condition. However, if we apply this pass beforehand, we simply
> get:
>
>         (3)
>           int diff = x - y;
>           if (x > y)
>             foo();
>           if (x < y)
>             bar();
>
> In this code, diff will be eliminated as a dead code, and sub and test
> will not be generated, which is more efficient.
>
> For the Jump Threading pass, without this optimization pass, (1) and
> (3) above are recognized as different, which prevents TCO.
>
>         PR middle-end/113680

This shouldn't be done as a new optimization pass.  It fits either
the explicit code present in the forwprop pass or a new match.pd
pattern.  There's possible interaction with x - y value being used
elsewhere and thus exposing a CSE opportunity as well as
a comparison against zero being possibly implemented by
a flag setting subtraction instruction.

Our VN pass has some tricks to anticipate CSE opportunities
like this, but it's not done "properly" in the way of anticipating
both forms during PRE.

I'll note we have

 /* (A - B) != 0 ? (A - B) : (B - A)    same as (A - B) */
 (for cmp (ne ltgt)

and similar which might be confused by canonicalizing to A != B.
I'm also surprised we don't already have the pattern you add.

Richard.

> gcc/ChangeLog:
>
>         * Makefile.in: Add tree-ssa-cmp.o to OBJS.
>         * common.opt: Define ftree-cmp
>         * doc/invoke.texi: Document ftree-cmp.
>         * opts.cc (default_options_table): Handle OPT_ftree_cmp.
>         * passes.def (pass_cmp): New optimization pass.
>         * timevar.def (TV_TREE_CMP): New variable for timing.
>         * tree-pass.h (make_pass_cmp): New declaration.
>         * tree-ssa-cmp.cc: New file.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.dg/pr113680.c: New test.
>
> Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>
> ---
>  gcc/Makefile.in                 |   1 +
>  gcc/common.opt                  |   4 +
>  gcc/doc/invoke.texi             |  11 +-
>  gcc/opts.cc                     |   1 +
>  gcc/passes.def                  |   3 +
>  gcc/testsuite/gcc.dg/pr113680.c |  47 ++++++
>  gcc/timevar.def                 |   1 +
>  gcc/tree-pass.h                 |   1 +
>  gcc/tree-ssa-cmp.cc             | 262 ++++++++++++++++++++++++++++++++
>  9 files changed, 330 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/pr113680.c
>  create mode 100644 gcc/tree-ssa-cmp.cc
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index a74761b7ab3..935b80b6947 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1731,6 +1731,7 @@ OBJS = \
>         tree-ssa-address.o \
>         tree-ssa-alias.o \
>         tree-ssa-ccp.o \
> +       tree-ssa-cmp.o \
>         tree-ssa-coalesce.o \
>         tree-ssa-copy.o \
>         tree-ssa-dce.o \
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 51c4a17da83..7c853224458 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -3053,6 +3053,10 @@ ftree-ch
>  Common Var(flag_tree_ch) Optimization
>  Enable loop header copying on trees.
>
> +ftree-cmp
> +Common Var(flag_tree_cmp) Optimization
> +Enable SSA comparison optimization on trees.
> +
>  ftree-coalesce-inlined-vars
>  Common Ignore RejectNegative
>  Does nothing.  Preserved for backward compatibility.
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index bdf05be387d..04762d490a3 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -619,7 +619,7 @@ Objective-C and Objective-C++ Dialects}.
>  -fsplit-wide-types  -fsplit-wide-types-early  -fssa-backprop  -fssa-phiopt
>  -fstdarg-opt  -fstore-merging  -fstrict-aliasing -fipa-strict-aliasing
>  -fthread-jumps  -ftracer  -ftree-bit-ccp
> --ftree-builtin-call-dce  -ftree-ccp  -ftree-ch
> +-ftree-builtin-call-dce  -ftree-ccp  -ftree-ch  -ftree-cmp
>  -ftree-coalesce-vars  -ftree-copy-prop  -ftree-dce  -ftree-dominator-opts
>  -ftree-dse  -ftree-forwprop  -ftree-fre  -fcode-hoisting
>  -ftree-loop-if-convert  -ftree-loop-im
> @@ -12377,6 +12377,7 @@ compilation time.
>  -ftree-bit-ccp
>  -ftree-ccp
>  -ftree-ch
> +-ftree-cmp
>  -ftree-coalesce-vars
>  -ftree-copy-prop
>  -ftree-dce
> @@ -13707,6 +13708,14 @@ effectiveness of code motion optimizations.  It also saves one jump.  This flag
>  is enabled by default at @option{-O1} and higher.  It is not enabled
>  for @option{-Os}, since it usually increases code size.
>
> +@opindex ftree-cmp
> +@item -ftree-cmp
> +Perform comparison optimization on trees.  This decreases unnecessary
> +instructions for comparisons.  This flag is enabled by default at
> +@option{-O1} and higher.  This optimization is automatically disabled when
> +@option{-fwrapv} is specified to ensure correct behavior under signed integer
> +overflow wrapping.
> +
>  @opindex ftree-loop-optimize
>  @item -ftree-loop-optimize
>  Perform loop optimizations on trees.  This flag is enabled by default
> diff --git a/gcc/opts.cc b/gcc/opts.cc
> index 3333600e0ea..9ee64a48313 100644
> --- a/gcc/opts.cc
> +++ b/gcc/opts.cc
> @@ -589,6 +589,7 @@ static const struct default_options default_options_table[] =
>      { OPT_LEVELS_1_PLUS, OPT_ftree_builtin_call_dce, NULL, 1 },
>      { OPT_LEVELS_1_PLUS, OPT_ftree_ccp, NULL, 1 },
>      { OPT_LEVELS_1_PLUS, OPT_ftree_ch, NULL, 1 },
> +    { OPT_LEVELS_1_PLUS, OPT_ftree_cmp, NULL, 1 },
>      { OPT_LEVELS_1_PLUS, OPT_ftree_coalesce_vars, NULL, 1 },
>      { OPT_LEVELS_1_PLUS, OPT_ftree_copy_prop, NULL, 1 },
>      { OPT_LEVELS_1_PLUS, OPT_ftree_dce, NULL, 1 },
> diff --git a/gcc/passes.def b/gcc/passes.def
> index 1cbbd413097..e28a2d3c309 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -84,6 +84,9 @@ along with GCC; see the file COPYING3.  If not see
>           /* After CCP we rewrite no longer addressed locals into SSA
>              form if possible.  */
>           NEXT_PASS (pass_forwprop);
> +         /* Do cmp before early_thread_jumps and vrp since it may
> +            modify conditions.  */
> +         NEXT_PASS (pass_cmp);
>            NEXT_PASS (pass_early_thread_jumps, /*first=*/true);
>           NEXT_PASS (pass_sra_early);
>           /* pass_build_ealias is a dummy pass that ensures that we
> diff --git a/gcc/testsuite/gcc.dg/pr113680.c b/gcc/testsuite/gcc.dg/pr113680.c
> new file mode 100644
> index 00000000000..59cd865aa21
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr113680.c
> @@ -0,0 +1,47 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Os -fdump-tree-optimized" } */
> +
> +volatile int i = 0;
> +
> +void foo() { i = 1; }
> +void bar() { i = 2; }
> +
> +void f1(int x, int y)
> +{
> +  int diff = x - y;
> +  if (diff > 0)
> +    foo();
> +  if (diff < 0)
> +    bar();
> +}
> +
> +void f2(int x, int y)
> +{
> +  if ((x - y) > 0)
> +    foo();
> +  if ((x - y) < 0)
> +    bar();
> +}
> +
> +void f3(int x, int y)
> +{
> +  if (x > y)
> +    foo();
> +  if (x < y)
> +    bar();
> +}
> +
> +void f4(int x, int y)
> +{
> +  int diff = x - y;
> +  if (diff > 0)
> +    foo();
> +  if (x < y)
> +    bar();
> +}
> +
> +/* { dg-final { scan-tree-dump-not " - " "optimized" } } */
> +/* { dg-final { scan-assembler-times "cmp" 1 { target { i?86-*-* x86_64-*-* } } } } */
> +/* { dg-final { scan-assembler-times "jmp\[ \t\]+f1" 3 { target { i?86-*-* x86_64-*-* } } } } */
> +/* { dg-final { scan-assembler-not "sub" { target { i?86-*-* x86_64-*-* } } } } */
> +/* { dg-final { scan-assembler-not "test" { target { i?86-*-* x86_64-*-* } } } } */
> diff --git a/gcc/timevar.def b/gcc/timevar.def
> index 8e2168e0817..468266d0c56 100644
> --- a/gcc/timevar.def
> +++ b/gcc/timevar.def
> @@ -157,6 +157,7 @@ DEFTIMEVAR (TV_TREE_EH                   , "tree eh")
>  DEFTIMEVAR (TV_TREE_CFG                     , "tree CFG construction")
>  DEFTIMEVAR (TV_TREE_CLEANUP_CFG             , "tree CFG cleanup")
>  DEFTIMEVAR (TV_TREE_TAIL_MERGE       , "tree tail merge")
> +DEFTIMEVAR (TV_TREE_CMP                     , "tree cmp")
>  DEFTIMEVAR (TV_TREE_VRP              , "tree VRP")
>  DEFTIMEVAR (TV_TREE_VRP_THREADER     , "tree VRP threader")
>  DEFTIMEVAR (TV_TREE_EARLY_VRP        , "tree Early VRP")
> diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> index 29267589eeb..71789a279c0 100644
> --- a/gcc/tree-pass.h
> +++ b/gcc/tree-pass.h
> @@ -401,6 +401,7 @@ extern gimple_opt_pass *make_pass_ch (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_ch_vect (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_sccopy (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_ccp (gcc::context *ctxt);
> +extern gimple_opt_pass *make_pass_cmp (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_split_paths (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_build_ssa (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_build_alias (gcc::context *ctxt);
> diff --git a/gcc/tree-ssa-cmp.cc b/gcc/tree-ssa-cmp.cc
> new file mode 100644
> index 00000000000..41d4a58a5b5
> --- /dev/null
> +++ b/gcc/tree-ssa-cmp.cc
> @@ -0,0 +1,262 @@
> +/* Global, SSA-based optimizations using comparison identities.
> +   Copyright (C) 2024 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it
> +under the terms of the GNU General Public License as published by the
> +Free Software Foundation; either version 3, or (at your option) any
> +later version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT
> +ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>.  */
> +
> +/* (x - y) CMP 0 is equivalent to x CMP y where x and y are signed integers and
> +   CMP is <, <=, >, or >=.  Similarly, 0 CMP (x - y) is equivalent to y CMP x.
> +   As reported in PR middle-end/113680, this equivalence does not hold for
> +   types other than signed integers.  When it comes to conditions, the former
> +   was translated to a combination of sub and test, whereas the latter was
> +   translated to a single cmp.  Thus, this optimization pass tries to optimize
> +   the former to the latter.
> +
> +   When `-fwrapv` is enabled, GCC treats the overflow of signed integers as
> +   defined behavior, specifically, wrapping around according to two's
> +   complement arithmetic.  This has implications for optimizations that
> +   rely on the standard behavior of signed integers, where overflow is
> +   undefined.  Consider the example given:
> +
> +       long long llmax = __LONG_LONG_MAX__;
> +       long long llmin = -llmax - 1;
> +
> +   Here, `llmax - llmin` effectively becomes `llmax - (-llmax - 1)`, which
> +   simplifies to `2 * llmax + 1`.  Given that `llmax` is the maximum value for
> +   a `long long`, this calculation overflows in a defined manner
> +   (wrapping around), which under `-fwrapv` is a legal operation that produces
> +   a negative value due to two's complement wraparound.  Therefore,
> +   `llmax - llmin < 0` is true.
> +
> +   However, the direct comparison `llmax < llmin` is false since `llmax` is the
> +   maximum possible value and `llmin` is the minimum.  Hence, optimizations
> +   that rely on the equivalence of `(x - y) CMP 0` to `x CMP y` (and vice versa)
> +   cannot be safely applied when `-fwrapv` is enabled.  This is why this
> +   optimization pass is disabled under `-fwrapv`.
> +
> +   This optimization pass must run before the Jump Threading pass and
> +   the VRP pass, as it may modify conditions. For example, in the VRP pass:
> +
> +       (1)
> +         int diff = x - y;
> +         if (diff > 0)
> +           foo();
> +         if (diff < 0)
> +           bar();
> +
> +   The second condition would be converted to diff != 0 in the VRP pass because
> +   we know the postcondition of the first condition is diff <= 0, and then
> +   diff != 0 is cheaper than diff < 0. If we apply this pass after this VRP,
> +   we get:
> +
> +       (2)
> +         int diff = x - y;
> +         if (x > y)
> +           foo();
> +         if (diff != 0)
> +           bar();
> +
> +   This generates sub and test for the second condition and cmp for the first
> +   condition. However, if we apply this pass beforehand, we simply get:
> +
> +       (3)
> +         int diff = x - y;
> +         if (x > y)
> +           foo();
> +         if (x < y)
> +           bar();
> +
> +   In this code, diff will be eliminated as a dead code, and sub and test will
> +   not be generated, which is more efficient.
> +
> +   For the Jump Threading pass, without this optimization pass, (1) and (3)
> +   above are recognized as different, which prevents TCO.  */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "backend.h"
> +#include "tree.h"
> +#include "gimple.h"
> +#include "tree-pass.h"
> +#include "ssa.h"
> +#include "gimple-iterator.h"
> +#include "domwalk.h"
> +
> +/* True if VAR is a signed integer, false otherwise.  */
> +
> +static bool
> +is_signed_integer(const_tree var)
> +{
> +  return (TREE_CODE (TREE_TYPE (var)) == INTEGER_TYPE
> +         && !TYPE_UNSIGNED (TREE_TYPE (var)));
> +}
> +
> +/* If EXPR is an integer zero, return it.  If EXPR is SSA_NAME, and its
> +   defining statement is a subtraction of two signed integers, return the
> +   MINUS_EXPR.  Otherwise, return NULL_TREE.  */
> +
> +static const_tree
> +prop_integer_zero_or_minus_expr (const_tree expr)
> +{
> +  if (integer_zerop (expr))
> +    return expr;
> +
> +  if (TREE_CODE (expr) != SSA_NAME)
> +    return NULL_TREE;
> +
> +  gimple *defining_stmt = SSA_NAME_DEF_STMT (expr);
> +  if (!is_gimple_assign (defining_stmt)
> +      || gimple_assign_rhs_code (defining_stmt) != MINUS_EXPR)
> +    return NULL_TREE;
> +
> +  tree rhs1 = gimple_assign_rhs1 (defining_stmt);
> +  if (!is_signed_integer (rhs1))
> +    return NULL_TREE;
> +
> +  tree rhs2 = gimple_assign_rhs2 (defining_stmt);
> +  if (!is_signed_integer (rhs2))
> +    return NULL_TREE;
> +
> +  return build2 (MINUS_EXPR, TREE_TYPE (expr), rhs1, rhs2);
> +}
> +
> +/* Optimize:
> +
> +       1. (x - y) CMP 0
> +       2. 0 CMP (x - y)
> +
> +   to:
> +
> +       1. x CMP y
> +       2. y CMP x
> +
> +   where CMP is either <, <=, >, or >=.  */
> +
> +static void
> +optimize_signed_comparison (gcond *stmt)
> +{
> +  const enum tree_code code = gimple_cond_code (stmt);
> +  if (code < LT_EXPR || GE_EXPR < code)
> +    return;
> +
> +  const_tree lhs = prop_integer_zero_or_minus_expr (gimple_cond_lhs (stmt));
> +  if (lhs == NULL_TREE)
> +    return;
> +
> +  const_tree rhs = prop_integer_zero_or_minus_expr (gimple_cond_rhs (stmt));
> +  if (rhs == NULL_TREE)
> +    return;
> +
> +  if (TREE_CODE (lhs) == MINUS_EXPR && integer_zerop (rhs))
> +    {
> +      /* Case 1: (x - y) CMP 0  */
> +      tree x = TREE_OPERAND (lhs, 0);
> +      tree y = TREE_OPERAND (lhs, 1);
> +
> +      /* => x CMP y  */
> +      gimple_cond_set_lhs (stmt, x);
> +      gimple_cond_set_rhs (stmt, y);
> +      update_stmt (stmt);
> +    }
> +  else if (integer_zerop (lhs) && TREE_CODE (rhs) == MINUS_EXPR)
> +    {
> +      /* Case 2: 0 CMP (x - y)  */
> +      tree x = TREE_OPERAND (rhs, 0);
> +      tree y = TREE_OPERAND (rhs, 1);
> +
> +      /* => y CMP x  */
> +      gimple_cond_set_lhs (stmt, y);
> +      gimple_cond_set_rhs (stmt, x);
> +      update_stmt (stmt);
> +    }
> +}
> +
> +/* Find signed integer comparisons where the operands are MINUS_EXPR and
> +   0, and replace them with the equivalent comparison of the operands of
> +   the MINUS_EXPR without the subtraction.  */
> +
> +namespace {
> +
> +const pass_data pass_data_cmp =
> +{
> +  GIMPLE_PASS, /* type */
> +  "cmp", /* name */
> +  OPTGROUP_NONE, /* optinfo_flags */
> +  TV_TREE_CMP, /* tv_id */
> +  PROP_ssa, /* properties_required */
> +  0, /* properties_provided */
> +  0, /* properties_destroyed */
> +  0, /* todo_flags_start */
> +  TODO_remove_unused_locals, /* todo_flags_finish */
> +};
> +
> +class pass_cmp : public gimple_opt_pass
> +{
> +public:
> +  pass_cmp (gcc::context *ctxt)
> +    : gimple_opt_pass (pass_data_cmp, ctxt)
> +  {}
> +
> +  /* opt_pass methods: */
> +  bool gate (function *) final override
> +    {
> +      /* If -fwrapv is enabled, this pass cannot be safely applied as described
> +        in the comment at the top of this file.  */
> +      return flag_tree_cmp != 0 && flag_wrapv == 0;
> +    }
> +
> +  unsigned int execute (function *) final override;
> +
> +}; // class pass_cmp
> +
> +class cmp_dom_walker : public dom_walker
> +{
> +public:
> +  cmp_dom_walker () : dom_walker (CDI_DOMINATORS) {}
> +
> +  void after_dom_children (basic_block) final override;
> +};
> +
> +void
> +cmp_dom_walker::after_dom_children (basic_block bb)
> +{
> +  gimple_stmt_iterator gsi;
> +
> +  for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> +    {
> +      gimple *stmt = gsi_stmt (gsi);
> +      if (gimple_code (stmt) == GIMPLE_COND)
> +       optimize_signed_comparison (as_a <gcond *> (stmt));
> +    }
> +}
> +
> +
> +unsigned int
> +pass_cmp::execute (function *)
> +{
> +  calculate_dominance_info (CDI_DOMINATORS);
> +  cmp_dom_walker ().walk (ENTRY_BLOCK_PTR_FOR_FN (cfun));
> +  return 0;
> +}
> +
> +} // anon namespace
> +
> +gimple_opt_pass *
> +make_pass_cmp (gcc::context *ctxt)
> +{
> +  return new pass_cmp (ctxt);
> +}
> --
> 2.44.0
>

Ken Matsui March 5, 2024, 12:51 p.m. UTC | #2

On Tue, Mar 5, 2024 at 12:38 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Mon, Mar 4, 2024 at 9:40 PM Ken Matsui <kmatsui@gcc.gnu.org> wrote:
> >
> > (x - y) CMP 0 is equivalent to x CMP y where x and y are signed
> > integers and CMP is <, <=, >, or >=.  Similarly, 0 CMP (x - y) is
> > equivalent to y CMP x.  As reported in PR middle-end/113680, this
> > equivalence does not hold for types other than signed integers.  When
> > it comes to conditions, the former was translated to a combination of
> > sub and test, whereas the latter was translated to a single cmp.
> > Thus, this optimization pass tries to optimize the former to the
> > latter.
> >
> > When `-fwrapv` is enabled, GCC treats the overflow of signed integers
> > as defined behavior, specifically, wrapping around according to two's
> > complement arithmetic.  This has implications for optimizations that
> > rely on the standard behavior of signed integers, where overflow is
> > undefined.  Consider the example given:
> >
> >         long long llmax = __LONG_LONG_MAX__;
> >         long long llmin = -llmax - 1;
> >
> > Here, `llmax - llmin` effectively becomes `llmax - (-llmax - 1)`, which
> > simplifies to `2 * llmax + 1`.  Given that `llmax` is the maximum value
> > for a `long long`, this calculation overflows in a defined manner
> > (wrapping around), which under `-fwrapv` is a legal operation that
> > produces a negative value due to two's complement wraparound.
> > Therefore, `llmax - llmin < 0` is true.
> >
> > However, the direct comparison `llmax < llmin` is false since `llmax`
> > is the maximum possible value and `llmin` is the minimum.  Hence,
> > optimizations that rely on the equivalence of `(x - y) CMP 0` to
> > `x CMP y` (and vice versa) cannot be safely applied when `-fwrapv` is
> > enabled.  This is why this optimization pass is disabled under
> > `-fwrapv`.
> >
> > This optimization pass must run before the Jump Threading pass and the
> > VRP pass, as it may modify conditions. For example, in the VRP pass:
> >
> >         (1)
> >           int diff = x - y;
> >           if (diff > 0)
> >             foo();
> >           if (diff < 0)
> >             bar();
> >
> > The second condition would be converted to diff != 0 in the VRP pass
> > because we know the postcondition of the first condition is diff <= 0,
> > and then diff != 0 is cheaper than diff < 0. If we apply this pass
> > after this VRP, we get:
> >
> >         (2)
> >           int diff = x - y;
> >           if (x > y)
> >             foo();
> >           if (diff != 0)
> >             bar();
> >
> > This generates sub and test for the second condition and cmp for the
> > first condition. However, if we apply this pass beforehand, we simply
> > get:
> >
> >         (3)
> >           int diff = x - y;
> >           if (x > y)
> >             foo();
> >           if (x < y)
> >             bar();
> >
> > In this code, diff will be eliminated as a dead code, and sub and test
> > will not be generated, which is more efficient.
> >
> > For the Jump Threading pass, without this optimization pass, (1) and
> > (3) above are recognized as different, which prevents TCO.
> >
> >         PR middle-end/113680
>
> This shouldn't be done as a new optimization pass.  It fits either
> the explicit code present in the forwprop pass or a new match.pd
> pattern.  There's possible interaction with x - y value being used
> elsewhere and thus exposing a CSE opportunity as well as
> a comparison against zero being possibly implemented by
> a flag setting subtraction instruction.
>

Thank you so much for your review!  Although the forwprop pass runs
multiple times, we might not need to apply this optimization multiple
times.  Would it be acceptable to add such optimization?  More
generally, I would like to know how to determine where to put
optimization in the future.

FYI, I read this page: https://gcc.gnu.org/wiki/OptimizationCourse

> Our VN pass has some tricks to anticipate CSE opportunities
> like this, but it's not done "properly" in the way of anticipating
> both forms during PRE.
>
> I'll note we have
>
>  /* (A - B) != 0 ? (A - B) : (B - A)    same as (A - B) */
>  (for cmp (ne ltgt)
>
> and similar which might be confused by canonicalizing to A != B.

I will investigate and update my patch (after my final exam ends...)!

> I'm also surprised we don't already have the pattern you add.

Hmm, so am I.  I saw that this optimization sometimes messes up VRP
while sometimes helping it (as I described in my comment).  I also
need to research this.

>
> Richard.
>
> > gcc/ChangeLog:
> >
> >         * Makefile.in: Add tree-ssa-cmp.o to OBJS.
> >         * common.opt: Define ftree-cmp
> >         * doc/invoke.texi: Document ftree-cmp.
> >         * opts.cc (default_options_table): Handle OPT_ftree_cmp.
> >         * passes.def (pass_cmp): New optimization pass.
> >         * timevar.def (TV_TREE_CMP): New variable for timing.
> >         * tree-pass.h (make_pass_cmp): New declaration.
> >         * tree-ssa-cmp.cc: New file.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         * gcc.dg/pr113680.c: New test.
> >
> > Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>
> > ---
> >  gcc/Makefile.in                 |   1 +
> >  gcc/common.opt                  |   4 +
> >  gcc/doc/invoke.texi             |  11 +-
> >  gcc/opts.cc                     |   1 +
> >  gcc/passes.def                  |   3 +
> >  gcc/testsuite/gcc.dg/pr113680.c |  47 ++++++
> >  gcc/timevar.def                 |   1 +
> >  gcc/tree-pass.h                 |   1 +
> >  gcc/tree-ssa-cmp.cc             | 262 ++++++++++++++++++++++++++++++++
> >  9 files changed, 330 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/pr113680.c
> >  create mode 100644 gcc/tree-ssa-cmp.cc
> >
> > diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> > index a74761b7ab3..935b80b6947 100644
> > --- a/gcc/Makefile.in
> > +++ b/gcc/Makefile.in
> > @@ -1731,6 +1731,7 @@ OBJS = \
> >         tree-ssa-address.o \
> >         tree-ssa-alias.o \
> >         tree-ssa-ccp.o \
> > +       tree-ssa-cmp.o \
> >         tree-ssa-coalesce.o \
> >         tree-ssa-copy.o \
> >         tree-ssa-dce.o \
> > diff --git a/gcc/common.opt b/gcc/common.opt
> > index 51c4a17da83..7c853224458 100644
> > --- a/gcc/common.opt
> > +++ b/gcc/common.opt
> > @@ -3053,6 +3053,10 @@ ftree-ch
> >  Common Var(flag_tree_ch) Optimization
> >  Enable loop header copying on trees.
> >
> > +ftree-cmp
> > +Common Var(flag_tree_cmp) Optimization
> > +Enable SSA comparison optimization on trees.
> > +
> >  ftree-coalesce-inlined-vars
> >  Common Ignore RejectNegative
> >  Does nothing.  Preserved for backward compatibility.
> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > index bdf05be387d..04762d490a3 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -619,7 +619,7 @@ Objective-C and Objective-C++ Dialects}.
> >  -fsplit-wide-types  -fsplit-wide-types-early  -fssa-backprop  -fssa-phiopt
> >  -fstdarg-opt  -fstore-merging  -fstrict-aliasing -fipa-strict-aliasing
> >  -fthread-jumps  -ftracer  -ftree-bit-ccp
> > --ftree-builtin-call-dce  -ftree-ccp  -ftree-ch
> > +-ftree-builtin-call-dce  -ftree-ccp  -ftree-ch  -ftree-cmp
> >  -ftree-coalesce-vars  -ftree-copy-prop  -ftree-dce  -ftree-dominator-opts
> >  -ftree-dse  -ftree-forwprop  -ftree-fre  -fcode-hoisting
> >  -ftree-loop-if-convert  -ftree-loop-im
> > @@ -12377,6 +12377,7 @@ compilation time.
> >  -ftree-bit-ccp
> >  -ftree-ccp
> >  -ftree-ch
> > +-ftree-cmp
> >  -ftree-coalesce-vars
> >  -ftree-copy-prop
> >  -ftree-dce
> > @@ -13707,6 +13708,14 @@ effectiveness of code motion optimizations.  It also saves one jump.  This flag
> >  is enabled by default at @option{-O1} and higher.  It is not enabled
> >  for @option{-Os}, since it usually increases code size.
> >
> > +@opindex ftree-cmp
> > +@item -ftree-cmp
> > +Perform comparison optimization on trees.  This decreases unnecessary
> > +instructions for comparisons.  This flag is enabled by default at
> > +@option{-O1} and higher.  This optimization is automatically disabled when
> > +@option{-fwrapv} is specified to ensure correct behavior under signed integer
> > +overflow wrapping.
> > +
> >  @opindex ftree-loop-optimize
> >  @item -ftree-loop-optimize
> >  Perform loop optimizations on trees.  This flag is enabled by default
> > diff --git a/gcc/opts.cc b/gcc/opts.cc
> > index 3333600e0ea..9ee64a48313 100644
> > --- a/gcc/opts.cc
> > +++ b/gcc/opts.cc
> > @@ -589,6 +589,7 @@ static const struct default_options default_options_table[] =
> >      { OPT_LEVELS_1_PLUS, OPT_ftree_builtin_call_dce, NULL, 1 },
> >      { OPT_LEVELS_1_PLUS, OPT_ftree_ccp, NULL, 1 },
> >      { OPT_LEVELS_1_PLUS, OPT_ftree_ch, NULL, 1 },
> > +    { OPT_LEVELS_1_PLUS, OPT_ftree_cmp, NULL, 1 },
> >      { OPT_LEVELS_1_PLUS, OPT_ftree_coalesce_vars, NULL, 1 },
> >      { OPT_LEVELS_1_PLUS, OPT_ftree_copy_prop, NULL, 1 },
> >      { OPT_LEVELS_1_PLUS, OPT_ftree_dce, NULL, 1 },
> > diff --git a/gcc/passes.def b/gcc/passes.def
> > index 1cbbd413097..e28a2d3c309 100644
> > --- a/gcc/passes.def
> > +++ b/gcc/passes.def
> > @@ -84,6 +84,9 @@ along with GCC; see the file COPYING3.  If not see
> >           /* After CCP we rewrite no longer addressed locals into SSA
> >              form if possible.  */
> >           NEXT_PASS (pass_forwprop);
> > +         /* Do cmp before early_thread_jumps and vrp since it may
> > +            modify conditions.  */
> > +         NEXT_PASS (pass_cmp);
> >            NEXT_PASS (pass_early_thread_jumps, /*first=*/true);
> >           NEXT_PASS (pass_sra_early);
> >           /* pass_build_ealias is a dummy pass that ensures that we
> > diff --git a/gcc/testsuite/gcc.dg/pr113680.c b/gcc/testsuite/gcc.dg/pr113680.c
> > new file mode 100644
> > index 00000000000..59cd865aa21
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/pr113680.c
> > @@ -0,0 +1,47 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-Os -fdump-tree-optimized" } */
> > +
> > +volatile int i = 0;
> > +
> > +void foo() { i = 1; }
> > +void bar() { i = 2; }
> > +
> > +void f1(int x, int y)
> > +{
> > +  int diff = x - y;
> > +  if (diff > 0)
> > +    foo();
> > +  if (diff < 0)
> > +    bar();
> > +}
> > +
> > +void f2(int x, int y)
> > +{
> > +  if ((x - y) > 0)
> > +    foo();
> > +  if ((x - y) < 0)
> > +    bar();
> > +}
> > +
> > +void f3(int x, int y)
> > +{
> > +  if (x > y)
> > +    foo();
> > +  if (x < y)
> > +    bar();
> > +}
> > +
> > +void f4(int x, int y)
> > +{
> > +  int diff = x - y;
> > +  if (diff > 0)
> > +    foo();
> > +  if (x < y)
> > +    bar();
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-not " - " "optimized" } } */
> > +/* { dg-final { scan-assembler-times "cmp" 1 { target { i?86-*-* x86_64-*-* } } } } */
> > +/* { dg-final { scan-assembler-times "jmp\[ \t\]+f1" 3 { target { i?86-*-* x86_64-*-* } } } } */
> > +/* { dg-final { scan-assembler-not "sub" { target { i?86-*-* x86_64-*-* } } } } */
> > +/* { dg-final { scan-assembler-not "test" { target { i?86-*-* x86_64-*-* } } } } */
> > diff --git a/gcc/timevar.def b/gcc/timevar.def
> > index 8e2168e0817..468266d0c56 100644
> > --- a/gcc/timevar.def
> > +++ b/gcc/timevar.def
> > @@ -157,6 +157,7 @@ DEFTIMEVAR (TV_TREE_EH                   , "tree eh")
> >  DEFTIMEVAR (TV_TREE_CFG                     , "tree CFG construction")
> >  DEFTIMEVAR (TV_TREE_CLEANUP_CFG             , "tree CFG cleanup")
> >  DEFTIMEVAR (TV_TREE_TAIL_MERGE       , "tree tail merge")
> > +DEFTIMEVAR (TV_TREE_CMP                     , "tree cmp")
> >  DEFTIMEVAR (TV_TREE_VRP              , "tree VRP")
> >  DEFTIMEVAR (TV_TREE_VRP_THREADER     , "tree VRP threader")
> >  DEFTIMEVAR (TV_TREE_EARLY_VRP        , "tree Early VRP")
> > diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> > index 29267589eeb..71789a279c0 100644
> > --- a/gcc/tree-pass.h
> > +++ b/gcc/tree-pass.h
> > @@ -401,6 +401,7 @@ extern gimple_opt_pass *make_pass_ch (gcc::context *ctxt);
> >  extern gimple_opt_pass *make_pass_ch_vect (gcc::context *ctxt);
> >  extern gimple_opt_pass *make_pass_sccopy (gcc::context *ctxt);
> >  extern gimple_opt_pass *make_pass_ccp (gcc::context *ctxt);
> > +extern gimple_opt_pass *make_pass_cmp (gcc::context *ctxt);
> >  extern gimple_opt_pass *make_pass_split_paths (gcc::context *ctxt);
> >  extern gimple_opt_pass *make_pass_build_ssa (gcc::context *ctxt);
> >  extern gimple_opt_pass *make_pass_build_alias (gcc::context *ctxt);
> > diff --git a/gcc/tree-ssa-cmp.cc b/gcc/tree-ssa-cmp.cc
> > new file mode 100644
> > index 00000000000..41d4a58a5b5
> > --- /dev/null
> > +++ b/gcc/tree-ssa-cmp.cc
> > @@ -0,0 +1,262 @@
> > +/* Global, SSA-based optimizations using comparison identities.
> > +   Copyright (C) 2024 Free Software Foundation, Inc.
> > +
> > +This file is part of GCC.
> > +
> > +GCC is free software; you can redistribute it and/or modify it
> > +under the terms of the GNU General Public License as published by the
> > +Free Software Foundation; either version 3, or (at your option) any
> > +later version.
> > +
> > +GCC is distributed in the hope that it will be useful, but WITHOUT
> > +ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> > +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> > +for more details.
> > +
> > +You should have received a copy of the GNU General Public License
> > +along with GCC; see the file COPYING3.  If not see
> > +<http://www.gnu.org/licenses/>.  */
> > +
> > +/* (x - y) CMP 0 is equivalent to x CMP y where x and y are signed integers and
> > +   CMP is <, <=, >, or >=.  Similarly, 0 CMP (x - y) is equivalent to y CMP x.
> > +   As reported in PR middle-end/113680, this equivalence does not hold for
> > +   types other than signed integers.  When it comes to conditions, the former
> > +   was translated to a combination of sub and test, whereas the latter was
> > +   translated to a single cmp.  Thus, this optimization pass tries to optimize
> > +   the former to the latter.
> > +
> > +   When `-fwrapv` is enabled, GCC treats the overflow of signed integers as
> > +   defined behavior, specifically, wrapping around according to two's
> > +   complement arithmetic.  This has implications for optimizations that
> > +   rely on the standard behavior of signed integers, where overflow is
> > +   undefined.  Consider the example given:
> > +
> > +       long long llmax = __LONG_LONG_MAX__;
> > +       long long llmin = -llmax - 1;
> > +
> > +   Here, `llmax - llmin` effectively becomes `llmax - (-llmax - 1)`, which
> > +   simplifies to `2 * llmax + 1`.  Given that `llmax` is the maximum value for
> > +   a `long long`, this calculation overflows in a defined manner
> > +   (wrapping around), which under `-fwrapv` is a legal operation that produces
> > +   a negative value due to two's complement wraparound.  Therefore,
> > +   `llmax - llmin < 0` is true.
> > +
> > +   However, the direct comparison `llmax < llmin` is false since `llmax` is the
> > +   maximum possible value and `llmin` is the minimum.  Hence, optimizations
> > +   that rely on the equivalence of `(x - y) CMP 0` to `x CMP y` (and vice versa)
> > +   cannot be safely applied when `-fwrapv` is enabled.  This is why this
> > +   optimization pass is disabled under `-fwrapv`.
> > +
> > +   This optimization pass must run before the Jump Threading pass and
> > +   the VRP pass, as it may modify conditions. For example, in the VRP pass:
> > +
> > +       (1)
> > +         int diff = x - y;
> > +         if (diff > 0)
> > +           foo();
> > +         if (diff < 0)
> > +           bar();
> > +
> > +   The second condition would be converted to diff != 0 in the VRP pass because
> > +   we know the postcondition of the first condition is diff <= 0, and then
> > +   diff != 0 is cheaper than diff < 0. If we apply this pass after this VRP,
> > +   we get:
> > +
> > +       (2)
> > +         int diff = x - y;
> > +         if (x > y)
> > +           foo();
> > +         if (diff != 0)
> > +           bar();
> > +
> > +   This generates sub and test for the second condition and cmp for the first
> > +   condition. However, if we apply this pass beforehand, we simply get:
> > +
> > +       (3)
> > +         int diff = x - y;
> > +         if (x > y)
> > +           foo();
> > +         if (x < y)
> > +           bar();
> > +
> > +   In this code, diff will be eliminated as a dead code, and sub and test will
> > +   not be generated, which is more efficient.
> > +
> > +   For the Jump Threading pass, without this optimization pass, (1) and (3)
> > +   above are recognized as different, which prevents TCO.  */
> > +
> > +#include "config.h"
> > +#include "system.h"
> > +#include "coretypes.h"
> > +#include "backend.h"
> > +#include "tree.h"
> > +#include "gimple.h"
> > +#include "tree-pass.h"
> > +#include "ssa.h"
> > +#include "gimple-iterator.h"
> > +#include "domwalk.h"
> > +
> > +/* True if VAR is a signed integer, false otherwise.  */
> > +
> > +static bool
> > +is_signed_integer(const_tree var)
> > +{
> > +  return (TREE_CODE (TREE_TYPE (var)) == INTEGER_TYPE
> > +         && !TYPE_UNSIGNED (TREE_TYPE (var)));
> > +}
> > +
> > +/* If EXPR is an integer zero, return it.  If EXPR is SSA_NAME, and its
> > +   defining statement is a subtraction of two signed integers, return the
> > +   MINUS_EXPR.  Otherwise, return NULL_TREE.  */
> > +
> > +static const_tree
> > +prop_integer_zero_or_minus_expr (const_tree expr)
> > +{
> > +  if (integer_zerop (expr))
> > +    return expr;
> > +
> > +  if (TREE_CODE (expr) != SSA_NAME)
> > +    return NULL_TREE;
> > +
> > +  gimple *defining_stmt = SSA_NAME_DEF_STMT (expr);
> > +  if (!is_gimple_assign (defining_stmt)
> > +      || gimple_assign_rhs_code (defining_stmt) != MINUS_EXPR)
> > +    return NULL_TREE;
> > +
> > +  tree rhs1 = gimple_assign_rhs1 (defining_stmt);
> > +  if (!is_signed_integer (rhs1))
> > +    return NULL_TREE;
> > +
> > +  tree rhs2 = gimple_assign_rhs2 (defining_stmt);
> > +  if (!is_signed_integer (rhs2))
> > +    return NULL_TREE;
> > +
> > +  return build2 (MINUS_EXPR, TREE_TYPE (expr), rhs1, rhs2);
> > +}
> > +
> > +/* Optimize:
> > +
> > +       1. (x - y) CMP 0
> > +       2. 0 CMP (x - y)
> > +
> > +   to:
> > +
> > +       1. x CMP y
> > +       2. y CMP x
> > +
> > +   where CMP is either <, <=, >, or >=.  */
> > +
> > +static void
> > +optimize_signed_comparison (gcond *stmt)
> > +{
> > +  const enum tree_code code = gimple_cond_code (stmt);
> > +  if (code < LT_EXPR || GE_EXPR < code)
> > +    return;
> > +
> > +  const_tree lhs = prop_integer_zero_or_minus_expr (gimple_cond_lhs (stmt));
> > +  if (lhs == NULL_TREE)
> > +    return;
> > +
> > +  const_tree rhs = prop_integer_zero_or_minus_expr (gimple_cond_rhs (stmt));
> > +  if (rhs == NULL_TREE)
> > +    return;
> > +
> > +  if (TREE_CODE (lhs) == MINUS_EXPR && integer_zerop (rhs))
> > +    {
> > +      /* Case 1: (x - y) CMP 0  */
> > +      tree x = TREE_OPERAND (lhs, 0);
> > +      tree y = TREE_OPERAND (lhs, 1);
> > +
> > +      /* => x CMP y  */
> > +      gimple_cond_set_lhs (stmt, x);
> > +      gimple_cond_set_rhs (stmt, y);
> > +      update_stmt (stmt);
> > +    }
> > +  else if (integer_zerop (lhs) && TREE_CODE (rhs) == MINUS_EXPR)
> > +    {
> > +      /* Case 2: 0 CMP (x - y)  */
> > +      tree x = TREE_OPERAND (rhs, 0);
> > +      tree y = TREE_OPERAND (rhs, 1);
> > +
> > +      /* => y CMP x  */
> > +      gimple_cond_set_lhs (stmt, y);
> > +      gimple_cond_set_rhs (stmt, x);
> > +      update_stmt (stmt);
> > +    }
> > +}
> > +
> > +/* Find signed integer comparisons where the operands are MINUS_EXPR and
> > +   0, and replace them with the equivalent comparison of the operands of
> > +   the MINUS_EXPR without the subtraction.  */
> > +
> > +namespace {
> > +
> > +const pass_data pass_data_cmp =
> > +{
> > +  GIMPLE_PASS, /* type */
> > +  "cmp", /* name */
> > +  OPTGROUP_NONE, /* optinfo_flags */
> > +  TV_TREE_CMP, /* tv_id */
> > +  PROP_ssa, /* properties_required */
> > +  0, /* properties_provided */
> > +  0, /* properties_destroyed */
> > +  0, /* todo_flags_start */
> > +  TODO_remove_unused_locals, /* todo_flags_finish */
> > +};
> > +
> > +class pass_cmp : public gimple_opt_pass
> > +{
> > +public:
> > +  pass_cmp (gcc::context *ctxt)
> > +    : gimple_opt_pass (pass_data_cmp, ctxt)
> > +  {}
> > +
> > +  /* opt_pass methods: */
> > +  bool gate (function *) final override
> > +    {
> > +      /* If -fwrapv is enabled, this pass cannot be safely applied as described
> > +        in the comment at the top of this file.  */
> > +      return flag_tree_cmp != 0 && flag_wrapv == 0;
> > +    }
> > +
> > +  unsigned int execute (function *) final override;
> > +
> > +}; // class pass_cmp
> > +
> > +class cmp_dom_walker : public dom_walker
> > +{
> > +public:
> > +  cmp_dom_walker () : dom_walker (CDI_DOMINATORS) {}
> > +
> > +  void after_dom_children (basic_block) final override;
> > +};
> > +
> > +void
> > +cmp_dom_walker::after_dom_children (basic_block bb)
> > +{
> > +  gimple_stmt_iterator gsi;
> > +
> > +  for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > +    {
> > +      gimple *stmt = gsi_stmt (gsi);
> > +      if (gimple_code (stmt) == GIMPLE_COND)
> > +       optimize_signed_comparison (as_a <gcond *> (stmt));
> > +    }
> > +}
> > +
> > +
> > +unsigned int
> > +pass_cmp::execute (function *)
> > +{
> > +  calculate_dominance_info (CDI_DOMINATORS);
> > +  cmp_dom_walker ().walk (ENTRY_BLOCK_PTR_FOR_FN (cfun));
> > +  return 0;
> > +}
> > +
> > +} // anon namespace
> > +
> > +gimple_opt_pass *
> > +make_pass_cmp (gcc::context *ctxt)
> > +{
> > +  return new pass_cmp (ctxt);
> > +}
> > --
> > 2.44.0
> >

Richard Biener March 5, 2024, 3:58 p.m. UTC | #3

On Tue, Mar 5, 2024 at 1:51 PM Ken Matsui <kmatsui@cs.washington.edu> wrote:
>
> On Tue, Mar 5, 2024 at 12:38 AM Richard Biener
> <richard.guenther@gmail.com> wrote:
> >
> > On Mon, Mar 4, 2024 at 9:40 PM Ken Matsui <kmatsui@gcc.gnu.org> wrote:
> > >
> > > (x - y) CMP 0 is equivalent to x CMP y where x and y are signed
> > > integers and CMP is <, <=, >, or >=.  Similarly, 0 CMP (x - y) is
> > > equivalent to y CMP x.  As reported in PR middle-end/113680, this
> > > equivalence does not hold for types other than signed integers.  When
> > > it comes to conditions, the former was translated to a combination of
> > > sub and test, whereas the latter was translated to a single cmp.
> > > Thus, this optimization pass tries to optimize the former to the
> > > latter.
> > >
> > > When `-fwrapv` is enabled, GCC treats the overflow of signed integers
> > > as defined behavior, specifically, wrapping around according to two's
> > > complement arithmetic.  This has implications for optimizations that
> > > rely on the standard behavior of signed integers, where overflow is
> > > undefined.  Consider the example given:
> > >
> > >         long long llmax = __LONG_LONG_MAX__;
> > >         long long llmin = -llmax - 1;
> > >
> > > Here, `llmax - llmin` effectively becomes `llmax - (-llmax - 1)`, which
> > > simplifies to `2 * llmax + 1`.  Given that `llmax` is the maximum value
> > > for a `long long`, this calculation overflows in a defined manner
> > > (wrapping around), which under `-fwrapv` is a legal operation that
> > > produces a negative value due to two's complement wraparound.
> > > Therefore, `llmax - llmin < 0` is true.
> > >
> > > However, the direct comparison `llmax < llmin` is false since `llmax`
> > > is the maximum possible value and `llmin` is the minimum.  Hence,
> > > optimizations that rely on the equivalence of `(x - y) CMP 0` to
> > > `x CMP y` (and vice versa) cannot be safely applied when `-fwrapv` is
> > > enabled.  This is why this optimization pass is disabled under
> > > `-fwrapv`.
> > >
> > > This optimization pass must run before the Jump Threading pass and the
> > > VRP pass, as it may modify conditions. For example, in the VRP pass:
> > >
> > >         (1)
> > >           int diff = x - y;
> > >           if (diff > 0)
> > >             foo();
> > >           if (diff < 0)
> > >             bar();
> > >
> > > The second condition would be converted to diff != 0 in the VRP pass
> > > because we know the postcondition of the first condition is diff <= 0,
> > > and then diff != 0 is cheaper than diff < 0. If we apply this pass
> > > after this VRP, we get:
> > >
> > >         (2)
> > >           int diff = x - y;
> > >           if (x > y)
> > >             foo();
> > >           if (diff != 0)
> > >             bar();
> > >
> > > This generates sub and test for the second condition and cmp for the
> > > first condition. However, if we apply this pass beforehand, we simply
> > > get:
> > >
> > >         (3)
> > >           int diff = x - y;
> > >           if (x > y)
> > >             foo();
> > >           if (x < y)
> > >             bar();
> > >
> > > In this code, diff will be eliminated as a dead code, and sub and test
> > > will not be generated, which is more efficient.
> > >
> > > For the Jump Threading pass, without this optimization pass, (1) and
> > > (3) above are recognized as different, which prevents TCO.
> > >
> > >         PR middle-end/113680
> >
> > This shouldn't be done as a new optimization pass.  It fits either
> > the explicit code present in the forwprop pass or a new match.pd
> > pattern.  There's possible interaction with x - y value being used
> > elsewhere and thus exposing a CSE opportunity as well as
> > a comparison against zero being possibly implemented by
> > a flag setting subtraction instruction.
> >
>
> Thank you so much for your review!  Although the forwprop pass runs
> multiple times, we might not need to apply this optimization multiple
> times.  Would it be acceptable to add such optimization?  More
> generally, I would like to know how to determine where to put
> optimization in the future.

This kind of pattern matching expression simplification is best
addressed by patterns in match.pd though historically the forwprop
pass still catches cases not in match.pd in its
forward_propagate_into_comparison_1 (and callers).

> FYI, I read this page: https://gcc.gnu.org/wiki/OptimizationCourse
>
> > Our VN pass has some tricks to anticipate CSE opportunities
> > like this, but it's not done "properly" in the way of anticipating
> > both forms during PRE.
> >
> > I'll note we have
> >
> >  /* (A - B) != 0 ? (A - B) : (B - A)    same as (A - B) */
> >  (for cmp (ne ltgt)
> >
> > and similar which might be confused by canonicalizing to A != B.
>
> I will investigate and update my patch (after my final exam ends...)!
>
> > I'm also surprised we don't already have the pattern you add.
>
> Hmm, so am I.

It looks like we do it for equality compares which can also handle
types where overflow is undefined.  -fdump-tree-all-folding shows
in the 005t.original dump we simplify a - b != 0 via match.pd:6105:

/* Transform comparisons of the form X - Y CMP 0 to X CMP Y.
   ??? The transformation is valid for the other operators if overflow
   is undefined for the type, but performing it here badly interacts
   with the transformation in fold_cond_expr_with_comparison which
   attempts to synthetize ABS_EXPR.  */
(for cmp (eq ne)
 (for sub (minus pointer_diff)
  (simplify
   (cmp (sub@2 @0 @1) integer_zerop)
   (if (single_use (@2))
    (cmp @0 @1)))))

and the comment explains why we don't perform the transform
it the way you intend.  That comment might no longer apply
but I would guess there's testsuite fallout when you amend this
pattern.

Richard.

>  I saw that this optimization sometimes messes up VRP
> while sometimes helping it (as I described in my comment).  I also
> need to research this.
>
> >
> > Richard.
> >
> > > gcc/ChangeLog:
> > >
> > >         * Makefile.in: Add tree-ssa-cmp.o to OBJS.
> > >         * common.opt: Define ftree-cmp
> > >         * doc/invoke.texi: Document ftree-cmp.
> > >         * opts.cc (default_options_table): Handle OPT_ftree_cmp.
> > >         * passes.def (pass_cmp): New optimization pass.
> > >         * timevar.def (TV_TREE_CMP): New variable for timing.
> > >         * tree-pass.h (make_pass_cmp): New declaration.
> > >         * tree-ssa-cmp.cc: New file.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >         * gcc.dg/pr113680.c: New test.
> > >
> > > Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>
> > > ---
> > >  gcc/Makefile.in                 |   1 +
> > >  gcc/common.opt                  |   4 +
> > >  gcc/doc/invoke.texi             |  11 +-
> > >  gcc/opts.cc                     |   1 +
> > >  gcc/passes.def                  |   3 +
> > >  gcc/testsuite/gcc.dg/pr113680.c |  47 ++++++
> > >  gcc/timevar.def                 |   1 +
> > >  gcc/tree-pass.h                 |   1 +
> > >  gcc/tree-ssa-cmp.cc             | 262 ++++++++++++++++++++++++++++++++
> > >  9 files changed, 330 insertions(+), 1 deletion(-)
> > >  create mode 100644 gcc/testsuite/gcc.dg/pr113680.c
> > >  create mode 100644 gcc/tree-ssa-cmp.cc
> > >
> > > diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> > > index a74761b7ab3..935b80b6947 100644
> > > --- a/gcc/Makefile.in
> > > +++ b/gcc/Makefile.in
> > > @@ -1731,6 +1731,7 @@ OBJS = \
> > >         tree-ssa-address.o \
> > >         tree-ssa-alias.o \
> > >         tree-ssa-ccp.o \
> > > +       tree-ssa-cmp.o \
> > >         tree-ssa-coalesce.o \
> > >         tree-ssa-copy.o \
> > >         tree-ssa-dce.o \
> > > diff --git a/gcc/common.opt b/gcc/common.opt
> > > index 51c4a17da83..7c853224458 100644
> > > --- a/gcc/common.opt
> > > +++ b/gcc/common.opt
> > > @@ -3053,6 +3053,10 @@ ftree-ch
> > >  Common Var(flag_tree_ch) Optimization
> > >  Enable loop header copying on trees.
> > >
> > > +ftree-cmp
> > > +Common Var(flag_tree_cmp) Optimization
> > > +Enable SSA comparison optimization on trees.
> > > +
> > >  ftree-coalesce-inlined-vars
> > >  Common Ignore RejectNegative
> > >  Does nothing.  Preserved for backward compatibility.
> > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > > index bdf05be387d..04762d490a3 100644
> > > --- a/gcc/doc/invoke.texi
> > > +++ b/gcc/doc/invoke.texi
> > > @@ -619,7 +619,7 @@ Objective-C and Objective-C++ Dialects}.
> > >  -fsplit-wide-types  -fsplit-wide-types-early  -fssa-backprop  -fssa-phiopt
> > >  -fstdarg-opt  -fstore-merging  -fstrict-aliasing -fipa-strict-aliasing
> > >  -fthread-jumps  -ftracer  -ftree-bit-ccp
> > > --ftree-builtin-call-dce  -ftree-ccp  -ftree-ch
> > > +-ftree-builtin-call-dce  -ftree-ccp  -ftree-ch  -ftree-cmp
> > >  -ftree-coalesce-vars  -ftree-copy-prop  -ftree-dce  -ftree-dominator-opts
> > >  -ftree-dse  -ftree-forwprop  -ftree-fre  -fcode-hoisting
> > >  -ftree-loop-if-convert  -ftree-loop-im
> > > @@ -12377,6 +12377,7 @@ compilation time.
> > >  -ftree-bit-ccp
> > >  -ftree-ccp
> > >  -ftree-ch
> > > +-ftree-cmp
> > >  -ftree-coalesce-vars
> > >  -ftree-copy-prop
> > >  -ftree-dce
> > > @@ -13707,6 +13708,14 @@ effectiveness of code motion optimizations.  It also saves one jump.  This flag
> > >  is enabled by default at @option{-O1} and higher.  It is not enabled
> > >  for @option{-Os}, since it usually increases code size.
> > >
> > > +@opindex ftree-cmp
> > > +@item -ftree-cmp
> > > +Perform comparison optimization on trees.  This decreases unnecessary
> > > +instructions for comparisons.  This flag is enabled by default at
> > > +@option{-O1} and higher.  This optimization is automatically disabled when
> > > +@option{-fwrapv} is specified to ensure correct behavior under signed integer
> > > +overflow wrapping.
> > > +
> > >  @opindex ftree-loop-optimize
> > >  @item -ftree-loop-optimize
> > >  Perform loop optimizations on trees.  This flag is enabled by default
> > > diff --git a/gcc/opts.cc b/gcc/opts.cc
> > > index 3333600e0ea..9ee64a48313 100644
> > > --- a/gcc/opts.cc
> > > +++ b/gcc/opts.cc
> > > @@ -589,6 +589,7 @@ static const struct default_options default_options_table[] =
> > >      { OPT_LEVELS_1_PLUS, OPT_ftree_builtin_call_dce, NULL, 1 },
> > >      { OPT_LEVELS_1_PLUS, OPT_ftree_ccp, NULL, 1 },
> > >      { OPT_LEVELS_1_PLUS, OPT_ftree_ch, NULL, 1 },
> > > +    { OPT_LEVELS_1_PLUS, OPT_ftree_cmp, NULL, 1 },
> > >      { OPT_LEVELS_1_PLUS, OPT_ftree_coalesce_vars, NULL, 1 },
> > >      { OPT_LEVELS_1_PLUS, OPT_ftree_copy_prop, NULL, 1 },
> > >      { OPT_LEVELS_1_PLUS, OPT_ftree_dce, NULL, 1 },
> > > diff --git a/gcc/passes.def b/gcc/passes.def
> > > index 1cbbd413097..e28a2d3c309 100644
> > > --- a/gcc/passes.def
> > > +++ b/gcc/passes.def
> > > @@ -84,6 +84,9 @@ along with GCC; see the file COPYING3.  If not see
> > >           /* After CCP we rewrite no longer addressed locals into SSA
> > >              form if possible.  */
> > >           NEXT_PASS (pass_forwprop);
> > > +         /* Do cmp before early_thread_jumps and vrp since it may
> > > +            modify conditions.  */
> > > +         NEXT_PASS (pass_cmp);
> > >            NEXT_PASS (pass_early_thread_jumps, /*first=*/true);
> > >           NEXT_PASS (pass_sra_early);
> > >           /* pass_build_ealias is a dummy pass that ensures that we
> > > diff --git a/gcc/testsuite/gcc.dg/pr113680.c b/gcc/testsuite/gcc.dg/pr113680.c
> > > new file mode 100644
> > > index 00000000000..59cd865aa21
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/pr113680.c
> > > @@ -0,0 +1,47 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-Os -fdump-tree-optimized" } */
> > > +
> > > +volatile int i = 0;
> > > +
> > > +void foo() { i = 1; }
> > > +void bar() { i = 2; }
> > > +
> > > +void f1(int x, int y)
> > > +{
> > > +  int diff = x - y;
> > > +  if (diff > 0)
> > > +    foo();
> > > +  if (diff < 0)
> > > +    bar();
> > > +}
> > > +
> > > +void f2(int x, int y)
> > > +{
> > > +  if ((x - y) > 0)
> > > +    foo();
> > > +  if ((x - y) < 0)
> > > +    bar();
> > > +}
> > > +
> > > +void f3(int x, int y)
> > > +{
> > > +  if (x > y)
> > > +    foo();
> > > +  if (x < y)
> > > +    bar();
> > > +}
> > > +
> > > +void f4(int x, int y)
> > > +{
> > > +  int diff = x - y;
> > > +  if (diff > 0)
> > > +    foo();
> > > +  if (x < y)
> > > +    bar();
> > > +}
> > > +
> > > +/* { dg-final { scan-tree-dump-not " - " "optimized" } } */
> > > +/* { dg-final { scan-assembler-times "cmp" 1 { target { i?86-*-* x86_64-*-* } } } } */
> > > +/* { dg-final { scan-assembler-times "jmp\[ \t\]+f1" 3 { target { i?86-*-* x86_64-*-* } } } } */
> > > +/* { dg-final { scan-assembler-not "sub" { target { i?86-*-* x86_64-*-* } } } } */
> > > +/* { dg-final { scan-assembler-not "test" { target { i?86-*-* x86_64-*-* } } } } */
> > > diff --git a/gcc/timevar.def b/gcc/timevar.def
> > > index 8e2168e0817..468266d0c56 100644
> > > --- a/gcc/timevar.def
> > > +++ b/gcc/timevar.def
> > > @@ -157,6 +157,7 @@ DEFTIMEVAR (TV_TREE_EH                   , "tree eh")
> > >  DEFTIMEVAR (TV_TREE_CFG                     , "tree CFG construction")
> > >  DEFTIMEVAR (TV_TREE_CLEANUP_CFG             , "tree CFG cleanup")
> > >  DEFTIMEVAR (TV_TREE_TAIL_MERGE       , "tree tail merge")
> > > +DEFTIMEVAR (TV_TREE_CMP                     , "tree cmp")
> > >  DEFTIMEVAR (TV_TREE_VRP              , "tree VRP")
> > >  DEFTIMEVAR (TV_TREE_VRP_THREADER     , "tree VRP threader")
> > >  DEFTIMEVAR (TV_TREE_EARLY_VRP        , "tree Early VRP")
> > > diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> > > index 29267589eeb..71789a279c0 100644
> > > --- a/gcc/tree-pass.h
> > > +++ b/gcc/tree-pass.h
> > > @@ -401,6 +401,7 @@ extern gimple_opt_pass *make_pass_ch (gcc::context *ctxt);
> > >  extern gimple_opt_pass *make_pass_ch_vect (gcc::context *ctxt);
> > >  extern gimple_opt_pass *make_pass_sccopy (gcc::context *ctxt);
> > >  extern gimple_opt_pass *make_pass_ccp (gcc::context *ctxt);
> > > +extern gimple_opt_pass *make_pass_cmp (gcc::context *ctxt);
> > >  extern gimple_opt_pass *make_pass_split_paths (gcc::context *ctxt);
> > >  extern gimple_opt_pass *make_pass_build_ssa (gcc::context *ctxt);
> > >  extern gimple_opt_pass *make_pass_build_alias (gcc::context *ctxt);
> > > diff --git a/gcc/tree-ssa-cmp.cc b/gcc/tree-ssa-cmp.cc
> > > new file mode 100644
> > > index 00000000000..41d4a58a5b5
> > > --- /dev/null
> > > +++ b/gcc/tree-ssa-cmp.cc
> > > @@ -0,0 +1,262 @@
> > > +/* Global, SSA-based optimizations using comparison identities.
> > > +   Copyright (C) 2024 Free Software Foundation, Inc.
> > > +
> > > +This file is part of GCC.
> > > +
> > > +GCC is free software; you can redistribute it and/or modify it
> > > +under the terms of the GNU General Public License as published by the
> > > +Free Software Foundation; either version 3, or (at your option) any
> > > +later version.
> > > +
> > > +GCC is distributed in the hope that it will be useful, but WITHOUT
> > > +ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> > > +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> > > +for more details.
> > > +
> > > +You should have received a copy of the GNU General Public License
> > > +along with GCC; see the file COPYING3.  If not see
> > > +<http://www.gnu.org/licenses/>.  */
> > > +
> > > +/* (x - y) CMP 0 is equivalent to x CMP y where x and y are signed integers and
> > > +   CMP is <, <=, >, or >=.  Similarly, 0 CMP (x - y) is equivalent to y CMP x.
> > > +   As reported in PR middle-end/113680, this equivalence does not hold for
> > > +   types other than signed integers.  When it comes to conditions, the former
> > > +   was translated to a combination of sub and test, whereas the latter was
> > > +   translated to a single cmp.  Thus, this optimization pass tries to optimize
> > > +   the former to the latter.
> > > +
> > > +   When `-fwrapv` is enabled, GCC treats the overflow of signed integers as
> > > +   defined behavior, specifically, wrapping around according to two's
> > > +   complement arithmetic.  This has implications for optimizations that
> > > +   rely on the standard behavior of signed integers, where overflow is
> > > +   undefined.  Consider the example given:
> > > +
> > > +       long long llmax = __LONG_LONG_MAX__;
> > > +       long long llmin = -llmax - 1;
> > > +
> > > +   Here, `llmax - llmin` effectively becomes `llmax - (-llmax - 1)`, which
> > > +   simplifies to `2 * llmax + 1`.  Given that `llmax` is the maximum value for
> > > +   a `long long`, this calculation overflows in a defined manner
> > > +   (wrapping around), which under `-fwrapv` is a legal operation that produces
> > > +   a negative value due to two's complement wraparound.  Therefore,
> > > +   `llmax - llmin < 0` is true.
> > > +
> > > +   However, the direct comparison `llmax < llmin` is false since `llmax` is the
> > > +   maximum possible value and `llmin` is the minimum.  Hence, optimizations
> > > +   that rely on the equivalence of `(x - y) CMP 0` to `x CMP y` (and vice versa)
> > > +   cannot be safely applied when `-fwrapv` is enabled.  This is why this
> > > +   optimization pass is disabled under `-fwrapv`.
> > > +
> > > +   This optimization pass must run before the Jump Threading pass and
> > > +   the VRP pass, as it may modify conditions. For example, in the VRP pass:
> > > +
> > > +       (1)
> > > +         int diff = x - y;
> > > +         if (diff > 0)
> > > +           foo();
> > > +         if (diff < 0)
> > > +           bar();
> > > +
> > > +   The second condition would be converted to diff != 0 in the VRP pass because
> > > +   we know the postcondition of the first condition is diff <= 0, and then
> > > +   diff != 0 is cheaper than diff < 0. If we apply this pass after this VRP,
> > > +   we get:
> > > +
> > > +       (2)
> > > +         int diff = x - y;
> > > +         if (x > y)
> > > +           foo();
> > > +         if (diff != 0)
> > > +           bar();
> > > +
> > > +   This generates sub and test for the second condition and cmp for the first
> > > +   condition. However, if we apply this pass beforehand, we simply get:
> > > +
> > > +       (3)
> > > +         int diff = x - y;
> > > +         if (x > y)
> > > +           foo();
> > > +         if (x < y)
> > > +           bar();
> > > +
> > > +   In this code, diff will be eliminated as a dead code, and sub and test will
> > > +   not be generated, which is more efficient.
> > > +
> > > +   For the Jump Threading pass, without this optimization pass, (1) and (3)
> > > +   above are recognized as different, which prevents TCO.  */
> > > +
> > > +#include "config.h"
> > > +#include "system.h"
> > > +#include "coretypes.h"
> > > +#include "backend.h"
> > > +#include "tree.h"
> > > +#include "gimple.h"
> > > +#include "tree-pass.h"
> > > +#include "ssa.h"
> > > +#include "gimple-iterator.h"
> > > +#include "domwalk.h"
> > > +
> > > +/* True if VAR is a signed integer, false otherwise.  */
> > > +
> > > +static bool
> > > +is_signed_integer(const_tree var)
> > > +{
> > > +  return (TREE_CODE (TREE_TYPE (var)) == INTEGER_TYPE
> > > +         && !TYPE_UNSIGNED (TREE_TYPE (var)));
> > > +}
> > > +
> > > +/* If EXPR is an integer zero, return it.  If EXPR is SSA_NAME, and its
> > > +   defining statement is a subtraction of two signed integers, return the
> > > +   MINUS_EXPR.  Otherwise, return NULL_TREE.  */
> > > +
> > > +static const_tree
> > > +prop_integer_zero_or_minus_expr (const_tree expr)
> > > +{
> > > +  if (integer_zerop (expr))
> > > +    return expr;
> > > +
> > > +  if (TREE_CODE (expr) != SSA_NAME)
> > > +    return NULL_TREE;
> > > +
> > > +  gimple *defining_stmt = SSA_NAME_DEF_STMT (expr);
> > > +  if (!is_gimple_assign (defining_stmt)
> > > +      || gimple_assign_rhs_code (defining_stmt) != MINUS_EXPR)
> > > +    return NULL_TREE;
> > > +
> > > +  tree rhs1 = gimple_assign_rhs1 (defining_stmt);
> > > +  if (!is_signed_integer (rhs1))
> > > +    return NULL_TREE;
> > > +
> > > +  tree rhs2 = gimple_assign_rhs2 (defining_stmt);
> > > +  if (!is_signed_integer (rhs2))
> > > +    return NULL_TREE;
> > > +
> > > +  return build2 (MINUS_EXPR, TREE_TYPE (expr), rhs1, rhs2);
> > > +}
> > > +
> > > +/* Optimize:
> > > +
> > > +       1. (x - y) CMP 0
> > > +       2. 0 CMP (x - y)
> > > +
> > > +   to:
> > > +
> > > +       1. x CMP y
> > > +       2. y CMP x
> > > +
> > > +   where CMP is either <, <=, >, or >=.  */
> > > +
> > > +static void
> > > +optimize_signed_comparison (gcond *stmt)
> > > +{
> > > +  const enum tree_code code = gimple_cond_code (stmt);
> > > +  if (code < LT_EXPR || GE_EXPR < code)
> > > +    return;
> > > +
> > > +  const_tree lhs = prop_integer_zero_or_minus_expr (gimple_cond_lhs (stmt));
> > > +  if (lhs == NULL_TREE)
> > > +    return;
> > > +
> > > +  const_tree rhs = prop_integer_zero_or_minus_expr (gimple_cond_rhs (stmt));
> > > +  if (rhs == NULL_TREE)
> > > +    return;
> > > +
> > > +  if (TREE_CODE (lhs) == MINUS_EXPR && integer_zerop (rhs))
> > > +    {
> > > +      /* Case 1: (x - y) CMP 0  */
> > > +      tree x = TREE_OPERAND (lhs, 0);
> > > +      tree y = TREE_OPERAND (lhs, 1);
> > > +
> > > +      /* => x CMP y  */
> > > +      gimple_cond_set_lhs (stmt, x);
> > > +      gimple_cond_set_rhs (stmt, y);
> > > +      update_stmt (stmt);
> > > +    }
> > > +  else if (integer_zerop (lhs) && TREE_CODE (rhs) == MINUS_EXPR)
> > > +    {
> > > +      /* Case 2: 0 CMP (x - y)  */
> > > +      tree x = TREE_OPERAND (rhs, 0);
> > > +      tree y = TREE_OPERAND (rhs, 1);
> > > +
> > > +      /* => y CMP x  */
> > > +      gimple_cond_set_lhs (stmt, y);
> > > +      gimple_cond_set_rhs (stmt, x);
> > > +      update_stmt (stmt);
> > > +    }
> > > +}
> > > +
> > > +/* Find signed integer comparisons where the operands are MINUS_EXPR and
> > > +   0, and replace them with the equivalent comparison of the operands of
> > > +   the MINUS_EXPR without the subtraction.  */
> > > +
> > > +namespace {
> > > +
> > > +const pass_data pass_data_cmp =
> > > +{
> > > +  GIMPLE_PASS, /* type */
> > > +  "cmp", /* name */
> > > +  OPTGROUP_NONE, /* optinfo_flags */
> > > +  TV_TREE_CMP, /* tv_id */
> > > +  PROP_ssa, /* properties_required */
> > > +  0, /* properties_provided */
> > > +  0, /* properties_destroyed */
> > > +  0, /* todo_flags_start */
> > > +  TODO_remove_unused_locals, /* todo_flags_finish */
> > > +};
> > > +
> > > +class pass_cmp : public gimple_opt_pass
> > > +{
> > > +public:
> > > +  pass_cmp (gcc::context *ctxt)
> > > +    : gimple_opt_pass (pass_data_cmp, ctxt)
> > > +  {}
> > > +
> > > +  /* opt_pass methods: */
> > > +  bool gate (function *) final override
> > > +    {
> > > +      /* If -fwrapv is enabled, this pass cannot be safely applied as described
> > > +        in the comment at the top of this file.  */
> > > +      return flag_tree_cmp != 0 && flag_wrapv == 0;
> > > +    }
> > > +
> > > +  unsigned int execute (function *) final override;
> > > +
> > > +}; // class pass_cmp
> > > +
> > > +class cmp_dom_walker : public dom_walker
> > > +{
> > > +public:
> > > +  cmp_dom_walker () : dom_walker (CDI_DOMINATORS) {}
> > > +
> > > +  void after_dom_children (basic_block) final override;
> > > +};
> > > +
> > > +void
> > > +cmp_dom_walker::after_dom_children (basic_block bb)
> > > +{
> > > +  gimple_stmt_iterator gsi;
> > > +
> > > +  for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > > +    {
> > > +      gimple *stmt = gsi_stmt (gsi);
> > > +      if (gimple_code (stmt) == GIMPLE_COND)
> > > +       optimize_signed_comparison (as_a <gcond *> (stmt));
> > > +    }
> > > +}
> > > +
> > > +
> > > +unsigned int
> > > +pass_cmp::execute (function *)
> > > +{
> > > +  calculate_dominance_info (CDI_DOMINATORS);
> > > +  cmp_dom_walker ().walk (ENTRY_BLOCK_PTR_FOR_FN (cfun));
> > > +  return 0;
> > > +}
> > > +
> > > +} // anon namespace
> > > +
> > > +gimple_opt_pass *
> > > +make_pass_cmp (gcc::context *ctxt)
> > > +{
> > > +  return new pass_cmp (ctxt);
> > > +}
> > > --
> > > 2.44.0
> > >

Ken Matsui March 7, 2024, 7:29 p.m. UTC | #4

On Tue, Mar 5, 2024 at 7:58 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Tue, Mar 5, 2024 at 1:51 PM Ken Matsui <kmatsui@cs.washington.edu> wrote:
> >
> > On Tue, Mar 5, 2024 at 12:38 AM Richard Biener
> > <richard.guenther@gmail.com> wrote:
> > >
> > > On Mon, Mar 4, 2024 at 9:40 PM Ken Matsui <kmatsui@gcc.gnu.org> wrote:
> > > >
> > > > (x - y) CMP 0 is equivalent to x CMP y where x and y are signed
> > > > integers and CMP is <, <=, >, or >=.  Similarly, 0 CMP (x - y) is
> > > > equivalent to y CMP x.  As reported in PR middle-end/113680, this
> > > > equivalence does not hold for types other than signed integers.  When
> > > > it comes to conditions, the former was translated to a combination of
> > > > sub and test, whereas the latter was translated to a single cmp.
> > > > Thus, this optimization pass tries to optimize the former to the
> > > > latter.
> > > >
> > > > When `-fwrapv` is enabled, GCC treats the overflow of signed integers
> > > > as defined behavior, specifically, wrapping around according to two's
> > > > complement arithmetic.  This has implications for optimizations that
> > > > rely on the standard behavior of signed integers, where overflow is
> > > > undefined.  Consider the example given:
> > > >
> > > >         long long llmax = __LONG_LONG_MAX__;
> > > >         long long llmin = -llmax - 1;
> > > >
> > > > Here, `llmax - llmin` effectively becomes `llmax - (-llmax - 1)`, which
> > > > simplifies to `2 * llmax + 1`.  Given that `llmax` is the maximum value
> > > > for a `long long`, this calculation overflows in a defined manner
> > > > (wrapping around), which under `-fwrapv` is a legal operation that
> > > > produces a negative value due to two's complement wraparound.
> > > > Therefore, `llmax - llmin < 0` is true.
> > > >
> > > > However, the direct comparison `llmax < llmin` is false since `llmax`
> > > > is the maximum possible value and `llmin` is the minimum.  Hence,
> > > > optimizations that rely on the equivalence of `(x - y) CMP 0` to
> > > > `x CMP y` (and vice versa) cannot be safely applied when `-fwrapv` is
> > > > enabled.  This is why this optimization pass is disabled under
> > > > `-fwrapv`.
> > > >
> > > > This optimization pass must run before the Jump Threading pass and the
> > > > VRP pass, as it may modify conditions. For example, in the VRP pass:
> > > >
> > > >         (1)
> > > >           int diff = x - y;
> > > >           if (diff > 0)
> > > >             foo();
> > > >           if (diff < 0)
> > > >             bar();
> > > >
> > > > The second condition would be converted to diff != 0 in the VRP pass
> > > > because we know the postcondition of the first condition is diff <= 0,
> > > > and then diff != 0 is cheaper than diff < 0. If we apply this pass
> > > > after this VRP, we get:
> > > >
> > > >         (2)
> > > >           int diff = x - y;
> > > >           if (x > y)
> > > >             foo();
> > > >           if (diff != 0)
> > > >             bar();
> > > >
> > > > This generates sub and test for the second condition and cmp for the
> > > > first condition. However, if we apply this pass beforehand, we simply
> > > > get:
> > > >
> > > >         (3)
> > > >           int diff = x - y;
> > > >           if (x > y)
> > > >             foo();
> > > >           if (x < y)
> > > >             bar();
> > > >
> > > > In this code, diff will be eliminated as a dead code, and sub and test
> > > > will not be generated, which is more efficient.
> > > >
> > > > For the Jump Threading pass, without this optimization pass, (1) and
> > > > (3) above are recognized as different, which prevents TCO.
> > > >
> > > >         PR middle-end/113680
> > >
> > > This shouldn't be done as a new optimization pass.  It fits either
> > > the explicit code present in the forwprop pass or a new match.pd
> > > pattern.  There's possible interaction with x - y value being used
> > > elsewhere and thus exposing a CSE opportunity as well as
> > > a comparison against zero being possibly implemented by
> > > a flag setting subtraction instruction.
> > >
> >
> > Thank you so much for your review!  Although the forwprop pass runs
> > multiple times, we might not need to apply this optimization multiple
> > times.  Would it be acceptable to add such optimization?  More
> > generally, I would like to know how to determine where to put
> > optimization in the future.
>
> This kind of pattern matching expression simplification is best
> addressed by patterns in match.pd though historically the forwprop
> pass still catches cases not in match.pd in its
> forward_propagate_into_comparison_1 (and callers).
>

I see.  When would patterns in match.pd be applied?  Around forwprop
or somewhere else?  (Also, could you please tell me a document I can
learn about these if it exists?)  I ask because this optimization
should be applied before the Jump Threading and VRP passes.

> > FYI, I read this page: https://gcc.gnu.org/wiki/OptimizationCourse
> >
> > > Our VN pass has some tricks to anticipate CSE opportunities
> > > like this, but it's not done "properly" in the way of anticipating
> > > both forms during PRE.
> > >
> > > I'll note we have
> > >
> > >  /* (A - B) != 0 ? (A - B) : (B - A)    same as (A - B) */
> > >  (for cmp (ne ltgt)
> > >
> > > and similar which might be confused by canonicalizing to A != B.
> >
> > I will investigate and update my patch (after my final exam ends...)!
> >
> > > I'm also surprised we don't already have the pattern you add.
> >
> > Hmm, so am I.
>
> It looks like we do it for equality compares which can also handle
> types where overflow is undefined.  -fdump-tree-all-folding shows
> in the 005t.original dump we simplify a - b != 0 via match.pd:6105:
>
> /* Transform comparisons of the form X - Y CMP 0 to X CMP Y.
>    ??? The transformation is valid for the other operators if overflow
>    is undefined for the type, but performing it here badly interacts
>    with the transformation in fold_cond_expr_with_comparison which
>    attempts to synthetize ABS_EXPR.  */
> (for cmp (eq ne)
>  (for sub (minus pointer_diff)
>   (simplify
>    (cmp (sub@2 @0 @1) integer_zerop)
>    (if (single_use (@2))
>     (cmp @0 @1)))))
>
> and the comment explains why we don't perform the transform
> it the way you intend.  That comment might no longer apply
> but I would guess there's testsuite fallout when you amend this
> pattern.

Thank you so much!  I will check this out.

>
> Richard.
>
> >  I saw that this optimization sometimes messes up VRP
> > while sometimes helping it (as I described in my comment).  I also
> > need to research this.
> >
> > >
> > > Richard.
> > >
> > > > gcc/ChangeLog:
> > > >
> > > >         * Makefile.in: Add tree-ssa-cmp.o to OBJS.
> > > >         * common.opt: Define ftree-cmp
> > > >         * doc/invoke.texi: Document ftree-cmp.
> > > >         * opts.cc (default_options_table): Handle OPT_ftree_cmp.
> > > >         * passes.def (pass_cmp): New optimization pass.
> > > >         * timevar.def (TV_TREE_CMP): New variable for timing.
> > > >         * tree-pass.h (make_pass_cmp): New declaration.
> > > >         * tree-ssa-cmp.cc: New file.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > >         * gcc.dg/pr113680.c: New test.
> > > >
> > > > Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>
> > > > ---
> > > >  gcc/Makefile.in                 |   1 +
> > > >  gcc/common.opt                  |   4 +
> > > >  gcc/doc/invoke.texi             |  11 +-
> > > >  gcc/opts.cc                     |   1 +
> > > >  gcc/passes.def                  |   3 +
> > > >  gcc/testsuite/gcc.dg/pr113680.c |  47 ++++++
> > > >  gcc/timevar.def                 |   1 +
> > > >  gcc/tree-pass.h                 |   1 +
> > > >  gcc/tree-ssa-cmp.cc             | 262 ++++++++++++++++++++++++++++++++
> > > >  9 files changed, 330 insertions(+), 1 deletion(-)
> > > >  create mode 100644 gcc/testsuite/gcc.dg/pr113680.c
> > > >  create mode 100644 gcc/tree-ssa-cmp.cc
> > > >
> > > > diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> > > > index a74761b7ab3..935b80b6947 100644
> > > > --- a/gcc/Makefile.in
> > > > +++ b/gcc/Makefile.in
> > > > @@ -1731,6 +1731,7 @@ OBJS = \
> > > >         tree-ssa-address.o \
> > > >         tree-ssa-alias.o \
> > > >         tree-ssa-ccp.o \
> > > > +       tree-ssa-cmp.o \
> > > >         tree-ssa-coalesce.o \
> > > >         tree-ssa-copy.o \
> > > >         tree-ssa-dce.o \
> > > > diff --git a/gcc/common.opt b/gcc/common.opt
> > > > index 51c4a17da83..7c853224458 100644
> > > > --- a/gcc/common.opt
> > > > +++ b/gcc/common.opt
> > > > @@ -3053,6 +3053,10 @@ ftree-ch
> > > >  Common Var(flag_tree_ch) Optimization
> > > >  Enable loop header copying on trees.
> > > >
> > > > +ftree-cmp
> > > > +Common Var(flag_tree_cmp) Optimization
> > > > +Enable SSA comparison optimization on trees.
> > > > +
> > > >  ftree-coalesce-inlined-vars
> > > >  Common Ignore RejectNegative
> > > >  Does nothing.  Preserved for backward compatibility.
> > > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > > > index bdf05be387d..04762d490a3 100644
> > > > --- a/gcc/doc/invoke.texi
> > > > +++ b/gcc/doc/invoke.texi
> > > > @@ -619,7 +619,7 @@ Objective-C and Objective-C++ Dialects}.
> > > >  -fsplit-wide-types  -fsplit-wide-types-early  -fssa-backprop  -fssa-phiopt
> > > >  -fstdarg-opt  -fstore-merging  -fstrict-aliasing -fipa-strict-aliasing
> > > >  -fthread-jumps  -ftracer  -ftree-bit-ccp
> > > > --ftree-builtin-call-dce  -ftree-ccp  -ftree-ch
> > > > +-ftree-builtin-call-dce  -ftree-ccp  -ftree-ch  -ftree-cmp
> > > >  -ftree-coalesce-vars  -ftree-copy-prop  -ftree-dce  -ftree-dominator-opts
> > > >  -ftree-dse  -ftree-forwprop  -ftree-fre  -fcode-hoisting
> > > >  -ftree-loop-if-convert  -ftree-loop-im
> > > > @@ -12377,6 +12377,7 @@ compilation time.
> > > >  -ftree-bit-ccp
> > > >  -ftree-ccp
> > > >  -ftree-ch
> > > > +-ftree-cmp
> > > >  -ftree-coalesce-vars
> > > >  -ftree-copy-prop
> > > >  -ftree-dce
> > > > @@ -13707,6 +13708,14 @@ effectiveness of code motion optimizations.  It also saves one jump.  This flag
> > > >  is enabled by default at @option{-O1} and higher.  It is not enabled
> > > >  for @option{-Os}, since it usually increases code size.
> > > >
> > > > +@opindex ftree-cmp
> > > > +@item -ftree-cmp
> > > > +Perform comparison optimization on trees.  This decreases unnecessary
> > > > +instructions for comparisons.  This flag is enabled by default at
> > > > +@option{-O1} and higher.  This optimization is automatically disabled when
> > > > +@option{-fwrapv} is specified to ensure correct behavior under signed integer
> > > > +overflow wrapping.
> > > > +
> > > >  @opindex ftree-loop-optimize
> > > >  @item -ftree-loop-optimize
> > > >  Perform loop optimizations on trees.  This flag is enabled by default
> > > > diff --git a/gcc/opts.cc b/gcc/opts.cc
> > > > index 3333600e0ea..9ee64a48313 100644
> > > > --- a/gcc/opts.cc
> > > > +++ b/gcc/opts.cc
> > > > @@ -589,6 +589,7 @@ static const struct default_options default_options_table[] =
> > > >      { OPT_LEVELS_1_PLUS, OPT_ftree_builtin_call_dce, NULL, 1 },
> > > >      { OPT_LEVELS_1_PLUS, OPT_ftree_ccp, NULL, 1 },
> > > >      { OPT_LEVELS_1_PLUS, OPT_ftree_ch, NULL, 1 },
> > > > +    { OPT_LEVELS_1_PLUS, OPT_ftree_cmp, NULL, 1 },
> > > >      { OPT_LEVELS_1_PLUS, OPT_ftree_coalesce_vars, NULL, 1 },
> > > >      { OPT_LEVELS_1_PLUS, OPT_ftree_copy_prop, NULL, 1 },
> > > >      { OPT_LEVELS_1_PLUS, OPT_ftree_dce, NULL, 1 },
> > > > diff --git a/gcc/passes.def b/gcc/passes.def
> > > > index 1cbbd413097..e28a2d3c309 100644
> > > > --- a/gcc/passes.def
> > > > +++ b/gcc/passes.def
> > > > @@ -84,6 +84,9 @@ along with GCC; see the file COPYING3.  If not see
> > > >           /* After CCP we rewrite no longer addressed locals into SSA
> > > >              form if possible.  */
> > > >           NEXT_PASS (pass_forwprop);
> > > > +         /* Do cmp before early_thread_jumps and vrp since it may
> > > > +            modify conditions.  */
> > > > +         NEXT_PASS (pass_cmp);
> > > >            NEXT_PASS (pass_early_thread_jumps, /*first=*/true);
> > > >           NEXT_PASS (pass_sra_early);
> > > >           /* pass_build_ealias is a dummy pass that ensures that we
> > > > diff --git a/gcc/testsuite/gcc.dg/pr113680.c b/gcc/testsuite/gcc.dg/pr113680.c
> > > > new file mode 100644
> > > > index 00000000000..59cd865aa21
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/pr113680.c
> > > > @@ -0,0 +1,47 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-Os -fdump-tree-optimized" } */
> > > > +
> > > > +volatile int i = 0;
> > > > +
> > > > +void foo() { i = 1; }
> > > > +void bar() { i = 2; }
> > > > +
> > > > +void f1(int x, int y)
> > > > +{
> > > > +  int diff = x - y;
> > > > +  if (diff > 0)
> > > > +    foo();
> > > > +  if (diff < 0)
> > > > +    bar();
> > > > +}
> > > > +
> > > > +void f2(int x, int y)
> > > > +{
> > > > +  if ((x - y) > 0)
> > > > +    foo();
> > > > +  if ((x - y) < 0)
> > > > +    bar();
> > > > +}
> > > > +
> > > > +void f3(int x, int y)
> > > > +{
> > > > +  if (x > y)
> > > > +    foo();
> > > > +  if (x < y)
> > > > +    bar();
> > > > +}
> > > > +
> > > > +void f4(int x, int y)
> > > > +{
> > > > +  int diff = x - y;
> > > > +  if (diff > 0)
> > > > +    foo();
> > > > +  if (x < y)
> > > > +    bar();
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-tree-dump-not " - " "optimized" } } */
> > > > +/* { dg-final { scan-assembler-times "cmp" 1 { target { i?86-*-* x86_64-*-* } } } } */
> > > > +/* { dg-final { scan-assembler-times "jmp\[ \t\]+f1" 3 { target { i?86-*-* x86_64-*-* } } } } */
> > > > +/* { dg-final { scan-assembler-not "sub" { target { i?86-*-* x86_64-*-* } } } } */
> > > > +/* { dg-final { scan-assembler-not "test" { target { i?86-*-* x86_64-*-* } } } } */
> > > > diff --git a/gcc/timevar.def b/gcc/timevar.def
> > > > index 8e2168e0817..468266d0c56 100644
> > > > --- a/gcc/timevar.def
> > > > +++ b/gcc/timevar.def
> > > > @@ -157,6 +157,7 @@ DEFTIMEVAR (TV_TREE_EH                   , "tree eh")
> > > >  DEFTIMEVAR (TV_TREE_CFG                     , "tree CFG construction")
> > > >  DEFTIMEVAR (TV_TREE_CLEANUP_CFG             , "tree CFG cleanup")
> > > >  DEFTIMEVAR (TV_TREE_TAIL_MERGE       , "tree tail merge")
> > > > +DEFTIMEVAR (TV_TREE_CMP                     , "tree cmp")
> > > >  DEFTIMEVAR (TV_TREE_VRP              , "tree VRP")
> > > >  DEFTIMEVAR (TV_TREE_VRP_THREADER     , "tree VRP threader")
> > > >  DEFTIMEVAR (TV_TREE_EARLY_VRP        , "tree Early VRP")
> > > > diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> > > > index 29267589eeb..71789a279c0 100644
> > > > --- a/gcc/tree-pass.h
> > > > +++ b/gcc/tree-pass.h
> > > > @@ -401,6 +401,7 @@ extern gimple_opt_pass *make_pass_ch (gcc::context *ctxt);
> > > >  extern gimple_opt_pass *make_pass_ch_vect (gcc::context *ctxt);
> > > >  extern gimple_opt_pass *make_pass_sccopy (gcc::context *ctxt);
> > > >  extern gimple_opt_pass *make_pass_ccp (gcc::context *ctxt);
> > > > +extern gimple_opt_pass *make_pass_cmp (gcc::context *ctxt);
> > > >  extern gimple_opt_pass *make_pass_split_paths (gcc::context *ctxt);
> > > >  extern gimple_opt_pass *make_pass_build_ssa (gcc::context *ctxt);
> > > >  extern gimple_opt_pass *make_pass_build_alias (gcc::context *ctxt);
> > > > diff --git a/gcc/tree-ssa-cmp.cc b/gcc/tree-ssa-cmp.cc
> > > > new file mode 100644
> > > > index 00000000000..41d4a58a5b5
> > > > --- /dev/null
> > > > +++ b/gcc/tree-ssa-cmp.cc
> > > > @@ -0,0 +1,262 @@
> > > > +/* Global, SSA-based optimizations using comparison identities.
> > > > +   Copyright (C) 2024 Free Software Foundation, Inc.
> > > > +
> > > > +This file is part of GCC.
> > > > +
> > > > +GCC is free software; you can redistribute it and/or modify it
> > > > +under the terms of the GNU General Public License as published by the
> > > > +Free Software Foundation; either version 3, or (at your option) any
> > > > +later version.
> > > > +
> > > > +GCC is distributed in the hope that it will be useful, but WITHOUT
> > > > +ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> > > > +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> > > > +for more details.
> > > > +
> > > > +You should have received a copy of the GNU General Public License
> > > > +along with GCC; see the file COPYING3.  If not see
> > > > +<http://www.gnu.org/licenses/>.  */
> > > > +
> > > > +/* (x - y) CMP 0 is equivalent to x CMP y where x and y are signed integers and
> > > > +   CMP is <, <=, >, or >=.  Similarly, 0 CMP (x - y) is equivalent to y CMP x.
> > > > +   As reported in PR middle-end/113680, this equivalence does not hold for
> > > > +   types other than signed integers.  When it comes to conditions, the former
> > > > +   was translated to a combination of sub and test, whereas the latter was
> > > > +   translated to a single cmp.  Thus, this optimization pass tries to optimize
> > > > +   the former to the latter.
> > > > +
> > > > +   When `-fwrapv` is enabled, GCC treats the overflow of signed integers as
> > > > +   defined behavior, specifically, wrapping around according to two's
> > > > +   complement arithmetic.  This has implications for optimizations that
> > > > +   rely on the standard behavior of signed integers, where overflow is
> > > > +   undefined.  Consider the example given:
> > > > +
> > > > +       long long llmax = __LONG_LONG_MAX__;
> > > > +       long long llmin = -llmax - 1;
> > > > +
> > > > +   Here, `llmax - llmin` effectively becomes `llmax - (-llmax - 1)`, which
> > > > +   simplifies to `2 * llmax + 1`.  Given that `llmax` is the maximum value for
> > > > +   a `long long`, this calculation overflows in a defined manner
> > > > +   (wrapping around), which under `-fwrapv` is a legal operation that produces
> > > > +   a negative value due to two's complement wraparound.  Therefore,
> > > > +   `llmax - llmin < 0` is true.
> > > > +
> > > > +   However, the direct comparison `llmax < llmin` is false since `llmax` is the
> > > > +   maximum possible value and `llmin` is the minimum.  Hence, optimizations
> > > > +   that rely on the equivalence of `(x - y) CMP 0` to `x CMP y` (and vice versa)
> > > > +   cannot be safely applied when `-fwrapv` is enabled.  This is why this
> > > > +   optimization pass is disabled under `-fwrapv`.
> > > > +
> > > > +   This optimization pass must run before the Jump Threading pass and
> > > > +   the VRP pass, as it may modify conditions. For example, in the VRP pass:
> > > > +
> > > > +       (1)
> > > > +         int diff = x - y;
> > > > +         if (diff > 0)
> > > > +           foo();
> > > > +         if (diff < 0)
> > > > +           bar();
> > > > +
> > > > +   The second condition would be converted to diff != 0 in the VRP pass because
> > > > +   we know the postcondition of the first condition is diff <= 0, and then
> > > > +   diff != 0 is cheaper than diff < 0. If we apply this pass after this VRP,
> > > > +   we get:
> > > > +
> > > > +       (2)
> > > > +         int diff = x - y;
> > > > +         if (x > y)
> > > > +           foo();
> > > > +         if (diff != 0)
> > > > +           bar();
> > > > +
> > > > +   This generates sub and test for the second condition and cmp for the first
> > > > +   condition. However, if we apply this pass beforehand, we simply get:
> > > > +
> > > > +       (3)
> > > > +         int diff = x - y;
> > > > +         if (x > y)
> > > > +           foo();
> > > > +         if (x < y)
> > > > +           bar();
> > > > +
> > > > +   In this code, diff will be eliminated as a dead code, and sub and test will
> > > > +   not be generated, which is more efficient.
> > > > +
> > > > +   For the Jump Threading pass, without this optimization pass, (1) and (3)
> > > > +   above are recognized as different, which prevents TCO.  */
> > > > +
> > > > +#include "config.h"
> > > > +#include "system.h"
> > > > +#include "coretypes.h"
> > > > +#include "backend.h"
> > > > +#include "tree.h"
> > > > +#include "gimple.h"
> > > > +#include "tree-pass.h"
> > > > +#include "ssa.h"
> > > > +#include "gimple-iterator.h"
> > > > +#include "domwalk.h"
> > > > +
> > > > +/* True if VAR is a signed integer, false otherwise.  */
> > > > +
> > > > +static bool
> > > > +is_signed_integer(const_tree var)
> > > > +{
> > > > +  return (TREE_CODE (TREE_TYPE (var)) == INTEGER_TYPE
> > > > +         && !TYPE_UNSIGNED (TREE_TYPE (var)));
> > > > +}
> > > > +
> > > > +/* If EXPR is an integer zero, return it.  If EXPR is SSA_NAME, and its
> > > > +   defining statement is a subtraction of two signed integers, return the
> > > > +   MINUS_EXPR.  Otherwise, return NULL_TREE.  */
> > > > +
> > > > +static const_tree
> > > > +prop_integer_zero_or_minus_expr (const_tree expr)
> > > > +{
> > > > +  if (integer_zerop (expr))
> > > > +    return expr;
> > > > +
> > > > +  if (TREE_CODE (expr) != SSA_NAME)
> > > > +    return NULL_TREE;
> > > > +
> > > > +  gimple *defining_stmt = SSA_NAME_DEF_STMT (expr);
> > > > +  if (!is_gimple_assign (defining_stmt)
> > > > +      || gimple_assign_rhs_code (defining_stmt) != MINUS_EXPR)
> > > > +    return NULL_TREE;
> > > > +
> > > > +  tree rhs1 = gimple_assign_rhs1 (defining_stmt);
> > > > +  if (!is_signed_integer (rhs1))
> > > > +    return NULL_TREE;
> > > > +
> > > > +  tree rhs2 = gimple_assign_rhs2 (defining_stmt);
> > > > +  if (!is_signed_integer (rhs2))
> > > > +    return NULL_TREE;
> > > > +
> > > > +  return build2 (MINUS_EXPR, TREE_TYPE (expr), rhs1, rhs2);
> > > > +}
> > > > +
> > > > +/* Optimize:
> > > > +
> > > > +       1. (x - y) CMP 0
> > > > +       2. 0 CMP (x - y)
> > > > +
> > > > +   to:
> > > > +
> > > > +       1. x CMP y
> > > > +       2. y CMP x
> > > > +
> > > > +   where CMP is either <, <=, >, or >=.  */
> > > > +
> > > > +static void
> > > > +optimize_signed_comparison (gcond *stmt)
> > > > +{
> > > > +  const enum tree_code code = gimple_cond_code (stmt);
> > > > +  if (code < LT_EXPR || GE_EXPR < code)
> > > > +    return;
> > > > +
> > > > +  const_tree lhs = prop_integer_zero_or_minus_expr (gimple_cond_lhs (stmt));
> > > > +  if (lhs == NULL_TREE)
> > > > +    return;
> > > > +
> > > > +  const_tree rhs = prop_integer_zero_or_minus_expr (gimple_cond_rhs (stmt));
> > > > +  if (rhs == NULL_TREE)
> > > > +    return;
> > > > +
> > > > +  if (TREE_CODE (lhs) == MINUS_EXPR && integer_zerop (rhs))
> > > > +    {
> > > > +      /* Case 1: (x - y) CMP 0  */
> > > > +      tree x = TREE_OPERAND (lhs, 0);
> > > > +      tree y = TREE_OPERAND (lhs, 1);
> > > > +
> > > > +      /* => x CMP y  */
> > > > +      gimple_cond_set_lhs (stmt, x);
> > > > +      gimple_cond_set_rhs (stmt, y);
> > > > +      update_stmt (stmt);
> > > > +    }
> > > > +  else if (integer_zerop (lhs) && TREE_CODE (rhs) == MINUS_EXPR)
> > > > +    {
> > > > +      /* Case 2: 0 CMP (x - y)  */
> > > > +      tree x = TREE_OPERAND (rhs, 0);
> > > > +      tree y = TREE_OPERAND (rhs, 1);
> > > > +
> > > > +      /* => y CMP x  */
> > > > +      gimple_cond_set_lhs (stmt, y);
> > > > +      gimple_cond_set_rhs (stmt, x);
> > > > +      update_stmt (stmt);
> > > > +    }
> > > > +}
> > > > +
> > > > +/* Find signed integer comparisons where the operands are MINUS_EXPR and
> > > > +   0, and replace them with the equivalent comparison of the operands of
> > > > +   the MINUS_EXPR without the subtraction.  */
> > > > +
> > > > +namespace {
> > > > +
> > > > +const pass_data pass_data_cmp =
> > > > +{
> > > > +  GIMPLE_PASS, /* type */
> > > > +  "cmp", /* name */
> > > > +  OPTGROUP_NONE, /* optinfo_flags */
> > > > +  TV_TREE_CMP, /* tv_id */
> > > > +  PROP_ssa, /* properties_required */
> > > > +  0, /* properties_provided */
> > > > +  0, /* properties_destroyed */
> > > > +  0, /* todo_flags_start */
> > > > +  TODO_remove_unused_locals, /* todo_flags_finish */
> > > > +};
> > > > +
> > > > +class pass_cmp : public gimple_opt_pass
> > > > +{
> > > > +public:
> > > > +  pass_cmp (gcc::context *ctxt)
> > > > +    : gimple_opt_pass (pass_data_cmp, ctxt)
> > > > +  {}
> > > > +
> > > > +  /* opt_pass methods: */
> > > > +  bool gate (function *) final override
> > > > +    {
> > > > +      /* If -fwrapv is enabled, this pass cannot be safely applied as described
> > > > +        in the comment at the top of this file.  */
> > > > +      return flag_tree_cmp != 0 && flag_wrapv == 0;
> > > > +    }
> > > > +
> > > > +  unsigned int execute (function *) final override;
> > > > +
> > > > +}; // class pass_cmp
> > > > +
> > > > +class cmp_dom_walker : public dom_walker
> > > > +{
> > > > +public:
> > > > +  cmp_dom_walker () : dom_walker (CDI_DOMINATORS) {}
> > > > +
> > > > +  void after_dom_children (basic_block) final override;
> > > > +};
> > > > +
> > > > +void
> > > > +cmp_dom_walker::after_dom_children (basic_block bb)
> > > > +{
> > > > +  gimple_stmt_iterator gsi;
> > > > +
> > > > +  for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > > > +    {
> > > > +      gimple *stmt = gsi_stmt (gsi);
> > > > +      if (gimple_code (stmt) == GIMPLE_COND)
> > > > +       optimize_signed_comparison (as_a <gcond *> (stmt));
> > > > +    }
> > > > +}
> > > > +
> > > > +
> > > > +unsigned int
> > > > +pass_cmp::execute (function *)
> > > > +{
> > > > +  calculate_dominance_info (CDI_DOMINATORS);
> > > > +  cmp_dom_walker ().walk (ENTRY_BLOCK_PTR_FOR_FN (cfun));
> > > > +  return 0;
> > > > +}
> > > > +
> > > > +} // anon namespace
> > > > +
> > > > +gimple_opt_pass *
> > > > +make_pass_cmp (gcc::context *ctxt)
> > > > +{
> > > > +  return new pass_cmp (ctxt);
> > > > +}
> > > > --
> > > > 2.44.0
> > > >

Richard Biener March 8, 2024, 6:48 a.m. UTC | #5

On Thu, Mar 7, 2024 at 8:29 PM Ken Matsui <kmatsui@cs.washington.edu> wrote:
>
> On Tue, Mar 5, 2024 at 7:58 AM Richard Biener
> <richard.guenther@gmail.com> wrote:
> >
> > On Tue, Mar 5, 2024 at 1:51 PM Ken Matsui <kmatsui@cs.washington.edu> wrote:
> > >
> > > On Tue, Mar 5, 2024 at 12:38 AM Richard Biener
> > > <richard.guenther@gmail.com> wrote:
> > > >
> > > > On Mon, Mar 4, 2024 at 9:40 PM Ken Matsui <kmatsui@gcc.gnu.org> wrote:
> > > > >
> > > > > (x - y) CMP 0 is equivalent to x CMP y where x and y are signed
> > > > > integers and CMP is <, <=, >, or >=.  Similarly, 0 CMP (x - y) is
> > > > > equivalent to y CMP x.  As reported in PR middle-end/113680, this
> > > > > equivalence does not hold for types other than signed integers.  When
> > > > > it comes to conditions, the former was translated to a combination of
> > > > > sub and test, whereas the latter was translated to a single cmp.
> > > > > Thus, this optimization pass tries to optimize the former to the
> > > > > latter.
> > > > >
> > > > > When `-fwrapv` is enabled, GCC treats the overflow of signed integers
> > > > > as defined behavior, specifically, wrapping around according to two's
> > > > > complement arithmetic.  This has implications for optimizations that
> > > > > rely on the standard behavior of signed integers, where overflow is
> > > > > undefined.  Consider the example given:
> > > > >
> > > > >         long long llmax = __LONG_LONG_MAX__;
> > > > >         long long llmin = -llmax - 1;
> > > > >
> > > > > Here, `llmax - llmin` effectively becomes `llmax - (-llmax - 1)`, which
> > > > > simplifies to `2 * llmax + 1`.  Given that `llmax` is the maximum value
> > > > > for a `long long`, this calculation overflows in a defined manner
> > > > > (wrapping around), which under `-fwrapv` is a legal operation that
> > > > > produces a negative value due to two's complement wraparound.
> > > > > Therefore, `llmax - llmin < 0` is true.
> > > > >
> > > > > However, the direct comparison `llmax < llmin` is false since `llmax`
> > > > > is the maximum possible value and `llmin` is the minimum.  Hence,
> > > > > optimizations that rely on the equivalence of `(x - y) CMP 0` to
> > > > > `x CMP y` (and vice versa) cannot be safely applied when `-fwrapv` is
> > > > > enabled.  This is why this optimization pass is disabled under
> > > > > `-fwrapv`.
> > > > >
> > > > > This optimization pass must run before the Jump Threading pass and the
> > > > > VRP pass, as it may modify conditions. For example, in the VRP pass:
> > > > >
> > > > >         (1)
> > > > >           int diff = x - y;
> > > > >           if (diff > 0)
> > > > >             foo();
> > > > >           if (diff < 0)
> > > > >             bar();
> > > > >
> > > > > The second condition would be converted to diff != 0 in the VRP pass
> > > > > because we know the postcondition of the first condition is diff <= 0,
> > > > > and then diff != 0 is cheaper than diff < 0. If we apply this pass
> > > > > after this VRP, we get:
> > > > >
> > > > >         (2)
> > > > >           int diff = x - y;
> > > > >           if (x > y)
> > > > >             foo();
> > > > >           if (diff != 0)
> > > > >             bar();
> > > > >
> > > > > This generates sub and test for the second condition and cmp for the
> > > > > first condition. However, if we apply this pass beforehand, we simply
> > > > > get:
> > > > >
> > > > >         (3)
> > > > >           int diff = x - y;
> > > > >           if (x > y)
> > > > >             foo();
> > > > >           if (x < y)
> > > > >             bar();
> > > > >
> > > > > In this code, diff will be eliminated as a dead code, and sub and test
> > > > > will not be generated, which is more efficient.
> > > > >
> > > > > For the Jump Threading pass, without this optimization pass, (1) and
> > > > > (3) above are recognized as different, which prevents TCO.
> > > > >
> > > > >         PR middle-end/113680
> > > >
> > > > This shouldn't be done as a new optimization pass.  It fits either
> > > > the explicit code present in the forwprop pass or a new match.pd
> > > > pattern.  There's possible interaction with x - y value being used
> > > > elsewhere and thus exposing a CSE opportunity as well as
> > > > a comparison against zero being possibly implemented by
> > > > a flag setting subtraction instruction.
> > > >
> > >
> > > Thank you so much for your review!  Although the forwprop pass runs
> > > multiple times, we might not need to apply this optimization multiple
> > > times.  Would it be acceptable to add such optimization?  More
> > > generally, I would like to know how to determine where to put
> > > optimization in the future.
> >
> > This kind of pattern matching expression simplification is best
> > addressed by patterns in match.pd though historically the forwprop
> > pass still catches cases not in match.pd in its
> > forward_propagate_into_comparison_1 (and callers).
> >
>
> I see.  When would patterns in match.pd be applied?  Around forwprop
> or somewhere else?  (Also, could you please tell me a document I can
> learn about these if it exists?)  I ask because this optimization
> should be applied before the Jump Threading and VRP passes.

match.pd patterns get used whenever an expression is simplified.  And
yes, the forwprop pass tries to simplify expressions rooted at each stmt.
In general each pass changing a statement or one of its operands should
re-simplify the expression rooted at it, so the theory is that all expressions
always remain simplified.

There's no documentation about the patterns present in match.pd if
you meant that.  I'm usually writing small testcases to "query" match.pd
and use -fdump-tree-all-folding looking into .original (when folded at
generic time), .gimple, .ccp and .forwprop which are the usual early
places where simplifications are triggered.

Richard.

> > > FYI, I read this page: https://gcc.gnu.org/wiki/OptimizationCourse
> > >
> > > > Our VN pass has some tricks to anticipate CSE opportunities
> > > > like this, but it's not done "properly" in the way of anticipating
> > > > both forms during PRE.
> > > >
> > > > I'll note we have
> > > >
> > > >  /* (A - B) != 0 ? (A - B) : (B - A)    same as (A - B) */
> > > >  (for cmp (ne ltgt)
> > > >
> > > > and similar which might be confused by canonicalizing to A != B.
> > >
> > > I will investigate and update my patch (after my final exam ends...)!
> > >
> > > > I'm also surprised we don't already have the pattern you add.
> > >
> > > Hmm, so am I.
> >
> > It looks like we do it for equality compares which can also handle
> > types where overflow is undefined.  -fdump-tree-all-folding shows
> > in the 005t.original dump we simplify a - b != 0 via match.pd:6105:
> >
> > /* Transform comparisons of the form X - Y CMP 0 to X CMP Y.
> >    ??? The transformation is valid for the other operators if overflow
> >    is undefined for the type, but performing it here badly interacts
> >    with the transformation in fold_cond_expr_with_comparison which
> >    attempts to synthetize ABS_EXPR.  */
> > (for cmp (eq ne)
> >  (for sub (minus pointer_diff)
> >   (simplify
> >    (cmp (sub@2 @0 @1) integer_zerop)
> >    (if (single_use (@2))
> >     (cmp @0 @1)))))
> >
> > and the comment explains why we don't perform the transform
> > it the way you intend.  That comment might no longer apply
> > but I would guess there's testsuite fallout when you amend this
> > pattern.
>
> Thank you so much!  I will check this out.
>
> >
> > Richard.
> >
> > >  I saw that this optimization sometimes messes up VRP
> > > while sometimes helping it (as I described in my comment).  I also
> > > need to research this.
> > >
> > > >
> > > > Richard.
> > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > >         * Makefile.in: Add tree-ssa-cmp.o to OBJS.
> > > > >         * common.opt: Define ftree-cmp
> > > > >         * doc/invoke.texi: Document ftree-cmp.
> > > > >         * opts.cc (default_options_table): Handle OPT_ftree_cmp.
> > > > >         * passes.def (pass_cmp): New optimization pass.
> > > > >         * timevar.def (TV_TREE_CMP): New variable for timing.
> > > > >         * tree-pass.h (make_pass_cmp): New declaration.
> > > > >         * tree-ssa-cmp.cc: New file.
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > >         * gcc.dg/pr113680.c: New test.
> > > > >
> > > > > Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>
> > > > > ---
> > > > >  gcc/Makefile.in                 |   1 +
> > > > >  gcc/common.opt                  |   4 +
> > > > >  gcc/doc/invoke.texi             |  11 +-
> > > > >  gcc/opts.cc                     |   1 +
> > > > >  gcc/passes.def                  |   3 +
> > > > >  gcc/testsuite/gcc.dg/pr113680.c |  47 ++++++
> > > > >  gcc/timevar.def                 |   1 +
> > > > >  gcc/tree-pass.h                 |   1 +
> > > > >  gcc/tree-ssa-cmp.cc             | 262 ++++++++++++++++++++++++++++++++
> > > > >  9 files changed, 330 insertions(+), 1 deletion(-)
> > > > >  create mode 100644 gcc/testsuite/gcc.dg/pr113680.c
> > > > >  create mode 100644 gcc/tree-ssa-cmp.cc
> > > > >
> > > > > diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> > > > > index a74761b7ab3..935b80b6947 100644
> > > > > --- a/gcc/Makefile.in
> > > > > +++ b/gcc/Makefile.in
> > > > > @@ -1731,6 +1731,7 @@ OBJS = \
> > > > >         tree-ssa-address.o \
> > > > >         tree-ssa-alias.o \
> > > > >         tree-ssa-ccp.o \
> > > > > +       tree-ssa-cmp.o \
> > > > >         tree-ssa-coalesce.o \
> > > > >         tree-ssa-copy.o \
> > > > >         tree-ssa-dce.o \
> > > > > diff --git a/gcc/common.opt b/gcc/common.opt
> > > > > index 51c4a17da83..7c853224458 100644
> > > > > --- a/gcc/common.opt
> > > > > +++ b/gcc/common.opt
> > > > > @@ -3053,6 +3053,10 @@ ftree-ch
> > > > >  Common Var(flag_tree_ch) Optimization
> > > > >  Enable loop header copying on trees.
> > > > >
> > > > > +ftree-cmp
> > > > > +Common Var(flag_tree_cmp) Optimization
> > > > > +Enable SSA comparison optimization on trees.
> > > > > +
> > > > >  ftree-coalesce-inlined-vars
> > > > >  Common Ignore RejectNegative
> > > > >  Does nothing.  Preserved for backward compatibility.
> > > > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > > > > index bdf05be387d..04762d490a3 100644
> > > > > --- a/gcc/doc/invoke.texi
> > > > > +++ b/gcc/doc/invoke.texi
> > > > > @@ -619,7 +619,7 @@ Objective-C and Objective-C++ Dialects}.
> > > > >  -fsplit-wide-types  -fsplit-wide-types-early  -fssa-backprop  -fssa-phiopt
> > > > >  -fstdarg-opt  -fstore-merging  -fstrict-aliasing -fipa-strict-aliasing
> > > > >  -fthread-jumps  -ftracer  -ftree-bit-ccp
> > > > > --ftree-builtin-call-dce  -ftree-ccp  -ftree-ch
> > > > > +-ftree-builtin-call-dce  -ftree-ccp  -ftree-ch  -ftree-cmp
> > > > >  -ftree-coalesce-vars  -ftree-copy-prop  -ftree-dce  -ftree-dominator-opts
> > > > >  -ftree-dse  -ftree-forwprop  -ftree-fre  -fcode-hoisting
> > > > >  -ftree-loop-if-convert  -ftree-loop-im
> > > > > @@ -12377,6 +12377,7 @@ compilation time.
> > > > >  -ftree-bit-ccp
> > > > >  -ftree-ccp
> > > > >  -ftree-ch
> > > > > +-ftree-cmp
> > > > >  -ftree-coalesce-vars
> > > > >  -ftree-copy-prop
> > > > >  -ftree-dce
> > > > > @@ -13707,6 +13708,14 @@ effectiveness of code motion optimizations.  It also saves one jump.  This flag
> > > > >  is enabled by default at @option{-O1} and higher.  It is not enabled
> > > > >  for @option{-Os}, since it usually increases code size.
> > > > >
> > > > > +@opindex ftree-cmp
> > > > > +@item -ftree-cmp
> > > > > +Perform comparison optimization on trees.  This decreases unnecessary
> > > > > +instructions for comparisons.  This flag is enabled by default at
> > > > > +@option{-O1} and higher.  This optimization is automatically disabled when
> > > > > +@option{-fwrapv} is specified to ensure correct behavior under signed integer
> > > > > +overflow wrapping.
> > > > > +
> > > > >  @opindex ftree-loop-optimize
> > > > >  @item -ftree-loop-optimize
> > > > >  Perform loop optimizations on trees.  This flag is enabled by default
> > > > > diff --git a/gcc/opts.cc b/gcc/opts.cc
> > > > > index 3333600e0ea..9ee64a48313 100644
> > > > > --- a/gcc/opts.cc
> > > > > +++ b/gcc/opts.cc
> > > > > @@ -589,6 +589,7 @@ static const struct default_options default_options_table[] =
> > > > >      { OPT_LEVELS_1_PLUS, OPT_ftree_builtin_call_dce, NULL, 1 },
> > > > >      { OPT_LEVELS_1_PLUS, OPT_ftree_ccp, NULL, 1 },
> > > > >      { OPT_LEVELS_1_PLUS, OPT_ftree_ch, NULL, 1 },
> > > > > +    { OPT_LEVELS_1_PLUS, OPT_ftree_cmp, NULL, 1 },
> > > > >      { OPT_LEVELS_1_PLUS, OPT_ftree_coalesce_vars, NULL, 1 },
> > > > >      { OPT_LEVELS_1_PLUS, OPT_ftree_copy_prop, NULL, 1 },
> > > > >      { OPT_LEVELS_1_PLUS, OPT_ftree_dce, NULL, 1 },
> > > > > diff --git a/gcc/passes.def b/gcc/passes.def
> > > > > index 1cbbd413097..e28a2d3c309 100644
> > > > > --- a/gcc/passes.def
> > > > > +++ b/gcc/passes.def
> > > > > @@ -84,6 +84,9 @@ along with GCC; see the file COPYING3.  If not see
> > > > >           /* After CCP we rewrite no longer addressed locals into SSA
> > > > >              form if possible.  */
> > > > >           NEXT_PASS (pass_forwprop);
> > > > > +         /* Do cmp before early_thread_jumps and vrp since it may
> > > > > +            modify conditions.  */
> > > > > +         NEXT_PASS (pass_cmp);
> > > > >            NEXT_PASS (pass_early_thread_jumps, /*first=*/true);
> > > > >           NEXT_PASS (pass_sra_early);
> > > > >           /* pass_build_ealias is a dummy pass that ensures that we
> > > > > diff --git a/gcc/testsuite/gcc.dg/pr113680.c b/gcc/testsuite/gcc.dg/pr113680.c
> > > > > new file mode 100644
> > > > > index 00000000000..59cd865aa21
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/pr113680.c
> > > > > @@ -0,0 +1,47 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-options "-Os -fdump-tree-optimized" } */
> > > > > +
> > > > > +volatile int i = 0;
> > > > > +
> > > > > +void foo() { i = 1; }
> > > > > +void bar() { i = 2; }
> > > > > +
> > > > > +void f1(int x, int y)
> > > > > +{
> > > > > +  int diff = x - y;
> > > > > +  if (diff > 0)
> > > > > +    foo();
> > > > > +  if (diff < 0)
> > > > > +    bar();
> > > > > +}
> > > > > +
> > > > > +void f2(int x, int y)
> > > > > +{
> > > > > +  if ((x - y) > 0)
> > > > > +    foo();
> > > > > +  if ((x - y) < 0)
> > > > > +    bar();
> > > > > +}
> > > > > +
> > > > > +void f3(int x, int y)
> > > > > +{
> > > > > +  if (x > y)
> > > > > +    foo();
> > > > > +  if (x < y)
> > > > > +    bar();
> > > > > +}
> > > > > +
> > > > > +void f4(int x, int y)
> > > > > +{
> > > > > +  int diff = x - y;
> > > > > +  if (diff > 0)
> > > > > +    foo();
> > > > > +  if (x < y)
> > > > > +    bar();
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-tree-dump-not " - " "optimized" } } */
> > > > > +/* { dg-final { scan-assembler-times "cmp" 1 { target { i?86-*-* x86_64-*-* } } } } */
> > > > > +/* { dg-final { scan-assembler-times "jmp\[ \t\]+f1" 3 { target { i?86-*-* x86_64-*-* } } } } */
> > > > > +/* { dg-final { scan-assembler-not "sub" { target { i?86-*-* x86_64-*-* } } } } */
> > > > > +/* { dg-final { scan-assembler-not "test" { target { i?86-*-* x86_64-*-* } } } } */
> > > > > diff --git a/gcc/timevar.def b/gcc/timevar.def
> > > > > index 8e2168e0817..468266d0c56 100644
> > > > > --- a/gcc/timevar.def
> > > > > +++ b/gcc/timevar.def
> > > > > @@ -157,6 +157,7 @@ DEFTIMEVAR (TV_TREE_EH                   , "tree eh")
> > > > >  DEFTIMEVAR (TV_TREE_CFG                     , "tree CFG construction")
> > > > >  DEFTIMEVAR (TV_TREE_CLEANUP_CFG             , "tree CFG cleanup")
> > > > >  DEFTIMEVAR (TV_TREE_TAIL_MERGE       , "tree tail merge")
> > > > > +DEFTIMEVAR (TV_TREE_CMP                     , "tree cmp")
> > > > >  DEFTIMEVAR (TV_TREE_VRP              , "tree VRP")
> > > > >  DEFTIMEVAR (TV_TREE_VRP_THREADER     , "tree VRP threader")
> > > > >  DEFTIMEVAR (TV_TREE_EARLY_VRP        , "tree Early VRP")
> > > > > diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> > > > > index 29267589eeb..71789a279c0 100644
> > > > > --- a/gcc/tree-pass.h
> > > > > +++ b/gcc/tree-pass.h
> > > > > @@ -401,6 +401,7 @@ extern gimple_opt_pass *make_pass_ch (gcc::context *ctxt);
> > > > >  extern gimple_opt_pass *make_pass_ch_vect (gcc::context *ctxt);
> > > > >  extern gimple_opt_pass *make_pass_sccopy (gcc::context *ctxt);
> > > > >  extern gimple_opt_pass *make_pass_ccp (gcc::context *ctxt);
> > > > > +extern gimple_opt_pass *make_pass_cmp (gcc::context *ctxt);
> > > > >  extern gimple_opt_pass *make_pass_split_paths (gcc::context *ctxt);
> > > > >  extern gimple_opt_pass *make_pass_build_ssa (gcc::context *ctxt);
> > > > >  extern gimple_opt_pass *make_pass_build_alias (gcc::context *ctxt);
> > > > > diff --git a/gcc/tree-ssa-cmp.cc b/gcc/tree-ssa-cmp.cc
> > > > > new file mode 100644
> > > > > index 00000000000..41d4a58a5b5
> > > > > --- /dev/null
> > > > > +++ b/gcc/tree-ssa-cmp.cc
> > > > > @@ -0,0 +1,262 @@
> > > > > +/* Global, SSA-based optimizations using comparison identities.
> > > > > +   Copyright (C) 2024 Free Software Foundation, Inc.
> > > > > +
> > > > > +This file is part of GCC.
> > > > > +
> > > > > +GCC is free software; you can redistribute it and/or modify it
> > > > > +under the terms of the GNU General Public License as published by the
> > > > > +Free Software Foundation; either version 3, or (at your option) any
> > > > > +later version.
> > > > > +
> > > > > +GCC is distributed in the hope that it will be useful, but WITHOUT
> > > > > +ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> > > > > +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> > > > > +for more details.
> > > > > +
> > > > > +You should have received a copy of the GNU General Public License
> > > > > +along with GCC; see the file COPYING3.  If not see
> > > > > +<http://www.gnu.org/licenses/>.  */
> > > > > +
> > > > > +/* (x - y) CMP 0 is equivalent to x CMP y where x and y are signed integers and
> > > > > +   CMP is <, <=, >, or >=.  Similarly, 0 CMP (x - y) is equivalent to y CMP x.
> > > > > +   As reported in PR middle-end/113680, this equivalence does not hold for
> > > > > +   types other than signed integers.  When it comes to conditions, the former
> > > > > +   was translated to a combination of sub and test, whereas the latter was
> > > > > +   translated to a single cmp.  Thus, this optimization pass tries to optimize
> > > > > +   the former to the latter.
> > > > > +
> > > > > +   When `-fwrapv` is enabled, GCC treats the overflow of signed integers as
> > > > > +   defined behavior, specifically, wrapping around according to two's
> > > > > +   complement arithmetic.  This has implications for optimizations that
> > > > > +   rely on the standard behavior of signed integers, where overflow is
> > > > > +   undefined.  Consider the example given:
> > > > > +
> > > > > +       long long llmax = __LONG_LONG_MAX__;
> > > > > +       long long llmin = -llmax - 1;
> > > > > +
> > > > > +   Here, `llmax - llmin` effectively becomes `llmax - (-llmax - 1)`, which
> > > > > +   simplifies to `2 * llmax + 1`.  Given that `llmax` is the maximum value for
> > > > > +   a `long long`, this calculation overflows in a defined manner
> > > > > +   (wrapping around), which under `-fwrapv` is a legal operation that produces
> > > > > +   a negative value due to two's complement wraparound.  Therefore,
> > > > > +   `llmax - llmin < 0` is true.
> > > > > +
> > > > > +   However, the direct comparison `llmax < llmin` is false since `llmax` is the
> > > > > +   maximum possible value and `llmin` is the minimum.  Hence, optimizations
> > > > > +   that rely on the equivalence of `(x - y) CMP 0` to `x CMP y` (and vice versa)
> > > > > +   cannot be safely applied when `-fwrapv` is enabled.  This is why this
> > > > > +   optimization pass is disabled under `-fwrapv`.
> > > > > +
> > > > > +   This optimization pass must run before the Jump Threading pass and
> > > > > +   the VRP pass, as it may modify conditions. For example, in the VRP pass:
> > > > > +
> > > > > +       (1)
> > > > > +         int diff = x - y;
> > > > > +         if (diff > 0)
> > > > > +           foo();
> > > > > +         if (diff < 0)
> > > > > +           bar();
> > > > > +
> > > > > +   The second condition would be converted to diff != 0 in the VRP pass because
> > > > > +   we know the postcondition of the first condition is diff <= 0, and then
> > > > > +   diff != 0 is cheaper than diff < 0. If we apply this pass after this VRP,
> > > > > +   we get:
> > > > > +
> > > > > +       (2)
> > > > > +         int diff = x - y;
> > > > > +         if (x > y)
> > > > > +           foo();
> > > > > +         if (diff != 0)
> > > > > +           bar();
> > > > > +
> > > > > +   This generates sub and test for the second condition and cmp for the first
> > > > > +   condition. However, if we apply this pass beforehand, we simply get:
> > > > > +
> > > > > +       (3)
> > > > > +         int diff = x - y;
> > > > > +         if (x > y)
> > > > > +           foo();
> > > > > +         if (x < y)
> > > > > +           bar();
> > > > > +
> > > > > +   In this code, diff will be eliminated as a dead code, and sub and test will
> > > > > +   not be generated, which is more efficient.
> > > > > +
> > > > > +   For the Jump Threading pass, without this optimization pass, (1) and (3)
> > > > > +   above are recognized as different, which prevents TCO.  */
> > > > > +
> > > > > +#include "config.h"
> > > > > +#include "system.h"
> > > > > +#include "coretypes.h"
> > > > > +#include "backend.h"
> > > > > +#include "tree.h"
> > > > > +#include "gimple.h"
> > > > > +#include "tree-pass.h"
> > > > > +#include "ssa.h"
> > > > > +#include "gimple-iterator.h"
> > > > > +#include "domwalk.h"
> > > > > +
> > > > > +/* True if VAR is a signed integer, false otherwise.  */
> > > > > +
> > > > > +static bool
> > > > > +is_signed_integer(const_tree var)
> > > > > +{
> > > > > +  return (TREE_CODE (TREE_TYPE (var)) == INTEGER_TYPE
> > > > > +         && !TYPE_UNSIGNED (TREE_TYPE (var)));
> > > > > +}
> > > > > +
> > > > > +/* If EXPR is an integer zero, return it.  If EXPR is SSA_NAME, and its
> > > > > +   defining statement is a subtraction of two signed integers, return the
> > > > > +   MINUS_EXPR.  Otherwise, return NULL_TREE.  */
> > > > > +
> > > > > +static const_tree
> > > > > +prop_integer_zero_or_minus_expr (const_tree expr)
> > > > > +{
> > > > > +  if (integer_zerop (expr))
> > > > > +    return expr;
> > > > > +
> > > > > +  if (TREE_CODE (expr) != SSA_NAME)
> > > > > +    return NULL_TREE;
> > > > > +
> > > > > +  gimple *defining_stmt = SSA_NAME_DEF_STMT (expr);
> > > > > +  if (!is_gimple_assign (defining_stmt)
> > > > > +      || gimple_assign_rhs_code (defining_stmt) != MINUS_EXPR)
> > > > > +    return NULL_TREE;
> > > > > +
> > > > > +  tree rhs1 = gimple_assign_rhs1 (defining_stmt);
> > > > > +  if (!is_signed_integer (rhs1))
> > > > > +    return NULL_TREE;
> > > > > +
> > > > > +  tree rhs2 = gimple_assign_rhs2 (defining_stmt);
> > > > > +  if (!is_signed_integer (rhs2))
> > > > > +    return NULL_TREE;
> > > > > +
> > > > > +  return build2 (MINUS_EXPR, TREE_TYPE (expr), rhs1, rhs2);
> > > > > +}
> > > > > +
> > > > > +/* Optimize:
> > > > > +
> > > > > +       1. (x - y) CMP 0
> > > > > +       2. 0 CMP (x - y)
> > > > > +
> > > > > +   to:
> > > > > +
> > > > > +       1. x CMP y
> > > > > +       2. y CMP x
> > > > > +
> > > > > +   where CMP is either <, <=, >, or >=.  */
> > > > > +
> > > > > +static void
> > > > > +optimize_signed_comparison (gcond *stmt)
> > > > > +{
> > > > > +  const enum tree_code code = gimple_cond_code (stmt);
> > > > > +  if (code < LT_EXPR || GE_EXPR < code)
> > > > > +    return;
> > > > > +
> > > > > +  const_tree lhs = prop_integer_zero_or_minus_expr (gimple_cond_lhs (stmt));
> > > > > +  if (lhs == NULL_TREE)
> > > > > +    return;
> > > > > +
> > > > > +  const_tree rhs = prop_integer_zero_or_minus_expr (gimple_cond_rhs (stmt));
> > > > > +  if (rhs == NULL_TREE)
> > > > > +    return;
> > > > > +
> > > > > +  if (TREE_CODE (lhs) == MINUS_EXPR && integer_zerop (rhs))
> > > > > +    {
> > > > > +      /* Case 1: (x - y) CMP 0  */
> > > > > +      tree x = TREE_OPERAND (lhs, 0);
> > > > > +      tree y = TREE_OPERAND (lhs, 1);
> > > > > +
> > > > > +      /* => x CMP y  */
> > > > > +      gimple_cond_set_lhs (stmt, x);
> > > > > +      gimple_cond_set_rhs (stmt, y);
> > > > > +      update_stmt (stmt);
> > > > > +    }
> > > > > +  else if (integer_zerop (lhs) && TREE_CODE (rhs) == MINUS_EXPR)
> > > > > +    {
> > > > > +      /* Case 2: 0 CMP (x - y)  */
> > > > > +      tree x = TREE_OPERAND (rhs, 0);
> > > > > +      tree y = TREE_OPERAND (rhs, 1);
> > > > > +
> > > > > +      /* => y CMP x  */
> > > > > +      gimple_cond_set_lhs (stmt, y);
> > > > > +      gimple_cond_set_rhs (stmt, x);
> > > > > +      update_stmt (stmt);
> > > > > +    }
> > > > > +}
> > > > > +
> > > > > +/* Find signed integer comparisons where the operands are MINUS_EXPR and
> > > > > +   0, and replace them with the equivalent comparison of the operands of
> > > > > +   the MINUS_EXPR without the subtraction.  */
> > > > > +
> > > > > +namespace {
> > > > > +
> > > > > +const pass_data pass_data_cmp =
> > > > > +{
> > > > > +  GIMPLE_PASS, /* type */
> > > > > +  "cmp", /* name */
> > > > > +  OPTGROUP_NONE, /* optinfo_flags */
> > > > > +  TV_TREE_CMP, /* tv_id */
> > > > > +  PROP_ssa, /* properties_required */
> > > > > +  0, /* properties_provided */
> > > > > +  0, /* properties_destroyed */
> > > > > +  0, /* todo_flags_start */
> > > > > +  TODO_remove_unused_locals, /* todo_flags_finish */
> > > > > +};
> > > > > +
> > > > > +class pass_cmp : public gimple_opt_pass
> > > > > +{
> > > > > +public:
> > > > > +  pass_cmp (gcc::context *ctxt)
> > > > > +    : gimple_opt_pass (pass_data_cmp, ctxt)
> > > > > +  {}
> > > > > +
> > > > > +  /* opt_pass methods: */
> > > > > +  bool gate (function *) final override
> > > > > +    {
> > > > > +      /* If -fwrapv is enabled, this pass cannot be safely applied as described
> > > > > +        in the comment at the top of this file.  */
> > > > > +      return flag_tree_cmp != 0 && flag_wrapv == 0;
> > > > > +    }
> > > > > +
> > > > > +  unsigned int execute (function *) final override;
> > > > > +
> > > > > +}; // class pass_cmp
> > > > > +
> > > > > +class cmp_dom_walker : public dom_walker
> > > > > +{
> > > > > +public:
> > > > > +  cmp_dom_walker () : dom_walker (CDI_DOMINATORS) {}
> > > > > +
> > > > > +  void after_dom_children (basic_block) final override;
> > > > > +};
> > > > > +
> > > > > +void
> > > > > +cmp_dom_walker::after_dom_children (basic_block bb)
> > > > > +{
> > > > > +  gimple_stmt_iterator gsi;
> > > > > +
> > > > > +  for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > > > > +    {
> > > > > +      gimple *stmt = gsi_stmt (gsi);
> > > > > +      if (gimple_code (stmt) == GIMPLE_COND)
> > > > > +       optimize_signed_comparison (as_a <gcond *> (stmt));
> > > > > +    }
> > > > > +}
> > > > > +
> > > > > +
> > > > > +unsigned int
> > > > > +pass_cmp::execute (function *)
> > > > > +{
> > > > > +  calculate_dominance_info (CDI_DOMINATORS);
> > > > > +  cmp_dom_walker ().walk (ENTRY_BLOCK_PTR_FOR_FN (cfun));
> > > > > +  return 0;
> > > > > +}
> > > > > +
> > > > > +} // anon namespace
> > > > > +
> > > > > +gimple_opt_pass *
> > > > > +make_pass_cmp (gcc::context *ctxt)
> > > > > +{
> > > > > +  return new pass_cmp (ctxt);
> > > > > +}
> > > > > --
> > > > > 2.44.0
> > > > >

Ken Matsui March 8, 2024, 5:49 p.m. UTC | #6

On Thu, Mar 7, 2024 at 10:49 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Thu, Mar 7, 2024 at 8:29 PM Ken Matsui <kmatsui@cs.washington.edu> wrote:
> >
> > On Tue, Mar 5, 2024 at 7:58 AM Richard Biener
> > <richard.guenther@gmail.com> wrote:
> > >
> > > On Tue, Mar 5, 2024 at 1:51 PM Ken Matsui <kmatsui@cs.washington.edu> wrote:
> > > >
> > > > On Tue, Mar 5, 2024 at 12:38 AM Richard Biener
> > > > <richard.guenther@gmail.com> wrote:
> > > > >
> > > > > On Mon, Mar 4, 2024 at 9:40 PM Ken Matsui <kmatsui@gcc.gnu.org> wrote:
> > > > > >
> > > > > > (x - y) CMP 0 is equivalent to x CMP y where x and y are signed
> > > > > > integers and CMP is <, <=, >, or >=.  Similarly, 0 CMP (x - y) is
> > > > > > equivalent to y CMP x.  As reported in PR middle-end/113680, this
> > > > > > equivalence does not hold for types other than signed integers.  When
> > > > > > it comes to conditions, the former was translated to a combination of
> > > > > > sub and test, whereas the latter was translated to a single cmp.
> > > > > > Thus, this optimization pass tries to optimize the former to the
> > > > > > latter.
> > > > > >
> > > > > > When `-fwrapv` is enabled, GCC treats the overflow of signed integers
> > > > > > as defined behavior, specifically, wrapping around according to two's
> > > > > > complement arithmetic.  This has implications for optimizations that
> > > > > > rely on the standard behavior of signed integers, where overflow is
> > > > > > undefined.  Consider the example given:
> > > > > >
> > > > > >         long long llmax = __LONG_LONG_MAX__;
> > > > > >         long long llmin = -llmax - 1;
> > > > > >
> > > > > > Here, `llmax - llmin` effectively becomes `llmax - (-llmax - 1)`, which
> > > > > > simplifies to `2 * llmax + 1`.  Given that `llmax` is the maximum value
> > > > > > for a `long long`, this calculation overflows in a defined manner
> > > > > > (wrapping around), which under `-fwrapv` is a legal operation that
> > > > > > produces a negative value due to two's complement wraparound.
> > > > > > Therefore, `llmax - llmin < 0` is true.
> > > > > >
> > > > > > However, the direct comparison `llmax < llmin` is false since `llmax`
> > > > > > is the maximum possible value and `llmin` is the minimum.  Hence,
> > > > > > optimizations that rely on the equivalence of `(x - y) CMP 0` to
> > > > > > `x CMP y` (and vice versa) cannot be safely applied when `-fwrapv` is
> > > > > > enabled.  This is why this optimization pass is disabled under
> > > > > > `-fwrapv`.
> > > > > >
> > > > > > This optimization pass must run before the Jump Threading pass and the
> > > > > > VRP pass, as it may modify conditions. For example, in the VRP pass:
> > > > > >
> > > > > >         (1)
> > > > > >           int diff = x - y;
> > > > > >           if (diff > 0)
> > > > > >             foo();
> > > > > >           if (diff < 0)
> > > > > >             bar();
> > > > > >
> > > > > > The second condition would be converted to diff != 0 in the VRP pass
> > > > > > because we know the postcondition of the first condition is diff <= 0,
> > > > > > and then diff != 0 is cheaper than diff < 0. If we apply this pass
> > > > > > after this VRP, we get:
> > > > > >
> > > > > >         (2)
> > > > > >           int diff = x - y;
> > > > > >           if (x > y)
> > > > > >             foo();
> > > > > >           if (diff != 0)
> > > > > >             bar();
> > > > > >
> > > > > > This generates sub and test for the second condition and cmp for the
> > > > > > first condition. However, if we apply this pass beforehand, we simply
> > > > > > get:
> > > > > >
> > > > > >         (3)
> > > > > >           int diff = x - y;
> > > > > >           if (x > y)
> > > > > >             foo();
> > > > > >           if (x < y)
> > > > > >             bar();
> > > > > >
> > > > > > In this code, diff will be eliminated as a dead code, and sub and test
> > > > > > will not be generated, which is more efficient.
> > > > > >
> > > > > > For the Jump Threading pass, without this optimization pass, (1) and
> > > > > > (3) above are recognized as different, which prevents TCO.
> > > > > >
> > > > > >         PR middle-end/113680
> > > > >
> > > > > This shouldn't be done as a new optimization pass.  It fits either
> > > > > the explicit code present in the forwprop pass or a new match.pd
> > > > > pattern.  There's possible interaction with x - y value being used
> > > > > elsewhere and thus exposing a CSE opportunity as well as
> > > > > a comparison against zero being possibly implemented by
> > > > > a flag setting subtraction instruction.
> > > > >
> > > >
> > > > Thank you so much for your review!  Although the forwprop pass runs
> > > > multiple times, we might not need to apply this optimization multiple
> > > > times.  Would it be acceptable to add such optimization?  More
> > > > generally, I would like to know how to determine where to put
> > > > optimization in the future.
> > >
> > > This kind of pattern matching expression simplification is best
> > > addressed by patterns in match.pd though historically the forwprop
> > > pass still catches cases not in match.pd in its
> > > forward_propagate_into_comparison_1 (and callers).
> > >
> >
> > I see.  When would patterns in match.pd be applied?  Around forwprop
> > or somewhere else?  (Also, could you please tell me a document I can
> > learn about these if it exists?)  I ask because this optimization
> > should be applied before the Jump Threading and VRP passes.
>
> match.pd patterns get used whenever an expression is simplified.  And
> yes, the forwprop pass tries to simplify expressions rooted at each stmt.
> In general each pass changing a statement or one of its operands should
> re-simplify the expression rooted at it, so the theory is that all expressions
> always remain simplified.
>
> There's no documentation about the patterns present in match.pd if
> you meant that.

What I meant was the high-level structure around match.pd and other
passes, but it is answered!  That makes sense.  Thank you for your
time!

> I'm usually writing small testcases to "query" match.pd
> and use -fdump-tree-all-folding looking into .original (when folded at
> generic time), .gimple, .ccp and .forwprop which are the usual early
> places where simplifications are triggered.

This is really helpful, thank you!

>
> Richard.
>
> > > > FYI, I read this page: https://gcc.gnu.org/wiki/OptimizationCourse
> > > >
> > > > > Our VN pass has some tricks to anticipate CSE opportunities
> > > > > like this, but it's not done "properly" in the way of anticipating
> > > > > both forms during PRE.
> > > > >
> > > > > I'll note we have
> > > > >
> > > > >  /* (A - B) != 0 ? (A - B) : (B - A)    same as (A - B) */
> > > > >  (for cmp (ne ltgt)
> > > > >
> > > > > and similar which might be confused by canonicalizing to A != B.
> > > >
> > > > I will investigate and update my patch (after my final exam ends...)!
> > > >
> > > > > I'm also surprised we don't already have the pattern you add.
> > > >
> > > > Hmm, so am I.
> > >
> > > It looks like we do it for equality compares which can also handle
> > > types where overflow is undefined.  -fdump-tree-all-folding shows
> > > in the 005t.original dump we simplify a - b != 0 via match.pd:6105:
> > >
> > > /* Transform comparisons of the form X - Y CMP 0 to X CMP Y.
> > >    ??? The transformation is valid for the other operators if overflow
> > >    is undefined for the type, but performing it here badly interacts
> > >    with the transformation in fold_cond_expr_with_comparison which
> > >    attempts to synthetize ABS_EXPR.  */
> > > (for cmp (eq ne)
> > >  (for sub (minus pointer_diff)
> > >   (simplify
> > >    (cmp (sub@2 @0 @1) integer_zerop)
> > >    (if (single_use (@2))
> > >     (cmp @0 @1)))))
> > >
> > > and the comment explains why we don't perform the transform
> > > it the way you intend.  That comment might no longer apply
> > > but I would guess there's testsuite fallout when you amend this
> > > pattern.
> >
> > Thank you so much!  I will check this out.
> >
> > >
> > > Richard.
> > >
> > > >  I saw that this optimization sometimes messes up VRP
> > > > while sometimes helping it (as I described in my comment).  I also
> > > > need to research this.
> > > >
> > > > >
> > > > > Richard.
> > > > >
> > > > > > gcc/ChangeLog:
> > > > > >
> > > > > >         * Makefile.in: Add tree-ssa-cmp.o to OBJS.
> > > > > >         * common.opt: Define ftree-cmp
> > > > > >         * doc/invoke.texi: Document ftree-cmp.
> > > > > >         * opts.cc (default_options_table): Handle OPT_ftree_cmp.
> > > > > >         * passes.def (pass_cmp): New optimization pass.
> > > > > >         * timevar.def (TV_TREE_CMP): New variable for timing.
> > > > > >         * tree-pass.h (make_pass_cmp): New declaration.
> > > > > >         * tree-ssa-cmp.cc: New file.
> > > > > >
> > > > > > gcc/testsuite/ChangeLog:
> > > > > >
> > > > > >         * gcc.dg/pr113680.c: New test.
> > > > > >
> > > > > > Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>
> > > > > > ---
> > > > > >  gcc/Makefile.in                 |   1 +
> > > > > >  gcc/common.opt                  |   4 +
> > > > > >  gcc/doc/invoke.texi             |  11 +-
> > > > > >  gcc/opts.cc                     |   1 +
> > > > > >  gcc/passes.def                  |   3 +
> > > > > >  gcc/testsuite/gcc.dg/pr113680.c |  47 ++++++
> > > > > >  gcc/timevar.def                 |   1 +
> > > > > >  gcc/tree-pass.h                 |   1 +
> > > > > >  gcc/tree-ssa-cmp.cc             | 262 ++++++++++++++++++++++++++++++++
> > > > > >  9 files changed, 330 insertions(+), 1 deletion(-)
> > > > > >  create mode 100644 gcc/testsuite/gcc.dg/pr113680.c
> > > > > >  create mode 100644 gcc/tree-ssa-cmp.cc
> > > > > >
> > > > > > diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> > > > > > index a74761b7ab3..935b80b6947 100644
> > > > > > --- a/gcc/Makefile.in
> > > > > > +++ b/gcc/Makefile.in
> > > > > > @@ -1731,6 +1731,7 @@ OBJS = \
> > > > > >         tree-ssa-address.o \
> > > > > >         tree-ssa-alias.o \
> > > > > >         tree-ssa-ccp.o \
> > > > > > +       tree-ssa-cmp.o \
> > > > > >         tree-ssa-coalesce.o \
> > > > > >         tree-ssa-copy.o \
> > > > > >         tree-ssa-dce.o \
> > > > > > diff --git a/gcc/common.opt b/gcc/common.opt
> > > > > > index 51c4a17da83..7c853224458 100644
> > > > > > --- a/gcc/common.opt
> > > > > > +++ b/gcc/common.opt
> > > > > > @@ -3053,6 +3053,10 @@ ftree-ch
> > > > > >  Common Var(flag_tree_ch) Optimization
> > > > > >  Enable loop header copying on trees.
> > > > > >
> > > > > > +ftree-cmp
> > > > > > +Common Var(flag_tree_cmp) Optimization
> > > > > > +Enable SSA comparison optimization on trees.
> > > > > > +
> > > > > >  ftree-coalesce-inlined-vars
> > > > > >  Common Ignore RejectNegative
> > > > > >  Does nothing.  Preserved for backward compatibility.
> > > > > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > > > > > index bdf05be387d..04762d490a3 100644
> > > > > > --- a/gcc/doc/invoke.texi
> > > > > > +++ b/gcc/doc/invoke.texi
> > > > > > @@ -619,7 +619,7 @@ Objective-C and Objective-C++ Dialects}.
> > > > > >  -fsplit-wide-types  -fsplit-wide-types-early  -fssa-backprop  -fssa-phiopt
> > > > > >  -fstdarg-opt  -fstore-merging  -fstrict-aliasing -fipa-strict-aliasing
> > > > > >  -fthread-jumps  -ftracer  -ftree-bit-ccp
> > > > > > --ftree-builtin-call-dce  -ftree-ccp  -ftree-ch
> > > > > > +-ftree-builtin-call-dce  -ftree-ccp  -ftree-ch  -ftree-cmp
> > > > > >  -ftree-coalesce-vars  -ftree-copy-prop  -ftree-dce  -ftree-dominator-opts
> > > > > >  -ftree-dse  -ftree-forwprop  -ftree-fre  -fcode-hoisting
> > > > > >  -ftree-loop-if-convert  -ftree-loop-im
> > > > > > @@ -12377,6 +12377,7 @@ compilation time.
> > > > > >  -ftree-bit-ccp
> > > > > >  -ftree-ccp
> > > > > >  -ftree-ch
> > > > > > +-ftree-cmp
> > > > > >  -ftree-coalesce-vars
> > > > > >  -ftree-copy-prop
> > > > > >  -ftree-dce
> > > > > > @@ -13707,6 +13708,14 @@ effectiveness of code motion optimizations.  It also saves one jump.  This flag
> > > > > >  is enabled by default at @option{-O1} and higher.  It is not enabled
> > > > > >  for @option{-Os}, since it usually increases code size.
> > > > > >
> > > > > > +@opindex ftree-cmp
> > > > > > +@item -ftree-cmp
> > > > > > +Perform comparison optimization on trees.  This decreases unnecessary
> > > > > > +instructions for comparisons.  This flag is enabled by default at
> > > > > > +@option{-O1} and higher.  This optimization is automatically disabled when
> > > > > > +@option{-fwrapv} is specified to ensure correct behavior under signed integer
> > > > > > +overflow wrapping.
> > > > > > +
> > > > > >  @opindex ftree-loop-optimize
> > > > > >  @item -ftree-loop-optimize
> > > > > >  Perform loop optimizations on trees.  This flag is enabled by default
> > > > > > diff --git a/gcc/opts.cc b/gcc/opts.cc
> > > > > > index 3333600e0ea..9ee64a48313 100644
> > > > > > --- a/gcc/opts.cc
> > > > > > +++ b/gcc/opts.cc
> > > > > > @@ -589,6 +589,7 @@ static const struct default_options default_options_table[] =
> > > > > >      { OPT_LEVELS_1_PLUS, OPT_ftree_builtin_call_dce, NULL, 1 },
> > > > > >      { OPT_LEVELS_1_PLUS, OPT_ftree_ccp, NULL, 1 },
> > > > > >      { OPT_LEVELS_1_PLUS, OPT_ftree_ch, NULL, 1 },
> > > > > > +    { OPT_LEVELS_1_PLUS, OPT_ftree_cmp, NULL, 1 },
> > > > > >      { OPT_LEVELS_1_PLUS, OPT_ftree_coalesce_vars, NULL, 1 },
> > > > > >      { OPT_LEVELS_1_PLUS, OPT_ftree_copy_prop, NULL, 1 },
> > > > > >      { OPT_LEVELS_1_PLUS, OPT_ftree_dce, NULL, 1 },
> > > > > > diff --git a/gcc/passes.def b/gcc/passes.def
> > > > > > index 1cbbd413097..e28a2d3c309 100644
> > > > > > --- a/gcc/passes.def
> > > > > > +++ b/gcc/passes.def
> > > > > > @@ -84,6 +84,9 @@ along with GCC; see the file COPYING3.  If not see
> > > > > >           /* After CCP we rewrite no longer addressed locals into SSA
> > > > > >              form if possible.  */
> > > > > >           NEXT_PASS (pass_forwprop);
> > > > > > +         /* Do cmp before early_thread_jumps and vrp since it may
> > > > > > +            modify conditions.  */
> > > > > > +         NEXT_PASS (pass_cmp);
> > > > > >            NEXT_PASS (pass_early_thread_jumps, /*first=*/true);
> > > > > >           NEXT_PASS (pass_sra_early);
> > > > > >           /* pass_build_ealias is a dummy pass that ensures that we
> > > > > > diff --git a/gcc/testsuite/gcc.dg/pr113680.c b/gcc/testsuite/gcc.dg/pr113680.c
> > > > > > new file mode 100644
> > > > > > index 00000000000..59cd865aa21
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.dg/pr113680.c
> > > > > > @@ -0,0 +1,47 @@
> > > > > > +/* { dg-do compile } */
> > > > > > +/* { dg-options "-Os -fdump-tree-optimized" } */
> > > > > > +
> > > > > > +volatile int i = 0;
> > > > > > +
> > > > > > +void foo() { i = 1; }
> > > > > > +void bar() { i = 2; }
> > > > > > +
> > > > > > +void f1(int x, int y)
> > > > > > +{
> > > > > > +  int diff = x - y;
> > > > > > +  if (diff > 0)
> > > > > > +    foo();
> > > > > > +  if (diff < 0)
> > > > > > +    bar();
> > > > > > +}
> > > > > > +
> > > > > > +void f2(int x, int y)
> > > > > > +{
> > > > > > +  if ((x - y) > 0)
> > > > > > +    foo();
> > > > > > +  if ((x - y) < 0)
> > > > > > +    bar();
> > > > > > +}
> > > > > > +
> > > > > > +void f3(int x, int y)
> > > > > > +{
> > > > > > +  if (x > y)
> > > > > > +    foo();
> > > > > > +  if (x < y)
> > > > > > +    bar();
> > > > > > +}
> > > > > > +
> > > > > > +void f4(int x, int y)
> > > > > > +{
> > > > > > +  int diff = x - y;
> > > > > > +  if (diff > 0)
> > > > > > +    foo();
> > > > > > +  if (x < y)
> > > > > > +    bar();
> > > > > > +}
> > > > > > +
> > > > > > +/* { dg-final { scan-tree-dump-not " - " "optimized" } } */
> > > > > > +/* { dg-final { scan-assembler-times "cmp" 1 { target { i?86-*-* x86_64-*-* } } } } */
> > > > > > +/* { dg-final { scan-assembler-times "jmp\[ \t\]+f1" 3 { target { i?86-*-* x86_64-*-* } } } } */
> > > > > > +/* { dg-final { scan-assembler-not "sub" { target { i?86-*-* x86_64-*-* } } } } */
> > > > > > +/* { dg-final { scan-assembler-not "test" { target { i?86-*-* x86_64-*-* } } } } */
> > > > > > diff --git a/gcc/timevar.def b/gcc/timevar.def
> > > > > > index 8e2168e0817..468266d0c56 100644
> > > > > > --- a/gcc/timevar.def
> > > > > > +++ b/gcc/timevar.def
> > > > > > @@ -157,6 +157,7 @@ DEFTIMEVAR (TV_TREE_EH                   , "tree eh")
> > > > > >  DEFTIMEVAR (TV_TREE_CFG                     , "tree CFG construction")
> > > > > >  DEFTIMEVAR (TV_TREE_CLEANUP_CFG             , "tree CFG cleanup")
> > > > > >  DEFTIMEVAR (TV_TREE_TAIL_MERGE       , "tree tail merge")
> > > > > > +DEFTIMEVAR (TV_TREE_CMP                     , "tree cmp")
> > > > > >  DEFTIMEVAR (TV_TREE_VRP              , "tree VRP")
> > > > > >  DEFTIMEVAR (TV_TREE_VRP_THREADER     , "tree VRP threader")
> > > > > >  DEFTIMEVAR (TV_TREE_EARLY_VRP        , "tree Early VRP")
> > > > > > diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> > > > > > index 29267589eeb..71789a279c0 100644
> > > > > > --- a/gcc/tree-pass.h
> > > > > > +++ b/gcc/tree-pass.h
> > > > > > @@ -401,6 +401,7 @@ extern gimple_opt_pass *make_pass_ch (gcc::context *ctxt);
> > > > > >  extern gimple_opt_pass *make_pass_ch_vect (gcc::context *ctxt);
> > > > > >  extern gimple_opt_pass *make_pass_sccopy (gcc::context *ctxt);
> > > > > >  extern gimple_opt_pass *make_pass_ccp (gcc::context *ctxt);
> > > > > > +extern gimple_opt_pass *make_pass_cmp (gcc::context *ctxt);
> > > > > >  extern gimple_opt_pass *make_pass_split_paths (gcc::context *ctxt);
> > > > > >  extern gimple_opt_pass *make_pass_build_ssa (gcc::context *ctxt);
> > > > > >  extern gimple_opt_pass *make_pass_build_alias (gcc::context *ctxt);
> > > > > > diff --git a/gcc/tree-ssa-cmp.cc b/gcc/tree-ssa-cmp.cc
> > > > > > new file mode 100644
> > > > > > index 00000000000..41d4a58a5b5
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/tree-ssa-cmp.cc
> > > > > > @@ -0,0 +1,262 @@
> > > > > > +/* Global, SSA-based optimizations using comparison identities.
> > > > > > +   Copyright (C) 2024 Free Software Foundation, Inc.
> > > > > > +
> > > > > > +This file is part of GCC.
> > > > > > +
> > > > > > +GCC is free software; you can redistribute it and/or modify it
> > > > > > +under the terms of the GNU General Public License as published by the
> > > > > > +Free Software Foundation; either version 3, or (at your option) any
> > > > > > +later version.
> > > > > > +
> > > > > > +GCC is distributed in the hope that it will be useful, but WITHOUT
> > > > > > +ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> > > > > > +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> > > > > > +for more details.
> > > > > > +
> > > > > > +You should have received a copy of the GNU General Public License
> > > > > > +along with GCC; see the file COPYING3.  If not see
> > > > > > +<http://www.gnu.org/licenses/>.  */
> > > > > > +
> > > > > > +/* (x - y) CMP 0 is equivalent to x CMP y where x and y are signed integers and
> > > > > > +   CMP is <, <=, >, or >=.  Similarly, 0 CMP (x - y) is equivalent to y CMP x.
> > > > > > +   As reported in PR middle-end/113680, this equivalence does not hold for
> > > > > > +   types other than signed integers.  When it comes to conditions, the former
> > > > > > +   was translated to a combination of sub and test, whereas the latter was
> > > > > > +   translated to a single cmp.  Thus, this optimization pass tries to optimize
> > > > > > +   the former to the latter.
> > > > > > +
> > > > > > +   When `-fwrapv` is enabled, GCC treats the overflow of signed integers as
> > > > > > +   defined behavior, specifically, wrapping around according to two's
> > > > > > +   complement arithmetic.  This has implications for optimizations that
> > > > > > +   rely on the standard behavior of signed integers, where overflow is
> > > > > > +   undefined.  Consider the example given:
> > > > > > +
> > > > > > +       long long llmax = __LONG_LONG_MAX__;
> > > > > > +       long long llmin = -llmax - 1;
> > > > > > +
> > > > > > +   Here, `llmax - llmin` effectively becomes `llmax - (-llmax - 1)`, which
> > > > > > +   simplifies to `2 * llmax + 1`.  Given that `llmax` is the maximum value for
> > > > > > +   a `long long`, this calculation overflows in a defined manner
> > > > > > +   (wrapping around), which under `-fwrapv` is a legal operation that produces
> > > > > > +   a negative value due to two's complement wraparound.  Therefore,
> > > > > > +   `llmax - llmin < 0` is true.
> > > > > > +
> > > > > > +   However, the direct comparison `llmax < llmin` is false since `llmax` is the
> > > > > > +   maximum possible value and `llmin` is the minimum.  Hence, optimizations
> > > > > > +   that rely on the equivalence of `(x - y) CMP 0` to `x CMP y` (and vice versa)
> > > > > > +   cannot be safely applied when `-fwrapv` is enabled.  This is why this
> > > > > > +   optimization pass is disabled under `-fwrapv`.
> > > > > > +
> > > > > > +   This optimization pass must run before the Jump Threading pass and
> > > > > > +   the VRP pass, as it may modify conditions. For example, in the VRP pass:
> > > > > > +
> > > > > > +       (1)
> > > > > > +         int diff = x - y;
> > > > > > +         if (diff > 0)
> > > > > > +           foo();
> > > > > > +         if (diff < 0)
> > > > > > +           bar();
> > > > > > +
> > > > > > +   The second condition would be converted to diff != 0 in the VRP pass because
> > > > > > +   we know the postcondition of the first condition is diff <= 0, and then
> > > > > > +   diff != 0 is cheaper than diff < 0. If we apply this pass after this VRP,
> > > > > > +   we get:
> > > > > > +
> > > > > > +       (2)
> > > > > > +         int diff = x - y;
> > > > > > +         if (x > y)
> > > > > > +           foo();
> > > > > > +         if (diff != 0)
> > > > > > +           bar();
> > > > > > +
> > > > > > +   This generates sub and test for the second condition and cmp for the first
> > > > > > +   condition. However, if we apply this pass beforehand, we simply get:
> > > > > > +
> > > > > > +       (3)
> > > > > > +         int diff = x - y;
> > > > > > +         if (x > y)
> > > > > > +           foo();
> > > > > > +         if (x < y)
> > > > > > +           bar();
> > > > > > +
> > > > > > +   In this code, diff will be eliminated as a dead code, and sub and test will
> > > > > > +   not be generated, which is more efficient.
> > > > > > +
> > > > > > +   For the Jump Threading pass, without this optimization pass, (1) and (3)
> > > > > > +   above are recognized as different, which prevents TCO.  */
> > > > > > +
> > > > > > +#include "config.h"
> > > > > > +#include "system.h"
> > > > > > +#include "coretypes.h"
> > > > > > +#include "backend.h"
> > > > > > +#include "tree.h"
> > > > > > +#include "gimple.h"
> > > > > > +#include "tree-pass.h"
> > > > > > +#include "ssa.h"
> > > > > > +#include "gimple-iterator.h"
> > > > > > +#include "domwalk.h"
> > > > > > +
> > > > > > +/* True if VAR is a signed integer, false otherwise.  */
> > > > > > +
> > > > > > +static bool
> > > > > > +is_signed_integer(const_tree var)
> > > > > > +{
> > > > > > +  return (TREE_CODE (TREE_TYPE (var)) == INTEGER_TYPE
> > > > > > +         && !TYPE_UNSIGNED (TREE_TYPE (var)));
> > > > > > +}
> > > > > > +
> > > > > > +/* If EXPR is an integer zero, return it.  If EXPR is SSA_NAME, and its
> > > > > > +   defining statement is a subtraction of two signed integers, return the
> > > > > > +   MINUS_EXPR.  Otherwise, return NULL_TREE.  */
> > > > > > +
> > > > > > +static const_tree
> > > > > > +prop_integer_zero_or_minus_expr (const_tree expr)
> > > > > > +{
> > > > > > +  if (integer_zerop (expr))
> > > > > > +    return expr;
> > > > > > +
> > > > > > +  if (TREE_CODE (expr) != SSA_NAME)
> > > > > > +    return NULL_TREE;
> > > > > > +
> > > > > > +  gimple *defining_stmt = SSA_NAME_DEF_STMT (expr);
> > > > > > +  if (!is_gimple_assign (defining_stmt)
> > > > > > +      || gimple_assign_rhs_code (defining_stmt) != MINUS_EXPR)
> > > > > > +    return NULL_TREE;
> > > > > > +
> > > > > > +  tree rhs1 = gimple_assign_rhs1 (defining_stmt);
> > > > > > +  if (!is_signed_integer (rhs1))
> > > > > > +    return NULL_TREE;
> > > > > > +
> > > > > > +  tree rhs2 = gimple_assign_rhs2 (defining_stmt);
> > > > > > +  if (!is_signed_integer (rhs2))
> > > > > > +    return NULL_TREE;
> > > > > > +
> > > > > > +  return build2 (MINUS_EXPR, TREE_TYPE (expr), rhs1, rhs2);
> > > > > > +}
> > > > > > +
> > > > > > +/* Optimize:
> > > > > > +
> > > > > > +       1. (x - y) CMP 0
> > > > > > +       2. 0 CMP (x - y)
> > > > > > +
> > > > > > +   to:
> > > > > > +
> > > > > > +       1. x CMP y
> > > > > > +       2. y CMP x
> > > > > > +
> > > > > > +   where CMP is either <, <=, >, or >=.  */
> > > > > > +
> > > > > > +static void
> > > > > > +optimize_signed_comparison (gcond *stmt)
> > > > > > +{
> > > > > > +  const enum tree_code code = gimple_cond_code (stmt);
> > > > > > +  if (code < LT_EXPR || GE_EXPR < code)
> > > > > > +    return;
> > > > > > +
> > > > > > +  const_tree lhs = prop_integer_zero_or_minus_expr (gimple_cond_lhs (stmt));
> > > > > > +  if (lhs == NULL_TREE)
> > > > > > +    return;
> > > > > > +
> > > > > > +  const_tree rhs = prop_integer_zero_or_minus_expr (gimple_cond_rhs (stmt));
> > > > > > +  if (rhs == NULL_TREE)
> > > > > > +    return;
> > > > > > +
> > > > > > +  if (TREE_CODE (lhs) == MINUS_EXPR && integer_zerop (rhs))
> > > > > > +    {
> > > > > > +      /* Case 1: (x - y) CMP 0  */
> > > > > > +      tree x = TREE_OPERAND (lhs, 0);
> > > > > > +      tree y = TREE_OPERAND (lhs, 1);
> > > > > > +
> > > > > > +      /* => x CMP y  */
> > > > > > +      gimple_cond_set_lhs (stmt, x);
> > > > > > +      gimple_cond_set_rhs (stmt, y);
> > > > > > +      update_stmt (stmt);
> > > > > > +    }
> > > > > > +  else if (integer_zerop (lhs) && TREE_CODE (rhs) == MINUS_EXPR)
> > > > > > +    {
> > > > > > +      /* Case 2: 0 CMP (x - y)  */
> > > > > > +      tree x = TREE_OPERAND (rhs, 0);
> > > > > > +      tree y = TREE_OPERAND (rhs, 1);
> > > > > > +
> > > > > > +      /* => y CMP x  */
> > > > > > +      gimple_cond_set_lhs (stmt, y);
> > > > > > +      gimple_cond_set_rhs (stmt, x);
> > > > > > +      update_stmt (stmt);
> > > > > > +    }
> > > > > > +}
> > > > > > +
> > > > > > +/* Find signed integer comparisons where the operands are MINUS_EXPR and
> > > > > > +   0, and replace them with the equivalent comparison of the operands of
> > > > > > +   the MINUS_EXPR without the subtraction.  */
> > > > > > +
> > > > > > +namespace {
> > > > > > +
> > > > > > +const pass_data pass_data_cmp =
> > > > > > +{
> > > > > > +  GIMPLE_PASS, /* type */
> > > > > > +  "cmp", /* name */
> > > > > > +  OPTGROUP_NONE, /* optinfo_flags */
> > > > > > +  TV_TREE_CMP, /* tv_id */
> > > > > > +  PROP_ssa, /* properties_required */
> > > > > > +  0, /* properties_provided */
> > > > > > +  0, /* properties_destroyed */
> > > > > > +  0, /* todo_flags_start */
> > > > > > +  TODO_remove_unused_locals, /* todo_flags_finish */
> > > > > > +};
> > > > > > +
> > > > > > +class pass_cmp : public gimple_opt_pass
> > > > > > +{
> > > > > > +public:
> > > > > > +  pass_cmp (gcc::context *ctxt)
> > > > > > +    : gimple_opt_pass (pass_data_cmp, ctxt)
> > > > > > +  {}
> > > > > > +
> > > > > > +  /* opt_pass methods: */
> > > > > > +  bool gate (function *) final override
> > > > > > +    {
> > > > > > +      /* If -fwrapv is enabled, this pass cannot be safely applied as described
> > > > > > +        in the comment at the top of this file.  */
> > > > > > +      return flag_tree_cmp != 0 && flag_wrapv == 0;
> > > > > > +    }
> > > > > > +
> > > > > > +  unsigned int execute (function *) final override;
> > > > > > +
> > > > > > +}; // class pass_cmp
> > > > > > +
> > > > > > +class cmp_dom_walker : public dom_walker
> > > > > > +{
> > > > > > +public:
> > > > > > +  cmp_dom_walker () : dom_walker (CDI_DOMINATORS) {}
> > > > > > +
> > > > > > +  void after_dom_children (basic_block) final override;
> > > > > > +};
> > > > > > +
> > > > > > +void
> > > > > > +cmp_dom_walker::after_dom_children (basic_block bb)
> > > > > > +{
> > > > > > +  gimple_stmt_iterator gsi;
> > > > > > +
> > > > > > +  for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > > > > > +    {
> > > > > > +      gimple *stmt = gsi_stmt (gsi);
> > > > > > +      if (gimple_code (stmt) == GIMPLE_COND)
> > > > > > +       optimize_signed_comparison (as_a <gcond *> (stmt));
> > > > > > +    }
> > > > > > +}
> > > > > > +
> > > > > > +
> > > > > > +unsigned int
> > > > > > +pass_cmp::execute (function *)
> > > > > > +{
> > > > > > +  calculate_dominance_info (CDI_DOMINATORS);
> > > > > > +  cmp_dom_walker ().walk (ENTRY_BLOCK_PTR_FOR_FN (cfun));
> > > > > > +  return 0;
> > > > > > +}
> > > > > > +
> > > > > > +} // anon namespace
> > > > > > +
> > > > > > +gimple_opt_pass *
> > > > > > +make_pass_cmp (gcc::context *ctxt)
> > > > > > +{
> > > > > > +  return new pass_cmp (ctxt);
> > > > > > +}
> > > > > > --
> > > > > > 2.44.0
> > > > > >

Richard Biener March 11, 2024, 7:22 a.m. UTC | #7

On Fri, Mar 8, 2024 at 6:50 PM Ken Matsui <kmatsui@cs.washington.edu> wrote:
>
> On Thu, Mar 7, 2024 at 10:49 PM Richard Biener
> <richard.guenther@gmail.com> wrote:
> >
> > On Thu, Mar 7, 2024 at 8:29 PM Ken Matsui <kmatsui@cs.washington.edu> wrote:
> > >
> > > On Tue, Mar 5, 2024 at 7:58 AM Richard Biener
> > > <richard.guenther@gmail.com> wrote:
> > > >
> > > > On Tue, Mar 5, 2024 at 1:51 PM Ken Matsui <kmatsui@cs.washington.edu> wrote:
> > > > >
> > > > > On Tue, Mar 5, 2024 at 12:38 AM Richard Biener
> > > > > <richard.guenther@gmail.com> wrote:
> > > > > >
> > > > > > On Mon, Mar 4, 2024 at 9:40 PM Ken Matsui <kmatsui@gcc.gnu.org> wrote:
> > > > > > >
> > > > > > > (x - y) CMP 0 is equivalent to x CMP y where x and y are signed
> > > > > > > integers and CMP is <, <=, >, or >=.  Similarly, 0 CMP (x - y) is
> > > > > > > equivalent to y CMP x.  As reported in PR middle-end/113680, this
> > > > > > > equivalence does not hold for types other than signed integers.  When
> > > > > > > it comes to conditions, the former was translated to a combination of
> > > > > > > sub and test, whereas the latter was translated to a single cmp.
> > > > > > > Thus, this optimization pass tries to optimize the former to the
> > > > > > > latter.
> > > > > > >
> > > > > > > When `-fwrapv` is enabled, GCC treats the overflow of signed integers
> > > > > > > as defined behavior, specifically, wrapping around according to two's
> > > > > > > complement arithmetic.  This has implications for optimizations that
> > > > > > > rely on the standard behavior of signed integers, where overflow is
> > > > > > > undefined.  Consider the example given:
> > > > > > >
> > > > > > >         long long llmax = __LONG_LONG_MAX__;
> > > > > > >         long long llmin = -llmax - 1;
> > > > > > >
> > > > > > > Here, `llmax - llmin` effectively becomes `llmax - (-llmax - 1)`, which
> > > > > > > simplifies to `2 * llmax + 1`.  Given that `llmax` is the maximum value
> > > > > > > for a `long long`, this calculation overflows in a defined manner
> > > > > > > (wrapping around), which under `-fwrapv` is a legal operation that
> > > > > > > produces a negative value due to two's complement wraparound.
> > > > > > > Therefore, `llmax - llmin < 0` is true.
> > > > > > >
> > > > > > > However, the direct comparison `llmax < llmin` is false since `llmax`
> > > > > > > is the maximum possible value and `llmin` is the minimum.  Hence,
> > > > > > > optimizations that rely on the equivalence of `(x - y) CMP 0` to
> > > > > > > `x CMP y` (and vice versa) cannot be safely applied when `-fwrapv` is
> > > > > > > enabled.  This is why this optimization pass is disabled under
> > > > > > > `-fwrapv`.
> > > > > > >
> > > > > > > This optimization pass must run before the Jump Threading pass and the
> > > > > > > VRP pass, as it may modify conditions. For example, in the VRP pass:
> > > > > > >
> > > > > > >         (1)
> > > > > > >           int diff = x - y;
> > > > > > >           if (diff > 0)
> > > > > > >             foo();
> > > > > > >           if (diff < 0)
> > > > > > >             bar();
> > > > > > >
> > > > > > > The second condition would be converted to diff != 0 in the VRP pass
> > > > > > > because we know the postcondition of the first condition is diff <= 0,
> > > > > > > and then diff != 0 is cheaper than diff < 0. If we apply this pass
> > > > > > > after this VRP, we get:
> > > > > > >
> > > > > > >         (2)
> > > > > > >           int diff = x - y;
> > > > > > >           if (x > y)
> > > > > > >             foo();
> > > > > > >           if (diff != 0)
> > > > > > >             bar();
> > > > > > >
> > > > > > > This generates sub and test for the second condition and cmp for the
> > > > > > > first condition. However, if we apply this pass beforehand, we simply
> > > > > > > get:
> > > > > > >
> > > > > > >         (3)
> > > > > > >           int diff = x - y;
> > > > > > >           if (x > y)
> > > > > > >             foo();
> > > > > > >           if (x < y)
> > > > > > >             bar();
> > > > > > >
> > > > > > > In this code, diff will be eliminated as a dead code, and sub and test
> > > > > > > will not be generated, which is more efficient.
> > > > > > >
> > > > > > > For the Jump Threading pass, without this optimization pass, (1) and
> > > > > > > (3) above are recognized as different, which prevents TCO.
> > > > > > >
> > > > > > >         PR middle-end/113680
> > > > > >
> > > > > > This shouldn't be done as a new optimization pass.  It fits either
> > > > > > the explicit code present in the forwprop pass or a new match.pd
> > > > > > pattern.  There's possible interaction with x - y value being used
> > > > > > elsewhere and thus exposing a CSE opportunity as well as
> > > > > > a comparison against zero being possibly implemented by
> > > > > > a flag setting subtraction instruction.
> > > > > >
> > > > >
> > > > > Thank you so much for your review!  Although the forwprop pass runs
> > > > > multiple times, we might not need to apply this optimization multiple
> > > > > times.  Would it be acceptable to add such optimization?  More
> > > > > generally, I would like to know how to determine where to put
> > > > > optimization in the future.
> > > >
> > > > This kind of pattern matching expression simplification is best
> > > > addressed by patterns in match.pd though historically the forwprop
> > > > pass still catches cases not in match.pd in its
> > > > forward_propagate_into_comparison_1 (and callers).
> > > >
> > >
> > > I see.  When would patterns in match.pd be applied?  Around forwprop
> > > or somewhere else?  (Also, could you please tell me a document I can
> > > learn about these if it exists?)  I ask because this optimization
> > > should be applied before the Jump Threading and VRP passes.
> >
> > match.pd patterns get used whenever an expression is simplified.  And
> > yes, the forwprop pass tries to simplify expressions rooted at each stmt.
> > In general each pass changing a statement or one of its operands should
> > re-simplify the expression rooted at it, so the theory is that all expressions
> > always remain simplified.
> >
> > There's no documentation about the patterns present in match.pd if
> > you meant that.
>
> What I meant was the high-level structure around match.pd and other
> passes, but it is answered!  That makes sense.  Thank you for your
> time!

In theory there's doc/passes.texi but we do a very bad job at keeping
that up-to-date.

> > I'm usually writing small testcases to "query" match.pd
> > and use -fdump-tree-all-folding looking into .original (when folded at
> > generic time), .gimple, .ccp and .forwprop which are the usual early
> > places where simplifications are triggered.
>
> This is really helpful, thank you!
>
> >
> > Richard.
> >
> > > > > FYI, I read this page: https://gcc.gnu.org/wiki/OptimizationCourse
> > > > >
> > > > > > Our VN pass has some tricks to anticipate CSE opportunities
> > > > > > like this, but it's not done "properly" in the way of anticipating
> > > > > > both forms during PRE.
> > > > > >
> > > > > > I'll note we have
> > > > > >
> > > > > >  /* (A - B) != 0 ? (A - B) : (B - A)    same as (A - B) */
> > > > > >  (for cmp (ne ltgt)
> > > > > >
> > > > > > and similar which might be confused by canonicalizing to A != B.
> > > > >
> > > > > I will investigate and update my patch (after my final exam ends...)!
> > > > >
> > > > > > I'm also surprised we don't already have the pattern you add.
> > > > >
> > > > > Hmm, so am I.
> > > >
> > > > It looks like we do it for equality compares which can also handle
> > > > types where overflow is undefined.  -fdump-tree-all-folding shows
> > > > in the 005t.original dump we simplify a - b != 0 via match.pd:6105:
> > > >
> > > > /* Transform comparisons of the form X - Y CMP 0 to X CMP Y.
> > > >    ??? The transformation is valid for the other operators if overflow
> > > >    is undefined for the type, but performing it here badly interacts
> > > >    with the transformation in fold_cond_expr_with_comparison which
> > > >    attempts to synthetize ABS_EXPR.  */
> > > > (for cmp (eq ne)
> > > >  (for sub (minus pointer_diff)
> > > >   (simplify
> > > >    (cmp (sub@2 @0 @1) integer_zerop)
> > > >    (if (single_use (@2))
> > > >     (cmp @0 @1)))))
> > > >
> > > > and the comment explains why we don't perform the transform
> > > > it the way you intend.  That comment might no longer apply
> > > > but I would guess there's testsuite fallout when you amend this
> > > > pattern.
> > >
> > > Thank you so much!  I will check this out.
> > >
> > > >
> > > > Richard.
> > > >
> > > > >  I saw that this optimization sometimes messes up VRP
> > > > > while sometimes helping it (as I described in my comment).  I also
> > > > > need to research this.
> > > > >
> > > > > >
> > > > > > Richard.
> > > > > >
> > > > > > > gcc/ChangeLog:
> > > > > > >
> > > > > > >         * Makefile.in: Add tree-ssa-cmp.o to OBJS.
> > > > > > >         * common.opt: Define ftree-cmp
> > > > > > >         * doc/invoke.texi: Document ftree-cmp.
> > > > > > >         * opts.cc (default_options_table): Handle OPT_ftree_cmp.
> > > > > > >         * passes.def (pass_cmp): New optimization pass.
> > > > > > >         * timevar.def (TV_TREE_CMP): New variable for timing.
> > > > > > >         * tree-pass.h (make_pass_cmp): New declaration.
> > > > > > >         * tree-ssa-cmp.cc: New file.
> > > > > > >
> > > > > > > gcc/testsuite/ChangeLog:
> > > > > > >
> > > > > > >         * gcc.dg/pr113680.c: New test.
> > > > > > >
> > > > > > > Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>
> > > > > > > ---
> > > > > > >  gcc/Makefile.in                 |   1 +
> > > > > > >  gcc/common.opt                  |   4 +
> > > > > > >  gcc/doc/invoke.texi             |  11 +-
> > > > > > >  gcc/opts.cc                     |   1 +
> > > > > > >  gcc/passes.def                  |   3 +
> > > > > > >  gcc/testsuite/gcc.dg/pr113680.c |  47 ++++++
> > > > > > >  gcc/timevar.def                 |   1 +
> > > > > > >  gcc/tree-pass.h                 |   1 +
> > > > > > >  gcc/tree-ssa-cmp.cc             | 262 ++++++++++++++++++++++++++++++++
> > > > > > >  9 files changed, 330 insertions(+), 1 deletion(-)
> > > > > > >  create mode 100644 gcc/testsuite/gcc.dg/pr113680.c
> > > > > > >  create mode 100644 gcc/tree-ssa-cmp.cc
> > > > > > >
> > > > > > > diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> > > > > > > index a74761b7ab3..935b80b6947 100644
> > > > > > > --- a/gcc/Makefile.in
> > > > > > > +++ b/gcc/Makefile.in
> > > > > > > @@ -1731,6 +1731,7 @@ OBJS = \
> > > > > > >         tree-ssa-address.o \
> > > > > > >         tree-ssa-alias.o \
> > > > > > >         tree-ssa-ccp.o \
> > > > > > > +       tree-ssa-cmp.o \
> > > > > > >         tree-ssa-coalesce.o \
> > > > > > >         tree-ssa-copy.o \
> > > > > > >         tree-ssa-dce.o \
> > > > > > > diff --git a/gcc/common.opt b/gcc/common.opt
> > > > > > > index 51c4a17da83..7c853224458 100644
> > > > > > > --- a/gcc/common.opt
> > > > > > > +++ b/gcc/common.opt
> > > > > > > @@ -3053,6 +3053,10 @@ ftree-ch
> > > > > > >  Common Var(flag_tree_ch) Optimization
> > > > > > >  Enable loop header copying on trees.
> > > > > > >
> > > > > > > +ftree-cmp
> > > > > > > +Common Var(flag_tree_cmp) Optimization
> > > > > > > +Enable SSA comparison optimization on trees.
> > > > > > > +
> > > > > > >  ftree-coalesce-inlined-vars
> > > > > > >  Common Ignore RejectNegative
> > > > > > >  Does nothing.  Preserved for backward compatibility.
> > > > > > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > > > > > > index bdf05be387d..04762d490a3 100644
> > > > > > > --- a/gcc/doc/invoke.texi
> > > > > > > +++ b/gcc/doc/invoke.texi
> > > > > > > @@ -619,7 +619,7 @@ Objective-C and Objective-C++ Dialects}.
> > > > > > >  -fsplit-wide-types  -fsplit-wide-types-early  -fssa-backprop  -fssa-phiopt
> > > > > > >  -fstdarg-opt  -fstore-merging  -fstrict-aliasing -fipa-strict-aliasing
> > > > > > >  -fthread-jumps  -ftracer  -ftree-bit-ccp
> > > > > > > --ftree-builtin-call-dce  -ftree-ccp  -ftree-ch
> > > > > > > +-ftree-builtin-call-dce  -ftree-ccp  -ftree-ch  -ftree-cmp
> > > > > > >  -ftree-coalesce-vars  -ftree-copy-prop  -ftree-dce  -ftree-dominator-opts
> > > > > > >  -ftree-dse  -ftree-forwprop  -ftree-fre  -fcode-hoisting
> > > > > > >  -ftree-loop-if-convert  -ftree-loop-im
> > > > > > > @@ -12377,6 +12377,7 @@ compilation time.
> > > > > > >  -ftree-bit-ccp
> > > > > > >  -ftree-ccp
> > > > > > >  -ftree-ch
> > > > > > > +-ftree-cmp
> > > > > > >  -ftree-coalesce-vars
> > > > > > >  -ftree-copy-prop
> > > > > > >  -ftree-dce
> > > > > > > @@ -13707,6 +13708,14 @@ effectiveness of code motion optimizations.  It also saves one jump.  This flag
> > > > > > >  is enabled by default at @option{-O1} and higher.  It is not enabled
> > > > > > >  for @option{-Os}, since it usually increases code size.
> > > > > > >
> > > > > > > +@opindex ftree-cmp
> > > > > > > +@item -ftree-cmp
> > > > > > > +Perform comparison optimization on trees.  This decreases unnecessary
> > > > > > > +instructions for comparisons.  This flag is enabled by default at
> > > > > > > +@option{-O1} and higher.  This optimization is automatically disabled when
> > > > > > > +@option{-fwrapv} is specified to ensure correct behavior under signed integer
> > > > > > > +overflow wrapping.
> > > > > > > +
> > > > > > >  @opindex ftree-loop-optimize
> > > > > > >  @item -ftree-loop-optimize
> > > > > > >  Perform loop optimizations on trees.  This flag is enabled by default
> > > > > > > diff --git a/gcc/opts.cc b/gcc/opts.cc
> > > > > > > index 3333600e0ea..9ee64a48313 100644
> > > > > > > --- a/gcc/opts.cc
> > > > > > > +++ b/gcc/opts.cc
> > > > > > > @@ -589,6 +589,7 @@ static const struct default_options default_options_table[] =
> > > > > > >      { OPT_LEVELS_1_PLUS, OPT_ftree_builtin_call_dce, NULL, 1 },
> > > > > > >      { OPT_LEVELS_1_PLUS, OPT_ftree_ccp, NULL, 1 },
> > > > > > >      { OPT_LEVELS_1_PLUS, OPT_ftree_ch, NULL, 1 },
> > > > > > > +    { OPT_LEVELS_1_PLUS, OPT_ftree_cmp, NULL, 1 },
> > > > > > >      { OPT_LEVELS_1_PLUS, OPT_ftree_coalesce_vars, NULL, 1 },
> > > > > > >      { OPT_LEVELS_1_PLUS, OPT_ftree_copy_prop, NULL, 1 },
> > > > > > >      { OPT_LEVELS_1_PLUS, OPT_ftree_dce, NULL, 1 },
> > > > > > > diff --git a/gcc/passes.def b/gcc/passes.def
> > > > > > > index 1cbbd413097..e28a2d3c309 100644
> > > > > > > --- a/gcc/passes.def
> > > > > > > +++ b/gcc/passes.def
> > > > > > > @@ -84,6 +84,9 @@ along with GCC; see the file COPYING3.  If not see
> > > > > > >           /* After CCP we rewrite no longer addressed locals into SSA
> > > > > > >              form if possible.  */
> > > > > > >           NEXT_PASS (pass_forwprop);
> > > > > > > +         /* Do cmp before early_thread_jumps and vrp since it may
> > > > > > > +            modify conditions.  */
> > > > > > > +         NEXT_PASS (pass_cmp);
> > > > > > >            NEXT_PASS (pass_early_thread_jumps, /*first=*/true);
> > > > > > >           NEXT_PASS (pass_sra_early);
> > > > > > >           /* pass_build_ealias is a dummy pass that ensures that we
> > > > > > > diff --git a/gcc/testsuite/gcc.dg/pr113680.c b/gcc/testsuite/gcc.dg/pr113680.c
> > > > > > > new file mode 100644
> > > > > > > index 00000000000..59cd865aa21
> > > > > > > --- /dev/null
> > > > > > > +++ b/gcc/testsuite/gcc.dg/pr113680.c
> > > > > > > @@ -0,0 +1,47 @@
> > > > > > > +/* { dg-do compile } */
> > > > > > > +/* { dg-options "-Os -fdump-tree-optimized" } */
> > > > > > > +
> > > > > > > +volatile int i = 0;
> > > > > > > +
> > > > > > > +void foo() { i = 1; }
> > > > > > > +void bar() { i = 2; }
> > > > > > > +
> > > > > > > +void f1(int x, int y)
> > > > > > > +{
> > > > > > > +  int diff = x - y;
> > > > > > > +  if (diff > 0)
> > > > > > > +    foo();
> > > > > > > +  if (diff < 0)
> > > > > > > +    bar();
> > > > > > > +}
> > > > > > > +
> > > > > > > +void f2(int x, int y)
> > > > > > > +{
> > > > > > > +  if ((x - y) > 0)
> > > > > > > +    foo();
> > > > > > > +  if ((x - y) < 0)
> > > > > > > +    bar();
> > > > > > > +}
> > > > > > > +
> > > > > > > +void f3(int x, int y)
> > > > > > > +{
> > > > > > > +  if (x > y)
> > > > > > > +    foo();
> > > > > > > +  if (x < y)
> > > > > > > +    bar();
> > > > > > > +}
> > > > > > > +
> > > > > > > +void f4(int x, int y)
> > > > > > > +{
> > > > > > > +  int diff = x - y;
> > > > > > > +  if (diff > 0)
> > > > > > > +    foo();
> > > > > > > +  if (x < y)
> > > > > > > +    bar();
> > > > > > > +}
> > > > > > > +
> > > > > > > +/* { dg-final { scan-tree-dump-not " - " "optimized" } } */
> > > > > > > +/* { dg-final { scan-assembler-times "cmp" 1 { target { i?86-*-* x86_64-*-* } } } } */
> > > > > > > +/* { dg-final { scan-assembler-times "jmp\[ \t\]+f1" 3 { target { i?86-*-* x86_64-*-* } } } } */
> > > > > > > +/* { dg-final { scan-assembler-not "sub" { target { i?86-*-* x86_64-*-* } } } } */
> > > > > > > +/* { dg-final { scan-assembler-not "test" { target { i?86-*-* x86_64-*-* } } } } */
> > > > > > > diff --git a/gcc/timevar.def b/gcc/timevar.def
> > > > > > > index 8e2168e0817..468266d0c56 100644
> > > > > > > --- a/gcc/timevar.def
> > > > > > > +++ b/gcc/timevar.def
> > > > > > > @@ -157,6 +157,7 @@ DEFTIMEVAR (TV_TREE_EH                   , "tree eh")
> > > > > > >  DEFTIMEVAR (TV_TREE_CFG                     , "tree CFG construction")
> > > > > > >  DEFTIMEVAR (TV_TREE_CLEANUP_CFG             , "tree CFG cleanup")
> > > > > > >  DEFTIMEVAR (TV_TREE_TAIL_MERGE       , "tree tail merge")
> > > > > > > +DEFTIMEVAR (TV_TREE_CMP                     , "tree cmp")
> > > > > > >  DEFTIMEVAR (TV_TREE_VRP              , "tree VRP")
> > > > > > >  DEFTIMEVAR (TV_TREE_VRP_THREADER     , "tree VRP threader")
> > > > > > >  DEFTIMEVAR (TV_TREE_EARLY_VRP        , "tree Early VRP")
> > > > > > > diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> > > > > > > index 29267589eeb..71789a279c0 100644
> > > > > > > --- a/gcc/tree-pass.h
> > > > > > > +++ b/gcc/tree-pass.h
> > > > > > > @@ -401,6 +401,7 @@ extern gimple_opt_pass *make_pass_ch (gcc::context *ctxt);
> > > > > > >  extern gimple_opt_pass *make_pass_ch_vect (gcc::context *ctxt);
> > > > > > >  extern gimple_opt_pass *make_pass_sccopy (gcc::context *ctxt);
> > > > > > >  extern gimple_opt_pass *make_pass_ccp (gcc::context *ctxt);
> > > > > > > +extern gimple_opt_pass *make_pass_cmp (gcc::context *ctxt);
> > > > > > >  extern gimple_opt_pass *make_pass_split_paths (gcc::context *ctxt);
> > > > > > >  extern gimple_opt_pass *make_pass_build_ssa (gcc::context *ctxt);
> > > > > > >  extern gimple_opt_pass *make_pass_build_alias (gcc::context *ctxt);
> > > > > > > diff --git a/gcc/tree-ssa-cmp.cc b/gcc/tree-ssa-cmp.cc
> > > > > > > new file mode 100644
> > > > > > > index 00000000000..41d4a58a5b5
> > > > > > > --- /dev/null
> > > > > > > +++ b/gcc/tree-ssa-cmp.cc
> > > > > > > @@ -0,0 +1,262 @@
> > > > > > > +/* Global, SSA-based optimizations using comparison identities.
> > > > > > > +   Copyright (C) 2024 Free Software Foundation, Inc.
> > > > > > > +
> > > > > > > +This file is part of GCC.
> > > > > > > +
> > > > > > > +GCC is free software; you can redistribute it and/or modify it
> > > > > > > +under the terms of the GNU General Public License as published by the
> > > > > > > +Free Software Foundation; either version 3, or (at your option) any
> > > > > > > +later version.
> > > > > > > +
> > > > > > > +GCC is distributed in the hope that it will be useful, but WITHOUT
> > > > > > > +ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> > > > > > > +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> > > > > > > +for more details.
> > > > > > > +
> > > > > > > +You should have received a copy of the GNU General Public License
> > > > > > > +along with GCC; see the file COPYING3.  If not see
> > > > > > > +<http://www.gnu.org/licenses/>.  */
> > > > > > > +
> > > > > > > +/* (x - y) CMP 0 is equivalent to x CMP y where x and y are signed integers and
> > > > > > > +   CMP is <, <=, >, or >=.  Similarly, 0 CMP (x - y) is equivalent to y CMP x.
> > > > > > > +   As reported in PR middle-end/113680, this equivalence does not hold for
> > > > > > > +   types other than signed integers.  When it comes to conditions, the former
> > > > > > > +   was translated to a combination of sub and test, whereas the latter was
> > > > > > > +   translated to a single cmp.  Thus, this optimization pass tries to optimize
> > > > > > > +   the former to the latter.
> > > > > > > +
> > > > > > > +   When `-fwrapv` is enabled, GCC treats the overflow of signed integers as
> > > > > > > +   defined behavior, specifically, wrapping around according to two's
> > > > > > > +   complement arithmetic.  This has implications for optimizations that
> > > > > > > +   rely on the standard behavior of signed integers, where overflow is
> > > > > > > +   undefined.  Consider the example given:
> > > > > > > +
> > > > > > > +       long long llmax = __LONG_LONG_MAX__;
> > > > > > > +       long long llmin = -llmax - 1;
> > > > > > > +
> > > > > > > +   Here, `llmax - llmin` effectively becomes `llmax - (-llmax - 1)`, which
> > > > > > > +   simplifies to `2 * llmax + 1`.  Given that `llmax` is the maximum value for
> > > > > > > +   a `long long`, this calculation overflows in a defined manner
> > > > > > > +   (wrapping around), which under `-fwrapv` is a legal operation that produces
> > > > > > > +   a negative value due to two's complement wraparound.  Therefore,
> > > > > > > +   `llmax - llmin < 0` is true.
> > > > > > > +
> > > > > > > +   However, the direct comparison `llmax < llmin` is false since `llmax` is the
> > > > > > > +   maximum possible value and `llmin` is the minimum.  Hence, optimizations
> > > > > > > +   that rely on the equivalence of `(x - y) CMP 0` to `x CMP y` (and vice versa)
> > > > > > > +   cannot be safely applied when `-fwrapv` is enabled.  This is why this
> > > > > > > +   optimization pass is disabled under `-fwrapv`.
> > > > > > > +
> > > > > > > +   This optimization pass must run before the Jump Threading pass and
> > > > > > > +   the VRP pass, as it may modify conditions. For example, in the VRP pass:
> > > > > > > +
> > > > > > > +       (1)
> > > > > > > +         int diff = x - y;
> > > > > > > +         if (diff > 0)
> > > > > > > +           foo();
> > > > > > > +         if (diff < 0)
> > > > > > > +           bar();
> > > > > > > +
> > > > > > > +   The second condition would be converted to diff != 0 in the VRP pass because
> > > > > > > +   we know the postcondition of the first condition is diff <= 0, and then
> > > > > > > +   diff != 0 is cheaper than diff < 0. If we apply this pass after this VRP,
> > > > > > > +   we get:
> > > > > > > +
> > > > > > > +       (2)
> > > > > > > +         int diff = x - y;
> > > > > > > +         if (x > y)
> > > > > > > +           foo();
> > > > > > > +         if (diff != 0)
> > > > > > > +           bar();
> > > > > > > +
> > > > > > > +   This generates sub and test for the second condition and cmp for the first
> > > > > > > +   condition. However, if we apply this pass beforehand, we simply get:
> > > > > > > +
> > > > > > > +       (3)
> > > > > > > +         int diff = x - y;
> > > > > > > +         if (x > y)
> > > > > > > +           foo();
> > > > > > > +         if (x < y)
> > > > > > > +           bar();
> > > > > > > +
> > > > > > > +   In this code, diff will be eliminated as a dead code, and sub and test will
> > > > > > > +   not be generated, which is more efficient.
> > > > > > > +
> > > > > > > +   For the Jump Threading pass, without this optimization pass, (1) and (3)
> > > > > > > +   above are recognized as different, which prevents TCO.  */
> > > > > > > +
> > > > > > > +#include "config.h"
> > > > > > > +#include "system.h"
> > > > > > > +#include "coretypes.h"
> > > > > > > +#include "backend.h"
> > > > > > > +#include "tree.h"
> > > > > > > +#include "gimple.h"
> > > > > > > +#include "tree-pass.h"
> > > > > > > +#include "ssa.h"
> > > > > > > +#include "gimple-iterator.h"
> > > > > > > +#include "domwalk.h"
> > > > > > > +
> > > > > > > +/* True if VAR is a signed integer, false otherwise.  */
> > > > > > > +
> > > > > > > +static bool
> > > > > > > +is_signed_integer(const_tree var)
> > > > > > > +{
> > > > > > > +  return (TREE_CODE (TREE_TYPE (var)) == INTEGER_TYPE
> > > > > > > +         && !TYPE_UNSIGNED (TREE_TYPE (var)));
> > > > > > > +}
> > > > > > > +
> > > > > > > +/* If EXPR is an integer zero, return it.  If EXPR is SSA_NAME, and its
> > > > > > > +   defining statement is a subtraction of two signed integers, return the
> > > > > > > +   MINUS_EXPR.  Otherwise, return NULL_TREE.  */
> > > > > > > +
> > > > > > > +static const_tree
> > > > > > > +prop_integer_zero_or_minus_expr (const_tree expr)
> > > > > > > +{
> > > > > > > +  if (integer_zerop (expr))
> > > > > > > +    return expr;
> > > > > > > +
> > > > > > > +  if (TREE_CODE (expr) != SSA_NAME)
> > > > > > > +    return NULL_TREE;
> > > > > > > +
> > > > > > > +  gimple *defining_stmt = SSA_NAME_DEF_STMT (expr);
> > > > > > > +  if (!is_gimple_assign (defining_stmt)
> > > > > > > +      || gimple_assign_rhs_code (defining_stmt) != MINUS_EXPR)
> > > > > > > +    return NULL_TREE;
> > > > > > > +
> > > > > > > +  tree rhs1 = gimple_assign_rhs1 (defining_stmt);
> > > > > > > +  if (!is_signed_integer (rhs1))
> > > > > > > +    return NULL_TREE;
> > > > > > > +
> > > > > > > +  tree rhs2 = gimple_assign_rhs2 (defining_stmt);
> > > > > > > +  if (!is_signed_integer (rhs2))
> > > > > > > +    return NULL_TREE;
> > > > > > > +
> > > > > > > +  return build2 (MINUS_EXPR, TREE_TYPE (expr), rhs1, rhs2);
> > > > > > > +}
> > > > > > > +
> > > > > > > +/* Optimize:
> > > > > > > +
> > > > > > > +       1. (x - y) CMP 0
> > > > > > > +       2. 0 CMP (x - y)
> > > > > > > +
> > > > > > > +   to:
> > > > > > > +
> > > > > > > +       1. x CMP y
> > > > > > > +       2. y CMP x
> > > > > > > +
> > > > > > > +   where CMP is either <, <=, >, or >=.  */
> > > > > > > +
> > > > > > > +static void
> > > > > > > +optimize_signed_comparison (gcond *stmt)
> > > > > > > +{
> > > > > > > +  const enum tree_code code = gimple_cond_code (stmt);
> > > > > > > +  if (code < LT_EXPR || GE_EXPR < code)
> > > > > > > +    return;
> > > > > > > +
> > > > > > > +  const_tree lhs = prop_integer_zero_or_minus_expr (gimple_cond_lhs (stmt));
> > > > > > > +  if (lhs == NULL_TREE)
> > > > > > > +    return;
> > > > > > > +
> > > > > > > +  const_tree rhs = prop_integer_zero_or_minus_expr (gimple_cond_rhs (stmt));
> > > > > > > +  if (rhs == NULL_TREE)
> > > > > > > +    return;
> > > > > > > +
> > > > > > > +  if (TREE_CODE (lhs) == MINUS_EXPR && integer_zerop (rhs))
> > > > > > > +    {
> > > > > > > +      /* Case 1: (x - y) CMP 0  */
> > > > > > > +      tree x = TREE_OPERAND (lhs, 0);
> > > > > > > +      tree y = TREE_OPERAND (lhs, 1);
> > > > > > > +
> > > > > > > +      /* => x CMP y  */
> > > > > > > +      gimple_cond_set_lhs (stmt, x);
> > > > > > > +      gimple_cond_set_rhs (stmt, y);
> > > > > > > +      update_stmt (stmt);
> > > > > > > +    }
> > > > > > > +  else if (integer_zerop (lhs) && TREE_CODE (rhs) == MINUS_EXPR)
> > > > > > > +    {
> > > > > > > +      /* Case 2: 0 CMP (x - y)  */
> > > > > > > +      tree x = TREE_OPERAND (rhs, 0);
> > > > > > > +      tree y = TREE_OPERAND (rhs, 1);
> > > > > > > +
> > > > > > > +      /* => y CMP x  */
> > > > > > > +      gimple_cond_set_lhs (stmt, y);
> > > > > > > +      gimple_cond_set_rhs (stmt, x);
> > > > > > > +      update_stmt (stmt);
> > > > > > > +    }
> > > > > > > +}
> > > > > > > +
> > > > > > > +/* Find signed integer comparisons where the operands are MINUS_EXPR and
> > > > > > > +   0, and replace them with the equivalent comparison of the operands of
> > > > > > > +   the MINUS_EXPR without the subtraction.  */
> > > > > > > +
> > > > > > > +namespace {
> > > > > > > +
> > > > > > > +const pass_data pass_data_cmp =
> > > > > > > +{
> > > > > > > +  GIMPLE_PASS, /* type */
> > > > > > > +  "cmp", /* name */
> > > > > > > +  OPTGROUP_NONE, /* optinfo_flags */
> > > > > > > +  TV_TREE_CMP, /* tv_id */
> > > > > > > +  PROP_ssa, /* properties_required */
> > > > > > > +  0, /* properties_provided */
> > > > > > > +  0, /* properties_destroyed */
> > > > > > > +  0, /* todo_flags_start */
> > > > > > > +  TODO_remove_unused_locals, /* todo_flags_finish */
> > > > > > > +};
> > > > > > > +
> > > > > > > +class pass_cmp : public gimple_opt_pass
> > > > > > > +{
> > > > > > > +public:
> > > > > > > +  pass_cmp (gcc::context *ctxt)
> > > > > > > +    : gimple_opt_pass (pass_data_cmp, ctxt)
> > > > > > > +  {}
> > > > > > > +
> > > > > > > +  /* opt_pass methods: */
> > > > > > > +  bool gate (function *) final override
> > > > > > > +    {
> > > > > > > +      /* If -fwrapv is enabled, this pass cannot be safely applied as described
> > > > > > > +        in the comment at the top of this file.  */
> > > > > > > +      return flag_tree_cmp != 0 && flag_wrapv == 0;
> > > > > > > +    }
> > > > > > > +
> > > > > > > +  unsigned int execute (function *) final override;
> > > > > > > +
> > > > > > > +}; // class pass_cmp
> > > > > > > +
> > > > > > > +class cmp_dom_walker : public dom_walker
> > > > > > > +{
> > > > > > > +public:
> > > > > > > +  cmp_dom_walker () : dom_walker (CDI_DOMINATORS) {}
> > > > > > > +
> > > > > > > +  void after_dom_children (basic_block) final override;
> > > > > > > +};
> > > > > > > +
> > > > > > > +void
> > > > > > > +cmp_dom_walker::after_dom_children (basic_block bb)
> > > > > > > +{
> > > > > > > +  gimple_stmt_iterator gsi;
> > > > > > > +
> > > > > > > +  for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > > > > > > +    {
> > > > > > > +      gimple *stmt = gsi_stmt (gsi);
> > > > > > > +      if (gimple_code (stmt) == GIMPLE_COND)
> > > > > > > +       optimize_signed_comparison (as_a <gcond *> (stmt));
> > > > > > > +    }
> > > > > > > +}
> > > > > > > +
> > > > > > > +
> > > > > > > +unsigned int
> > > > > > > +pass_cmp::execute (function *)
> > > > > > > +{
> > > > > > > +  calculate_dominance_info (CDI_DOMINATORS);
> > > > > > > +  cmp_dom_walker ().walk (ENTRY_BLOCK_PTR_FOR_FN (cfun));
> > > > > > > +  return 0;
> > > > > > > +}
> > > > > > > +
> > > > > > > +} // anon namespace
> > > > > > > +
> > > > > > > +gimple_opt_pass *
> > > > > > > +make_pass_cmp (gcc::context *ctxt)
> > > > > > > +{
> > > > > > > +  return new pass_cmp (ctxt);
> > > > > > > +}
> > > > > > > --
> > > > > > > 2.44.0
> > > > > > >

middle-end/113680 - Optimize (x - y) CMP 0 as x CMP y

Commit Message

Comments

Patch