diff mbox series

[RFC] PR tree-optimization/109071 - -Warray-bounds false positive warnings due to code duplication from jump threading

Message ID 20240513194830.1676938-1-qing.zhao@oracle.com
State New
Headers show
Series [RFC] PR tree-optimization/109071 - -Warray-bounds false positive warnings due to code duplication from jump threading | expand

Commit Message

Qing Zhao May 13, 2024, 7:48 p.m. UTC
-Warray-bounds is an important option to enable linux kernal to keep
the array out-of-bound errors out of the source tree.

However, due to the false positive warnings reported in PR109071
(-Warray-bounds false positive warnings due to code duplication from
jump threading), -Warray-bounds=1 cannot be added on by default.

Although it's impossible to elinimate all the false positive warnings
from -Warray-bounds=1 (See PR104355 Misleading -Warray-bounds
documentation says "always out of bounds"), we should minimize the
false positive warnings in -Warray-bounds=1.

The root reason for the false positive warnings reported in PR109071 is:

When the thread jump optimization tries to reduce the # of branches
inside the routine, sometimes it needs to duplicate the code and
split into two conditional pathes. for example:

The original code:

void sparx5_set (int * ptr, struct nums * sg, int index)
{
  if (index >= 4)
    warn ();
  *ptr = 0;
  *val = sg->vals[index];
  if (index >= 4)
    warn ();
  *ptr = *val;

  return;
}

With the thread jump, the above becomes:

void sparx5_set (int * ptr, struct nums * sg, int index)
{
  if (index >= 4)
    {
      warn ();
      *ptr = 0;		// Code duplications since "warn" does return;
      *val = sg->vals[index];	// same this line.
				// In this path, since it's under the condition
				// "index >= 4", the compiler knows the value
				// of "index" is larger then 4, therefore the
				// out-of-bound warning.
      warn ();
    }
  else
    {
      *ptr = 0;
      *val = sg->vals[index];
    }
  *ptr = *val;
  return;
}

We can see, after the thread jump optimization, the # of branches inside
the routine "sparx5_set" is reduced from 2 to 1, however,  due to the
code duplication (which is needed for the correctness of the code), we
got a false positive out-of-bound warning.

In order to eliminate such false positive out-of-bound warning,

A. Add one more flag for GIMPLE: is_splitted.
B. During the thread jump optimization, when the basic blocks are
   duplicated, mark all the STMTs inside the original and duplicated
   basic blocks as "is_splitted";
C. Inside the array bound checker, add the following new heuristic:

If
   1. the stmt is duplicated and splitted into two conditional paths;
+  2. the warning level < 2;
+  3. the current block is not dominating the exit block
Then not report the warning.

The false positive warnings are moved from -Warray-bounds=1 to
 -Warray-bounds=2 now.

Bootstrapped and regression tested on both x86 and aarch64. adjusted
 -Warray-bounds-61.c due to the false positive warnings.

Let me know if you have any comments and suggestions.

Thanks.

Qing


	PR tree optimization/109071

gcc/ChangeLog:

	* gimple-array-bounds.cc (check_out_of_bounds_and_warn): Add two new
	arguments for the new heuristic to not issue warnings.
	(array_bounds_checker::check_array_ref): Call the new prototype of the
	routine check_out_of_bounds_and_warn.
	(array_bounds_checker::check_mem_ref): Add one new argument for the
	new heuristic to not issue warnings.
	(array_bounds_checker::check_addr_expr): Call the new prototype of the
	routine check_mem_ref, add new heuristic for not issue warnings.
	(array_bounds_checker::check_array_bounds): Call the new prototype of
	the routine check_mem_ref.
	* gimple-array-bounds.h: New prototype of check_mem_ref.
	* gimple.h (struct GTY): Add one new flag is_splitted for gimple.
	(gimple_is_splitted_p): New function.
	(gimple_set_is_splitted): New function.
	* tree-ssa-threadupdate.cc (set_stmts_in_bb_is_splitted): New function.
	(back_jt_path_registry::duplicate_thread_path): Mark all the stmts in
	both original and copied blocks as IS_SPLITTED.

gcc/testsuite/ChangeLog:

	* gcc.dg/Warray-bounds-61.c: Adjust testing case.
	* gcc.dg/pr109071-1.c: New test.
	* gcc.dg/pr109071.c: New test.
---
 gcc/gimple-array-bounds.cc              | 46 +++++++++++++++++++++----
 gcc/gimple-array-bounds.h               |  2 +-
 gcc/gimple.h                            | 21 +++++++++--
 gcc/testsuite/gcc.dg/Warray-bounds-61.c |  6 ++--
 gcc/testsuite/gcc.dg/pr109071-1.c       | 22 ++++++++++++
 gcc/testsuite/gcc.dg/pr109071.c         | 22 ++++++++++++
 gcc/tree-ssa-threadupdate.cc            | 15 ++++++++
 7 files changed, 122 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr109071-1.c
 create mode 100644 gcc/testsuite/gcc.dg/pr109071.c

Comments

Jeff Law May 13, 2024, 8:46 p.m. UTC | #1
On 5/13/24 1:48 PM, Qing Zhao wrote:
> -Warray-bounds is an important option to enable linux kernal to keep
> the array out-of-bound errors out of the source tree.
> 
> However, due to the false positive warnings reported in PR109071
> (-Warray-bounds false positive warnings due to code duplication from
> jump threading), -Warray-bounds=1 cannot be added on by default.
> 
> Although it's impossible to elinimate all the false positive warnings
> from -Warray-bounds=1 (See PR104355 Misleading -Warray-bounds
> documentation says "always out of bounds"), we should minimize the
> false positive warnings in -Warray-bounds=1.
> 
> The root reason for the false positive warnings reported in PR109071 is:
> 
> When the thread jump optimization tries to reduce the # of branches
> inside the routine, sometimes it needs to duplicate the code and
> split into two conditional pathes. for example:
> 
> The original code:
> 
> void sparx5_set (int * ptr, struct nums * sg, int index)
> {
>    if (index >= 4)
>      warn ();
>    *ptr = 0;
>    *val = sg->vals[index];
>    if (index >= 4)
>      warn ();
>    *ptr = *val;
> 
>    return;
> }
> 
> With the thread jump, the above becomes:
> 
> void sparx5_set (int * ptr, struct nums * sg, int index)
> {
>    if (index >= 4)
>      {
>        warn ();
>        *ptr = 0;		// Code duplications since "warn" does return;
>        *val = sg->vals[index];	// same this line.
> 				// In this path, since it's under the condition
> 				// "index >= 4", the compiler knows the value
> 				// of "index" is larger then 4, therefore the
> 				// out-of-bound warning.
>        warn ();
>      }
>    else
>      {
>        *ptr = 0;
>        *val = sg->vals[index];
>      }
>    *ptr = *val;
>    return;
> }
> 
> We can see, after the thread jump optimization, the # of branches inside
> the routine "sparx5_set" is reduced from 2 to 1, however,  due to the
> code duplication (which is needed for the correctness of the code), we
> got a false positive out-of-bound warning.
> 
> In order to eliminate such false positive out-of-bound warning,
> 
> A. Add one more flag for GIMPLE: is_splitted.
> B. During the thread jump optimization, when the basic blocks are
>     duplicated, mark all the STMTs inside the original and duplicated
>     basic blocks as "is_splitted";
> C. Inside the array bound checker, add the following new heuristic:
> 
> If
>     1. the stmt is duplicated and splitted into two conditional paths;
> +  2. the warning level < 2;
> +  3. the current block is not dominating the exit block
> Then not report the warning.
> 
> The false positive warnings are moved from -Warray-bounds=1 to
>   -Warray-bounds=2 now.
> 
> Bootstrapped and regression tested on both x86 and aarch64. adjusted
>   -Warray-bounds-61.c due to the false positive warnings.
> 
> Let me know if you have any comments and suggestions.
This sounds horribly wrong.   In the code above, the warning is correct.

Jeff
Kees Cook May 13, 2024, 9:41 p.m. UTC | #2
On Mon, May 13, 2024 at 02:46:32PM -0600, Jeff Law wrote:
> 
> 
> On 5/13/24 1:48 PM, Qing Zhao wrote:
> > -Warray-bounds is an important option to enable linux kernal to keep
> > the array out-of-bound errors out of the source tree.
> > 
> > However, due to the false positive warnings reported in PR109071
> > (-Warray-bounds false positive warnings due to code duplication from
> > jump threading), -Warray-bounds=1 cannot be added on by default.
> > 
> > Although it's impossible to elinimate all the false positive warnings
> > from -Warray-bounds=1 (See PR104355 Misleading -Warray-bounds
> > documentation says "always out of bounds"), we should minimize the
> > false positive warnings in -Warray-bounds=1.
> > 
> > The root reason for the false positive warnings reported in PR109071 is:
> > 
> > When the thread jump optimization tries to reduce the # of branches
> > inside the routine, sometimes it needs to duplicate the code and
> > split into two conditional pathes. for example:
> > 
> > The original code:
> > 
> > void sparx5_set (int * ptr, struct nums * sg, int index)
> > {
> >    if (index >= 4)
> >      warn ();
> >    *ptr = 0;
> >    *val = sg->vals[index];
> >    if (index >= 4)
> >      warn ();
> >    *ptr = *val;
> > 
> >    return;
> > }
> > 
> > With the thread jump, the above becomes:
> > 
> > void sparx5_set (int * ptr, struct nums * sg, int index)
> > {
> >    if (index >= 4)
> >      {
> >        warn ();
> >        *ptr = 0;		// Code duplications since "warn" does return;
> >        *val = sg->vals[index];	// same this line.
> > 				// In this path, since it's under the condition
> > 				// "index >= 4", the compiler knows the value
> > 				// of "index" is larger then 4, therefore the
> > 				// out-of-bound warning.
> >        warn ();
> >      }
> >    else
> >      {
> >        *ptr = 0;
> >        *val = sg->vals[index];
> >      }
> >    *ptr = *val;
> >    return;
> > }
> > 
> > We can see, after the thread jump optimization, the # of branches inside
> > the routine "sparx5_set" is reduced from 2 to 1, however,  due to the
> > code duplication (which is needed for the correctness of the code), we
> > got a false positive out-of-bound warning.
> > 
> > In order to eliminate such false positive out-of-bound warning,
> > 
> > A. Add one more flag for GIMPLE: is_splitted.
> > B. During the thread jump optimization, when the basic blocks are
> >     duplicated, mark all the STMTs inside the original and duplicated
> >     basic blocks as "is_splitted";
> > C. Inside the array bound checker, add the following new heuristic:
> > 
> > If
> >     1. the stmt is duplicated and splitted into two conditional paths;
> > +  2. the warning level < 2;
> > +  3. the current block is not dominating the exit block
> > Then not report the warning.
> > 
> > The false positive warnings are moved from -Warray-bounds=1 to
> >   -Warray-bounds=2 now.
> > 
> > Bootstrapped and regression tested on both x86 and aarch64. adjusted
> >   -Warray-bounds-61.c due to the false positive warnings.
> > 
> > Let me know if you have any comments and suggestions.
> This sounds horribly wrong.   In the code above, the warning is correct.

It's not sensible from a user's perspective.

If this doesn't warn:

void sparx5_set (int * ptr, struct nums * sg, int index)
{
   *ptr = 0;
   *val = sg->vals[index];
   *ptr = *val;
}

... because the value range tracking of "index" spans [INT_MIN,INT_MAX],
and warnings based on the value range are silenced if they haven't been
clamped at all. (Otherwise warnings would be produced everywhere: only
when a limited set of values is known is it useful to produce a warning.)


But it makes no sense to warn about:

void sparx5_set (int * ptr, struct nums * sg, int index)
{
   if (index >= 4)
     warn ();
   *ptr = 0;
   *val = sg->vals[index];
   if (index >= 4)
     warn ();
   *ptr = *val;
}

Because at "*val = sg->vals[index];" the actual value range tracking for
index is _still_ [INT_MIN,INT_MAX]. (Only within the "then" side of the
"if" statements is the range tracking [4,INT_MAX].)

However, in the case where jump threading has split the execution flow
and produced a copy of "*val = sg->vals[index];" where the value range
tracking for "index" is now [4,INT_MAX], is the warning valid. But it
is only for that instance. Reporting it for effectively both (there is
only 1 source line for the array indexing) is misleading because there
is nothing the user can do about it -- the compiler created the copy and
then noticed it had a range it could apply to that array index.

This situation makes -Warray-bounds unusable for the Linux kernel (we
cannot have false positives says BDFL), but we'd *really* like to have
it enabled since it usually finds real bugs. But these false positives
can't be fixed on our end. :( So, moving them to -Warray-bounds=2 makes
sense as that's the level documented to have potential false positives.

-Kees
Andrew Pinski May 13, 2024, 11:38 p.m. UTC | #3
On Mon, May 13, 2024, 11:41 PM Kees Cook <keescook@chromium.org> wrote:

> On Mon, May 13, 2024 at 02:46:32PM -0600, Jeff Law wrote:
> >
> >
> > On 5/13/24 1:48 PM, Qing Zhao wrote:
> > > -Warray-bounds is an important option to enable linux kernal to keep
> > > the array out-of-bound errors out of the source tree.
> > >
> > > However, due to the false positive warnings reported in PR109071
> > > (-Warray-bounds false positive warnings due to code duplication from
> > > jump threading), -Warray-bounds=1 cannot be added on by default.
> > >
> > > Although it's impossible to elinimate all the false positive warnings
> > > from -Warray-bounds=1 (See PR104355 Misleading -Warray-bounds
> > > documentation says "always out of bounds"), we should minimize the
> > > false positive warnings in -Warray-bounds=1.
> > >
> > > The root reason for the false positive warnings reported in PR109071
> is:
> > >
> > > When the thread jump optimization tries to reduce the # of branches
> > > inside the routine, sometimes it needs to duplicate the code and
> > > split into two conditional pathes. for example:
> > >
> > > The original code:
> > >
> > > void sparx5_set (int * ptr, struct nums * sg, int index)
> > > {
> > >    if (index >= 4)
> > >      warn ();
> > >    *ptr = 0;
> > >    *val = sg->vals[index];
> > >    if (index >= 4)
> > >      warn ();
> > >    *ptr = *val;
> > >
> > >    return;
> > > }
> > >
> > > With the thread jump, the above becomes:
> > >
> > > void sparx5_set (int * ptr, struct nums * sg, int index)
> > > {
> > >    if (index >= 4)
> > >      {
> > >        warn ();
> > >        *ptr = 0;            // Code duplications since "warn" does
> return;
> > >        *val = sg->vals[index];      // same this line.
> > >                             // In this path, since it's under the
> condition
> > >                             // "index >= 4", the compiler knows the
> value
> > >                             // of "index" is larger then 4, therefore
> the
> > >                             // out-of-bound warning.
> > >        warn ();
> > >      }
> > >    else
> > >      {
> > >        *ptr = 0;
> > >        *val = sg->vals[index];
> > >      }
> > >    *ptr = *val;
> > >    return;
> > > }
> > >
> > > We can see, after the thread jump optimization, the # of branches
> inside
> > > the routine "sparx5_set" is reduced from 2 to 1, however,  due to the
> > > code duplication (which is needed for the correctness of the code), we
> > > got a false positive out-of-bound warning.
> > >
> > > In order to eliminate such false positive out-of-bound warning,
> > >
> > > A. Add one more flag for GIMPLE: is_splitted.
> > > B. During the thread jump optimization, when the basic blocks are
> > >     duplicated, mark all the STMTs inside the original and duplicated
> > >     basic blocks as "is_splitted";
> > > C. Inside the array bound checker, add the following new heuristic:
> > >
> > > If
> > >     1. the stmt is duplicated and splitted into two conditional paths;
> > > +  2. the warning level < 2;
> > > +  3. the current block is not dominating the exit block
> > > Then not report the warning.
> > >
> > > The false positive warnings are moved from -Warray-bounds=1 to
> > >   -Warray-bounds=2 now.
> > >
> > > Bootstrapped and regression tested on both x86 and aarch64. adjusted
> > >   -Warray-bounds-61.c due to the false positive warnings.
> > >
> > > Let me know if you have any comments and suggestions.
> > This sounds horribly wrong.   In the code above, the warning is correct.
>
> It's not sensible from a user's perspective.
>
> If this doesn't warn:
>
> void sparx5_set (int * ptr, struct nums * sg, int index)
> {
>    *ptr = 0;
>    *val = sg->vals[index];
>    *ptr = *val;
> }
>
> ... because the value range tracking of "index" spans [INT_MIN,INT_MAX],
> and warnings based on the value range are silenced if they haven't been
> clamped at all. (Otherwise warnings would be produced everywhere: only
> when a limited set of values is known is it useful to produce a warning.)
>
>
> But it makes no sense to warn about:
>
> void sparx5_set (int * ptr, struct nums * sg, int index)
> {
>    if (index >= 4)
>      warn ();
>    *ptr = 0;
>    *val = sg->vals[index];
>    if (index >= 4)
>      warn ();
>    *ptr = *val;
> }
>
> Because at "*val = sg->vals[index];" the actual value range tracking for
> index is _still_ [INT_MIN,INT_MAX]. (Only within the "then" side of the
> "if" statements is the range tracking [4,INT_MAX].)
>
> However, in the case where jump threading has split the execution flow
> and produced a copy of "*val = sg->vals[index];" where the value range
> tracking for "index" is now [4,INT_MAX], is the warning valid. But it
> is only for that instance. Reporting it for effectively both (there is
> only 1 source line for the array indexing) is misleading because there
> is nothing the user can do about it -- the compiler created the copy and
> then noticed it had a range it could apply to that array index.
>

"there is nothing the user can do about it" is very much false. They could
change warn call into a noreturn function call instead.  (In the case of
the Linux kernel panic). There are things the user can do to fix the
warning and even get better code generation out of the compilers.

Thanks,
Andrew



> This situation makes -Warray-bounds unusable for the Linux kernel (we
> cannot have false positives says BDFL), but we'd *really* like to have
> it enabled since it usually finds real bugs. But these false positives
> can't be fixed on our end. :( So, moving them to -Warray-bounds=2 makes
> sense as that's the level documented to have potential false positives.
>
> -Kees
>
> --
> Kees Cook
>
Kees Cook May 14, 2024, 12:14 a.m. UTC | #4
On Tue, May 14, 2024 at 01:38:49AM +0200, Andrew Pinski wrote:
> On Mon, May 13, 2024, 11:41 PM Kees Cook <keescook@chromium.org> wrote:
> > But it makes no sense to warn about:
> >
> > void sparx5_set (int * ptr, struct nums * sg, int index)
> > {
> >    if (index >= 4)
> >      warn ();
> >    *ptr = 0;
> >    *val = sg->vals[index];
> >    if (index >= 4)
> >      warn ();
> >    *ptr = *val;
> > }
> >
> > Because at "*val = sg->vals[index];" the actual value range tracking for
> > index is _still_ [INT_MIN,INT_MAX]. (Only within the "then" side of the
> > "if" statements is the range tracking [4,INT_MAX].)
> >
> > However, in the case where jump threading has split the execution flow
> > and produced a copy of "*val = sg->vals[index];" where the value range
> > tracking for "index" is now [4,INT_MAX], is the warning valid. But it
> > is only for that instance. Reporting it for effectively both (there is
> > only 1 source line for the array indexing) is misleading because there
> > is nothing the user can do about it -- the compiler created the copy and
> > then noticed it had a range it could apply to that array index.
> >
> 
> "there is nothing the user can do about it" is very much false. They could
> change warn call into a noreturn function call instead.  (In the case of
> the Linux kernel panic). There are things the user can do to fix the
> warning and even get better code generation out of the compilers.

This isn't about warn() not being noreturn. The warn() could be any
function call; the jump threading still happens.

GCC is warning about a compiler-constructed situation that cannot be
reliably fixed on the source side (GCC emitting the warning is highly
unstable in these cases), since the condition is not *always* true for
the given line of code. If it is not useful to warn for "array[index]"
being out of range when "index" is always [INT_MIN,INT_MAX], then it
is not useful to warn when "index" MAY be [INT_MIN,INT_MAX] for a given
line of code.

-Kees
Richard Biener May 14, 2024, 1:08 p.m. UTC | #5
On Mon, 13 May 2024, Qing Zhao wrote:

> -Warray-bounds is an important option to enable linux kernal to keep
> the array out-of-bound errors out of the source tree.
> 
> However, due to the false positive warnings reported in PR109071
> (-Warray-bounds false positive warnings due to code duplication from
> jump threading), -Warray-bounds=1 cannot be added on by default.
> 
> Although it's impossible to elinimate all the false positive warnings
> from -Warray-bounds=1 (See PR104355 Misleading -Warray-bounds
> documentation says "always out of bounds"), we should minimize the
> false positive warnings in -Warray-bounds=1.
> 
> The root reason for the false positive warnings reported in PR109071 is:
> 
> When the thread jump optimization tries to reduce the # of branches
> inside the routine, sometimes it needs to duplicate the code and
> split into two conditional pathes. for example:
> 
> The original code:
> 
> void sparx5_set (int * ptr, struct nums * sg, int index)
> {
>   if (index >= 4)
>     warn ();
>   *ptr = 0;
>   *val = sg->vals[index];
>   if (index >= 4)
>     warn ();
>   *ptr = *val;
> 
>   return;
> }
> 
> With the thread jump, the above becomes:
> 
> void sparx5_set (int * ptr, struct nums * sg, int index)
> {
>   if (index >= 4)
>     {
>       warn ();
>       *ptr = 0;		// Code duplications since "warn" does return;
>       *val = sg->vals[index];	// same this line.
> 				// In this path, since it's under the condition
> 				// "index >= 4", the compiler knows the value
> 				// of "index" is larger then 4, therefore the
> 				// out-of-bound warning.
>       warn ();
>     }
>   else
>     {
>       *ptr = 0;
>       *val = sg->vals[index];
>     }
>   *ptr = *val;
>   return;
> }
> 
> We can see, after the thread jump optimization, the # of branches inside
> the routine "sparx5_set" is reduced from 2 to 1, however,  due to the
> code duplication (which is needed for the correctness of the code), we
> got a false positive out-of-bound warning.
> 
> In order to eliminate such false positive out-of-bound warning,
> 
> A. Add one more flag for GIMPLE: is_splitted.
> B. During the thread jump optimization, when the basic blocks are
>    duplicated, mark all the STMTs inside the original and duplicated
>    basic blocks as "is_splitted";
> C. Inside the array bound checker, add the following new heuristic:
> 
> If
>    1. the stmt is duplicated and splitted into two conditional paths;
> +  2. the warning level < 2;
> +  3. the current block is not dominating the exit block
> Then not report the warning.
> 
> The false positive warnings are moved from -Warray-bounds=1 to
>  -Warray-bounds=2 now.
> 
> Bootstrapped and regression tested on both x86 and aarch64. adjusted
>  -Warray-bounds-61.c due to the false positive warnings.
> 
> Let me know if you have any comments and suggestions.

At the last Cauldron I talked with David Malcolm about these kind of
issues and thought of instead of suppressing diagnostics to record
how a block was duplicated.  For jump threading my idea was to record
the condition that was proved true when entering the path and do this
by recording the corresponding locations so that in the end we can
use the diagnostic-path infrastructure to say

warning: array index always above array bounds
events 1:

| 3 |  if (index >= 4)
         |
        (1) when index >= 4

it would be possible to record the info as part of the ad-hoc
location data on each duplicated stmt or, possibly simpler,
as part of a debug stmt of new kind.

I'm not sure pruning the warnings is a good thing to do.  One
would argue we should instead isolate such path as unreachable
since it invokes undefined behavior.  In particular your
example is clearly a bug and should be diagnosed.

Note very similar issues happen when unrolling a loop.

Note all late diagnostics are prone to these kind of issues.

Richard.

> Thanks.
> 
> Qing
> 
> 
> 	PR tree optimization/109071
> 
> gcc/ChangeLog:
> 
> 	* gimple-array-bounds.cc (check_out_of_bounds_and_warn): Add two new
> 	arguments for the new heuristic to not issue warnings.
> 	(array_bounds_checker::check_array_ref): Call the new prototype of the
> 	routine check_out_of_bounds_and_warn.
> 	(array_bounds_checker::check_mem_ref): Add one new argument for the
> 	new heuristic to not issue warnings.
> 	(array_bounds_checker::check_addr_expr): Call the new prototype of the
> 	routine check_mem_ref, add new heuristic for not issue warnings.
> 	(array_bounds_checker::check_array_bounds): Call the new prototype of
> 	the routine check_mem_ref.
> 	* gimple-array-bounds.h: New prototype of check_mem_ref.
> 	* gimple.h (struct GTY): Add one new flag is_splitted for gimple.
> 	(gimple_is_splitted_p): New function.
> 	(gimple_set_is_splitted): New function.
> 	* tree-ssa-threadupdate.cc (set_stmts_in_bb_is_splitted): New function.
> 	(back_jt_path_registry::duplicate_thread_path): Mark all the stmts in
> 	both original and copied blocks as IS_SPLITTED.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.dg/Warray-bounds-61.c: Adjust testing case.
> 	* gcc.dg/pr109071-1.c: New test.
> 	* gcc.dg/pr109071.c: New test.
> ---
>  gcc/gimple-array-bounds.cc              | 46 +++++++++++++++++++++----
>  gcc/gimple-array-bounds.h               |  2 +-
>  gcc/gimple.h                            | 21 +++++++++--
>  gcc/testsuite/gcc.dg/Warray-bounds-61.c |  6 ++--
>  gcc/testsuite/gcc.dg/pr109071-1.c       | 22 ++++++++++++
>  gcc/testsuite/gcc.dg/pr109071.c         | 22 ++++++++++++
>  gcc/tree-ssa-threadupdate.cc            | 15 ++++++++
>  7 files changed, 122 insertions(+), 12 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/pr109071-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/pr109071.c
> 
> diff --git a/gcc/gimple-array-bounds.cc b/gcc/gimple-array-bounds.cc
> index 008071cd5464..4a2975623bc1 100644
> --- a/gcc/gimple-array-bounds.cc
> +++ b/gcc/gimple-array-bounds.cc
> @@ -264,7 +264,9 @@ check_out_of_bounds_and_warn (location_t location, tree ref,
>  			      tree up_bound, tree up_bound_p1,
>  			      const value_range *vr,
>  			      bool ignore_off_by_one, bool for_array_bound,
> -			      bool *out_of_bound)
> +			      bool *out_of_bound,
> +			      bool is_splitted,
> +			      bool is_dominate_exit)
>  {
>    tree min, max;
>    tree low_bound = array_ref_low_bound (ref);
> @@ -273,6 +275,13 @@ check_out_of_bounds_and_warn (location_t location, tree ref,
>    bool warned = false;
>    *out_of_bound = false;
>  
> +  /* If the stmt is duplicated and splitted, the warning level is not 2,
> +     and the current block is not dominating the exit block, not report
> +     the warning.  */
> +  if (is_splitted && warn_array_bounds < 2
> +      && !is_dominate_exit)
> +    return false;
> +
>    /* Empty array.  */
>    if (up_bound && tree_int_cst_equal (low_bound, up_bound_p1))
>      {
> @@ -386,12 +395,17 @@ array_bounds_checker::check_array_ref (location_t location, tree ref,
>  	}
>      }
>  
> +  bool is_dominate_exit = dominated_by_p (CDI_DOMINATORS,
> +					  EXIT_BLOCK_PTR_FOR_FN (fun),
> +					  gimple_bb (stmt));
> +
>    warned = check_out_of_bounds_and_warn (location, ref,
>  					 low_sub_org, low_sub, up_sub,
>  					 up_bound, up_bound_p1, &vr,
>  					 ignore_off_by_one, warn_array_bounds,
> -					 &out_of_bound);
> -
> +					 &out_of_bound,
> +					 gimple_is_splitted_p (stmt),
> +					 is_dominate_exit);
>  
>    if (!warned && sam == special_array_member::int_0)
>      warned = warning_at (location, OPT_Wzero_length_bounds,
> @@ -476,7 +490,7 @@ array_bounds_checker::check_array_ref (location_t location, tree ref,
>  
>  bool
>  array_bounds_checker::check_mem_ref (location_t location, tree ref,
> -				     bool ignore_off_by_one)
> +				     gimple *stmt, bool ignore_off_by_one)
>  {
>    if (warning_suppressed_p (ref, OPT_Warray_bounds_))
>      return false;
> @@ -574,6 +588,16 @@ array_bounds_checker::check_mem_ref (location_t location, tree ref,
>  	}
>      }
>  
> +  /* If the stmt is duplicated and splitted, the warning level is not 2,
> +     and the current block is not dominating the exit block, not report
> +     the warning.  */
> +  bool is_dominate_exit = dominated_by_p (CDI_DOMINATORS,
> +					  EXIT_BLOCK_PTR_FOR_FN (fun),
> +					  gimple_bb (stmt));
> +  if (gimple_is_splitted_p (stmt) && warn_array_bounds < 2
> +      && !is_dominate_exit)
> +    return false;
> +
>    bool warned = false;
>    if (lboob)
>      {
> @@ -654,7 +678,7 @@ array_bounds_checker::check_addr_expr (location_t location, tree t,
>  	  ignore_off_by_one = false;
>  	}
>        else if (TREE_CODE (t) == MEM_REF)
> -	warned = check_mem_ref (location, t, ignore_off_by_one);
> +	warned = check_mem_ref (location, t, stmt, ignore_off_by_one);
>  
>        if (warned)
>  	suppress_warning (t, OPT_Warray_bounds_);
> @@ -690,6 +714,16 @@ array_bounds_checker::check_addr_expr (location_t location, tree t,
>    if (!mem_ref_offset (t).is_constant (&idx))
>      return;
>  
> +  /* If the stmt is duplicated and splitted, the warning level is not 2,
> +     and the current block is not dominating the exit block, not report
> +     the warning.  */
> +  bool is_dominate_exit = dominated_by_p (CDI_DOMINATORS,
> +					  EXIT_BLOCK_PTR_FOR_FN (fun),
> +					  gimple_bb (stmt));
> +  if (gimple_is_splitted_p (stmt) && warn_array_bounds < 2
> +      && !is_dominate_exit)
> +    return;
> +
>    bool warned = false;
>    idx = wi::sdiv_trunc (idx, wi::to_offset (el_sz));
>    if (idx < 0)
> @@ -809,7 +843,7 @@ array_bounds_checker::check_array_bounds (tree *tp, int *walk_subtree,
>      warned = checker->check_array_ref (location, t, wi->stmt,
>  				       false/*ignore_off_by_one*/);
>    else if (TREE_CODE (t) == MEM_REF)
> -    warned = checker->check_mem_ref (location, t,
> +    warned = checker->check_mem_ref (location, t, wi->stmt,
>  				     false /*ignore_off_by_one*/);
>    else if (TREE_CODE (t) == ADDR_EXPR)
>      {
> diff --git a/gcc/gimple-array-bounds.h b/gcc/gimple-array-bounds.h
> index 3e077d0178ff..7c98f02204c9 100644
> --- a/gcc/gimple-array-bounds.h
> +++ b/gcc/gimple-array-bounds.h
> @@ -33,7 +33,7 @@ public:
>  private:
>    static tree check_array_bounds (tree *tp, int *walk_subtree, void *data);
>    bool check_array_ref (location_t, tree, gimple *, bool ignore_off_by_one);
> -  bool check_mem_ref (location_t, tree, bool ignore_off_by_one);
> +  bool check_mem_ref (location_t, tree, gimple *, bool ignore_off_by_one);
>    void check_addr_expr (location_t, tree, gimple *);
>    void get_value_range (irange &r, const_tree op, gimple *);
>  
> diff --git a/gcc/gimple.h b/gcc/gimple.h
> index bd315ffc2dd4..08f52d107084 100644
> --- a/gcc/gimple.h
> +++ b/gcc/gimple.h
> @@ -254,8 +254,8 @@ struct GTY((desc ("gimple_statement_structure (&%h)"), tag ("GSS_BASE"),
>    /* Nonzero if this statement contains volatile operands.  */
>    unsigned has_volatile_ops 	: 1;
>  
> -  /* Padding to get subcode to 16 bit alignment.  */
> -  unsigned pad			: 1;
> +  /* Nonzero if this statement is duplicated and splitted to two pathes.  */
> +  unsigned is_splitted		: 1;
>  
>    /* The SUBCODE field can be used for tuple-specific flags for tuples
>       that do not require subcodes.  Note that SUBCODE should be at
> @@ -2327,6 +2327,23 @@ gimple_set_has_volatile_ops (gimple *stmt, bool volatilep)
>      stmt->has_volatile_ops = (unsigned) volatilep;
>  }
>  
> +/* Return true if statement G's is_splitted field has
> +   been set.  */
> +
> +inline bool
> +gimple_is_splitted_p (const gimple *g)
> +{
> +  return (bool) g->is_splitted;
> +}
> +
> +/* Set the IS_SPLITTED flag to IS_SPLITTEDP.  */
> +
> +inline void
> +gimple_set_is_splitted (gimple *s, bool is_splittedp)
> +{
> +    s->is_splitted = (unsigned) is_splittedp;
> +}
> +
>  /* Return true if STMT is in a transaction.  */
>  
>  inline bool
> diff --git a/gcc/testsuite/gcc.dg/Warray-bounds-61.c b/gcc/testsuite/gcc.dg/Warray-bounds-61.c
> index 5b66cdc0aab1..cb3c64a813d7 100644
> --- a/gcc/testsuite/gcc.dg/Warray-bounds-61.c
> +++ b/gcc/testsuite/gcc.dg/Warray-bounds-61.c
> @@ -23,7 +23,7 @@ void test_ua3_ua0_a0 (int i)
>  
>    if (i < __LINE__)
>      i = 5;
> -  ua3_a0.a0[i] = 0;           // { dg-warning "\\\[-Warray-bounds" }
> +  ua3_a0.a0[i] = 0;           // { dg-bogus "\\\[-Warray-bounds" }
>  
>    if (i > -1)
>      i = -1;
> @@ -44,7 +44,7 @@ void test_ua3_ua0_a1 (int i)
>  
>    if (i > -1)
>      i = -1;
> -  ua3_a0.a1[i] = 0;           // { dg-warning "\\\[-Warray-bounds" }
> +  ua3_a0.a1[i] = 0;           // { dg-bogus "\\\[-Warray-bounds" }
>  
>    if (i < 7)
>      i = 7;
> @@ -60,7 +60,7 @@ void test_ua3_ua0_a2 (int i)
>  
>    if (i < __LINE__)
>      i = __LINE__;
> -  ua3_a0.a2[i] = 0;           // { dg-warning "\\\[-Warray-bounds" }
> +  ua3_a0.a2[i] = 0;           // { dg-bogus "\\\[-Warray-bounds" }
>  
>    if (i > -1)
>      i = -1;
> diff --git a/gcc/testsuite/gcc.dg/pr109071-1.c b/gcc/testsuite/gcc.dg/pr109071-1.c
> new file mode 100644
> index 000000000000..a405c80bd549
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr109071-1.c
> @@ -0,0 +1,22 @@
> +/* PR tree-optimization/109071 -Warray-bounds false positive warnings
> +   due to code duplication from jump threading 
> +   { dg-do compile }
> +   { dg-options "-O2 -Warray-bounds=2" }
> + */
> +
> +extern void warn(void);
> +static inline void assign(int val, int *regs, int index)
> +{
> +  if (index >= 4)
> +    warn();
> +  *regs = val;
> +}
> +struct nums {int vals[4];};
> +
> +void sparx5_set (int *ptr, struct nums *sg, int index)
> +{
> +  int *val = &sg->vals[index]; /* { dg-warning "is above array bounds" } */
> +
> +  assign(0,    ptr, index);
> +  assign(*val, ptr, index);
> +}
> diff --git a/gcc/testsuite/gcc.dg/pr109071.c b/gcc/testsuite/gcc.dg/pr109071.c
> new file mode 100644
> index 000000000000..782dfad84ea2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr109071.c
> @@ -0,0 +1,22 @@
> +/* PR tree-optimization/109071 -Warray-bounds false positive warnings
> +   due to code duplication from jump threading 
> +   { dg-do compile }
> +   { dg-options "-O2 -Wall" }
> + */
> +
> +extern void warn(void);
> +static inline void assign(int val, int *regs, int index)
> +{
> +  if (index >= 4)
> +    warn();
> +  *regs = val;
> +}
> +struct nums {int vals[4];};
> +
> +void sparx5_set (int *ptr, struct nums *sg, int index)
> +{
> +  int *val = &sg->vals[index]; /* { dg-bogus "is above array bounds" } */
> +
> +  assign(0,    ptr, index);
> +  assign(*val, ptr, index);
> +}
> diff --git a/gcc/tree-ssa-threadupdate.cc b/gcc/tree-ssa-threadupdate.cc
> index fa61ba9512b7..9f338dd4d54d 100644
> --- a/gcc/tree-ssa-threadupdate.cc
> +++ b/gcc/tree-ssa-threadupdate.cc
> @@ -2371,6 +2371,17 @@ back_jt_path_registry::adjust_paths_after_duplication (unsigned curr_path_num)
>      }
>  }
>  
> +/* Set all the stmts in the basic block BB as IS_SPLITTED.  */
> +
> +static void
> +set_stmts_in_bb_is_splitted (basic_block bb)
> +{
> +  gimple_stmt_iterator gsi;
> +  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> +    gimple_set_is_splitted (gsi_stmt (gsi), true);
> +  return;
> +}
> +
>  /* Duplicates a jump-thread path of N_REGION basic blocks.
>     The ENTRY edge is redirected to the duplicate of the region.
>  
> @@ -2418,6 +2429,10 @@ back_jt_path_registry::duplicate_thread_path (edge entry,
>    basic_block *region_copy = XNEWVEC (basic_block, n_region);
>    copy_bbs (region, n_region, region_copy, &exit, 1, &exit_copy, loop,
>  	    split_edge_bb_loc (entry), false);
> +  /* Mark all the stmts in both original and copied basic blocks
> +     as IS_SPLITTED.  */
> +  set_stmts_in_bb_is_splitted (*region);
> +  set_stmts_in_bb_is_splitted (*region_copy);
>  
>    /* Fix up: copy_bbs redirects all edges pointing to copied blocks.  The
>       following code ensures that all the edges exiting the jump-thread path are
>
Qing Zhao May 14, 2024, 2:17 p.m. UTC | #6
> On May 14, 2024, at 09:08, Richard Biener <rguenther@suse.de> wrote:
> 
> On Mon, 13 May 2024, Qing Zhao wrote:
> 
>> -Warray-bounds is an important option to enable linux kernal to keep
>> the array out-of-bound errors out of the source tree.
>> 
>> However, due to the false positive warnings reported in PR109071
>> (-Warray-bounds false positive warnings due to code duplication from
>> jump threading), -Warray-bounds=1 cannot be added on by default.
>> 
>> Although it's impossible to elinimate all the false positive warnings
>> from -Warray-bounds=1 (See PR104355 Misleading -Warray-bounds
>> documentation says "always out of bounds"), we should minimize the
>> false positive warnings in -Warray-bounds=1.
>> 
>> The root reason for the false positive warnings reported in PR109071 is:
>> 
>> When the thread jump optimization tries to reduce the # of branches
>> inside the routine, sometimes it needs to duplicate the code and
>> split into two conditional pathes. for example:
>> 
>> The original code:
>> 
>> void sparx5_set (int * ptr, struct nums * sg, int index)
>> {
>>  if (index >= 4)
>>    warn ();
>>  *ptr = 0;
>>  *val = sg->vals[index];
>>  if (index >= 4)
>>    warn ();
>>  *ptr = *val;
>> 
>>  return;
>> }
>> 
>> With the thread jump, the above becomes:
>> 
>> void sparx5_set (int * ptr, struct nums * sg, int index)
>> {
>>  if (index >= 4)
>>    {
>>      warn ();
>>      *ptr = 0; // Code duplications since "warn" does return;
>>      *val = sg->vals[index]; // same this line.
>> // In this path, since it's under the condition
>> // "index >= 4", the compiler knows the value
>> // of "index" is larger then 4, therefore the
>> // out-of-bound warning.
>>      warn ();
>>    }
>>  else
>>    {
>>      *ptr = 0;
>>      *val = sg->vals[index];
>>    }
>>  *ptr = *val;
>>  return;
>> }
>> 
>> We can see, after the thread jump optimization, the # of branches inside
>> the routine "sparx5_set" is reduced from 2 to 1, however,  due to the
>> code duplication (which is needed for the correctness of the code), we
>> got a false positive out-of-bound warning.
>> 
>> In order to eliminate such false positive out-of-bound warning,
>> 
>> A. Add one more flag for GIMPLE: is_splitted.
>> B. During the thread jump optimization, when the basic blocks are
>>   duplicated, mark all the STMTs inside the original and duplicated
>>   basic blocks as "is_splitted";
>> C. Inside the array bound checker, add the following new heuristic:
>> 
>> If
>>   1. the stmt is duplicated and splitted into two conditional paths;
>> +  2. the warning level < 2;
>> +  3. the current block is not dominating the exit block
>> Then not report the warning.
>> 
>> The false positive warnings are moved from -Warray-bounds=1 to
>> -Warray-bounds=2 now.
>> 
>> Bootstrapped and regression tested on both x86 and aarch64. adjusted
>> -Warray-bounds-61.c due to the false positive warnings.
>> 
>> Let me know if you have any comments and suggestions.
> 
> At the last Cauldron I talked with David Malcolm about these kind of
> issues and thought of instead of suppressing diagnostics to record
> how a block was duplicated.  For jump threading my idea was to record
> the condition that was proved true when entering the path and do this
> by recording the corresponding locations so that in the end we can
> use the diagnostic-path infrastructure to say
> 
> warning: array index always above array bounds
> events 1:
> 
> | 3 |  if (index >= 4)
>         |
>        (1) when index >= 4

Yes, this is a good idea. 

The current major issue with the warning is:  the constant index value 4 is not in the source code, it’s a compiler generated intermediate value (even though it’s a correct value -:)). Such warning messages confuse the end-users with information that cannot be connected directly to the source code. 

With the above recorded “events” information, the warning messages should make good sense to the end user, and also help the end user to locate the place where the fix in the source code can be added. 

Actually, with the above warning information, the user can locate the place “line 3” to add fixes as following:

        if (*index >= 4)
          {
                warn();
                *index = 3;
          }

I.e.
[109071]$ diff t_org.c t.c
2c2
< static inline void assign(int val, int *regs, int index)
---
> static inline void assign(int val, int *regs, int *index)
4c4,5
< if (index >= 4)
---
> if (*index >= 4)
>   {
5a7,8
> *index = 3;
>   }
14,15c17,18
< assign(0,    ptr, index);
< assign(*val, ptr, index);
---
> assign(0,    ptr, &index);
> assign(*val, ptr, &index);

> 
> it would be possible to record the info as part of the ad-hoc
> location data on each duplicated stmt or, possibly simpler,
> as part of a debug stmt of new kind.

Recording such info to each stmt might be more reliable? 

> 
> I'm not sure pruning the warnings is a good thing to do.  One
> would argue we should instead isolate such path as unreachable
> since it invokes undefined behavior.  In particular your
> example is clearly a bug and should be diagnosed.

I agree that the current warning is correct (that’s the reason I only moved it from -Warray-bounds=1 to -Warray-bounds=2 -:)), the major issue is that it’s not clear and confusing the end-user. 

With more information, we can help the user to locate the issue and fix it in the source code. 
> 
> Note very similar issues happen when unrolling a loop.
> 
> Note all late diagnostics are prone to these kind of issues.

Are there any open PRs for these similar issues?

Thanks. 

Qing
> 
> Richard.
> 
>> Thanks.
>> 
>> Qing
>> 
>> 
>> PR tree optimization/109071
>> 
>> gcc/ChangeLog:
>> 
>> * gimple-array-bounds.cc (check_out_of_bounds_and_warn): Add two new
>> arguments for the new heuristic to not issue warnings.
>> (array_bounds_checker::check_array_ref): Call the new prototype of the
>> routine check_out_of_bounds_and_warn.
>> (array_bounds_checker::check_mem_ref): Add one new argument for the
>> new heuristic to not issue warnings.
>> (array_bounds_checker::check_addr_expr): Call the new prototype of the
>> routine check_mem_ref, add new heuristic for not issue warnings.
>> (array_bounds_checker::check_array_bounds): Call the new prototype of
>> the routine check_mem_ref.
>> * gimple-array-bounds.h: New prototype of check_mem_ref.
>> * gimple.h (struct GTY): Add one new flag is_splitted for gimple.
>> (gimple_is_splitted_p): New function.
>> (gimple_set_is_splitted): New function.
>> * tree-ssa-threadupdate.cc (set_stmts_in_bb_is_splitted): New function.
>> (back_jt_path_registry::duplicate_thread_path): Mark all the stmts in
>> both original and copied blocks as IS_SPLITTED.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>> * gcc.dg/Warray-bounds-61.c: Adjust testing case.
>> * gcc.dg/pr109071-1.c: New test.
>> * gcc.dg/pr109071.c: New test.
>> ---
>> gcc/gimple-array-bounds.cc              | 46 +++++++++++++++++++++----
>> gcc/gimple-array-bounds.h               |  2 +-
>> gcc/gimple.h                            | 21 +++++++++--
>> gcc/testsuite/gcc.dg/Warray-bounds-61.c |  6 ++--
>> gcc/testsuite/gcc.dg/pr109071-1.c       | 22 ++++++++++++
>> gcc/testsuite/gcc.dg/pr109071.c         | 22 ++++++++++++
>> gcc/tree-ssa-threadupdate.cc            | 15 ++++++++
>> 7 files changed, 122 insertions(+), 12 deletions(-)
>> create mode 100644 gcc/testsuite/gcc.dg/pr109071-1.c
>> create mode 100644 gcc/testsuite/gcc.dg/pr109071.c
>> 
>> diff --git a/gcc/gimple-array-bounds.cc b/gcc/gimple-array-bounds.cc
>> index 008071cd5464..4a2975623bc1 100644
>> --- a/gcc/gimple-array-bounds.cc
>> +++ b/gcc/gimple-array-bounds.cc
>> @@ -264,7 +264,9 @@ check_out_of_bounds_and_warn (location_t location, tree ref,
>>       tree up_bound, tree up_bound_p1,
>>       const value_range *vr,
>>       bool ignore_off_by_one, bool for_array_bound,
>> -       bool *out_of_bound)
>> +       bool *out_of_bound,
>> +       bool is_splitted,
>> +       bool is_dominate_exit)
>> {
>>   tree min, max;
>>   tree low_bound = array_ref_low_bound (ref);
>> @@ -273,6 +275,13 @@ check_out_of_bounds_and_warn (location_t location, tree ref,
>>   bool warned = false;
>>   *out_of_bound = false;
>> 
>> +  /* If the stmt is duplicated and splitted, the warning level is not 2,
>> +     and the current block is not dominating the exit block, not report
>> +     the warning.  */
>> +  if (is_splitted && warn_array_bounds < 2
>> +      && !is_dominate_exit)
>> +    return false;
>> +
>>   /* Empty array.  */
>>   if (up_bound && tree_int_cst_equal (low_bound, up_bound_p1))
>>     {
>> @@ -386,12 +395,17 @@ array_bounds_checker::check_array_ref (location_t location, tree ref,
>> }
>>     }
>> 
>> +  bool is_dominate_exit = dominated_by_p (CDI_DOMINATORS,
>> +   EXIT_BLOCK_PTR_FOR_FN (fun),
>> +   gimple_bb (stmt));
>> +
>>   warned = check_out_of_bounds_and_warn (location, ref,
>>  low_sub_org, low_sub, up_sub,
>>  up_bound, up_bound_p1, &vr,
>>  ignore_off_by_one, warn_array_bounds,
>> -  &out_of_bound);
>> -
>> +  &out_of_bound,
>> +  gimple_is_splitted_p (stmt),
>> +  is_dominate_exit);
>> 
>>   if (!warned && sam == special_array_member::int_0)
>>     warned = warning_at (location, OPT_Wzero_length_bounds,
>> @@ -476,7 +490,7 @@ array_bounds_checker::check_array_ref (location_t location, tree ref,
>> 
>> bool
>> array_bounds_checker::check_mem_ref (location_t location, tree ref,
>> -      bool ignore_off_by_one)
>> +      gimple *stmt, bool ignore_off_by_one)
>> {
>>   if (warning_suppressed_p (ref, OPT_Warray_bounds_))
>>     return false;
>> @@ -574,6 +588,16 @@ array_bounds_checker::check_mem_ref (location_t location, tree ref,
>> }
>>     }
>> 
>> +  /* If the stmt is duplicated and splitted, the warning level is not 2,
>> +     and the current block is not dominating the exit block, not report
>> +     the warning.  */
>> +  bool is_dominate_exit = dominated_by_p (CDI_DOMINATORS,
>> +   EXIT_BLOCK_PTR_FOR_FN (fun),
>> +   gimple_bb (stmt));
>> +  if (gimple_is_splitted_p (stmt) && warn_array_bounds < 2
>> +      && !is_dominate_exit)
>> +    return false;
>> +
>>   bool warned = false;
>>   if (lboob)
>>     {
>> @@ -654,7 +678,7 @@ array_bounds_checker::check_addr_expr (location_t location, tree t,
>>   ignore_off_by_one = false;
>> }
>>       else if (TREE_CODE (t) == MEM_REF)
>> - warned = check_mem_ref (location, t, ignore_off_by_one);
>> + warned = check_mem_ref (location, t, stmt, ignore_off_by_one);
>> 
>>       if (warned)
>> suppress_warning (t, OPT_Warray_bounds_);
>> @@ -690,6 +714,16 @@ array_bounds_checker::check_addr_expr (location_t location, tree t,
>>   if (!mem_ref_offset (t).is_constant (&idx))
>>     return;
>> 
>> +  /* If the stmt is duplicated and splitted, the warning level is not 2,
>> +     and the current block is not dominating the exit block, not report
>> +     the warning.  */
>> +  bool is_dominate_exit = dominated_by_p (CDI_DOMINATORS,
>> +   EXIT_BLOCK_PTR_FOR_FN (fun),
>> +   gimple_bb (stmt));
>> +  if (gimple_is_splitted_p (stmt) && warn_array_bounds < 2
>> +      && !is_dominate_exit)
>> +    return;
>> +
>>   bool warned = false;
>>   idx = wi::sdiv_trunc (idx, wi::to_offset (el_sz));
>>   if (idx < 0)
>> @@ -809,7 +843,7 @@ array_bounds_checker::check_array_bounds (tree *tp, int *walk_subtree,
>>     warned = checker->check_array_ref (location, t, wi->stmt,
>>        false/*ignore_off_by_one*/);
>>   else if (TREE_CODE (t) == MEM_REF)
>> -    warned = checker->check_mem_ref (location, t,
>> +    warned = checker->check_mem_ref (location, t, wi->stmt,
>>      false /*ignore_off_by_one*/);
>>   else if (TREE_CODE (t) == ADDR_EXPR)
>>     {
>> diff --git a/gcc/gimple-array-bounds.h b/gcc/gimple-array-bounds.h
>> index 3e077d0178ff..7c98f02204c9 100644
>> --- a/gcc/gimple-array-bounds.h
>> +++ b/gcc/gimple-array-bounds.h
>> @@ -33,7 +33,7 @@ public:
>> private:
>>   static tree check_array_bounds (tree *tp, int *walk_subtree, void *data);
>>   bool check_array_ref (location_t, tree, gimple *, bool ignore_off_by_one);
>> -  bool check_mem_ref (location_t, tree, bool ignore_off_by_one);
>> +  bool check_mem_ref (location_t, tree, gimple *, bool ignore_off_by_one);
>>   void check_addr_expr (location_t, tree, gimple *);
>>   void get_value_range (irange &r, const_tree op, gimple *);
>> 
>> diff --git a/gcc/gimple.h b/gcc/gimple.h
>> index bd315ffc2dd4..08f52d107084 100644
>> --- a/gcc/gimple.h
>> +++ b/gcc/gimple.h
>> @@ -254,8 +254,8 @@ struct GTY((desc ("gimple_statement_structure (&%h)"), tag ("GSS_BASE"),
>>   /* Nonzero if this statement contains volatile operands.  */
>>   unsigned has_volatile_ops  : 1;
>> 
>> -  /* Padding to get subcode to 16 bit alignment.  */
>> -  unsigned pad : 1;
>> +  /* Nonzero if this statement is duplicated and splitted to two pathes.  */
>> +  unsigned is_splitted : 1;
>> 
>>   /* The SUBCODE field can be used for tuple-specific flags for tuples
>>      that do not require subcodes.  Note that SUBCODE should be at
>> @@ -2327,6 +2327,23 @@ gimple_set_has_volatile_ops (gimple *stmt, bool volatilep)
>>     stmt->has_volatile_ops = (unsigned) volatilep;
>> }
>> 
>> +/* Return true if statement G's is_splitted field has
>> +   been set.  */
>> +
>> +inline bool
>> +gimple_is_splitted_p (const gimple *g)
>> +{
>> +  return (bool) g->is_splitted;
>> +}
>> +
>> +/* Set the IS_SPLITTED flag to IS_SPLITTEDP.  */
>> +
>> +inline void
>> +gimple_set_is_splitted (gimple *s, bool is_splittedp)
>> +{
>> +    s->is_splitted = (unsigned) is_splittedp;
>> +}
>> +
>> /* Return true if STMT is in a transaction.  */
>> 
>> inline bool
>> diff --git a/gcc/testsuite/gcc.dg/Warray-bounds-61.c b/gcc/testsuite/gcc.dg/Warray-bounds-61.c
>> index 5b66cdc0aab1..cb3c64a813d7 100644
>> --- a/gcc/testsuite/gcc.dg/Warray-bounds-61.c
>> +++ b/gcc/testsuite/gcc.dg/Warray-bounds-61.c
>> @@ -23,7 +23,7 @@ void test_ua3_ua0_a0 (int i)
>> 
>>   if (i < __LINE__)
>>     i = 5;
>> -  ua3_a0.a0[i] = 0;           // { dg-warning "\\\[-Warray-bounds" }
>> +  ua3_a0.a0[i] = 0;           // { dg-bogus "\\\[-Warray-bounds" }
>> 
>>   if (i > -1)
>>     i = -1;
>> @@ -44,7 +44,7 @@ void test_ua3_ua0_a1 (int i)
>> 
>>   if (i > -1)
>>     i = -1;
>> -  ua3_a0.a1[i] = 0;           // { dg-warning "\\\[-Warray-bounds" }
>> +  ua3_a0.a1[i] = 0;           // { dg-bogus "\\\[-Warray-bounds" }
>> 
>>   if (i < 7)
>>     i = 7;
>> @@ -60,7 +60,7 @@ void test_ua3_ua0_a2 (int i)
>> 
>>   if (i < __LINE__)
>>     i = __LINE__;
>> -  ua3_a0.a2[i] = 0;           // { dg-warning "\\\[-Warray-bounds" }
>> +  ua3_a0.a2[i] = 0;           // { dg-bogus "\\\[-Warray-bounds" }
>> 
>>   if (i > -1)
>>     i = -1;
>> diff --git a/gcc/testsuite/gcc.dg/pr109071-1.c b/gcc/testsuite/gcc.dg/pr109071-1.c
>> new file mode 100644
>> index 000000000000..a405c80bd549
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/pr109071-1.c
>> @@ -0,0 +1,22 @@
>> +/* PR tree-optimization/109071 -Warray-bounds false positive warnings
>> +   due to code duplication from jump threading 
>> +   { dg-do compile }
>> +   { dg-options "-O2 -Warray-bounds=2" }
>> + */
>> +
>> +extern void warn(void);
>> +static inline void assign(int val, int *regs, int index)
>> +{
>> +  if (index >= 4)
>> +    warn();
>> +  *regs = val;
>> +}
>> +struct nums {int vals[4];};
>> +
>> +void sparx5_set (int *ptr, struct nums *sg, int index)
>> +{
>> +  int *val = &sg->vals[index]; /* { dg-warning "is above array bounds" } */
>> +
>> +  assign(0,    ptr, index);
>> +  assign(*val, ptr, index);
>> +}
>> diff --git a/gcc/testsuite/gcc.dg/pr109071.c b/gcc/testsuite/gcc.dg/pr109071.c
>> new file mode 100644
>> index 000000000000..782dfad84ea2
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/pr109071.c
>> @@ -0,0 +1,22 @@
>> +/* PR tree-optimization/109071 -Warray-bounds false positive warnings
>> +   due to code duplication from jump threading 
>> +   { dg-do compile }
>> +   { dg-options "-O2 -Wall" }
>> + */
>> +
>> +extern void warn(void);
>> +static inline void assign(int val, int *regs, int index)
>> +{
>> +  if (index >= 4)
>> +    warn();
>> +  *regs = val;
>> +}
>> +struct nums {int vals[4];};
>> +
>> +void sparx5_set (int *ptr, struct nums *sg, int index)
>> +{
>> +  int *val = &sg->vals[index]; /* { dg-bogus "is above array bounds" } */
>> +
>> +  assign(0,    ptr, index);
>> +  assign(*val, ptr, index);
>> +}
>> diff --git a/gcc/tree-ssa-threadupdate.cc b/gcc/tree-ssa-threadupdate.cc
>> index fa61ba9512b7..9f338dd4d54d 100644
>> --- a/gcc/tree-ssa-threadupdate.cc
>> +++ b/gcc/tree-ssa-threadupdate.cc
>> @@ -2371,6 +2371,17 @@ back_jt_path_registry::adjust_paths_after_duplication (unsigned curr_path_num)
>>     }
>> }
>> 
>> +/* Set all the stmts in the basic block BB as IS_SPLITTED.  */
>> +
>> +static void
>> +set_stmts_in_bb_is_splitted (basic_block bb)
>> +{
>> +  gimple_stmt_iterator gsi;
>> +  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>> +    gimple_set_is_splitted (gsi_stmt (gsi), true);
>> +  return;
>> +}
>> +
>> /* Duplicates a jump-thread path of N_REGION basic blocks.
>>    The ENTRY edge is redirected to the duplicate of the region.
>> 
>> @@ -2418,6 +2429,10 @@ back_jt_path_registry::duplicate_thread_path (edge entry,
>>   basic_block *region_copy = XNEWVEC (basic_block, n_region);
>>   copy_bbs (region, n_region, region_copy, &exit, 1, &exit_copy, loop,
>>     split_edge_bb_loc (entry), false);
>> +  /* Mark all the stmts in both original and copied basic blocks
>> +     as IS_SPLITTED.  */
>> +  set_stmts_in_bb_is_splitted (*region);
>> +  set_stmts_in_bb_is_splitted (*region_copy);
>> 
>>   /* Fix up: copy_bbs redirects all edges pointing to copied blocks.  The
>>      following code ensures that all the edges exiting the jump-thread path are
>> 
> 
> -- 
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
Richard Biener May 14, 2024, 2:29 p.m. UTC | #7
On Tue, 14 May 2024, Qing Zhao wrote:

> 
> 
> > On May 14, 2024, at 09:08, Richard Biener <rguenther@suse.de> wrote:
> > 
> > On Mon, 13 May 2024, Qing Zhao wrote:
> > 
> >> -Warray-bounds is an important option to enable linux kernal to keep
> >> the array out-of-bound errors out of the source tree.
> >> 
> >> However, due to the false positive warnings reported in PR109071
> >> (-Warray-bounds false positive warnings due to code duplication from
> >> jump threading), -Warray-bounds=1 cannot be added on by default.
> >> 
> >> Although it's impossible to elinimate all the false positive warnings
> >> from -Warray-bounds=1 (See PR104355 Misleading -Warray-bounds
> >> documentation says "always out of bounds"), we should minimize the
> >> false positive warnings in -Warray-bounds=1.
> >> 
> >> The root reason for the false positive warnings reported in PR109071 is:
> >> 
> >> When the thread jump optimization tries to reduce the # of branches
> >> inside the routine, sometimes it needs to duplicate the code and
> >> split into two conditional pathes. for example:
> >> 
> >> The original code:
> >> 
> >> void sparx5_set (int * ptr, struct nums * sg, int index)
> >> {
> >>  if (index >= 4)
> >>    warn ();
> >>  *ptr = 0;
> >>  *val = sg->vals[index];
> >>  if (index >= 4)
> >>    warn ();
> >>  *ptr = *val;
> >> 
> >>  return;
> >> }
> >> 
> >> With the thread jump, the above becomes:
> >> 
> >> void sparx5_set (int * ptr, struct nums * sg, int index)
> >> {
> >>  if (index >= 4)
> >>    {
> >>      warn ();
> >>      *ptr = 0; // Code duplications since "warn" does return;
> >>      *val = sg->vals[index]; // same this line.
> >> // In this path, since it's under the condition
> >> // "index >= 4", the compiler knows the value
> >> // of "index" is larger then 4, therefore the
> >> // out-of-bound warning.
> >>      warn ();
> >>    }
> >>  else
> >>    {
> >>      *ptr = 0;
> >>      *val = sg->vals[index];
> >>    }
> >>  *ptr = *val;
> >>  return;
> >> }
> >> 
> >> We can see, after the thread jump optimization, the # of branches inside
> >> the routine "sparx5_set" is reduced from 2 to 1, however,  due to the
> >> code duplication (which is needed for the correctness of the code), we
> >> got a false positive out-of-bound warning.
> >> 
> >> In order to eliminate such false positive out-of-bound warning,
> >> 
> >> A. Add one more flag for GIMPLE: is_splitted.
> >> B. During the thread jump optimization, when the basic blocks are
> >>   duplicated, mark all the STMTs inside the original and duplicated
> >>   basic blocks as "is_splitted";
> >> C. Inside the array bound checker, add the following new heuristic:
> >> 
> >> If
> >>   1. the stmt is duplicated and splitted into two conditional paths;
> >> +  2. the warning level < 2;
> >> +  3. the current block is not dominating the exit block
> >> Then not report the warning.
> >> 
> >> The false positive warnings are moved from -Warray-bounds=1 to
> >> -Warray-bounds=2 now.
> >> 
> >> Bootstrapped and regression tested on both x86 and aarch64. adjusted
> >> -Warray-bounds-61.c due to the false positive warnings.
> >> 
> >> Let me know if you have any comments and suggestions.
> > 
> > At the last Cauldron I talked with David Malcolm about these kind of
> > issues and thought of instead of suppressing diagnostics to record
> > how a block was duplicated.  For jump threading my idea was to record
> > the condition that was proved true when entering the path and do this
> > by recording the corresponding locations so that in the end we can
> > use the diagnostic-path infrastructure to say
> > 
> > warning: array index always above array bounds
> > events 1:
> > 
> > | 3 |  if (index >= 4)
> >         |
> >        (1) when index >= 4

As it's been quite some time I think I remeber that I thought of
constructing the diagnostic path at jump threading time and associating
that with the location.  But I don't remember exactly where I wanted to
put it - I think it was on an extra stmt to avoid having too many
ad-hoc locations as I'm not sure of their cost.  It would of course
need experimenting since we can end up moving stmts and merging blocks
though the linear traces created by jump threading should be quite
stable (as opposed to say the unrolling case where multiple instances
of the loop body likely will end up in the exact same basic block).

> Yes, this is a good idea. 
> 
> The current major issue with the warning is:  the constant index value 4 is not in the source code, it’s a compiler generated intermediate value (even though it’s a correct value -:)). Such warning messages confuse the end-users with information that cannot be connected directly to the source code. 
> 
> With the above recorded “events” information, the warning messages should make good sense to the end user, and also help the end user to locate the place where the fix in the source code can be added. 
> 
> Actually, with the above warning information, the user can locate the place “line 3” to add fixes as following:
> 
>         if (*index >= 4)
>           {
>                 warn();
>                 *index = 3;
>           }
> 
> I.e.
> [109071]$ diff t_org.c t.c
> 2c2
> < static inline void assign(int val, int *regs, int index)
> ---
> > static inline void assign(int val, int *regs, int *index)
> 4c4,5
> < if (index >= 4)
> ---
> > if (*index >= 4)
> >   {
> 5a7,8
> > *index = 3;
> >   }
> 14,15c17,18
> < assign(0,    ptr, index);
> < assign(*val, ptr, index);
> ---
> > assign(0,    ptr, &index);
> > assign(*val, ptr, &index);
> 
> > 
> > it would be possible to record the info as part of the ad-hoc
> > location data on each duplicated stmt or, possibly simpler,
> > as part of a debug stmt of new kind.
> 
> Recording such info to each stmt might be more reliable? 
> 
> > 
> > I'm not sure pruning the warnings is a good thing to do.  One
> > would argue we should instead isolate such path as unreachable
> > since it invokes undefined behavior.  In particular your
> > example is clearly a bug and should be diagnosed.
> 
> I agree that the current warning is correct (that’s the reason I only moved it from -Warray-bounds=1 to -Warray-bounds=2 -:)), the major issue is that it’s not clear and confusing the end-user. 
> 
> With more information, we can help the user to locate the issue and fix it in the source code. 
> > 
> > Note very similar issues happen when unrolling a loop.
> > 
> > Note all late diagnostics are prone to these kind of issues.
> 
> Are there any open PRs for these similar issues?

You can search bugzilla for diagnostic bugs or bugs blocking
the symbolic Warray-bounds bug.

Richard.

> Thanks. 
> 
> Qing
> > 
> > Richard.
> > 
> >> Thanks.
> >> 
> >> Qing
> >> 
> >> 
> >> PR tree optimization/109071
> >> 
> >> gcc/ChangeLog:
> >> 
> >> * gimple-array-bounds.cc (check_out_of_bounds_and_warn): Add two new
> >> arguments for the new heuristic to not issue warnings.
> >> (array_bounds_checker::check_array_ref): Call the new prototype of the
> >> routine check_out_of_bounds_and_warn.
> >> (array_bounds_checker::check_mem_ref): Add one new argument for the
> >> new heuristic to not issue warnings.
> >> (array_bounds_checker::check_addr_expr): Call the new prototype of the
> >> routine check_mem_ref, add new heuristic for not issue warnings.
> >> (array_bounds_checker::check_array_bounds): Call the new prototype of
> >> the routine check_mem_ref.
> >> * gimple-array-bounds.h: New prototype of check_mem_ref.
> >> * gimple.h (struct GTY): Add one new flag is_splitted for gimple.
> >> (gimple_is_splitted_p): New function.
> >> (gimple_set_is_splitted): New function.
> >> * tree-ssa-threadupdate.cc (set_stmts_in_bb_is_splitted): New function.
> >> (back_jt_path_registry::duplicate_thread_path): Mark all the stmts in
> >> both original and copied blocks as IS_SPLITTED.
> >> 
> >> gcc/testsuite/ChangeLog:
> >> 
> >> * gcc.dg/Warray-bounds-61.c: Adjust testing case.
> >> * gcc.dg/pr109071-1.c: New test.
> >> * gcc.dg/pr109071.c: New test.
> >> ---
> >> gcc/gimple-array-bounds.cc              | 46 +++++++++++++++++++++----
> >> gcc/gimple-array-bounds.h               |  2 +-
> >> gcc/gimple.h                            | 21 +++++++++--
> >> gcc/testsuite/gcc.dg/Warray-bounds-61.c |  6 ++--
> >> gcc/testsuite/gcc.dg/pr109071-1.c       | 22 ++++++++++++
> >> gcc/testsuite/gcc.dg/pr109071.c         | 22 ++++++++++++
> >> gcc/tree-ssa-threadupdate.cc            | 15 ++++++++
> >> 7 files changed, 122 insertions(+), 12 deletions(-)
> >> create mode 100644 gcc/testsuite/gcc.dg/pr109071-1.c
> >> create mode 100644 gcc/testsuite/gcc.dg/pr109071.c
> >> 
> >> diff --git a/gcc/gimple-array-bounds.cc b/gcc/gimple-array-bounds.cc
> >> index 008071cd5464..4a2975623bc1 100644
> >> --- a/gcc/gimple-array-bounds.cc
> >> +++ b/gcc/gimple-array-bounds.cc
> >> @@ -264,7 +264,9 @@ check_out_of_bounds_and_warn (location_t location, tree ref,
> >>       tree up_bound, tree up_bound_p1,
> >>       const value_range *vr,
> >>       bool ignore_off_by_one, bool for_array_bound,
> >> -       bool *out_of_bound)
> >> +       bool *out_of_bound,
> >> +       bool is_splitted,
> >> +       bool is_dominate_exit)
> >> {
> >>   tree min, max;
> >>   tree low_bound = array_ref_low_bound (ref);
> >> @@ -273,6 +275,13 @@ check_out_of_bounds_and_warn (location_t location, tree ref,
> >>   bool warned = false;
> >>   *out_of_bound = false;
> >> 
> >> +  /* If the stmt is duplicated and splitted, the warning level is not 2,
> >> +     and the current block is not dominating the exit block, not report
> >> +     the warning.  */
> >> +  if (is_splitted && warn_array_bounds < 2
> >> +      && !is_dominate_exit)
> >> +    return false;
> >> +
> >>   /* Empty array.  */
> >>   if (up_bound && tree_int_cst_equal (low_bound, up_bound_p1))
> >>     {
> >> @@ -386,12 +395,17 @@ array_bounds_checker::check_array_ref (location_t location, tree ref,
> >> }
> >>     }
> >> 
> >> +  bool is_dominate_exit = dominated_by_p (CDI_DOMINATORS,
> >> +   EXIT_BLOCK_PTR_FOR_FN (fun),
> >> +   gimple_bb (stmt));
> >> +
> >>   warned = check_out_of_bounds_and_warn (location, ref,
> >>  low_sub_org, low_sub, up_sub,
> >>  up_bound, up_bound_p1, &vr,
> >>  ignore_off_by_one, warn_array_bounds,
> >> -  &out_of_bound);
> >> -
> >> +  &out_of_bound,
> >> +  gimple_is_splitted_p (stmt),
> >> +  is_dominate_exit);
> >> 
> >>   if (!warned && sam == special_array_member::int_0)
> >>     warned = warning_at (location, OPT_Wzero_length_bounds,
> >> @@ -476,7 +490,7 @@ array_bounds_checker::check_array_ref (location_t location, tree ref,
> >> 
> >> bool
> >> array_bounds_checker::check_mem_ref (location_t location, tree ref,
> >> -      bool ignore_off_by_one)
> >> +      gimple *stmt, bool ignore_off_by_one)
> >> {
> >>   if (warning_suppressed_p (ref, OPT_Warray_bounds_))
> >>     return false;
> >> @@ -574,6 +588,16 @@ array_bounds_checker::check_mem_ref (location_t location, tree ref,
> >> }
> >>     }
> >> 
> >> +  /* If the stmt is duplicated and splitted, the warning level is not 2,
> >> +     and the current block is not dominating the exit block, not report
> >> +     the warning.  */
> >> +  bool is_dominate_exit = dominated_by_p (CDI_DOMINATORS,
> >> +   EXIT_BLOCK_PTR_FOR_FN (fun),
> >> +   gimple_bb (stmt));
> >> +  if (gimple_is_splitted_p (stmt) && warn_array_bounds < 2
> >> +      && !is_dominate_exit)
> >> +    return false;
> >> +
> >>   bool warned = false;
> >>   if (lboob)
> >>     {
> >> @@ -654,7 +678,7 @@ array_bounds_checker::check_addr_expr (location_t location, tree t,
> >>   ignore_off_by_one = false;
> >> }
> >>       else if (TREE_CODE (t) == MEM_REF)
> >> - warned = check_mem_ref (location, t, ignore_off_by_one);
> >> + warned = check_mem_ref (location, t, stmt, ignore_off_by_one);
> >> 
> >>       if (warned)
> >> suppress_warning (t, OPT_Warray_bounds_);
> >> @@ -690,6 +714,16 @@ array_bounds_checker::check_addr_expr (location_t location, tree t,
> >>   if (!mem_ref_offset (t).is_constant (&idx))
> >>     return;
> >> 
> >> +  /* If the stmt is duplicated and splitted, the warning level is not 2,
> >> +     and the current block is not dominating the exit block, not report
> >> +     the warning.  */
> >> +  bool is_dominate_exit = dominated_by_p (CDI_DOMINATORS,
> >> +   EXIT_BLOCK_PTR_FOR_FN (fun),
> >> +   gimple_bb (stmt));
> >> +  if (gimple_is_splitted_p (stmt) && warn_array_bounds < 2
> >> +      && !is_dominate_exit)
> >> +    return;
> >> +
> >>   bool warned = false;
> >>   idx = wi::sdiv_trunc (idx, wi::to_offset (el_sz));
> >>   if (idx < 0)
> >> @@ -809,7 +843,7 @@ array_bounds_checker::check_array_bounds (tree *tp, int *walk_subtree,
> >>     warned = checker->check_array_ref (location, t, wi->stmt,
> >>        false/*ignore_off_by_one*/);
> >>   else if (TREE_CODE (t) == MEM_REF)
> >> -    warned = checker->check_mem_ref (location, t,
> >> +    warned = checker->check_mem_ref (location, t, wi->stmt,
> >>      false /*ignore_off_by_one*/);
> >>   else if (TREE_CODE (t) == ADDR_EXPR)
> >>     {
> >> diff --git a/gcc/gimple-array-bounds.h b/gcc/gimple-array-bounds.h
> >> index 3e077d0178ff..7c98f02204c9 100644
> >> --- a/gcc/gimple-array-bounds.h
> >> +++ b/gcc/gimple-array-bounds.h
> >> @@ -33,7 +33,7 @@ public:
> >> private:
> >>   static tree check_array_bounds (tree *tp, int *walk_subtree, void *data);
> >>   bool check_array_ref (location_t, tree, gimple *, bool ignore_off_by_one);
> >> -  bool check_mem_ref (location_t, tree, bool ignore_off_by_one);
> >> +  bool check_mem_ref (location_t, tree, gimple *, bool ignore_off_by_one);
> >>   void check_addr_expr (location_t, tree, gimple *);
> >>   void get_value_range (irange &r, const_tree op, gimple *);
> >> 
> >> diff --git a/gcc/gimple.h b/gcc/gimple.h
> >> index bd315ffc2dd4..08f52d107084 100644
> >> --- a/gcc/gimple.h
> >> +++ b/gcc/gimple.h
> >> @@ -254,8 +254,8 @@ struct GTY((desc ("gimple_statement_structure (&%h)"), tag ("GSS_BASE"),
> >>   /* Nonzero if this statement contains volatile operands.  */
> >>   unsigned has_volatile_ops  : 1;
> >> 
> >> -  /* Padding to get subcode to 16 bit alignment.  */
> >> -  unsigned pad : 1;
> >> +  /* Nonzero if this statement is duplicated and splitted to two pathes.  */
> >> +  unsigned is_splitted : 1;
> >> 
> >>   /* The SUBCODE field can be used for tuple-specific flags for tuples
> >>      that do not require subcodes.  Note that SUBCODE should be at
> >> @@ -2327,6 +2327,23 @@ gimple_set_has_volatile_ops (gimple *stmt, bool volatilep)
> >>     stmt->has_volatile_ops = (unsigned) volatilep;
> >> }
> >> 
> >> +/* Return true if statement G's is_splitted field has
> >> +   been set.  */
> >> +
> >> +inline bool
> >> +gimple_is_splitted_p (const gimple *g)
> >> +{
> >> +  return (bool) g->is_splitted;
> >> +}
> >> +
> >> +/* Set the IS_SPLITTED flag to IS_SPLITTEDP.  */
> >> +
> >> +inline void
> >> +gimple_set_is_splitted (gimple *s, bool is_splittedp)
> >> +{
> >> +    s->is_splitted = (unsigned) is_splittedp;
> >> +}
> >> +
> >> /* Return true if STMT is in a transaction.  */
> >> 
> >> inline bool
> >> diff --git a/gcc/testsuite/gcc.dg/Warray-bounds-61.c b/gcc/testsuite/gcc.dg/Warray-bounds-61.c
> >> index 5b66cdc0aab1..cb3c64a813d7 100644
> >> --- a/gcc/testsuite/gcc.dg/Warray-bounds-61.c
> >> +++ b/gcc/testsuite/gcc.dg/Warray-bounds-61.c
> >> @@ -23,7 +23,7 @@ void test_ua3_ua0_a0 (int i)
> >> 
> >>   if (i < __LINE__)
> >>     i = 5;
> >> -  ua3_a0.a0[i] = 0;           // { dg-warning "\\\[-Warray-bounds" }
> >> +  ua3_a0.a0[i] = 0;           // { dg-bogus "\\\[-Warray-bounds" }
> >> 
> >>   if (i > -1)
> >>     i = -1;
> >> @@ -44,7 +44,7 @@ void test_ua3_ua0_a1 (int i)
> >> 
> >>   if (i > -1)
> >>     i = -1;
> >> -  ua3_a0.a1[i] = 0;           // { dg-warning "\\\[-Warray-bounds" }
> >> +  ua3_a0.a1[i] = 0;           // { dg-bogus "\\\[-Warray-bounds" }
> >> 
> >>   if (i < 7)
> >>     i = 7;
> >> @@ -60,7 +60,7 @@ void test_ua3_ua0_a2 (int i)
> >> 
> >>   if (i < __LINE__)
> >>     i = __LINE__;
> >> -  ua3_a0.a2[i] = 0;           // { dg-warning "\\\[-Warray-bounds" }
> >> +  ua3_a0.a2[i] = 0;           // { dg-bogus "\\\[-Warray-bounds" }
> >> 
> >>   if (i > -1)
> >>     i = -1;
> >> diff --git a/gcc/testsuite/gcc.dg/pr109071-1.c b/gcc/testsuite/gcc.dg/pr109071-1.c
> >> new file mode 100644
> >> index 000000000000..a405c80bd549
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.dg/pr109071-1.c
> >> @@ -0,0 +1,22 @@
> >> +/* PR tree-optimization/109071 -Warray-bounds false positive warnings
> >> +   due to code duplication from jump threading 
> >> +   { dg-do compile }
> >> +   { dg-options "-O2 -Warray-bounds=2" }
> >> + */
> >> +
> >> +extern void warn(void);
> >> +static inline void assign(int val, int *regs, int index)
> >> +{
> >> +  if (index >= 4)
> >> +    warn();
> >> +  *regs = val;
> >> +}
> >> +struct nums {int vals[4];};
> >> +
> >> +void sparx5_set (int *ptr, struct nums *sg, int index)
> >> +{
> >> +  int *val = &sg->vals[index]; /* { dg-warning "is above array bounds" } */
> >> +
> >> +  assign(0,    ptr, index);
> >> +  assign(*val, ptr, index);
> >> +}
> >> diff --git a/gcc/testsuite/gcc.dg/pr109071.c b/gcc/testsuite/gcc.dg/pr109071.c
> >> new file mode 100644
> >> index 000000000000..782dfad84ea2
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.dg/pr109071.c
> >> @@ -0,0 +1,22 @@
> >> +/* PR tree-optimization/109071 -Warray-bounds false positive warnings
> >> +   due to code duplication from jump threading 
> >> +   { dg-do compile }
> >> +   { dg-options "-O2 -Wall" }
> >> + */
> >> +
> >> +extern void warn(void);
> >> +static inline void assign(int val, int *regs, int index)
> >> +{
> >> +  if (index >= 4)
> >> +    warn();
> >> +  *regs = val;
> >> +}
> >> +struct nums {int vals[4];};
> >> +
> >> +void sparx5_set (int *ptr, struct nums *sg, int index)
> >> +{
> >> +  int *val = &sg->vals[index]; /* { dg-bogus "is above array bounds" } */
> >> +
> >> +  assign(0,    ptr, index);
> >> +  assign(*val, ptr, index);
> >> +}
> >> diff --git a/gcc/tree-ssa-threadupdate.cc b/gcc/tree-ssa-threadupdate.cc
> >> index fa61ba9512b7..9f338dd4d54d 100644
> >> --- a/gcc/tree-ssa-threadupdate.cc
> >> +++ b/gcc/tree-ssa-threadupdate.cc
> >> @@ -2371,6 +2371,17 @@ back_jt_path_registry::adjust_paths_after_duplication (unsigned curr_path_num)
> >>     }
> >> }
> >> 
> >> +/* Set all the stmts in the basic block BB as IS_SPLITTED.  */
> >> +
> >> +static void
> >> +set_stmts_in_bb_is_splitted (basic_block bb)
> >> +{
> >> +  gimple_stmt_iterator gsi;
> >> +  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> >> +    gimple_set_is_splitted (gsi_stmt (gsi), true);
> >> +  return;
> >> +}
> >> +
> >> /* Duplicates a jump-thread path of N_REGION basic blocks.
> >>    The ENTRY edge is redirected to the duplicate of the region.
> >> 
> >> @@ -2418,6 +2429,10 @@ back_jt_path_registry::duplicate_thread_path (edge entry,
> >>   basic_block *region_copy = XNEWVEC (basic_block, n_region);
> >>   copy_bbs (region, n_region, region_copy, &exit, 1, &exit_copy, loop,
> >>     split_edge_bb_loc (entry), false);
> >> +  /* Mark all the stmts in both original and copied basic blocks
> >> +     as IS_SPLITTED.  */
> >> +  set_stmts_in_bb_is_splitted (*region);
> >> +  set_stmts_in_bb_is_splitted (*region_copy);
> >> 
> >>   /* Fix up: copy_bbs redirects all edges pointing to copied blocks.  The
> >>      following code ensures that all the edges exiting the jump-thread path are
> >> 
> > 
> > -- 
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
> 
> 
>
Qing Zhao May 14, 2024, 2:57 p.m. UTC | #8
> On May 13, 2024, at 20:14, Kees Cook <keescook@chromium.org> wrote:
> 
> On Tue, May 14, 2024 at 01:38:49AM +0200, Andrew Pinski wrote:
>> On Mon, May 13, 2024, 11:41 PM Kees Cook <keescook@chromium.org> wrote:
>>> But it makes no sense to warn about:
>>> 
>>> void sparx5_set (int * ptr, struct nums * sg, int index)
>>> {
>>>   if (index >= 4)
>>>     warn ();
>>>   *ptr = 0;
>>>   *val = sg->vals[index];
>>>   if (index >= 4)
>>>     warn ();
>>>   *ptr = *val;
>>> }
>>> 
>>> Because at "*val = sg->vals[index];" the actual value range tracking for
>>> index is _still_ [INT_MIN,INT_MAX]. (Only within the "then" side of the
>>> "if" statements is the range tracking [4,INT_MAX].)
>>> 
>>> However, in the case where jump threading has split the execution flow
>>> and produced a copy of "*val = sg->vals[index];" where the value range
>>> tracking for "index" is now [4,INT_MAX], is the warning valid. But it
>>> is only for that instance. Reporting it for effectively both (there is
>>> only 1 source line for the array indexing) is misleading because there
>>> is nothing the user can do about it -- the compiler created the copy and
>>> then noticed it had a range it could apply to that array index.
>>> 
>> 
>> "there is nothing the user can do about it" is very much false. They could
>> change warn call into a noreturn function call instead.  (In the case of
>> the Linux kernel panic). There are things the user can do to fix the
>> warning and even get better code generation out of the compilers.
> 
> This isn't about warn() not being noreturn. The warn() could be any
> function call; the jump threading still happens.

When the program is executed on the “if (index > = 4)” path,  the value of “index” is definitely
>= 4, when sg->vals[index] is referenced on this path (the case when the routine “warn” is NOT noreturn), it’s
definitely an out-of-bounds array access.  So, the compiler’s warning is correct. And this warning does catch 
a potential issue in the source code that need to be fixed by either of the following two solutions:

   1. Make the routine “warn” as noreturn and mark it noreturn;
Or
   2. On the path “if (index >= 4)”, make the value of “index” in the bound of the array. 

With either of the above source code changes, we can fix this potential out-of-bound array access bug in the source code.

Qing
> 
> GCC is warning about a compiler-constructed situation that cannot be
> reliably fixed on the source side (GCC emitting the warning is highly
> unstable in these cases), since the condition is not *always* true for
> the given line of code. If it is not useful to warn for "array[index]"
> being out of range when "index" is always [INT_MIN,INT_MAX], then it
> is not useful to warn when "index" MAY be [INT_MIN,INT_MAX] for a given
> line of code.
> 
> -Kees
> 
> -- 
> Kees Cook
Jeff Law May 14, 2024, 3:08 p.m. UTC | #9
On 5/14/24 8:57 AM, Qing Zhao wrote:
> 
> 
>> On May 13, 2024, at 20:14, Kees Cook <keescook@chromium.org> wrote:
>>
>> On Tue, May 14, 2024 at 01:38:49AM +0200, Andrew Pinski wrote:
>>> On Mon, May 13, 2024, 11:41 PM Kees Cook <keescook@chromium.org> wrote:
>>>> But it makes no sense to warn about:
>>>>
>>>> void sparx5_set (int * ptr, struct nums * sg, int index)
>>>> {
>>>>    if (index >= 4)
>>>>      warn ();
>>>>    *ptr = 0;
>>>>    *val = sg->vals[index];
>>>>    if (index >= 4)
>>>>      warn ();
>>>>    *ptr = *val;
>>>> }
>>>>
>>>> Because at "*val = sg->vals[index];" the actual value range tracking for
>>>> index is _still_ [INT_MIN,INT_MAX]. (Only within the "then" side of the
>>>> "if" statements is the range tracking [4,INT_MAX].)
>>>>
>>>> However, in the case where jump threading has split the execution flow
>>>> and produced a copy of "*val = sg->vals[index];" where the value range
>>>> tracking for "index" is now [4,INT_MAX], is the warning valid. But it
>>>> is only for that instance. Reporting it for effectively both (there is
>>>> only 1 source line for the array indexing) is misleading because there
>>>> is nothing the user can do about it -- the compiler created the copy and
>>>> then noticed it had a range it could apply to that array index.
>>>>
>>>
>>> "there is nothing the user can do about it" is very much false. They could
>>> change warn call into a noreturn function call instead.  (In the case of
>>> the Linux kernel panic). There are things the user can do to fix the
>>> warning and even get better code generation out of the compilers.
>>
>> This isn't about warn() not being noreturn. The warn() could be any
>> function call; the jump threading still happens.
> 
> When the program is executed on the “if (index > = 4)” path,  the value of “index” is definitely
>> = 4, when sg->vals[index] is referenced on this path (the case when the routine “warn” is NOT noreturn), it’s
> definitely an out-of-bounds array access.  So, the compiler’s warning is correct. And this warning does catch
> a potential issue in the source code that need to be fixed by either of the following two solutions:
> 
>     1. Make the routine “warn” as noreturn and mark it noreturn;
This would be my recommendation.  We're about to execute undefined 
behavior.  I don't see a way to necessarily recover safely here, so I'd 
suggest having warn() not return and mark it appropriately.

That'll have numerous secondary benefits as well.

jeff
Qing Zhao May 14, 2024, 3:19 p.m. UTC | #10
> On May 14, 2024, at 10:29, Richard Biener <rguenther@suse.de> wrote:
> 
> On Tue, 14 May 2024, Qing Zhao wrote:
> 
>> 
>> 
>>> On May 14, 2024, at 09:08, Richard Biener <rguenther@suse.de> wrote:
>>> 
>>> On Mon, 13 May 2024, Qing Zhao wrote:
>>> 
>>>> -Warray-bounds is an important option to enable linux kernal to keep
>>>> the array out-of-bound errors out of the source tree.
>>>> 
>>>> However, due to the false positive warnings reported in PR109071
>>>> (-Warray-bounds false positive warnings due to code duplication from
>>>> jump threading), -Warray-bounds=1 cannot be added on by default.
>>>> 
>>>> Although it's impossible to elinimate all the false positive warnings
>>>> from -Warray-bounds=1 (See PR104355 Misleading -Warray-bounds
>>>> documentation says "always out of bounds"), we should minimize the
>>>> false positive warnings in -Warray-bounds=1.
>>>> 
>>>> The root reason for the false positive warnings reported in PR109071 is:
>>>> 
>>>> When the thread jump optimization tries to reduce the # of branches
>>>> inside the routine, sometimes it needs to duplicate the code and
>>>> split into two conditional pathes. for example:
>>>> 
>>>> The original code:
>>>> 
>>>> void sparx5_set (int * ptr, struct nums * sg, int index)
>>>> {
>>>> if (index >= 4)
>>>>   warn ();
>>>> *ptr = 0;
>>>> *val = sg->vals[index];
>>>> if (index >= 4)
>>>>   warn ();
>>>> *ptr = *val;
>>>> 
>>>> return;
>>>> }
>>>> 
>>>> With the thread jump, the above becomes:
>>>> 
>>>> void sparx5_set (int * ptr, struct nums * sg, int index)
>>>> {
>>>> if (index >= 4)
>>>>   {
>>>>     warn ();
>>>>     *ptr = 0; // Code duplications since "warn" does return;
>>>>     *val = sg->vals[index]; // same this line.
>>>> // In this path, since it's under the condition
>>>> // "index >= 4", the compiler knows the value
>>>> // of "index" is larger then 4, therefore the
>>>> // out-of-bound warning.
>>>>     warn ();
>>>>   }
>>>> else
>>>>   {
>>>>     *ptr = 0;
>>>>     *val = sg->vals[index];
>>>>   }
>>>> *ptr = *val;
>>>> return;
>>>> }
>>>> 
>>>> We can see, after the thread jump optimization, the # of branches inside
>>>> the routine "sparx5_set" is reduced from 2 to 1, however,  due to the
>>>> code duplication (which is needed for the correctness of the code), we
>>>> got a false positive out-of-bound warning.
>>>> 
>>>> In order to eliminate such false positive out-of-bound warning,
>>>> 
>>>> A. Add one more flag for GIMPLE: is_splitted.
>>>> B. During the thread jump optimization, when the basic blocks are
>>>>  duplicated, mark all the STMTs inside the original and duplicated
>>>>  basic blocks as "is_splitted";
>>>> C. Inside the array bound checker, add the following new heuristic:
>>>> 
>>>> If
>>>>  1. the stmt is duplicated and splitted into two conditional paths;
>>>> +  2. the warning level < 2;
>>>> +  3. the current block is not dominating the exit block
>>>> Then not report the warning.
>>>> 
>>>> The false positive warnings are moved from -Warray-bounds=1 to
>>>> -Warray-bounds=2 now.
>>>> 
>>>> Bootstrapped and regression tested on both x86 and aarch64. adjusted
>>>> -Warray-bounds-61.c due to the false positive warnings.
>>>> 
>>>> Let me know if you have any comments and suggestions.
>>> 
>>> At the last Cauldron I talked with David Malcolm about these kind of
>>> issues and thought of instead of suppressing diagnostics to record
>>> how a block was duplicated.  For jump threading my idea was to record
>>> the condition that was proved true when entering the path and do this
>>> by recording the corresponding locations so that in the end we can
>>> use the diagnostic-path infrastructure to say
>>> 
>>> warning: array index always above array bounds
>>> events 1:
>>> 
>>> | 3 |  if (index >= 4)
>>>        |
>>>       (1) when index >= 4
> 
> As it's been quite some time I think I remeber that I thought of
> constructing the diagnostic path at jump threading time and associating
> that with the location.  But I don't remember exactly where I wanted to
> put it - I think it was on an extra stmt to avoid having too many
> ad-hoc locations as I'm not sure of their cost.

Yes, an extra stmt for each basic block might  be less memory cost than an extra location
info for each stmt.  

The major concern with the extra stmt for each basic block is how to  keep this special stmt in place
after stmts moving and basic block merging as you mentioned below.  During my debugging of bug
109071, I noticed that the duplicated basic blocks are merged with old blocks to form new blocks. And
the original block is deleted.  I expected that the implementation with the extra stmt for each basic block
might be much harder and error prone.

>  It would of course
> need experimenting since we can end up moving stmts and merging blocks
> though the linear traces created by jump threading should be quite
> stable (as opposed to say the unrolling case where multiple instances
> of the loop body likely will end up in the exact same basic block).

Do you mean, for loop unrolling the approach with one extra stmt for one basic block might be even harder and unreliable? 
> 
>> Yes, this is a good idea. 
>> 
>> The current major issue with the warning is:  the constant index value 4 is not in the source code, it’s a compiler generated intermediate value (even though it’s a correct value -:)). Such warning messages confuse the end-users with information that cannot be connected directly to the source code. 
>> 
>> With the above recorded “events” information, the warning messages should make good sense to the end user, and also help the end user to locate the place where the fix in the source code can be added. 
>> 
>> Actually, with the above warning information, the user can locate the place “line 3” to add fixes as following:
>> 
>>        if (*index >= 4)
>>          {
>>                warn();
>>                *index = 3;
>>          }
>> 
>> I.e.
>> [109071]$ diff t_org.c t.c
>> 2c2
>> < static inline void assign(int val, int *regs, int index)
>> ---
>>> static inline void assign(int val, int *regs, int *index)
>> 4c4,5
>> < if (index >= 4)
>> ---
>>> if (*index >= 4)
>>>  {
>> 5a7,8
>>> *index = 3;
>>>  }
>> 14,15c17,18
>> < assign(0,    ptr, index);
>> < assign(*val, ptr, index);
>> ---
>>> assign(0,    ptr, &index);
>>> assign(*val, ptr, &index);
>> 
>>> 
>>> it would be possible to record the info as part of the ad-hoc
>>> location data on each duplicated stmt or, possibly simpler,
>>> as part of a debug stmt of new kind.
>> 
>> Recording such info to each stmt might be more reliable? 
>> 
>>> 
>>> I'm not sure pruning the warnings is a good thing to do.  One
>>> would argue we should instead isolate such path as unreachable
>>> since it invokes undefined behavior.  In particular your
>>> example is clearly a bug and should be diagnosed.
>> 
>> I agree that the current warning is correct (that’s the reason I only moved it from -Warray-bounds=1 to -Warray-bounds=2 -:)), the major issue is that it’s not clear and confusing the end-user. 
>> 
>> With more information, we can help the user to locate the issue and fix it in the source code.
>>> 
>>> Note very similar issues happen when unrolling a loop.
>>> 
>>> Note all late diagnostics are prone to these kind of issues.
>> 
>> Are there any open PRs for these similar issues?
> 
> You can search bugzilla for diagnostic bugs or bugs blocking
> the symbolic Warray-bounds bug.

Sure, will check.

thanks.

Qing
> 
> Richard.
> 
>> Thanks. 
>> 
>> Qing
>>> 
>>> Richard.
>>> 
>>>> Thanks.
>>>> 
>>>> Qing
>>>> 
>>>> 
>>>> PR tree optimization/109071
>>>> 
>>>> gcc/ChangeLog:
>>>> 
>>>> * gimple-array-bounds.cc (check_out_of_bounds_and_warn): Add two new
>>>> arguments for the new heuristic to not issue warnings.
>>>> (array_bounds_checker::check_array_ref): Call the new prototype of the
>>>> routine check_out_of_bounds_and_warn.
>>>> (array_bounds_checker::check_mem_ref): Add one new argument for the
>>>> new heuristic to not issue warnings.
>>>> (array_bounds_checker::check_addr_expr): Call the new prototype of the
>>>> routine check_mem_ref, add new heuristic for not issue warnings.
>>>> (array_bounds_checker::check_array_bounds): Call the new prototype of
>>>> the routine check_mem_ref.
>>>> * gimple-array-bounds.h: New prototype of check_mem_ref.
>>>> * gimple.h (struct GTY): Add one new flag is_splitted for gimple.
>>>> (gimple_is_splitted_p): New function.
>>>> (gimple_set_is_splitted): New function.
>>>> * tree-ssa-threadupdate.cc (set_stmts_in_bb_is_splitted): New function.
>>>> (back_jt_path_registry::duplicate_thread_path): Mark all the stmts in
>>>> both original and copied blocks as IS_SPLITTED.
>>>> 
>>>> gcc/testsuite/ChangeLog:
>>>> 
>>>> * gcc.dg/Warray-bounds-61.c: Adjust testing case.
>>>> * gcc.dg/pr109071-1.c: New test.
>>>> * gcc.dg/pr109071.c: New test.
>>>> ---
>>>> gcc/gimple-array-bounds.cc              | 46 +++++++++++++++++++++----
>>>> gcc/gimple-array-bounds.h               |  2 +-
>>>> gcc/gimple.h                            | 21 +++++++++--
>>>> gcc/testsuite/gcc.dg/Warray-bounds-61.c |  6 ++--
>>>> gcc/testsuite/gcc.dg/pr109071-1.c       | 22 ++++++++++++
>>>> gcc/testsuite/gcc.dg/pr109071.c         | 22 ++++++++++++
>>>> gcc/tree-ssa-threadupdate.cc            | 15 ++++++++
>>>> 7 files changed, 122 insertions(+), 12 deletions(-)
>>>> create mode 100644 gcc/testsuite/gcc.dg/pr109071-1.c
>>>> create mode 100644 gcc/testsuite/gcc.dg/pr109071.c
>>>> 
>>>> diff --git a/gcc/gimple-array-bounds.cc b/gcc/gimple-array-bounds.cc
>>>> index 008071cd5464..4a2975623bc1 100644
>>>> --- a/gcc/gimple-array-bounds.cc
>>>> +++ b/gcc/gimple-array-bounds.cc
>>>> @@ -264,7 +264,9 @@ check_out_of_bounds_and_warn (location_t location, tree ref,
>>>>      tree up_bound, tree up_bound_p1,
>>>>      const value_range *vr,
>>>>      bool ignore_off_by_one, bool for_array_bound,
>>>> -       bool *out_of_bound)
>>>> +       bool *out_of_bound,
>>>> +       bool is_splitted,
>>>> +       bool is_dominate_exit)
>>>> {
>>>>  tree min, max;
>>>>  tree low_bound = array_ref_low_bound (ref);
>>>> @@ -273,6 +275,13 @@ check_out_of_bounds_and_warn (location_t location, tree ref,
>>>>  bool warned = false;
>>>>  *out_of_bound = false;
>>>> 
>>>> +  /* If the stmt is duplicated and splitted, the warning level is not 2,
>>>> +     and the current block is not dominating the exit block, not report
>>>> +     the warning.  */
>>>> +  if (is_splitted && warn_array_bounds < 2
>>>> +      && !is_dominate_exit)
>>>> +    return false;
>>>> +
>>>>  /* Empty array.  */
>>>>  if (up_bound && tree_int_cst_equal (low_bound, up_bound_p1))
>>>>    {
>>>> @@ -386,12 +395,17 @@ array_bounds_checker::check_array_ref (location_t location, tree ref,
>>>> }
>>>>    }
>>>> 
>>>> +  bool is_dominate_exit = dominated_by_p (CDI_DOMINATORS,
>>>> +   EXIT_BLOCK_PTR_FOR_FN (fun),
>>>> +   gimple_bb (stmt));
>>>> +
>>>>  warned = check_out_of_bounds_and_warn (location, ref,
>>>> low_sub_org, low_sub, up_sub,
>>>> up_bound, up_bound_p1, &vr,
>>>> ignore_off_by_one, warn_array_bounds,
>>>> -  &out_of_bound);
>>>> -
>>>> +  &out_of_bound,
>>>> +  gimple_is_splitted_p (stmt),
>>>> +  is_dominate_exit);
>>>> 
>>>>  if (!warned && sam == special_array_member::int_0)
>>>>    warned = warning_at (location, OPT_Wzero_length_bounds,
>>>> @@ -476,7 +490,7 @@ array_bounds_checker::check_array_ref (location_t location, tree ref,
>>>> 
>>>> bool
>>>> array_bounds_checker::check_mem_ref (location_t location, tree ref,
>>>> -      bool ignore_off_by_one)
>>>> +      gimple *stmt, bool ignore_off_by_one)
>>>> {
>>>>  if (warning_suppressed_p (ref, OPT_Warray_bounds_))
>>>>    return false;
>>>> @@ -574,6 +588,16 @@ array_bounds_checker::check_mem_ref (location_t location, tree ref,
>>>> }
>>>>    }
>>>> 
>>>> +  /* If the stmt is duplicated and splitted, the warning level is not 2,
>>>> +     and the current block is not dominating the exit block, not report
>>>> +     the warning.  */
>>>> +  bool is_dominate_exit = dominated_by_p (CDI_DOMINATORS,
>>>> +   EXIT_BLOCK_PTR_FOR_FN (fun),
>>>> +   gimple_bb (stmt));
>>>> +  if (gimple_is_splitted_p (stmt) && warn_array_bounds < 2
>>>> +      && !is_dominate_exit)
>>>> +    return false;
>>>> +
>>>>  bool warned = false;
>>>>  if (lboob)
>>>>    {
>>>> @@ -654,7 +678,7 @@ array_bounds_checker::check_addr_expr (location_t location, tree t,
>>>>  ignore_off_by_one = false;
>>>> }
>>>>      else if (TREE_CODE (t) == MEM_REF)
>>>> - warned = check_mem_ref (location, t, ignore_off_by_one);
>>>> + warned = check_mem_ref (location, t, stmt, ignore_off_by_one);
>>>> 
>>>>      if (warned)
>>>> suppress_warning (t, OPT_Warray_bounds_);
>>>> @@ -690,6 +714,16 @@ array_bounds_checker::check_addr_expr (location_t location, tree t,
>>>>  if (!mem_ref_offset (t).is_constant (&idx))
>>>>    return;
>>>> 
>>>> +  /* If the stmt is duplicated and splitted, the warning level is not 2,
>>>> +     and the current block is not dominating the exit block, not report
>>>> +     the warning.  */
>>>> +  bool is_dominate_exit = dominated_by_p (CDI_DOMINATORS,
>>>> +   EXIT_BLOCK_PTR_FOR_FN (fun),
>>>> +   gimple_bb (stmt));
>>>> +  if (gimple_is_splitted_p (stmt) && warn_array_bounds < 2
>>>> +      && !is_dominate_exit)
>>>> +    return;
>>>> +
>>>>  bool warned = false;
>>>>  idx = wi::sdiv_trunc (idx, wi::to_offset (el_sz));
>>>>  if (idx < 0)
>>>> @@ -809,7 +843,7 @@ array_bounds_checker::check_array_bounds (tree *tp, int *walk_subtree,
>>>>    warned = checker->check_array_ref (location, t, wi->stmt,
>>>>       false/*ignore_off_by_one*/);
>>>>  else if (TREE_CODE (t) == MEM_REF)
>>>> -    warned = checker->check_mem_ref (location, t,
>>>> +    warned = checker->check_mem_ref (location, t, wi->stmt,
>>>>     false /*ignore_off_by_one*/);
>>>>  else if (TREE_CODE (t) == ADDR_EXPR)
>>>>    {
>>>> diff --git a/gcc/gimple-array-bounds.h b/gcc/gimple-array-bounds.h
>>>> index 3e077d0178ff..7c98f02204c9 100644
>>>> --- a/gcc/gimple-array-bounds.h
>>>> +++ b/gcc/gimple-array-bounds.h
>>>> @@ -33,7 +33,7 @@ public:
>>>> private:
>>>>  static tree check_array_bounds (tree *tp, int *walk_subtree, void *data);
>>>>  bool check_array_ref (location_t, tree, gimple *, bool ignore_off_by_one);
>>>> -  bool check_mem_ref (location_t, tree, bool ignore_off_by_one);
>>>> +  bool check_mem_ref (location_t, tree, gimple *, bool ignore_off_by_one);
>>>>  void check_addr_expr (location_t, tree, gimple *);
>>>>  void get_value_range (irange &r, const_tree op, gimple *);
>>>> 
>>>> diff --git a/gcc/gimple.h b/gcc/gimple.h
>>>> index bd315ffc2dd4..08f52d107084 100644
>>>> --- a/gcc/gimple.h
>>>> +++ b/gcc/gimple.h
>>>> @@ -254,8 +254,8 @@ struct GTY((desc ("gimple_statement_structure (&%h)"), tag ("GSS_BASE"),
>>>>  /* Nonzero if this statement contains volatile operands.  */
>>>>  unsigned has_volatile_ops  : 1;
>>>> 
>>>> -  /* Padding to get subcode to 16 bit alignment.  */
>>>> -  unsigned pad : 1;
>>>> +  /* Nonzero if this statement is duplicated and splitted to two pathes.  */
>>>> +  unsigned is_splitted : 1;
>>>> 
>>>>  /* The SUBCODE field can be used for tuple-specific flags for tuples
>>>>     that do not require subcodes.  Note that SUBCODE should be at
>>>> @@ -2327,6 +2327,23 @@ gimple_set_has_volatile_ops (gimple *stmt, bool volatilep)
>>>>    stmt->has_volatile_ops = (unsigned) volatilep;
>>>> }
>>>> 
>>>> +/* Return true if statement G's is_splitted field has
>>>> +   been set.  */
>>>> +
>>>> +inline bool
>>>> +gimple_is_splitted_p (const gimple *g)
>>>> +{
>>>> +  return (bool) g->is_splitted;
>>>> +}
>>>> +
>>>> +/* Set the IS_SPLITTED flag to IS_SPLITTEDP.  */
>>>> +
>>>> +inline void
>>>> +gimple_set_is_splitted (gimple *s, bool is_splittedp)
>>>> +{
>>>> +    s->is_splitted = (unsigned) is_splittedp;
>>>> +}
>>>> +
>>>> /* Return true if STMT is in a transaction.  */
>>>> 
>>>> inline bool
>>>> diff --git a/gcc/testsuite/gcc.dg/Warray-bounds-61.c b/gcc/testsuite/gcc.dg/Warray-bounds-61.c
>>>> index 5b66cdc0aab1..cb3c64a813d7 100644
>>>> --- a/gcc/testsuite/gcc.dg/Warray-bounds-61.c
>>>> +++ b/gcc/testsuite/gcc.dg/Warray-bounds-61.c
>>>> @@ -23,7 +23,7 @@ void test_ua3_ua0_a0 (int i)
>>>> 
>>>>  if (i < __LINE__)
>>>>    i = 5;
>>>> -  ua3_a0.a0[i] = 0;           // { dg-warning "\\\[-Warray-bounds" }
>>>> +  ua3_a0.a0[i] = 0;           // { dg-bogus "\\\[-Warray-bounds" }
>>>> 
>>>>  if (i > -1)
>>>>    i = -1;
>>>> @@ -44,7 +44,7 @@ void test_ua3_ua0_a1 (int i)
>>>> 
>>>>  if (i > -1)
>>>>    i = -1;
>>>> -  ua3_a0.a1[i] = 0;           // { dg-warning "\\\[-Warray-bounds" }
>>>> +  ua3_a0.a1[i] = 0;           // { dg-bogus "\\\[-Warray-bounds" }
>>>> 
>>>>  if (i < 7)
>>>>    i = 7;
>>>> @@ -60,7 +60,7 @@ void test_ua3_ua0_a2 (int i)
>>>> 
>>>>  if (i < __LINE__)
>>>>    i = __LINE__;
>>>> -  ua3_a0.a2[i] = 0;           // { dg-warning "\\\[-Warray-bounds" }
>>>> +  ua3_a0.a2[i] = 0;           // { dg-bogus "\\\[-Warray-bounds" }
>>>> 
>>>>  if (i > -1)
>>>>    i = -1;
>>>> diff --git a/gcc/testsuite/gcc.dg/pr109071-1.c b/gcc/testsuite/gcc.dg/pr109071-1.c
>>>> new file mode 100644
>>>> index 000000000000..a405c80bd549
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.dg/pr109071-1.c
>>>> @@ -0,0 +1,22 @@
>>>> +/* PR tree-optimization/109071 -Warray-bounds false positive warnings
>>>> +   due to code duplication from jump threading 
>>>> +   { dg-do compile }
>>>> +   { dg-options "-O2 -Warray-bounds=2" }
>>>> + */
>>>> +
>>>> +extern void warn(void);
>>>> +static inline void assign(int val, int *regs, int index)
>>>> +{
>>>> +  if (index >= 4)
>>>> +    warn();
>>>> +  *regs = val;
>>>> +}
>>>> +struct nums {int vals[4];};
>>>> +
>>>> +void sparx5_set (int *ptr, struct nums *sg, int index)
>>>> +{
>>>> +  int *val = &sg->vals[index]; /* { dg-warning "is above array bounds" } */
>>>> +
>>>> +  assign(0,    ptr, index);
>>>> +  assign(*val, ptr, index);
>>>> +}
>>>> diff --git a/gcc/testsuite/gcc.dg/pr109071.c b/gcc/testsuite/gcc.dg/pr109071.c
>>>> new file mode 100644
>>>> index 000000000000..782dfad84ea2
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.dg/pr109071.c
>>>> @@ -0,0 +1,22 @@
>>>> +/* PR tree-optimization/109071 -Warray-bounds false positive warnings
>>>> +   due to code duplication from jump threading 
>>>> +   { dg-do compile }
>>>> +   { dg-options "-O2 -Wall" }
>>>> + */
>>>> +
>>>> +extern void warn(void);
>>>> +static inline void assign(int val, int *regs, int index)
>>>> +{
>>>> +  if (index >= 4)
>>>> +    warn();
>>>> +  *regs = val;
>>>> +}
>>>> +struct nums {int vals[4];};
>>>> +
>>>> +void sparx5_set (int *ptr, struct nums *sg, int index)
>>>> +{
>>>> +  int *val = &sg->vals[index]; /* { dg-bogus "is above array bounds" } */
>>>> +
>>>> +  assign(0,    ptr, index);
>>>> +  assign(*val, ptr, index);
>>>> +}
>>>> diff --git a/gcc/tree-ssa-threadupdate.cc b/gcc/tree-ssa-threadupdate.cc
>>>> index fa61ba9512b7..9f338dd4d54d 100644
>>>> --- a/gcc/tree-ssa-threadupdate.cc
>>>> +++ b/gcc/tree-ssa-threadupdate.cc
>>>> @@ -2371,6 +2371,17 @@ back_jt_path_registry::adjust_paths_after_duplication (unsigned curr_path_num)
>>>>    }
>>>> }
>>>> 
>>>> +/* Set all the stmts in the basic block BB as IS_SPLITTED.  */
>>>> +
>>>> +static void
>>>> +set_stmts_in_bb_is_splitted (basic_block bb)
>>>> +{
>>>> +  gimple_stmt_iterator gsi;
>>>> +  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>>>> +    gimple_set_is_splitted (gsi_stmt (gsi), true);
>>>> +  return;
>>>> +}
>>>> +
>>>> /* Duplicates a jump-thread path of N_REGION basic blocks.
>>>>   The ENTRY edge is redirected to the duplicate of the region.
>>>> 
>>>> @@ -2418,6 +2429,10 @@ back_jt_path_registry::duplicate_thread_path (edge entry,
>>>>  basic_block *region_copy = XNEWVEC (basic_block, n_region);
>>>>  copy_bbs (region, n_region, region_copy, &exit, 1, &exit_copy, loop,
>>>>    split_edge_bb_loc (entry), false);
>>>> +  /* Mark all the stmts in both original and copied basic blocks
>>>> +     as IS_SPLITTED.  */
>>>> +  set_stmts_in_bb_is_splitted (*region);
>>>> +  set_stmts_in_bb_is_splitted (*region_copy);
>>>> 
>>>>  /* Fix up: copy_bbs redirects all edges pointing to copied blocks.  The
>>>>     following code ensures that all the edges exiting the jump-thread path are
>>>> 
>>> 
>>> -- 
>>> Richard Biener <rguenther@suse.de>
>>> SUSE Software Solutions Germany GmbH,
>>> Frankenstrasse 146, 90461 Nuernberg, Germany;
>>> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
>> 
>> 
>> 
> 
> -- 
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
Qing Zhao May 14, 2024, 4:03 p.m. UTC | #11
> On May 14, 2024, at 11:08, Jeff Law <jeffreyalaw@gmail.com> wrote:
> 
> 
> 
> On 5/14/24 8:57 AM, Qing Zhao wrote:
>>> On May 13, 2024, at 20:14, Kees Cook <keescook@chromium.org> wrote:
>>> 
>>> On Tue, May 14, 2024 at 01:38:49AM +0200, Andrew Pinski wrote:
>>>> On Mon, May 13, 2024, 11:41 PM Kees Cook <keescook@chromium.org> wrote:
>>>>> But it makes no sense to warn about:
>>>>> 
>>>>> void sparx5_set (int * ptr, struct nums * sg, int index)
>>>>> {
>>>>>   if (index >= 4)
>>>>>     warn ();
>>>>>   *ptr = 0;
>>>>>   *val = sg->vals[index];
>>>>>   if (index >= 4)
>>>>>     warn ();
>>>>>   *ptr = *val;
>>>>> }
>>>>> 
>>>>> Because at "*val = sg->vals[index];" the actual value range tracking for
>>>>> index is _still_ [INT_MIN,INT_MAX]. (Only within the "then" side of the
>>>>> "if" statements is the range tracking [4,INT_MAX].)
>>>>> 
>>>>> However, in the case where jump threading has split the execution flow
>>>>> and produced a copy of "*val = sg->vals[index];" where the value range
>>>>> tracking for "index" is now [4,INT_MAX], is the warning valid. But it
>>>>> is only for that instance. Reporting it for effectively both (there is
>>>>> only 1 source line for the array indexing) is misleading because there
>>>>> is nothing the user can do about it -- the compiler created the copy and
>>>>> then noticed it had a range it could apply to that array index.
>>>>> 
>>>> 
>>>> "there is nothing the user can do about it" is very much false. They could
>>>> change warn call into a noreturn function call instead.  (In the case of
>>>> the Linux kernel panic). There are things the user can do to fix the
>>>> warning and even get better code generation out of the compilers.
>>> 
>>> This isn't about warn() not being noreturn. The warn() could be any
>>> function call; the jump threading still happens.
>> When the program is executed on the “if (index > = 4)” path,  the value of “index” is definitely
>>> = 4, when sg->vals[index] is referenced on this path (the case when the routine “warn” is NOT noreturn), it’s
>> definitely an out-of-bounds array access.  So, the compiler’s warning is correct. And this warning does catch
>> a potential issue in the source code that need to be fixed by either of the following two solutions:
>>    1. Make the routine “warn” as noreturn and mark it noreturn;
> This would be my recommendation.  We're about to execute undefined behavior.  I don't see a way to necessarily recover safely here, so I'd suggest having warn() not return and mark it appropriately.
> 
> That'll have numerous secondary benefits as well.

Yes, agreed.  Such source code change should be a nice security improvement to the linux kernel source code. 

Qing
> 
> jeff
Kees Cook May 14, 2024, 4:43 p.m. UTC | #12
On Mon, May 13, 2024 at 07:48:30PM +0000, Qing Zhao wrote:
> The false positive warnings are moved from -Warray-bounds=1 to
>  -Warray-bounds=2 now.

On a Linux kernel x86_64 allmodconfig build, this takes the -Warray-bounds
warnings from 14 to 9. After examining these 9, I see:

- 4: legitimate bugs in Linux source (3 distinct, 1 repeat). I'll be
  reporting/fixing these in Linux.
- 4: 4 instances of 1 case of buggy interaction with -fsanitize=shift,
     looking similar to these past bugs:
	https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105679
	https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108306
  the difference being operating on an enum. I will reduce the case
  and open a bug report for it.
- 1: still examining. It looks like a false positive so far.

Thanks!

-Kees
Richard Biener May 14, 2024, 5:14 p.m. UTC | #13
On Tue, 14 May 2024, Qing Zhao wrote:

> 
> 
> > On May 14, 2024, at 10:29, Richard Biener <rguenther@suse.de> wrote:
> > 
[...]
> >  It would of course
> > need experimenting since we can end up moving stmts and merging blocks
> > though the linear traces created by jump threading should be quite
> > stable (as opposed to say the unrolling case where multiple instances
> > of the loop body likely will end up in the exact same basic block).
> 
> Do you mean, for loop unrolling the approach with one extra stmt for one 
> basic block might be even harder and unreliable?

The question is whether the stmt marks the whole block or whether we
for example add both a START and END stmt covering a copied path.
I would guess for unrolling we need definitely need to do the latter
(so we can diagnose "on the 3rd iteration of an unrolled loop" or
similar).

Richard.
Qing Zhao May 14, 2024, 5:21 p.m. UTC | #14
> On May 14, 2024, at 13:14, Richard Biener <rguenther@suse.de> wrote:
> 
> On Tue, 14 May 2024, Qing Zhao wrote:
> 
>> 
>> 
>>> On May 14, 2024, at 10:29, Richard Biener <rguenther@suse.de> wrote:
>>> 
> [...]
>>> It would of course
>>> need experimenting since we can end up moving stmts and merging blocks
>>> though the linear traces created by jump threading should be quite
>>> stable (as opposed to say the unrolling case where multiple instances
>>> of the loop body likely will end up in the exact same basic block).
>> 
>> Do you mean, for loop unrolling the approach with one extra stmt for one 
>> basic block might be even harder and unreliable?
> 
> The question is whether the stmt marks the whole block or whether we
> for example add both a START and END stmt covering a copied path.
> I would guess for unrolling we need definitely need to do the latter
> (so we can diagnose "on the 3rd iteration of an unrolled loop" or
> similar).

Okay. I see. 

Is it possible that the START and END stmts might be moved around and out-of-place by the different optimizations?

Qing
> 
> Richard.
>
Kees Cook May 14, 2024, 7:50 p.m. UTC | #15
On Tue, May 14, 2024 at 02:17:16PM +0000, Qing Zhao wrote:
> The current major issue with the warning is:  the constant index value 4
> is not in the source code, it’s a compiler generated intermediate value
> (even though it’s a correct value -:)). Such warning messages confuse
> the end-users with information that cannot be connected directly to the
> source code.

Right -- this "4" comes from -fsanitize=array-bounds (in "warn but
continue" mode).

Now, the minimized PoC shows a situation that triggers the situation, but
I think it's worth looking at the original code that caused this false
positive:

	for (i = 0; i < sg->num_entries; i++) {
		gce = &sg->gce[i];


The issue here is that sg->num_entries has already been bounds-checked
in a separate function. As a result, the value range tracking for "i"
here is unbounded.

Enabling -fsanitize=array-bounds means the sg->gce[i] access gets
instrumented, and suddenly "i" gains an implicit range, induced by the
sanitizer.

(I would point out that this is very similar to the problems we've had
with -fsanitize=shift[1][2]: the sanitizer induces a belief about a
given variable's range this isn't true.)

Now, there is an argument to be made that the original code should be
doing:

	for (i = 0; i < 4 && i < sg->num_entries; i++) {

But this is:

a) logically redundant (Linux maintainers don't tend to like duplicating
   their range checking)

b) a very simple case

The point of the sanitizers is to catch "impossible" situations at
run-time for the cases where some value may end up out of range. Having
it _induce_ a range on the resulting code makes no sense.

Could we, perhaps, have sanitizer code not influence the value range
tracking? That continues to look like the root cause for these things.

-Kees

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105679
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108306
Richard Biener May 15, 2024, 5:50 a.m. UTC | #16
On Tue, 14 May 2024, Kees Cook wrote:

> On Tue, May 14, 2024 at 02:17:16PM +0000, Qing Zhao wrote:
> > The current major issue with the warning is:  the constant index value 4
> > is not in the source code, it’s a compiler generated intermediate value
> > (even though it’s a correct value -:)). Such warning messages confuse
> > the end-users with information that cannot be connected directly to the
> > source code.
> 
> Right -- this "4" comes from -fsanitize=array-bounds (in "warn but
> continue" mode).
> 
> Now, the minimized PoC shows a situation that triggers the situation, but
> I think it's worth looking at the original code that caused this false
> positive:
> 
> 	for (i = 0; i < sg->num_entries; i++) {
> 		gce = &sg->gce[i];
> 
> 
> The issue here is that sg->num_entries has already been bounds-checked
> in a separate function. As a result, the value range tracking for "i"
> here is unbounded.
> 
> Enabling -fsanitize=array-bounds means the sg->gce[i] access gets
> instrumented, and suddenly "i" gains an implicit range, induced by the
> sanitizer.
> 
> (I would point out that this is very similar to the problems we've had
> with -fsanitize=shift[1][2]: the sanitizer induces a belief about a
> given variable's range this isn't true.)
> 
> Now, there is an argument to be made that the original code should be
> doing:
> 
> 	for (i = 0; i < 4 && i < sg->num_entries; i++) {
> 
> But this is:
> 
> a) logically redundant (Linux maintainers don't tend to like duplicating
>    their range checking)
> 
> b) a very simple case
> 
> The point of the sanitizers is to catch "impossible" situations at
> run-time for the cases where some value may end up out of range. Having
> it _induce_ a range on the resulting code makes no sense.
> 
> Could we, perhaps, have sanitizer code not influence the value range
> tracking? That continues to look like the root cause for these things.

The sanitizer code adds checks that are not distinguishable from
user code exactly because we want value-range analysis to eventually
elide even (redundant) sanitizer checks.

I think the fix for the source when there's a-priori knowledge
of sg->num_entries is to assert that knowledge through language
features or when using C through GNU extensions like assert()
using __builtin_unreachable ().  That also serves documentation
purposes "this code expects sg->num_entries to be bounds-checked".

To me it doesn't make much sense to mix sanitizing of array
accesses and at the same time do -Warray-bound diagnostics.

Note I tried to teach jump threading to be less aggressive
threading paths to exceptional situations (I think the
sanitizer runtime calls are at least marked unlikely), but
the comment was that even those are very much desired but
I can't remember the details.  This was as part of PR111515
but I think I've run into this earlier when trying to
improve back_threader_profitability::possibly_profitable_path_p.
There we have

  /* Threading is profitable if the path duplicated is hot but also
     in a case we separate cold path from hot path and permit optimization
     of the hot path later.  Be on the agressive side here. In some testcases,
     as in PR 78407 this leads to noticeable improvements.  */

here we have

  if (A)
    unlikely();
  B;
  if (A)
    unlikely ();

and we choose to perform that path separation which optimizes
the not exceptional path which automatically separates the
exceptional path as well.

IMO that sanitizer mode that continues running is bad - it makes
the compiler aware of undefined behavior and make the code run
into it with open eyes.  You get what you asked for.

Richard.
Richard Biener May 15, 2024, 6:09 a.m. UTC | #17
On Tue, 14 May 2024, Qing Zhao wrote:

> 
> 
> > On May 14, 2024, at 13:14, Richard Biener <rguenther@suse.de> wrote:
> > 
> > On Tue, 14 May 2024, Qing Zhao wrote:
> > 
> >> 
> >> 
> >>> On May 14, 2024, at 10:29, Richard Biener <rguenther@suse.de> wrote:
> >>> 
> > [...]
> >>> It would of course
> >>> need experimenting since we can end up moving stmts and merging blocks
> >>> though the linear traces created by jump threading should be quite
> >>> stable (as opposed to say the unrolling case where multiple instances
> >>> of the loop body likely will end up in the exact same basic block).
> >> 
> >> Do you mean, for loop unrolling the approach with one extra stmt for one 
> >> basic block might be even harder and unreliable?
> > 
> > The question is whether the stmt marks the whole block or whether we
> > for example add both a START and END stmt covering a copied path.
> > I would guess for unrolling we need definitely need to do the latter
> > (so we can diagnose "on the 3rd iteration of an unrolled loop" or
> > similar).
> 
> Okay. I see. 
> 
> Is it possible that the START and END stmts might be moved around and 
> out-of-place by the different optimizations?

There is nothign preventing stmts to be moved across START or END.

Richard.
Qing Zhao May 15, 2024, 1:37 p.m. UTC | #18
> On May 15, 2024, at 02:09, Richard Biener <rguenther@suse.de> wrote:
> 
> On Tue, 14 May 2024, Qing Zhao wrote:
> 
>> 
>> 
>>> On May 14, 2024, at 13:14, Richard Biener <rguenther@suse.de> wrote:
>>> 
>>> On Tue, 14 May 2024, Qing Zhao wrote:
>>> 
>>>> 
>>>> 
>>>>> On May 14, 2024, at 10:29, Richard Biener <rguenther@suse.de> wrote:
>>>>> 
>>> [...]
>>>>> It would of course
>>>>> need experimenting since we can end up moving stmts and merging blocks
>>>>> though the linear traces created by jump threading should be quite
>>>>> stable (as opposed to say the unrolling case where multiple instances
>>>>> of the loop body likely will end up in the exact same basic block).
>>>> 
>>>> Do you mean, for loop unrolling the approach with one extra stmt for one 
>>>> basic block might be even harder and unreliable?
>>> 
>>> The question is whether the stmt marks the whole block or whether we
>>> for example add both a START and END stmt covering a copied path.
>>> I would guess for unrolling we need definitely need to do the latter
>>> (so we can diagnose "on the 3rd iteration of an unrolled loop" or
>>> similar).
>> 
>> Okay. I see. 
>> 
>> Is it possible that the START and END stmts might be moved around and 
>> out-of-place by the different optimizations?
> 
> There is nothign preventing stmts to be moved across START or END.
Then we have to add some artificial data dependency or memory barrier at START and END to prevent such transformation. However, this might also prevent some useful transformation, therefore impact the performance…
Not sure whether this is a good approach…

Yes, some experiments might need to be done to compare the cost of these different approaches.

Qing
> 
> Richard.
David Malcolm May 15, 2024, 2 p.m. UTC | #19
On Tue, 2024-05-14 at 15:08 +0200, Richard Biener wrote:
> On Mon, 13 May 2024, Qing Zhao wrote:
> 
> > -Warray-bounds is an important option to enable linux kernal to
> > keep
> > the array out-of-bound errors out of the source tree.
> > 
> > However, due to the false positive warnings reported in PR109071
> > (-Warray-bounds false positive warnings due to code duplication
> > from
> > jump threading), -Warray-bounds=1 cannot be added on by default.
> > 
> > Although it's impossible to elinimate all the false positive
> > warnings
> > from -Warray-bounds=1 (See PR104355 Misleading -Warray-bounds
> > documentation says "always out of bounds"), we should minimize the
> > false positive warnings in -Warray-bounds=1.
> > 
> > The root reason for the false positive warnings reported in
> > PR109071 is:
> > 
> > When the thread jump optimization tries to reduce the # of branches
> > inside the routine, sometimes it needs to duplicate the code and
> > split into two conditional pathes. for example:
> > 
> > The original code:
> > 
> > void sparx5_set (int * ptr, struct nums * sg, int index)
> > {
> >   if (index >= 4)
> >     warn ();
> >   *ptr = 0;
> >   *val = sg->vals[index];
> >   if (index >= 4)
> >     warn ();
> >   *ptr = *val;
> > 
> >   return;
> > }
> > 
> > With the thread jump, the above becomes:
> > 
> > void sparx5_set (int * ptr, struct nums * sg, int index)
> > {
> >   if (index >= 4)
> >     {
> >       warn ();
> >       *ptr = 0;         // Code duplications since "warn" does
> > return;
> >       *val = sg->vals[index];   // same this line.
> >                                 // In this path, since it's under
> > the condition
> >                                 // "index >= 4", the compiler knows
> > the value
> >                                 // of "index" is larger then 4,
> > therefore the
> >                                 // out-of-bound warning.
> >       warn ();
> >     }
> >   else
> >     {
> >       *ptr = 0;
> >       *val = sg->vals[index];
> >     }
> >   *ptr = *val;
> >   return;
> > }
> > 
> > We can see, after the thread jump optimization, the # of branches
> > inside
> > the routine "sparx5_set" is reduced from 2 to 1, however,  due to
> > the
> > code duplication (which is needed for the correctness of the code),
> > we
> > got a false positive out-of-bound warning.
> > 
> > In order to eliminate such false positive out-of-bound warning,
> > 
> > A. Add one more flag for GIMPLE: is_splitted.
> > B. During the thread jump optimization, when the basic blocks are
> >    duplicated, mark all the STMTs inside the original and
> > duplicated
> >    basic blocks as "is_splitted";
> > C. Inside the array bound checker, add the following new heuristic:
> > 
> > If
> >    1. the stmt is duplicated and splitted into two conditional
> > paths;
> > +  2. the warning level < 2;
> > +  3. the current block is not dominating the exit block
> > Then not report the warning.
> > 
> > The false positive warnings are moved from -Warray-bounds=1 to
> >  -Warray-bounds=2 now.
> > 
> > Bootstrapped and regression tested on both x86 and aarch64.
> > adjusted
> >  -Warray-bounds-61.c due to the false positive warnings.
> > 
> > Let me know if you have any comments and suggestions.
> 
> At the last Cauldron I talked with David Malcolm about these kind of
> issues and thought of instead of suppressing diagnostics to record
> how a block was duplicated.  For jump threading my idea was to record
> the condition that was proved true when entering the path and do this
> by recording the corresponding locations so that in the end we can
> use the diagnostic-path infrastructure to say
> 
> warning: array index always above array bounds
> events 1:
> 
> > 3 |  if (index >= 4)
>          |
>         (1) when index >= 4
> 
> it would be possible to record the info as part of the ad-hoc
> location data on each duplicated stmt or, possibly simpler,
> as part of a debug stmt of new kind.
> 
> I'm not sure pruning the warnings is a good thing to do.  One
> would argue we should instead isolate such path as unreachable
> since it invokes undefined behavior.  In particular your
> example is clearly a bug and should be diagnosed.
> 
> Note very similar issues happen when unrolling a loop.
> 
> Note all late diagnostics are prone to these kind of issues.

To recap our chat at Cauldron: any GCC diagnostic can potentially have
a diagnostic_path associated with it (not just the analyzer).  The
current mechanism is:
(a) use a rich_location for the diagnostic, and 
(b) create an instance of some diagnostic_path subclass (e.g.
simple_diagnostic_path, or something else), and 
(c) call  richloc.set_path (&path);  to associate the path with the
rich_location

See gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_paths.c for an
example of using simple_diagnostic_path that doesn't use the analyzer.


If we want *every* late diagnostic to potentially have a path, it
sounds like we might want some extra infrastructure (perhaps a hook
that autogenerates the paths from the ad-hoc data???)  But probably
best to get it working for just a specific example first before trying
to be fancy and generalize.

Dave


> 
> Richard.
> 
> > Thanks.
> > 
> > Qing
> > 
> > 
> >         PR tree optimization/109071
> > 
> > gcc/ChangeLog:
> > 
> >         * gimple-array-bounds.cc (check_out_of_bounds_and_warn):
> > Add two new
> >         arguments for the new heuristic to not issue warnings.
> >         (array_bounds_checker::check_array_ref): Call the new
> > prototype of the
> >         routine check_out_of_bounds_and_warn.
> >         (array_bounds_checker::check_mem_ref): Add one new argument
> > for the
> >         new heuristic to not issue warnings.
> >         (array_bounds_checker::check_addr_expr): Call the new
> > prototype of the
> >         routine check_mem_ref, add new heuristic for not issue
> > warnings.
> >         (array_bounds_checker::check_array_bounds): Call the new
> > prototype of
> >         the routine check_mem_ref.
> >         * gimple-array-bounds.h: New prototype of check_mem_ref.
> >         * gimple.h (struct GTY): Add one new flag is_splitted for
> > gimple.
> >         (gimple_is_splitted_p): New function.
> >         (gimple_set_is_splitted): New function.
> >         * tree-ssa-threadupdate.cc (set_stmts_in_bb_is_splitted):
> > New function.
> >         (back_jt_path_registry::duplicate_thread_path): Mark all
> > the stmts in
> >         both original and copied blocks as IS_SPLITTED.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> >         * gcc.dg/Warray-bounds-61.c: Adjust testing case.
> >         * gcc.dg/pr109071-1.c: New test.
> >         * gcc.dg/pr109071.c: New test.
> > ---
> >  gcc/gimple-array-bounds.cc              | 46
> > +++++++++++++++++++++----
> >  gcc/gimple-array-bounds.h               |  2 +-
> >  gcc/gimple.h                            | 21 +++++++++--
> >  gcc/testsuite/gcc.dg/Warray-bounds-61.c |  6 ++--
> >  gcc/testsuite/gcc.dg/pr109071-1.c       | 22 ++++++++++++
> >  gcc/testsuite/gcc.dg/pr109071.c         | 22 ++++++++++++
> >  gcc/tree-ssa-threadupdate.cc            | 15 ++++++++
> >  7 files changed, 122 insertions(+), 12 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/pr109071-1.c
> >  create mode 100644 gcc/testsuite/gcc.dg/pr109071.c
> > 
> > diff --git a/gcc/gimple-array-bounds.cc b/gcc/gimple-array-
> > bounds.cc
> > index 008071cd5464..4a2975623bc1 100644
> > --- a/gcc/gimple-array-bounds.cc
> > +++ b/gcc/gimple-array-bounds.cc
> > @@ -264,7 +264,9 @@ check_out_of_bounds_and_warn (location_t
> > location, tree ref,
> >                               tree up_bound, tree up_bound_p1,
> >                               const value_range *vr,
> >                               bool ignore_off_by_one, bool
> > for_array_bound,
> > -                             bool *out_of_bound)
> > +                             bool *out_of_bound,
> > +                             bool is_splitted,
> > +                             bool is_dominate_exit)
> >  {
> >    tree min, max;
> >    tree low_bound = array_ref_low_bound (ref);
> > @@ -273,6 +275,13 @@ check_out_of_bounds_and_warn (location_t
> > location, tree ref,
> >    bool warned = false;
> >    *out_of_bound = false;
> >  
> > +  /* If the stmt is duplicated and splitted, the warning level is
> > not 2,
> > +     and the current block is not dominating the exit block, not
> > report
> > +     the warning.  */
> > +  if (is_splitted && warn_array_bounds < 2
> > +      && !is_dominate_exit)
> > +    return false;
> > +
> >    /* Empty array.  */
> >    if (up_bound && tree_int_cst_equal (low_bound, up_bound_p1))
> >      {
> > @@ -386,12 +395,17 @@ array_bounds_checker::check_array_ref
> > (location_t location, tree ref,
> >         }
> >      }
> >  
> > +  bool is_dominate_exit = dominated_by_p (CDI_DOMINATORS,
> > +                                         EXIT_BLOCK_PTR_FOR_FN
> > (fun),
> > +                                         gimple_bb (stmt));
> > +
> >    warned = check_out_of_bounds_and_warn (location, ref,
> >                                          low_sub_org, low_sub,
> > up_sub,
> >                                          up_bound, up_bound_p1,
> > &vr,
> >                                          ignore_off_by_one,
> > warn_array_bounds,
> > -                                        &out_of_bound);
> > -
> > +                                        &out_of_bound,
> > +                                        gimple_is_splitted_p
> > (stmt),
> > +                                        is_dominate_exit);
> >  
> >    if (!warned && sam == special_array_member::int_0)
> >      warned = warning_at (location, OPT_Wzero_length_bounds,
> > @@ -476,7 +490,7 @@ array_bounds_checker::check_array_ref
> > (location_t location, tree ref,
> >  
> >  bool
> >  array_bounds_checker::check_mem_ref (location_t location, tree
> > ref,
> > -                                    bool ignore_off_by_one)
> > +                                    gimple *stmt, bool
> > ignore_off_by_one)
> >  {
> >    if (warning_suppressed_p (ref, OPT_Warray_bounds_))
> >      return false;
> > @@ -574,6 +588,16 @@ array_bounds_checker::check_mem_ref
> > (location_t location, tree ref,
> >         }
> >      }
> >  
> > +  /* If the stmt is duplicated and splitted, the warning level is
> > not 2,
> > +     and the current block is not dominating the exit block, not
> > report
> > +     the warning.  */
> > +  bool is_dominate_exit = dominated_by_p (CDI_DOMINATORS,
> > +                                         EXIT_BLOCK_PTR_FOR_FN
> > (fun),
> > +                                         gimple_bb (stmt));
> > +  if (gimple_is_splitted_p (stmt) && warn_array_bounds < 2
> > +      && !is_dominate_exit)
> > +    return false;
> > +
> >    bool warned = false;
> >    if (lboob)
> >      {
> > @@ -654,7 +678,7 @@ array_bounds_checker::check_addr_expr
> > (location_t location, tree t,
> >           ignore_off_by_one = false;
> >         }
> >        else if (TREE_CODE (t) == MEM_REF)
> > -       warned = check_mem_ref (location, t, ignore_off_by_one);
> > +       warned = check_mem_ref (location, t, stmt,
> > ignore_off_by_one);
> >  
> >        if (warned)
> >         suppress_warning (t, OPT_Warray_bounds_);
> > @@ -690,6 +714,16 @@ array_bounds_checker::check_addr_expr
> > (location_t location, tree t,
> >    if (!mem_ref_offset (t).is_constant (&idx))
> >      return;
> >  
> > +  /* If the stmt is duplicated and splitted, the warning level is
> > not 2,
> > +     and the current block is not dominating the exit block, not
> > report
> > +     the warning.  */
> > +  bool is_dominate_exit = dominated_by_p (CDI_DOMINATORS,
> > +                                         EXIT_BLOCK_PTR_FOR_FN
> > (fun),
> > +                                         gimple_bb (stmt));
> > +  if (gimple_is_splitted_p (stmt) && warn_array_bounds < 2
> > +      && !is_dominate_exit)
> > +    return;
> > +
> >    bool warned = false;
> >    idx = wi::sdiv_trunc (idx, wi::to_offset (el_sz));
> >    if (idx < 0)
> > @@ -809,7 +843,7 @@ array_bounds_checker::check_array_bounds (tree
> > *tp, int *walk_subtree,
> >      warned = checker->check_array_ref (location, t, wi->stmt,
> >                                        false/*ignore_off_by_one*/);
> >    else if (TREE_CODE (t) == MEM_REF)
> > -    warned = checker->check_mem_ref (location, t,
> > +    warned = checker->check_mem_ref (location, t, wi->stmt,
> >                                      false /*ignore_off_by_one*/);
> >    else if (TREE_CODE (t) == ADDR_EXPR)
> >      {
> > diff --git a/gcc/gimple-array-bounds.h b/gcc/gimple-array-bounds.h
> > index 3e077d0178ff..7c98f02204c9 100644
> > --- a/gcc/gimple-array-bounds.h
> > +++ b/gcc/gimple-array-bounds.h
> > @@ -33,7 +33,7 @@ public:
> >  private:
> >    static tree check_array_bounds (tree *tp, int *walk_subtree,
> > void *data);
> >    bool check_array_ref (location_t, tree, gimple *, bool
> > ignore_off_by_one);
> > -  bool check_mem_ref (location_t, tree, bool ignore_off_by_one);
> > +  bool check_mem_ref (location_t, tree, gimple *, bool
> > ignore_off_by_one);
> >    void check_addr_expr (location_t, tree, gimple *);
> >    void get_value_range (irange &r, const_tree op, gimple *);
> >  
> > diff --git a/gcc/gimple.h b/gcc/gimple.h
> > index bd315ffc2dd4..08f52d107084 100644
> > --- a/gcc/gimple.h
> > +++ b/gcc/gimple.h
> > @@ -254,8 +254,8 @@ struct GTY((desc ("gimple_statement_structure
> > (&%h)"), tag ("GSS_BASE"),
> >    /* Nonzero if this statement contains volatile operands.  */
> >    unsigned has_volatile_ops    : 1;
> >  
> > -  /* Padding to get subcode to 16 bit alignment.  */
> > -  unsigned pad                 : 1;
> > +  /* Nonzero if this statement is duplicated and splitted to two
> > pathes.  */
> > +  unsigned is_splitted         : 1;
> >  
> >    /* The SUBCODE field can be used for tuple-specific flags for
> > tuples
> >       that do not require subcodes.  Note that SUBCODE should be at
> > @@ -2327,6 +2327,23 @@ gimple_set_has_volatile_ops (gimple *stmt,
> > bool volatilep)
> >      stmt->has_volatile_ops = (unsigned) volatilep;
> >  }
> >  
> > +/* Return true if statement G's is_splitted field has
> > +   been set.  */
> > +
> > +inline bool
> > +gimple_is_splitted_p (const gimple *g)
> > +{
> > +  return (bool) g->is_splitted;
> > +}
> > +
> > +/* Set the IS_SPLITTED flag to IS_SPLITTEDP.  */
> > +
> > +inline void
> > +gimple_set_is_splitted (gimple *s, bool is_splittedp)
> > +{
> > +    s->is_splitted = (unsigned) is_splittedp;
> > +}
> > +
> >  /* Return true if STMT is in a transaction.  */
> >  
> >  inline bool
> > diff --git a/gcc/testsuite/gcc.dg/Warray-bounds-61.c
> > b/gcc/testsuite/gcc.dg/Warray-bounds-61.c
> > index 5b66cdc0aab1..cb3c64a813d7 100644
> > --- a/gcc/testsuite/gcc.dg/Warray-bounds-61.c
> > +++ b/gcc/testsuite/gcc.dg/Warray-bounds-61.c
> > @@ -23,7 +23,7 @@ void test_ua3_ua0_a0 (int i)
> >  
> >    if (i < __LINE__)
> >      i = 5;
> > -  ua3_a0.a0[i] = 0;           // { dg-warning "\\\[-Warray-bounds"
> > }
> > +  ua3_a0.a0[i] = 0;           // { dg-bogus "\\\[-Warray-bounds" }
> >  
> >    if (i > -1)
> >      i = -1;
> > @@ -44,7 +44,7 @@ void test_ua3_ua0_a1 (int i)
> >  
> >    if (i > -1)
> >      i = -1;
> > -  ua3_a0.a1[i] = 0;           // { dg-warning "\\\[-Warray-bounds"
> > }
> > +  ua3_a0.a1[i] = 0;           // { dg-bogus "\\\[-Warray-bounds" }
> >  
> >    if (i < 7)
> >      i = 7;
> > @@ -60,7 +60,7 @@ void test_ua3_ua0_a2 (int i)
> >  
> >    if (i < __LINE__)
> >      i = __LINE__;
> > -  ua3_a0.a2[i] = 0;           // { dg-warning "\\\[-Warray-bounds"
> > }
> > +  ua3_a0.a2[i] = 0;           // { dg-bogus "\\\[-Warray-bounds" }
> >  
> >    if (i > -1)
> >      i = -1;
> > diff --git a/gcc/testsuite/gcc.dg/pr109071-1.c
> > b/gcc/testsuite/gcc.dg/pr109071-1.c
> > new file mode 100644
> > index 000000000000..a405c80bd549
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/pr109071-1.c
> > @@ -0,0 +1,22 @@
> > +/* PR tree-optimization/109071 -Warray-bounds false positive
> > warnings
> > +   due to code duplication from jump threading 
> > +   { dg-do compile }
> > +   { dg-options "-O2 -Warray-bounds=2" }
> > + */
> > +
> > +extern void warn(void);
> > +static inline void assign(int val, int *regs, int index)
> > +{
> > +  if (index >= 4)
> > +    warn();
> > +  *regs = val;
> > +}
> > +struct nums {int vals[4];};
> > +
> > +void sparx5_set (int *ptr, struct nums *sg, int index)
> > +{
> > +  int *val = &sg->vals[index]; /* { dg-warning "is above array
> > bounds" } */
> > +
> > +  assign(0,    ptr, index);
> > +  assign(*val, ptr, index);
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/pr109071.c
> > b/gcc/testsuite/gcc.dg/pr109071.c
> > new file mode 100644
> > index 000000000000..782dfad84ea2
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/pr109071.c
> > @@ -0,0 +1,22 @@
> > +/* PR tree-optimization/109071 -Warray-bounds false positive
> > warnings
> > +   due to code duplication from jump threading 
> > +   { dg-do compile }
> > +   { dg-options "-O2 -Wall" }
> > + */
> > +
> > +extern void warn(void);
> > +static inline void assign(int val, int *regs, int index)
> > +{
> > +  if (index >= 4)
> > +    warn();
> > +  *regs = val;
> > +}
> > +struct nums {int vals[4];};
> > +
> > +void sparx5_set (int *ptr, struct nums *sg, int index)
> > +{
> > +  int *val = &sg->vals[index]; /* { dg-bogus "is above array
> > bounds" } */
> > +
> > +  assign(0,    ptr, index);
> > +  assign(*val, ptr, index);
> > +}
> > diff --git a/gcc/tree-ssa-threadupdate.cc b/gcc/tree-ssa-
> > threadupdate.cc
> > index fa61ba9512b7..9f338dd4d54d 100644
> > --- a/gcc/tree-ssa-threadupdate.cc
> > +++ b/gcc/tree-ssa-threadupdate.cc
> > @@ -2371,6 +2371,17 @@
> > back_jt_path_registry::adjust_paths_after_duplication (unsigned
> > curr_path_num)
> >      }
> >  }
> >  
> > +/* Set all the stmts in the basic block BB as IS_SPLITTED.  */
> > +
> > +static void
> > +set_stmts_in_bb_is_splitted (basic_block bb)
> > +{
> > +  gimple_stmt_iterator gsi;
> > +  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > +    gimple_set_is_splitted (gsi_stmt (gsi), true);
> > +  return;
> > +}
> > +
> >  /* Duplicates a jump-thread path of N_REGION basic blocks.
> >     The ENTRY edge is redirected to the duplicate of the region.
> >  
> > @@ -2418,6 +2429,10 @@ back_jt_path_registry::duplicate_thread_path
> > (edge entry,
> >    basic_block *region_copy = XNEWVEC (basic_block, n_region);
> >    copy_bbs (region, n_region, region_copy, &exit, 1, &exit_copy,
> > loop,
> >             split_edge_bb_loc (entry), false);
> > +  /* Mark all the stmts in both original and copied basic blocks
> > +     as IS_SPLITTED.  */
> > +  set_stmts_in_bb_is_splitted (*region);
> > +  set_stmts_in_bb_is_splitted (*region_copy);
> >  
> >    /* Fix up: copy_bbs redirects all edges pointing to copied
> > blocks.  The
> >       following code ensures that all the edges exiting the jump-
> > thread path are
> > 
>
Qing Zhao May 21, 2024, 3:13 p.m. UTC | #20
Thanks for the comments and suggestions.

> On May 15, 2024, at 10:00, David Malcolm <dmalcolm@redhat.com> wrote:
> 
> On Tue, 2024-05-14 at 15:08 +0200, Richard Biener wrote:
>> On Mon, 13 May 2024, Qing Zhao wrote:
>> 
>>> -Warray-bounds is an important option to enable linux kernal to
>>> keep
>>> the array out-of-bound errors out of the source tree.
>>> 
>>> However, due to the false positive warnings reported in PR109071
>>> (-Warray-bounds false positive warnings due to code duplication
>>> from
>>> jump threading), -Warray-bounds=1 cannot be added on by default.
>>> 
>>> Although it's impossible to elinimate all the false positive
>>> warnings
>>> from -Warray-bounds=1 (See PR104355 Misleading -Warray-bounds
>>> documentation says "always out of bounds"), we should minimize the
>>> false positive warnings in -Warray-bounds=1.
>>> 
>>> The root reason for the false positive warnings reported in
>>> PR109071 is:
>>> 
>>> When the thread jump optimization tries to reduce the # of branches
>>> inside the routine, sometimes it needs to duplicate the code and
>>> split into two conditional pathes. for example:
>>> 
>>> The original code:
>>> 
>>> void sparx5_set (int * ptr, struct nums * sg, int index)
>>> {
>>>   if (index >= 4)
>>>     warn ();
>>>   *ptr = 0;
>>>   *val = sg->vals[index];
>>>   if (index >= 4)
>>>     warn ();
>>>   *ptr = *val;
>>> 
>>>   return;
>>> }
>>> 
>>> With the thread jump, the above becomes:
>>> 
>>> void sparx5_set (int * ptr, struct nums * sg, int index)
>>> {
>>>   if (index >= 4)
>>>     {
>>>       warn ();
>>>       *ptr = 0;         // Code duplications since "warn" does
>>> return;
>>>       *val = sg->vals[index];   // same this line.
>>>                                 // In this path, since it's under
>>> the condition
>>>                                 // "index >= 4", the compiler knows
>>> the value
>>>                                 // of "index" is larger then 4,
>>> therefore the
>>>                                 // out-of-bound warning.
>>>       warn ();
>>>     }
>>>   else
>>>     {
>>>       *ptr = 0;
>>>       *val = sg->vals[index];
>>>     }
>>>   *ptr = *val;
>>>   return;
>>> }
>>> 
>>> We can see, after the thread jump optimization, the # of branches
>>> inside
>>> the routine "sparx5_set" is reduced from 2 to 1, however,  due to
>>> the
>>> code duplication (which is needed for the correctness of the code),
>>> we
>>> got a false positive out-of-bound warning.
>>> 
>>> In order to eliminate such false positive out-of-bound warning,
>>> 
>>> A. Add one more flag for GIMPLE: is_splitted.
>>> B. During the thread jump optimization, when the basic blocks are
>>>    duplicated, mark all the STMTs inside the original and
>>> duplicated
>>>    basic blocks as "is_splitted";
>>> C. Inside the array bound checker, add the following new heuristic:
>>> 
>>> If
>>>    1. the stmt is duplicated and splitted into two conditional
>>> paths;
>>> +  2. the warning level < 2;
>>> +  3. the current block is not dominating the exit block
>>> Then not report the warning.
>>> 
>>> The false positive warnings are moved from -Warray-bounds=1 to
>>>  -Warray-bounds=2 now.
>>> 
>>> Bootstrapped and regression tested on both x86 and aarch64.
>>> adjusted
>>>  -Warray-bounds-61.c due to the false positive warnings.
>>> 
>>> Let me know if you have any comments and suggestions.
>> 
>> At the last Cauldron I talked with David Malcolm about these kind of
>> issues and thought of instead of suppressing diagnostics to record
>> how a block was duplicated.  For jump threading my idea was to record
>> the condition that was proved true when entering the path and do this
>> by recording the corresponding locations

Is only recording the location for the TRUE path  enough?
We might need to record the corresponding locations for both TRUE and FALSE paths since the VRP might be more accurate on both paths. 
Is only recording the location is enough? 
Do we need to record the pointer to the original condition stmt?


>> so that in the end we can
>> use the diagnostic-path infrastructure to say
>> 
>> warning: array index always above array bounds
>> events 1:
>> 
>>> 3 |  if (index >= 4)
>>          |
>>         (1) when index >= 4
>> 
>> it would be possible to record the info as part of the ad-hoc
>> location data on each duplicated stmt or, possibly simpler,
>> as part of a debug stmt of new kind.
>> 
>> I'm not sure pruning the warnings is a good thing to do.  One
>> would argue we should instead isolate such path as unreachable
>> since it invokes undefined behavior.  In particular your
>> example is clearly a bug and should be diagnosed.
>> 
>> Note very similar issues happen when unrolling a loop.
>> 
>> Note all late diagnostics are prone to these kind of issues.
> 
> To recap our chat at Cauldron: any GCC diagnostic can potentially have
> a diagnostic_path associated with it (not just the analyzer).  The
> current mechanism is:
> (a) use a rich_location for the diagnostic, and 
> (b) create an instance of some diagnostic_path subclass (e.g.
> simple_diagnostic_path, or something else), and 
> (c) call  richloc.set_path (&path);  to associate the path with the
> rich_location
> 
> See gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_paths.c for an
> example of using simple_diagnostic_path that doesn't use the analyzer.

Thanks for the information. 
Yes, If we have recorded all necessary information for the diagnostic during the jump threading or loop unrolling transformation, the current rich_location and diagnostic_path infrastruture looks a very nice fit to use to report the warning with more details. 
> 
> 
> If we want *every* late diagnostic to potentially have a path, it
> sounds like we might want some extra infrastructure (perhaps a hook
> that autogenerates the paths from the ad-hoc data???)

Recording the minimum and necessary information into the IR during transformation to help the late diagnostic is the key to this approach.  What kind of information need to be recorded and where to record such information to minimize the cost need more thinking and discussion.

>  But probably
> best to get it working for just a specific example first before trying
> to be fancy and generalize.

Yes. 

Thanks.

Qing
> 
> Dave
> 
> 
>> 
>> Richard.
>> 
>>> Thanks.
>>> 
>>> Qing
>>> 
>>> 
>>>         PR tree optimization/109071
>>> 
>>> gcc/ChangeLog:
>>> 
>>>         * gimple-array-bounds.cc (check_out_of_bounds_and_warn):
>>> Add two new
>>>         arguments for the new heuristic to not issue warnings.
>>>         (array_bounds_checker::check_array_ref): Call the new
>>> prototype of the
>>>         routine check_out_of_bounds_and_warn.
>>>         (array_bounds_checker::check_mem_ref): Add one new argument
>>> for the
>>>         new heuristic to not issue warnings.
>>>         (array_bounds_checker::check_addr_expr): Call the new
>>> prototype of the
>>>         routine check_mem_ref, add new heuristic for not issue
>>> warnings.
>>>         (array_bounds_checker::check_array_bounds): Call the new
>>> prototype of
>>>         the routine check_mem_ref.
>>>         * gimple-array-bounds.h: New prototype of check_mem_ref.
>>>         * gimple.h (struct GTY): Add one new flag is_splitted for
>>> gimple.
>>>         (gimple_is_splitted_p): New function.
>>>         (gimple_set_is_splitted): New function.
>>>         * tree-ssa-threadupdate.cc (set_stmts_in_bb_is_splitted):
>>> New function.
>>>         (back_jt_path_registry::duplicate_thread_path): Mark all
>>> the stmts in
>>>         both original and copied blocks as IS_SPLITTED.
>>> 
>>> gcc/testsuite/ChangeLog:
>>> 
>>>         * gcc.dg/Warray-bounds-61.c: Adjust testing case.
>>>         * gcc.dg/pr109071-1.c: New test.
>>>         * gcc.dg/pr109071.c: New test.
>>> ---
>>>  gcc/gimple-array-bounds.cc              | 46
>>> +++++++++++++++++++++----
>>>  gcc/gimple-array-bounds.h               |  2 +-
>>>  gcc/gimple.h                            | 21 +++++++++--
>>>  gcc/testsuite/gcc.dg/Warray-bounds-61.c |  6 ++--
>>>  gcc/testsuite/gcc.dg/pr109071-1.c       | 22 ++++++++++++
>>>  gcc/testsuite/gcc.dg/pr109071.c         | 22 ++++++++++++
>>>  gcc/tree-ssa-threadupdate.cc            | 15 ++++++++
>>>  7 files changed, 122 insertions(+), 12 deletions(-)
>>>  create mode 100644 gcc/testsuite/gcc.dg/pr109071-1.c
>>>  create mode 100644 gcc/testsuite/gcc.dg/pr109071.c
>>> 
>>> diff --git a/gcc/gimple-array-bounds.cc b/gcc/gimple-array-
>>> bounds.cc
>>> index 008071cd5464..4a2975623bc1 100644
>>> --- a/gcc/gimple-array-bounds.cc
>>> +++ b/gcc/gimple-array-bounds.cc
>>> @@ -264,7 +264,9 @@ check_out_of_bounds_and_warn (location_t
>>> location, tree ref,
>>>                               tree up_bound, tree up_bound_p1,
>>>                               const value_range *vr,
>>>                               bool ignore_off_by_one, bool
>>> for_array_bound,
>>> -                             bool *out_of_bound)
>>> +                             bool *out_of_bound,
>>> +                             bool is_splitted,
>>> +                             bool is_dominate_exit)
>>>  {
>>>    tree min, max;
>>>    tree low_bound = array_ref_low_bound (ref);
>>> @@ -273,6 +275,13 @@ check_out_of_bounds_and_warn (location_t
>>> location, tree ref,
>>>    bool warned = false;
>>>    *out_of_bound = false;
>>>  
>>> +  /* If the stmt is duplicated and splitted, the warning level is
>>> not 2,
>>> +     and the current block is not dominating the exit block, not
>>> report
>>> +     the warning.  */
>>> +  if (is_splitted && warn_array_bounds < 2
>>> +      && !is_dominate_exit)
>>> +    return false;
>>> +
>>>    /* Empty array.  */
>>>    if (up_bound && tree_int_cst_equal (low_bound, up_bound_p1))
>>>      {
>>> @@ -386,12 +395,17 @@ array_bounds_checker::check_array_ref
>>> (location_t location, tree ref,
>>>         }
>>>      }
>>>  
>>> +  bool is_dominate_exit = dominated_by_p (CDI_DOMINATORS,
>>> +                                         EXIT_BLOCK_PTR_FOR_FN
>>> (fun),
>>> +                                         gimple_bb (stmt));
>>> +
>>>    warned = check_out_of_bounds_and_warn (location, ref,
>>>                                          low_sub_org, low_sub,
>>> up_sub,
>>>                                          up_bound, up_bound_p1,
>>> &vr,
>>>                                          ignore_off_by_one,
>>> warn_array_bounds,
>>> -                                        &out_of_bound);
>>> -
>>> +                                        &out_of_bound,
>>> +                                        gimple_is_splitted_p
>>> (stmt),
>>> +                                        is_dominate_exit);
>>>  
>>>    if (!warned && sam == special_array_member::int_0)
>>>      warned = warning_at (location, OPT_Wzero_length_bounds,
>>> @@ -476,7 +490,7 @@ array_bounds_checker::check_array_ref
>>> (location_t location, tree ref,
>>>  
>>>  bool
>>>  array_bounds_checker::check_mem_ref (location_t location, tree
>>> ref,
>>> -                                    bool ignore_off_by_one)
>>> +                                    gimple *stmt, bool
>>> ignore_off_by_one)
>>>  {
>>>    if (warning_suppressed_p (ref, OPT_Warray_bounds_))
>>>      return false;
>>> @@ -574,6 +588,16 @@ array_bounds_checker::check_mem_ref
>>> (location_t location, tree ref,
>>>         }
>>>      }
>>>  
>>> +  /* If the stmt is duplicated and splitted, the warning level is
>>> not 2,
>>> +     and the current block is not dominating the exit block, not
>>> report
>>> +     the warning.  */
>>> +  bool is_dominate_exit = dominated_by_p (CDI_DOMINATORS,
>>> +                                         EXIT_BLOCK_PTR_FOR_FN
>>> (fun),
>>> +                                         gimple_bb (stmt));
>>> +  if (gimple_is_splitted_p (stmt) && warn_array_bounds < 2
>>> +      && !is_dominate_exit)
>>> +    return false;
>>> +
>>>    bool warned = false;
>>>    if (lboob)
>>>      {
>>> @@ -654,7 +678,7 @@ array_bounds_checker::check_addr_expr
>>> (location_t location, tree t,
>>>           ignore_off_by_one = false;
>>>         }
>>>        else if (TREE_CODE (t) == MEM_REF)
>>> -       warned = check_mem_ref (location, t, ignore_off_by_one);
>>> +       warned = check_mem_ref (location, t, stmt,
>>> ignore_off_by_one);
>>>  
>>>        if (warned)
>>>         suppress_warning (t, OPT_Warray_bounds_);
>>> @@ -690,6 +714,16 @@ array_bounds_checker::check_addr_expr
>>> (location_t location, tree t,
>>>    if (!mem_ref_offset (t).is_constant (&idx))
>>>      return;
>>>  
>>> +  /* If the stmt is duplicated and splitted, the warning level is
>>> not 2,
>>> +     and the current block is not dominating the exit block, not
>>> report
>>> +     the warning.  */
>>> +  bool is_dominate_exit = dominated_by_p (CDI_DOMINATORS,
>>> +                                         EXIT_BLOCK_PTR_FOR_FN
>>> (fun),
>>> +                                         gimple_bb (stmt));
>>> +  if (gimple_is_splitted_p (stmt) && warn_array_bounds < 2
>>> +      && !is_dominate_exit)
>>> +    return;
>>> +
>>>    bool warned = false;
>>>    idx = wi::sdiv_trunc (idx, wi::to_offset (el_sz));
>>>    if (idx < 0)
>>> @@ -809,7 +843,7 @@ array_bounds_checker::check_array_bounds (tree
>>> *tp, int *walk_subtree,
>>>      warned = checker->check_array_ref (location, t, wi->stmt,
>>>                                        false/*ignore_off_by_one*/);
>>>    else if (TREE_CODE (t) == MEM_REF)
>>> -    warned = checker->check_mem_ref (location, t,
>>> +    warned = checker->check_mem_ref (location, t, wi->stmt,
>>>                                      false /*ignore_off_by_one*/);
>>>    else if (TREE_CODE (t) == ADDR_EXPR)
>>>      {
>>> diff --git a/gcc/gimple-array-bounds.h b/gcc/gimple-array-bounds.h
>>> index 3e077d0178ff..7c98f02204c9 100644
>>> --- a/gcc/gimple-array-bounds.h
>>> +++ b/gcc/gimple-array-bounds.h
>>> @@ -33,7 +33,7 @@ public:
>>>  private:
>>>    static tree check_array_bounds (tree *tp, int *walk_subtree,
>>> void *data);
>>>    bool check_array_ref (location_t, tree, gimple *, bool
>>> ignore_off_by_one);
>>> -  bool check_mem_ref (location_t, tree, bool ignore_off_by_one);
>>> +  bool check_mem_ref (location_t, tree, gimple *, bool
>>> ignore_off_by_one);
>>>    void check_addr_expr (location_t, tree, gimple *);
>>>    void get_value_range (irange &r, const_tree op, gimple *);
>>>  
>>> diff --git a/gcc/gimple.h b/gcc/gimple.h
>>> index bd315ffc2dd4..08f52d107084 100644
>>> --- a/gcc/gimple.h
>>> +++ b/gcc/gimple.h
>>> @@ -254,8 +254,8 @@ struct GTY((desc ("gimple_statement_structure
>>> (&%h)"), tag ("GSS_BASE"),
>>>    /* Nonzero if this statement contains volatile operands.  */
>>>    unsigned has_volatile_ops    : 1;
>>>  
>>> -  /* Padding to get subcode to 16 bit alignment.  */
>>> -  unsigned pad                 : 1;
>>> +  /* Nonzero if this statement is duplicated and splitted to two
>>> pathes.  */
>>> +  unsigned is_splitted         : 1;
>>>  
>>>    /* The SUBCODE field can be used for tuple-specific flags for
>>> tuples
>>>       that do not require subcodes.  Note that SUBCODE should be at
>>> @@ -2327,6 +2327,23 @@ gimple_set_has_volatile_ops (gimple *stmt,
>>> bool volatilep)
>>>      stmt->has_volatile_ops = (unsigned) volatilep;
>>>  }
>>>  
>>> +/* Return true if statement G's is_splitted field has
>>> +   been set.  */
>>> +
>>> +inline bool
>>> +gimple_is_splitted_p (const gimple *g)
>>> +{
>>> +  return (bool) g->is_splitted;
>>> +}
>>> +
>>> +/* Set the IS_SPLITTED flag to IS_SPLITTEDP.  */
>>> +
>>> +inline void
>>> +gimple_set_is_splitted (gimple *s, bool is_splittedp)
>>> +{
>>> +    s->is_splitted = (unsigned) is_splittedp;
>>> +}
>>> +
>>>  /* Return true if STMT is in a transaction.  */
>>>  
>>>  inline bool
>>> diff --git a/gcc/testsuite/gcc.dg/Warray-bounds-61.c
>>> b/gcc/testsuite/gcc.dg/Warray-bounds-61.c
>>> index 5b66cdc0aab1..cb3c64a813d7 100644
>>> --- a/gcc/testsuite/gcc.dg/Warray-bounds-61.c
>>> +++ b/gcc/testsuite/gcc.dg/Warray-bounds-61.c
>>> @@ -23,7 +23,7 @@ void test_ua3_ua0_a0 (int i)
>>>  
>>>    if (i < __LINE__)
>>>      i = 5;
>>> -  ua3_a0.a0[i] = 0;           // { dg-warning "\\\[-Warray-bounds"
>>> }
>>> +  ua3_a0.a0[i] = 0;           // { dg-bogus "\\\[-Warray-bounds" }
>>>  
>>>    if (i > -1)
>>>      i = -1;
>>> @@ -44,7 +44,7 @@ void test_ua3_ua0_a1 (int i)
>>>  
>>>    if (i > -1)
>>>      i = -1;
>>> -  ua3_a0.a1[i] = 0;           // { dg-warning "\\\[-Warray-bounds"
>>> }
>>> +  ua3_a0.a1[i] = 0;           // { dg-bogus "\\\[-Warray-bounds" }
>>>  
>>>    if (i < 7)
>>>      i = 7;
>>> @@ -60,7 +60,7 @@ void test_ua3_ua0_a2 (int i)
>>>  
>>>    if (i < __LINE__)
>>>      i = __LINE__;
>>> -  ua3_a0.a2[i] = 0;           // { dg-warning "\\\[-Warray-bounds"
>>> }
>>> +  ua3_a0.a2[i] = 0;           // { dg-bogus "\\\[-Warray-bounds" }
>>>  
>>>    if (i > -1)
>>>      i = -1;
>>> diff --git a/gcc/testsuite/gcc.dg/pr109071-1.c
>>> b/gcc/testsuite/gcc.dg/pr109071-1.c
>>> new file mode 100644
>>> index 000000000000..a405c80bd549
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/pr109071-1.c
>>> @@ -0,0 +1,22 @@
>>> +/* PR tree-optimization/109071 -Warray-bounds false positive
>>> warnings
>>> +   due to code duplication from jump threading 
>>> +   { dg-do compile }
>>> +   { dg-options "-O2 -Warray-bounds=2" }
>>> + */
>>> +
>>> +extern void warn(void);
>>> +static inline void assign(int val, int *regs, int index)
>>> +{
>>> +  if (index >= 4)
>>> +    warn();
>>> +  *regs = val;
>>> +}
>>> +struct nums {int vals[4];};
>>> +
>>> +void sparx5_set (int *ptr, struct nums *sg, int index)
>>> +{
>>> +  int *val = &sg->vals[index]; /* { dg-warning "is above array
>>> bounds" } */
>>> +
>>> +  assign(0,    ptr, index);
>>> +  assign(*val, ptr, index);
>>> +}
>>> diff --git a/gcc/testsuite/gcc.dg/pr109071.c
>>> b/gcc/testsuite/gcc.dg/pr109071.c
>>> new file mode 100644
>>> index 000000000000..782dfad84ea2
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/pr109071.c
>>> @@ -0,0 +1,22 @@
>>> +/* PR tree-optimization/109071 -Warray-bounds false positive
>>> warnings
>>> +   due to code duplication from jump threading 
>>> +   { dg-do compile }
>>> +   { dg-options "-O2 -Wall" }
>>> + */
>>> +
>>> +extern void warn(void);
>>> +static inline void assign(int val, int *regs, int index)
>>> +{
>>> +  if (index >= 4)
>>> +    warn();
>>> +  *regs = val;
>>> +}
>>> +struct nums {int vals[4];};
>>> +
>>> +void sparx5_set (int *ptr, struct nums *sg, int index)
>>> +{
>>> +  int *val = &sg->vals[index]; /* { dg-bogus "is above array
>>> bounds" } */
>>> +
>>> +  assign(0,    ptr, index);
>>> +  assign(*val, ptr, index);
>>> +}
>>> diff --git a/gcc/tree-ssa-threadupdate.cc b/gcc/tree-ssa-
>>> threadupdate.cc
>>> index fa61ba9512b7..9f338dd4d54d 100644
>>> --- a/gcc/tree-ssa-threadupdate.cc
>>> +++ b/gcc/tree-ssa-threadupdate.cc
>>> @@ -2371,6 +2371,17 @@
>>> back_jt_path_registry::adjust_paths_after_duplication (unsigned
>>> curr_path_num)
>>>      }
>>>  }
>>>  
>>> +/* Set all the stmts in the basic block BB as IS_SPLITTED.  */
>>> +
>>> +static void
>>> +set_stmts_in_bb_is_splitted (basic_block bb)
>>> +{
>>> +  gimple_stmt_iterator gsi;
>>> +  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>>> +    gimple_set_is_splitted (gsi_stmt (gsi), true);
>>> +  return;
>>> +}
>>> +
>>>  /* Duplicates a jump-thread path of N_REGION basic blocks.
>>>     The ENTRY edge is redirected to the duplicate of the region.
>>>  
>>> @@ -2418,6 +2429,10 @@ back_jt_path_registry::duplicate_thread_path
>>> (edge entry,
>>>    basic_block *region_copy = XNEWVEC (basic_block, n_region);
>>>    copy_bbs (region, n_region, region_copy, &exit, 1, &exit_copy,
>>> loop,
>>>             split_edge_bb_loc (entry), false);
>>> +  /* Mark all the stmts in both original and copied basic blocks
>>> +     as IS_SPLITTED.  */
>>> +  set_stmts_in_bb_is_splitted (*region);
>>> +  set_stmts_in_bb_is_splitted (*region_copy);
>>>  
>>>    /* Fix up: copy_bbs redirects all edges pointing to copied
>>> blocks.  The
>>>       following code ensures that all the edges exiting the jump-
>>> thread path are
>>> 
>> 
>
David Malcolm May 21, 2024, 9:36 p.m. UTC | #21
On Tue, 2024-05-21 at 15:13 +0000, Qing Zhao wrote:
> Thanks for the comments and suggestions.
> 
> > On May 15, 2024, at 10:00, David Malcolm <dmalcolm@redhat.com>
> > wrote:
> > 
> > On Tue, 2024-05-14 at 15:08 +0200, Richard Biener wrote:
> > > On Mon, 13 May 2024, Qing Zhao wrote:
> > > 
> > > > -Warray-bounds is an important option to enable linux kernal to
> > > > keep
> > > > the array out-of-bound errors out of the source tree.
> > > > 
> > > > However, due to the false positive warnings reported in
> > > > PR109071
> > > > (-Warray-bounds false positive warnings due to code duplication
> > > > from
> > > > jump threading), -Warray-bounds=1 cannot be added on by
> > > > default.
> > > > 
> > > > Although it's impossible to elinimate all the false positive
> > > > warnings
> > > > from -Warray-bounds=1 (See PR104355 Misleading -Warray-bounds
> > > > documentation says "always out of bounds"), we should minimize
> > > > the
> > > > false positive warnings in -Warray-bounds=1.
> > > > 
> > > > The root reason for the false positive warnings reported in
> > > > PR109071 is:
> > > > 
> > > > When the thread jump optimization tries to reduce the # of
> > > > branches
> > > > inside the routine, sometimes it needs to duplicate the code
> > > > and
> > > > split into two conditional pathes. for example:
> > > > 
> > > > The original code:
> > > > 
> > > > void sparx5_set (int * ptr, struct nums * sg, int index)
> > > > {
> > > >   if (index >= 4)
> > > >     warn ();
> > > >   *ptr = 0;
> > > >   *val = sg->vals[index];
> > > >   if (index >= 4)
> > > >     warn ();
> > > >   *ptr = *val;
> > > > 
> > > >   return;
> > > > }
> > > > 
> > > > With the thread jump, the above becomes:
> > > > 
> > > > void sparx5_set (int * ptr, struct nums * sg, int index)
> > > > {
> > > >   if (index >= 4)
> > > >     {
> > > >       warn ();
> > > >       *ptr = 0;         // Code duplications since "warn" does
> > > > return;
> > > >       *val = sg->vals[index];   // same this line.
> > > >                                 // In this path, since it's
> > > > under
> > > > the condition
> > > >                                 // "index >= 4", the compiler
> > > > knows
> > > > the value
> > > >                                 // of "index" is larger then 4,
> > > > therefore the
> > > >                                 // out-of-bound warning.
> > > >       warn ();
> > > >     }
> > > >   else
> > > >     {
> > > >       *ptr = 0;
> > > >       *val = sg->vals[index];
> > > >     }
> > > >   *ptr = *val;
> > > >   return;
> > > > }
> > > > 
> > > > We can see, after the thread jump optimization, the # of
> > > > branches
> > > > inside
> > > > the routine "sparx5_set" is reduced from 2 to 1, however,  due
> > > > to
> > > > the
> > > > code duplication (which is needed for the correctness of the
> > > > code),
> > > > we
> > > > got a false positive out-of-bound warning.
> > > > 
> > > > In order to eliminate such false positive out-of-bound warning,
> > > > 
> > > > A. Add one more flag for GIMPLE: is_splitted.
> > > > B. During the thread jump optimization, when the basic blocks
> > > > are
> > > >    duplicated, mark all the STMTs inside the original and
> > > > duplicated
> > > >    basic blocks as "is_splitted";
> > > > C. Inside the array bound checker, add the following new
> > > > heuristic:
> > > > 
> > > > If
> > > >    1. the stmt is duplicated and splitted into two conditional
> > > > paths;
> > > > +  2. the warning level < 2;
> > > > +  3. the current block is not dominating the exit block
> > > > Then not report the warning.
> > > > 
> > > > The false positive warnings are moved from -Warray-bounds=1 to
> > > >  -Warray-bounds=2 now.
> > > > 
> > > > Bootstrapped and regression tested on both x86 and aarch64.
> > > > adjusted
> > > >  -Warray-bounds-61.c due to the false positive warnings.
> > > > 
> > > > Let me know if you have any comments and suggestions.
> > > 
> > > At the last Cauldron I talked with David Malcolm about these kind
> > > of
> > > issues and thought of instead of suppressing diagnostics to
> > > record
> > > how a block was duplicated.  For jump threading my idea was to
> > > record
> > > the condition that was proved true when entering the path and do
> > > this
> > > by recording the corresponding locations
> 
> Is only recording the location for the TRUE path  enough?
> We might need to record the corresponding locations for both TRUE and
> FALSE paths since the VRP might be more accurate on both paths. 
> Is only recording the location is enough? 
> Do we need to record the pointer to the original condition stmt?

Just to be clear: I don't plan to work on this myself (I have far too
much already to work on...); I'm assuming Richard Biener is
hoping/planning to implement this himself.

But feel free to ask me any questions about the diagnostic_path
machinery within the diagnostics subsystem.

[...snip...]

Regards
Dave
Richard Biener May 22, 2024, 7:38 a.m. UTC | #22
On Tue, May 21, 2024 at 11:36 PM David Malcolm <dmalcolm@redhat.com> wrote:
>
> On Tue, 2024-05-21 at 15:13 +0000, Qing Zhao wrote:
> > Thanks for the comments and suggestions.
> >
> > > On May 15, 2024, at 10:00, David Malcolm <dmalcolm@redhat.com>
> > > wrote:
> > >
> > > On Tue, 2024-05-14 at 15:08 +0200, Richard Biener wrote:
> > > > On Mon, 13 May 2024, Qing Zhao wrote:
> > > >
> > > > > -Warray-bounds is an important option to enable linux kernal to
> > > > > keep
> > > > > the array out-of-bound errors out of the source tree.
> > > > >
> > > > > However, due to the false positive warnings reported in
> > > > > PR109071
> > > > > (-Warray-bounds false positive warnings due to code duplication
> > > > > from
> > > > > jump threading), -Warray-bounds=1 cannot be added on by
> > > > > default.
> > > > >
> > > > > Although it's impossible to elinimate all the false positive
> > > > > warnings
> > > > > from -Warray-bounds=1 (See PR104355 Misleading -Warray-bounds
> > > > > documentation says "always out of bounds"), we should minimize
> > > > > the
> > > > > false positive warnings in -Warray-bounds=1.
> > > > >
> > > > > The root reason for the false positive warnings reported in
> > > > > PR109071 is:
> > > > >
> > > > > When the thread jump optimization tries to reduce the # of
> > > > > branches
> > > > > inside the routine, sometimes it needs to duplicate the code
> > > > > and
> > > > > split into two conditional pathes. for example:
> > > > >
> > > > > The original code:
> > > > >
> > > > > void sparx5_set (int * ptr, struct nums * sg, int index)
> > > > > {
> > > > >   if (index >= 4)
> > > > >     warn ();
> > > > >   *ptr = 0;
> > > > >   *val = sg->vals[index];
> > > > >   if (index >= 4)
> > > > >     warn ();
> > > > >   *ptr = *val;
> > > > >
> > > > >   return;
> > > > > }
> > > > >
> > > > > With the thread jump, the above becomes:
> > > > >
> > > > > void sparx5_set (int * ptr, struct nums * sg, int index)
> > > > > {
> > > > >   if (index >= 4)
> > > > >     {
> > > > >       warn ();
> > > > >       *ptr = 0;         // Code duplications since "warn" does
> > > > > return;
> > > > >       *val = sg->vals[index];   // same this line.
> > > > >                                 // In this path, since it's
> > > > > under
> > > > > the condition
> > > > >                                 // "index >= 4", the compiler
> > > > > knows
> > > > > the value
> > > > >                                 // of "index" is larger then 4,
> > > > > therefore the
> > > > >                                 // out-of-bound warning.
> > > > >       warn ();
> > > > >     }
> > > > >   else
> > > > >     {
> > > > >       *ptr = 0;
> > > > >       *val = sg->vals[index];
> > > > >     }
> > > > >   *ptr = *val;
> > > > >   return;
> > > > > }
> > > > >
> > > > > We can see, after the thread jump optimization, the # of
> > > > > branches
> > > > > inside
> > > > > the routine "sparx5_set" is reduced from 2 to 1, however,  due
> > > > > to
> > > > > the
> > > > > code duplication (which is needed for the correctness of the
> > > > > code),
> > > > > we
> > > > > got a false positive out-of-bound warning.
> > > > >
> > > > > In order to eliminate such false positive out-of-bound warning,
> > > > >
> > > > > A. Add one more flag for GIMPLE: is_splitted.
> > > > > B. During the thread jump optimization, when the basic blocks
> > > > > are
> > > > >    duplicated, mark all the STMTs inside the original and
> > > > > duplicated
> > > > >    basic blocks as "is_splitted";
> > > > > C. Inside the array bound checker, add the following new
> > > > > heuristic:
> > > > >
> > > > > If
> > > > >    1. the stmt is duplicated and splitted into two conditional
> > > > > paths;
> > > > > +  2. the warning level < 2;
> > > > > +  3. the current block is not dominating the exit block
> > > > > Then not report the warning.
> > > > >
> > > > > The false positive warnings are moved from -Warray-bounds=1 to
> > > > >  -Warray-bounds=2 now.
> > > > >
> > > > > Bootstrapped and regression tested on both x86 and aarch64.
> > > > > adjusted
> > > > >  -Warray-bounds-61.c due to the false positive warnings.
> > > > >
> > > > > Let me know if you have any comments and suggestions.
> > > >
> > > > At the last Cauldron I talked with David Malcolm about these kind
> > > > of
> > > > issues and thought of instead of suppressing diagnostics to
> > > > record
> > > > how a block was duplicated.  For jump threading my idea was to
> > > > record
> > > > the condition that was proved true when entering the path and do
> > > > this
> > > > by recording the corresponding locations
> >
> > Is only recording the location for the TRUE path  enough?
> > We might need to record the corresponding locations for both TRUE and
> > FALSE paths since the VRP might be more accurate on both paths.
> > Is only recording the location is enough?
> > Do we need to record the pointer to the original condition stmt?
>
> Just to be clear: I don't plan to work on this myself (I have far too
> much already to work on...); I'm assuming Richard Biener is
> hoping/planning to implement this himself.

While I think some of this might be an improvement to those vast
number of "false positive" diagnostics we have from (too) late diagnostic
passes I do not have the cycles to work on this.

Richard.

> But feel free to ask me any questions about the diagnostic_path
> machinery within the diagnostics subsystem.
>
> [...snip...]
>
> Regards
> Dave
>
Qing Zhao May 22, 2024, 6:53 p.m. UTC | #23
> On May 22, 2024, at 03:38, Richard Biener <richard.guenther@gmail.com> wrote:
> 
> On Tue, May 21, 2024 at 11:36 PM David Malcolm <dmalcolm@redhat.com> wrote:
>> 
>> On Tue, 2024-05-21 at 15:13 +0000, Qing Zhao wrote:
>>> Thanks for the comments and suggestions.
>>> 
>>>> On May 15, 2024, at 10:00, David Malcolm <dmalcolm@redhat.com>
>>>> wrote:
>>>> 
>>>> On Tue, 2024-05-14 at 15:08 +0200, Richard Biener wrote:
>>>>> On Mon, 13 May 2024, Qing Zhao wrote:
>>>>> 
>>>>>> -Warray-bounds is an important option to enable linux kernal to
>>>>>> keep
>>>>>> the array out-of-bound errors out of the source tree.
>>>>>> 
>>>>>> However, due to the false positive warnings reported in
>>>>>> PR109071
>>>>>> (-Warray-bounds false positive warnings due to code duplication
>>>>>> from
>>>>>> jump threading), -Warray-bounds=1 cannot be added on by
>>>>>> default.
>>>>>> 
>>>>>> Although it's impossible to elinimate all the false positive
>>>>>> warnings
>>>>>> from -Warray-bounds=1 (See PR104355 Misleading -Warray-bounds
>>>>>> documentation says "always out of bounds"), we should minimize
>>>>>> the
>>>>>> false positive warnings in -Warray-bounds=1.
>>>>>> 
>>>>>> The root reason for the false positive warnings reported in
>>>>>> PR109071 is:
>>>>>> 
>>>>>> When the thread jump optimization tries to reduce the # of
>>>>>> branches
>>>>>> inside the routine, sometimes it needs to duplicate the code
>>>>>> and
>>>>>> split into two conditional pathes. for example:
>>>>>> 
>>>>>> The original code:
>>>>>> 
>>>>>> void sparx5_set (int * ptr, struct nums * sg, int index)
>>>>>> {
>>>>>>  if (index >= 4)
>>>>>>    warn ();
>>>>>>  *ptr = 0;
>>>>>>  *val = sg->vals[index];
>>>>>>  if (index >= 4)
>>>>>>    warn ();
>>>>>>  *ptr = *val;
>>>>>> 
>>>>>>  return;
>>>>>> }
>>>>>> 
>>>>>> With the thread jump, the above becomes:
>>>>>> 
>>>>>> void sparx5_set (int * ptr, struct nums * sg, int index)
>>>>>> {
>>>>>>  if (index >= 4)
>>>>>>    {
>>>>>>      warn ();
>>>>>>      *ptr = 0;         // Code duplications since "warn" does
>>>>>> return;
>>>>>>      *val = sg->vals[index];   // same this line.
>>>>>>                                // In this path, since it's
>>>>>> under
>>>>>> the condition
>>>>>>                                // "index >= 4", the compiler
>>>>>> knows
>>>>>> the value
>>>>>>                                // of "index" is larger then 4,
>>>>>> therefore the
>>>>>>                                // out-of-bound warning.
>>>>>>      warn ();
>>>>>>    }
>>>>>>  else
>>>>>>    {
>>>>>>      *ptr = 0;
>>>>>>      *val = sg->vals[index];
>>>>>>    }
>>>>>>  *ptr = *val;
>>>>>>  return;
>>>>>> }
>>>>>> 
>>>>>> We can see, after the thread jump optimization, the # of
>>>>>> branches
>>>>>> inside
>>>>>> the routine "sparx5_set" is reduced from 2 to 1, however,  due
>>>>>> to
>>>>>> the
>>>>>> code duplication (which is needed for the correctness of the
>>>>>> code),
>>>>>> we
>>>>>> got a false positive out-of-bound warning.
>>>>>> 
>>>>>> In order to eliminate such false positive out-of-bound warning,
>>>>>> 
>>>>>> A. Add one more flag for GIMPLE: is_splitted.
>>>>>> B. During the thread jump optimization, when the basic blocks
>>>>>> are
>>>>>>   duplicated, mark all the STMTs inside the original and
>>>>>> duplicated
>>>>>>   basic blocks as "is_splitted";
>>>>>> C. Inside the array bound checker, add the following new
>>>>>> heuristic:
>>>>>> 
>>>>>> If
>>>>>>   1. the stmt is duplicated and splitted into two conditional
>>>>>> paths;
>>>>>> +  2. the warning level < 2;
>>>>>> +  3. the current block is not dominating the exit block
>>>>>> Then not report the warning.
>>>>>> 
>>>>>> The false positive warnings are moved from -Warray-bounds=1 to
>>>>>> -Warray-bounds=2 now.
>>>>>> 
>>>>>> Bootstrapped and regression tested on both x86 and aarch64.
>>>>>> adjusted
>>>>>> -Warray-bounds-61.c due to the false positive warnings.
>>>>>> 
>>>>>> Let me know if you have any comments and suggestions.
>>>>> 
>>>>> At the last Cauldron I talked with David Malcolm about these kind
>>>>> of
>>>>> issues and thought of instead of suppressing diagnostics to
>>>>> record
>>>>> how a block was duplicated.  For jump threading my idea was to
>>>>> record
>>>>> the condition that was proved true when entering the path and do
>>>>> this
>>>>> by recording the corresponding locations
>>> 
>>> Is only recording the location for the TRUE path  enough?
>>> We might need to record the corresponding locations for both TRUE and
>>> FALSE paths since the VRP might be more accurate on both paths.
>>> Is only recording the location is enough?
>>> Do we need to record the pointer to the original condition stmt?
>> 
>> Just to be clear: I don't plan to work on this myself (I have far too
>> much already to work on...); I'm assuming Richard Biener is
>> hoping/planning to implement this himself.
> 
> While I think some of this might be an improvement to those vast
> number of "false positive" diagnostics we have from (too) late diagnostic
> passes I do not have the cycles to work on this.

I can study a little bit more on how to improve the diagnostics for PR 109071 first. 

FYI, I just updated PR109071’s description as: Need more context for -Warray-bounds warnings due to code duplication from jump threading. 

As we already agreed, this is NOT a false positive. It caught a real bug in linux kernel that need to be patched in the kernel source. (See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109071#c11)

In order to add more context to the diagnostics to help the end user locate the bug accurately in their source code, compiler needs:

1. During jump threading phase, record the following information for each duplicated STMT:
	A. A pointer to the COND stmt;
	B. True/false for such COND 
2. During array out-of-bound checking phase, whenever locate an out-of-bound case, 
	A. Use a rich_location for the diagnostic;
	B. Create an instance of diagnositic_path, add events to this diagnostic_path based on the COND stmt, True/False info recorded in the STMT;
	C. Call richloc.set_path() to associate the path with the rich_location;
        D. Then issue warning with this rich_location instead of the current regular location. 

Any comment and suggestion to the above? 
Thanks.

Qing


> 
> Richard.
> 
>> But feel free to ask me any questions about the diagnostic_path
>> machinery within the diagnostics subsystem.
>> 
>> [...snip...]
>> 
>> Regards
>> Dave
Richard Biener May 23, 2024, 11:46 a.m. UTC | #24
On Wed, May 22, 2024 at 8:53 PM Qing Zhao <qing.zhao@oracle.com> wrote:
>
>
>
> > On May 22, 2024, at 03:38, Richard Biener <richard.guenther@gmail.com> wrote:
> >
> > On Tue, May 21, 2024 at 11:36 PM David Malcolm <dmalcolm@redhat.com> wrote:
> >>
> >> On Tue, 2024-05-21 at 15:13 +0000, Qing Zhao wrote:
> >>> Thanks for the comments and suggestions.
> >>>
> >>>> On May 15, 2024, at 10:00, David Malcolm <dmalcolm@redhat.com>
> >>>> wrote:
> >>>>
> >>>> On Tue, 2024-05-14 at 15:08 +0200, Richard Biener wrote:
> >>>>> On Mon, 13 May 2024, Qing Zhao wrote:
> >>>>>
> >>>>>> -Warray-bounds is an important option to enable linux kernal to
> >>>>>> keep
> >>>>>> the array out-of-bound errors out of the source tree.
> >>>>>>
> >>>>>> However, due to the false positive warnings reported in
> >>>>>> PR109071
> >>>>>> (-Warray-bounds false positive warnings due to code duplication
> >>>>>> from
> >>>>>> jump threading), -Warray-bounds=1 cannot be added on by
> >>>>>> default.
> >>>>>>
> >>>>>> Although it's impossible to elinimate all the false positive
> >>>>>> warnings
> >>>>>> from -Warray-bounds=1 (See PR104355 Misleading -Warray-bounds
> >>>>>> documentation says "always out of bounds"), we should minimize
> >>>>>> the
> >>>>>> false positive warnings in -Warray-bounds=1.
> >>>>>>
> >>>>>> The root reason for the false positive warnings reported in
> >>>>>> PR109071 is:
> >>>>>>
> >>>>>> When the thread jump optimization tries to reduce the # of
> >>>>>> branches
> >>>>>> inside the routine, sometimes it needs to duplicate the code
> >>>>>> and
> >>>>>> split into two conditional pathes. for example:
> >>>>>>
> >>>>>> The original code:
> >>>>>>
> >>>>>> void sparx5_set (int * ptr, struct nums * sg, int index)
> >>>>>> {
> >>>>>>  if (index >= 4)
> >>>>>>    warn ();
> >>>>>>  *ptr = 0;
> >>>>>>  *val = sg->vals[index];
> >>>>>>  if (index >= 4)
> >>>>>>    warn ();
> >>>>>>  *ptr = *val;
> >>>>>>
> >>>>>>  return;
> >>>>>> }
> >>>>>>
> >>>>>> With the thread jump, the above becomes:
> >>>>>>
> >>>>>> void sparx5_set (int * ptr, struct nums * sg, int index)
> >>>>>> {
> >>>>>>  if (index >= 4)
> >>>>>>    {
> >>>>>>      warn ();
> >>>>>>      *ptr = 0;         // Code duplications since "warn" does
> >>>>>> return;
> >>>>>>      *val = sg->vals[index];   // same this line.
> >>>>>>                                // In this path, since it's
> >>>>>> under
> >>>>>> the condition
> >>>>>>                                // "index >= 4", the compiler
> >>>>>> knows
> >>>>>> the value
> >>>>>>                                // of "index" is larger then 4,
> >>>>>> therefore the
> >>>>>>                                // out-of-bound warning.
> >>>>>>      warn ();
> >>>>>>    }
> >>>>>>  else
> >>>>>>    {
> >>>>>>      *ptr = 0;
> >>>>>>      *val = sg->vals[index];
> >>>>>>    }
> >>>>>>  *ptr = *val;
> >>>>>>  return;
> >>>>>> }
> >>>>>>
> >>>>>> We can see, after the thread jump optimization, the # of
> >>>>>> branches
> >>>>>> inside
> >>>>>> the routine "sparx5_set" is reduced from 2 to 1, however,  due
> >>>>>> to
> >>>>>> the
> >>>>>> code duplication (which is needed for the correctness of the
> >>>>>> code),
> >>>>>> we
> >>>>>> got a false positive out-of-bound warning.
> >>>>>>
> >>>>>> In order to eliminate such false positive out-of-bound warning,
> >>>>>>
> >>>>>> A. Add one more flag for GIMPLE: is_splitted.
> >>>>>> B. During the thread jump optimization, when the basic blocks
> >>>>>> are
> >>>>>>   duplicated, mark all the STMTs inside the original and
> >>>>>> duplicated
> >>>>>>   basic blocks as "is_splitted";
> >>>>>> C. Inside the array bound checker, add the following new
> >>>>>> heuristic:
> >>>>>>
> >>>>>> If
> >>>>>>   1. the stmt is duplicated and splitted into two conditional
> >>>>>> paths;
> >>>>>> +  2. the warning level < 2;
> >>>>>> +  3. the current block is not dominating the exit block
> >>>>>> Then not report the warning.
> >>>>>>
> >>>>>> The false positive warnings are moved from -Warray-bounds=1 to
> >>>>>> -Warray-bounds=2 now.
> >>>>>>
> >>>>>> Bootstrapped and regression tested on both x86 and aarch64.
> >>>>>> adjusted
> >>>>>> -Warray-bounds-61.c due to the false positive warnings.
> >>>>>>
> >>>>>> Let me know if you have any comments and suggestions.
> >>>>>
> >>>>> At the last Cauldron I talked with David Malcolm about these kind
> >>>>> of
> >>>>> issues and thought of instead of suppressing diagnostics to
> >>>>> record
> >>>>> how a block was duplicated.  For jump threading my idea was to
> >>>>> record
> >>>>> the condition that was proved true when entering the path and do
> >>>>> this
> >>>>> by recording the corresponding locations
> >>>
> >>> Is only recording the location for the TRUE path  enough?
> >>> We might need to record the corresponding locations for both TRUE and
> >>> FALSE paths since the VRP might be more accurate on both paths.
> >>> Is only recording the location is enough?
> >>> Do we need to record the pointer to the original condition stmt?
> >>
> >> Just to be clear: I don't plan to work on this myself (I have far too
> >> much already to work on...); I'm assuming Richard Biener is
> >> hoping/planning to implement this himself.
> >
> > While I think some of this might be an improvement to those vast
> > number of "false positive" diagnostics we have from (too) late diagnostic
> > passes I do not have the cycles to work on this.
>
> I can study a little bit more on how to improve the diagnostics for PR 109071 first.
>
> FYI, I just updated PR109071’s description as: Need more context for -Warray-bounds warnings due to code duplication from jump threading.
>
> As we already agreed, this is NOT a false positive. It caught a real bug in linux kernel that need to be patched in the kernel source. (See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109071#c11)
>
> In order to add more context to the diagnostics to help the end user locate the bug accurately in their source code, compiler needs:
>
> 1. During jump threading phase, record the following information for each duplicated STMT:
>         A. A pointer to the COND stmt;
>         B. True/false for such COND
> 2. During array out-of-bound checking phase, whenever locate an out-of-bound case,
>         A. Use a rich_location for the diagnostic;
>         B. Create an instance of diagnositic_path, add events to this diagnostic_path based on the COND stmt, True/False info recorded in the STMT;
>         C. Call richloc.set_path() to associate the path with the rich_location;
>         D. Then issue warning with this rich_location instead of the current regular location.
>
> Any comment and suggestion to the above?

I was originally thinking of using location_adhoc_data to store 1.A
and 1.B as a common bit to associate to each
copied stmt.  IIRC location_adhoc_data->data is the stmts associated
lexical block so we'd need to stuff another
'data' field into this struct, like a "copy history" where we could
then even record copied-of-copy-of-X.
locataion_adhoc_data seems imperfectly packed right now, with a 32bit
hole before 'data' and 32bit unused
at its end, so we might get away without memory use by re-ordering
discriminator before 'data' ...

Richard.

> Thanks.
>
> Qing
>
>
> >
> > Richard.
> >
> >> But feel free to ask me any questions about the diagnostic_path
> >> machinery within the diagnostics subsystem.
> >>
> >> [...snip...]
> >>
> >> Regards
> >> Dave
>
>
Qing Zhao May 23, 2024, 2:03 p.m. UTC | #25
> On May 23, 2024, at 07:46, Richard Biener <richard.guenther@gmail.com> wrote:
> 
> On Wed, May 22, 2024 at 8:53 PM Qing Zhao <qing.zhao@oracle.com> wrote:
>> 
>> 
>> 
>>> On May 22, 2024, at 03:38, Richard Biener <richard.guenther@gmail.com> wrote:
>>> 
>>> On Tue, May 21, 2024 at 11:36 PM David Malcolm <dmalcolm@redhat.com> wrote:
>>>> 
>>>> On Tue, 2024-05-21 at 15:13 +0000, Qing Zhao wrote:
>>>>> Thanks for the comments and suggestions.
>>>>> 
>>>>>> On May 15, 2024, at 10:00, David Malcolm <dmalcolm@redhat.com>
>>>>>> wrote:
>>>>>> 
>>>>>> On Tue, 2024-05-14 at 15:08 +0200, Richard Biener wrote:
>>>>>>> On Mon, 13 May 2024, Qing Zhao wrote:
>>>>>>> 
>>>>>>>> -Warray-bounds is an important option to enable linux kernal to
>>>>>>>> keep
>>>>>>>> the array out-of-bound errors out of the source tree.
>>>>>>>> 
>>>>>>>> However, due to the false positive warnings reported in
>>>>>>>> PR109071
>>>>>>>> (-Warray-bounds false positive warnings due to code duplication
>>>>>>>> from
>>>>>>>> jump threading), -Warray-bounds=1 cannot be added on by
>>>>>>>> default.
>>>>>>>> 
>>>>>>>> Although it's impossible to elinimate all the false positive
>>>>>>>> warnings
>>>>>>>> from -Warray-bounds=1 (See PR104355 Misleading -Warray-bounds
>>>>>>>> documentation says "always out of bounds"), we should minimize
>>>>>>>> the
>>>>>>>> false positive warnings in -Warray-bounds=1.
>>>>>>>> 
>>>>>>>> The root reason for the false positive warnings reported in
>>>>>>>> PR109071 is:
>>>>>>>> 
>>>>>>>> When the thread jump optimization tries to reduce the # of
>>>>>>>> branches
>>>>>>>> inside the routine, sometimes it needs to duplicate the code
>>>>>>>> and
>>>>>>>> split into two conditional pathes. for example:
>>>>>>>> 
>>>>>>>> The original code:
>>>>>>>> 
>>>>>>>> void sparx5_set (int * ptr, struct nums * sg, int index)
>>>>>>>> {
>>>>>>>> if (index >= 4)
>>>>>>>>   warn ();
>>>>>>>> *ptr = 0;
>>>>>>>> *val = sg->vals[index];
>>>>>>>> if (index >= 4)
>>>>>>>>   warn ();
>>>>>>>> *ptr = *val;
>>>>>>>> 
>>>>>>>> return;
>>>>>>>> }
>>>>>>>> 
>>>>>>>> With the thread jump, the above becomes:
>>>>>>>> 
>>>>>>>> void sparx5_set (int * ptr, struct nums * sg, int index)
>>>>>>>> {
>>>>>>>> if (index >= 4)
>>>>>>>>   {
>>>>>>>>     warn ();
>>>>>>>>     *ptr = 0;         // Code duplications since "warn" does
>>>>>>>> return;
>>>>>>>>     *val = sg->vals[index];   // same this line.
>>>>>>>>                               // In this path, since it's
>>>>>>>> under
>>>>>>>> the condition
>>>>>>>>                               // "index >= 4", the compiler
>>>>>>>> knows
>>>>>>>> the value
>>>>>>>>                               // of "index" is larger then 4,
>>>>>>>> therefore the
>>>>>>>>                               // out-of-bound warning.
>>>>>>>>     warn ();
>>>>>>>>   }
>>>>>>>> else
>>>>>>>>   {
>>>>>>>>     *ptr = 0;
>>>>>>>>     *val = sg->vals[index];
>>>>>>>>   }
>>>>>>>> *ptr = *val;
>>>>>>>> return;
>>>>>>>> }
>>>>>>>> 
>>>>>>>> We can see, after the thread jump optimization, the # of
>>>>>>>> branches
>>>>>>>> inside
>>>>>>>> the routine "sparx5_set" is reduced from 2 to 1, however,  due
>>>>>>>> to
>>>>>>>> the
>>>>>>>> code duplication (which is needed for the correctness of the
>>>>>>>> code),
>>>>>>>> we
>>>>>>>> got a false positive out-of-bound warning.
>>>>>>>> 
>>>>>>>> In order to eliminate such false positive out-of-bound warning,
>>>>>>>> 
>>>>>>>> A. Add one more flag for GIMPLE: is_splitted.
>>>>>>>> B. During the thread jump optimization, when the basic blocks
>>>>>>>> are
>>>>>>>>  duplicated, mark all the STMTs inside the original and
>>>>>>>> duplicated
>>>>>>>>  basic blocks as "is_splitted";
>>>>>>>> C. Inside the array bound checker, add the following new
>>>>>>>> heuristic:
>>>>>>>> 
>>>>>>>> If
>>>>>>>>  1. the stmt is duplicated and splitted into two conditional
>>>>>>>> paths;
>>>>>>>> +  2. the warning level < 2;
>>>>>>>> +  3. the current block is not dominating the exit block
>>>>>>>> Then not report the warning.
>>>>>>>> 
>>>>>>>> The false positive warnings are moved from -Warray-bounds=1 to
>>>>>>>> -Warray-bounds=2 now.
>>>>>>>> 
>>>>>>>> Bootstrapped and regression tested on both x86 and aarch64.
>>>>>>>> adjusted
>>>>>>>> -Warray-bounds-61.c due to the false positive warnings.
>>>>>>>> 
>>>>>>>> Let me know if you have any comments and suggestions.
>>>>>>> 
>>>>>>> At the last Cauldron I talked with David Malcolm about these kind
>>>>>>> of
>>>>>>> issues and thought of instead of suppressing diagnostics to
>>>>>>> record
>>>>>>> how a block was duplicated.  For jump threading my idea was to
>>>>>>> record
>>>>>>> the condition that was proved true when entering the path and do
>>>>>>> this
>>>>>>> by recording the corresponding locations
>>>>> 
>>>>> Is only recording the location for the TRUE path  enough?
>>>>> We might need to record the corresponding locations for both TRUE and
>>>>> FALSE paths since the VRP might be more accurate on both paths.
>>>>> Is only recording the location is enough?
>>>>> Do we need to record the pointer to the original condition stmt?
>>>> 
>>>> Just to be clear: I don't plan to work on this myself (I have far too
>>>> much already to work on...); I'm assuming Richard Biener is
>>>> hoping/planning to implement this himself.
>>> 
>>> While I think some of this might be an improvement to those vast
>>> number of "false positive" diagnostics we have from (too) late diagnostic
>>> passes I do not have the cycles to work on this.
>> 
>> I can study a little bit more on how to improve the diagnostics for PR 109071 first.
>> 
>> FYI, I just updated PR109071’s description as: Need more context for -Warray-bounds warnings due to code duplication from jump threading.
>> 
>> As we already agreed, this is NOT a false positive. It caught a real bug in linux kernel that need to be patched in the kernel source. (See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109071#c11)
>> 
>> In order to add more context to the diagnostics to help the end user locate the bug accurately in their source code, compiler needs:
>> 
>> 1. During jump threading phase, record the following information for each duplicated STMT:
>>        A. A pointer to the COND stmt;
>>        B. True/false for such COND
>> 2. During array out-of-bound checking phase, whenever locate an out-of-bound case,
>>        A. Use a rich_location for the diagnostic;
>>        B. Create an instance of diagnositic_path, add events to this diagnostic_path based on the COND stmt, True/False info recorded in the STMT;
>>        C. Call richloc.set_path() to associate the path with the rich_location;
>>        D. Then issue warning with this rich_location instead of the current regular location.
>> 
>> Any comment and suggestion to the above?
> 
> I was originally thinking of using location_adhoc_data to store 1.A
> and 1.B as a common bit to associate to each
> copied stmt.  IIRC location_adhoc_data->data is the stmts associated
> lexical block so we'd need to stuff another
> 'data' field into this struct, like a "copy history" where we could
> then even record copied-of-copy-of-X.
> locataion_adhoc_data seems imperfectly packed right now, with a 32bit
> hole before 'data' and 32bit unused
> at its end, so we might get away without memory use by re-ordering
> discriminator before 'data' ...

Is “location_adhoc_data” an available data structure in current GCC? I just searched GCC source tree, cannot find it. 

Qing
> 
> Richard.
> 
>> Thanks.
>> 
>> Qing
>> 
>> 
>>> 
>>> Richard.
>>> 
>>>> But feel free to ask me any questions about the diagnostic_path
>>>> machinery within the diagnostics subsystem.
>>>> 
>>>> [...snip...]
>>>> 
>>>> Regards
>>>> Dave
>> 
>>
David Malcolm May 23, 2024, 2:13 p.m. UTC | #26
On Thu, 2024-05-23 at 14:03 +0000, Qing Zhao wrote:

[...snip...]

> Is “location_adhoc_data” an available data structure in current GCC?
> I just searched GCC source tree, cannot find it. 

It's in libcpp/include/line-table.h; see the big comment at the top of
that file.

Dave
Qing Zhao May 23, 2024, 2:23 p.m. UTC | #27
> On May 23, 2024, at 10:13, David Malcolm <dmalcolm@redhat.com> wrote:
> 
> On Thu, 2024-05-23 at 14:03 +0000, Qing Zhao wrote:
> 
> [...snip...]
> 
>> Is “location_adhoc_data” an available data structure in current GCC?
>> I just searched GCC source tree, cannot find it.
> 
> It's in libcpp/include/line-table.h; see the big comment at the top of
> that file.

Thanks a lot.

I found it in libcpp/include/line-map.h. reading it right now.

Qing
> 
> Dave
>
diff mbox series

Patch

diff --git a/gcc/gimple-array-bounds.cc b/gcc/gimple-array-bounds.cc
index 008071cd5464..4a2975623bc1 100644
--- a/gcc/gimple-array-bounds.cc
+++ b/gcc/gimple-array-bounds.cc
@@ -264,7 +264,9 @@  check_out_of_bounds_and_warn (location_t location, tree ref,
 			      tree up_bound, tree up_bound_p1,
 			      const value_range *vr,
 			      bool ignore_off_by_one, bool for_array_bound,
-			      bool *out_of_bound)
+			      bool *out_of_bound,
+			      bool is_splitted,
+			      bool is_dominate_exit)
 {
   tree min, max;
   tree low_bound = array_ref_low_bound (ref);
@@ -273,6 +275,13 @@  check_out_of_bounds_and_warn (location_t location, tree ref,
   bool warned = false;
   *out_of_bound = false;
 
+  /* If the stmt is duplicated and splitted, the warning level is not 2,
+     and the current block is not dominating the exit block, not report
+     the warning.  */
+  if (is_splitted && warn_array_bounds < 2
+      && !is_dominate_exit)
+    return false;
+
   /* Empty array.  */
   if (up_bound && tree_int_cst_equal (low_bound, up_bound_p1))
     {
@@ -386,12 +395,17 @@  array_bounds_checker::check_array_ref (location_t location, tree ref,
 	}
     }
 
+  bool is_dominate_exit = dominated_by_p (CDI_DOMINATORS,
+					  EXIT_BLOCK_PTR_FOR_FN (fun),
+					  gimple_bb (stmt));
+
   warned = check_out_of_bounds_and_warn (location, ref,
 					 low_sub_org, low_sub, up_sub,
 					 up_bound, up_bound_p1, &vr,
 					 ignore_off_by_one, warn_array_bounds,
-					 &out_of_bound);
-
+					 &out_of_bound,
+					 gimple_is_splitted_p (stmt),
+					 is_dominate_exit);
 
   if (!warned && sam == special_array_member::int_0)
     warned = warning_at (location, OPT_Wzero_length_bounds,
@@ -476,7 +490,7 @@  array_bounds_checker::check_array_ref (location_t location, tree ref,
 
 bool
 array_bounds_checker::check_mem_ref (location_t location, tree ref,
-				     bool ignore_off_by_one)
+				     gimple *stmt, bool ignore_off_by_one)
 {
   if (warning_suppressed_p (ref, OPT_Warray_bounds_))
     return false;
@@ -574,6 +588,16 @@  array_bounds_checker::check_mem_ref (location_t location, tree ref,
 	}
     }
 
+  /* If the stmt is duplicated and splitted, the warning level is not 2,
+     and the current block is not dominating the exit block, not report
+     the warning.  */
+  bool is_dominate_exit = dominated_by_p (CDI_DOMINATORS,
+					  EXIT_BLOCK_PTR_FOR_FN (fun),
+					  gimple_bb (stmt));
+  if (gimple_is_splitted_p (stmt) && warn_array_bounds < 2
+      && !is_dominate_exit)
+    return false;
+
   bool warned = false;
   if (lboob)
     {
@@ -654,7 +678,7 @@  array_bounds_checker::check_addr_expr (location_t location, tree t,
 	  ignore_off_by_one = false;
 	}
       else if (TREE_CODE (t) == MEM_REF)
-	warned = check_mem_ref (location, t, ignore_off_by_one);
+	warned = check_mem_ref (location, t, stmt, ignore_off_by_one);
 
       if (warned)
 	suppress_warning (t, OPT_Warray_bounds_);
@@ -690,6 +714,16 @@  array_bounds_checker::check_addr_expr (location_t location, tree t,
   if (!mem_ref_offset (t).is_constant (&idx))
     return;
 
+  /* If the stmt is duplicated and splitted, the warning level is not 2,
+     and the current block is not dominating the exit block, not report
+     the warning.  */
+  bool is_dominate_exit = dominated_by_p (CDI_DOMINATORS,
+					  EXIT_BLOCK_PTR_FOR_FN (fun),
+					  gimple_bb (stmt));
+  if (gimple_is_splitted_p (stmt) && warn_array_bounds < 2
+      && !is_dominate_exit)
+    return;
+
   bool warned = false;
   idx = wi::sdiv_trunc (idx, wi::to_offset (el_sz));
   if (idx < 0)
@@ -809,7 +843,7 @@  array_bounds_checker::check_array_bounds (tree *tp, int *walk_subtree,
     warned = checker->check_array_ref (location, t, wi->stmt,
 				       false/*ignore_off_by_one*/);
   else if (TREE_CODE (t) == MEM_REF)
-    warned = checker->check_mem_ref (location, t,
+    warned = checker->check_mem_ref (location, t, wi->stmt,
 				     false /*ignore_off_by_one*/);
   else if (TREE_CODE (t) == ADDR_EXPR)
     {
diff --git a/gcc/gimple-array-bounds.h b/gcc/gimple-array-bounds.h
index 3e077d0178ff..7c98f02204c9 100644
--- a/gcc/gimple-array-bounds.h
+++ b/gcc/gimple-array-bounds.h
@@ -33,7 +33,7 @@  public:
 private:
   static tree check_array_bounds (tree *tp, int *walk_subtree, void *data);
   bool check_array_ref (location_t, tree, gimple *, bool ignore_off_by_one);
-  bool check_mem_ref (location_t, tree, bool ignore_off_by_one);
+  bool check_mem_ref (location_t, tree, gimple *, bool ignore_off_by_one);
   void check_addr_expr (location_t, tree, gimple *);
   void get_value_range (irange &r, const_tree op, gimple *);
 
diff --git a/gcc/gimple.h b/gcc/gimple.h
index bd315ffc2dd4..08f52d107084 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -254,8 +254,8 @@  struct GTY((desc ("gimple_statement_structure (&%h)"), tag ("GSS_BASE"),
   /* Nonzero if this statement contains volatile operands.  */
   unsigned has_volatile_ops 	: 1;
 
-  /* Padding to get subcode to 16 bit alignment.  */
-  unsigned pad			: 1;
+  /* Nonzero if this statement is duplicated and splitted to two pathes.  */
+  unsigned is_splitted		: 1;
 
   /* The SUBCODE field can be used for tuple-specific flags for tuples
      that do not require subcodes.  Note that SUBCODE should be at
@@ -2327,6 +2327,23 @@  gimple_set_has_volatile_ops (gimple *stmt, bool volatilep)
     stmt->has_volatile_ops = (unsigned) volatilep;
 }
 
+/* Return true if statement G's is_splitted field has
+   been set.  */
+
+inline bool
+gimple_is_splitted_p (const gimple *g)
+{
+  return (bool) g->is_splitted;
+}
+
+/* Set the IS_SPLITTED flag to IS_SPLITTEDP.  */
+
+inline void
+gimple_set_is_splitted (gimple *s, bool is_splittedp)
+{
+    s->is_splitted = (unsigned) is_splittedp;
+}
+
 /* Return true if STMT is in a transaction.  */
 
 inline bool
diff --git a/gcc/testsuite/gcc.dg/Warray-bounds-61.c b/gcc/testsuite/gcc.dg/Warray-bounds-61.c
index 5b66cdc0aab1..cb3c64a813d7 100644
--- a/gcc/testsuite/gcc.dg/Warray-bounds-61.c
+++ b/gcc/testsuite/gcc.dg/Warray-bounds-61.c
@@ -23,7 +23,7 @@  void test_ua3_ua0_a0 (int i)
 
   if (i < __LINE__)
     i = 5;
-  ua3_a0.a0[i] = 0;           // { dg-warning "\\\[-Warray-bounds" }
+  ua3_a0.a0[i] = 0;           // { dg-bogus "\\\[-Warray-bounds" }
 
   if (i > -1)
     i = -1;
@@ -44,7 +44,7 @@  void test_ua3_ua0_a1 (int i)
 
   if (i > -1)
     i = -1;
-  ua3_a0.a1[i] = 0;           // { dg-warning "\\\[-Warray-bounds" }
+  ua3_a0.a1[i] = 0;           // { dg-bogus "\\\[-Warray-bounds" }
 
   if (i < 7)
     i = 7;
@@ -60,7 +60,7 @@  void test_ua3_ua0_a2 (int i)
 
   if (i < __LINE__)
     i = __LINE__;
-  ua3_a0.a2[i] = 0;           // { dg-warning "\\\[-Warray-bounds" }
+  ua3_a0.a2[i] = 0;           // { dg-bogus "\\\[-Warray-bounds" }
 
   if (i > -1)
     i = -1;
diff --git a/gcc/testsuite/gcc.dg/pr109071-1.c b/gcc/testsuite/gcc.dg/pr109071-1.c
new file mode 100644
index 000000000000..a405c80bd549
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr109071-1.c
@@ -0,0 +1,22 @@ 
+/* PR tree-optimization/109071 -Warray-bounds false positive warnings
+   due to code duplication from jump threading 
+   { dg-do compile }
+   { dg-options "-O2 -Warray-bounds=2" }
+ */
+
+extern void warn(void);
+static inline void assign(int val, int *regs, int index)
+{
+  if (index >= 4)
+    warn();
+  *regs = val;
+}
+struct nums {int vals[4];};
+
+void sparx5_set (int *ptr, struct nums *sg, int index)
+{
+  int *val = &sg->vals[index]; /* { dg-warning "is above array bounds" } */
+
+  assign(0,    ptr, index);
+  assign(*val, ptr, index);
+}
diff --git a/gcc/testsuite/gcc.dg/pr109071.c b/gcc/testsuite/gcc.dg/pr109071.c
new file mode 100644
index 000000000000..782dfad84ea2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr109071.c
@@ -0,0 +1,22 @@ 
+/* PR tree-optimization/109071 -Warray-bounds false positive warnings
+   due to code duplication from jump threading 
+   { dg-do compile }
+   { dg-options "-O2 -Wall" }
+ */
+
+extern void warn(void);
+static inline void assign(int val, int *regs, int index)
+{
+  if (index >= 4)
+    warn();
+  *regs = val;
+}
+struct nums {int vals[4];};
+
+void sparx5_set (int *ptr, struct nums *sg, int index)
+{
+  int *val = &sg->vals[index]; /* { dg-bogus "is above array bounds" } */
+
+  assign(0,    ptr, index);
+  assign(*val, ptr, index);
+}
diff --git a/gcc/tree-ssa-threadupdate.cc b/gcc/tree-ssa-threadupdate.cc
index fa61ba9512b7..9f338dd4d54d 100644
--- a/gcc/tree-ssa-threadupdate.cc
+++ b/gcc/tree-ssa-threadupdate.cc
@@ -2371,6 +2371,17 @@  back_jt_path_registry::adjust_paths_after_duplication (unsigned curr_path_num)
     }
 }
 
+/* Set all the stmts in the basic block BB as IS_SPLITTED.  */
+
+static void
+set_stmts_in_bb_is_splitted (basic_block bb)
+{
+  gimple_stmt_iterator gsi;
+  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+    gimple_set_is_splitted (gsi_stmt (gsi), true);
+  return;
+}
+
 /* Duplicates a jump-thread path of N_REGION basic blocks.
    The ENTRY edge is redirected to the duplicate of the region.
 
@@ -2418,6 +2429,10 @@  back_jt_path_registry::duplicate_thread_path (edge entry,
   basic_block *region_copy = XNEWVEC (basic_block, n_region);
   copy_bbs (region, n_region, region_copy, &exit, 1, &exit_copy, loop,
 	    split_edge_bb_loc (entry), false);
+  /* Mark all the stmts in both original and copied basic blocks
+     as IS_SPLITTED.  */
+  set_stmts_in_bb_is_splitted (*region);
+  set_stmts_in_bb_is_splitted (*region_copy);
 
   /* Fix up: copy_bbs redirects all edges pointing to copied blocks.  The
      following code ensures that all the edges exiting the jump-thread path are