diff mbox series

enhance -Warray-bounds to handle strings and excessive indices

Message ID a26a14b4-1f1e-2df1-8a17-5a8dbbb87db0@gmail.com
State New
Headers show
Series enhance -Warray-bounds to handle strings and excessive indices | expand

Commit Message

Martin Sebor Oct. 18, 2017, 3:34 a.m. UTC
While testing my latest -Wrestrict changes I noticed a number of
opportunities to improve the -Warray-bounds warning.  Attached
is a patch that implements a solution for the following subset
of these:

PR tree-optimization/82596 - missing -Warray-bounds on an out-of
   bounds index into string literal
PR tree-optimization/82588 - missing -Warray-bounds on an excessively
   large index
PR tree-optimization/82583 - missing -Warray-bounds on out-of-bounds
   inner indices

The patch also adds more detail to the -Warray-bounds diagnostics
to make it easier to see the cause of the problem.

Richard, since the changes touch tree-vrp.c, I look in particular
for your comments.

Thanks
Martin

Comments

Richard Biener Oct. 18, 2017, 10:48 a.m. UTC | #1
On Wed, Oct 18, 2017 at 5:34 AM, Martin Sebor <msebor@gmail.com> wrote:
> While testing my latest -Wrestrict changes I noticed a number of
> opportunities to improve the -Warray-bounds warning.  Attached
> is a patch that implements a solution for the following subset
> of these:
>
> PR tree-optimization/82596 - missing -Warray-bounds on an out-of
>   bounds index into string literal
> PR tree-optimization/82588 - missing -Warray-bounds on an excessively
>   large index
> PR tree-optimization/82583 - missing -Warray-bounds on out-of-bounds
>   inner indices
>
> The patch also adds more detail to the -Warray-bounds diagnostics
> to make it easier to see the cause of the problem.
>
> Richard, since the changes touch tree-vrp.c, I look in particular
> for your comments.

+      /* Accesses to trailing arrays via pointers may access storage
+        beyond the types array bounds.  For such arrays, or for flexible
+        array members as well as for other arrays of an unknown size,
+        replace the upper bound with a more permissive one that assumes
+        the size of the largest object is SSIZE_MAX.  */

I believe handling those cases are somewhat academic, but ...

+      tree eltype = TREE_TYPE (ref);
+      tree eltsize = TYPE_SIZE_UNIT (eltype);

needs to use array_ref_element_size.  Note that this size can be
non-constant in which case you now ICE so you have to check
this is an INTEGER_CST.

+      tree maxbound = TYPE_MAX_VALUE (ssizetype);

please don't use ssizetype - sizetype can be of different precision
than pointers (and thus objects).  size_type_node would come
close (maps to size_t), eventually ptrdiff_type_node is also
defined by all frontends.

+      up_bound_p1 = fold_build2 (TRUNC_DIV_EXPR, ssizetype, maxbound, eltsize);
+

int_const_binop if you insist on using trees...

+      tree arg = TREE_OPERAND (ref, 0);
+      tree_code code = TREE_CODE (arg);
+      if (code == COMPONENT_REF)
+       {
+         HOST_WIDE_INT off;
+         if (tree base = get_addr_base_and_unit_offset (ref, &off))
+           up_bound_p1 = fold_build2 (MINUS_EXPR, ssizetype, up_bound_p1,
+                                      TYPE_SIZE_UNIT (TREE_TYPE (base)));
+         else
+           return;

so this gives up on a.b[i].c.d[k] (ok, array_at_struct_end_p will be false).
simply not subtracting anyhing instead of returning would be conservatively
correct, no?  Likewise subtracting the offset of the array for all "previous"
variably indexed components with assuming the lowest value for the index.
But as above I think compensating for the offset of the array within the object
is academic ... ;)

+      else if (code == STRING_CST)
+       up_bound_p1 = build_int_cst (ssizetype, TREE_STRING_LENGTH (arg));

that one is more interesting -- why's the TYPE_DOMAIN of the STRING_CST lacking
a max value?  Iff we use build_string_literal it should have the proper type.

Thanks,
Richard.

>
> Thanks
> Martin
Martin Sebor Oct. 18, 2017, 11:19 p.m. UTC | #2
On 10/18/2017 04:48 AM, Richard Biener wrote:
> On Wed, Oct 18, 2017 at 5:34 AM, Martin Sebor <msebor@gmail.com> wrote:
>> While testing my latest -Wrestrict changes I noticed a number of
>> opportunities to improve the -Warray-bounds warning.  Attached
>> is a patch that implements a solution for the following subset
>> of these:
>>
>> PR tree-optimization/82596 - missing -Warray-bounds on an out-of
>>   bounds index into string literal
>> PR tree-optimization/82588 - missing -Warray-bounds on an excessively
>>   large index
>> PR tree-optimization/82583 - missing -Warray-bounds on out-of-bounds
>>   inner indices
>>
>> The patch also adds more detail to the -Warray-bounds diagnostics
>> to make it easier to see the cause of the problem.
>>
>> Richard, since the changes touch tree-vrp.c, I look in particular
>> for your comments.
>
> +      /* Accesses to trailing arrays via pointers may access storage
> +        beyond the types array bounds.  For such arrays, or for flexible
> +        array members as well as for other arrays of an unknown size,
> +        replace the upper bound with a more permissive one that assumes
> +        the size of the largest object is SSIZE_MAX.  */
>
> I believe handling those cases are somewhat academic, but ...

Thanks for the quick review!

I agree the SSIZE_MAX tests handle corner cases and there are
arguably more important gaps here to plug (e.g., VLAs).  Then
again, most bugs tend to lurk in corner cases of one kind or
another and these seemed like a good way for me to come up to
speed on the implementation before tackling those.  If you
have suggestions for which to dive into next I'm happy to take
them.

> +      tree eltype = TREE_TYPE (ref);
> +      tree eltsize = TYPE_SIZE_UNIT (eltype);
>
> needs to use array_ref_element_size.  Note that this size can be
> non-constant in which case you now ICE so you have to check
> this is an INTEGER_CST.

Thanks.  I did reproduce a few ICEs due to VLAs.  I've fixed
the problems and added tests for them.  One-dimensional VLAs
are now handled the same way arrays of unknown bound are.
Handling them more intelligently (i.e., taking into account
the ranges their bounds are in) and especially handling
multidimensional VLAs will take some effort.

>
> +      tree maxbound = TYPE_MAX_VALUE (ssizetype);
>
> please don't use ssizetype - sizetype can be of different precision
> than pointers (and thus objects).  size_type_node would come
> close (maps to size_t), eventually ptrdiff_type_node is also
> defined by all frontends.

Okay, I've changed it to sizetype (it would be nice if there
were a cleaner way of doing it rather than by bit-twiddling.)

ptrdiff_type would have been my first choice but it's only defined
by the C/C++ front ends and not available in the middle-end, so
the other warnings that consider object sizes deal with ssizetype
(e.g., -Walloc-size-larger-than, -Wstringop- overflow, and
-Wformat-overflow).

That said, I'm not sure I understand under what conditions
ssizetype is not the signed equivalent of sizetype.  Can you
clarify?

As an aside, at some point I would like to get away from a type
based limit in all these warnings and instead use one that can
be controlled by an option so that a user can impose a lower limit
on the maximum size of an object and have all size-related warnings
(and perhaps even optimizations) enforce it and benefit from it.

> +      up_bound_p1 = fold_build2 (TRUNC_DIV_EXPR, ssizetype, maxbound, eltsize);
> +
>
> int_const_binop if you insist on using trees...

Sure.  (I think offset_int would be more convenient but the rest
of the function uses trees so I just stuck to those to avoid
converting things back and forth or disrupting more of the code
than I had to.)

>
> +      tree arg = TREE_OPERAND (ref, 0);
> +      tree_code code = TREE_CODE (arg);
> +      if (code == COMPONENT_REF)
> +       {
> +         HOST_WIDE_INT off;
> +         if (tree base = get_addr_base_and_unit_offset (ref, &off))
> +           up_bound_p1 = fold_build2 (MINUS_EXPR, ssizetype, up_bound_p1,
> +                                      TYPE_SIZE_UNIT (TREE_TYPE (base)));
> +         else
> +           return;
>
> so this gives up on a.b[i].c.d[k] (ok, array_at_struct_end_p will be false).
> simply not subtracting anyhing instead of returning would be conservatively
> correct, no?  Likewise subtracting the offset of the array for all "previous"
> variably indexed components with assuming the lowest value for the index.
> But as above I think compensating for the offset of the array within the object
> is academic ... ;)

I was going to say yes (it gives up) but on second thought I don't
think it does.  Only the major index can be unbounded and the code
does consider the size of the sub-array when checking the major
index.  So, IIUC, I think this works correctly as is (*).  What
doesn't work is VLAs but those are a separate problem.  Let me
know if I misunderstood your question.

> +      else if (code == STRING_CST)
> +       up_bound_p1 = build_int_cst (ssizetype, TREE_STRING_LENGTH (arg));
>
> that one is more interesting -- why's the TYPE_DOMAIN of the STRING_CST lacking
> a max value?  Iff we use build_string_literal it should have the proper type.

Good question!  STRING_CST does have a domain.  The problem is
that array_at_struct_end_p() returns true for STRING_CST.  I've
added the handling to the function and removed the block above
from the latest patch.

Martin

[*] This is diagnosed:

   struct C { char d[4]; };
   struct B { struct C c; };
   struct A { struct B b[7]; };

   int f (struct A a, unsigned i, unsigned k)
   {
     if (i < 7) i = 7;
     if (k < 4) k = 4;

     return a.b[i].c.d[k];
   }

   warning: array subscript 4 is above array bounds of ‘char[4]’ 
[-Warray-bounds]
      return a.b[i].c.d[k];
             ~~~~~~~~~~^~~
   warning: array subscript 7 is above array bounds of ‘struct B[7]’ 
[-Warray-bounds]
      return a.b[i].c.d[k];
             ~~~^~~
PR tree-optimization/82596 - missing -Warray-bounds on an out-of-bounds index into string literal
PR tree-optimization/82588 - missing -Warray-bounds on a excessively large index
PR tree-optimization/82583 - missing -Warray-bounds on out-of-bounds inner indic

gcc/ChangeLog:
	PR tree-optimization/82596
	PR tree-optimization/82588
	PR tree-optimization/82583
	* tree.c (array_at_struct_end_p): Handle STRING_CST.
	* tree-vrp.c (check_array_ref): Handle flexible array members,
	string literals, and inner indices.
	(search_for_addr_array): Add detail to diagnostics.

gcc/testsuite/ChangeLog:

	PR tree-optimization/82596
	PR tree-optimization/82588
	PR tree-optimization/82583	
	* c-c++-common/Warray-bounds.c: New test.
	* gcc.dg/Warray-bounds-11.c: Adjust.
	* gcc.dg/Warray-bounds-22.c: New test.

diff --git a/gcc/testsuite/c-c++-common/Warray-bounds.c b/gcc/testsuite/c-c++-common/Warray-bounds.c
new file mode 100644
index 0000000..c100ca8
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/Warray-bounds.c
@@ -0,0 +1,239 @@
+/* PR tree-optimization/82588 - missing -Warray-bounds on an excessively
+   large index
+   { dg-do compile }
+   { dg-options "-O2 -Warray-bounds -ftrack-macro-expansion=0" }  */
+
+#define SIZE_MAX  __SIZE_MAX__
+#define SSIZE_MAX __PTRDIFF_MAX__
+#define SSIZE_MIN (-SSIZE_MAX - 1)
+
+typedef __PTRDIFF_TYPE__ ssize_t;
+typedef __SIZE_TYPE__    size_t;
+
+extern ssize_t signed_value (void)
+{
+  extern volatile ssize_t signed_value_source;
+  return signed_value_source;
+}
+
+extern size_t unsigned_value (void)
+{
+  extern volatile size_t unsigned_value_source;
+  return unsigned_value_source;
+}
+
+ssize_t signed_range (ssize_t min, ssize_t max)
+{
+  ssize_t val = signed_value ();
+  return val < min || max < val ? min : val;
+}
+
+struct AX { int n; char ax[]; };
+
+struct A1 { int i; char a1[1]; };
+struct B { int i; struct A1 a1x[]; };
+
+void sink (int, ...);
+
+#define R(min, max) signed_range (min, max)
+#define T(expr)     sink (0, expr)
+
+struct __attribute__ ((packed)) S16 { unsigned i: 16; };
+
+void farr_char (void)
+{
+  extern char ac[];
+
+  T (ac[SSIZE_MIN]);                      /* { dg-warning "array subscript -\[0-9\]+ is below array bounds of .char *\\\[]." } */
+  T (ac[-1]);                             /* { dg-warning "array subscript -1 is below array bounds" } */
+  T (ac[0]);
+
+  T (ac[SSIZE_MAX - 1]);
+  T (ac[SSIZE_MAX]);                      /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (ac[SSIZE_MAX + (size_t)1]);          /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (ac[SIZE_MAX]);                       /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+}
+
+void farr_s16 (void)
+{
+  extern struct S16 ax[];
+
+  T (ax[SSIZE_MIN]);                      /* { dg-warning "array subscript -\[0-9\]+ is below array bounds of .(struct )?S16 *\\\[]." } */
+  T (ax[-1]);                             /* { dg-warning "array subscript -1 is below array bounds" } */
+  T (ax[0]);
+
+  T (ax[SSIZE_MAX / 2 - 1]);
+  T (ax[SSIZE_MAX / 2]);                  /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (ax[SSIZE_MAX / 2 + (size_t)1]);      /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (ax[SIZE_MAX]);                       /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+}
+
+void farr_s16_7 (void)
+{
+  extern struct S16 ax_7[][7];
+
+  T (ax_7[0][SSIZE_MIN]);                 /* { dg-warning "array subscript -\[0-9\]+ is below array bounds of .(struct )?S16 *\\\[7]." } */
+  T (ax_7[0][-1]);                        /* { dg-warning "array subscript -1 is below array bounds" } */
+  T (ax_7[0][0]);
+  T (ax_7[0][7]);                        /* { dg-warning "array subscript 7 is above array bounds of .(struct )?S16 *\\\[7]." } */
+  T (ax_7[0][8]);                        /* { dg-warning "array subscript 8 is above array bounds of .(struct )?S16 *\\\[7]." } */
+
+  T (ax_7[0][SSIZE_MAX / 2]);            /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (ax_7[0][SIZE_MAX]);                 /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+
+  T (ax_7[SSIZE_MIN][0]);                 /* { dg-warning "array subscript -\[0-9\]+ is below array bounds of .(struct )?S16 *\\\[]\\\[7]." } */
+  T (ax_7[-1][0]);                        /* { dg-warning "array subscript -1 is below array bounds" } */
+
+  T (ax_7[SSIZE_MAX / 2][0]);             /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (ax_7[SIZE_MAX][0]);                  /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+
+  ssize_t i = R (SSIZE_MIN, -1);
+  T (ax_7[i][0]);                         /* { dg-warning "array subscript -1 is below array bounds" } */
+
+  T (ax_7[R (SSIZE_MIN, 0)][0]);
+  T (ax_7[R (-1, 0)][0]);
+  T (ax_7[R (-1, 1)][0]);
+  T (ax_7[R (-1, 7)][0]);
+  T (ax_7[R (-1, SSIZE_MAX)][0]);
+
+  T (ax_7[R ( 1, SSIZE_MAX)][0]);
+  T (ax_7[R (SSIZE_MAX / 14 - 1, SSIZE_MAX)][0]);
+
+  i = R (SSIZE_MAX / 14, SSIZE_MAX);
+  T (ax_7[i][0]);                         /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+
+  T (ax_7[0][R (SSIZE_MIN, 0)]);
+  T (ax_7[0][R (-1, 0)]);
+  T (ax_7[0][R (-1, 1)]);
+  T (ax_7[0][R (-1, 7)]);
+  T (ax_7[0][R (-1, SSIZE_MAX)]);
+  T (ax_7[0][R (-1, SSIZE_MAX)]);
+
+  T (ax_7[0][R (1, SSIZE_MAX)]);
+  T (ax_7[0][R (7, SSIZE_MAX)]);          /* { dg-warning "array subscript 7 is above array bounds" } */
+}
+
+void farr_x_5_7 (void)
+{
+  extern struct S16 a[][5][7];
+
+  T (a[0][0][-3]);                        /* { dg-warning "array subscript -3 is below array bounds of .(struct )?S16 *\\\[7]." } */
+  T (a[0][-2][0]);                        /* { dg-warning "array subscript -2 is below array bounds of .(struct )?S16 *\\\[5]\\\[7]." } */
+  T (a[-1][0][0]);                        /* { dg-warning "array subscript -1 is below array bounds of .(struct )?S16 *\\\[]\\\[5]\\\[7]." } */
+
+}
+
+
+void fax (struct AX *p)
+{
+  T (p->ax[SSIZE_MIN]);                   /* { dg-warning "array subscript -\[0-9\]+ is below array bounds" } */
+  T (p->ax[-1]);                          /* { dg-warning "array subscript -1 is below array bounds" } */
+  T (p->ax[0]);
+  T (p->ax[SSIZE_MAX - sizeof *p - 1]);
+  T (p->ax[SSIZE_MAX - sizeof *p]);       /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->ax[SSIZE_MAX - sizeof *p + 1]);   /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->ax[SIZE_MAX]);                    /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+}
+
+void fa1 (struct A1 *p)
+{
+  T (p->a1[SSIZE_MIN]);                   /* { dg-warning "array subscript -\[0-9\]+ is below array bounds" } */
+  T (p->a1[-1]);                          /* { dg-warning "array subscript -1 is below array bounds" } */
+  T (p->a1[0]);
+  T (p->a1[9]);
+  T (p->a1[SSIZE_MAX - sizeof *p - 1]);
+  T (p->a1[SSIZE_MAX - sizeof *p]);       /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1[SSIZE_MAX - sizeof *p + 1]);   /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1[SIZE_MAX]);                    /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+}
+
+void fb (struct B *p)
+{
+  T (p->a1x->a1[SSIZE_MIN]);             /* { dg-warning "array subscript -\[0-9\]+ is below array bounds" } */
+  T (p->a1x->a1[-1]);                    /* { dg-warning "array subscript -1 is below array bounds" } */
+  T (p->a1x->a1[0]);
+  T (p->a1x->a1[9]);                     /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1x->a1[SSIZE_MAX - sizeof *p]); /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1x->a1[SSIZE_MAX - sizeof *p + 1]); /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1x->a1[SIZE_MAX]);               /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+
+  T (p->a1x[1].a1[SSIZE_MIN]);            /* { dg-warning "array subscript -\[0-9\]+ is below array bounds" } */
+  T (p->a1x[1].a1[-1]);                   /* { dg-warning "array subscript -1 is below array bounds" } */
+  T (p->a1x[1].a1[0]);
+  T (p->a1x[1].a1[9]);                    /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1x[1].a1[SSIZE_MAX - sizeof *p]);/* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1x[1].a1[SSIZE_MAX - sizeof *p + 1]); /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1x[1].a1[SIZE_MAX]);             /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+
+  T (p->a1x[2].a1[SSIZE_MIN]);            /* { dg-warning "array subscript -\[0-9\]+ is below array bounds" } */
+  T (p->a1x[2].a1[-1]);                   /* { dg-warning "array subscript -1 is below array bounds" } */
+  T (p->a1x[2].a1[0]);
+  T (p->a1x[2].a1[9]);                    /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1x[2].a1[SSIZE_MAX - sizeof *p]);/* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1x[2].a1[SSIZE_MAX - sizeof *p + 1]); /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1x[2].a1[SIZE_MAX]);             /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+
+  T (p->a1x[3].a1[SSIZE_MIN]);            /* { dg-warning "array subscript -\[0-9\]+ is below array bounds" } */
+  T (p->a1x[3].a1[-1]);                   /* { dg-warning "array subscript -1 is below array bounds" } */
+  T (p->a1x[3].a1[0]);
+  T (p->a1x[3].a1[9]);                    /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+
+  T (p->a1x[9].a1[0]);
+
+  enum { MAX = SSIZE_MAX / sizeof *p->a1x - sizeof *p };
+
+  T (p->a1x[SSIZE_MIN].a1);               /* { dg-warning "array subscript -\[0-9\]+ is below array bounds" } */
+  T (p->a1x[-1].a1);                      /* { dg-warning "array subscript -1 is below array bounds" } */
+  T (p->a1x[MAX].a1);
+  T (p->a1x[MAX + 2].a1);                 /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+
+  T (p->a1x[SSIZE_MAX].a1);               /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1x[SIZE_MAX].a1);                /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+
+  T (p->a1x[SSIZE_MIN].a1[0]);            /* { dg-warning "array subscript -\[0-9\]+ is below array bounds" } */
+  T (p->a1x[-1].a1[0])                    /* { dg-warning "array subscript -1 is below array bounds" } */;
+  T (p->a1x[MAX - 1].a1[0]);
+  T (p->a1x[MAX].a1[0]);                  /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1x[MAX + 1].a1[0]);              /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+
+  T (p->a1x[SSIZE_MAX].a1[0]);            /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1x[SIZE_MAX].a1[0]);             /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+}
+
+void f_cststring (int i)
+{
+  T (""[SSIZE_MIN]);                      /* { dg-warning "array subscript -\[0-9\]+ is below array bounds of .(const )?char *\\\[1]" "string" { xfail lp64 } } */
+  T (""[SSIZE_MIN + 1]);                  /* { dg-warning "array subscript -\[0-9\]+ is below array bounds of .(const )?char *\\\[1]" "string" } */
+  T (""[-1]);                             /* { dg-warning "array subscript -1 is below array bounds of .(const )?char *\\\[1]" "string" } */
+  T (""[0]);
+  T (""[1]);                              /* { dg-warning "array subscript 1 is above array bounds of .(const )?char *\\\[1]" "string" } */
+  T ("0"[2]);                             /* { dg-warning "array subscript 2 is above array bounds of .(const )?char *\\\[2]" "string" } */
+  T ("012"[2]);
+  T ("012"[3]);
+  T ("012"[4]);                           /* { dg-warning "array subscript 4 is above array bounds of .(const )?char *\\\[4]" "string" } */
+  T ("0123"[SSIZE_MAX]);                  /* { dg-warning "array subscript \[0-9\]+ is above array bounds of .(const )?char *\\\[5]" "string" } */
+  T ("0123"[SIZE_MAX]);                   /* { dg-warning "array subscript \[0-9\]+ is above array bounds of .(const )?char *\\\[5]" "string" } */
+}
+
+void fb_strlen (struct B *p)
+{
+#define strlen __builtin_strlen
+
+  T (strlen (&p->a1x[0].a1[2]));          /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "strlen" } */
+  T (strlen (p->a1x[0].a1 + 2));          /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "strlen" { xfail *-*-* } } */
+}
+
+
+void f_vla (unsigned n)
+{
+  char vla[n];
+
+  T (vla[SSIZE_MIN]);                     /* { dg-warning "array subscript -\[0-9\]+ is below array bounds" "vla" } */
+  T (vla[-1]);                            /* { dg-warning "array subscript -1 is below array bounds" "vla" } */
+  T (vla[0]);
+  T (vla[1]);
+  T (vla[n - 1]);
+  /* It would be nice to diagnose this. */
+  T (vla[n]);                             /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "bug 82608" { xfail *-*-*} } */
+  T (vla[SSIZE_MAX]);                     /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "vla" } */
+}
diff --git a/gcc/testsuite/gcc.dg/Warray-bounds-11.c b/gcc/testsuite/gcc.dg/Warray-bounds-11.c
index 089fa00..c9fc461 100644
--- a/gcc/testsuite/gcc.dg/Warray-bounds-11.c
+++ b/gcc/testsuite/gcc.dg/Warray-bounds-11.c
@@ -57,19 +57,19 @@ struct h3b {
 
 void foo(int (*a)[3])
 {
-	(*a)[4] = 1;	/* { dg-warning "subscript is above array bound" } */
+	(*a)[4] = 1;	/* { dg-warning "subscript 4 is above array bound" } */
 	a[0][0] = 1;	// ok
 	a[1][0] = 1;	// ok
-	a[1][4] = 1;	/* { dg-warning "subscript is above array bound" } */
+	a[1][4] = 1;	/* { dg-warning "subscript 4 is above array bound" } */
 
 	int c[3] = { 0 };
 
-	c[4] = 1;	/* { dg-warning "subscript is above array bound" } */
+	c[4] = 1;	/* { dg-warning "subscript 4 is above array bound" } */
 
-	e[4] = 1;	/* { dg-warning "subscript is above array bound" } */
+	e[4] = 1;	/* { dg-warning "subscript 4 is above array bound" } */
 
 	struct f f;
-	f.f[4] = 1;	/* { dg-warning "subscript is above array bound" } */
+	f.f[4] = 1;	/* { dg-warning "subscript 4 is above array bound" } */
 
 	struct h* h = malloc(sizeof(struct h) + 3 * sizeof(int));
 	struct h0* h0 = malloc(sizeof(struct h0) + 3 * sizeof(int));
@@ -78,15 +78,15 @@ void foo(int (*a)[3])
 
 	h->j[4] = 1;	// flexible array member
 	h0->j[4] = 1;	// zero-sized array extension
-	h1->j[4] = 1;	/* { dg-warning "subscript is above array bound" } */
-	h3->j[4] = 1;	/* { dg-warning "subscript is above array bound" } */
+	h1->j[4] = 1;	/* { dg-warning "subscript 4 is above array bound" } */
+	h3->j[4] = 1;	/* { dg-warning "subscript 4 is above array bound" } */
 
 	struct h0b* h0b = malloc(sizeof(struct h) + 3 * sizeof(int));
 	struct h1b* h1b = malloc(sizeof(struct h1b) + 3 * sizeof(int));
 	struct h3b* h3b = malloc(sizeof(struct h3b));
 //	h0b->j[4] = 1;
-	h1b->j[4] = 1;;	/* { dg-warning "subscript is above array bound" } */
-	h3b->j[4] = 1;;	/* { dg-warning "subscript is above array bound" } */
+	h1b->j[4] = 1;;	/* { dg-warning "subscript 4 is above array bound" } */
+	h3b->j[4] = 1;;	/* { dg-warning "subscript 4 is above array bound" } */
 
 	// make sure nothing gets optimized away
 	bar(*a);
diff --git a/gcc/testsuite/gcc.dg/Warray-bounds-22.c b/gcc/testsuite/gcc.dg/Warray-bounds-22.c
new file mode 100644
index 0000000..ec6fc8b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/Warray-bounds-22.c
@@ -0,0 +1,62 @@
+/* PR tree-optimization/82588 - missing -Warray-bounds on an excessively
+   large index
+   { dg-do compile }
+   { dg-options "-O2 -Warray-bounds -ftrack-macro-expansion=0" }  */
+
+#define SIZE_MAX  __SIZE_MAX__
+#define SSIZE_MAX __PTRDIFF_MAX__
+#define SSIZE_MIN (-SSIZE_MAX - 1)
+
+typedef __PTRDIFF_TYPE__ ssize_t;
+typedef __SIZE_TYPE__    size_t;
+
+struct AX { int n; char ax[]; };
+
+struct A1 { int i; char a1[1]; };
+struct B { int i; struct A1 a1x[]; };
+
+void sink (int, ...);
+
+#define T(expr)   sink (0, (expr))
+
+void test_vla (unsigned m, unsigned n)
+{
+  char vla1[m];
+
+  T (vla1[SSIZE_MIN]);                    /* { dg-warning "array subscript -\[0-9\]+ is below array bounds" "vla" } */
+  T (vla1[-1]);                           /* { dg-warning "array subscript -1 is below array bounds" "vla" } */
+  T (vla1[0]);
+  T (vla1[1]);
+  T (vla1[n - 1]);
+  /* It would be nice to diagnose this. */
+  T (vla1[n]);                            /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "bug 82608" { xfail *-*-*} } */
+  T (vla1[SSIZE_MAX]);                    /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "vla" } */
+
+  char vla2[m][n];
+
+  T (vla2[0][SSIZE_MIN]);                 /* { dg-warning "array subscript -\[0-9\]+ is below array bounds" "vla" } */
+  T (vla2[0][-1]);                        /* { dg-warning "array subscript -1 is below array bounds" "vla" } */
+  T (vla2[0][0]);
+  T (vla2[1][1]);
+  T (vla2[m - 1][n - 1]);
+  /* It would be nice to diagnose this. */
+  T (vla2[m][0]);                         /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "bug 82608" { xfail *-*-*} } */
+  T (vla2[m + 1][0]);                    /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "bug 82608" { xfail *-*-*} } */
+  T (vla2[0][n]);                         /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "bug 82608" { xfail *-*-*} } */
+  T (vla2[0][n + 1]);                     /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "bug 82608" { xfail *-*-*} } */
+  T (vla2[m][n]);                         /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "bug 82608" { xfail *-*-*} } */
+  T (vla2[m + 1][n + 1]);                 /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "bug 82608" { xfail *-*-*} } */
+
+  T (vla2[0][SSIZE_MAX]);                 /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "vla" } */
+  T (vla2[SSIZE_MAX][0]);                 /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "vla" { xfail *-*-* } } */
+  T (vla2[SSIZE_MAX][SSIZE_MAX]);         /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "vla" } */
+
+  struct VLA { char vla[n]; } x;
+
+  T (x.vla[SSIZE_MIN]);                   /* { dg-warning "array subscript -\[0-9\]+ is below array bounds" "vla" } */
+  T (x.vla[-1]);                          /* { dg-warning "array subscript -1 is below array bounds" "vla" } */
+  T (x.vla[0]);
+  T (x.vla[1]);
+  T (x.vla[n - 1]);
+  T (x.vla[SSIZE_MAX]);                   /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "vla" } */
+}
diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index 2c86b8e..7330e8a 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -42,6 +42,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-iterator.h"
 #include "gimple-walk.h"
 #include "tree-cfg.h"
+#include "tree-dfa.h"
 #include "tree-ssa-loop-manip.h"
 #include "tree-ssa-loop-niter.h"
 #include "tree-ssa-loop.h"
@@ -64,6 +65,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-cfgcleanup.h"
 #include "stringpool.h"
 #include "attribs.h"
+#include "builtins.h"
 
 #define VR_INITIALIZER { VR_UNDEFINED, NULL_TREE, NULL_TREE, NULL }
 
@@ -6675,26 +6677,59 @@ check_array_ref (location_t location, tree ref, bool ignore_off_by_one)
   low_sub = up_sub = TREE_OPERAND (ref, 1);
   up_bound = array_ref_up_bound (ref);
 
-  /* Can not check flexible arrays.  */
   if (!up_bound
-      || TREE_CODE (up_bound) != INTEGER_CST)
-    return;
+      || TREE_CODE (up_bound) != INTEGER_CST
+      || (warn_array_bounds < 2
+	  && array_at_struct_end_p (ref)))
+    {
+      /* Accesses to trailing arrays via pointers may access storage
+	 beyond the types array bounds.  For such arrays, or for flexible
+	 array members as well as for other arrays of an unknown size,
+	 replace the upper bound with a more permissive one that assumes
+	 the size of the largest object is SSIZE_MAX.  */
+      tree eltsize = array_ref_element_size (ref);
+
+      /* FIXME: Handle VLAs.  */
+      if (TREE_CODE (eltsize) != INTEGER_CST)
+	return;
 
-  /* Accesses to trailing arrays via pointers may access storage
-     beyond the types array bounds.  */
-  if (warn_array_bounds < 2
-      && array_at_struct_end_p (ref))
-    return;
+      tree maxbound
+	= build_int_cst (sizetype, ~(1LLU << (TYPE_PRECISION (sizetype) - 1)));
+
+      up_bound_p1 = int_const_binop (TRUNC_DIV_EXPR, maxbound, eltsize);
+
+      tree arg = TREE_OPERAND (ref, 0);
+      tree_code code = TREE_CODE (arg);
+      if (code == COMPONENT_REF)
+	{
+	  HOST_WIDE_INT off;
+	  if (tree base = get_addr_base_and_unit_offset (ref, &off))
+	    {
+	      tree size = TYPE_SIZE_UNIT (TREE_TYPE (base));
+	      if (TREE_CODE (size) == INTEGER_CST)
+		up_bound_p1 = int_const_binop (MINUS_EXPR, up_bound_p1, size);
+	    }
+	  else
+	    return;
+	}
+
+      up_bound = int_const_binop (MINUS_EXPR, up_bound_p1,
+				  build_int_cst (ssizetype, 1));
+    }
+  else
+    up_bound_p1 = int_const_binop (PLUS_EXPR, up_bound,
+				   build_int_cst (TREE_TYPE (up_bound), 1));
 
   low_bound = array_ref_low_bound (ref);
-  up_bound_p1 = int_const_binop (PLUS_EXPR, up_bound,
-				 build_int_cst (TREE_TYPE (up_bound), 1));
+
+  tree artype = TREE_TYPE (TREE_OPERAND (ref, 0));
 
   /* Empty array.  */
   if (tree_int_cst_equal (low_bound, up_bound_p1))
     {
       warning_at (location, OPT_Warray_bounds,
-		  "array subscript is above array bounds");
+		  "array subscript %E is above array bounds of %qT",
+		  low_bound, artype);
       TREE_NO_WARNING (ref) = 1;
     }
 
@@ -6718,7 +6753,8 @@ check_array_ref (location_t location, tree ref, bool ignore_off_by_one)
           && tree_int_cst_le (low_sub, low_bound))
         {
           warning_at (location, OPT_Warray_bounds,
-		      "array subscript is outside array bounds");
+		      "array subscript [%E, %E] is outside array bounds of %qT",
+		      low_sub, up_sub, artype);
           TREE_NO_WARNING (ref) = 1;
         }
     }
@@ -6734,7 +6770,8 @@ check_array_ref (location_t location, tree ref, bool ignore_off_by_one)
 	  fprintf (dump_file, "\n");
 	}
       warning_at (location, OPT_Warray_bounds,
-		  "array subscript is above array bounds");
+		  "array subscript %E is above array bounds of %qT",
+		  up_sub, artype);
       TREE_NO_WARNING (ref) = 1;
     }
   else if (TREE_CODE (low_sub) == INTEGER_CST
@@ -6747,7 +6784,8 @@ check_array_ref (location_t location, tree ref, bool ignore_off_by_one)
 	  fprintf (dump_file, "\n");
 	}
       warning_at (location, OPT_Warray_bounds,
-		  "array subscript is below array bounds");
+		  "array subscript %E is below array bounds of %qT",
+		  low_sub, artype);
       TREE_NO_WARNING (ref) = 1;
     }
 }
@@ -6802,7 +6840,8 @@ search_for_addr_array (tree t, location_t location)
 	      fprintf (dump_file, "\n");
 	    }
 	  warning_at (location, OPT_Warray_bounds,
-		      "array subscript is below array bounds");
+		      "array subscript %wi is below array bounds of %qT",
+		      idx.to_shwi (), TREE_TYPE (tem));
 	  TREE_NO_WARNING (t) = 1;
 	}
       else if (idx > (wi::to_offset (up_bound)
@@ -6815,7 +6854,8 @@ search_for_addr_array (tree t, location_t location)
 	      fprintf (dump_file, "\n");
 	    }
 	  warning_at (location, OPT_Warray_bounds,
-		      "array subscript is above array bounds");
+		      "array subscript %wu is above array bounds of %qT",
+		      idx.to_uhwi (), TREE_TYPE (tem));
 	  TREE_NO_WARNING (t) = 1;
 	}
     }
diff --git a/gcc/tree.c b/gcc/tree.c
index cd77f08..4cf5f66 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -12515,6 +12515,9 @@ array_at_struct_end_p (tree ref)
   else
     return false;
 
+  if (TREE_CODE (ref) == STRING_CST)
+    return false;
+
   while (handled_component_p (ref))
     {
       /* If the reference chain contains a component reference to a
Richard Biener Oct. 19, 2017, 8:34 a.m. UTC | #3
On Thu, Oct 19, 2017 at 1:19 AM, Martin Sebor <msebor@gmail.com> wrote:
> On 10/18/2017 04:48 AM, Richard Biener wrote:
>>
>> On Wed, Oct 18, 2017 at 5:34 AM, Martin Sebor <msebor@gmail.com> wrote:
>>>
>>> While testing my latest -Wrestrict changes I noticed a number of
>>> opportunities to improve the -Warray-bounds warning.  Attached
>>> is a patch that implements a solution for the following subset
>>> of these:
>>>
>>> PR tree-optimization/82596 - missing -Warray-bounds on an out-of
>>>   bounds index into string literal
>>> PR tree-optimization/82588 - missing -Warray-bounds on an excessively
>>>   large index
>>> PR tree-optimization/82583 - missing -Warray-bounds on out-of-bounds
>>>   inner indices
>>>
>>> The patch also adds more detail to the -Warray-bounds diagnostics
>>> to make it easier to see the cause of the problem.
>>>
>>> Richard, since the changes touch tree-vrp.c, I look in particular
>>> for your comments.
>>
>>
>> +      /* Accesses to trailing arrays via pointers may access storage
>> +        beyond the types array bounds.  For such arrays, or for flexible
>> +        array members as well as for other arrays of an unknown size,
>> +        replace the upper bound with a more permissive one that assumes
>> +        the size of the largest object is SSIZE_MAX.  */
>>
>> I believe handling those cases are somewhat academic, but ...
>
>
> Thanks for the quick review!
>
> I agree the SSIZE_MAX tests handle corner cases and there are
> arguably more important gaps here to plug (e.g., VLAs).  Then
> again, most bugs tend to lurk in corner cases of one kind or
> another and these seemed like a good way for me to come up to
> speed on the implementation before tackling those.  If you
> have suggestions for which to dive into next I'm happy to take
> them.
>
>> +      tree eltype = TREE_TYPE (ref);
>> +      tree eltsize = TYPE_SIZE_UNIT (eltype);
>>
>> needs to use array_ref_element_size.  Note that this size can be
>> non-constant in which case you now ICE so you have to check
>> this is an INTEGER_CST.
>
>
> Thanks.  I did reproduce a few ICEs due to VLAs.  I've fixed
> the problems and added tests for them.  One-dimensional VLAs
> are now handled the same way arrays of unknown bound are.
> Handling them more intelligently (i.e., taking into account
> the ranges their bounds are in) and especially handling
> multidimensional VLAs will take some effort.
>
>>
>> +      tree maxbound = TYPE_MAX_VALUE (ssizetype);
>>
>> please don't use ssizetype - sizetype can be of different precision
>> than pointers (and thus objects).  size_type_node would come
>> close (maps to size_t), eventually ptrdiff_type_node is also
>> defined by all frontends.
>
>
> Okay, I've changed it to sizetype (it would be nice if there
> were a cleaner way of doing it rather than by bit-twiddling.)
>
> ptrdiff_type would have been my first choice but it's only defined
> by the C/C++ front ends and not available in the middle-end, so
> the other warnings that consider object sizes deal with ssizetype
> (e.g., -Walloc-size-larger-than, -Wstringop- overflow, and
> -Wformat-overflow).
>
> That said, I'm not sure I understand under what conditions
> ssizetype is not the signed equivalent of sizetype.  Can you
> clarify?

I meant to use size_type_node (size_t), not sizetype.  But
I just checked that ptrdiff_type_node is initialized in
build_common_tree_nodes and thus always available.

> As an aside, at some point I would like to get away from a type
> based limit in all these warnings and instead use one that can
> be controlled by an option so that a user can impose a lower limit
> on the maximum size of an object and have all size-related warnings
> (and perhaps even optimizations) enforce it and benefit from it.

You could add a --param that is initialized from ptrdiff_type_node.

>> +      up_bound_p1 = fold_build2 (TRUNC_DIV_EXPR, ssizetype, maxbound,
>> eltsize);
>> +
>>
>> int_const_binop if you insist on using trees...
>
>
> Sure.  (I think offset_int would be more convenient but the rest
> of the function uses trees so I just stuck to those to avoid
> converting things back and forth or disrupting more of the code
> than I had to.)

Understood.

>>
>> +      tree arg = TREE_OPERAND (ref, 0);
>> +      tree_code code = TREE_CODE (arg);
>> +      if (code == COMPONENT_REF)
>> +       {
>> +         HOST_WIDE_INT off;
>> +         if (tree base = get_addr_base_and_unit_offset (ref, &off))
>> +           up_bound_p1 = fold_build2 (MINUS_EXPR, ssizetype, up_bound_p1,
>> +                                      TYPE_SIZE_UNIT (TREE_TYPE (base)));
>> +         else
>> +           return;
>>
>> so this gives up on a.b[i].c.d[k] (ok, array_at_struct_end_p will be
>> false).
>> simply not subtracting anyhing instead of returning would be
>> conservatively
>> correct, no?  Likewise subtracting the offset of the array for all
>> "previous"
>> variably indexed components with assuming the lowest value for the index.
>> But as above I think compensating for the offset of the array within the
>> object
>> is academic ... ;)
>
>
> I was going to say yes (it gives up) but on second thought I don't
> think it does.  Only the major index can be unbounded and the code
> does consider the size of the sub-array when checking the major
> index.  So, IIUC, I think this works correctly as is (*).  What
> doesn't work is VLAs but those are a separate problem.  Let me
> know if I misunderstood your question.

get_addr_base_and_unit_offset will return NULL if there's any variable
component in 'ref'.  So as written it seems to be dead code (you
want to pass 'arg'?)

I was asking you to remove the 'else return' because w/o subtracting
the upper bound is just more conservative.

>
>> +      else if (code == STRING_CST)
>> +       up_bound_p1 = build_int_cst (ssizetype, TREE_STRING_LENGTH (arg));
>>
>> that one is more interesting -- why's the TYPE_DOMAIN of the STRING_CST
>> lacking
>> a max value?  Iff we use build_string_literal it should have the proper
>> type.
>
>
> Good question!  STRING_CST does have a domain.  The problem is
> that array_at_struct_end_p() returns true for STRING_CST.  I've
> added the handling to the function and removed the block above
> from the latest patch.

Can you split out the STRING_CST handling and commit that separately
(split the testcase)?  That part looks ok.

Thanks,
Richard.

> Martin
>
> [*] This is diagnosed:
>
>   struct C { char d[4]; };
>   struct B { struct C c; };
>   struct A { struct B b[7]; };
>
>   int f (struct A a, unsigned i, unsigned k)
>   {
>     if (i < 7) i = 7;
>     if (k < 4) k = 4;
>
>     return a.b[i].c.d[k];
>   }
>
>   warning: array subscript 4 is above array bounds of ‘char[4]’
> [-Warray-bounds]
>      return a.b[i].c.d[k];
>             ~~~~~~~~~~^~~
>   warning: array subscript 7 is above array bounds of ‘struct B[7]’
> [-Warray-bounds]
>      return a.b[i].c.d[k];
>             ~~~^~~
>
Martin Sebor Oct. 19, 2017, 4:12 p.m. UTC | #4
>> Good question!  STRING_CST does have a domain.  The problem is
>> that array_at_struct_end_p() returns true for STRING_CST.  I've
>> added the handling to the function and removed the block above
>> from the latest patch.
>
> Can you split out the STRING_CST handling and commit that separately
> (split the testcase)?  That part looks ok.

I've committed r253902.  It turns out, however, that this subset
of the patch doesn't fix the whole problem.  What's still missing
is the handling of:

   int g (int i)
   {
     return (i < 0 ? ABC : DEF)[7];   // missing -Warray-bounds
   }

Surprisingly, this happens to work in C++ but not in C (which
is in contrast to bug 82609 where it's the other way around).
The root cause is that the expression is represented as
a MEM_REF(ADDR_EXPR (STRING_CST)) and check_array_bounds()
only considers ARRAY_REF and ADDR_EXPR.

I will submit a separate patch for that, perhaps along with
one for bug 82612 that you commented on this morning (missing
-Warray-bounds on a non-zero offset from the address of a non-
array object).

Martin
Martin Sebor Oct. 19, 2017, 11 p.m. UTC | #5
On 10/19/2017 02:34 AM, Richard Biener wrote:
> On Thu, Oct 19, 2017 at 1:19 AM, Martin Sebor <msebor@gmail.com> wrote:
>> On 10/18/2017 04:48 AM, Richard Biener wrote:
>>>
>>> On Wed, Oct 18, 2017 at 5:34 AM, Martin Sebor <msebor@gmail.com> wrote:
>>>>
>>>> While testing my latest -Wrestrict changes I noticed a number of
>>>> opportunities to improve the -Warray-bounds warning.  Attached
>>>> is a patch that implements a solution for the following subset
>>>> of these:
>>>>
>>>> PR tree-optimization/82596 - missing -Warray-bounds on an out-of
>>>>   bounds index into string literal
>>>> PR tree-optimization/82588 - missing -Warray-bounds on an excessively
>>>>   large index
>>>> PR tree-optimization/82583 - missing -Warray-bounds on out-of-bounds
>>>>   inner indices

> I meant to use size_type_node (size_t), not sizetype.  But
> I just checked that ptrdiff_type_node is initialized in
> build_common_tree_nodes and thus always available.

I see.  Using ptrdiff_type_node is preferable for the targets
where ptrdiff_t has a greater precision than size_t (e.g., VMS).
It makes sense now.  I should remember to change all the other
places where I introduced ssizetype to use ptrdiff_type_node.

>
>> As an aside, at some point I would like to get away from a type
>> based limit in all these warnings and instead use one that can
>> be controlled by an option so that a user can impose a lower limit
>> on the maximum size of an object and have all size-related warnings
>> (and perhaps even optimizations) enforce it and benefit from it.
>
> You could add a --param that is initialized from ptrdiff_type_node.

Yes, that's an option to consider.  Thanks.

>
>>> +      tree arg = TREE_OPERAND (ref, 0);
>>> +      tree_code code = TREE_CODE (arg);
>>> +      if (code == COMPONENT_REF)
>>> +       {
>>> +         HOST_WIDE_INT off;
>>> +         if (tree base = get_addr_base_and_unit_offset (ref, &off))
>>> +           up_bound_p1 = fold_build2 (MINUS_EXPR, ssizetype, up_bound_p1,
>>> +                                      TYPE_SIZE_UNIT (TREE_TYPE (base)));
>>> +         else
>>> +           return;
>>>
>>> so this gives up on a.b[i].c.d[k] (ok, array_at_struct_end_p will be
>>> false).
>>> simply not subtracting anyhing instead of returning would be
>>> conservatively
>>> correct, no?  Likewise subtracting the offset of the array for all
>>> "previous"
>>> variably indexed components with assuming the lowest value for the index.
>>> But as above I think compensating for the offset of the array within the
>>> object
>>> is academic ... ;)
>>
>>
>> I was going to say yes (it gives up) but on second thought I don't
>> think it does.  Only the major index can be unbounded and the code
>> does consider the size of the sub-array when checking the major
>> index.  So, IIUC, I think this works correctly as is (*).  What
>> doesn't work is VLAs but those are a separate problem.  Let me
>> know if I misunderstood your question.
>
> get_addr_base_and_unit_offset will return NULL if there's any variable
> component in 'ref'.  So as written it seems to be dead code (you
> want to pass 'arg'?)

Sorry, I'm not sure I understand what you mean.  What do you think
is dead code?  The call to get_addr_base_and_unit_offset() is also
made for an array of unspecified bound (up_bound is null) and for
an array at the end of a struct.  For those the function returns
non-null, and for the others (arrays of runtime bound) it returns
null.  (I passed arg instead of ref but I see no difference in
my tests.)

>
> I was asking you to remove the 'else return' because w/o subtracting
> the upper bound is just more conservative.

Sure, that sounds good.

Attached is an updated patch.

Thanks
Martin
PR tree-optimization/82588 - missing -Warray-bounds on a excessively large index
PR tree-optimization/82583 - missing -Warray-bounds on out-of-bounds inner indic

gcc/ChangeLog:
	PR tree-optimization/82588
	PR tree-optimization/82583
	* tree-vrp.c (check_array_ref): Handle flexible array members,
	string literals, and inner indices.
	(search_for_addr_array): Add detail to diagnostics.

gcc/testsuite/ChangeLog:
	PR tree-optimization/82588
	PR tree-optimization/82583	
	* c-c++-common/Warray-bounds.c: New test.
	* gcc.dg/Warray-bounds-11.c: Adjust.
	* gcc.dg/Warray-bounds-22.c: New test.

diff --git a/gcc/testsuite/c-c++-common/Warray-bounds.c b/gcc/testsuite/c-c++-common/Warray-bounds.c
new file mode 100644
index 0000000..2207999
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/Warray-bounds.c
@@ -0,0 +1,240 @@
+/* PR tree-optimization/82588 - missing -Warray-bounds on an excessively
+   large index
+   { dg-do compile }
+   { dg-options "-O2 -Warray-bounds -ftrack-macro-expansion=0" }  */
+
+#define SIZE_MAX  __SIZE_MAX__
+#define SSIZE_MAX __PTRDIFF_MAX__
+#define SSIZE_MIN (-SSIZE_MAX - 1)
+
+typedef __PTRDIFF_TYPE__ ssize_t;
+typedef __SIZE_TYPE__    size_t;
+
+extern ssize_t signed_value (void)
+{
+  extern volatile ssize_t signed_value_source;
+  return signed_value_source;
+}
+
+extern size_t unsigned_value (void)
+{
+  extern volatile size_t unsigned_value_source;
+  return unsigned_value_source;
+}
+
+ssize_t signed_range (ssize_t min, ssize_t max)
+{
+  ssize_t val = signed_value ();
+  return val < min || max < val ? min : val;
+}
+
+struct AX { int n; char ax[]; };
+
+struct A1 { int i; char a1[1]; };
+struct B { int i; struct A1 a1x[]; };
+
+void sink (int, ...);
+
+#define R(min, max) signed_range (min, max)
+#define T(expr)     sink (0, expr)
+
+struct __attribute__ ((packed)) S16 { unsigned i: 16; };
+
+void farr_char (void)
+{
+  extern char ac[];
+
+  T (ac[SSIZE_MIN]);                      /* { dg-warning "array subscript -\[0-9\]+ is below array bounds of .char *\\\[]." } */
+  T (ac[-1]);                             /* { dg-warning "array subscript -1 is below array bounds" } */
+  T (ac[0]);
+
+  T (ac[SSIZE_MAX - 1]);
+  T (ac[SSIZE_MAX]);                      /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (ac[SSIZE_MAX + (size_t)1]);          /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (ac[SIZE_MAX]);                       /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+}
+
+void farr_s16 (void)
+{
+  extern struct S16 ax[];
+
+  T (ax[SSIZE_MIN]);                      /* { dg-warning "array subscript -\[0-9\]+ is below array bounds of .(struct )?S16 *\\\[]." } */
+  T (ax[-1]);                             /* { dg-warning "array subscript -1 is below array bounds" } */
+  T (ax[0]);
+
+  T (ax[SSIZE_MAX / 2 - 1]);
+  T (ax[SSIZE_MAX / 2]);                  /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (ax[SSIZE_MAX / 2 + (size_t)1]);      /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (ax[SIZE_MAX]);                       /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+}
+
+void farr_s16_7 (void)
+{
+  extern struct S16 ax_7[][7];
+
+  T (ax_7[0][SSIZE_MIN]);                 /* { dg-warning "array subscript -\[0-9\]+ is below array bounds of .(struct )?S16 *\\\[7]." } */
+  T (ax_7[0][-1]);                        /* { dg-warning "array subscript -1 is below array bounds" } */
+  T (ax_7[0][0]);
+  T (ax_7[0][7]);                        /* { dg-warning "array subscript 7 is above array bounds of .(struct )?S16 *\\\[7]." } */
+  T (ax_7[0][8]);                        /* { dg-warning "array subscript 8 is above array bounds of .(struct )?S16 *\\\[7]." } */
+
+  T (ax_7[0][SSIZE_MAX / 2]);            /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (ax_7[0][SIZE_MAX]);                 /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+
+  T (ax_7[SSIZE_MIN][0]);                 /* { dg-warning "array subscript -\[0-9\]+ is below array bounds of .(struct )?S16 *\\\[]\\\[7]." } */
+  T (ax_7[-1][0]);                        /* { dg-warning "array subscript -1 is below array bounds" } */
+
+  T (ax_7[SSIZE_MAX / 2][0]);             /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (ax_7[SIZE_MAX][0]);                  /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+
+  ssize_t i = R (SSIZE_MIN, -1);
+  T (ax_7[i][0]);                         /* { dg-warning "array subscript -1 is below array bounds" } */
+
+  T (ax_7[R (SSIZE_MIN, 0)][0]);
+  T (ax_7[R (-1, 0)][0]);
+  T (ax_7[R (-1, 1)][0]);
+  T (ax_7[R (-1, 7)][0]);
+  T (ax_7[R (-1, SSIZE_MAX)][0]);
+
+  T (ax_7[R ( 1, SSIZE_MAX)][0]);
+  T (ax_7[R (SSIZE_MAX / 14 - 1, SSIZE_MAX)][0]);
+
+  i = R (SSIZE_MAX / 14, SSIZE_MAX);
+  T (ax_7[i][0]);                         /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+
+  T (ax_7[0][R (SSIZE_MIN, 0)]);
+  T (ax_7[0][R (-1, 0)]);
+  T (ax_7[0][R (-1, 1)]);
+  T (ax_7[0][R (-1, 7)]);
+  T (ax_7[0][R (-1, SSIZE_MAX)]);
+  T (ax_7[0][R (-1, SSIZE_MAX)]);
+
+  T (ax_7[0][R (1, SSIZE_MAX)]);
+  T (ax_7[0][R (7, SSIZE_MAX)]);          /* { dg-warning "array subscript 7 is above array bounds" } */
+
+}
+
+void farr_x_5_7 (void)
+{
+  extern struct S16 a[][5][7];
+
+  T (a[0][0][-3]);                        /* { dg-warning "array subscript -3 is below array bounds of .(struct )?S16 *\\\[7]." } */
+  T (a[0][-2][0]);                        /* { dg-warning "array subscript -2 is below array bounds of .(struct )?S16 *\\\[5]\\\[7]." } */
+  T (a[-1][0][0]);                        /* { dg-warning "array subscript -1 is below array bounds of .(struct )?S16 *\\\[]\\\[5]\\\[7]." } */
+
+}
+
+
+void fax (struct AX *p)
+{
+  T (p->ax[SSIZE_MIN]);                   /* { dg-warning "array subscript -\[0-9\]+ is below array bounds" } */
+  T (p->ax[-1]);                          /* { dg-warning "array subscript -1 is below array bounds" } */
+  T (p->ax[0]);
+  T (p->ax[SSIZE_MAX - sizeof *p - 1]);
+  T (p->ax[SSIZE_MAX - sizeof *p]);       /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->ax[SSIZE_MAX - sizeof *p + 1]);   /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->ax[SIZE_MAX]);                    /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+}
+
+void fa1 (struct A1 *p)
+{
+  T (p->a1[SSIZE_MIN]);                   /* { dg-warning "array subscript -\[0-9\]+ is below array bounds" } */
+  T (p->a1[-1]);                          /* { dg-warning "array subscript -1 is below array bounds" } */
+  T (p->a1[0]);
+  T (p->a1[9]);
+  T (p->a1[SSIZE_MAX - sizeof *p - 1]);
+  T (p->a1[SSIZE_MAX - sizeof *p]);       /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1[SSIZE_MAX - sizeof *p + 1]);   /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1[SIZE_MAX]);                    /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+}
+
+void fb (struct B *p)
+{
+  T (p->a1x->a1[SSIZE_MIN]);             /* { dg-warning "array subscript -\[0-9\]+ is below array bounds" } */
+  T (p->a1x->a1[-1]);                    /* { dg-warning "array subscript -1 is below array bounds" } */
+  T (p->a1x->a1[0]);
+  T (p->a1x->a1[9]);                     /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1x->a1[SSIZE_MAX - sizeof *p]); /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1x->a1[SSIZE_MAX - sizeof *p + 1]); /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1x->a1[SIZE_MAX]);               /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+
+  T (p->a1x[1].a1[SSIZE_MIN]);            /* { dg-warning "array subscript -\[0-9\]+ is below array bounds" } */
+  T (p->a1x[1].a1[-1]);                   /* { dg-warning "array subscript -1 is below array bounds" } */
+  T (p->a1x[1].a1[0]);
+  T (p->a1x[1].a1[9]);                    /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1x[1].a1[SSIZE_MAX - sizeof *p]);/* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1x[1].a1[SSIZE_MAX - sizeof *p + 1]); /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1x[1].a1[SIZE_MAX]);             /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+
+  T (p->a1x[2].a1[SSIZE_MIN]);            /* { dg-warning "array subscript -\[0-9\]+ is below array bounds" } */
+  T (p->a1x[2].a1[-1]);                   /* { dg-warning "array subscript -1 is below array bounds" } */
+  T (p->a1x[2].a1[0]);
+  T (p->a1x[2].a1[9]);                    /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1x[2].a1[SSIZE_MAX - sizeof *p]);/* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1x[2].a1[SSIZE_MAX - sizeof *p + 1]); /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1x[2].a1[SIZE_MAX]);             /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+
+  T (p->a1x[3].a1[SSIZE_MIN]);            /* { dg-warning "array subscript -\[0-9\]+ is below array bounds" } */
+  T (p->a1x[3].a1[-1]);                   /* { dg-warning "array subscript -1 is below array bounds" } */
+  T (p->a1x[3].a1[0]);
+  T (p->a1x[3].a1[9]);                    /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+
+  T (p->a1x[9].a1[0]);
+
+  enum { MAX = SSIZE_MAX / sizeof *p->a1x - sizeof *p };
+
+  T (p->a1x[SSIZE_MIN].a1);               /* { dg-warning "array subscript -\[0-9\]+ is below array bounds" } */
+  T (p->a1x[-1].a1);                      /* { dg-warning "array subscript -1 is below array bounds" } */
+  T (p->a1x[MAX].a1);
+  T (p->a1x[MAX + 2].a1);                 /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+
+  T (p->a1x[SSIZE_MAX].a1);               /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1x[SIZE_MAX].a1);                /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+
+  T (p->a1x[SSIZE_MIN].a1[0]);            /* { dg-warning "array subscript -\[0-9\]+ is below array bounds" } */
+  T (p->a1x[-1].a1[0])                    /* { dg-warning "array subscript -1 is below array bounds" } */;
+  T (p->a1x[MAX - 1].a1[0]);
+  T (p->a1x[MAX].a1[0]);                  /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1x[MAX + 1].a1[0]);              /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+
+  T (p->a1x[SSIZE_MAX].a1[0]);            /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1x[SIZE_MAX].a1[0]);             /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+}
+
+void f_cststring (int i)
+{
+  T (""[SSIZE_MIN]);                      /* { dg-warning "array subscript -\[0-9\]+ is below array bounds of .(const )?char *\\\[1]" "string" { xfail lp64 } } */
+  T (""[SSIZE_MIN + 1]);                  /* { dg-warning "array subscript -\[0-9\]+ is below array bounds of .(const )?char *\\\[1]" "string" } */
+  T (""[-1]);                             /* { dg-warning "array subscript -1 is below array bounds of .(const )?char *\\\[1]" "string" } */
+  T (""[0]);
+  T (""[1]);                              /* { dg-warning "array subscript 1 is above array bounds of .(const )?char *\\\[1]" "string" } */
+  T ("0"[2]);                             /* { dg-warning "array subscript 2 is above array bounds of .(const )?char *\\\[2]" "string" } */
+  T ("012"[2]);
+  T ("012"[3]);
+  T ("012"[4]);                           /* { dg-warning "array subscript 4 is above array bounds of .(const )?char *\\\[4]" "string" } */
+  T ("0123"[SSIZE_MAX]);                  /* { dg-warning "array subscript \[0-9\]+ is above array bounds of .(const )?char *\\\[5]" "string" } */
+  T ("0123"[SIZE_MAX]);                   /* { dg-warning "array subscript \[0-9\]+ is above array bounds of .(const )?char *\\\[5]" "string" } */
+}
+
+void fb_strlen (struct B *p)
+{
+#define strlen __builtin_strlen
+
+  T (strlen (&p->a1x[0].a1[2]));          /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "strlen" } */
+  T (strlen (p->a1x[0].a1 + 2));          /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "strlen" { xfail *-*-* } } */
+}
+
+
+void f_vla (unsigned n)
+{
+  char vla[n];
+
+  T (vla[SSIZE_MIN]);                     /* { dg-warning "array subscript -\[0-9\]+ is below array bounds" "vla" } */
+  T (vla[-1]);                            /* { dg-warning "array subscript -1 is below array bounds" "vla" } */
+  T (vla[0]);
+  T (vla[1]);
+  T (vla[n - 1]);
+  /* It would be nice to diagnose this. */
+  T (vla[n]);                             /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "bug 82608" { xfail *-*-*} } */
+  T (vla[SSIZE_MAX]);                     /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "vla" } */
+}
diff --git a/gcc/testsuite/gcc.dg/Warray-bounds-11.c b/gcc/testsuite/gcc.dg/Warray-bounds-11.c
index 089fa00..c9fc461 100644
--- a/gcc/testsuite/gcc.dg/Warray-bounds-11.c
+++ b/gcc/testsuite/gcc.dg/Warray-bounds-11.c
@@ -57,19 +57,19 @@ struct h3b {
 
 void foo(int (*a)[3])
 {
-	(*a)[4] = 1;	/* { dg-warning "subscript is above array bound" } */
+	(*a)[4] = 1;	/* { dg-warning "subscript 4 is above array bound" } */
 	a[0][0] = 1;	// ok
 	a[1][0] = 1;	// ok
-	a[1][4] = 1;	/* { dg-warning "subscript is above array bound" } */
+	a[1][4] = 1;	/* { dg-warning "subscript 4 is above array bound" } */
 
 	int c[3] = { 0 };
 
-	c[4] = 1;	/* { dg-warning "subscript is above array bound" } */
+	c[4] = 1;	/* { dg-warning "subscript 4 is above array bound" } */
 
-	e[4] = 1;	/* { dg-warning "subscript is above array bound" } */
+	e[4] = 1;	/* { dg-warning "subscript 4 is above array bound" } */
 
 	struct f f;
-	f.f[4] = 1;	/* { dg-warning "subscript is above array bound" } */
+	f.f[4] = 1;	/* { dg-warning "subscript 4 is above array bound" } */
 
 	struct h* h = malloc(sizeof(struct h) + 3 * sizeof(int));
 	struct h0* h0 = malloc(sizeof(struct h0) + 3 * sizeof(int));
@@ -78,15 +78,15 @@ void foo(int (*a)[3])
 
 	h->j[4] = 1;	// flexible array member
 	h0->j[4] = 1;	// zero-sized array extension
-	h1->j[4] = 1;	/* { dg-warning "subscript is above array bound" } */
-	h3->j[4] = 1;	/* { dg-warning "subscript is above array bound" } */
+	h1->j[4] = 1;	/* { dg-warning "subscript 4 is above array bound" } */
+	h3->j[4] = 1;	/* { dg-warning "subscript 4 is above array bound" } */
 
 	struct h0b* h0b = malloc(sizeof(struct h) + 3 * sizeof(int));
 	struct h1b* h1b = malloc(sizeof(struct h1b) + 3 * sizeof(int));
 	struct h3b* h3b = malloc(sizeof(struct h3b));
 //	h0b->j[4] = 1;
-	h1b->j[4] = 1;;	/* { dg-warning "subscript is above array bound" } */
-	h3b->j[4] = 1;;	/* { dg-warning "subscript is above array bound" } */
+	h1b->j[4] = 1;;	/* { dg-warning "subscript 4 is above array bound" } */
+	h3b->j[4] = 1;;	/* { dg-warning "subscript 4 is above array bound" } */
 
 	// make sure nothing gets optimized away
 	bar(*a);
diff --git a/gcc/testsuite/gcc.dg/Warray-bounds-22.c b/gcc/testsuite/gcc.dg/Warray-bounds-22.c
new file mode 100644
index 0000000..ca4b102
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/Warray-bounds-22.c
@@ -0,0 +1,95 @@
+/* PR tree-optimization/82588 - missing -Warray-bounds on an excessively
+   large index
+   { dg-do compile }
+   { dg-options "-O2 -Warray-bounds -ftrack-macro-expansion=0" }  */
+
+#define SIZE_MAX  __SIZE_MAX__
+#define SSIZE_MAX __PTRDIFF_MAX__
+#define SSIZE_MIN (-SSIZE_MAX - 1)
+
+typedef __PTRDIFF_TYPE__ ssize_t;
+typedef __SIZE_TYPE__    size_t;
+
+extern ssize_t signed_value (void)
+{
+  extern volatile ssize_t signed_value_source;
+  return signed_value_source;
+}
+
+ssize_t signed_range (ssize_t min, ssize_t max)
+{
+  ssize_t val = signed_value ();
+  return val < min || max < val ? min : val;
+}
+
+struct AX { int n; char ax[]; };
+
+struct A1 { int i; char a1[1]; };
+struct B { int i; struct A1 a1x[]; };
+
+void sink (int, ...);
+
+#define T(expr)   sink (0, (expr))
+
+void test_vla (unsigned m, unsigned n)
+{
+  char vla1[m];
+
+  T (vla1[SSIZE_MIN]);                    /* { dg-warning "array subscript -\[0-9\]+ is below array bounds" "vla" } */
+  T (vla1[-1]);                           /* { dg-warning "array subscript -1 is below array bounds" "vla" } */
+  T (vla1[0]);
+  T (vla1[1]);
+  T (vla1[m - 1]);
+  /* It would be nice to diagnose this. */
+  T (vla1[m]);                            /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "bug 82608" { xfail *-*-*} } */
+  T (vla1[SSIZE_MAX - 1]);
+  T (vla1[SSIZE_MAX]);                    /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "vla" } */
+
+  ssize_t i = signed_range (SSIZE_MAX - 1, SSIZE_MAX);
+  T (vla1[i]);
+
+  char vla2[m][n];
+
+  T (vla2[0][SSIZE_MIN]);                 /* { dg-warning "array subscript -\[0-9\]+ is below array bounds" "vla" } */
+  T (vla2[0][-1]);                        /* { dg-warning "array subscript -1 is below array bounds" "vla" } */
+  T (vla2[0][0]);
+  T (vla2[1][1]);
+  T (vla2[m - 1][n - 1]);
+  /* It would be nice to diagnose this. */
+  T (vla2[m][0]);                         /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "bug 82608" { xfail *-*-*} } */
+  T (vla2[m + 1][0]);                    /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "bug 82608" { xfail *-*-*} } */
+  T (vla2[0][n]);                         /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "bug 82608" { xfail *-*-*} } */
+  T (vla2[0][n + 1]);                     /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "bug 82608" { xfail *-*-*} } */
+  T (vla2[m][n]);                         /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "bug 82608" { xfail *-*-*} } */
+  T (vla2[m + 1][n + 1]);                 /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "bug 82608" { xfail *-*-*} } */
+
+  T (vla2[0][SSIZE_MAX]);                 /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "vla" } */
+  T (vla2[SSIZE_MAX][0]);                 /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "vla" { xfail *-*-* } } */
+  T (vla2[SSIZE_MAX][SSIZE_MAX]);         /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "vla" } */
+
+  struct S256 { char a[256]; } vla3[m];
+
+  T (vla3[SSIZE_MIN].a[0]);               /* { dg-warning "array subscript -\[0-9\]+ is below array bounds" "vla" } */
+  T (vla3[-1].a[0]);                      /* { dg-warning "array subscript -1 is below array bounds" "vla" } */
+  T (vla3[0].a[0]);
+  T (vla3[1].a[0]);
+  T (vla3[m - 1].a[0]);
+  T (vla3[SSIZE_MAX / 256 - 1].a[0]);
+  T (vla3[SSIZE_MAX / 256].a[0]);         /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "vla" } */
+
+  i = signed_range (SSIZE_MAX / 256 - 1, SSIZE_MAX);
+  T (vla3[i].a[0]);
+
+  i = signed_range (SSIZE_MAX / 256, SSIZE_MAX);
+  T (vla3[i].a[0]);                       /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "vla" } */
+
+  struct VLA { char vla[n]; } x;
+
+  T (x.vla[SSIZE_MIN]);                   /* { dg-warning "array subscript -\[0-9\]+ is below array bounds" "vla" } */
+  T (x.vla[-1]);                          /* { dg-warning "array subscript -1 is below array bounds" "vla" } */
+  T (x.vla[0]);
+  T (x.vla[1]);
+  T (x.vla[n - 1]);
+  T (x.vla[SSIZE_MAX - 1]);
+  T (x.vla[SSIZE_MAX]);                   /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "vla" } */
+}
diff --git a/gcc/testsuite/gcc.dg/pr82596.c b/gcc/testsuite/gcc.dg/pr82596.c
new file mode 100644
index 0000000..5dc67c2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr82596.c
@@ -0,0 +1,27 @@
+/* PR tree-optimization/82596 - missing -Warray-bounds on an out-of-bounds
+   index into string literal
+   { dg-do compile }
+   { dg-options "-O2 -Warray-bounds" } */
+
+#define SIZE_MAX  __SIZE_MAX__
+#define SSIZE_MAX __PTRDIFF_MAX__
+#define SSIZE_MIN (-SSIZE_MAX - 1)
+
+void sink (int, ...);
+
+#define T(arg) sink (arg)
+
+void test_cststring (int i)
+{
+  T (""[SSIZE_MIN]);                      /* { dg-warning "below array bounds" "string" { xfail lp64 } } */
+  T (""[SSIZE_MIN + 1]);                  /* { dg-warning "below array bounds" "string" } */
+  T (""[-1]);                             /* { dg-warning "below array bounds" "string" } */
+  T (""[0]);
+  T (""[1]);                              /* { dg-warning "above array bounds" "string" } */
+  T ("0"[2]);                             /* { dg-warning "above array bounds" "string" } */
+  T ("012"[2]);
+  T ("012"[3]);
+  T ("012"[4]);                           /* { dg-warning "above array bounds" "string" } */
+  T ("0123"[SSIZE_MAX]);                  /* { dg-warning "above array bounds" "string" } */
+  T ("0123"[SIZE_MAX]);                   /* { dg-warning "above array bounds" "string" } */
+}
diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index 2c86b8e..2e7c26c 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -42,6 +42,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-iterator.h"
 #include "gimple-walk.h"
 #include "tree-cfg.h"
+#include "tree-dfa.h"
 #include "tree-ssa-loop-manip.h"
 #include "tree-ssa-loop-niter.h"
 #include "tree-ssa-loop.h"
@@ -64,6 +65,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-cfgcleanup.h"
 #include "stringpool.h"
 #include "attribs.h"
+#include "builtins.h"
 
 #define VR_INITIALIZER { VR_UNDEFINED, NULL_TREE, NULL_TREE, NULL }
 
@@ -6675,26 +6677,56 @@ check_array_ref (location_t location, tree ref, bool ignore_off_by_one)
   low_sub = up_sub = TREE_OPERAND (ref, 1);
   up_bound = array_ref_up_bound (ref);
 
-  /* Can not check flexible arrays.  */
   if (!up_bound
-      || TREE_CODE (up_bound) != INTEGER_CST)
-    return;
+      || TREE_CODE (up_bound) != INTEGER_CST
+      || (warn_array_bounds < 2
+	  && array_at_struct_end_p (ref)))
+    {
+      /* Accesses to trailing arrays via pointers may access storage
+	 beyond the types array bounds.  For such arrays, or for flexible
+	 array members, as well as for other arrays of an unknown size,
+	 replace the upper bound with a more permissive one that assumes
+	 the size of the largest object is PTRDIFF_MAX.  */
+      tree eltsize = array_ref_element_size (ref);
+
+      /* FIXME: Handle VLAs.  */
+      if (TREE_CODE (eltsize) != INTEGER_CST)
+	return;
 
-  /* Accesses to trailing arrays via pointers may access storage
-     beyond the types array bounds.  */
-  if (warn_array_bounds < 2
-      && array_at_struct_end_p (ref))
-    return;
+      tree maxbound = TYPE_MAX_VALUE (ptrdiff_type_node);
+
+      up_bound_p1 = int_const_binop (TRUNC_DIV_EXPR, maxbound, eltsize);
+
+      tree arg = TREE_OPERAND (ref, 0);
+      tree_code code = TREE_CODE (arg);
+      if (code == COMPONENT_REF)
+	{
+	  HOST_WIDE_INT off;
+	  if (tree base = get_addr_base_and_unit_offset (arg, &off))
+	    {
+	      tree size = TYPE_SIZE_UNIT (TREE_TYPE (base));
+	      if (TREE_CODE (size) == INTEGER_CST)
+		up_bound_p1 = int_const_binop (MINUS_EXPR, up_bound_p1, size);
+	    }
+	}
+
+      up_bound = int_const_binop (MINUS_EXPR, up_bound_p1,
+				  build_int_cst (ptrdiff_type_node, 1));
+    }
+  else
+    up_bound_p1 = int_const_binop (PLUS_EXPR, up_bound,
+				   build_int_cst (TREE_TYPE (up_bound), 1));
 
   low_bound = array_ref_low_bound (ref);
-  up_bound_p1 = int_const_binop (PLUS_EXPR, up_bound,
-				 build_int_cst (TREE_TYPE (up_bound), 1));
+
+  tree artype = TREE_TYPE (TREE_OPERAND (ref, 0));
 
   /* Empty array.  */
   if (tree_int_cst_equal (low_bound, up_bound_p1))
     {
       warning_at (location, OPT_Warray_bounds,
-		  "array subscript is above array bounds");
+		  "array subscript %E is above array bounds of %qT",
+		  low_bound, artype);
       TREE_NO_WARNING (ref) = 1;
     }
 
@@ -6718,7 +6750,8 @@ check_array_ref (location_t location, tree ref, bool ignore_off_by_one)
           && tree_int_cst_le (low_sub, low_bound))
         {
           warning_at (location, OPT_Warray_bounds,
-		      "array subscript is outside array bounds");
+		      "array subscript [%E, %E] is outside array bounds of %qT",
+		      low_sub, up_sub, artype);
           TREE_NO_WARNING (ref) = 1;
         }
     }
@@ -6734,7 +6767,8 @@ check_array_ref (location_t location, tree ref, bool ignore_off_by_one)
 	  fprintf (dump_file, "\n");
 	}
       warning_at (location, OPT_Warray_bounds,
-		  "array subscript is above array bounds");
+		  "array subscript %E is above array bounds of %qT",
+		  up_sub, artype);
       TREE_NO_WARNING (ref) = 1;
     }
   else if (TREE_CODE (low_sub) == INTEGER_CST
@@ -6747,7 +6781,8 @@ check_array_ref (location_t location, tree ref, bool ignore_off_by_one)
 	  fprintf (dump_file, "\n");
 	}
       warning_at (location, OPT_Warray_bounds,
-		  "array subscript is below array bounds");
+		  "array subscript %E is below array bounds of %qT",
+		  low_sub, artype);
       TREE_NO_WARNING (ref) = 1;
     }
 }
@@ -6802,7 +6837,8 @@ search_for_addr_array (tree t, location_t location)
 	      fprintf (dump_file, "\n");
 	    }
 	  warning_at (location, OPT_Warray_bounds,
-		      "array subscript is below array bounds");
+		      "array subscript %wi is below array bounds of %qT",
+		      idx.to_shwi (), TREE_TYPE (tem));
 	  TREE_NO_WARNING (t) = 1;
 	}
       else if (idx > (wi::to_offset (up_bound)
@@ -6815,7 +6851,8 @@ search_for_addr_array (tree t, location_t location)
 	      fprintf (dump_file, "\n");
 	    }
 	  warning_at (location, OPT_Warray_bounds,
-		      "array subscript is above array bounds");
+		      "array subscript %wu is above array bounds of %qT",
+		      idx.to_uhwi (), TREE_TYPE (tem));
 	  TREE_NO_WARNING (t) = 1;
 	}
     }
Richard Biener Oct. 20, 2017, 8:08 a.m. UTC | #6
On Fri, Oct 20, 2017 at 1:00 AM, Martin Sebor <msebor@gmail.com> wrote:
> On 10/19/2017 02:34 AM, Richard Biener wrote:
>>
>> On Thu, Oct 19, 2017 at 1:19 AM, Martin Sebor <msebor@gmail.com> wrote:
>>>
>>> On 10/18/2017 04:48 AM, Richard Biener wrote:
>>>>
>>>>
>>>> On Wed, Oct 18, 2017 at 5:34 AM, Martin Sebor <msebor@gmail.com> wrote:
>>>>>
>>>>>
>>>>> While testing my latest -Wrestrict changes I noticed a number of
>>>>> opportunities to improve the -Warray-bounds warning.  Attached
>>>>> is a patch that implements a solution for the following subset
>>>>> of these:
>>>>>
>>>>> PR tree-optimization/82596 - missing -Warray-bounds on an out-of
>>>>>   bounds index into string literal
>>>>> PR tree-optimization/82588 - missing -Warray-bounds on an excessively
>>>>>   large index
>>>>> PR tree-optimization/82583 - missing -Warray-bounds on out-of-bounds
>>>>>   inner indices
>
>
>> I meant to use size_type_node (size_t), not sizetype.  But
>> I just checked that ptrdiff_type_node is initialized in
>> build_common_tree_nodes and thus always available.
>
>
> I see.  Using ptrdiff_type_node is preferable for the targets
> where ptrdiff_t has a greater precision than size_t (e.g., VMS).
> It makes sense now.  I should remember to change all the other
> places where I introduced ssizetype to use ptrdiff_type_node.
>
>>
>>> As an aside, at some point I would like to get away from a type
>>> based limit in all these warnings and instead use one that can
>>> be controlled by an option so that a user can impose a lower limit
>>> on the maximum size of an object and have all size-related warnings
>>> (and perhaps even optimizations) enforce it and benefit from it.
>>
>>
>> You could add a --param that is initialized from ptrdiff_type_node.
>
>
> Yes, that's an option to consider.  Thanks.
>
>
>>
>>>> +      tree arg = TREE_OPERAND (ref, 0);
>>>> +      tree_code code = TREE_CODE (arg);
>>>> +      if (code == COMPONENT_REF)
>>>> +       {
>>>> +         HOST_WIDE_INT off;
>>>> +         if (tree base = get_addr_base_and_unit_offset (ref, &off))
>>>> +           up_bound_p1 = fold_build2 (MINUS_EXPR, ssizetype,
>>>> up_bound_p1,
>>>> +                                      TYPE_SIZE_UNIT (TREE_TYPE
>>>> (base)));
>>>> +         else
>>>> +           return;
>>>>
>>>> so this gives up on a.b[i].c.d[k] (ok, array_at_struct_end_p will be
>>>> false).
>>>> simply not subtracting anyhing instead of returning would be
>>>> conservatively
>>>> correct, no?  Likewise subtracting the offset of the array for all
>>>> "previous"
>>>> variably indexed components with assuming the lowest value for the
>>>> index.
>>>> But as above I think compensating for the offset of the array within the
>>>> object
>>>> is academic ... ;)
>>>
>>>
>>>
>>> I was going to say yes (it gives up) but on second thought I don't
>>> think it does.  Only the major index can be unbounded and the code
>>> does consider the size of the sub-array when checking the major
>>> index.  So, IIUC, I think this works correctly as is (*).  What
>>> doesn't work is VLAs but those are a separate problem.  Let me
>>> know if I misunderstood your question.
>>
>>
>> get_addr_base_and_unit_offset will return NULL if there's any variable
>> component in 'ref'.  So as written it seems to be dead code (you
>> want to pass 'arg'?)
>
>
> Sorry, I'm not sure I understand what you mean.  What do you think
> is dead code?  The call to get_addr_base_and_unit_offset() is also
> made for an array of unspecified bound (up_bound is null) and for
> an array at the end of a struct.  For those the function returns
> non-null, and for the others (arrays of runtime bound) it returns
> null.  (I passed arg instead of ref but I see no difference in
> my tests.)

If you pass a.b.c[i] it will return NULL, if you pass a.b.c ('arg') it will
return the offset of 'c'.  If you pass a.b[j].c it will still return NULL.
You could use get_ref_base_and_extent which will return the offset
of a.b[0].c in this case and sets max_size != size - but you are only
interested in offset.  The disadvantage of get_ref_base_and_extent
is it returns offset in bits thus if the offset is too large for a HWI
you'll instead get offset == 0 and max_size == -1.

Thus I'm saying this is dead code for variable array accesses
(even for the array you are warning about).  Yes, for constant index
and at-struct-end you'll get sth, but the warning is in VRP because
of variable indexes.

So I suggest to pass 'arg' and use get_ref_base_and_extent
for some extra precision (and possible lossage for very very large
structures).

Thus instead of

+      tree maxbound = TYPE_MAX_VALUE (ptrdiff_type_node);
+
+      up_bound_p1 = int_const_binop (TRUNC_DIV_EXPR, maxbound, eltsize);
+
+      tree arg = TREE_OPERAND (ref, 0);
+      tree_code code = TREE_CODE (arg);
+      if (code == COMPONENT_REF)
+       {
+         HOST_WIDE_INT off;
+         if (tree base = get_addr_base_and_unit_offset (arg, &off))
+           {
+             tree size = TYPE_SIZE_UNIT (TREE_TYPE (base));

(not sure why you're subtracting the size of the base here?!  I
thought you subtract off!)

+             if (TREE_CODE (size) == INTEGER_CST)
+               up_bound_p1 = int_const_binop (MINUS_EXPR, up_bound_p1, size);
+           }

do

    if (handled_component_p (arg))
      {
         HOST_WIDE_INT bitoff, bitsize, bitmax_size;
         bool reverse;
         get_ref_base_and_extent (arg, &bitoff, &bitsize,
&bitmax_size, &reverse);
         up_bound_p1 = int_const_binop (MINUS_EXPR, up_bound_p1,
                              build_int_cst (TREE_TYPE (up_bound_p1),
bitoff / BITS_PER_UNIT));
      }

ok with that change.

Richard.

>
>>
>> I was asking you to remove the 'else return' because w/o subtracting
>> the upper bound is just more conservative.
>
>
> Sure, that sounds good.
>
> Attached is an updated patch.
>
> Thanks
> Martin
Martin Sebor Oct. 20, 2017, 3:43 p.m. UTC | #7
On 10/20/2017 02:08 AM, Richard Biener wrote:
> On Fri, Oct 20, 2017 at 1:00 AM, Martin Sebor <msebor@gmail.com> wrote:
>> On 10/19/2017 02:34 AM, Richard Biener wrote:
>>>
>>> On Thu, Oct 19, 2017 at 1:19 AM, Martin Sebor <msebor@gmail.com> wrote:
>>>>
>>>> On 10/18/2017 04:48 AM, Richard Biener wrote:
>>>>>
>>>>>
>>>>> On Wed, Oct 18, 2017 at 5:34 AM, Martin Sebor <msebor@gmail.com> wrote:
>>>>>>
>>>>>>
>>>>>> While testing my latest -Wrestrict changes I noticed a number of
>>>>>> opportunities to improve the -Warray-bounds warning.  Attached
>>>>>> is a patch that implements a solution for the following subset
>>>>>> of these:
>>>>>>
>>>>>> PR tree-optimization/82596 - missing -Warray-bounds on an out-of
>>>>>>   bounds index into string literal
>>>>>> PR tree-optimization/82588 - missing -Warray-bounds on an excessively
>>>>>>   large index
>>>>>> PR tree-optimization/82583 - missing -Warray-bounds on out-of-bounds
>>>>>>   inner indices
>>
>>
>>> I meant to use size_type_node (size_t), not sizetype.  But
>>> I just checked that ptrdiff_type_node is initialized in
>>> build_common_tree_nodes and thus always available.
>>
>>
>> I see.  Using ptrdiff_type_node is preferable for the targets
>> where ptrdiff_t has a greater precision than size_t (e.g., VMS).
>> It makes sense now.  I should remember to change all the other
>> places where I introduced ssizetype to use ptrdiff_type_node.
>>
>>>
>>>> As an aside, at some point I would like to get away from a type
>>>> based limit in all these warnings and instead use one that can
>>>> be controlled by an option so that a user can impose a lower limit
>>>> on the maximum size of an object and have all size-related warnings
>>>> (and perhaps even optimizations) enforce it and benefit from it.
>>>
>>>
>>> You could add a --param that is initialized from ptrdiff_type_node.
>>
>>
>> Yes, that's an option to consider.  Thanks.
>>
>>
>>>
>>>>> +      tree arg = TREE_OPERAND (ref, 0);
>>>>> +      tree_code code = TREE_CODE (arg);
>>>>> +      if (code == COMPONENT_REF)
>>>>> +       {
>>>>> +         HOST_WIDE_INT off;
>>>>> +         if (tree base = get_addr_base_and_unit_offset (ref, &off))
>>>>> +           up_bound_p1 = fold_build2 (MINUS_EXPR, ssizetype,
>>>>> up_bound_p1,
>>>>> +                                      TYPE_SIZE_UNIT (TREE_TYPE
>>>>> (base)));
>>>>> +         else
>>>>> +           return;
>>>>>
>>>>> so this gives up on a.b[i].c.d[k] (ok, array_at_struct_end_p will be
>>>>> false).
>>>>> simply not subtracting anyhing instead of returning would be
>>>>> conservatively
>>>>> correct, no?  Likewise subtracting the offset of the array for all
>>>>> "previous"
>>>>> variably indexed components with assuming the lowest value for the
>>>>> index.
>>>>> But as above I think compensating for the offset of the array within the
>>>>> object
>>>>> is academic ... ;)
>>>>
>>>>
>>>>
>>>> I was going to say yes (it gives up) but on second thought I don't
>>>> think it does.  Only the major index can be unbounded and the code
>>>> does consider the size of the sub-array when checking the major
>>>> index.  So, IIUC, I think this works correctly as is (*).  What
>>>> doesn't work is VLAs but those are a separate problem.  Let me
>>>> know if I misunderstood your question.
>>>
>>>
>>> get_addr_base_and_unit_offset will return NULL if there's any variable
>>> component in 'ref'.  So as written it seems to be dead code (you
>>> want to pass 'arg'?)
>>
>>
>> Sorry, I'm not sure I understand what you mean.  What do you think
>> is dead code?  The call to get_addr_base_and_unit_offset() is also
>> made for an array of unspecified bound (up_bound is null) and for
>> an array at the end of a struct.  For those the function returns
>> non-null, and for the others (arrays of runtime bound) it returns
>> null.  (I passed arg instead of ref but I see no difference in
>> my tests.)
>
> If you pass a.b.c[i] it will return NULL, if you pass a.b.c ('arg') it will
> return the offset of 'c'.  If you pass a.b[j].c it will still return NULL.
> You could use get_ref_base_and_extent which will return the offset
> of a.b[0].c in this case and sets max_size != size - but you are only
> interested in offset.  The disadvantage of get_ref_base_and_extent
> is it returns offset in bits thus if the offset is too large for a HWI
> you'll instead get offset == 0 and max_size == -1.
>
> Thus I'm saying this is dead code for variable array accesses
> (even for the array you are warning about).  Yes, for constant index
> and at-struct-end you'll get sth, but the warning is in VRP because
> of variable indexes.
>
> So I suggest to pass 'arg' and use get_ref_base_and_extent
> for some extra precision (and possible lossage for very very large
> structures).

Computing bit offsets defeats the out-of-bounds flexible array
index detection because the computation overflows (the function
sets the offset to zero).  I'll go ahead with the first version
unless you have a different suggestion.

Thanks
Martin

>
> Thus instead of
>
> +      tree maxbound = TYPE_MAX_VALUE (ptrdiff_type_node);
> +
> +      up_bound_p1 = int_const_binop (TRUNC_DIV_EXPR, maxbound, eltsize);
> +
> +      tree arg = TREE_OPERAND (ref, 0);
> +      tree_code code = TREE_CODE (arg);
> +      if (code == COMPONENT_REF)
> +       {
> +         HOST_WIDE_INT off;
> +         if (tree base = get_addr_base_and_unit_offset (arg, &off))
> +           {
> +             tree size = TYPE_SIZE_UNIT (TREE_TYPE (base));
>
> (not sure why you're subtracting the size of the base here?!  I
> thought you subtract off!)
>
> +             if (TREE_CODE (size) == INTEGER_CST)
> +               up_bound_p1 = int_const_binop (MINUS_EXPR, up_bound_p1, size);
> +           }
>
> do
>
>     if (handled_component_p (arg))
>       {
>          HOST_WIDE_INT bitoff, bitsize, bitmax_size;
>          bool reverse;
>          get_ref_base_and_extent (arg, &bitoff, &bitsize,
> &bitmax_size, &reverse);
>          up_bound_p1 = int_const_binop (MINUS_EXPR, up_bound_p1,
>                               build_int_cst (TREE_TYPE (up_bound_p1),
> bitoff / BITS_PER_UNIT));
>       }
>
> ok with that change.
>
> Richard.
>
>>
>>>
>>> I was asking you to remove the 'else return' because w/o subtracting
>>> the upper bound is just more conservative.
>>
>>
>> Sure, that sounds good.
>>
>> Attached is an updated patch.
>>
>> Thanks
>> Martin
Richard Biener Oct. 20, 2017, 3:57 p.m. UTC | #8
On October 20, 2017 5:43:40 PM GMT+02:00, Martin Sebor <msebor@gmail.com> wrote:
>On 10/20/2017 02:08 AM, Richard Biener wrote:
>> On Fri, Oct 20, 2017 at 1:00 AM, Martin Sebor <msebor@gmail.com>
>wrote:
>>> On 10/19/2017 02:34 AM, Richard Biener wrote:
>>>>
>>>> On Thu, Oct 19, 2017 at 1:19 AM, Martin Sebor <msebor@gmail.com>
>wrote:
>>>>>
>>>>> On 10/18/2017 04:48 AM, Richard Biener wrote:
>>>>>>
>>>>>>
>>>>>> On Wed, Oct 18, 2017 at 5:34 AM, Martin Sebor <msebor@gmail.com>
>wrote:
>>>>>>>
>>>>>>>
>>>>>>> While testing my latest -Wrestrict changes I noticed a number of
>>>>>>> opportunities to improve the -Warray-bounds warning.  Attached
>>>>>>> is a patch that implements a solution for the following subset
>>>>>>> of these:
>>>>>>>
>>>>>>> PR tree-optimization/82596 - missing -Warray-bounds on an out-of
>>>>>>>   bounds index into string literal
>>>>>>> PR tree-optimization/82588 - missing -Warray-bounds on an
>excessively
>>>>>>>   large index
>>>>>>> PR tree-optimization/82583 - missing -Warray-bounds on
>out-of-bounds
>>>>>>>   inner indices
>>>
>>>
>>>> I meant to use size_type_node (size_t), not sizetype.  But
>>>> I just checked that ptrdiff_type_node is initialized in
>>>> build_common_tree_nodes and thus always available.
>>>
>>>
>>> I see.  Using ptrdiff_type_node is preferable for the targets
>>> where ptrdiff_t has a greater precision than size_t (e.g., VMS).
>>> It makes sense now.  I should remember to change all the other
>>> places where I introduced ssizetype to use ptrdiff_type_node.
>>>
>>>>
>>>>> As an aside, at some point I would like to get away from a type
>>>>> based limit in all these warnings and instead use one that can
>>>>> be controlled by an option so that a user can impose a lower limit
>>>>> on the maximum size of an object and have all size-related
>warnings
>>>>> (and perhaps even optimizations) enforce it and benefit from it.
>>>>
>>>>
>>>> You could add a --param that is initialized from ptrdiff_type_node.
>>>
>>>
>>> Yes, that's an option to consider.  Thanks.
>>>
>>>
>>>>
>>>>>> +      tree arg = TREE_OPERAND (ref, 0);
>>>>>> +      tree_code code = TREE_CODE (arg);
>>>>>> +      if (code == COMPONENT_REF)
>>>>>> +       {
>>>>>> +         HOST_WIDE_INT off;
>>>>>> +         if (tree base = get_addr_base_and_unit_offset (ref,
>&off))
>>>>>> +           up_bound_p1 = fold_build2 (MINUS_EXPR, ssizetype,
>>>>>> up_bound_p1,
>>>>>> +                                      TYPE_SIZE_UNIT (TREE_TYPE
>>>>>> (base)));
>>>>>> +         else
>>>>>> +           return;
>>>>>>
>>>>>> so this gives up on a.b[i].c.d[k] (ok, array_at_struct_end_p will
>be
>>>>>> false).
>>>>>> simply not subtracting anyhing instead of returning would be
>>>>>> conservatively
>>>>>> correct, no?  Likewise subtracting the offset of the array for
>all
>>>>>> "previous"
>>>>>> variably indexed components with assuming the lowest value for
>the
>>>>>> index.
>>>>>> But as above I think compensating for the offset of the array
>within the
>>>>>> object
>>>>>> is academic ... ;)
>>>>>
>>>>>
>>>>>
>>>>> I was going to say yes (it gives up) but on second thought I don't
>>>>> think it does.  Only the major index can be unbounded and the code
>>>>> does consider the size of the sub-array when checking the major
>>>>> index.  So, IIUC, I think this works correctly as is (*).  What
>>>>> doesn't work is VLAs but those are a separate problem.  Let me
>>>>> know if I misunderstood your question.
>>>>
>>>>
>>>> get_addr_base_and_unit_offset will return NULL if there's any
>variable
>>>> component in 'ref'.  So as written it seems to be dead code (you
>>>> want to pass 'arg'?)
>>>
>>>
>>> Sorry, I'm not sure I understand what you mean.  What do you think
>>> is dead code?  The call to get_addr_base_and_unit_offset() is also
>>> made for an array of unspecified bound (up_bound is null) and for
>>> an array at the end of a struct.  For those the function returns
>>> non-null, and for the others (arrays of runtime bound) it returns
>>> null.  (I passed arg instead of ref but I see no difference in
>>> my tests.)
>>
>> If you pass a.b.c[i] it will return NULL, if you pass a.b.c ('arg')
>it will
>> return the offset of 'c'.  If you pass a.b[j].c it will still return
>NULL.
>> You could use get_ref_base_and_extent which will return the offset
>> of a.b[0].c in this case and sets max_size != size - but you are only
>> interested in offset.  The disadvantage of get_ref_base_and_extent
>> is it returns offset in bits thus if the offset is too large for a
>HWI
>> you'll instead get offset == 0 and max_size == -1.
>>
>> Thus I'm saying this is dead code for variable array accesses
>> (even for the array you are warning about).  Yes, for constant index
>> and at-struct-end you'll get sth, but the warning is in VRP because
>> of variable indexes.
>>
>> So I suggest to pass 'arg' and use get_ref_base_and_extent
>> for some extra precision (and possible lossage for very very large
>> structures).
>
>Computing bit offsets defeats the out-of-bounds flexible array
>index detection because the computation overflows (the function
>sets the offset to zero).

It shouldn't if you pass arg rather than ref. 

  I'll go ahead with the first version
>unless you have a different suggestion.
>
>Thanks
>Martin
>
>>
>> Thus instead of
>>
>> +      tree maxbound = TYPE_MAX_VALUE (ptrdiff_type_node);
>> +
>> +      up_bound_p1 = int_const_binop (TRUNC_DIV_EXPR, maxbound,
>eltsize);
>> +
>> +      tree arg = TREE_OPERAND (ref, 0);
>> +      tree_code code = TREE_CODE (arg);
>> +      if (code == COMPONENT_REF)
>> +       {
>> +         HOST_WIDE_INT off;
>> +         if (tree base = get_addr_base_and_unit_offset (arg, &off))
>> +           {
>> +             tree size = TYPE_SIZE_UNIT (TREE_TYPE (base));
>>
>> (not sure why you're subtracting the size of the base here?!  I
>> thought you subtract off!)
>>
>> +             if (TREE_CODE (size) == INTEGER_CST)
>> +               up_bound_p1 = int_const_binop (MINUS_EXPR,
>up_bound_p1, size);
>> +           }
>>
>> do
>>
>>     if (handled_component_p (arg))
>>       {
>>          HOST_WIDE_INT bitoff, bitsize, bitmax_size;
>>          bool reverse;
>>          get_ref_base_and_extent (arg, &bitoff, &bitsize,
>> &bitmax_size, &reverse);
>>          up_bound_p1 = int_const_binop (MINUS_EXPR, up_bound_p1,
>>                               build_int_cst (TREE_TYPE (up_bound_p1),
>> bitoff / BITS_PER_UNIT));
>>       }
>>
>> ok with that change.
>>
>> Richard.
>>
>>>
>>>>
>>>> I was asking you to remove the 'else return' because w/o
>subtracting
>>>> the upper bound is just more conservative.
>>>
>>>
>>> Sure, that sounds good.
>>>
>>> Attached is an updated patch.
>>>
>>> Thanks
>>> Martin
Martin Sebor Oct. 29, 2017, 4:15 p.m. UTC | #9
Ping -- please see my reply below.

On 10/20/2017 09:57 AM, Richard Biener wrote:
>>>>> get_addr_base_and_unit_offset will return NULL if there's any
>> variable
>>>>> component in 'ref'.  So as written it seems to be dead code (you
>>>>> want to pass 'arg'?)
>>>>
>>>>
>>>> Sorry, I'm not sure I understand what you mean.  What do you think
>>>> is dead code?  The call to get_addr_base_and_unit_offset() is also
>>>> made for an array of unspecified bound (up_bound is null) and for
>>>> an array at the end of a struct.  For those the function returns
>>>> non-null, and for the others (arrays of runtime bound) it returns
>>>> null.  (I passed arg instead of ref but I see no difference in
>>>> my tests.)
>>>
>>> If you pass a.b.c[i] it will return NULL, if you pass a.b.c ('arg')
>> it will
>>> return the offset of 'c'.  If you pass a.b[j].c it will still return
>> NULL.
>>> You could use get_ref_base_and_extent which will return the offset
>>> of a.b[0].c in this case and sets max_size != size - but you are only
>>> interested in offset.  The disadvantage of get_ref_base_and_extent
>>> is it returns offset in bits thus if the offset is too large for a
>> HWI
>>> you'll instead get offset == 0 and max_size == -1.
>>>
>>> Thus I'm saying this is dead code for variable array accesses
>>> (even for the array you are warning about).  Yes, for constant index
>>> and at-struct-end you'll get sth, but the warning is in VRP because
>>> of variable indexes.
>>>
>>> So I suggest to pass 'arg' and use get_ref_base_and_extent
>>> for some extra precision (and possible lossage for very very large
>>> structures).
>>
>> Computing bit offsets defeats the out-of-bounds flexible array
>> index detection because the computation overflows (the function
>> sets the offset to zero).
>
> It shouldn't if you pass arg rather than ref.

I suspect you missed my reply in IRC where I confirmed that this
approach doesn't work for the reason you yourself mentioned above:

 >>> The disadvantage of get_ref_base_and_extent
 >>> is it returns offset in bits thus if the offset is too large
 >>> for a HWI you'll instead get offset == 0 and max_size == -1.

This means that using the function you suggest would either prevent
detecting the out-of-bounds indices that overflow due to the bit
offset, thus largely defeating the purpose of the patch (to detect
excessively large indices), or not printing the value of the out-of
bounds index in the warning in this case, which would in turn mean
further changes to the rest of the logic.

>   I'll go ahead with the first version
>> unless you have a different suggestion.

But before I commit the patch as is I want to double-check to make
sure you don't have further suggestions.

Martin

PS For reference the last version of the patch is here:

   https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01304.html
Richard Biener Oct. 30, 2017, 12:02 p.m. UTC | #10
On Sun, Oct 29, 2017 at 5:15 PM, Martin Sebor <msebor@gmail.com> wrote:
> Ping -- please see my reply below.
>
>
> On 10/20/2017 09:57 AM, Richard Biener wrote:
>>>>>>
>>>>>> get_addr_base_and_unit_offset will return NULL if there's any
>>>
>>> variable
>>>>>>
>>>>>> component in 'ref'.  So as written it seems to be dead code (you
>>>>>> want to pass 'arg'?)
>>>>>
>>>>>
>>>>>
>>>>> Sorry, I'm not sure I understand what you mean.  What do you think
>>>>> is dead code?  The call to get_addr_base_and_unit_offset() is also
>>>>> made for an array of unspecified bound (up_bound is null) and for
>>>>> an array at the end of a struct.  For those the function returns
>>>>> non-null, and for the others (arrays of runtime bound) it returns
>>>>> null.  (I passed arg instead of ref but I see no difference in
>>>>> my tests.)
>>>>
>>>>
>>>> If you pass a.b.c[i] it will return NULL, if you pass a.b.c ('arg')
>>>
>>> it will
>>>>
>>>> return the offset of 'c'.  If you pass a.b[j].c it will still return
>>>
>>> NULL.
>>>>
>>>> You could use get_ref_base_and_extent which will return the offset
>>>> of a.b[0].c in this case and sets max_size != size - but you are only
>>>> interested in offset.  The disadvantage of get_ref_base_and_extent
>>>> is it returns offset in bits thus if the offset is too large for a
>>>
>>> HWI
>>>>
>>>> you'll instead get offset == 0 and max_size == -1.
>>>>
>>>> Thus I'm saying this is dead code for variable array accesses
>>>> (even for the array you are warning about).  Yes, for constant index
>>>> and at-struct-end you'll get sth, but the warning is in VRP because
>>>> of variable indexes.
>>>>
>>>> So I suggest to pass 'arg' and use get_ref_base_and_extent
>>>> for some extra precision (and possible lossage for very very large
>>>> structures).
>>>
>>>
>>> Computing bit offsets defeats the out-of-bounds flexible array
>>> index detection because the computation overflows (the function
>>> sets the offset to zero).
>>
>>
>> It shouldn't if you pass arg rather than ref.
>
>
> I suspect you missed my reply in IRC where I confirmed that this
> approach doesn't work for the reason you yourself mentioned above:
>
>>>> The disadvantage of get_ref_base_and_extent
>>>> is it returns offset in bits thus if the offset is too large
>>>> for a HWI you'll instead get offset == 0 and max_size == -1.
>
> This means that using the function you suggest would either prevent
> detecting the out-of-bounds indices that overflow due to the bit
> offset, thus largely defeating the purpose of the patch (to detect
> excessively large indices), or not printing the value of the out-of
> bounds index in the warning in this case, which would in turn mean
> further changes to the rest of the logic.

Well, it would not handle the case of

a.b[offset-bringing-you-bitoffset-overflow].c[i]

but it would handle (compared to your version of the patch)

a.b[j].c[i]

with i being the unreasonably large offset.  That's beause we
look at the bit offset of a.b[j].c thus the _start_ of the array.

I still find the half-address-space case totally pointless to warn about
(even more to get "precise" here by subtracting the offset of the array).
There's not a single testcase that looks motivating to me (like an
error with some signed->unsigned conversion and then GCC magically
figuring out a range you'd warn for)

The diagnostic wording changes like

@@ -6718,7 +6750,8 @@ check_array_ref (location_t location, tree ref,
bool ignore_off_by_one)
           && tree_int_cst_le (low_sub, low_bound))
         {
           warning_at (location, OPT_Warray_bounds,
-      "array subscript is outside array bounds");
+      "array subscript [%E, %E] is outside array bounds of %qT",
+      low_sub, up_sub, artype);
           TREE_NO_WARNING (ref) = 1;
         }
     }

are ok for trunk.

Thanks,
Richard.


>>   I'll go ahead with the first version
>>>
>>> unless you have a different suggestion.
>
>
> But before I commit the patch as is I want to double-check to make
> sure you don't have further suggestions.
>
> Martin
>
> PS For reference the last version of the patch is here:
>
>   https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01304.html
Jeff Law Nov. 6, 2017, 6:41 p.m. UTC | #11
On 10/29/2017 10:15 AM, Martin Sebor wrote:
> Ping -- please see my reply below.
> 
> On 10/20/2017 09:57 AM, Richard Biener wrote:
>>>>>> get_addr_base_and_unit_offset will return NULL if there's any
>>> variable
>>>>>> component in 'ref'.  So as written it seems to be dead code (you
>>>>>> want to pass 'arg'?)
>>>>>
>>>>>
>>>>> Sorry, I'm not sure I understand what you mean.  What do you think
>>>>> is dead code?  The call to get_addr_base_and_unit_offset() is also
>>>>> made for an array of unspecified bound (up_bound is null) and for
>>>>> an array at the end of a struct.  For those the function returns
>>>>> non-null, and for the others (arrays of runtime bound) it returns
>>>>> null.  (I passed arg instead of ref but I see no difference in
>>>>> my tests.)
>>>>
>>>> If you pass a.b.c[i] it will return NULL, if you pass a.b.c ('arg')
>>> it will
>>>> return the offset of 'c'.  If you pass a.b[j].c it will still return
>>> NULL.
>>>> You could use get_ref_base_and_extent which will return the offset
>>>> of a.b[0].c in this case and sets max_size != size - but you are only
>>>> interested in offset.  The disadvantage of get_ref_base_and_extent
>>>> is it returns offset in bits thus if the offset is too large for a
>>> HWI
>>>> you'll instead get offset == 0 and max_size == -1.
>>>>
>>>> Thus I'm saying this is dead code for variable array accesses
>>>> (even for the array you are warning about).  Yes, for constant index
>>>> and at-struct-end you'll get sth, but the warning is in VRP because
>>>> of variable indexes.
>>>>
>>>> So I suggest to pass 'arg' and use get_ref_base_and_extent
>>>> for some extra precision (and possible lossage for very very large
>>>> structures).
>>>
>>> Computing bit offsets defeats the out-of-bounds flexible array
>>> index detection because the computation overflows (the function
>>> sets the offset to zero).
>>
>> It shouldn't if you pass arg rather than ref.
> 
> I suspect you missed my reply in IRC where I confirmed that this
> approach doesn't work for the reason you yourself mentioned above:
> 
>>>> The disadvantage of get_ref_base_and_extent
>>>> is it returns offset in bits thus if the offset is too large
>>>> for a HWI you'll instead get offset == 0 and max_size == -1.
> 
> This means that using the function you suggest would either prevent
> detecting the out-of-bounds indices that overflow due to the bit
> offset, thus largely defeating the purpose of the patch (to detect
> excessively large indices), or not printing the value of the out-of
> bounds index in the warning in this case, which would in turn mean
> further changes to the rest of the logic.
I think Richi's point is that it's more important (in his opinion) to
handle the varying objects within an access like a.b.c[i] than to handle
cases that would overflow when converted to bits.  Thus he'd prefer to
use get_ref_base_and_extent over get_addr_base_and_unit_offset.

I wonder if a hybrid approach here would work.

ie, use get_ref_base_and_extent per Richi's request.  When that returns
an max_size of -1, then call get_addr_base_and_unit_offset and if it
returns that the offset is huge, then warn without any deeper analysis.

Yea, this could trip a false positive if someone makes an array that is
half the address space (give or take), but that's probably not at all
common.  Whereas the additional cases handled by get_ref_base_and_extent
are perhaps more useful.

Thoughts?

jeff
Martin Sebor Nov. 7, 2017, 2:54 a.m. UTC | #12
On 10/30/2017 06:02 AM, Richard Biener wrote:
> On Sun, Oct 29, 2017 at 5:15 PM, Martin Sebor <msebor@gmail.com> wrote:
>> Ping -- please see my reply below.
>>
>>
>> On 10/20/2017 09:57 AM, Richard Biener wrote:
>>>>>>>
>>>>>>> get_addr_base_and_unit_offset will return NULL if there's any
>>>>
>>>> variable
>>>>>>>
>>>>>>> component in 'ref'.  So as written it seems to be dead code (you
>>>>>>> want to pass 'arg'?)
>>>>>>
>>>>>>
>>>>>>
>>>>>> Sorry, I'm not sure I understand what you mean.  What do you think
>>>>>> is dead code?  The call to get_addr_base_and_unit_offset() is also
>>>>>> made for an array of unspecified bound (up_bound is null) and for
>>>>>> an array at the end of a struct.  For those the function returns
>>>>>> non-null, and for the others (arrays of runtime bound) it returns
>>>>>> null.  (I passed arg instead of ref but I see no difference in
>>>>>> my tests.)
>>>>>
>>>>>
>>>>> If you pass a.b.c[i] it will return NULL, if you pass a.b.c ('arg')
>>>>
>>>> it will
>>>>>
>>>>> return the offset of 'c'.  If you pass a.b[j].c it will still return
>>>>
>>>> NULL.
>>>>>
>>>>> You could use get_ref_base_and_extent which will return the offset
>>>>> of a.b[0].c in this case and sets max_size != size - but you are only
>>>>> interested in offset.  The disadvantage of get_ref_base_and_extent
>>>>> is it returns offset in bits thus if the offset is too large for a
>>>>
>>>> HWI
>>>>>
>>>>> you'll instead get offset == 0 and max_size == -1.
>>>>>
>>>>> Thus I'm saying this is dead code for variable array accesses
>>>>> (even for the array you are warning about).  Yes, for constant index
>>>>> and at-struct-end you'll get sth, but the warning is in VRP because
>>>>> of variable indexes.
>>>>>
>>>>> So I suggest to pass 'arg' and use get_ref_base_and_extent
>>>>> for some extra precision (and possible lossage for very very large
>>>>> structures).
>>>>
>>>>
>>>> Computing bit offsets defeats the out-of-bounds flexible array
>>>> index detection because the computation overflows (the function
>>>> sets the offset to zero).
>>>
>>>
>>> It shouldn't if you pass arg rather than ref.
>>
>>
>> I suspect you missed my reply in IRC where I confirmed that this
>> approach doesn't work for the reason you yourself mentioned above:
>>
>>>>> The disadvantage of get_ref_base_and_extent
>>>>> is it returns offset in bits thus if the offset is too large
>>>>> for a HWI you'll instead get offset == 0 and max_size == -1.
>>
>> This means that using the function you suggest would either prevent
>> detecting the out-of-bounds indices that overflow due to the bit
>> offset, thus largely defeating the purpose of the patch (to detect
>> excessively large indices), or not printing the value of the out-of
>> bounds index in the warning in this case, which would in turn mean
>> further changes to the rest of the logic.
>
> Well, it would not handle the case of
>
> a.b[offset-bringing-you-bitoffset-overflow].c[i]
>
> but it would handle (compared to your version of the patch)
>
> a.b[j].c[i]
>
> with i being the unreasonably large offset.  That's beause we
> look at the bit offset of a.b[j].c thus the _start_ of the array.

AFAICS, using get_ref_base_and_extent only results in missing a set
of cases that get_addr_base_and_unit_offset doesn't.  It's entirely
possible I'm missing something but I can find no other difference
or construct a test case where an out-of-bounds index is detected
with the former that isn't with the latter.  Here's an example:

   #define MAX (__PTRDIFF_MAX__ - 256)

   struct A { char a[256]; char ax[]; };

   int f (struct A *p, unsigned long i)
   {
     if (i < MAX + 1)
       i = MAX + 1;     // bug: sizeof *p + i > PTRDIFF_MAX

     return p->ax[i];   // missing -Warray-bounds
   }

This is because the offset of the array returned by
get_ref_base_and_extent when the bit offset computation overflows
is reset to zero.  So the function seems strictly worse in this
context that the other.   Am I missing something that makes this
a worthwhile tradeoff?  (If so, can you please show a test case?)

> I still find the half-address-space case totally pointless to warn about
> (even more to get "precise" here by subtracting the offset of the array).
> There's not a single testcase that looks motivating to me (like an
> error with some signed->unsigned conversion and then GCC magically
> figuring out a range you'd warn for)

-Warray-bounds is documented to

   ...warn about subscripts to arrays that are always out of bounds.

Excessively large indices that would imply an object in excess of
PTRDIFF_MAX bytes are always out of bounds, so expecting them to
be diagnosed consistently, even in corner cases, is in line with
the documentation (and presumably also the intent) of the warning.

Other compilers (Clang and Intel ICC) already detect more of these
kinds of problems than GCC does.  In addition, C++ requires them
to be diagnosed in constexpr contexts.  Improving GCC to match and
even exceed these other compilers is simple and inexpensive, and
makes GCC a better implementation.  I welcome suggestions to
improve the patch (or submit a follow up one) and I'm open to
arguments that one solution is superior than another but I don't
understand your objection to a working solution on the grounds
that it handles corner cases.  All bugs are corner cases that
someone didn't get right.

Martin
Martin Sebor Nov. 7, 2017, 3:23 a.m. UTC | #13
On 11/06/2017 11:41 AM, Jeff Law wrote:
> On 10/29/2017 10:15 AM, Martin Sebor wrote:
>> Ping -- please see my reply below.
>>
>> On 10/20/2017 09:57 AM, Richard Biener wrote:
>>>>>>> get_addr_base_and_unit_offset will return NULL if there's any
>>>> variable
>>>>>>> component in 'ref'.  So as written it seems to be dead code (you
>>>>>>> want to pass 'arg'?)
>>>>>>
>>>>>>
>>>>>> Sorry, I'm not sure I understand what you mean.  What do you think
>>>>>> is dead code?  The call to get_addr_base_and_unit_offset() is also
>>>>>> made for an array of unspecified bound (up_bound is null) and for
>>>>>> an array at the end of a struct.  For those the function returns
>>>>>> non-null, and for the others (arrays of runtime bound) it returns
>>>>>> null.  (I passed arg instead of ref but I see no difference in
>>>>>> my tests.)
>>>>>
>>>>> If you pass a.b.c[i] it will return NULL, if you pass a.b.c ('arg')
>>>> it will
>>>>> return the offset of 'c'.  If you pass a.b[j].c it will still return
>>>> NULL.
>>>>> You could use get_ref_base_and_extent which will return the offset
>>>>> of a.b[0].c in this case and sets max_size != size - but you are only
>>>>> interested in offset.  The disadvantage of get_ref_base_and_extent
>>>>> is it returns offset in bits thus if the offset is too large for a
>>>> HWI
>>>>> you'll instead get offset == 0 and max_size == -1.
>>>>>
>>>>> Thus I'm saying this is dead code for variable array accesses
>>>>> (even for the array you are warning about).  Yes, for constant index
>>>>> and at-struct-end you'll get sth, but the warning is in VRP because
>>>>> of variable indexes.
>>>>>
>>>>> So I suggest to pass 'arg' and use get_ref_base_and_extent
>>>>> for some extra precision (and possible lossage for very very large
>>>>> structures).
>>>>
>>>> Computing bit offsets defeats the out-of-bounds flexible array
>>>> index detection because the computation overflows (the function
>>>> sets the offset to zero).
>>>
>>> It shouldn't if you pass arg rather than ref.
>>
>> I suspect you missed my reply in IRC where I confirmed that this
>> approach doesn't work for the reason you yourself mentioned above:
>>
>>>>> The disadvantage of get_ref_base_and_extent
>>>>> is it returns offset in bits thus if the offset is too large
>>>>> for a HWI you'll instead get offset == 0 and max_size == -1.
>>
>> This means that using the function you suggest would either prevent
>> detecting the out-of-bounds indices that overflow due to the bit
>> offset, thus largely defeating the purpose of the patch (to detect
>> excessively large indices), or not printing the value of the out-of
>> bounds index in the warning in this case, which would in turn mean
>> further changes to the rest of the logic.
> I think Richi's point is that it's more important (in his opinion) to
> handle the varying objects within an access like a.b.c[i] than to handle
> cases that would overflow when converted to bits.  Thus he'd prefer to
> use get_ref_base_and_extent over get_addr_base_and_unit_offset.
>
> I wonder if a hybrid approach here would work.
>
> ie, use get_ref_base_and_extent per Richi's request.  When that returns
> an max_size of -1, then call get_addr_base_and_unit_offset and if it
> returns that the offset is huge, then warn without any deeper analysis.
>
> Yea, this could trip a false positive if someone makes an array that is
> half the address space (give or take), but that's probably not at all
> common.  Whereas the additional cases handled by get_ref_base_and_extent
> are perhaps more useful.
>
> Thoughts?

I'm always open to suggestions for improvements.  I just haven't
been able to construct a test case to demonstrate the improvement
gained by using get_ref_base_and_extent.  The only test cases I've
come up with show false positives.  It could very well be because
of my lack of experience with the APIs so if there is a trade-off
here that could be compensated for by the approach you suggest I'm
happy to incorporate it into the patch.  I just need a test case
to verify that the solution works as intended.

Martin
Richard Biener Nov. 7, 2017, 10:18 a.m. UTC | #14
On Tue, Nov 7, 2017 at 4:23 AM, Martin Sebor <msebor@gmail.com> wrote:
> On 11/06/2017 11:41 AM, Jeff Law wrote:
>>
>> On 10/29/2017 10:15 AM, Martin Sebor wrote:
>>>
>>> Ping -- please see my reply below.
>>>
>>> On 10/20/2017 09:57 AM, Richard Biener wrote:
>>>>>>>>
>>>>>>>> get_addr_base_and_unit_offset will return NULL if there's any
>>>>>
>>>>> variable
>>>>>>>>
>>>>>>>> component in 'ref'.  So as written it seems to be dead code (you
>>>>>>>> want to pass 'arg'?)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Sorry, I'm not sure I understand what you mean.  What do you think
>>>>>>> is dead code?  The call to get_addr_base_and_unit_offset() is also
>>>>>>> made for an array of unspecified bound (up_bound is null) and for
>>>>>>> an array at the end of a struct.  For those the function returns
>>>>>>> non-null, and for the others (arrays of runtime bound) it returns
>>>>>>> null.  (I passed arg instead of ref but I see no difference in
>>>>>>> my tests.)
>>>>>>
>>>>>>
>>>>>> If you pass a.b.c[i] it will return NULL, if you pass a.b.c ('arg')
>>>>>
>>>>> it will
>>>>>>
>>>>>> return the offset of 'c'.  If you pass a.b[j].c it will still return
>>>>>
>>>>> NULL.
>>>>>>
>>>>>> You could use get_ref_base_and_extent which will return the offset
>>>>>> of a.b[0].c in this case and sets max_size != size - but you are only
>>>>>> interested in offset.  The disadvantage of get_ref_base_and_extent
>>>>>> is it returns offset in bits thus if the offset is too large for a
>>>>>
>>>>> HWI
>>>>>>
>>>>>> you'll instead get offset == 0 and max_size == -1.
>>>>>>
>>>>>> Thus I'm saying this is dead code for variable array accesses
>>>>>> (even for the array you are warning about).  Yes, for constant index
>>>>>> and at-struct-end you'll get sth, but the warning is in VRP because
>>>>>> of variable indexes.
>>>>>>
>>>>>> So I suggest to pass 'arg' and use get_ref_base_and_extent
>>>>>> for some extra precision (and possible lossage for very very large
>>>>>> structures).
>>>>>
>>>>>
>>>>> Computing bit offsets defeats the out-of-bounds flexible array
>>>>> index detection because the computation overflows (the function
>>>>> sets the offset to zero).
>>>>
>>>>
>>>> It shouldn't if you pass arg rather than ref.
>>>
>>>
>>> I suspect you missed my reply in IRC where I confirmed that this
>>> approach doesn't work for the reason you yourself mentioned above:
>>>
>>>>>> The disadvantage of get_ref_base_and_extent
>>>>>> is it returns offset in bits thus if the offset is too large
>>>>>> for a HWI you'll instead get offset == 0 and max_size == -1.
>>>
>>>
>>> This means that using the function you suggest would either prevent
>>> detecting the out-of-bounds indices that overflow due to the bit
>>> offset, thus largely defeating the purpose of the patch (to detect
>>> excessively large indices), or not printing the value of the out-of
>>> bounds index in the warning in this case, which would in turn mean
>>> further changes to the rest of the logic.
>>
>> I think Richi's point is that it's more important (in his opinion) to
>> handle the varying objects within an access like a.b.c[i] than to handle
>> cases that would overflow when converted to bits.  Thus he'd prefer to
>> use get_ref_base_and_extent over get_addr_base_and_unit_offset.
>>
>> I wonder if a hybrid approach here would work.
>>
>> ie, use get_ref_base_and_extent per Richi's request.  When that returns
>> an max_size of -1, then call get_addr_base_and_unit_offset and if it
>> returns that the offset is huge, then warn without any deeper analysis.
>>
>> Yea, this could trip a false positive if someone makes an array that is
>> half the address space (give or take), but that's probably not at all
>> common.  Whereas the additional cases handled by get_ref_base_and_extent
>> are perhaps more useful.
>>
>> Thoughts?
>
>
> I'm always open to suggestions for improvements.  I just haven't
> been able to construct a test case to demonstrate the improvement
> gained by using get_ref_base_and_extent.  The only test cases I've
> come up with show false positives.  It could very well be because
> of my lack of experience with the APIs so if there is a trade-off
> here that could be compensated for by the approach you suggest I'm
> happy to incorporate it into the patch.  I just need a test case
> to verify that the solution works as intended.

The difficulty with a testcase like

struct { struct A { int b[1]; } a[5]; } x;

 x.a[i].b[j]

is that b is not considered an array at struct end since one of my
recent changes to array_at_struct_end (basically it disallows having
a flex array as the last member of an array).

It would still stand for non-array components with variable offset
but you can't create C testcases for that.

So yes, for the specific case within the array_at_struct_end_p condition
get_addr_base_and_unit_offset is enough.  IIRC the conditon was
a bit more than just get_addr_base_and_unit_offset.  up_bound !=
INTEGER_CST for example.  So make the above

void foo (int n, int i)
{
 struct { struct A { int b[n]; } a[5]; } x;
 return x.a[i].b[PTRDIFF_MAX/2];
}

with appropriately adjusted constant.  Does that give you the testcase
you want?

As of "it works, catches corner-cases, ..." - yes, it does, but it
adds code that needs to be maintained, may contain bugs, is
executed even for valid code.

Richard.


> Martin
Martin Sebor Nov. 8, 2017, 3:12 a.m. UTC | #15
On 11/07/2017 03:18 AM, Richard Biener wrote:
> On Tue, Nov 7, 2017 at 4:23 AM, Martin Sebor <msebor@gmail.com> wrote:
>> On 11/06/2017 11:41 AM, Jeff Law wrote:
>>>
>>> On 10/29/2017 10:15 AM, Martin Sebor wrote:
>>>>
>>>> Ping -- please see my reply below.
>>>>
>>>> On 10/20/2017 09:57 AM, Richard Biener wrote:
>>>>>>>>>
>>>>>>>>> get_addr_base_and_unit_offset will return NULL if there's any
>>>>>>
>>>>>> variable
>>>>>>>>>
>>>>>>>>> component in 'ref'.  So as written it seems to be dead code (you
>>>>>>>>> want to pass 'arg'?)
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Sorry, I'm not sure I understand what you mean.  What do you think
>>>>>>>> is dead code?  The call to get_addr_base_and_unit_offset() is also
>>>>>>>> made for an array of unspecified bound (up_bound is null) and for
>>>>>>>> an array at the end of a struct.  For those the function returns
>>>>>>>> non-null, and for the others (arrays of runtime bound) it returns
>>>>>>>> null.  (I passed arg instead of ref but I see no difference in
>>>>>>>> my tests.)
>>>>>>>
>>>>>>>
>>>>>>> If you pass a.b.c[i] it will return NULL, if you pass a.b.c ('arg')
>>>>>>
>>>>>> it will
>>>>>>>
>>>>>>> return the offset of 'c'.  If you pass a.b[j].c it will still return
>>>>>>
>>>>>> NULL.
>>>>>>>
>>>>>>> You could use get_ref_base_and_extent which will return the offset
>>>>>>> of a.b[0].c in this case and sets max_size != size - but you are only
>>>>>>> interested in offset.  The disadvantage of get_ref_base_and_extent
>>>>>>> is it returns offset in bits thus if the offset is too large for a
>>>>>>
>>>>>> HWI
>>>>>>>
>>>>>>> you'll instead get offset == 0 and max_size == -1.
>>>>>>>
>>>>>>> Thus I'm saying this is dead code for variable array accesses
>>>>>>> (even for the array you are warning about).  Yes, for constant index
>>>>>>> and at-struct-end you'll get sth, but the warning is in VRP because
>>>>>>> of variable indexes.
>>>>>>>
>>>>>>> So I suggest to pass 'arg' and use get_ref_base_and_extent
>>>>>>> for some extra precision (and possible lossage for very very large
>>>>>>> structures).
>>>>>>
>>>>>>
>>>>>> Computing bit offsets defeats the out-of-bounds flexible array
>>>>>> index detection because the computation overflows (the function
>>>>>> sets the offset to zero).
>>>>>
>>>>>
>>>>> It shouldn't if you pass arg rather than ref.
>>>>
>>>>
>>>> I suspect you missed my reply in IRC where I confirmed that this
>>>> approach doesn't work for the reason you yourself mentioned above:
>>>>
>>>>>>> The disadvantage of get_ref_base_and_extent
>>>>>>> is it returns offset in bits thus if the offset is too large
>>>>>>> for a HWI you'll instead get offset == 0 and max_size == -1.
>>>>
>>>>
>>>> This means that using the function you suggest would either prevent
>>>> detecting the out-of-bounds indices that overflow due to the bit
>>>> offset, thus largely defeating the purpose of the patch (to detect
>>>> excessively large indices), or not printing the value of the out-of
>>>> bounds index in the warning in this case, which would in turn mean
>>>> further changes to the rest of the logic.
>>>
>>> I think Richi's point is that it's more important (in his opinion) to
>>> handle the varying objects within an access like a.b.c[i] than to handle
>>> cases that would overflow when converted to bits.  Thus he'd prefer to
>>> use get_ref_base_and_extent over get_addr_base_and_unit_offset.
>>>
>>> I wonder if a hybrid approach here would work.
>>>
>>> ie, use get_ref_base_and_extent per Richi's request.  When that returns
>>> an max_size of -1, then call get_addr_base_and_unit_offset and if it
>>> returns that the offset is huge, then warn without any deeper analysis.
>>>
>>> Yea, this could trip a false positive if someone makes an array that is
>>> half the address space (give or take), but that's probably not at all
>>> common.  Whereas the additional cases handled by get_ref_base_and_extent
>>> are perhaps more useful.
>>>
>>> Thoughts?
>>
>>
>> I'm always open to suggestions for improvements.  I just haven't
>> been able to construct a test case to demonstrate the improvement
>> gained by using get_ref_base_and_extent.  The only test cases I've
>> come up with show false positives.  It could very well be because
>> of my lack of experience with the APIs so if there is a trade-off
>> here that could be compensated for by the approach you suggest I'm
>> happy to incorporate it into the patch.  I just need a test case
>> to verify that the solution works as intended.
>
> The difficulty with a testcase like
>
> struct { struct A { int b[1]; } a[5]; } x;
>
>  x.a[i].b[j]
>
> is that b is not considered an array at struct end since one of my
> recent changes to array_at_struct_end (basically it disallows having
> a flex array as the last member of an array).
>
> It would still stand for non-array components with variable offset
> but you can't create C testcases for that.
>
> So yes, for the specific case within the array_at_struct_end_p condition
> get_addr_base_and_unit_offset is enough.  IIRC the conditon was
> a bit more than just get_addr_base_and_unit_offset.  up_bound !=
> INTEGER_CST for example.  So make the above
>
> void foo (int n, int i)
> {
>  struct { struct A { int b[n]; } a[5]; } x;
>  return x.a[i].b[PTRDIFF_MAX/2];
> }
>
> with appropriately adjusted constant.  Does that give you the testcase
> you want?

Thank you for the test case.  It is diagnosed the same way
irrespective of which of the two functions is used so it serves
to confirm my understanding that the only difference between
the two functions is bits vs bytes.

Unless you have another test case that does demonstrate that
get_ref_base_and_extent is necessary/helpful, is the last patch
okay to commit?

(Again, to be clear, I'm happy to change or enhance the patch if
I can verify that the change handles cases that the current patch
misses.)

>
> As of "it works, catches corner-cases, ..." - yes, it does, but it
> adds code that needs to be maintained, may contain bugs, is
> executed even for valid code.

Understood.  I don't claim the enhancement is free of any cost
whatsoever.  But it is teeny by most standards and it doesn't
detect just excessively large indices but also negative indices
into last member arrays (bug 68325) and out-of-bounds indices
(bug 82583).  The "excessively large" part does come largely
for free with the other checks.

Martin
Martin Sebor Nov. 13, 2017, 5:37 p.m. UTC | #16
Richard, this thread may have been conflated with the one Re:
[PATCH] enhance -Warray-bounds to detect out-of-bounds offsets
(PR 82455) They are about different things.

I'm still looking for approval of:

   https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01208.html

Thanks
Martin

>> The difficulty with a testcase like
>>
>> struct { struct A { int b[1]; } a[5]; } x;
>>
>>  x.a[i].b[j]
>>
>> is that b is not considered an array at struct end since one of my
>> recent changes to array_at_struct_end (basically it disallows having
>> a flex array as the last member of an array).
>>
>> It would still stand for non-array components with variable offset
>> but you can't create C testcases for that.
>>
>> So yes, for the specific case within the array_at_struct_end_p condition
>> get_addr_base_and_unit_offset is enough.  IIRC the conditon was
>> a bit more than just get_addr_base_and_unit_offset.  up_bound !=
>> INTEGER_CST for example.  So make the above
>>
>> void foo (int n, int i)
>> {
>>  struct { struct A { int b[n]; } a[5]; } x;
>>  return x.a[i].b[PTRDIFF_MAX/2];
>> }
>>
>> with appropriately adjusted constant.  Does that give you the testcase
>> you want?
>
> Thank you for the test case.  It is diagnosed the same way
> irrespective of which of the two functions is used so it serves
> to confirm my understanding that the only difference between
> the two functions is bits vs bytes.
>
> Unless you have another test case that does demonstrate that
> get_ref_base_and_extent is necessary/helpful, is the last patch
> okay to commit?
>
> (Again, to be clear, I'm happy to change or enhance the patch if
> I can verify that the change handles cases that the current patch
> misses.)
>
>>
>> As of "it works, catches corner-cases, ..." - yes, it does, but it
>> adds code that needs to be maintained, may contain bugs, is
>> executed even for valid code.
>
> Understood.  I don't claim the enhancement is free of any cost
> whatsoever.  But it is teeny by most standards and it doesn't
> detect just excessively large indices but also negative indices
> into last member arrays (bug 68325) and out-of-bounds indices
> (bug 82583).  The "excessively large" part does come largely
> for free with the other checks.
>
> Martin
Richard Biener Nov. 14, 2017, 12:28 p.m. UTC | #17
On Mon, Nov 13, 2017 at 6:37 PM, Martin Sebor <msebor@gmail.com> wrote:
> Richard, this thread may have been conflated with the one Re:
> [PATCH] enhance -Warray-bounds to detect out-of-bounds offsets
> (PR 82455) They are about different things.
>
> I'm still looking for approval of:
>
>   https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01208.html

+      tree maxbound
+ = build_int_cst (sizetype, ~(1LLU << (TYPE_PRECISION (sizetype) - 1)));

this looks possibly bogus.  Can you instead use

  up_bound_p1
    = wide_int_to_tree (sizetype, wi::div_trunc (wi::max_value
(TYPE_PRECISION (sizetype), SIGNED), wi::to_wide (eltsize)));

please?  Note you are _not_ computing the proper upper bound here because that
is what you compute plus low_bound.

+      up_bound_p1 = int_const_binop (TRUNC_DIV_EXPR, maxbound, eltsize);

+
+      tree arg = TREE_OPERAND (ref, 0);
+      tree_code code = TREE_CODE (arg);
+      if (code == COMPONENT_REF)
+ {
+  HOST_WIDE_INT off;
+  if (tree base = get_addr_base_and_unit_offset (ref, &off))
+    {
+      tree size = TYPE_SIZE_UNIT (TREE_TYPE (base));
+      if (TREE_CODE (size) == INTEGER_CST)
+ up_bound_p1 = int_const_binop (MINUS_EXPR, up_bound_p1, size);

I think I asked this multiple times now but given 'ref' is the
variable array-ref
a.b.c[i] when you call get_addr_base_and_unit_offset (ref, &off) you always
get a NULL_TREE return value.

So I asked you to pass it 'arg' instead ... which gets you the offset of
a.b.c, which looks like what you intended to get anyway.

I also wonder what you compute here - you are looking at the size of 'base'
but that is the size of 'a'.  You don't even use the computed offset!  Which
means you could have used get_base_address instead!?  Also the type
of 'base' may be completely off given MEM[&blk + 8].b.c[i] would return blk
as base which might be an array of chars and not in any way related to
the type of the innermost structure we access with COMPONENT_REFs.

Why are you only looking at COMPONENT_REF args anyways?  You
don't want to handle a.b[3][i]?

That is, I'd have expected you do

   if (get_addr_base_and_unit_offset (ref, &off))
     up_bound_p1 = wide_int_to_tree (sizetype, wi::sub (wi::to_wide
(up_bound_p1), off));

Richard.

> Thanks
> Martin
>
>
>>> The difficulty with a testcase like
>>>
>>> struct { struct A { int b[1]; } a[5]; } x;
>>>
>>>  x.a[i].b[j]
>>>
>>> is that b is not considered an array at struct end since one of my
>>> recent changes to array_at_struct_end (basically it disallows having
>>> a flex array as the last member of an array).
>>>
>>> It would still stand for non-array components with variable offset
>>> but you can't create C testcases for that.
>>>
>>> So yes, for the specific case within the array_at_struct_end_p condition
>>> get_addr_base_and_unit_offset is enough.  IIRC the conditon was
>>> a bit more than just get_addr_base_and_unit_offset.  up_bound !=
>>> INTEGER_CST for example.  So make the above
>>>
>>> void foo (int n, int i)
>>> {
>>>  struct { struct A { int b[n]; } a[5]; } x;
>>>  return x.a[i].b[PTRDIFF_MAX/2];
>>> }
>>>
>>> with appropriately adjusted constant.  Does that give you the testcase
>>> you want?
>>
>>
>> Thank you for the test case.  It is diagnosed the same way
>> irrespective of which of the two functions is used so it serves
>> to confirm my understanding that the only difference between
>> the two functions is bits vs bytes.
>>
>> Unless you have another test case that does demonstrate that
>> get_ref_base_and_extent is necessary/helpful, is the last patch
>> okay to commit?
>>
>> (Again, to be clear, I'm happy to change or enhance the patch if
>> I can verify that the change handles cases that the current patch
>> misses.)
>>
>>>
>>> As of "it works, catches corner-cases, ..." - yes, it does, but it
>>> adds code that needs to be maintained, may contain bugs, is
>>> executed even for valid code.
>>
>>
>> Understood.  I don't claim the enhancement is free of any cost
>> whatsoever.  But it is teeny by most standards and it doesn't
>> detect just excessively large indices but also negative indices
>> into last member arrays (bug 68325) and out-of-bounds indices
>> (bug 82583).  The "excessively large" part does come largely
>> for free with the other checks.
>>
>> Martin
>
>
Martin Sebor Nov. 14, 2017, 5:45 p.m. UTC | #18
On 11/14/2017 05:28 AM, Richard Biener wrote:
> On Mon, Nov 13, 2017 at 6:37 PM, Martin Sebor <msebor@gmail.com> wrote:
>> Richard, this thread may have been conflated with the one Re:
>> [PATCH] enhance -Warray-bounds to detect out-of-bounds offsets
>> (PR 82455) They are about different things.
>>
>> I'm still looking for approval of:
>>
>>   https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01208.html

Sorry, I pointed to an outdated version.  This is the latest
version:

   https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01304.html

My bad...

>
> +      tree maxbound
> + = build_int_cst (sizetype, ~(1LLU << (TYPE_PRECISION (sizetype) - 1)));
>
> this looks possibly bogus.  Can you instead use
>
>   up_bound_p1
>     = wide_int_to_tree (sizetype, wi::div_trunc (wi::max_value
> (TYPE_PRECISION (sizetype), SIGNED), wi::to_wide (eltsize)));
>
> please?  Note you are _not_ computing the proper upper bound here because that
> is what you compute plus low_bound.
>
> +      up_bound_p1 = int_const_binop (TRUNC_DIV_EXPR, maxbound, eltsize);
>
> +
> +      tree arg = TREE_OPERAND (ref, 0);
> +      tree_code code = TREE_CODE (arg);
> +      if (code == COMPONENT_REF)
> + {
> +  HOST_WIDE_INT off;
> +  if (tree base = get_addr_base_and_unit_offset (ref, &off))
> +    {
> +      tree size = TYPE_SIZE_UNIT (TREE_TYPE (base));
> +      if (TREE_CODE (size) == INTEGER_CST)
> + up_bound_p1 = int_const_binop (MINUS_EXPR, up_bound_p1, size);
>
> I think I asked this multiple times now but given 'ref' is the
> variable array-ref
> a.b.c[i] when you call get_addr_base_and_unit_offset (ref, &off) you always
> get a NULL_TREE return value.
>
> So I asked you to pass it 'arg' instead ... which gets you the offset of
> a.b.c, which looks like what you intended to get anyway.
>
> I also wonder what you compute here - you are looking at the size of 'base'
> but that is the size of 'a'.  You don't even use the computed offset!  Which
> means you could have used get_base_address instead!?  Also the type
> of 'base' may be completely off given MEM[&blk + 8].b.c[i] would return blk
> as base which might be an array of chars and not in any way related to
> the type of the innermost structure we access with COMPONENT_REFs.
>
> Why are you only looking at COMPONENT_REF args anyways?  You
> don't want to handle a.b[3][i]?
>
> That is, I'd have expected you do
>
>    if (get_addr_base_and_unit_offset (ref, &off))
>      up_bound_p1 = wide_int_to_tree (sizetype, wi::sub (wi::to_wide
> (up_bound_p1), off));
>
> Richard.
>
>> Thanks
>> Martin
>>
>>
>>>> The difficulty with a testcase like
>>>>
>>>> struct { struct A { int b[1]; } a[5]; } x;
>>>>
>>>>  x.a[i].b[j]
>>>>
>>>> is that b is not considered an array at struct end since one of my
>>>> recent changes to array_at_struct_end (basically it disallows having
>>>> a flex array as the last member of an array).
>>>>
>>>> It would still stand for non-array components with variable offset
>>>> but you can't create C testcases for that.
>>>>
>>>> So yes, for the specific case within the array_at_struct_end_p condition
>>>> get_addr_base_and_unit_offset is enough.  IIRC the conditon was
>>>> a bit more than just get_addr_base_and_unit_offset.  up_bound !=
>>>> INTEGER_CST for example.  So make the above
>>>>
>>>> void foo (int n, int i)
>>>> {
>>>>  struct { struct A { int b[n]; } a[5]; } x;
>>>>  return x.a[i].b[PTRDIFF_MAX/2];
>>>> }
>>>>
>>>> with appropriately adjusted constant.  Does that give you the testcase
>>>> you want?
>>>
>>>
>>> Thank you for the test case.  It is diagnosed the same way
>>> irrespective of which of the two functions is used so it serves
>>> to confirm my understanding that the only difference between
>>> the two functions is bits vs bytes.
>>>
>>> Unless you have another test case that does demonstrate that
>>> get_ref_base_and_extent is necessary/helpful, is the last patch
>>> okay to commit?
>>>
>>> (Again, to be clear, I'm happy to change or enhance the patch if
>>> I can verify that the change handles cases that the current patch
>>> misses.)
>>>
>>>>
>>>> As of "it works, catches corner-cases, ..." - yes, it does, but it
>>>> adds code that needs to be maintained, may contain bugs, is
>>>> executed even for valid code.
>>>
>>>
>>> Understood.  I don't claim the enhancement is free of any cost
>>> whatsoever.  But it is teeny by most standards and it doesn't
>>> detect just excessively large indices but also negative indices
>>> into last member arrays (bug 68325) and out-of-bounds indices
>>> (bug 82583).  The "excessively large" part does come largely
>>> for free with the other checks.
>>>
>>> Martin
>>
>>
Richard Biener Nov. 15, 2017, 10:51 a.m. UTC | #19
On Tue, Nov 14, 2017 at 6:45 PM, Martin Sebor <msebor@gmail.com> wrote:
> On 11/14/2017 05:28 AM, Richard Biener wrote:
>>
>> On Mon, Nov 13, 2017 at 6:37 PM, Martin Sebor <msebor@gmail.com> wrote:
>>>
>>> Richard, this thread may have been conflated with the one Re:
>>> [PATCH] enhance -Warray-bounds to detect out-of-bounds offsets
>>> (PR 82455) They are about different things.
>>>
>>> I'm still looking for approval of:
>>>
>>>   https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01208.html
>
>
> Sorry, I pointed to an outdated version.  This is the latest
> version:
>
>   https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01304.html
>
> My bad...
>
>
>>
>> +      tree maxbound
>> + = build_int_cst (sizetype, ~(1LLU << (TYPE_PRECISION (sizetype) - 1)));
>>
>> this looks possibly bogus.  Can you instead use
>>
>>   up_bound_p1
>>     = wide_int_to_tree (sizetype, wi::div_trunc (wi::max_value
>> (TYPE_PRECISION (sizetype), SIGNED), wi::to_wide (eltsize)));
>>
>> please?  Note you are _not_ computing the proper upper bound here because
>> that
>> is what you compute plus low_bound.
>>
>> +      up_bound_p1 = int_const_binop (TRUNC_DIV_EXPR, maxbound, eltsize);
>>
>> +
>> +      tree arg = TREE_OPERAND (ref, 0);
>> +      tree_code code = TREE_CODE (arg);
>> +      if (code == COMPONENT_REF)
>> + {
>> +  HOST_WIDE_INT off;
>> +  if (tree base = get_addr_base_and_unit_offset (ref, &off))
>> +    {
>> +      tree size = TYPE_SIZE_UNIT (TREE_TYPE (base));
>> +      if (TREE_CODE (size) == INTEGER_CST)
>> + up_bound_p1 = int_const_binop (MINUS_EXPR, up_bound_p1, size);
>>
>> I think I asked this multiple times now but given 'ref' is the
>> variable array-ref
>> a.b.c[i] when you call get_addr_base_and_unit_offset (ref, &off) you
>> always
>> get a NULL_TREE return value.
>>
>> So I asked you to pass it 'arg' instead ... which gets you the offset of
>> a.b.c, which looks like what you intended to get anyway.
>>
>> I also wonder what you compute here - you are looking at the size of
>> 'base'
>> but that is the size of 'a'.  You don't even use the computed offset!
>> Which
>> means you could have used get_base_address instead!?  Also the type
>> of 'base' may be completely off given MEM[&blk + 8].b.c[i] would return
>> blk
>> as base which might be an array of chars and not in any way related to
>> the type of the innermost structure we access with COMPONENT_REFs.
>>
>> Why are you only looking at COMPONENT_REF args anyways?  You
>> don't want to handle a.b[3][i]?
>>
>> That is, I'd have expected you do
>>
>>    if (get_addr_base_and_unit_offset (ref, &off))
>>      up_bound_p1 = wide_int_to_tree (sizetype, wi::sub (wi::to_wide
>> (up_bound_p1), off));

^^^^^^^^^

Richard.

>> Richard.
>>
>>> Thanks
>>> Martin
>>>
>>>
>>>>> The difficulty with a testcase like
>>>>>
>>>>> struct { struct A { int b[1]; } a[5]; } x;
>>>>>
>>>>>  x.a[i].b[j]
>>>>>
>>>>> is that b is not considered an array at struct end since one of my
>>>>> recent changes to array_at_struct_end (basically it disallows having
>>>>> a flex array as the last member of an array).
>>>>>
>>>>> It would still stand for non-array components with variable offset
>>>>> but you can't create C testcases for that.
>>>>>
>>>>> So yes, for the specific case within the array_at_struct_end_p
>>>>> condition
>>>>> get_addr_base_and_unit_offset is enough.  IIRC the conditon was
>>>>> a bit more than just get_addr_base_and_unit_offset.  up_bound !=
>>>>> INTEGER_CST for example.  So make the above
>>>>>
>>>>> void foo (int n, int i)
>>>>> {
>>>>>  struct { struct A { int b[n]; } a[5]; } x;
>>>>>  return x.a[i].b[PTRDIFF_MAX/2];
>>>>> }
>>>>>
>>>>> with appropriately adjusted constant.  Does that give you the testcase
>>>>> you want?
>>>>
>>>>
>>>>
>>>> Thank you for the test case.  It is diagnosed the same way
>>>> irrespective of which of the two functions is used so it serves
>>>> to confirm my understanding that the only difference between
>>>> the two functions is bits vs bytes.
>>>>
>>>> Unless you have another test case that does demonstrate that
>>>> get_ref_base_and_extent is necessary/helpful, is the last patch
>>>> okay to commit?
>>>>
>>>> (Again, to be clear, I'm happy to change or enhance the patch if
>>>> I can verify that the change handles cases that the current patch
>>>> misses.)
>>>>
>>>>>
>>>>> As of "it works, catches corner-cases, ..." - yes, it does, but it
>>>>> adds code that needs to be maintained, may contain bugs, is
>>>>> executed even for valid code.
>>>>
>>>>
>>>>
>>>> Understood.  I don't claim the enhancement is free of any cost
>>>> whatsoever.  But it is teeny by most standards and it doesn't
>>>> detect just excessively large indices but also negative indices
>>>> into last member arrays (bug 68325) and out-of-bounds indices
>>>> (bug 82583).  The "excessively large" part does come largely
>>>> for free with the other checks.
>>>>
>>>> Martin
>>>
>>>
>>>
>
Martin Sebor Nov. 16, 2017, 3:07 a.m. UTC | #20
On 11/15/2017 03:51 AM, Richard Biener wrote:
> On Tue, Nov 14, 2017 at 6:45 PM, Martin Sebor <msebor@gmail.com> wrote:
>> On 11/14/2017 05:28 AM, Richard Biener wrote:
>>>
>>> On Mon, Nov 13, 2017 at 6:37 PM, Martin Sebor <msebor@gmail.com> wrote:
>>>>
>>>> Richard, this thread may have been conflated with the one Re:
>>>> [PATCH] enhance -Warray-bounds to detect out-of-bounds offsets
>>>> (PR 82455) They are about different things.
>>>>
>>>> I'm still looking for approval of:
>>>>
>>>>   https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01208.html
>>
>>
>> Sorry, I pointed to an outdated version.  This is the latest
>> version:
>>
>>   https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01304.html
>>
>> My bad...
>>
>>
>>>
>>> +      tree maxbound
>>> + = build_int_cst (sizetype, ~(1LLU << (TYPE_PRECISION (sizetype) - 1)));
>>>
>>> this looks possibly bogus.  Can you instead use
>>>
>>>   up_bound_p1
>>>     = wide_int_to_tree (sizetype, wi::div_trunc (wi::max_value
>>> (TYPE_PRECISION (sizetype), SIGNED), wi::to_wide (eltsize)));
>>>
>>> please?  Note you are _not_ computing the proper upper bound here because
>>> that
>>> is what you compute plus low_bound.
>>>
>>> +      up_bound_p1 = int_const_binop (TRUNC_DIV_EXPR, maxbound, eltsize);
>>>
>>> +
>>> +      tree arg = TREE_OPERAND (ref, 0);
>>> +      tree_code code = TREE_CODE (arg);
>>> +      if (code == COMPONENT_REF)
>>> + {
>>> +  HOST_WIDE_INT off;
>>> +  if (tree base = get_addr_base_and_unit_offset (ref, &off))
>>> +    {
>>> +      tree size = TYPE_SIZE_UNIT (TREE_TYPE (base));
>>> +      if (TREE_CODE (size) == INTEGER_CST)
>>> + up_bound_p1 = int_const_binop (MINUS_EXPR, up_bound_p1, size);
>>>
>>> I think I asked this multiple times now but given 'ref' is the
>>> variable array-ref
>>> a.b.c[i] when you call get_addr_base_and_unit_offset (ref, &off) you
>>> always
>>> get a NULL_TREE return value.
>>>
>>> So I asked you to pass it 'arg' instead ... which gets you the offset of
>>> a.b.c, which looks like what you intended to get anyway.
>>>
>>> I also wonder what you compute here - you are looking at the size of
>>> 'base'
>>> but that is the size of 'a'.  You don't even use the computed offset!
>>> Which
>>> means you could have used get_base_address instead!?  Also the type
>>> of 'base' may be completely off given MEM[&blk + 8].b.c[i] would return
>>> blk
>>> as base which might be an array of chars and not in any way related to
>>> the type of the innermost structure we access with COMPONENT_REFs.
>>>
>>> Why are you only looking at COMPONENT_REF args anyways?  You
>>> don't want to handle a.b[3][i]?
>>>
>>> That is, I'd have expected you do
>>>
>>>    if (get_addr_base_and_unit_offset (ref, &off))
>>>      up_bound_p1 = wide_int_to_tree (sizetype, wi::sub (wi::to_wide
>>> (up_bound_p1), off));
>
> ^^^^^^^^^
Martin Sebor Nov. 16, 2017, 3:08 a.m. UTC | #21
On 11/15/2017 03:51 AM, Richard Biener wrote:
> On Tue, Nov 14, 2017 at 6:45 PM, Martin Sebor <msebor@gmail.com> wrote:
>> On 11/14/2017 05:28 AM, Richard Biener wrote:
>>>
>>> On Mon, Nov 13, 2017 at 6:37 PM, Martin Sebor <msebor@gmail.com> wrote:
>>>>
>>>> Richard, this thread may have been conflated with the one Re:
>>>> [PATCH] enhance -Warray-bounds to detect out-of-bounds offsets
>>>> (PR 82455) They are about different things.
>>>>
>>>> I'm still looking for approval of:
>>>>
>>>>   https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01208.html
>>
>>
>> Sorry, I pointed to an outdated version.  This is the latest
>> version:
>>
>>   https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01304.html
>>
>> My bad...
>>
>>
>>>
>>> +      tree maxbound
>>> + = build_int_cst (sizetype, ~(1LLU << (TYPE_PRECISION (sizetype) - 1)));
>>>
>>> this looks possibly bogus.  Can you instead use
>>>
>>>   up_bound_p1
>>>     = wide_int_to_tree (sizetype, wi::div_trunc (wi::max_value
>>> (TYPE_PRECISION (sizetype), SIGNED), wi::to_wide (eltsize)));
>>>
>>> please?  Note you are _not_ computing the proper upper bound here because
>>> that
>>> is what you compute plus low_bound.
>>>
>>> +      up_bound_p1 = int_const_binop (TRUNC_DIV_EXPR, maxbound, eltsize);
>>>
>>> +
>>> +      tree arg = TREE_OPERAND (ref, 0);
>>> +      tree_code code = TREE_CODE (arg);
>>> +      if (code == COMPONENT_REF)
>>> + {
>>> +  HOST_WIDE_INT off;
>>> +  if (tree base = get_addr_base_and_unit_offset (ref, &off))
>>> +    {
>>> +      tree size = TYPE_SIZE_UNIT (TREE_TYPE (base));
>>> +      if (TREE_CODE (size) == INTEGER_CST)
>>> + up_bound_p1 = int_const_binop (MINUS_EXPR, up_bound_p1, size);
>>>
>>> I think I asked this multiple times now but given 'ref' is the
>>> variable array-ref
>>> a.b.c[i] when you call get_addr_base_and_unit_offset (ref, &off) you
>>> always
>>> get a NULL_TREE return value.
>>>
>>> So I asked you to pass it 'arg' instead ... which gets you the offset of
>>> a.b.c, which looks like what you intended to get anyway.
>>>
>>> I also wonder what you compute here - you are looking at the size of
>>> 'base'
>>> but that is the size of 'a'.  You don't even use the computed offset!
>>> Which
>>> means you could have used get_base_address instead!?  Also the type
>>> of 'base' may be completely off given MEM[&blk + 8].b.c[i] would return
>>> blk
>>> as base which might be an array of chars and not in any way related to
>>> the type of the innermost structure we access with COMPONENT_REFs.
>>>
>>> Why are you only looking at COMPONENT_REF args anyways?  You
>>> don't want to handle a.b[3][i]?
>>>
>>> That is, I'd have expected you do
>>>
>>>    if (get_addr_base_and_unit_offset (ref, &off))
>>>      up_bound_p1 = wide_int_to_tree (sizetype, wi::sub (wi::to_wide
>>> (up_bound_p1), off));
>
> ^^^^^^^^^

Please see the attached update.

Martin
PR tree-optimization/82588 - missing -Warray-bounds on a excessively large index
PR tree-optimization/82583 - missing -Warray-bounds on out-of-bounds inner indic

gcc/ChangeLog:
	PR tree-optimization/82588
	PR tree-optimization/82583
	* tree-vrp.c (check_array_ref): Handle flexible array members,
	string literals, and inner indices.
	(search_for_addr_array): Add detail to diagnostics.

gcc/testsuite/ChangeLog:
	PR tree-optimization/82588
	PR tree-optimization/82583	
	* c-c++-common/Warray-bounds.c: New test.
	* gcc.dg/Warray-bounds-11.c: Adjust.
	* gcc.dg/Warray-bounds-22.c: New test.

diff --git a/gcc/testsuite/c-c++-common/Warray-bounds.c b/gcc/testsuite/c-c++-common/Warray-bounds.c
new file mode 100644
index 0000000..bea36fb
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/Warray-bounds.c
@@ -0,0 +1,259 @@
+/* PR tree-optimization/82588 - missing -Warray-bounds on an excessively
+   large index
+   { dg-do compile }
+   { dg-require-effective-target alloca }
+   { dg-options "-O2 -Warray-bounds -ftrack-macro-expansion=0" }  */
+
+#define SIZE_MAX  __SIZE_MAX__
+#define DIFF_MAX  __PTRDIFF_MAX__
+#define DIFF_MIN  (-DIFF_MAX - 1)
+
+#define offsetof(T, m)   __builtin_offsetof (T, m)
+
+typedef __PTRDIFF_TYPE__ ssize_t;
+typedef __SIZE_TYPE__    size_t;
+
+extern ssize_t signed_value (void)
+{
+  extern volatile ssize_t signed_value_source;
+  return signed_value_source;
+}
+
+extern size_t unsigned_value (void)
+{
+  extern volatile size_t unsigned_value_source;
+  return unsigned_value_source;
+}
+
+ssize_t signed_range (ssize_t min, ssize_t max)
+{
+  ssize_t val = signed_value ();
+  return val < min || max < val ? min : val;
+}
+
+typedef struct AX { int n; char ax[]; } AX;
+
+typedef struct A1 { int i; char a1[1]; } A1;
+typedef struct B { int i; struct A1 a1x[]; } B;
+
+void sink (int, ...);
+
+#define R(min, max) signed_range (min, max)
+#define T(expr)     sink (0, expr)
+
+struct __attribute__ ((packed)) S16 { unsigned i: 16; };
+
+void farr_char (void)
+{
+  extern char ac[];
+
+  T (ac[DIFF_MIN]);                       /* { dg-warning "array subscript -\[0-9\]+ is below array bounds of .char *\\\[]." } */
+  T (ac[-1]);                             /* { dg-warning "array subscript -1 is below array bounds" } */
+  T (ac[0]);
+
+  T (ac[DIFF_MAX - 1]);
+  T (ac[DIFF_MAX]);                       /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (ac[DIFF_MAX + (size_t)1]);           /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (ac[SIZE_MAX]);                       /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (ac[R (DIFF_MAX - 1, DIFF_MAX)]);
+  T (ac[R (DIFF_MIN + 1, -1)]);           /* { dg-warning "array subscript -1 is below array bounds" } */
+  T (ac[R (DIFF_MIN + 1, 0)]);
+  T (ac[R (-1, 0)]);
+  T (ac[R (-1, 1)]);
+}
+
+void farr_s16 (void)
+{
+  extern struct S16 ax[];
+
+  T (ax[DIFF_MIN]);                       /* { dg-warning "array subscript -\[0-9\]+ is below array bounds of .(struct )?S16 *\\\[]." } */
+  T (ax[-1]);                             /* { dg-warning "array subscript -1 is below array bounds" } */
+  T (ax[0]);
+
+  T (ax[DIFF_MAX / 2 - 1]);
+  T (ax[DIFF_MAX / 2]);                   /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (ax[DIFF_MAX / 2 + (size_t)1]);       /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (ax[SIZE_MAX]);                       /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (ax[R (DIFF_MIN, -1)]);               /* { dg-warning "array subscript -1 is below array bounds" } */
+  T (ax[R (DIFF_MAX / 2 - 1, DIFF_MAX)]);
+  T (ax[R (DIFF_MAX / 2, DIFF_MAX)]);     /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+}
+
+void farr_s16_7 (void)
+{
+  extern struct S16 ax_7[][7];
+
+  T (ax_7[0][DIFF_MIN]);                  /* { dg-warning "array subscript -\[0-9\]+ is below array bounds of .(struct )?S16 *\\\[7]." } */
+  T (ax_7[0][-1]);                        /* { dg-warning "array subscript -1 is below array bounds" } */
+  T (ax_7[0][0]);
+  T (ax_7[0][7]);                         /* { dg-warning "array subscript 7 is above array bounds of .(struct )?S16 *\\\[7]." } */
+  T (ax_7[0][8]);                         /* { dg-warning "array subscript 8 is above array bounds of .(struct )?S16 *\\\[7]." } */
+
+  T (ax_7[0][DIFF_MAX / 2]);             /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (ax_7[0][SIZE_MAX]);                 /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+
+  T (ax_7[DIFF_MIN][0]);                 /* { dg-warning "array subscript -\[0-9\]+ is below array bounds of .(struct )?S16 *\\\[]\\\[7]." } */
+  T (ax_7[-1][0]);                        /* { dg-warning "array subscript -1 is below array bounds" } */
+
+  T (ax_7[DIFF_MAX / 2][0]);              /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (ax_7[SIZE_MAX][0]);                  /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+
+  ssize_t i = R (DIFF_MIN, -1);
+  T (ax_7[i][0]);                         /* { dg-warning "array subscript -1 is below array bounds" } */
+
+  T (ax_7[R (DIFF_MIN, -1)][0]);          /* { dg-warning "array subscript -1 is below array bounds" } */
+  T (ax_7[R (DIFF_MIN, 0)][0]);
+  T (ax_7[R (-2, -1)][0]);                /* { dg-warning "array subscript -1 is below array bounds" } */
+  T (ax_7[R (-1, 0)][0]);
+  T (ax_7[R (-1, 1)][0]);
+  T (ax_7[R (-1, 7)][0]);
+  T (ax_7[R (-1, DIFF_MAX)][0]);
+
+  T (ax_7[R ( 1, DIFF_MAX)][0]);
+  T (ax_7[R (DIFF_MAX / 14 - 1, DIFF_MAX)][0]);
+
+  i = R (DIFF_MAX / 14, DIFF_MAX);
+  T (ax_7[i][0]);                         /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+
+  T (ax_7[0][R (DIFF_MIN, 0)]);
+  T (ax_7[0][R (-1, 0)]);
+  T (ax_7[0][R (-1, 1)]);
+  T (ax_7[0][R (-1, 7)]);
+  T (ax_7[0][R (-1, DIFF_MAX)]);
+  T (ax_7[0][R (-1, DIFF_MAX)]);
+
+  T (ax_7[0][R (1, DIFF_MAX)]);
+  T (ax_7[0][R (7, DIFF_MAX)]);           /* { dg-warning "array subscript 7 is above array bounds" } */
+
+}
+
+void farr_x_5_7 (void)
+{
+  extern struct S16 a[][5][7];
+
+  T (a[0][0][-3]);                        /* { dg-warning "array subscript -3 is below array bounds of .(struct )?S16 *\\\[7]." } */
+  T (a[0][-2][0]);                        /* { dg-warning "array subscript -2 is below array bounds of .(struct )?S16 *\\\[5]\\\[7]." } */
+  T (a[-1][0][0]);                        /* { dg-warning "array subscript -1 is below array bounds of .(struct )?S16 *\\\[]\\\[5]\\\[7]." } */
+  T (a[R (-4, -3)][0][0]);                /* { dg-warning "array subscript -3 is below array bounds" } */
+  T (a[0][R (-3, -2)][0]);                /* { dg-warning "array subscript -2 is below array bounds" } */
+  T (a[0][0][R (-2, -1)]);                /* { dg-warning "array subscript -1 is below array bounds" } */
+}
+
+
+void fax (struct AX *p)
+{
+  T (p->ax[DIFF_MIN]);                   /* { dg-warning "array subscript -\[0-9\]+ is below array bounds" } */
+  T (p->ax[-1]);                          /* { dg-warning "array subscript -1 is below array bounds" } */
+  T (p->ax[0]);
+  T (p->ax[DIFF_MAX - sizeof *p - 1]);
+  T (p->ax[DIFF_MAX - sizeof *p]);        /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->ax[DIFF_MAX - sizeof *p + 1]);    /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->ax[SIZE_MAX]);                    /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->ax[R (DIFF_MIN, -1)]);            /* { dg-warning "array subscript -1 is below array bounds" } */
+  T (p->ax[R (-1, 1)]);
+  T (p->ax[R (0, DIFF_MAX - 1)]);
+  T (p->ax[R (DIFF_MAX - 1, DIFF_MAX)]);/* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+}
+
+void fa1 (struct A1 *p)
+{
+  T (p->a1[DIFF_MIN]);                    /* { dg-warning "array subscript -\[0-9\]+ is below array bounds" } */
+  T (p->a1[-1]);                          /* { dg-warning "array subscript -1 is below array bounds" } */
+  T (p->a1[0]);
+  T (p->a1[9]);
+  T (p->a1[DIFF_MAX - offsetof (A1, a1) - 1]);
+  T (p->a1[DIFF_MAX - offsetof (A1, a1)]);/* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1[DIFF_MAX - offsetof (A1, a1) + 1]); /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1[SIZE_MAX]);                    /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+}
+
+void fb (struct B *p)
+{
+  T (p->a1x->a1[DIFF_MIN]);               /* { dg-warning "array subscript -\[0-9\]+ is below array bounds" } */
+  T (p->a1x->a1[-1]);                     /* { dg-warning "array subscript -1 is below array bounds" } */
+  T (p->a1x->a1[0]);
+  T (p->a1x->a1[9]);                      /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1x->a1[DIFF_MAX - sizeof *p]);   /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1x->a1[DIFF_MAX - sizeof *p + 1]); /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1x->a1[SIZE_MAX]);               /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+
+  T (p->a1x[1].a1[DIFF_MIN]);             /* { dg-warning "array subscript -\[0-9\]+ is below array bounds" } */
+  T (p->a1x[1].a1[-1]);                   /* { dg-warning "array subscript -1 is below array bounds" } */
+  T (p->a1x[1].a1[0]);
+  T (p->a1x[1].a1[9]);                    /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1x[1].a1[DIFF_MAX - sizeof *p]); /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1x[1].a1[DIFF_MAX - sizeof *p + 1]); /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1x[1].a1[SIZE_MAX]);             /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+
+  T (p->a1x[2].a1[DIFF_MIN]);             /* { dg-warning "array subscript -\[0-9\]+ is below array bounds" } */
+  T (p->a1x[2].a1[-1]);                   /* { dg-warning "array subscript -1 is below array bounds" } */
+  T (p->a1x[2].a1[0]);
+  T (p->a1x[2].a1[9]);                    /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1x[2].a1[DIFF_MAX - sizeof *p]);/* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1x[2].a1[DIFF_MAX - sizeof *p + 1]); /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1x[2].a1[SIZE_MAX]);             /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+
+  T (p->a1x[3].a1[DIFF_MIN]);             /* { dg-warning "array subscript -\[0-9\]+ is below array bounds" } */
+  T (p->a1x[3].a1[-1]);                   /* { dg-warning "array subscript -1 is below array bounds" } */
+  T (p->a1x[3].a1[0]);
+  T (p->a1x[3].a1[9]);                    /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+
+  T (p->a1x[9].a1[0]);
+
+  enum { MAX = DIFF_MAX / sizeof *p->a1x - sizeof *p };
+
+  T (p->a1x[DIFF_MIN].a1);                /* { dg-warning "array subscript -\[0-9\]+ is below array bounds" } */
+  T (p->a1x[-1].a1);                      /* { dg-warning "array subscript -1 is below array bounds" } */
+  T (p->a1x[MAX].a1);
+  T (p->a1x[MAX + 2].a1);                 /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+
+  T (p->a1x[DIFF_MAX].a1);                /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1x[SIZE_MAX].a1);                /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+
+  T (p->a1x[DIFF_MIN].a1[0]);             /* { dg-warning "array subscript -\[0-9\]+ is below array bounds" } */
+  T (p->a1x[-1].a1[0])                    /* { dg-warning "array subscript -1 is below array bounds" } */;
+  T (p->a1x[MAX - 1].a1[0]);
+  T (p->a1x[MAX].a1[0]);                  /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1x[MAX + 1].a1[0]);              /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+
+  T (p->a1x[DIFF_MAX].a1[0]);             /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1x[SIZE_MAX].a1[0]);             /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+}
+
+void f_cststring (int i)
+{
+  T (""[DIFF_MIN]);                       /* { dg-warning "array subscript -\[0-9\]+ is below array bounds of .(const )?char *\\\[1]" "string" { xfail lp64 } } */
+  T (""[DIFF_MIN + 1]);                   /* { dg-warning "array subscript -\[0-9\]+ is below array bounds of .(const )?char *\\\[1]" "string" } */
+  T (""[-1]);                             /* { dg-warning "array subscript -1 is below array bounds of .(const )?char *\\\[1]" "string" } */
+  T (""[0]);
+  T (""[1]);                              /* { dg-warning "array subscript 1 is above array bounds of .(const )?char *\\\[1]" "string" } */
+  T ("0"[2]);                             /* { dg-warning "array subscript 2 is above array bounds of .(const )?char *\\\[2]" "string" } */
+  T ("012"[2]);
+  T ("012"[3]);
+  T ("012"[4]);                           /* { dg-warning "array subscript 4 is above array bounds of .(const )?char *\\\[4]" "string" } */
+  T ("0123"[DIFF_MAX]);                   /* { dg-warning "array subscript \[0-9\]+ is above array bounds of .(const )?char *\\\[5]" "string" } */
+  T ("0123"[SIZE_MAX]);                   /* { dg-warning "array subscript \[0-9\]+ is above array bounds of .(const )?char *\\\[5]" "string" } */
+}
+
+void fb_strlen (struct B *p)
+{
+#define strlen __builtin_strlen
+
+  T (strlen (&p->a1x[0].a1[2]));          /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "strlen" } */
+  T (strlen (p->a1x[0].a1 + 2));          /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "strlen" { xfail *-*-* } } */
+}
+
+
+void f_vla (unsigned n)
+{
+  char vla[n];
+
+  T (vla[DIFF_MIN]);                      /* { dg-warning "array subscript -\[0-9\]+ is below array bounds" "vla" } */
+  T (vla[-1]);                            /* { dg-warning "array subscript -1 is below array bounds" "vla" } */
+  T (vla[0]);
+  T (vla[1]);
+  T (vla[n - 1]);
+  /* It would be nice to diagnose this. */
+  T (vla[n]);                             /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "bug 82608" { xfail *-*-*} } */
+  T (vla[DIFF_MAX]);                      /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "vla" } */
+}
diff --git a/gcc/testsuite/gcc.dg/Warray-bounds-11.c b/gcc/testsuite/gcc.dg/Warray-bounds-11.c
index 089fa00..c9fc461 100644
--- a/gcc/testsuite/gcc.dg/Warray-bounds-11.c
+++ b/gcc/testsuite/gcc.dg/Warray-bounds-11.c
@@ -57,19 +57,19 @@ struct h3b {
 
 void foo(int (*a)[3])
 {
-	(*a)[4] = 1;	/* { dg-warning "subscript is above array bound" } */
+	(*a)[4] = 1;	/* { dg-warning "subscript 4 is above array bound" } */
 	a[0][0] = 1;	// ok
 	a[1][0] = 1;	// ok
-	a[1][4] = 1;	/* { dg-warning "subscript is above array bound" } */
+	a[1][4] = 1;	/* { dg-warning "subscript 4 is above array bound" } */
 
 	int c[3] = { 0 };
 
-	c[4] = 1;	/* { dg-warning "subscript is above array bound" } */
+	c[4] = 1;	/* { dg-warning "subscript 4 is above array bound" } */
 
-	e[4] = 1;	/* { dg-warning "subscript is above array bound" } */
+	e[4] = 1;	/* { dg-warning "subscript 4 is above array bound" } */
 
 	struct f f;
-	f.f[4] = 1;	/* { dg-warning "subscript is above array bound" } */
+	f.f[4] = 1;	/* { dg-warning "subscript 4 is above array bound" } */
 
 	struct h* h = malloc(sizeof(struct h) + 3 * sizeof(int));
 	struct h0* h0 = malloc(sizeof(struct h0) + 3 * sizeof(int));
@@ -78,15 +78,15 @@ void foo(int (*a)[3])
 
 	h->j[4] = 1;	// flexible array member
 	h0->j[4] = 1;	// zero-sized array extension
-	h1->j[4] = 1;	/* { dg-warning "subscript is above array bound" } */
-	h3->j[4] = 1;	/* { dg-warning "subscript is above array bound" } */
+	h1->j[4] = 1;	/* { dg-warning "subscript 4 is above array bound" } */
+	h3->j[4] = 1;	/* { dg-warning "subscript 4 is above array bound" } */
 
 	struct h0b* h0b = malloc(sizeof(struct h) + 3 * sizeof(int));
 	struct h1b* h1b = malloc(sizeof(struct h1b) + 3 * sizeof(int));
 	struct h3b* h3b = malloc(sizeof(struct h3b));
 //	h0b->j[4] = 1;
-	h1b->j[4] = 1;;	/* { dg-warning "subscript is above array bound" } */
-	h3b->j[4] = 1;;	/* { dg-warning "subscript is above array bound" } */
+	h1b->j[4] = 1;;	/* { dg-warning "subscript 4 is above array bound" } */
+	h3b->j[4] = 1;;	/* { dg-warning "subscript 4 is above array bound" } */
 
 	// make sure nothing gets optimized away
 	bar(*a);
diff --git a/gcc/testsuite/gcc.dg/Warray-bounds-22.c b/gcc/testsuite/gcc.dg/Warray-bounds-22.c
new file mode 100644
index 0000000..f66bfb3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/Warray-bounds-22.c
@@ -0,0 +1,96 @@
+/* PR tree-optimization/82588 - missing -Warray-bounds on an excessively
+   large index
+   { dg-do compile }
+   { dg-require-effective-target alloca }
+   { dg-options "-O2 -Warray-bounds -ftrack-macro-expansion=0" }  */
+
+#define SIZE_MAX  __SIZE_MAX__
+#define DIFF_MAX __PTRDIFF_MAX__
+#define DIFF_MIN (-DIFF_MAX - 1)
+
+typedef __PTRDIFF_TYPE__ ptrdiff_t;
+typedef __SIZE_TYPE__    size_t;
+
+extern ptrdiff_t signed_value (void)
+{
+  extern volatile ptrdiff_t signed_value_source;
+  return signed_value_source;
+}
+
+ptrdiff_t signed_range (ptrdiff_t min, ptrdiff_t max)
+{
+  ptrdiff_t val = signed_value ();
+  return val < min || max < val ? min : val;
+}
+
+typedef struct AX { int n; char ax[]; } AX;
+
+typedef struct A1 { int i; char a1[1]; } A1;
+typedef struct B { int i; struct A1 a1x[]; } B;
+
+void sink (int, ...);
+
+#define T(expr)   sink (0, (expr))
+
+void test_vla (unsigned m, unsigned n)
+{
+  char vla1[m];
+
+  T (vla1[DIFF_MIN]);                     /* { dg-warning "array subscript -\[0-9\]+ is below array bounds" "vla" } */
+  T (vla1[-1]);                           /* { dg-warning "array subscript -1 is below array bounds" "vla" } */
+  T (vla1[0]);
+  T (vla1[1]);
+  T (vla1[m - 1]);
+  /* It would be nice to diagnose this. */
+  T (vla1[m]);                            /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "bug 82608" { xfail *-*-*} } */
+  T (vla1[DIFF_MAX - 1]);
+  T (vla1[DIFF_MAX]);                     /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "vla" } */
+
+  ptrdiff_t i = signed_range (DIFF_MAX - 1, DIFF_MAX);
+  T (vla1[i]);
+
+  char vla2[m][n];
+
+  T (vla2[0][DIFF_MIN]);                  /* { dg-warning "array subscript -\[0-9\]+ is below array bounds" "vla" } */
+  T (vla2[0][-1]);                        /* { dg-warning "array subscript -1 is below array bounds" "vla" } */
+  T (vla2[0][0]);
+  T (vla2[1][1]);
+  T (vla2[m - 1][n - 1]);
+  /* It would be nice to diagnose this. */
+  T (vla2[m][0]);                         /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "bug 82608" { xfail *-*-*} } */
+  T (vla2[m + 1][0]);                     /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "bug 82608" { xfail *-*-*} } */
+  T (vla2[0][n]);                         /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "bug 82608" { xfail *-*-*} } */
+  T (vla2[0][n + 1]);                     /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "bug 82608" { xfail *-*-*} } */
+  T (vla2[m][n]);                         /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "bug 82608" { xfail *-*-*} } */
+  T (vla2[m + 1][n + 1]);                 /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "bug 82608" { xfail *-*-*} } */
+
+  T (vla2[0][DIFF_MAX]);                  /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "vla" } */
+  T (vla2[DIFF_MAX][0]);                  /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "vla" { xfail *-*-* } } */
+  T (vla2[DIFF_MAX][DIFF_MAX]);           /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "vla" } */
+
+  struct S256 { char a[256]; } vla3[m];
+
+  T (vla3[DIFF_MIN].a[0]);                /* { dg-warning "array subscript -\[0-9\]+ is below array bounds" "vla" } */
+  T (vla3[-1].a[0]);                      /* { dg-warning "array subscript -1 is below array bounds" "vla" } */
+  T (vla3[0].a[0]);
+  T (vla3[1].a[0]);
+  T (vla3[m - 1].a[0]);
+  T (vla3[DIFF_MAX / 256 - 1].a[0]);
+  T (vla3[DIFF_MAX / 256].a[0]);          /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "vla" } */
+
+  i = signed_range (DIFF_MAX / 256 - 1, DIFF_MAX);
+  T (vla3[i].a[0]);
+
+  i = signed_range (DIFF_MAX / 256, DIFF_MAX);
+  T (vla3[i].a[0]);                       /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "vla" } */
+
+  struct VLA { char vla[n]; } x;
+
+  T (x.vla[DIFF_MIN]);                    /* { dg-warning "array subscript -\[0-9\]+ is below array bounds" "vla" } */
+  T (x.vla[-1]);                          /* { dg-warning "array subscript -1 is below array bounds" "vla" } */
+  T (x.vla[0]);
+  T (x.vla[1]);
+  T (x.vla[n - 1]);
+  T (x.vla[DIFF_MAX - 1]);
+  T (x.vla[DIFF_MAX]);                    /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "vla" } */
+}
diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index 945b3a9..e248f59 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -42,6 +42,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-iterator.h"
 #include "gimple-walk.h"
 #include "tree-cfg.h"
+#include "tree-dfa.h"
 #include "tree-ssa-loop-manip.h"
 #include "tree-ssa-loop-niter.h"
 #include "tree-ssa-loop.h"
@@ -65,6 +66,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "stringpool.h"
 #include "attribs.h"
 #include "vr-values.h"
+#include "builtins.h"
 
 /* Set of SSA names found live during the RPO traversal of the function
    for still active basic-blocks.  */
@@ -4781,26 +4783,51 @@ vrp_prop::check_array_ref (location_t location, tree ref,
   low_sub = up_sub = TREE_OPERAND (ref, 1);
   up_bound = array_ref_up_bound (ref);
 
-  /* Can not check flexible arrays.  */
   if (!up_bound
-      || TREE_CODE (up_bound) != INTEGER_CST)
-    return;
+      || TREE_CODE (up_bound) != INTEGER_CST
+      || (warn_array_bounds < 2
+	  && array_at_struct_end_p (ref)))
+    {
+      /* Accesses to trailing arrays via pointers may access storage
+	 beyond the types array bounds.  For such arrays, or for flexible
+	 array members, as well as for other arrays of an unknown size,
+	 replace the upper bound with a more permissive one that assumes
+	 the size of the largest object is PTRDIFF_MAX.  */
+      tree eltsize = array_ref_element_size (ref);
+
+      /* FIXME: Handle VLAs.  */
+      if (TREE_CODE (eltsize) != INTEGER_CST)
+	return;
 
-  /* Accesses to trailing arrays via pointers may access storage
-     beyond the types array bounds.  */
-  if (warn_array_bounds < 2
-      && array_at_struct_end_p (ref))
-    return;
+      tree maxbound = TYPE_MAX_VALUE (ptrdiff_type_node);
+
+      up_bound_p1 = int_const_binop (TRUNC_DIV_EXPR, maxbound, eltsize);
+
+      tree arg = TREE_OPERAND (ref, 0);
+
+      HOST_WIDE_INT off;
+      if (get_addr_base_and_unit_offset (arg, &off))
+	up_bound_p1 = wide_int_to_tree (sizetype,
+					wi::sub (wi::to_wide (up_bound_p1),
+						 off));
+
+      up_bound = int_const_binop (MINUS_EXPR, up_bound_p1,
+				  build_int_cst (ptrdiff_type_node, 1));
+    }
+  else
+    up_bound_p1 = int_const_binop (PLUS_EXPR, up_bound,
+				   build_int_cst (TREE_TYPE (up_bound), 1));
 
   low_bound = array_ref_low_bound (ref);
-  up_bound_p1 = int_const_binop (PLUS_EXPR, up_bound,
-				 build_int_cst (TREE_TYPE (up_bound), 1));
+
+  tree artype = TREE_TYPE (TREE_OPERAND (ref, 0));
 
   /* Empty array.  */
   if (tree_int_cst_equal (low_bound, up_bound_p1))
     {
       warning_at (location, OPT_Warray_bounds,
-		  "array subscript is above array bounds");
+		  "array subscript %E is above array bounds of %qT",
+		  low_bound, artype);
       TREE_NO_WARNING (ref) = 1;
     }
 
@@ -4824,7 +4851,8 @@ vrp_prop::check_array_ref (location_t location, tree ref,
           && tree_int_cst_le (low_sub, low_bound))
         {
           warning_at (location, OPT_Warray_bounds,
-		      "array subscript is outside array bounds");
+		      "array subscript [%E, %E] is outside array bounds of %qT",
+		      low_sub, up_sub, artype);
           TREE_NO_WARNING (ref) = 1;
         }
     }
@@ -4840,7 +4868,8 @@ vrp_prop::check_array_ref (location_t location, tree ref,
 	  fprintf (dump_file, "\n");
 	}
       warning_at (location, OPT_Warray_bounds,
-		  "array subscript is above array bounds");
+		  "array subscript %E is above array bounds of %qT",
+		  up_sub, artype);
       TREE_NO_WARNING (ref) = 1;
     }
   else if (TREE_CODE (low_sub) == INTEGER_CST
@@ -4853,7 +4882,8 @@ vrp_prop::check_array_ref (location_t location, tree ref,
 	  fprintf (dump_file, "\n");
 	}
       warning_at (location, OPT_Warray_bounds,
-		  "array subscript is below array bounds");
+		  "array subscript %E is below array bounds of %qT",
+		  low_sub, artype);
       TREE_NO_WARNING (ref) = 1;
     }
 }
@@ -4908,7 +4938,8 @@ vrp_prop::search_for_addr_array (tree t, location_t location)
 	      fprintf (dump_file, "\n");
 	    }
 	  warning_at (location, OPT_Warray_bounds,
-		      "array subscript is below array bounds");
+		      "array subscript %wi is below array bounds of %qT",
+		      idx.to_shwi (), TREE_TYPE (tem));
 	  TREE_NO_WARNING (t) = 1;
 	}
       else if (idx > (wi::to_offset (up_bound)
@@ -4921,7 +4952,8 @@ vrp_prop::search_for_addr_array (tree t, location_t location)
 	      fprintf (dump_file, "\n");
 	    }
 	  warning_at (location, OPT_Warray_bounds,
-		      "array subscript is above array bounds");
+		      "array subscript %wu is above array bounds of %qT",
+		      idx.to_uhwi (), TREE_TYPE (tem));
 	  TREE_NO_WARNING (t) = 1;
 	}
     }
Richard Biener Nov. 16, 2017, 10:09 a.m. UTC | #22
On Thu, Nov 16, 2017 at 4:08 AM, Martin Sebor <msebor@gmail.com> wrote:
> On 11/15/2017 03:51 AM, Richard Biener wrote:
>>
>> On Tue, Nov 14, 2017 at 6:45 PM, Martin Sebor <msebor@gmail.com> wrote:
>>>
>>> On 11/14/2017 05:28 AM, Richard Biener wrote:
>>>>
>>>>
>>>> On Mon, Nov 13, 2017 at 6:37 PM, Martin Sebor <msebor@gmail.com> wrote:
>>>>>
>>>>>
>>>>> Richard, this thread may have been conflated with the one Re:
>>>>> [PATCH] enhance -Warray-bounds to detect out-of-bounds offsets
>>>>> (PR 82455) They are about different things.
>>>>>
>>>>> I'm still looking for approval of:
>>>>>
>>>>>   https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01208.html
>>>
>>>
>>>
>>> Sorry, I pointed to an outdated version.  This is the latest
>>> version:
>>>
>>>   https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01304.html
>>>
>>> My bad...
>>>
>>>
>>>>
>>>> +      tree maxbound
>>>> + = build_int_cst (sizetype, ~(1LLU << (TYPE_PRECISION (sizetype) -
>>>> 1)));
>>>>
>>>> this looks possibly bogus.  Can you instead use
>>>>
>>>>   up_bound_p1
>>>>     = wide_int_to_tree (sizetype, wi::div_trunc (wi::max_value
>>>> (TYPE_PRECISION (sizetype), SIGNED), wi::to_wide (eltsize)));
>>>>
>>>> please?  Note you are _not_ computing the proper upper bound here
>>>> because
>>>> that
>>>> is what you compute plus low_bound.
>>>>
>>>> +      up_bound_p1 = int_const_binop (TRUNC_DIV_EXPR, maxbound,
>>>> eltsize);
>>>>
>>>> +
>>>> +      tree arg = TREE_OPERAND (ref, 0);
>>>> +      tree_code code = TREE_CODE (arg);
>>>> +      if (code == COMPONENT_REF)
>>>> + {
>>>> +  HOST_WIDE_INT off;
>>>> +  if (tree base = get_addr_base_and_unit_offset (ref, &off))
>>>> +    {
>>>> +      tree size = TYPE_SIZE_UNIT (TREE_TYPE (base));
>>>> +      if (TREE_CODE (size) == INTEGER_CST)
>>>> + up_bound_p1 = int_const_binop (MINUS_EXPR, up_bound_p1, size);
>>>>
>>>> I think I asked this multiple times now but given 'ref' is the
>>>> variable array-ref
>>>> a.b.c[i] when you call get_addr_base_and_unit_offset (ref, &off) you
>>>> always
>>>> get a NULL_TREE return value.
>>>>
>>>> So I asked you to pass it 'arg' instead ... which gets you the offset of
>>>> a.b.c, which looks like what you intended to get anyway.
>>>>
>>>> I also wonder what you compute here - you are looking at the size of
>>>> 'base'
>>>> but that is the size of 'a'.  You don't even use the computed offset!
>>>> Which
>>>> means you could have used get_base_address instead!?  Also the type
>>>> of 'base' may be completely off given MEM[&blk + 8].b.c[i] would return
>>>> blk
>>>> as base which might be an array of chars and not in any way related to
>>>> the type of the innermost structure we access with COMPONENT_REFs.
>>>>
>>>> Why are you only looking at COMPONENT_REF args anyways?  You
>>>> don't want to handle a.b[3][i]?
>>>>
>>>> That is, I'd have expected you do
>>>>
>>>>    if (get_addr_base_and_unit_offset (ref, &off))
>>>>      up_bound_p1 = wide_int_to_tree (sizetype, wi::sub (wi::to_wide
>>>> (up_bound_p1), off));
>>
>>
>> ^^^^^^^^^
>
>
> Please see the attached update.

Ok.

Thanks,
Richard.

> Martin
diff mbox series

Patch

PR tree-optimization/82596 - missing -Warray-bounds on an out-of-bounds index into string literal
PR tree-optimization/82588 - missing -Warray-bounds on a excessively large index
PR tree-optimization/82583 - missing -Warray-bounds on out-of-bounds inner indic

gcc/ChangeLog:
	PR tree-optimization/82596
	PR tree-optimization/82588
	PR tree-optimization/82583	
	* tree-vrp.c (check_array_ref): Handle flexible array members,
	string literals, and inner indices.
	(search_for_addr_array): Add detail to diagnostics.

gcc/testsuite/ChangeLog:

	PR tree-optimization/82596
	PR tree-optimization/82588
	PR tree-optimization/82583	
	* c-c++-common/Warray-bounds.c: New test.
	* gcc.dg/Warray-bounds-11.c: Adjust.

diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index 2c86b8e..88cce15 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -42,6 +42,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "gimple-iterator.h"
 #include "gimple-walk.h"
 #include "tree-cfg.h"
+#include "tree-dfa.h"
 #include "tree-ssa-loop-manip.h"
 #include "tree-ssa-loop-niter.h"
 #include "tree-ssa-loop.h"
@@ -64,6 +65,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "tree-cfgcleanup.h"
 #include "stringpool.h"
 #include "attribs.h"
+#include "builtins.h"
 
 #define VR_INITIALIZER { VR_UNDEFINED, NULL_TREE, NULL_TREE, NULL }
 
@@ -6675,26 +6677,51 @@  check_array_ref (location_t location, tree ref, bool ignore_off_by_one)
   low_sub = up_sub = TREE_OPERAND (ref, 1);
   up_bound = array_ref_up_bound (ref);
 
-  /* Can not check flexible arrays.  */
   if (!up_bound
-      || TREE_CODE (up_bound) != INTEGER_CST)
-    return;
+      || (warn_array_bounds < 2
+	  && array_at_struct_end_p (ref)))
+    {
+      /* Accesses to trailing arrays via pointers may access storage
+	 beyond the types array bounds.  For such arrays, or for flexible
+	 array members as well as for other arrays of an unknown size,
+	 replace the upper bound with a more permissive one that assumes
+	 the size of the largest object is SSIZE_MAX.  */
+      tree eltype = TREE_TYPE (ref);
+      tree eltsize = TYPE_SIZE_UNIT (eltype);
+      tree maxbound = TYPE_MAX_VALUE (ssizetype);
+      up_bound_p1 = fold_build2 (TRUNC_DIV_EXPR, ssizetype, maxbound, eltsize);
+
+      tree arg = TREE_OPERAND (ref, 0);
+      tree_code code = TREE_CODE (arg);
+      if (code == COMPONENT_REF)
+	{
+	  HOST_WIDE_INT off;
+	  if (tree base = get_addr_base_and_unit_offset (ref, &off))
+	    up_bound_p1 = fold_build2 (MINUS_EXPR, ssizetype, up_bound_p1,
+				       TYPE_SIZE_UNIT (TREE_TYPE (base)));
+	  else
+	    return;
+	}
+      else if (code == STRING_CST)
+	up_bound_p1 = build_int_cst (ssizetype, TREE_STRING_LENGTH (arg));
 
-  /* Accesses to trailing arrays via pointers may access storage
-     beyond the types array bounds.  */
-  if (warn_array_bounds < 2
-      && array_at_struct_end_p (ref))
-    return;
+      up_bound = int_const_binop (MINUS_EXPR, up_bound_p1,
+				  build_int_cst (ssizetype, 1));
+    }
+  else
+    up_bound_p1 = int_const_binop (PLUS_EXPR, up_bound,
+				   build_int_cst (TREE_TYPE (up_bound), 1));
 
   low_bound = array_ref_low_bound (ref);
-  up_bound_p1 = int_const_binop (PLUS_EXPR, up_bound,
-				 build_int_cst (TREE_TYPE (up_bound), 1));
+
+  tree artype = TREE_TYPE (TREE_OPERAND (ref, 0));
 
   /* Empty array.  */
   if (tree_int_cst_equal (low_bound, up_bound_p1))
     {
       warning_at (location, OPT_Warray_bounds,
-		  "array subscript is above array bounds");
+		  "array subscript %E is above array bounds of %qT",
+		  low_bound, artype);
       TREE_NO_WARNING (ref) = 1;
     }
 
@@ -6718,7 +6745,8 @@  check_array_ref (location_t location, tree ref, bool ignore_off_by_one)
           && tree_int_cst_le (low_sub, low_bound))
         {
           warning_at (location, OPT_Warray_bounds,
-		      "array subscript is outside array bounds");
+		      "array subscript [%E, %E] is outside array bounds of %qT",
+		      low_sub, up_sub, artype);
           TREE_NO_WARNING (ref) = 1;
         }
     }
@@ -6734,7 +6762,8 @@  check_array_ref (location_t location, tree ref, bool ignore_off_by_one)
 	  fprintf (dump_file, "\n");
 	}
       warning_at (location, OPT_Warray_bounds,
-		  "array subscript is above array bounds");
+		  "array subscript %E is above array bounds of %qT",
+		  up_sub, artype);
       TREE_NO_WARNING (ref) = 1;
     }
   else if (TREE_CODE (low_sub) == INTEGER_CST
@@ -6747,7 +6776,8 @@  check_array_ref (location_t location, tree ref, bool ignore_off_by_one)
 	  fprintf (dump_file, "\n");
 	}
       warning_at (location, OPT_Warray_bounds,
-		  "array subscript is below array bounds");
+		  "array subscript %E is below array bounds of %qT",
+		  low_sub, artype);
       TREE_NO_WARNING (ref) = 1;
     }
 }
@@ -6802,7 +6832,8 @@  search_for_addr_array (tree t, location_t location)
 	      fprintf (dump_file, "\n");
 	    }
 	  warning_at (location, OPT_Warray_bounds,
-		      "array subscript is below array bounds");
+		      "array subscript %wi is below array bounds of %qT",
+		      idx.to_shwi (), TREE_TYPE (tem));
 	  TREE_NO_WARNING (t) = 1;
 	}
       else if (idx > (wi::to_offset (up_bound)
@@ -6815,7 +6846,8 @@  search_for_addr_array (tree t, location_t location)
 	      fprintf (dump_file, "\n");
 	    }
 	  warning_at (location, OPT_Warray_bounds,
-		      "array subscript is above array bounds");
+		      "array subscript %wu is above array bounds of %qT",
+		      idx.to_uhwi (), TREE_TYPE (tem));
 	  TREE_NO_WARNING (t) = 1;
 	}
     }
diff --git a/gcc/testsuite/c-c++-common/Warray-bounds.c b/gcc/testsuite/c-c++-common/Warray-bounds.c
new file mode 100644
index 0000000..f94bd6e
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/Warray-bounds.c
@@ -0,0 +1,179 @@ 
+/* PR tree-optimization/82588 - missing -Warray-bounds on an excessively
+   large index
+   { dg-do compile }
+   { dg-options "-O2 -Warray-bounds -ftrack-macro-expansion=0" }  */
+
+#define SIZE_MAX  __SIZE_MAX__
+#define SSIZE_MAX __PTRDIFF_MAX__
+#define SSIZE_MIN (-SSIZE_MAX - 1)
+
+typedef __PTRDIFF_TYPE__ ssize_t;
+typedef __SIZE_TYPE__    size_t;
+
+struct AX { int n; char ax[]; };
+
+struct A1 { int i; char a1[1]; };
+struct B { int i; struct A1 a1x[]; };
+
+void sink (int, ...);
+
+#define T(expr)   sink (0, (expr))
+
+struct __attribute__ ((packed)) S16 { unsigned i: 16; };
+
+void farr_char (void)
+{
+  extern char ac[];
+
+  T (ac[SSIZE_MIN]);                      /* { dg-warning "array subscript -\[0-9\]+ is below array bounds of .char *\\\[]." } */
+  T (ac[-1]);                             /* { dg-warning "array subscript -1 is below array bounds" } */
+  T (ac[0]);
+
+  T (ac[SSIZE_MAX - 1]);
+  T (ac[SSIZE_MAX]);                      /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (ac[SSIZE_MAX + (size_t)1]);          /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (ac[SIZE_MAX]);                       /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+}
+
+void farr_s16 (void)
+{
+  extern struct S16 ax[];
+
+  T (ax[SSIZE_MIN]);                      /* { dg-warning "array subscript -\[0-9\]+ is below array bounds of .struct S16\\\[]." } */
+  T (ax[-1]);                             /* { dg-warning "array subscript -1 is below array bounds" } */
+  T (ax[0]);
+
+  T (ax[SSIZE_MAX / 2 - 1]);
+  T (ax[SSIZE_MAX / 2]);                  /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (ax[SSIZE_MAX / 2 + (size_t)1]);      /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (ax[SIZE_MAX]);                       /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+}
+
+void farr_s16_7 (void)
+{
+  extern struct S16 ax_7[][7];
+
+  T (ax_7[0][SSIZE_MIN]);                 /* { dg-warning "array subscript -\[0-9\]+ is below array bounds of .struct S16 *\\\[7]." } */
+  T (ax_7[0][-1]);                        /* { dg-warning "array subscript -1 is below array bounds" } */
+  T (ax_7[0][0]);
+  T (ax_7[0][7]);                        /* { dg-warning "array subscript 7 is above array bounds of .struct S16 *\\\[7]." } */
+  T (ax_7[0][8]);                        /* { dg-warning "array subscript 8 is above array bounds of .struct S16 *\\\[7]." } */
+
+  T (ax_7[0][SSIZE_MAX / 2]);            /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (ax_7[0][SIZE_MAX]);                 /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+
+  T (ax_7[SSIZE_MIN][0]);                 /* { dg-warning "array subscript -\[0-9\]+ is below array bounds of .struct S16 *\\\[]\\\[7]." } */
+  T (ax_7[-1][0]);                        /* { dg-warning "array subscript -1 is below array bounds" } */
+
+  T (ax_7[SSIZE_MAX / 2][0]);             /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (ax_7[SIZE_MAX][0]);                  /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+}
+
+void farr_x_5_7 (void)
+{
+  extern struct S16 a[][5][7];
+
+  T (a[0][0][-3]);                        /* { dg-warning "array subscript -3 is below array bounds of .struct S16 *\\\[7]." } */
+  T (a[0][-2][0]);                        /* { dg-warning "array subscript -2 is below array bounds of .struct S16 *\\\[5]\\\[7]." } */
+  T (a[-1][0][0]);                        /* { dg-warning "array subscript -1 is below array bounds of .struct S16 *\\\[]\\\[5]\\\[7]." } */
+}
+
+
+void fax (struct AX *p)
+{
+  T (p->ax[SSIZE_MIN]);                   /* { dg-warning "array subscript -\[0-9\]+ is below array bounds" } */
+  T (p->ax[-1]);                          /* { dg-warning "array subscript -1 is below array bounds" } */
+  T (p->ax[0]);
+  T (p->ax[SSIZE_MAX - sizeof *p - 1]);
+  T (p->ax[SSIZE_MAX - sizeof *p]);       /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->ax[SSIZE_MAX - sizeof *p + 1]);   /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->ax[SIZE_MAX]);                    /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+}
+
+void fa1 (struct A1 *p)
+{
+  T (p->a1[SSIZE_MIN]);                   /* { dg-warning "array subscript -\[0-9\]+ is below array bounds" } */
+  T (p->a1[-1]);                          /* { dg-warning "array subscript -1 is below array bounds" } */
+  T (p->a1[0]);
+  T (p->a1[9]);
+  T (p->a1[SSIZE_MAX - sizeof *p - 1]);
+  T (p->a1[SSIZE_MAX - sizeof *p]);       /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1[SSIZE_MAX - sizeof *p + 1]);   /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1[SIZE_MAX]);                    /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+}
+
+void fb (struct B *p)
+{
+  T (p->a1x->a1[SSIZE_MIN]);             /* { dg-warning "array subscript -\[0-9\]+ is below array bounds" } */
+  T (p->a1x->a1[-1]);                    /* { dg-warning "array subscript -1 is below array bounds" } */
+  T (p->a1x->a1[0]);
+  T (p->a1x->a1[9]);                     /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1x->a1[SSIZE_MAX - sizeof *p]); /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1x->a1[SSIZE_MAX - sizeof *p + 1]); /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1x->a1[SIZE_MAX]);               /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+
+  T (p->a1x[1].a1[SSIZE_MIN]);            /* { dg-warning "array subscript -\[0-9\]+ is below array bounds" } */
+  T (p->a1x[1].a1[-1]);                   /* { dg-warning "array subscript -1 is below array bounds" } */
+  T (p->a1x[1].a1[0]);
+  T (p->a1x[1].a1[9]);                    /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1x[1].a1[SSIZE_MAX - sizeof *p]);/* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1x[1].a1[SSIZE_MAX - sizeof *p + 1]); /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1x[1].a1[SIZE_MAX]);             /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+
+  T (p->a1x[2].a1[SSIZE_MIN]);            /* { dg-warning "array subscript -\[0-9\]+ is below array bounds" } */
+  T (p->a1x[2].a1[-1]);                   /* { dg-warning "array subscript -1 is below array bounds" } */
+  T (p->a1x[2].a1[0]);
+  T (p->a1x[2].a1[9]);                    /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1x[2].a1[SSIZE_MAX - sizeof *p]);/* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1x[2].a1[SSIZE_MAX - sizeof *p + 1]); /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1x[2].a1[SIZE_MAX]);             /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+
+  T (p->a1x[3].a1[SSIZE_MIN]);            /* { dg-warning "array subscript -\[0-9\]+ is below array bounds" } */
+  T (p->a1x[3].a1[-1]);                   /* { dg-warning "array subscript -1 is below array bounds" } */
+  T (p->a1x[3].a1[0]);
+  T (p->a1x[3].a1[9]);                    /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+
+  T (p->a1x[9].a1[0]);
+
+  enum { MAX = SSIZE_MAX / sizeof *p->a1x - sizeof *p };
+
+  T (p->a1x[SSIZE_MIN].a1);               /* { dg-warning "array subscript -\[0-9\]+ is below array bounds" } */
+  T (p->a1x[-1].a1);                      /* { dg-warning "array subscript -1 is below array bounds" } */
+  T (p->a1x[MAX].a1);
+  T (p->a1x[MAX + 2].a1);                 /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+
+  T (p->a1x[SSIZE_MAX].a1);               /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1x[SIZE_MAX].a1);                /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+
+  T (p->a1x[SSIZE_MIN].a1[0]);            /* { dg-warning "array subscript -\[0-9\]+ is below array bounds" } */
+  T (p->a1x[-1].a1[0])                    /* { dg-warning "array subscript -1 is below array bounds" } */;
+  T (p->a1x[MAX - 1].a1[0]);
+  T (p->a1x[MAX].a1[0]);                  /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1x[MAX + 1].a1[0]);              /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+
+  T (p->a1x[SSIZE_MAX].a1[0]);            /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+  T (p->a1x[SIZE_MAX].a1[0]);             /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
+}
+
+void f_cststring (int i)
+{
+  T (""[SSIZE_MIN]);                      /* { dg-warning "array subscript -\[0-9\]+ is below array bounds of .char *\\\[1]" "string" { xfail *-*-* } } */
+  T (""[SSIZE_MIN + 1]);                  /* { dg-warning "array subscript -\[0-9\]+ is below array bounds of .char *\\\[1]" "string" } */
+  T (""[-1]);                             /* { dg-warning "array subscript -1 is below array bounds of .char *\\\[1]" "string" } */
+  T (""[0]);
+  T (""[1]);                              /* { dg-warning "array subscript 1 is above array bounds of .char *\\\[1]" "string" } */
+  T ("0"[2]);                             /* { dg-warning "array subscript 2 is above array bounds of .char *\\\[2]" "string" } */
+  T ("012"[2]);
+  T ("012"[3]);
+  T ("012"[4]);                           /* { dg-warning "array subscript 4 is above array bounds of .char *\\\[4]" "string" } */
+  T ("0123"[SSIZE_MAX]);                  /* { dg-warning "array subscript \[0-9\]+ is above array bounds of .char *\\\[5]" "string" } */
+  T ("0123"[SIZE_MAX]);                   /* { dg-warning "array subscript \[0-9\]+ is above array bounds of .char *\\\[5]" "string" } */
+}
+
+void fb_strlen (struct B *p)
+{
+#define strlen __builtin_strlen
+
+  sink (strlen (&p->a1x[0].a1[2]));       /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "strlen" } */
+  sink (strlen (p->a1x[0].a1 + 2));       /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "strlen" { xfail *-*-* } } */
+}
diff --git a/gcc/testsuite/gcc.dg/Warray-bounds-11.c b/gcc/testsuite/gcc.dg/Warray-bounds-11.c
index 089fa00..c9fc461 100644
--- a/gcc/testsuite/gcc.dg/Warray-bounds-11.c
+++ b/gcc/testsuite/gcc.dg/Warray-bounds-11.c
@@ -57,19 +57,19 @@  struct h3b {
 
 void foo(int (*a)[3])
 {
-	(*a)[4] = 1;	/* { dg-warning "subscript is above array bound" } */
+	(*a)[4] = 1;	/* { dg-warning "subscript 4 is above array bound" } */
 	a[0][0] = 1;	// ok
 	a[1][0] = 1;	// ok
-	a[1][4] = 1;	/* { dg-warning "subscript is above array bound" } */
+	a[1][4] = 1;	/* { dg-warning "subscript 4 is above array bound" } */
 
 	int c[3] = { 0 };
 
-	c[4] = 1;	/* { dg-warning "subscript is above array bound" } */
+	c[4] = 1;	/* { dg-warning "subscript 4 is above array bound" } */
 
-	e[4] = 1;	/* { dg-warning "subscript is above array bound" } */
+	e[4] = 1;	/* { dg-warning "subscript 4 is above array bound" } */
 
 	struct f f;
-	f.f[4] = 1;	/* { dg-warning "subscript is above array bound" } */
+	f.f[4] = 1;	/* { dg-warning "subscript 4 is above array bound" } */
 
 	struct h* h = malloc(sizeof(struct h) + 3 * sizeof(int));
 	struct h0* h0 = malloc(sizeof(struct h0) + 3 * sizeof(int));
@@ -78,15 +78,15 @@  void foo(int (*a)[3])
 
 	h->j[4] = 1;	// flexible array member
 	h0->j[4] = 1;	// zero-sized array extension
-	h1->j[4] = 1;	/* { dg-warning "subscript is above array bound" } */
-	h3->j[4] = 1;	/* { dg-warning "subscript is above array bound" } */
+	h1->j[4] = 1;	/* { dg-warning "subscript 4 is above array bound" } */
+	h3->j[4] = 1;	/* { dg-warning "subscript 4 is above array bound" } */
 
 	struct h0b* h0b = malloc(sizeof(struct h) + 3 * sizeof(int));
 	struct h1b* h1b = malloc(sizeof(struct h1b) + 3 * sizeof(int));
 	struct h3b* h3b = malloc(sizeof(struct h3b));
 //	h0b->j[4] = 1;
-	h1b->j[4] = 1;;	/* { dg-warning "subscript is above array bound" } */
-	h3b->j[4] = 1;;	/* { dg-warning "subscript is above array bound" } */
+	h1b->j[4] = 1;;	/* { dg-warning "subscript 4 is above array bound" } */
+	h3b->j[4] = 1;;	/* { dg-warning "subscript 4 is above array bound" } */
 
 	// make sure nothing gets optimized away
 	bar(*a);