diff mbox series

PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )

Message ID 3136e337-a551-3853-c536-b3b9ff0acdb9@gmail.com
State New
Headers show
Series PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) ) | expand

Commit Message

Martin Sebor Aug. 2, 2018, 2:44 a.m. UTC
Since the foundation of the patch is detecting and avoiding
the overly aggressive folding of unterminated char arrays,
besides issuing a warning for such arguments to strlen,
the patch also fixes pr86711 - wrong folding of memchr, and
pr86714 - tree-ssa-forwprop.c confused by too long initializer.

The substance of the attached updated patch is unchanged,
I have just added test cases for the two additional bugs.

Bernd, as I mentioned Wednesday, the patch supersedes
yours here:
https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01800.html

Martin

On 07/30/2018 01:17 PM, Martin Sebor wrote:
> Attached is an updated version of the patch that handles more
> instances of calling strlen() on a constant array that is not
> a nul-terminated string.
>
> No other functions except strlen are explicitly handled yet,
> and neither are constant arrays with braced-initializer lists
> like const char a[] = { 'a', 'b', 'c' };  I am testing
> an independent solution for those (bug 86552).  Once those
> are handled the warning will be able to detect those as well.
>
> Tested on x86_64-linux.
>
> On 07/25/2018 05:38 PM, Martin Sebor wrote:
>> Ping: https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01124.html
>>
>> The fix for bug 86532 has been checked in so this enhancement
>> can now be applied on top of it (with only minor adjustments).
>>
>> On 07/19/2018 02:08 PM, Martin Sebor wrote:
>>> In the discussion of my patch for pr86532 Bernd noted that
>>> GCC silently accepts constant character arrays with no
>>> terminating nul as arguments to strlen (and other string
>>> functions).
>>>
>>> The attached patch is a first step in detecting these kinds
>>> of bugs in strlen calls by issuing -Wstringop-overflow.
>>> The next step is to modify all other handlers of built-in
>>> functions to detect the same problem (not part of this patch).
>>> Yet another step is to detect these problems in arguments
>>> initialized using the non-string form:
>>>
>>>   const char a[] = { 'a', 'b', 'c' };
>>>
>>> This patch is meant to apply on top of the one for bug 86532
>>> (I tested it with an earlier version of that patch so there
>>> is code in the context that does not appear in the latest
>>> version of the other diff).
>>>
>>> Martin
>>>
>>
>

Comments

Bernd Edlinger Aug. 2, 2018, 1:26 p.m. UTC | #1
On 08/02/18 04:44, Martin Sebor wrote:
> Since the foundation of the patch is detecting and avoiding
> the overly aggressive folding of unterminated char arrays,
> besides issuing a warning for such arguments to strlen,
> the patch also fixes pr86711 - wrong folding of memchr, and
> pr86714 - tree-ssa-forwprop.c confused by too long initializer.
> 
> The substance of the attached updated patch is unchanged,
> I have just added test cases for the two additional bugs.
> 
> Bernd, as I mentioned Wednesday, the patch supersedes
> yours here:
> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01800.html
> 

No problem, but I hope you understand, that I still uphold
my patch.

So we have two patches now:
- mine, fixing a wrong code bug,
- yours, implementing a new warning and fixing a wrong
code bug at the same time.

I will add a few comments to your patch below.

> Martin
> 
> On 07/30/2018 01:17 PM, Martin Sebor wrote:
>> Attached is an updated version of the patch that handles more
>> instances of calling strlen() on a constant array that is not
>> a nul-terminated string.
>>
>> No other functions except strlen are explicitly handled yet,
>> and neither are constant arrays with braced-initializer lists
>> like const char a[] = { 'a', 'b', 'c' };  I am testing
>> an independent solution for those (bug 86552).  Once those
>> are handled the warning will be able to detect those as well.
>>
>> Tested on x86_64-linux.
>>
>> On 07/25/2018 05:38 PM, Martin Sebor wrote:
>>> Ping: https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01124.html
>>>
>>> The fix for bug 86532 has been checked in so this enhancement
>>> can now be applied on top of it (with only minor adjustments).
>>>
>>> On 07/19/2018 02:08 PM, Martin Sebor wrote:
>>>> In the discussion of my patch for pr86532 Bernd noted that
>>>> GCC silently accepts constant character arrays with no
>>>> terminating nul as arguments to strlen (and other string
>>>> functions).
>>>>
>>>> The attached patch is a first step in detecting these kinds
>>>> of bugs in strlen calls by issuing -Wstringop-overflow.
>>>> The next step is to modify all other handlers of built-in
>>>> functions to detect the same problem (not part of this patch).
>>>> Yet another step is to detect these problems in arguments
>>>> initialized using the non-string form:
>>>>
>>>>   const char a[] = { 'a', 'b', 'c' };
>>>>
>>>> This patch is meant to apply on top of the one for bug 86532
>>>> (I tested it with an earlier version of that patch so there
>>>> is code in the context that does not appear in the latest
>>>> version of the other diff).
>>>>
>>>> Martin
>>>>
>>>
>>
> 
> PR tree-optimization/86714 - tree-ssa-forwprop.c confused by too long initializer
> PR tree-optimization/86711 - wrong folding of memchr
> PR tree-optimization/86552 - missing warning for reading past the end of non-string arrays
> 
> gcc/ChangeLog:
> 
> 	PR tree-optimization/86714
> 	PR tree-optimization/86711
> 	PR tree-optimization/86552
> 	* builtins.h (warn_string_no_nul): Declare..
> 	(c_strlen): Add argument.
> 	* builtins.c (warn_string_no_nul): New function.
> 	(fold_builtin_strlen): Add argument.  Detect missing nul.
> 	(fold_builtin_1): Adjust.
> 	(string_length): Add argument and use it.
> 	(c_strlen): Same.
> 	(expand_builtin_strlen): Detect missing nul.
> 	* expr.c (string_constant): Add arguments.  Detect missing nul
> 	terminator and outermost declaration it's missing in.
> 	* expr.h (string_constant): Add argument.
> 	* fold-const.c (c_getstr): Change argument to bool*, rename
> 	other arguments.
> 	* fold-const-call.c (fold_const_call): Detect missing nul.
> 	* gimple-fold.c (get_range_strlen): Add argument.
> 	(get_maxval_strlen): Adjust.
> 	* gimple-fold.h (get_range_strlen): Add argument.
> 
> gcc/testsuite/ChangeLog:
> 
> 	PR tree-optimization/86714
> 	PR tree-optimization/86711
> 	PR tree-optimization/86552
> 	* gcc.c-torture/execute/memchr-1.c: New test.
> 	* gcc.c-torture/execute/pr86714.c: New test.
> 	* gcc.dg/warn-strlen-no-nul.c: New test.
> 
> diff --git a/gcc/builtins.c b/gcc/builtins.c
> index aa3e0d8..f4924d5 100644
> --- a/gcc/builtins.c
> +++ b/gcc/builtins.c
> @@ -150,7 +150,7 @@ static tree stabilize_va_list_loc (location_t, tree, int);
>  static rtx expand_builtin_expect (tree, rtx);
>  static tree fold_builtin_constant_p (tree);
>  static tree fold_builtin_classify_type (tree);
> -static tree fold_builtin_strlen (location_t, tree, tree);
> +static tree fold_builtin_strlen (location_t, tree, tree, tree);
>  static tree fold_builtin_inf (location_t, tree, int);
>  static tree rewrite_call_expr (location_t, tree, int, tree, int, ...);
>  static bool validate_arg (const_tree, enum tree_code code);
> @@ -550,6 +550,36 @@ string_length (const void *ptr, unsigned eltsize, unsigned maxelts)
>    return n;
>  }
>  
> +/* For a call expression EXP to a function that expects a string argument,
> +   issue a diagnostic due to it being a called with an argument NONSTR
> +   that is a character array with no terminating NUL.  */
> +
> +void
> +warn_string_no_nul (location_t loc, tree exp, tree fndecl, tree nonstr)
> +{
> +  loc = expansion_point_location_if_in_system_header (loc);
> +
> +  bool warned;
> +  if (exp)
> +    {
> +      if (!fndecl)
> +	fndecl = get_callee_fndecl (exp);
> +      warned = warning_at (loc, OPT_Wstringop_overflow_,
> +			   "%K%qD argument missing terminating nul",
> +			   exp, fndecl);
> +    }
> +  else
> +    {
> +      gcc_assert (fndecl);
> +      warned = warning_at (loc, OPT_Wstringop_overflow_,
> +			   "%qD argument missing terminating nul",
> +			   fndecl);
> +    }
> +
> +  if (warned && DECL_P (nonstr))
> +    inform (DECL_SOURCE_LOCATION (nonstr), "referenced argument declared here");
> +}
> +
>  /* Compute the length of a null-terminated character string or wide
>     character string handling character sizes of 1, 2, and 4 bytes.
>     TREE_STRING_LENGTH is not the right way because it evaluates to
> @@ -567,37 +597,60 @@ string_length (const void *ptr, unsigned eltsize, unsigned maxelts)
>     accesses.  Note that this implies the result is not going to be emitted
>     into the instruction stream.
>  
> +   When ARR is non-null and the string is not properly nul-terminated,
> +   set *ARR to the declaration of the outermost constant object whose
> +   initializer (or one of its elements) is not nul-terminated.
> +
>     The value returned is of type `ssizetype'.
>  
>     Unfortunately, string_constant can't access the values of const char
>     arrays with initializers, so neither can we do so here.  */
>  
>  tree
> -c_strlen (tree src, int only_value)
> +c_strlen (tree src, int only_value, tree *arr /* = NULL */)
>  {
>    STRIP_NOPS (src);
> +
> +  /* Used to detect non-nul-terminated strings in subexpressions
> +     of a conditional expression.  When ARR is null, point it at
> +     one of the elements for simplicity.  */
> +  tree arrs[] = { NULL_TREE, NULL_TREE };
> +  if (!arr)
> +    arr = arrs;

now arr is always != NULL

> +
>    if (TREE_CODE (src) == COND_EXPR
>        && (only_value || !TREE_SIDE_EFFECTS (TREE_OPERAND (src, 0))))
>      {
> -      tree len1, len2;
> -
> -      len1 = c_strlen (TREE_OPERAND (src, 1), only_value);
> -      len2 = c_strlen (TREE_OPERAND (src, 2), only_value);
> +      tree len1 = c_strlen (TREE_OPERAND (src, 1), only_value, arrs);
> +      tree len2 = c_strlen (TREE_OPERAND (src, 2), only_value, arrs + 1);
>        if (tree_int_cst_equal (len1, len2))
> -	return len1;
> +	{

funny, if called with NULL *arr and arrs[0] alias each other.

> +	  *arr = arrs[0] ? arrs[0] : arrs[1];
> +	  return len1;
> +	}
>      }
>  
>    if (TREE_CODE (src) == COMPOUND_EXPR
>        && (only_value || !TREE_SIDE_EFFECTS (TREE_OPERAND (src, 0))))
> -    return c_strlen (TREE_OPERAND (src, 1), only_value);
> +    return c_strlen (TREE_OPERAND (src, 1), only_value, arr);
>  
>    location_t loc = EXPR_LOC_OR_LOC (src, input_location);
>  
>    /* Offset from the beginning of the string in bytes.  */
>    tree byteoff;
> -  src = string_constant (src, &byteoff);
> -  if (src == 0)
> -    return NULL_TREE;
> +  /* Set if array is nul-terminated, false otherwise.  */
> +  bool nulterm;

note arr is always != null or pointing to arrs[0].

> +  src = string_constant (src, &byteoff, &nulterm, arr);
> +  if (!src)
> +    {
> +      *arr = arrs[0] ? arrs[0] : arrs[1];
> +      return NULL_TREE;
> +    }
> +
> +  /* Clear *ARR when the string is nul-terminated.  It should be
> +     of no interest to callers.  */
> +  if (nulterm)
> +    *arr = NULL_TREE;
>  
>    /* Determine the size of the string element.  */
>    unsigned eltsize
> @@ -621,6 +674,12 @@ c_strlen (tree src, int only_value)
>  	maxelts = maxelts / eltsize - 1;
>        }
>  
> +  /* Unless the caller is prepared to handle it by passing in a non-null
> +     ARR, fail if the terminating nul doesn't fit in the array the string
> +     is stored in (as in const char a[3] = "123";  */

note arr is always != NULL, thus this if is never taken.

> +  if (!arr && maxelts < strelts)
> +    return NULL_TREE;
> +
>    /* PTR can point to the byte representation of any string type, including
>       char* and wchar_t*.  */
>    const char *ptr = TREE_STRING_POINTER (src);
> @@ -650,7 +709,8 @@ c_strlen (tree src, int only_value)
>        offsave = fold_convert (ssizetype, offsave);
>        tree condexp = fold_build2_loc (loc, LE_EXPR, boolean_type_node, offsave,
>  				      build_int_cst (ssizetype, len * eltsize));

this computation is wrong, it computes not in units of eltsize,
I am however not sure if it is really good that this function tries to
compute strlen of wide character strings.

That said, please fix this computation first, in a different patch
instead of just fixing the indentation. (I know I pointed that lines are too
long here, but that was before I realized that the whole length computation
here is wrong).

> -      tree lenexp = size_diffop_loc (loc, ssize_int (strelts * eltsize), offsave);
> +      tree lenexp = size_diffop_loc (loc, ssize_int (strelts * eltsize),
> +				     offsave);
>        return fold_build3_loc (loc, COND_EXPR, ssizetype, condexp, lenexp,
>  			      build_zero_cst (ssizetype));
>      }
> @@ -690,7 +750,7 @@ c_strlen (tree src, int only_value)
>       Since ELTOFF is our starting index into the string, no further
>       calculation is needed.  */

What are you fixing here, I think that was another bug.
If this fixes something then it should be in a different patch,
just handling this.

>    unsigned len = string_length (ptr + eltoff * eltsize, eltsize,
> -				maxelts - eltoff);
> +				strelts - eltoff);
>  
>    return ssize_int (len);
>  }
> @@ -2855,7 +2915,6 @@ expand_builtin_strlen (tree exp, rtx target,
>  
>    struct expand_operand ops[4];
>    rtx pat;
> -  tree len;
>    tree src = CALL_EXPR_ARG (exp, 0);
>    rtx src_reg;
>    rtx_insn *before_strlen;
> @@ -2864,20 +2923,39 @@ expand_builtin_strlen (tree exp, rtx target,
>    unsigned int align;
>  
>    /* If the length can be computed at compile-time, return it.  */
> -  len = c_strlen (src, 0);
> +  tree array;
> +  tree len = c_strlen (src, 0, &array);

You know the c_strlen tries to compute wide character sizes,
but strlen does not do that, strlen (L"abc") should give 1
(or 0 on a BE machine)
I wonder if that is correct.

>    if (len)
> -    return expand_expr (len, target, target_mode, EXPAND_NORMAL);
> +    {
> +      if (array)
> +	{
> +	  /* Array refers to the non-nul terminated constant array
> +	     whose length is attempted to be computed.  */

I really wonder if it would not make more sense to have a
nonterminated_string_constant_p instead.

Last time I wanted to implement a warning in expand I faced the
problem that inlined functions will get one warning per invocation?

> +	  warn_string_no_nul (EXPR_LOCATION (exp), exp, NULL_TREE, array);
> +	  return NULL_RTX;
> +	}
> +      return expand_expr (len, target, target_mode, EXPAND_NORMAL);
> +    }
>  
>    /* If the length can be computed at compile-time and is constant
>       integer, but there are side-effects in src, evaluate
>       src for side-effects, then return len.
>       E.g. x = strlen (i++ ? "xfoo" + 1 : "bar");
>       can be optimized into: i++; x = 3;  */
> -  len = c_strlen (src, 1);
> -  if (len && TREE_CODE (len) == INTEGER_CST)
> +  len = c_strlen (src, 1, &array);
> +  if (len)
>      {
> -      expand_expr (src, const0_rtx, VOIDmode, EXPAND_NORMAL);
> -      return expand_expr (len, target, target_mode, EXPAND_NORMAL);
> +      if (array)
> +	{
> +	  warn_string_no_nul (EXPR_LOCATION (exp), exp, NULL_TREE, array);
> +	  return NULL_RTX;
> +	}
> +
> +      if (TREE_CODE (len) == INTEGER_CST)
> +	{
> +	  expand_expr (src, const0_rtx, VOIDmode, EXPAND_NORMAL);
> +	  return expand_expr (len, target, target_mode, EXPAND_NORMAL);
> +	}
>      }
>  
>    align = get_pointer_alignment (src) / BITS_PER_UNIT;
> @@ -8255,19 +8333,30 @@ fold_builtin_classify_type (tree arg)
>    return build_int_cst (integer_type_node, type_to_class (TREE_TYPE (arg)));
>  }
>  
> -/* Fold a call to __builtin_strlen with argument ARG.  */
> +/* Fold a strlen call to FNDECL of TYPE, and with argument ARG.  */
>  
>  static tree
> -fold_builtin_strlen (location_t loc, tree type, tree arg)
> +fold_builtin_strlen (location_t loc, tree fndecl, tree type, tree arg)
>  {
>    if (!validate_arg (arg, POINTER_TYPE))
>      return NULL_TREE;
>    else
>      {
> -      tree len = c_strlen (arg, 0);
> -
> +      tree arr = NULL_TREE;
> +      tree len = c_strlen (arg, 0, &arr);

Is it possible to write a test case where strlen(L"test") reaches this point?
what will c_strlen return then?

>        if (len)
> -	return fold_convert_loc (loc, type, len);
> +	{
> +	  if (loc == UNKNOWN_LOCATION && EXPR_HAS_LOCATION (arg))
> +	    loc = EXPR_LOCATION (arg);
> +
> +	  /* To avoid warning multiple times about non-nul-terminated
> +	     strings only warn if their length has been determined
> +	     and it's being folded.  */
> +	  if (arr)
> +	    warn_string_no_nul (loc, NULL_TREE, fndecl, arr);
> +
> +	  return fold_convert_loc (loc, type, len);
> +	}
>  
>        return NULL_TREE;
>      }
> @@ -9175,7 +9264,7 @@ fold_builtin_1 (location_t loc, tree fndecl, tree arg0)
>        return fold_builtin_classify_type (arg0);
>  
>      case BUILT_IN_STRLEN:
> -      return fold_builtin_strlen (loc, type, arg0);
> +      return fold_builtin_strlen (loc, fndecl, type, arg0);
>  
>      CASE_FLT_FN (BUILT_IN_FABS):
>      CASE_FLT_FN_FLOATN_NX (BUILT_IN_FABS):
> diff --git a/gcc/builtins.h b/gcc/builtins.h
> index 2e0a2f9..73b0b0b 100644
> --- a/gcc/builtins.h
> +++ b/gcc/builtins.h
> @@ -58,7 +58,7 @@ extern bool get_pointer_alignment_1 (tree, unsigned int *,
>  				     unsigned HOST_WIDE_INT *);
>  extern unsigned int get_pointer_alignment (tree);
>  extern unsigned string_length (const void*, unsigned, unsigned);
> -extern tree c_strlen (tree, int);
> +extern tree c_strlen (tree, int, tree * = NULL);
>  extern void expand_builtin_setjmp_setup (rtx, rtx);
>  extern void expand_builtin_setjmp_receiver (rtx);
>  extern void expand_builtin_update_setjmp_buf (rtx);
> @@ -103,7 +103,7 @@ extern bool target_char_cst_p (tree t, char *p);
>  
>  extern internal_fn associated_internal_fn (tree);
>  extern internal_fn replacement_internal_fn (gcall *);
> -
> +extern void warn_string_no_nul (location_t, tree, tree, tree);
>  extern tree max_object_size ();
>  
>  #endif /* GCC_BUILTINS_H */
> diff --git a/gcc/expr.c b/gcc/expr.c
> index de6709d..edbd7f8 100644
> --- a/gcc/expr.c
> +++ b/gcc/expr.c
> @@ -11271,10 +11271,14 @@ is_aligning_offset (const_tree offset, const_tree exp)
>  /* Return the tree node if an ARG corresponds to a string constant or zero
>     if it doesn't.  If we return nonzero, set *PTR_OFFSET to the (possibly
>     non-constant) offset in bytes within the string that ARG is accessing.
> +   If NULTERM is non-null, consider valid even sequences of characters that
> +   aren't nul-terminated strings.  In that case, set NULTERM if ARG refers
> +   to such a sequence and clear it otherwise.
>     The type of the offset is sizetype.  */
>  
>  tree
> -string_constant (tree arg, tree *ptr_offset)
> +string_constant (tree arg, tree *ptr_offset, bool *nulterm /* = NULL */,
> +		 tree *decl /* = NULL */)
>  {
>    tree array;
>    STRIP_NOPS (arg);
> @@ -11328,7 +11332,7 @@ string_constant (tree arg, tree *ptr_offset)
>  	return NULL_TREE;
>  
>        tree offset;
> -      if (tree str = string_constant (arg0, &offset))
> +      if (tree str = string_constant (arg0, &offset, nulterm, decl))
>  	{
>  	  /* Avoid pointers to arrays (see bug 86622).  */
>  	  if (POINTER_TYPE_P (TREE_TYPE (arg))
> @@ -11368,6 +11372,10 @@ string_constant (tree arg, tree *ptr_offset)
>    if (TREE_CODE (array) == STRING_CST)

Well, actually I think there _are_ STING_CSTs which are not null terminated.
Maybe not in C. But Fortran, Ada, Go...

>      {
>        *ptr_offset = fold_convert (sizetype, offset);
> +      if (nulterm)
> +	*nulterm = true;
> +      if (decl)
> +	*decl = NULL_TREE;
>        return array;
>      }
>  
> @@ -11414,6 +11422,49 @@ string_constant (tree arg, tree *ptr_offset)
>    if (!array_size || TREE_CODE (array_size) != INTEGER_CST)
>      return NULL_TREE;
>  
> +  unsigned HOST_WIDE_INT array_elts = tree_to_uhwi (array_size);
> +

I don't understand why this is necessary at all.
It looks way too complicated, to say the least.

TREE_TYPE (init) has already the type of the member.

> +  /* When ARG refers to an aggregate (of arrays) try to determine
> +     the size of the character array within the aggregate.  */
> +  tree ref = arg;
> +  tree reftype = TREE_TYPE (arg);
> +
> +  if (TREE_CODE (ref) == MEM_REF)
> +    {
> +      ref = TREE_OPERAND (ref, 0);
> +      if (TREE_CODE (ref) == ADDR_EXPR)
> +	{
> +	  ref = TREE_OPERAND (ref, 0);
> +	  reftype = TREE_TYPE (ref);
> +	}
> +    }
> +  else
> +    while (TREE_CODE (ref) == ARRAY_REF)
> +      {
> +	reftype = TREE_TYPE (ref);
> +	ref = TREE_OPERAND (ref, 0);
> +      }
> +
> +  if (TREE_CODE (ref) == COMPONENT_REF)
> +    reftype = TREE_TYPE (ref);
> +
> +  while (TREE_CODE (reftype) == ARRAY_TYPE)
> +    {
> +      tree next = TREE_TYPE (reftype);
> +      if (TREE_CODE (next) == INTEGER_TYPE)
> +	{
> +	  if (tree size = TYPE_SIZE_UNIT (reftype))
> +	    if (tree_fits_uhwi_p (size))
> +	      array_elts = tree_to_uhwi (size);

so array_elts is measued in bytes.

> +	  break;
> +	}
> +
> +      reftype = TREE_TYPE (reftype);
> +    }
> +
> +  if (decl)
> +    *decl = array;
> +
>    /* Avoid returning a string that doesn't fit in the array
>       it is stored in, like
>       const char a[4] = "abcde";
> @@ -11427,7 +11478,9 @@ string_constant (tree arg, tree *ptr_offset)
>    unsigned HOST_WIDE_INT length = TREE_STRING_LENGTH (init);
>    length = string_length (TREE_STRING_POINTER (init), charsize,
>  			  length / charsize);

Some callers especially those where the wrong code happens, expect
to be able to access the STRING_CST up to TREE_STRING_LENGTH,
But using string_length assume thy stop at the first nul char.

> -  if (compare_tree_int (array_size, length + 1) < 0)
> +  if (nulterm)

but here you compare bytes with length which is measued un chars.

> +    *nulterm = array_elts > length;
> +  else if (array_elts <= length)
>      return NULL_TREE;
>  
>    *ptr_offset = offset;
> diff --git a/gcc/expr.h b/gcc/expr.h
> index cf047d4..e630979 100644
> --- a/gcc/expr.h
> +++ b/gcc/expr.h
> @@ -288,7 +288,7 @@ expand_normal (tree exp)
>  
>  /* Return the tree node and offset if a given argument corresponds to
>     a string constant.  */
> -extern tree string_constant (tree, tree *);
> +extern tree string_constant (tree, tree *, bool * = NULL, tree * = NULL);
>  
>  /* Two different ways of generating switch statements.  */
>  extern int try_casesi (tree, tree, tree, tree, rtx, rtx, rtx, profile_probability);
> diff --git a/gcc/fold-const-call.c b/gcc/fold-const-call.c
> index 06a42060..849a443 100644
> --- a/gcc/fold-const-call.c
> +++ b/gcc/fold-const-call.c
> @@ -1199,9 +1199,14 @@ fold_const_call (combined_fn fn, tree type, tree arg)
>    switch (fn)
>      {
>      case CFN_BUILT_IN_STRLEN:
> -      if (const char *str = c_getstr (arg))
> -	return build_int_cst (type, strlen (str));
> -      return NULL_TREE;
> +      {
> +	bool nulterm;
> +	if (const char *str = c_getstr (arg, NULL, &nulterm))
> +	  if (nulterm)
> +	    return build_int_cst (type, strlen (str));
> +
> +	return NULL_TREE;
> +      }
>  
>      CASE_CFN_NAN:
>      CASE_FLT_FN_FLOATN_NX (CFN_BUILT_IN_NAN):
> diff --git a/gcc/fold-const.c b/gcc/fold-const.c
> index b318fc77..ecbc38c 100644
> --- a/gcc/fold-const.c
> +++ b/gcc/fold-const.c
> @@ -14577,23 +14577,23 @@ fold_build_pointer_plus_hwi_loc (location_t loc, tree ptr, HOST_WIDE_INT off)
>  /* Return a pointer P to a NUL-terminated string representing the sequence
>     of constant characters referred to by SRC (or a subsequence of such
>     characters within it if SRC is a reference to a string plus some
> -   constant offset).  If STRLEN is non-null, store stgrlen(P) in *STRLEN.
> -   If STRSIZE is non-null, store in *STRSIZE the size of the array
> -   the string is stored in; in that case, even though P points to a NUL
> -   terminated string, SRC need not refer to one.  This can happen when
> -   SRC refers to a constant character array initialized to all non-NUL
> -   values, as in the C declaration: char a[4] = "1234";  */
> +   constant offset).  If STRSIZE is non-null, store the size of the string
> +   literal in *STRSIZE, including any embedded or terminating nuls.  If
> +   If NULLTERM is non-null, set *NULLTERM if the referenced string is
> +   guaranteed to contain a terminating NUL.  Otherwise clear it.  This
> +   can happen in the case of valid C declarations such as:
> +   const char a[3] = "123";  */
>  
>  const char *
> -c_getstr (tree src, unsigned HOST_WIDE_INT *strlen /* = NULL */,
> -	  unsigned HOST_WIDE_INT *strsize /* = NULL */)
> +c_getstr (tree src, unsigned HOST_WIDE_INT *strsize /* = NULL */,
> +	  bool *nulterm /* = NULL */)
>  {
>    tree offset_node;
>  
> -  if (strlen)
> -    *strlen = 0;
> +  if (strsize)
> +    *strsize = 0;
>  
> -  src = string_constant (src, &offset_node);
> +  src = string_constant (src, &offset_node, nulterm);
>    if (src == 0)
>      return NULL;
>  
> @@ -14606,47 +14606,42 @@ c_getstr (tree src, unsigned HOST_WIDE_INT *strlen /* = NULL */,
>  	offset = tree_to_uhwi (offset_node);
>      }
>  
> -  /* STRING_LENGTH is the size of the string literal, including any
> -     embedded NULs.  STRING_SIZE is the size of the array the string
> +  /* STRING_SIZE is the size of the string literal, including any
> +     embedded NULs.  ARRAY_SIZE is the size of the array the string
>       literal is stored in.  */
> -  unsigned HOST_WIDE_INT string_length = TREE_STRING_LENGTH (src);
> -  unsigned HOST_WIDE_INT string_size = string_length;
> +  unsigned HOST_WIDE_INT string_size = TREE_STRING_LENGTH (src);
> +  unsigned HOST_WIDE_INT array_size = string_size;
>    tree type = TREE_TYPE (src);
>    if (tree size = TYPE_SIZE_UNIT (type))
>      if (tree_fits_shwi_p (size))
> -      string_size = tree_to_uhwi (size);
> +      array_size = tree_to_uhwi (size);
> +
> +  const char *string = TREE_STRING_POINTER (src);
>  
> -  if (strlen)
> +  if (strsize)
>      {
> -      /* Compute and store the length of the substring at OFFSET.
> +      /* Compute and store the size of the substring at OFFSET.
>  	 All offsets past the initial length refer to null strings.  */
> -      if (offset <= string_length)
> -	*strlen = string_length - offset;

this should be offset < string_length.

> +      if (offset <= string_size)
> +	*strsize = string_size - offset;
>        else
> -	*strlen = 0;

this should be 1, you may access the NUL byte of "".

> +	*strsize = 0;
>      }
>  
> -  const char *string = TREE_STRING_POINTER (src);
> -
> -  if (string_length == 0
> -      || offset >= string_size)
> +  if (string_size == 0
> +      || offset >= array_size)
>      return NULL;
>  
> -  if (strsize)
> -    {
> -      /* Support even constant character arrays that aren't proper
> -	 NUL-terminated strings.  */
> -      *strsize = string_size;
> -    }
> -  else if (string[string_length - 1] != '\0')

Well, this is broken for wide character strings.
but I hope we can get rid of STRING_CST which are
not explicitly null terminated.

> +  if (!nulterm && string[string_size - 1] != '\0')
>      {
> -      /* Support only properly NUL-terminated strings but handle
> -	 consecutive strings within the same array, such as the six
> -	 substrings in "1\0002\0003".  */
> +      /* When NULTERM is null, support only properly nul-terminated
> +	 strings but handle consecutive strings within the same array,
> +	 such as the six substrings in "1\0002\0003".  Otherwise, let
> +	 the caller deal with non-nul-terminated arrays.  */
>        return NULL;
>      }
>  
> -  return offset <= string_length ? string + offset : "";

this should be offset < string_size.

> +  return offset <= string_size ? string + offset : "";
>  }
>  
>  /* Given a tree T, compute which bits in T may be nonzero.  */
> diff --git a/gcc/fold-const.h b/gcc/fold-const.h
> index 1b9ccc0..a58a4a2 100644
> --- a/gcc/fold-const.h
> +++ b/gcc/fold-const.h
> @@ -188,7 +188,7 @@ extern tree const_unop (enum tree_code, tree, tree);
>  extern tree const_binop (enum tree_code, tree, tree, tree);
>  extern bool negate_mathfn_p (combined_fn);
>  extern const char *c_getstr (tree, unsigned HOST_WIDE_INT * = NULL,
> -			     unsigned HOST_WIDE_INT * = NULL);
> +			     bool * = NULL);
>  extern wide_int tree_nonzero_bits (const_tree);
>  
>  /* Return OFF converted to a pointer offset type suitable as offset for
> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
> index c3fa570..9eefb37 100644
> --- a/gcc/gimple-fold.c
> +++ b/gcc/gimple-fold.c
> @@ -1275,11 +1275,13 @@ gimple_fold_builtin_memset (gimple_stmt_iterator *gsi, tree c, tree len)
>     Set *FLEXP to true if the range of the string lengths has been
>     obtained from the upper bound of an array at the end of a struct.
>     Such an array may hold a string that's longer than its upper bound
> -   due to it being used as a poor-man's flexible array member.  */
> +   due to it being used as a poor-man's flexible array member.
> +   Clear *NULTERM if ARG refers to a constant array that is known
> +   not be nul-terminated.  */
>  
>  static bool
>  get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
> -		  int fuzzy, bool *flexp)
> +		  int fuzzy, bool *flexp, bool *nulterm)
>  {
>    tree var, val = NULL_TREE;
>    gimple *def_stmt;
> @@ -1301,7 +1303,8 @@ get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
>  	      if (TREE_CODE (aop0) == INDIRECT_REF
>  		  && TREE_CODE (TREE_OPERAND (aop0, 0)) == SSA_NAME)
>  		return get_range_strlen (TREE_OPERAND (aop0, 0),
> -					 length, visited, type, fuzzy, flexp);
> +					 length, visited, type, fuzzy, flexp,
> +					 nulterm);
>  	    }
>  	  else if (TREE_CODE (TREE_OPERAND (op, 0)) == COMPONENT_REF && fuzzy)
>  	    {
> @@ -1329,13 +1332,18 @@ get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
>  	    return false;
>  	}
>        else
> -	val = c_strlen (arg, 1);
> +	{
> +	  tree arr;
> +	  val = c_strlen (arg, 1, &arr);
> +	  if (val && arr)
> +	    *nulterm = false;
> +	}
>  
>        if (!val && fuzzy)
>  	{
>  	  if (TREE_CODE (arg) == ADDR_EXPR)
>  	    return get_range_strlen (TREE_OPERAND (arg, 0), length,
> -				     visited, type, fuzzy, flexp);
> +				     visited, type, fuzzy, flexp, nulterm);
>  
>  	  if (TREE_CODE (arg) == ARRAY_REF)
>  	    {
> @@ -1477,7 +1485,8 @@ get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
>              || gimple_assign_unary_nop_p (def_stmt))
>            {
>              tree rhs = gimple_assign_rhs1 (def_stmt);
> -	    return get_range_strlen (rhs, length, visited, type, fuzzy, flexp);
> +	    return get_range_strlen (rhs, length, visited, type, fuzzy, flexp,
> +				     nulterm);
>            }
>  	else if (gimple_assign_rhs_code (def_stmt) == COND_EXPR)
>  	  {
> @@ -1486,7 +1495,7 @@ get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
>  
>  	    for (unsigned int i = 0; i < 2; i++)
>  	      if (!get_range_strlen (ops[i], length, visited, type, fuzzy,
> -				     flexp))
> +				     flexp, nulterm))
>  		{
>  		  if (fuzzy == 2)
>  		    *maxlen = build_all_ones_cst (size_type_node);
> @@ -1513,7 +1522,8 @@ get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
>              if (arg == gimple_phi_result (def_stmt))
>                continue;
>  
> -	    if (!get_range_strlen (arg, length, visited, type, fuzzy, flexp))
> +	    if (!get_range_strlen (arg, length, visited, type, fuzzy, flexp,
> +				   nulterm))
>  	      {
>  		if (fuzzy == 2)
>  		  *maxlen = build_all_ones_cst (size_type_node);
> @@ -1545,19 +1555,27 @@ get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
>     and false if PHIs and COND_EXPRs are to be handled optimistically,
>     if we can determine string length minimum and maximum; it will use
>     the minimum from the ones where it can be determined.
> -   STRICT false should be only used for warning code.  */
> +   STRICT false should be only used for warning code.
> +   When non-null, clear *NULTERM if ARG refers to a constant array
> +   that is known not be nul-terminated.  Otherwise set it to true.  */
>  
>  bool
> -get_range_strlen (tree arg, tree minmaxlen[2], bool strict)
> +get_range_strlen (tree arg, tree minmaxlen[2], bool strict /* = false */,
> +		  bool *nulterm /* = NULL */)
>  {
>    bitmap visited = NULL;
>  
>    minmaxlen[0] = NULL_TREE;
>    minmaxlen[1] = NULL_TREE;
>  
> +  bool nultermbuf;
> +  if (!nulterm)
> +    nulterm = &nultermbuf;
> +  *nulterm = true;
> +
>    bool flexarray = false;
>    if (!get_range_strlen (arg, minmaxlen, &visited, 1, strict ? 1 : 2,
> -			 &flexarray))
> +			 &flexarray, nulterm))
>      {
>        minmaxlen[0] = NULL_TREE;
>        minmaxlen[1] = NULL_TREE;
> @@ -1576,7 +1594,7 @@ get_maxval_strlen (tree arg, int type)
>    tree len[2] = { NULL_TREE, NULL_TREE };
>  
>    bool dummy;
> -  if (!get_range_strlen (arg, len, &visited, type, 0, &dummy))
> +  if (!get_range_strlen (arg, len, &visited, type, 0, &dummy, NULL))
>      len[1] = NULL_TREE;
>    if (visited)
>      BITMAP_FREE (visited);
> @@ -3496,12 +3514,14 @@ static bool
>  gimple_fold_builtin_strlen (gimple_stmt_iterator *gsi)
>  {
>    gimple *stmt = gsi_stmt (*gsi);
> +  tree arg = gimple_call_arg (stmt, 0);
>  
>    wide_int minlen;
>    wide_int maxlen;
>  
>    tree lenrange[2];
> -  if (!get_range_strlen (gimple_call_arg (stmt, 0), lenrange, true)
> +  bool nulterm;
> +  if (!get_range_strlen (arg, lenrange, true, &nulterm)
>        && lenrange[0] && TREE_CODE (lenrange[0]) == INTEGER_CST
>        && lenrange[1] && TREE_CODE (lenrange[1]) == INTEGER_CST)
>      {
> @@ -3523,6 +3543,10 @@ gimple_fold_builtin_strlen (gimple_stmt_iterator *gsi)
>  
>    if (minlen == maxlen)
>      {
> +      if (!nulterm)
> +	warn_string_no_nul (gimple_location (stmt), NULL_TREE,
> +			    gimple_call_fndecl (stmt), arg);
> +
>        lenrange[0] = force_gimple_operand_gsi (gsi, lenrange[0], true, NULL,
>  					      true, GSI_SAME_STMT);
>        replace_call_with_value (gsi, lenrange[0]);
> diff --git a/gcc/gimple-fold.h b/gcc/gimple-fold.h
> index 04e9bfa..fe11728 100644
> --- a/gcc/gimple-fold.h
> +++ b/gcc/gimple-fold.h
> @@ -25,7 +25,7 @@ along with GCC; see the file COPYING3.  If not see
>  extern tree create_tmp_reg_or_ssa_name (tree, gimple *stmt = NULL);
>  extern tree canonicalize_constructor_val (tree, tree);
>  extern tree get_symbol_constant_value (tree);
> -extern bool get_range_strlen (tree, tree[2], bool = false);
> +extern bool get_range_strlen (tree, tree[2], bool = false, bool * = NULL);
>  extern tree get_maxval_strlen (tree, int);
>  extern void gimplify_and_update_call_from_tree (gimple_stmt_iterator *, tree);
>  extern bool fold_stmt (gimple_stmt_iterator *);
> diff --git a/gcc/testsuite/gcc.c-torture/execute/memchr-1.c b/gcc/testsuite/gcc.c-torture/execute/memchr-1.c
> new file mode 100644
> index 0000000..2614bee
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/memchr-1.c
> @@ -0,0 +1,43 @@
> +/* PR tree-optimization/86711 - wrong folding of memchr
> +
> +   Verify that memchr() of arrays initialized with string literals
> +   where the nul doesn't fit in the array doesn't find the nul.  */
> +
> +extern void* memchr (const void*, int, __SIZE_TYPE__);
> +
> +#define A(expr) \
> +  ((expr) \
> +  ? (void)0							\
> +  : (__builtin_printf ("assertion failed on line %i: %s\n",	\
> +		       __LINE__, #expr), \
> +     __builtin_abort ()))
> +
> +static const char a1[4] = "1234";
> +static const char a2[2][4] = { "1234", "5678" };
> +
> +static const char a3[2][4] = { "1234", "567" };
> +
> +int main ()
> +{
> +  volatile int i = 0;
> +
> +  A (memchr (a1, 0, sizeof a1) == 0);
> +  A (memchr (a1 + 1, 0, sizeof a1 - 1) == 0);
> +  A (memchr (a1 + 3, 0, sizeof a1 - 3) == 0);
> +  A (memchr (a1 + i, 0, sizeof a1) == 0);
> +
> +  A (memchr (a2, 0, sizeof a2) == 0);
> +
> +  A (memchr (a2[0], 0, sizeof a2[0]) == 0);
> +  A (memchr (a2[1], 0, sizeof a2[1]) == 0);
> +
> +  A (memchr (a2[0] + 1, 0, sizeof a2[0] - 1) == 0);
> +  A (memchr (a2[1] + 2, 0, sizeof a2[1] - 2) == 0);
> +  A (memchr (a2[1] + 3, 0, sizeof a2[1] - 3) == 0);
> +
> +  A (memchr (a2[i], 0, sizeof a2[i]) == 0);
> +  A (memchr (a2[i] + 1, 0, sizeof a2[i] - 1) == 0);
> +
> +  /* This one must find it.  */
> +  A (memchr (a3, 0, sizeof a3) == &a3[1][3]);
> +}
> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr86714.c b/gcc/testsuite/gcc.c-torture/execute/pr86714.c
> new file mode 100644
> index 0000000..3ad6852
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/pr86714.c
> @@ -0,0 +1,26 @@
> +/* PR tree-optimization/86714 - tree-ssa-forwprop.c confused by too
> +   long initializer
> +
> +   The excessively long initializer for a[0] is undefined but this
> +   test verifies that the excess elements are not considered a part
> +   of the value of the array as a matter of QoI.  */
> +
> +const char a[2][3] = { "1234", "xyz" };
> +char b[6];
> +
> +void *pb = b;
> +
> +int main ()
> +{
> +   __builtin_memcpy (b, a, 4);
> +   __builtin_memset (b + 4, 'a', 2);
> +
> +   if (b[0] != '1' || b[1] != '2' || b[2] != '3'
> +       || b[3] != 'x' || b[4] != 'a' || b[5] != 'a')
> +     __builtin_abort ();
> +
> +   if (__builtin_memcmp (pb, "123xaa", 6))
> +     __builtin_abort ();
> +
> +   return 0;
> +}
> diff --git a/gcc/testsuite/gcc.dg/warn-strlen-no-nul.c b/gcc/testsuite/gcc.dg/warn-strlen-no-nul.c
> new file mode 100644
> index 0000000..838528f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/warn-strlen-no-nul.c
> @@ -0,0 +1,239 @@
> +/* PR tree-optimization/86552 - missing warning for reading past the end
> +   of non-string arrays
> +   { dg-do compile }
> +   { dg-options "-O2 -Wall -ftrack-macro-expansion=0" } */
> +
> +extern __SIZE_TYPE__ strlen (const char*);
> +
> +const char a[5] = "12345";   /* { dg-message "declared here" } */
> +
> +int i0 = 0;
> +int i1 = 1;
> +
> +void sink (int, ...);
> +
> +#define CONCAT(a, b)   a ## b
> +#define CAT(a, b)      CONCAT(a, b)
> +
> +#define T(str)					\
> +  __attribute__ ((noipa))			\
> +  void CAT (test_, __LINE__) (void) {		\
> +    sink (strlen (str));			\
> +  } typedef void dummy_type
> +
> +T (a);                /* { dg-warning "argument missing terminating nul" }  */
> +T (&a[0]);            /* { dg-warning "nul" }  */
> +T (&a[0] + 1);        /* { dg-warning "nul" }  */
> +T (&a[1]);            /* { dg-warning "nul" }  */
> +T (&a[i0]);           /* { dg-warning "nul" }  */
> +T (&a[i0] + 1);       /* { dg-warning "nul" }  */
> +
> +
> +const char b[][5] = { /* { dg-message "declared here" } */
> +  "12", "123", "1234", "54321"
> +};
> +
> +T (b[0]);
> +T (b[1]);
> +T (b[2]);
> +T (b[3]);             /* { dg-warning "nul" }  */
> +T (b[i0]);
> +
> +T (&b[2][1]);
> +T (&b[2][1] + 1);
> +T (&b[2][i0]);
> +T (&b[2][1] + i0);
> +
> +T (&b[3][1]);         /* { dg-warning "nul" }  */
> +T (&b[3][1] + 1);     /* { dg-warning "nul" }  */
> +T (&b[3][i0]);        /* { dg-warning "nul" }  */
> +T (&b[3][1] + i0);    /* { dg-warning "nul" }  */
> +T (&b[3][i0] + i1);   /* { dg-warning "nul" }  */
> +
> +T (i0 ? "" : b[0]);
> +T (i0 ? "" : b[1]);
> +T (i0 ? "" : b[2]);
> +T (i0 ? "" : b[3]);               /* { dg-warning "nul" }  */
> +T (i0 ? b[0] : "");
> +T (i0 ? b[1] : "");
> +T (i0 ? b[2] : "");
> +T (i0 ? b[3] : "");               /* { dg-warning "nul" }  */
> +
> +T (i0 ? "1234" : b[3]);           /* { dg-warning "nul" }  */
> +T (i0 ? b[3] : "1234");           /* { dg-warning "nul" }  */
> +
> +T (i0 ? a : b[3]);                /* { dg-warning "nul" }  */
> +T (i0 ? b[0] : b[2]);
> +T (i0 ? b[2] : b[3]);             /* { dg-warning "nul" }  */
> +T (i0 ? b[3] : b[2]);             /* { dg-warning "nul" }  */
> +
> +T (i0 ? b[0] : &b[3][0] + 1);     /* { dg-warning "nul" }  */
> +T (i0 ? b[1] : &b[3][1] + i0);    /* { dg-warning "nul" }  */
> +
> +/* It's possible to detect the missing nul in the following two
> +   expressions but GCC doesn't do it yet.  */
> +T (i0 ? &b[3][1] + i0 : b[2]);    /* { dg-warning "nul" "bug" { xfail *-*-* } }  */
> +T (i0 ? &b[3][i0] : &b[3][i1]);   /* { dg-warning "nul" "bug" { xfail *-*-* } }  */
> +
> +
> +struct A { char a[5], b[5]; };
> +
> +const struct A s = { "1234", "12345" };
> +
> +T (s.a);
> +T (&s.a[0]);
> +T (&s.a[0] + 1);
> +T (&s.a[0] + i0);
> +T (&s.a[1]);
> +T (&s.a[1] + 1);
> +T (&s.a[1] + i0);
> +
> +T (s.b);              /* { dg-warning "nul" }  */
> +T (&s.b[0]);          /* { dg-warning "nul" }  */
> +T (&s.b[0] + 1);      /* { dg-warning "nul" }  */
> +T (&s.b[0] + i0);     /* { dg-warning "nul" }  */
> +T (&s.b[1]);          /* { dg-warning "nul" }  */
> +T (&s.b[1] + 1);      /* { dg-warning "nul" }  */
> +T (&s.b[1] + i0);     /* { dg-warning "nul" }  */
> +
> +struct B { struct A a[2]; };
> +
> +const struct B ba[] = {
> +  { { { "123", "12345" }, { "12345", "123" } } },
> +  { { { "12345", "123" }, { "123", "12345" } } },
> +  { { { "1", "12" },      { "123", "1234" } } },
> +  { { { "123", "1234" },  { "12345", "12" } } }
> +};
> +
> +T (ba[0].a[0].a);
> +T (&ba[0].a[0].a[0]);
> +T (&ba[0].a[0].a[0] + 1);
> +T (&ba[0].a[0].a[0] + i0);
> +T (&ba[0].a[0].a[1]);
> +T (&ba[0].a[0].a[1] + 1);
> +T (&ba[0].a[0].a[1] + i0);
> +
> +T (ba[0].a[0].b);           /* { dg-warning "nul" }  */
> +T (&ba[0].a[0].b[0]);       /* { dg-warning "nul" }  */
> +T (&ba[0].a[0].b[0] + 1);   /* { dg-warning "nul" }  */
> +T (&ba[0].a[0].b[0] + i0);  /* { dg-warning "nul" }  */
> +T (&ba[0].a[0].b[1]);       /* { dg-warning "nul" }  */
> +T (&ba[0].a[0].b[1] + 1);   /* { dg-warning "nul" }  */
> +T (&ba[0].a[0].b[1] + i0);  /* { dg-warning "nul" }  */
> +
> +T (ba[0].a[1].a);           /* { dg-warning "nul" }  */
> +T (&ba[0].a[1].a[0]);       /* { dg-warning "nul" }  */
> +T (&ba[0].a[1].a[0] + 1);   /* { dg-warning "nul" }  */
> +T (&ba[0].a[1].a[0] + i0);  /* { dg-warning "nul" }  */
> +T (&ba[0].a[1].a[1]);       /* { dg-warning "nul" }  */
> +T (&ba[0].a[1].a[1] + 1);   /* { dg-warning "nul" }  */
> +T (&ba[0].a[1].a[1] + i0);  /* { dg-warning "nul" }  */
> +
> +T (ba[0].a[1].b);
> +T (&ba[0].a[1].b[0]);
> +T (&ba[0].a[1].b[0] + 1);
> +T (&ba[0].a[1].b[0] + i0);
> +T (&ba[0].a[1].b[1]);
> +T (&ba[0].a[1].b[1] + 1);
> +T (&ba[0].a[1].b[1] + i0);
> +
> +
> +T (ba[1].a[0].a);           /* { dg-warning "nul" }  */
> +T (&ba[1].a[0].a[0]);       /* { dg-warning "nul" }  */
> +T (&ba[1].a[0].a[0] + 1);   /* { dg-warning "nul" }  */
> +T (&ba[1].a[0].a[0] + i0);  /* { dg-warning "nul" }  */
> +T (&ba[1].a[0].a[1]);       /* { dg-warning "nul" }  */
> +T (&ba[1].a[0].a[1] + 1);   /* { dg-warning "nul" }  */
> +T (&ba[1].a[0].a[1] + i0);  /* { dg-warning "nul" }  */
> +
> +T (ba[1].a[0].b);
> +T (&ba[1].a[0].b[0]);
> +T (&ba[1].a[0].b[0] + 1);
> +T (&ba[1].a[0].b[0] + i0);
> +T (&ba[1].a[0].b[1]);
> +T (&ba[1].a[0].b[1] + 1);
> +T (&ba[1].a[0].b[1] + i0);
> +
> +T (ba[1].a[1].a);
> +T (&ba[1].a[1].a[0]);
> +T (&ba[1].a[1].a[0] + 1);
> +T (&ba[1].a[1].a[0] + i0);
> +T (&ba[1].a[1].a[1]);
> +T (&ba[1].a[1].a[1] + 1);
> +T (&ba[1].a[1].a[1] + i0);
> +
> +T (ba[1].a[1].b);           /* { dg-warning "nul" }  */
> +T (&ba[1].a[1].b[0]);       /* { dg-warning "nul" }  */
> +T (&ba[1].a[1].b[0] + 1);   /* { dg-warning "nul" }  */
> +T (&ba[1].a[1].b[0] + i0);  /* { dg-warning "nul" }  */
> +T (&ba[1].a[1].b[1]);       /* { dg-warning "nul" }  */
> +T (&ba[1].a[1].b[1] + 1);   /* { dg-warning "nul" }  */
> +T (&ba[1].a[1].b[1] + i0);  /* { dg-warning "nul" }  */
> +
> +
> +T (ba[2].a[0].a);
> +T (&ba[2].a[0].a[0]);
> +T (&ba[2].a[0].a[0] + 1);
> +T (&ba[2].a[0].a[0] + i0);
> +T (&ba[2].a[0].a[1]);
> +T (&ba[2].a[0].a[1] + 1);
> +T (&ba[2].a[0].a[1] + i0);
> +
> +T (ba[2].a[0].b);
> +T (&ba[2].a[0].b[0]);
> +T (&ba[2].a[0].b[0] + 1);
> +T (&ba[2].a[0].b[0] + i0);
> +T (&ba[2].a[0].b[1]);
> +T (&ba[2].a[0].b[1] + 1);
> +T (&ba[2].a[0].b[1] + i0);
> +
> +T (ba[2].a[1].a);
> +T (&ba[2].a[1].a[0]);
> +T (&ba[2].a[1].a[0] + 1);
> +T (&ba[2].a[1].a[0] + i0);
> +T (&ba[2].a[1].a[1]);
> +T (&ba[2].a[1].a[1] + 1);
> +T (&ba[2].a[1].a[1] + i0);
> +
> +
> +T (ba[3].a[0].a);
> +T (&ba[3].a[0].a[0]);
> +T (&ba[3].a[0].a[0] + 1);
> +T (&ba[3].a[0].a[0] + i0);
> +T (&ba[3].a[0].a[1]);
> +T (&ba[3].a[0].a[1] + 1);
> +T (&ba[3].a[0].a[1] + i0);
> +
> +T (ba[3].a[0].b);
> +T (&ba[3].a[0].b[0]);
> +T (&ba[3].a[0].b[0] + 1);
> +T (&ba[3].a[0].b[0] + i0);
> +T (&ba[3].a[0].b[1]);
> +T (&ba[3].a[0].b[1] + 1);
> +T (&ba[3].a[0].b[1] + i0);
> +
> +T (ba[3].a[1].a);           /* { dg-warning "nul" }  */
> +T (&ba[3].a[1].a[0]);	    /* { dg-warning "nul" }  */
> +T (&ba[3].a[1].a[0] + 1);   /* { dg-warning "nul" }  */
> +T (&ba[3].a[1].a[0] + i0);  /* { dg-warning "nul" }  */
> +T (&ba[3].a[1].a[1]);	    /* { dg-warning "nul" }  */
> +T (&ba[3].a[1].a[1] + 1);   /* { dg-warning "nul" }  */
> +T (&ba[3].a[1].a[1] + i0);  /* { dg-warning "nul" }  */
> +
> +T (ba[3].a[1].b);
> +T (&ba[3].a[1].b[0]);	
> +T (&ba[3].a[1].b[0] + 1);
> +T (&ba[3].a[1].b[0] + i0);
> +T (&ba[3].a[1].b[1]);	
> +T (&ba[3].a[1].b[1] + 1);
> +T (&ba[3].a[1].b[1] + i0);
> +
> +
> +T (i0 ? ba[0].a[0].a : ba[0].a[0].b);           /* { dg-warning "nul" }  */
> +T (i0 ? ba[0].a[0].a : ba[0].a[0].b);           /* { dg-warning "nul" }  */
> +
> +T (i0 ? &ba[0].a[0].a[0] : &ba[3].a[1].a[0]);   /* { dg-warning "nul" }  */
> +T (i0 ? &ba[3].a[1].a[1] :  ba[0].a[0].a);      /* { dg-warning "nul" }  */
> +
> +T (i0 ? ba[0].a[0].a : ba[0].a[1].b);
> +T (i0 ? ba[0].a[1].b : ba[0].a[0].a);


Bernd.
Bernd Edlinger Aug. 2, 2018, 6:56 p.m. UTC | #2
On 08/02/18 15:26, Bernd Edlinger wrote:
>>
>>    /* If the length can be computed at compile-time, return it.  */
>> -  len = c_strlen (src, 0);
>> +  tree array;
>> +  tree len = c_strlen (src, 0, &array);
> 
> You know the c_strlen tries to compute wide character sizes,
> but strlen does not do that, strlen (L"abc") should give 1
> (or 0 on a BE machine)
> I wonder if that is correct.
> 
[snip]
>>
>>  static tree
>> -fold_builtin_strlen (location_t loc, tree type, tree arg)
>> +fold_builtin_strlen (location_t loc, tree fndecl, tree type, tree arg)
>>  {
>>    if (!validate_arg (arg, POINTER_TYPE))
>>      return NULL_TREE;
>>    else
>>      {
>> -      tree len = c_strlen (arg, 0);
>> -
>> +      tree arr = NULL_TREE;
>> +      tree len = c_strlen (arg, 0, &arr);
> 
> Is it possible to write a test case where strlen(L"test") reaches this point?
> what will c_strlen return then?
> 

Yes, of course it is:

$ cat y.c
int f(char *x)
{
   return __builtin_strlen(x);
}

int main ()
{
   return f((char*)&L"abcdef"[0]);
}
$ gcc -O3 -S y.c
$ cat y.s
main:
.LFB1:
	.cfi_startproc
	movl	$6, %eax
	ret
	.cfi_endproc

The reason is that c_strlen tries to fold wide chars at all.
I do not know when that was introduced, was that already before your last patches?
Is it possible to revert the last few patches cleanly?


Bernd.
Martin Sebor Aug. 2, 2018, 8:34 p.m. UTC | #3
On 08/02/2018 12:56 PM, Bernd Edlinger wrote:
> On 08/02/18 15:26, Bernd Edlinger wrote:
>>>
>>>    /* If the length can be computed at compile-time, return it.  */
>>> -  len = c_strlen (src, 0);
>>> +  tree array;
>>> +  tree len = c_strlen (src, 0, &array);
>>
>> You know the c_strlen tries to compute wide character sizes,
>> but strlen does not do that, strlen (L"abc") should give 1
>> (or 0 on a BE machine)
>> I wonder if that is correct.
>>
> [snip]
>>>
>>>  static tree
>>> -fold_builtin_strlen (location_t loc, tree type, tree arg)
>>> +fold_builtin_strlen (location_t loc, tree fndecl, tree type, tree arg)
>>>  {
>>>    if (!validate_arg (arg, POINTER_TYPE))
>>>      return NULL_TREE;
>>>    else
>>>      {
>>> -      tree len = c_strlen (arg, 0);
>>> -
>>> +      tree arr = NULL_TREE;
>>> +      tree len = c_strlen (arg, 0, &arr);
>>
>> Is it possible to write a test case where strlen(L"test") reaches this point?
>> what will c_strlen return then?
>>
>
> Yes, of course it is:
>
> $ cat y.c
> int f(char *x)
> {
>    return __builtin_strlen(x);
> }
>
> int main ()
> {
>    return f((char*)&L"abcdef"[0]);
> }
> $ gcc -O3 -S y.c
> $ cat y.s
> main:
> .LFB1:
> 	.cfi_startproc
> 	movl	$6, %eax
> 	ret
> 	.cfi_endproc
>
> The reason is that c_strlen tries to fold wide chars at all.
> I do not know when that was introduced, was that already before your last patches?

The function itself was introduced in 1992 if not earlier,
before wide strings even existed.  AFAICS, it has always
accepted strings of all widths.  Until r241489 (in GCC 7)
it computed their length in bytes, not characters.  I don't
know if that was on purpose or if it was just never changed
to compute the length in characters when wide strings were
first introduced.  From the name I assume it's the latter.
The difference wasn't detected until sprintf tests were added
for wide string directives.  The ChangeLog description for
the change reads: Correctly handle wide strings.  I didn't
consider pathological cases like strlen (L"abc").  It
shouldn't be difficult to change to fix this case.

Martin
Bernd Edlinger Aug. 3, 2018, 1 p.m. UTC | #4
On 08/02/18 22:34, Martin Sebor wrote:
> On 08/02/2018 12:56 PM, Bernd Edlinger wrote:
>> On 08/02/18 15:26, Bernd Edlinger wrote:
>>>>
>>>>    /* If the length can be computed at compile-time, return it.  */
>>>> -  len = c_strlen (src, 0);
>>>> +  tree array;
>>>> +  tree len = c_strlen (src, 0, &array);
>>>
>>> You know the c_strlen tries to compute wide character sizes,
>>> but strlen does not do that, strlen (L"abc") should give 1
>>> (or 0 on a BE machine)
>>> I wonder if that is correct.
>>>
>> [snip]
>>>>
>>>>  static tree
>>>> -fold_builtin_strlen (location_t loc, tree type, tree arg)
>>>> +fold_builtin_strlen (location_t loc, tree fndecl, tree type, tree arg)
>>>>  {
>>>>    if (!validate_arg (arg, POINTER_TYPE))
>>>>      return NULL_TREE;
>>>>    else
>>>>      {
>>>> -      tree len = c_strlen (arg, 0);
>>>> -
>>>> +      tree arr = NULL_TREE;
>>>> +      tree len = c_strlen (arg, 0, &arr);
>>>
>>> Is it possible to write a test case where strlen(L"test") reaches this point?
>>> what will c_strlen return then?
>>>
>>
>> Yes, of course it is:
>>
>> $ cat y.c
>> int f(char *x)
>> {
>>    return __builtin_strlen(x);
>> }
>>
>> int main ()
>> {
>>    return f((char*)&L"abcdef"[0]);
>> }
>> $ gcc -O3 -S y.c
>> $ cat y.s
>> main:
>> .LFB1:
>>     .cfi_startproc
>>     movl    $6, %eax
>>     ret
>>     .cfi_endproc
>>
>> The reason is that c_strlen tries to fold wide chars at all.
>> I do not know when that was introduced, was that already before your last patches?
> 
> The function itself was introduced in 1992 if not earlier,
> before wide strings even existed.  AFAICS, it has always
> accepted strings of all widths.  Until r241489 (in GCC 7)
> it computed their length in bytes, not characters.  I don't
> know if that was on purpose or if it was just never changed
> to compute the length in characters when wide strings were
> first introduced.  From the name I assume it's the latter.
> The difference wasn't detected until sprintf tests were added
> for wide string directives.  The ChangeLog description for
> the change reads: Correctly handle wide strings.  I didn't
> consider pathological cases like strlen (L"abc").  It
> shouldn't be difficult to change to fix this case.
> 

Oh, oh, oh....

$ cat y3.c
int main ()
{
   char c[100];
   int x = __builtin_sprintf (c, "%S", L"\uFFFF");

   __builtin_printf("%d %ld\n", x,__builtin_strlen(c));
}

$ gcc-4.8 -O3 -std=c99 y3.c
$ ./a.out
-1 0
$ gcc -O3 y3.c
$ ./a.out
1 0
$ echo $LANG
de_DE.UTF-8

I would have expected L"\uFFFF" to converted to UTF-8
or another encoding, so the return value if sprintf is
far from obvious, and probably language dependent.

Why do you think it is a good idea to use really every
opportunity to optimize totally unnecessary things like
using the return value from the sprintf function as it is?

Did you never think this adds a significant maintenance
burden on the rest of us as well?


Bernd.
Martin Sebor Aug. 3, 2018, 7:59 p.m. UTC | #5
On 08/03/2018 07:00 AM, Bernd Edlinger wrote:
> On 08/02/18 22:34, Martin Sebor wrote:
>> On 08/02/2018 12:56 PM, Bernd Edlinger wrote:
>>> On 08/02/18 15:26, Bernd Edlinger wrote:
>>>>>
>>>>>    /* If the length can be computed at compile-time, return it.  */
>>>>> -  len = c_strlen (src, 0);
>>>>> +  tree array;
>>>>> +  tree len = c_strlen (src, 0, &array);
>>>>
>>>> You know the c_strlen tries to compute wide character sizes,
>>>> but strlen does not do that, strlen (L"abc") should give 1
>>>> (or 0 on a BE machine)
>>>> I wonder if that is correct.
>>>>
>>> [snip]
>>>>>
>>>>>  static tree
>>>>> -fold_builtin_strlen (location_t loc, tree type, tree arg)
>>>>> +fold_builtin_strlen (location_t loc, tree fndecl, tree type, tree arg)
>>>>>  {
>>>>>    if (!validate_arg (arg, POINTER_TYPE))
>>>>>      return NULL_TREE;
>>>>>    else
>>>>>      {
>>>>> -      tree len = c_strlen (arg, 0);
>>>>> -
>>>>> +      tree arr = NULL_TREE;
>>>>> +      tree len = c_strlen (arg, 0, &arr);
>>>>
>>>> Is it possible to write a test case where strlen(L"test") reaches this point?
>>>> what will c_strlen return then?
>>>>
>>>
>>> Yes, of course it is:
>>>
>>> $ cat y.c
>>> int f(char *x)
>>> {
>>>    return __builtin_strlen(x);
>>> }
>>>
>>> int main ()
>>> {
>>>    return f((char*)&L"abcdef"[0]);
>>> }
>>> $ gcc -O3 -S y.c
>>> $ cat y.s
>>> main:
>>> .LFB1:
>>>     .cfi_startproc
>>>     movl    $6, %eax
>>>     ret
>>>     .cfi_endproc
>>>
>>> The reason is that c_strlen tries to fold wide chars at all.
>>> I do not know when that was introduced, was that already before your last patches?
>>
>> The function itself was introduced in 1992 if not earlier,
>> before wide strings even existed.  AFAICS, it has always
>> accepted strings of all widths.  Until r241489 (in GCC 7)
>> it computed their length in bytes, not characters.  I don't
>> know if that was on purpose or if it was just never changed
>> to compute the length in characters when wide strings were
>> first introduced.  From the name I assume it's the latter.
>> The difference wasn't detected until sprintf tests were added
>> for wide string directives.  The ChangeLog description for
>> the change reads: Correctly handle wide strings.  I didn't
>> consider pathological cases like strlen (L"abc").  It
>> shouldn't be difficult to change to fix this case.
>>
>
> Oh, oh, oh....
>
> $ cat y3.c
> int main ()
> {
>    char c[100];
>    int x = __builtin_sprintf (c, "%S", L"\uFFFF");
>
>    __builtin_printf("%d %ld\n", x,__builtin_strlen(c));
> }
>
> $ gcc-4.8 -O3 -std=c99 y3.c
> $ ./a.out
> -1 0
> $ gcc -O3 y3.c
> $ ./a.out
> 1 0
> $ echo $LANG
> de_DE.UTF-8
>
> I would have expected L"\uFFFF" to converted to UTF-8
> or another encoding, so the return value if sprintf is
> far from obvious, and probably language dependent.
>
> Why do you think it is a good idea to use really every
> opportunity to optimize totally unnecessary things like
> using the return value from the sprintf function as it is?
>
> Did you never think this adds a significant maintenance
> burden on the rest of us as well?

Your condescending tone is uncalled for, and you clearly speak
out of ignorance.  I don't owe you an explanation but as I have
said multiple times: most of my work, including the sprintf pass,
is primarily motivated by detecting bugs like buffer overflow.
Optimization is only a secondary goal (but bug detection depends
on it).  It may come as a shock to you but mistakes happen.
That's why it's important to make an effort to detect them.
This is one is a simple typo (handling %S the same way as %s
instead of %ls.

If you are incapable of a professional tone I would suggest
you go harass someone else.

Martin
Jeff Law Aug. 15, 2018, 5:31 a.m. UTC | #6
On 08/03/2018 07:00 AM, Bernd Edlinger wrote:
> On 08/02/18 22:34, Martin Sebor wrote:
>> On 08/02/2018 12:56 PM, Bernd Edlinger wrote:
>>> On 08/02/18 15:26, Bernd Edlinger wrote:
>>>>>
>>>>>    /* If the length can be computed at compile-time, return it.  */
>>>>> -  len = c_strlen (src, 0);
>>>>> +  tree array;
>>>>> +  tree len = c_strlen (src, 0, &array);
>>>>
>>>> You know the c_strlen tries to compute wide character sizes,
>>>> but strlen does not do that, strlen (L"abc") should give 1
>>>> (or 0 on a BE machine)
>>>> I wonder if that is correct.
>>>>
>>> [snip]
>>>>>
>>>>>  static tree
>>>>> -fold_builtin_strlen (location_t loc, tree type, tree arg)
>>>>> +fold_builtin_strlen (location_t loc, tree fndecl, tree type, tree arg)
>>>>>  {
>>>>>    if (!validate_arg (arg, POINTER_TYPE))
>>>>>      return NULL_TREE;
>>>>>    else
>>>>>      {
>>>>> -      tree len = c_strlen (arg, 0);
>>>>> -
>>>>> +      tree arr = NULL_TREE;
>>>>> +      tree len = c_strlen (arg, 0, &arr);
>>>>
>>>> Is it possible to write a test case where strlen(L"test") reaches this point?
>>>> what will c_strlen return then?
>>>>
>>>
>>> Yes, of course it is:
>>>
>>> $ cat y.c
>>> int f(char *x)
>>> {
>>>    return __builtin_strlen(x);
>>> }
>>>
>>> int main ()
>>> {
>>>    return f((char*)&L"abcdef"[0]);
>>> }
>>> $ gcc -O3 -S y.c
>>> $ cat y.s
>>> main:
>>> .LFB1:
>>>     .cfi_startproc
>>>     movl    $6, %eax
>>>     ret
>>>     .cfi_endproc
>>>
>>> The reason is that c_strlen tries to fold wide chars at all.
>>> I do not know when that was introduced, was that already before your last patches?
>>
>> The function itself was introduced in 1992 if not earlier,
>> before wide strings even existed.  AFAICS, it has always
>> accepted strings of all widths.  Until r241489 (in GCC 7)
>> it computed their length in bytes, not characters.  I don't
>> know if that was on purpose or if it was just never changed
>> to compute the length in characters when wide strings were
>> first introduced.  From the name I assume it's the latter.
>> The difference wasn't detected until sprintf tests were added
>> for wide string directives.  The ChangeLog description for
>> the change reads: Correctly handle wide strings.  I didn't
>> consider pathological cases like strlen (L"abc").  It
>> shouldn't be difficult to change to fix this case.
>>
> 
> Oh, oh, oh....
> 
> $ cat y3.c
> int main ()
> {
>    char c[100];
>    int x = __builtin_sprintf (c, "%S", L"\uFFFF");
> 
>    __builtin_printf("%d %ld\n", x,__builtin_strlen(c));
> }
> 
> $ gcc-4.8 -O3 -std=c99 y3.c
> $ ./a.out
> -1 0
> $ gcc -O3 y3.c
> $ ./a.out
> 1 0
> $ echo $LANG
> de_DE.UTF-8
> 
> I would have expected L"\uFFFF" to converted to UTF-8
> or another encoding, so the return value if sprintf is
> far from obvious, and probably language dependent.
FWIW, Martin has a patch (under review) that I think will fix this and
includes a testcase that is likely inspired by the code above.

> 
> Why do you think it is a good idea to use really every
> opportunity to optimize totally unnecessary things like
> using the return value from the sprintf function as it is?
> 
> Did you never think this adds a significant maintenance
> burden on the rest of us as well?
It largely came along for free during the implementation of the sprintf
warnings.  That's changed a bit over time, but it's still the case that
the sprintf warnings do all the analysis necessary to optimize the
sprintf return value.

As both Martin and I have stated before the real goal is getting good
warnings from sprintf.  Optimization is a distant second.

jeff
Jeff Law Aug. 17, 2018, 5:14 a.m. UTC | #7
On 08/01/2018 08:44 PM, Martin Sebor wrote:
> Since the foundation of the patch is detecting and avoiding
> the overly aggressive folding of unterminated char arrays,
> besides issuing a warning for such arguments to strlen,
> the patch also fixes pr86711 - wrong folding of memchr, and
> pr86714 - tree-ssa-forwprop.c confused by too long initializer.
> 
> The substance of the attached updated patch is unchanged,
> I have just added test cases for the two additional bugs.
> 
Just to be absolutely sure.  The patch attached to this message is
superceded by the later 6 part patchkit that fixes 86714, 86711 and 86552.

I'm just trying to make sure I'm looking at the right kits from both you
and Bernd for the next step.

Thanks,
Jeff
Martin Sebor Aug. 17, 2018, 2:38 p.m. UTC | #8
On 08/16/2018 11:14 PM, Jeff Law wrote:
> On 08/01/2018 08:44 PM, Martin Sebor wrote:
>> Since the foundation of the patch is detecting and avoiding
>> the overly aggressive folding of unterminated char arrays,
>> besides issuing a warning for such arguments to strlen,
>> the patch also fixes pr86711 - wrong folding of memchr, and
>> pr86714 - tree-ssa-forwprop.c confused by too long initializer.
>>
>> The substance of the attached updated patch is unchanged,
>> I have just added test cases for the two additional bugs.
>>
> Just to be absolutely sure.  The patch attached to this message is
> superceded by the later 6 part patchkit that fixes 86714, 86711 and 86552.
>
> I'm just trying to make sure I'm looking at the right kits from both you
> and Bernd for the next step.

Yes, I broke up this patch into a series, with the basic
"infrastructure" in part 1:

[PATCH 1/6] prevent folding of unterminated const arrays (PR
86711, 86714)
https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00769.html

and the strlen missing nul detection in part 2:
[PATCH 2/6] detect unterminated const arrays in strlen calls
(PR 86552)
https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00778.html

The rest of the series (parts 3 through 6) add the missing
nul detection to a few other built-ins.

Martin
Jeff Law Aug. 24, 2018, 6:36 a.m. UTC | #9
On 08/02/2018 07:26 AM, Bernd Edlinger wrote:
> On 08/02/18 04:44, Martin Sebor wrote:
>> Since the foundation of the patch is detecting and avoiding
>> the overly aggressive folding of unterminated char arrays,
>> besides issuing a warning for such arguments to strlen,
>> the patch also fixes pr86711 - wrong folding of memchr, and
>> pr86714 - tree-ssa-forwprop.c confused by too long initializer.
>>
>> The substance of the attached updated patch is unchanged,
>> I have just added test cases for the two additional bugs.
>>
>> Bernd, as I mentioned Wednesday, the patch supersedes
>> yours here:
>> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01800.html
>>
> 
> No problem, but I hope you understand, that I still uphold
> my patch.
> 
> So we have two patches now:
> - mine, fixing a wrong code bug,
> - yours, implementing a new warning and fixing a wrong
> code bug at the same time.
> 
> I will add a few comments to your patch below.
[ ... ]

So a lot of the comments are out of date, presumably because Martin
fixed the issues you pointed out in his second version of the patch.
But there's still some useful nuggets in your comments that are still
relevant.

FYI it appears that sometimes you comment above a chunk of code, and
other times below.  That makes it exceedingly difficult to figure out
the issue you're trying to raise.


> 
>> Martin
>>
>> On 07/30/2018 01:17 PM, Martin Sebor wrote:
>>> Attached is an updated version of the patch that handles more
>>> instances of calling strlen() on a constant array that is not
>>> a nul-terminated string.
>>>
>>> No other functions except strlen are explicitly handled yet,
>>> and neither are constant arrays with braced-initializer lists
>>> like const char a[] = { 'a', 'b', 'c' };  I am testing
>>> an independent solution for those (bug 86552).  Once those
>>> are handled the warning will be able to detect those as well.
>>>
>>> Tested on x86_64-linux.
>>>
>>> On 07/25/2018 05:38 PM, Martin Sebor wrote:
>>>> Ping: https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01124.html
>>>>
>>>> The fix for bug 86532 has been checked in so this enhancement
>>>> can now be applied on top of it (with only minor adjustments).
>>>>
>>>> On 07/19/2018 02:08 PM, Martin Sebor wrote:
>>>>> In the discussion of my patch for pr86532 Bernd noted that
>>>>> GCC silently accepts constant character arrays with no
>>>>> terminating nul as arguments to strlen (and other string
>>>>> functions).
>>>>>
>>>>> The attached patch is a first step in detecting these kinds
>>>>> of bugs in strlen calls by issuing -Wstringop-overflow.
>>>>> The next step is to modify all other handlers of built-in
>>>>> functions to detect the same problem (not part of this patch).
>>>>> Yet another step is to detect these problems in arguments
>>>>> initialized using the non-string form:
>>>>>
>>>>>   const char a[] = { 'a', 'b', 'c' };
>>>>>
>>>>> This patch is meant to apply on top of the one for bug 86532
>>>>> (I tested it with an earlier version of that patch so there
>>>>> is code in the context that does not appear in the latest
>>>>> version of the other diff).
>>>>>
>>>>> Martin
>>>>>
>>>>
>>>
>>
>> PR tree-optimization/86714 - tree-ssa-forwprop.c confused by too long initializer
>> PR tree-optimization/86711 - wrong folding of memchr
>> PR tree-optimization/86552 - missing warning for reading past the end of non-string arrays
>>
>> gcc/ChangeLog:
>>
>> 	PR tree-optimization/86714
>> 	PR tree-optimization/86711
>> 	PR tree-optimization/86552
>> 	* builtins.h (warn_string_no_nul): Declare..
>> 	(c_strlen): Add argument.
>> 	* builtins.c (warn_string_no_nul): New function.
>> 	(fold_builtin_strlen): Add argument.  Detect missing nul.
>> 	(fold_builtin_1): Adjust.
>> 	(string_length): Add argument and use it.
>> 	(c_strlen): Same.
>> 	(expand_builtin_strlen): Detect missing nul.
>> 	* expr.c (string_constant): Add arguments.  Detect missing nul
>> 	terminator and outermost declaration it's missing in.
>> 	* expr.h (string_constant): Add argument.
>> 	* fold-const.c (c_getstr): Change argument to bool*, rename
>> 	other arguments.
>> 	* fold-const-call.c (fold_const_call): Detect missing nul.
>> 	* gimple-fold.c (get_range_strlen): Add argument.
>> 	(get_maxval_strlen): Adjust.
>> 	* gimple-fold.h (get_range_strlen): Add argument.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 	PR tree-optimization/86714
>> 	PR tree-optimization/86711
>> 	PR tree-optimization/86552
>> 	* gcc.c-torture/execute/memchr-1.c: New test.
>> 	* gcc.c-torture/execute/pr86714.c: New test.
>> 	* gcc.dg/warn-strlen-no-nul.c: New test.
>>
>> diff --git a/gcc/builtins.c b/gcc/builtins.c
>> index aa3e0d8..f4924d5 100644
>> --- a/gcc/builtins.c
>> +++ b/gcc/builtins.c
>> @@ -150,7 +150,7 @@ static tree stabilize_va_list_loc (location_t, tree, int);
>>  static rtx expand_builtin_expect (tree, rtx);
>>  static tree fold_builtin_constant_p (tree);
>>  static tree fold_builtin_classify_type (tree);
>> -static tree fold_builtin_strlen (location_t, tree, tree);
>> +static tree fold_builtin_strlen (location_t, tree, tree, tree);
>>  static tree fold_builtin_inf (location_t, tree, int);
>>  static tree rewrite_call_expr (location_t, tree, int, tree, int, ...);
>>  static bool validate_arg (const_tree, enum tree_code code);
>> @@ -550,6 +550,36 @@ string_length (const void *ptr, unsigned eltsize, unsigned maxelts)
>>    return n;
>>  }
>>  
>> +/* For a call expression EXP to a function that expects a string argument,
>> +   issue a diagnostic due to it being a called with an argument NONSTR
>> +   that is a character array with no terminating NUL.  */
>> +
>> +void
>> +warn_string_no_nul (location_t loc, tree exp, tree fndecl, tree nonstr)
>> +{
>> +  loc = expansion_point_location_if_in_system_header (loc);
>> +
>> +  bool warned;
>> +  if (exp)
>> +    {
>> +      if (!fndecl)
>> +	fndecl = get_callee_fndecl (exp);
>> +      warned = warning_at (loc, OPT_Wstringop_overflow_,
>> +			   "%K%qD argument missing terminating nul",
>> +			   exp, fndecl);
>> +    }
>> +  else
>> +    {
>> +      gcc_assert (fndecl);
>> +      warned = warning_at (loc, OPT_Wstringop_overflow_,
>> +			   "%qD argument missing terminating nul",
>> +			   fndecl);
>> +    }
>> +
>> +  if (warned && DECL_P (nonstr))
>> +    inform (DECL_SOURCE_LOCATION (nonstr), "referenced argument declared here");
>> +}
>> +
>>  /* Compute the length of a null-terminated character string or wide
>>     character string handling character sizes of 1, 2, and 4 bytes.
>>     TREE_STRING_LENGTH is not the right way because it evaluates to
>> @@ -567,37 +597,60 @@ string_length (const void *ptr, unsigned eltsize, unsigned maxelts)
>>     accesses.  Note that this implies the result is not going to be emitted
>>     into the instruction stream.
>>  
>> +   When ARR is non-null and the string is not properly nul-terminated,
>> +   set *ARR to the declaration of the outermost constant object whose
>> +   initializer (or one of its elements) is not nul-terminated.
>> +
>>     The value returned is of type `ssizetype'.
>>  
>>     Unfortunately, string_constant can't access the values of const char
>>     arrays with initializers, so neither can we do so here.  */
>>  
>>  tree
>> -c_strlen (tree src, int only_value)
>> +c_strlen (tree src, int only_value, tree *arr /* = NULL */)
>>  {
>>    STRIP_NOPS (src);
>> +
>> +  /* Used to detect non-nul-terminated strings in subexpressions
>> +     of a conditional expression.  When ARR is null, point it at
>> +     one of the elements for simplicity.  */
>> +  tree arrs[] = { NULL_TREE, NULL_TREE };
>> +  if (!arr)
>> +    arr = arrs;
> 
> now arr is always != NULL
Right.  It's renamed in the follow-up patch from Martin, but yes, it's
always non-null.  Note in this case your comment is after the code
you're referring to.

> 
>> +
>>    if (TREE_CODE (src) == COND_EXPR
>>        && (only_value || !TREE_SIDE_EFFECTS (TREE_OPERAND (src, 0))))
>>      {
>> -      tree len1, len2;
>> -
>> -      len1 = c_strlen (TREE_OPERAND (src, 1), only_value);
>> -      len2 = c_strlen (TREE_OPERAND (src, 2), only_value);
>> +      tree len1 = c_strlen (TREE_OPERAND (src, 1), only_value, arrs);
>> +      tree len2 = c_strlen (TREE_OPERAND (src, 2), only_value, arrs + 1);
>>        if (tree_int_cst_equal (len1, len2))
>> -	return len1;
>> +	{
> 
> funny, if called with NULL *arr and arrs[0] alias each other.

> 
>> +	  *arr = arrs[0] ? arrs[0] : arrs[1];
>> +	  return len1;
>> +	}
>>      }

And in this case it looks like your comment is before the code you're
commenting about.  It was fairly obvious in this case because the code
prior to your "funny, if called..." comment didn't reference arr or arrs
at all.

And more importantly, why are you concerned about the aliasing?





>>  
>>    if (TREE_CODE (src) == COMPOUND_EXPR
>>        && (only_value || !TREE_SIDE_EFFECTS (TREE_OPERAND (src, 0))))
>> -    return c_strlen (TREE_OPERAND (src, 1), only_value);
>> +    return c_strlen (TREE_OPERAND (src, 1), only_value, arr);
>>  
>>    location_t loc = EXPR_LOC_OR_LOC (src, input_location);
>>  
>>    /* Offset from the beginning of the string in bytes.  */
>>    tree byteoff;
>> -  src = string_constant (src, &byteoff);
>> -  if (src == 0)
>> -    return NULL_TREE;
>> +  /* Set if array is nul-terminated, false otherwise.  */
>> +  bool nulterm;
> 
> note arr is always != null or pointing to arrs[0].
> 
>> +  src = string_constant (src, &byteoff, &nulterm, arr);
>> +  if (!src)
>> +    {
>> +      *arr = arrs[0] ? arrs[0] : arrs[1];
>> +      return NULL_TREE;
>> +    }
>> +
>> +  /* Clear *ARR when the string is nul-terminated.  It should be
>> +     of no interest to callers.  */
>> +  if (nulterm)
>> +    *arr = NULL_TREE;
>>  
>>    /* Determine the size of the string element.  */
>>    unsigned eltsize
>> @@ -621,6 +674,12 @@ c_strlen (tree src, int only_value)
>>  	maxelts = maxelts / eltsize - 1;
>>        }
>>  
>> +  /* Unless the caller is prepared to handle it by passing in a non-null
>> +     ARR, fail if the terminating nul doesn't fit in the array the string
>> +     is stored in (as in const char a[3] = "123";  */
> 
> note arr is always != NULL, thus this if is never taken.
> 
>> +  if (!arr && maxelts < strelts)
>> +    return NULL_TREE;
>> +
Right.  And I think that check is gone in the second version of Martin's
patch.


>>    /* PTR can point to the byte representation of any string type, including
>>       char* and wchar_t*.  */
>>    const char *ptr = TREE_STRING_POINTER (src);
>> @@ -650,7 +709,8 @@ c_strlen (tree src, int only_value)
>>        offsave = fold_convert (ssizetype, offsave);
>>        tree condexp = fold_build2_loc (loc, LE_EXPR, boolean_type_node, offsave,
>>  				      build_int_cst (ssizetype, len * eltsize));
> 
> this computation is wrong, it computes not in units of eltsize,
> I am however not sure if it is really good that this function tries to
> compute strlen of wide character strings.
Note that in the second version of Martin's patch eltsize will always be
1 when we get here.  Furthermore, the multiplication by eltsize is gone.
 It looks like you switched back to commenting after the code for that
comment, but then immediately thereafter...


> 
> That said, please fix this computation first, in a different patch
> instead of just fixing the indentation. (I know I pointed that lines are too
> long here, but that was before I realized that the whole length computation
> here is wrong).
> 
>> -      tree lenexp = size_diffop_loc (loc, ssize_int (strelts * eltsize), offsave);
>> +      tree lenexp = size_diffop_loc (loc, ssize_int (strelts * eltsize),
>> +				     offsave);
>>        return fold_build3_loc (loc, COND_EXPR, ssizetype, condexp, lenexp,
>>  			      build_zero_cst (ssizetype));
>>      }
>> @@ -690,7 +750,7 @@ c_strlen (tree src, int only_value)
>>       Since ELTOFF is our starting index into the string, no further
>>       calculation is needed.  */
The comment shown above appears to refer to the code below the comment.
Again, this makes it exceedingly confusing to understand your comments
and take appropriate action.




> 
> What are you fixing here, I think that was another bug.
> If this fixes something then it should be in a different patch,
> just handling this.
> 
>>    unsigned len = string_length (ptr + eltoff * eltsize, eltsize,
>> -				maxelts - eltoff);
>> +				strelts - eltoff);
>>  
>>    return ssize_int (len);
>>  }
I'm guessing the comment in this case refers to the code after.
Presumably questioning the change from maxelts to strelts.




>> @@ -2855,7 +2915,6 @@ expand_builtin_strlen (tree exp, rtx target,
>>  
>>    struct expand_operand ops[4];
>>    rtx pat;
>> -  tree len;
>>    tree src = CALL_EXPR_ARG (exp, 0);
>>    rtx src_reg;
>>    rtx_insn *before_strlen;
>> @@ -2864,20 +2923,39 @@ expand_builtin_strlen (tree exp, rtx target,
>>    unsigned int align;
>>  
>>    /* If the length can be computed at compile-time, return it.  */
>> -  len = c_strlen (src, 0);
>> +  tree array;
>> +  tree len = c_strlen (src, 0, &array);
> 
> You know the c_strlen tries to compute wide character sizes,
> but strlen does not do that, strlen (L"abc") should give 1
> (or 0 on a BE machine)
> I wonder if that is correct.
So I think this is fixed by your change which restored the default
behavior of c_strlen to count bytes.  Which restores the behavior of
c_strlen to match that of the strlen library call.

So for something like L"abc", the right value is 1 or 0 for LE and BE
targets respectively.



> 
>>    if (len)
>> -    return expand_expr (len, target, target_mode, EXPAND_NORMAL);
>> +    {
>> +      if (array)
>> +	{
>> +	  /* Array refers to the non-nul terminated constant array
>> +	     whose length is attempted to be computed.  */
> 
> I really wonder if it would not make more sense to have a
> nonterminated_string_constant_p instead.
Rather than a boolean I think we want a tree * so that we can bubble up
more information to the caller WRT non-terminated strings.

> 
> Last time I wanted to implement a warning in expand I faced the
> problem that inlined functions will get one warning per invocation?
Yea, but that's just the way things are.  I don't think it's really an
issue for the patches we're looking at right now.

>> @@ -8255,19 +8333,30 @@ fold_builtin_classify_type (tree arg)
>>    return build_int_cst (integer_type_node, type_to_class (TREE_TYPE (arg)));
>>  }
>>  
>> -/* Fold a call to __builtin_strlen with argument ARG.  */
>> +/* Fold a strlen call to FNDECL of TYPE, and with argument ARG.  */
>>  
>>  static tree
>> -fold_builtin_strlen (location_t loc, tree type, tree arg)
>> +fold_builtin_strlen (location_t loc, tree fndecl, tree type, tree arg)
>>  {
>>    if (!validate_arg (arg, POINTER_TYPE))
>>      return NULL_TREE;
>>    else
>>      {
>> -      tree len = c_strlen (arg, 0);
>> -
>> +      tree arr = NULL_TREE;
>> +      tree len = c_strlen (arg, 0, &arr);
> 
> Is it possible to write a test case where strlen(L"test") reaches this point?
> what will c_strlen return then?
Given your fix to have c_strlen count bytes by default, I think we're OK
here.

> 
>> @@ -11328,7 +11332,7 @@ string_constant (tree arg, tree *ptr_offset)
>>  	return NULL_TREE;
>>  
>>        tree offset;
>> -      if (tree str = string_constant (arg0, &offset))
>> +      if (tree str = string_constant (arg0, &offset, nulterm, decl))
>>  	{
>>  	  /* Avoid pointers to arrays (see bug 86622).  */
>>  	  if (POINTER_TYPE_P (TREE_TYPE (arg))
>> @@ -11368,6 +11372,10 @@ string_constant (tree arg, tree *ptr_offset)
>>    if (TREE_CODE (array) == STRING_CST)
> 
> Well, actually I think there _are_ STING_CSTs which are not null terminated.
> Maybe not in C. But Fortran, Ada, Go...
> 
>>      {
>>        *ptr_offset = fold_convert (sizetype, offset);
>> +      if (nulterm)
>> +	*nulterm = true;
>> +      if (decl)
>> +	*decl = NULL_TREE;
>>        return array;
>>      }
So your comment seems to be referring to the code above the comment as
well as below.  Confusing.  Consistency really helps.

Are we at a point where we're ready to declare STRING_CSTs as always
being properly terminated?  If not the hunk of code above needs some
rethinking since we can't guarantee the string is properly terminated.


>>  
>> @@ -11414,6 +11422,49 @@ string_constant (tree arg, tree *ptr_offset)
>>    if (!array_size || TREE_CODE (array_size) != INTEGER_CST)
>>      return NULL_TREE;
>>  
>> +  unsigned HOST_WIDE_INT array_elts = tree_to_uhwi (array_size);
>> +
> 
> I don't understand why this is necessary at all.
> It looks way too complicated, to say the least.
> 
> TREE_TYPE (init) has already the type of the member.
> 
>> +  /* When ARG refers to an aggregate (of arrays) try to determine
>> +     the size of the character array within the aggregate.  */
>> +  tree ref = arg;
>> +  tree reftype = TREE_TYPE (arg);
>> +
>> +  if (TREE_CODE (ref) == MEM_REF)
>> +    {
>> +      ref = TREE_OPERAND (ref, 0);
>> +      if (TREE_CODE (ref) == ADDR_EXPR)
>> +	{
>> +	  ref = TREE_OPERAND (ref, 0);
>> +	  reftype = TREE_TYPE (ref);
>> +	}
>> +    }
>> +  else
>> +    while (TREE_CODE (ref) == ARRAY_REF)
>> +      {
>> +	reftype = TREE_TYPE (ref);
>> +	ref = TREE_OPERAND (ref, 0);
>> +      }
>> +
>> +  if (TREE_CODE (ref) == COMPONENT_REF)
>> +    reftype = TREE_TYPE (ref);
>> +
>> +  while (TREE_CODE (reftype) == ARRAY_TYPE)
>> +    {
>> +      tree next = TREE_TYPE (reftype);
>> +      if (TREE_CODE (next) == INTEGER_TYPE)
>> +	{
>> +	  if (tree size = TYPE_SIZE_UNIT (reftype))
>> +	    if (tree_fits_uhwi_p (size))
>> +	      array_elts = tree_to_uhwi (size);
> 
> so array_elts is measued in bytes.
So I'm guessing your comment about the code looking way too complicated
is referring to the code *after* your comment.  That code is not in the
v2 patch.  At least not 01/06 which addresses the codegen/opt issues.  I
haven't checked the full kit to see if this code appears in a subsequent
patch.


> 
>> +	  break;
>> +	}
>> +
>> +      reftype = TREE_TYPE (reftype);
>> +    }
>> +
>> +  if (decl)
>> +    *decl = array;
>> +
>>    /* Avoid returning a string that doesn't fit in the array
>>       it is stored in, like
>>       const char a[4] = "abcde";
>> @@ -11427,7 +11478,9 @@ string_constant (tree arg, tree *ptr_offset)
>>    unsigned HOST_WIDE_INT length = TREE_STRING_LENGTH (init);
>>    length = string_length (TREE_STRING_POINTER (init), charsize,
>>  			  length / charsize);
> 
> Some callers especially those where the wrong code happens, expect
> to be able to access the STRING_CST up to TREE_STRING_LENGTH,
> But using string_length assume thy stop at the first nul char.
Example please?


> 
>> -  if (compare_tree_int (array_size, length + 1) < 0)
>> +  if (nulterm)
> 
> but here you compare bytes with length which is measued un chars.
> 
>> +    *nulterm = array_elts > length;
>> +  else if (array_elts <= length)
>>      return NULL_TREE;
>>  
>>    *ptr_offset = offset;
Seems wrong.  But my eyes are glazing over badly.  I'm going to have to
look at this and your subsequent comments tomorrow.

Jeff
Bernd Edlinger Aug. 24, 2018, 12:27 p.m. UTC | #10
On 08/24/18 08:36, Jeff Law wrote:
> On 08/02/2018 07:26 AM, Bernd Edlinger wrote:
>> On 08/02/18 04:44, Martin Sebor wrote:
>>> Since the foundation of the patch is detecting and avoiding
>>> the overly aggressive folding of unterminated char arrays,
>>> besides issuing a warning for such arguments to strlen,
>>> the patch also fixes pr86711 - wrong folding of memchr, and
>>> pr86714 - tree-ssa-forwprop.c confused by too long initializer.
>>>
>>> The substance of the attached updated patch is unchanged,
>>> I have just added test cases for the two additional bugs.
>>>
>>> Bernd, as I mentioned Wednesday, the patch supersedes
>>> yours here:
>>> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01800.html
>>>
>>
>> No problem, but I hope you understand, that I still uphold
>> my patch.
>>
>> So we have two patches now:
>> - mine, fixing a wrong code bug,
>> - yours, implementing a new warning and fixing a wrong
>> code bug at the same time.
>>
>> I will add a few comments to your patch below.
> [ ... ]
> 
> So a lot of the comments are out of date, presumably because Martin
> fixed the issues you pointed out in his second version of the patch.
> But there's still some useful nuggets in your comments that are still
> relevant.
> 

Yes, possible, and therefore I would like updated patches summarize
the changes.

> FYI it appears that sometimes you comment above a chunk of code, and
> other times below.  That makes it exceedingly difficult to figure out
> the issue you're trying to raise.
> 

Okydoky.


> 
>>
>>> Martin
>>>
>>> On 07/30/2018 01:17 PM, Martin Sebor wrote:
>>>> Attached is an updated version of the patch that handles more
>>>> instances of calling strlen() on a constant array that is not
>>>> a nul-terminated string.
>>>>
>>>> No other functions except strlen are explicitly handled yet,
>>>> and neither are constant arrays with braced-initializer lists
>>>> like const char a[] = { 'a', 'b', 'c' };  I am testing
>>>> an independent solution for those (bug 86552).  Once those
>>>> are handled the warning will be able to detect those as well.
>>>>
>>>> Tested on x86_64-linux.
>>>>
>>>> On 07/25/2018 05:38 PM, Martin Sebor wrote:
>>>>> Ping: https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01124.html
>>>>>
>>>>> The fix for bug 86532 has been checked in so this enhancement
>>>>> can now be applied on top of it (with only minor adjustments).
>>>>>
>>>>> On 07/19/2018 02:08 PM, Martin Sebor wrote:
>>>>>> In the discussion of my patch for pr86532 Bernd noted that
>>>>>> GCC silently accepts constant character arrays with no
>>>>>> terminating nul as arguments to strlen (and other string
>>>>>> functions).
>>>>>>
>>>>>> The attached patch is a first step in detecting these kinds
>>>>>> of bugs in strlen calls by issuing -Wstringop-overflow.
>>>>>> The next step is to modify all other handlers of built-in
>>>>>> functions to detect the same problem (not part of this patch).
>>>>>> Yet another step is to detect these problems in arguments
>>>>>> initialized using the non-string form:
>>>>>>
>>>>>>    const char a[] = { 'a', 'b', 'c' };
>>>>>>
>>>>>> This patch is meant to apply on top of the one for bug 86532
>>>>>> (I tested it with an earlier version of that patch so there
>>>>>> is code in the context that does not appear in the latest
>>>>>> version of the other diff).
>>>>>>
>>>>>> Martin
>>>>>>
>>>>>
>>>>
>>>
>>> PR tree-optimization/86714 - tree-ssa-forwprop.c confused by too long initializer
>>> PR tree-optimization/86711 - wrong folding of memchr
>>> PR tree-optimization/86552 - missing warning for reading past the end of non-string arrays
>>>
>>> gcc/ChangeLog:
>>>
>>> 	PR tree-optimization/86714
>>> 	PR tree-optimization/86711
>>> 	PR tree-optimization/86552
>>> 	* builtins.h (warn_string_no_nul): Declare..
>>> 	(c_strlen): Add argument.
>>> 	* builtins.c (warn_string_no_nul): New function.
>>> 	(fold_builtin_strlen): Add argument.  Detect missing nul.
>>> 	(fold_builtin_1): Adjust.
>>> 	(string_length): Add argument and use it.
>>> 	(c_strlen): Same.
>>> 	(expand_builtin_strlen): Detect missing nul.
>>> 	* expr.c (string_constant): Add arguments.  Detect missing nul
>>> 	terminator and outermost declaration it's missing in.
>>> 	* expr.h (string_constant): Add argument.
>>> 	* fold-const.c (c_getstr): Change argument to bool*, rename
>>> 	other arguments.
>>> 	* fold-const-call.c (fold_const_call): Detect missing nul.
>>> 	* gimple-fold.c (get_range_strlen): Add argument.
>>> 	(get_maxval_strlen): Adjust.
>>> 	* gimple-fold.h (get_range_strlen): Add argument.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> 	PR tree-optimization/86714
>>> 	PR tree-optimization/86711
>>> 	PR tree-optimization/86552
>>> 	* gcc.c-torture/execute/memchr-1.c: New test.
>>> 	* gcc.c-torture/execute/pr86714.c: New test.
>>> 	* gcc.dg/warn-strlen-no-nul.c: New test.
>>>
>>> diff --git a/gcc/builtins.c b/gcc/builtins.c
>>> index aa3e0d8..f4924d5 100644
>>> --- a/gcc/builtins.c
>>> +++ b/gcc/builtins.c
>>> @@ -150,7 +150,7 @@ static tree stabilize_va_list_loc (location_t, tree, int);
>>>   static rtx expand_builtin_expect (tree, rtx);
>>>   static tree fold_builtin_constant_p (tree);
>>>   static tree fold_builtin_classify_type (tree);
>>> -static tree fold_builtin_strlen (location_t, tree, tree);
>>> +static tree fold_builtin_strlen (location_t, tree, tree, tree);
>>>   static tree fold_builtin_inf (location_t, tree, int);
>>>   static tree rewrite_call_expr (location_t, tree, int, tree, int, ...);
>>>   static bool validate_arg (const_tree, enum tree_code code);
>>> @@ -550,6 +550,36 @@ string_length (const void *ptr, unsigned eltsize, unsigned maxelts)
>>>     return n;
>>>   }
>>>   
>>> +/* For a call expression EXP to a function that expects a string argument,
>>> +   issue a diagnostic due to it being a called with an argument NONSTR
>>> +   that is a character array with no terminating NUL.  */
>>> +
>>> +void
>>> +warn_string_no_nul (location_t loc, tree exp, tree fndecl, tree nonstr)
>>> +{
>>> +  loc = expansion_point_location_if_in_system_header (loc);
>>> +
>>> +  bool warned;
>>> +  if (exp)
>>> +    {
>>> +      if (!fndecl)
>>> +	fndecl = get_callee_fndecl (exp);
>>> +      warned = warning_at (loc, OPT_Wstringop_overflow_,
>>> +			   "%K%qD argument missing terminating nul",
>>> +			   exp, fndecl);
>>> +    }
>>> +  else
>>> +    {
>>> +      gcc_assert (fndecl);
>>> +      warned = warning_at (loc, OPT_Wstringop_overflow_,
>>> +			   "%qD argument missing terminating nul",
>>> +			   fndecl);
>>> +    }
>>> +
>>> +  if (warned && DECL_P (nonstr))
>>> +    inform (DECL_SOURCE_LOCATION (nonstr), "referenced argument declared here");
>>> +}
>>> +
>>>   /* Compute the length of a null-terminated character string or wide
>>>      character string handling character sizes of 1, 2, and 4 bytes.
>>>      TREE_STRING_LENGTH is not the right way because it evaluates to
>>> @@ -567,37 +597,60 @@ string_length (const void *ptr, unsigned eltsize, unsigned maxelts)
>>>      accesses.  Note that this implies the result is not going to be emitted
>>>      into the instruction stream.
>>>   
>>> +   When ARR is non-null and the string is not properly nul-terminated,
>>> +   set *ARR to the declaration of the outermost constant object whose
>>> +   initializer (or one of its elements) is not nul-terminated.
>>> +
>>>      The value returned is of type `ssizetype'.
>>>   
>>>      Unfortunately, string_constant can't access the values of const char
>>>      arrays with initializers, so neither can we do so here.  */
>>>   
>>>   tree
>>> -c_strlen (tree src, int only_value)
>>> +c_strlen (tree src, int only_value, tree *arr /* = NULL */)
>>>   {
>>>     STRIP_NOPS (src);
>>> +
>>> +  /* Used to detect non-nul-terminated strings in subexpressions
>>> +     of a conditional expression.  When ARR is null, point it at
>>> +     one of the elements for simplicity.  */
>>> +  tree arrs[] = { NULL_TREE, NULL_TREE };
>>> +  if (!arr)
>>> +    arr = arrs;
>>
>> now arr is always != NULL
> Right.  It's renamed in the follow-up patch from Martin, but yes, it's
> always non-null.  Note in this case your comment is after the code
> you're referring to.
> 
>>
>>> +
>>>     if (TREE_CODE (src) == COND_EXPR
>>>         && (only_value || !TREE_SIDE_EFFECTS (TREE_OPERAND (src, 0))))
>>>       {
>>> -      tree len1, len2;
>>> -
>>> -      len1 = c_strlen (TREE_OPERAND (src, 1), only_value);
>>> -      len2 = c_strlen (TREE_OPERAND (src, 2), only_value);
>>> +      tree len1 = c_strlen (TREE_OPERAND (src, 1), only_value, arrs);
>>> +      tree len2 = c_strlen (TREE_OPERAND (src, 2), only_value, arrs + 1);
>>>         if (tree_int_cst_equal (len1, len2))
>>> -	return len1;
>>> +	{
>>
>> funny, if called with NULL *arr and arrs[0] alias each other.
> 
>>
>>> +	  *arr = arrs[0] ? arrs[0] : arrs[1];
>>> +	  return len1;
>>> +	}
>>>       }
> 
> And in this case it looks like your comment is before the code you're
> commenting about.  It was fairly obvious in this case because the code
> prior to your "funny, if called..." comment didn't reference arr or arrs
> at all.
> 
> And more importantly, why are you concerned about the aliasing?
> 

It is just *arr = arrs[0] does nothing, but it looks like the author
was not aware of it.  It may be okay, but causes head-scratching.
If you don't have the context you will think this does something different.

> 
> 
> 
> 
>>>   
>>>     if (TREE_CODE (src) == COMPOUND_EXPR
>>>         && (only_value || !TREE_SIDE_EFFECTS (TREE_OPERAND (src, 0))))
>>> -    return c_strlen (TREE_OPERAND (src, 1), only_value);
>>> +    return c_strlen (TREE_OPERAND (src, 1), only_value, arr);
>>>   
>>>     location_t loc = EXPR_LOC_OR_LOC (src, input_location);
>>>   
>>>     /* Offset from the beginning of the string in bytes.  */
>>>     tree byteoff;
>>> -  src = string_constant (src, &byteoff);
>>> -  if (src == 0)
>>> -    return NULL_TREE;
>>> +  /* Set if array is nul-terminated, false otherwise.  */
>>> +  bool nulterm;
>>
>> note arr is always != null or pointing to arrs[0].
>>
>>> +  src = string_constant (src, &byteoff, &nulterm, arr);
>>> +  if (!src)
>>> +    {
>>> +      *arr = arrs[0] ? arrs[0] : arrs[1];
>>> +      return NULL_TREE;
>>> +    }
>>> +
>>> +  /* Clear *ARR when the string is nul-terminated.  It should be
>>> +     of no interest to callers.  */
>>> +  if (nulterm)
>>> +    *arr = NULL_TREE;
>>>   
>>>     /* Determine the size of the string element.  */
>>>     unsigned eltsize
>>> @@ -621,6 +674,12 @@ c_strlen (tree src, int only_value)
>>>   	maxelts = maxelts / eltsize - 1;
>>>         }
>>>   
>>> +  /* Unless the caller is prepared to handle it by passing in a non-null
>>> +     ARR, fail if the terminating nul doesn't fit in the array the string
>>> +     is stored in (as in const char a[3] = "123";  */
>>
>> note arr is always != NULL, thus this if is never taken.
>>
>>> +  if (!arr && maxelts < strelts)
>>> +    return NULL_TREE;
>>> +
> Right.  And I think that check is gone in the second version of Martin's
> patch.
> 
> 
>>>     /* PTR can point to the byte representation of any string type, including
>>>        char* and wchar_t*.  */
>>>     const char *ptr = TREE_STRING_POINTER (src);
>>> @@ -650,7 +709,8 @@ c_strlen (tree src, int only_value)
>>>         offsave = fold_convert (ssizetype, offsave);
>>>         tree condexp = fold_build2_loc (loc, LE_EXPR, boolean_type_node, offsave,
>>>   				      build_int_cst (ssizetype, len * eltsize));
>>
>> this computation is wrong, it computes not in units of eltsize,
>> I am however not sure if it is really good that this function tries to
>> compute strlen of wide character strings.
> Note that in the second version of Martin's patch eltsize will always be
> 1 when we get here.  Furthermore, the multiplication by eltsize is gone.
>   It looks like you switched back to commenting after the code for that
> comment, but then immediately thereafter...
> 

Acutally I had no idea that the second patch did resolve some of my comments
and which those were.

I had the impression that it is just splitting up of a large patch into
several smaller without reworking at the same time.

Once again, a summary what was changed would be helpful.

> 
>>
>> That said, please fix this computation first, in a different patch
>> instead of just fixing the indentation. (I know I pointed that lines are too
>> long here, but that was before I realized that the whole length computation
>> here is wrong).
>>
>>> -      tree lenexp = size_diffop_loc (loc, ssize_int (strelts * eltsize), offsave);
>>> +      tree lenexp = size_diffop_loc (loc, ssize_int (strelts * eltsize),
>>> +				     offsave);
>>>         return fold_build3_loc (loc, COND_EXPR, ssizetype, condexp, lenexp,
>>>   			      build_zero_cst (ssizetype));
>>>       }
>>> @@ -690,7 +750,7 @@ c_strlen (tree src, int only_value)
>>>        Since ELTOFF is our starting index into the string, no further
>>>        calculation is needed.  */
> The comment shown above appears to refer to the code below the comment.
> Again, this makes it exceedingly confusing to understand your comments
> and take appropriate action.
> 
> 
> 
> 
>>
>> What are you fixing here, I think that was another bug.
>> If this fixes something then it should be in a different patch,
>> just handling this.
>>
>>>     unsigned len = string_length (ptr + eltoff * eltsize, eltsize,
>>> -				maxelts - eltoff);
>>> +				strelts - eltoff);
>>>   
>>>     return ssize_int (len);
>>>   }
> I'm guessing the comment in this case refers to the code after.
> Presumably questioning the change from maxelts to strelts.
> 
> 

Yes.  I was thinking that could be a patch of its own.

> 
> 
>>> @@ -2855,7 +2915,6 @@ expand_builtin_strlen (tree exp, rtx target,
>>>   
>>>     struct expand_operand ops[4];
>>>     rtx pat;
>>> -  tree len;
>>>     tree src = CALL_EXPR_ARG (exp, 0);
>>>     rtx src_reg;
>>>     rtx_insn *before_strlen;
>>> @@ -2864,20 +2923,39 @@ expand_builtin_strlen (tree exp, rtx target,
>>>     unsigned int align;
>>>   
>>>     /* If the length can be computed at compile-time, return it.  */
>>> -  len = c_strlen (src, 0);
>>> +  tree array;
>>> +  tree len = c_strlen (src, 0, &array);
>>
>> You know the c_strlen tries to compute wide character sizes,
>> but strlen does not do that, strlen (L"abc") should give 1
>> (or 0 on a BE machine)
>> I wonder if that is correct.
> So I think this is fixed by your change which restored the default
> behavior of c_strlen to count bytes.  Which restores the behavior of
> c_strlen to match that of the strlen library call.
> 
> So for something like L"abc", the right value is 1 or 0 for LE and BE
> targets respectively.
> 

Well, yes, although I changed my mind on strlen(L"abc") meanwhile.

This may appear as if I contradict myself but, it is more like I learn.

The installed patch for the ELTSIZE parameter was in part inspired by the
thought about "not folding invalid stuff" that was forming in my mind
at that time, even before I wrote it down and sent it to this list.

So the patch does reject to give a value for strlen(L"abc") since
it is an invalid call.

Previously that was counting bytes in a wide char string, but I doubt
that was the original intention though.


> 
> 
>>
>>>     if (len)
>>> -    return expand_expr (len, target, target_mode, EXPAND_NORMAL);
>>> +    {
>>> +      if (array)
>>> +	{
>>> +	  /* Array refers to the non-nul terminated constant array
>>> +	     whose length is attempted to be computed.  */
>>
>> I really wonder if it would not make more sense to have a
>> nonterminated_string_constant_p instead.
> Rather than a boolean I think we want a tree * so that we can bubble up
> more information to the caller WRT non-terminated strings.
> 
>>
>> Last time I wanted to implement a warning in expand I faced the
>> problem that inlined functions will get one warning per invocation?
> Yea, but that's just the way things are.  I don't think it's really an
> issue for the patches we're looking at right now.
> 

No.

>>> @@ -8255,19 +8333,30 @@ fold_builtin_classify_type (tree arg)
>>>     return build_int_cst (integer_type_node, type_to_class (TREE_TYPE (arg)));
>>>   }
>>>   
>>> -/* Fold a call to __builtin_strlen with argument ARG.  */
>>> +/* Fold a strlen call to FNDECL of TYPE, and with argument ARG.  */
>>>   
>>>   static tree
>>> -fold_builtin_strlen (location_t loc, tree type, tree arg)
>>> +fold_builtin_strlen (location_t loc, tree fndecl, tree type, tree arg)
>>>   {
>>>     if (!validate_arg (arg, POINTER_TYPE))
>>>       return NULL_TREE;
>>>     else
>>>       {
>>> -      tree len = c_strlen (arg, 0);
>>> -
>>> +      tree arr = NULL_TREE;
>>> +      tree len = c_strlen (arg, 0, &arr);
>>
>> Is it possible to write a test case where strlen(L"test") reaches this point?
>> what will c_strlen return then?
> Given your fix to have c_strlen count bytes by default, I think we're OK
> here.
> 

Yes.  But my fix was still incomplete.
So incremental progress is necessary here.

>>
>>> @@ -11328,7 +11332,7 @@ string_constant (tree arg, tree *ptr_offset)
>>>   	return NULL_TREE;
>>>   
>>>         tree offset;
>>> -      if (tree str = string_constant (arg0, &offset))
>>> +      if (tree str = string_constant (arg0, &offset, nulterm, decl))
>>>   	{
>>>   	  /* Avoid pointers to arrays (see bug 86622).  */
>>>   	  if (POINTER_TYPE_P (TREE_TYPE (arg))
>>> @@ -11368,6 +11372,10 @@ string_constant (tree arg, tree *ptr_offset)
>>>     if (TREE_CODE (array) == STRING_CST)
>>
>> Well, actually I think there _are_ STING_CSTs which are not null terminated.
>> Maybe not in C. But Fortran, Ada, Go...
>>
>>>       {
>>>         *ptr_offset = fold_convert (sizetype, offset);
>>> +      if (nulterm)
>>> +	*nulterm = true;
>>> +      if (decl)
>>> +	*decl = NULL_TREE;
>>>         return array;
>>>       }
> So your comment seems to be referring to the code above the comment as
> well as below.  Confusing.  Consistency really helps.
> 
> Are we at a point where we're ready to declare STRING_CSTs as always
> being properly terminated?  If not the hunk of code above needs some
> rethinking since we can't guarantee the string is properly terminated.
> 

No.  That is still in the flux.
And I am not sure if the "properly terminated" will survive.

I am digging deep to the ground now.
And the correctness of the code in questing depends heavily on the
results.  Therefore I would like to fix this from the ground.

Adding lots of new code that is based on wrong assumptions
will not help.


> 
>>>   
>>> @@ -11414,6 +11422,49 @@ string_constant (tree arg, tree *ptr_offset)
>>>     if (!array_size || TREE_CODE (array_size) != INTEGER_CST)
>>>       return NULL_TREE;
>>>   
>>> +  unsigned HOST_WIDE_INT array_elts = tree_to_uhwi (array_size);
>>> +
>>
>> I don't understand why this is necessary at all.
>> It looks way too complicated, to say the least.
>>
>> TREE_TYPE (init) has already the type of the member.
>>
>>> +  /* When ARG refers to an aggregate (of arrays) try to determine
>>> +     the size of the character array within the aggregate.  */
>>> +  tree ref = arg;
>>> +  tree reftype = TREE_TYPE (arg);
>>> +
>>> +  if (TREE_CODE (ref) == MEM_REF)
>>> +    {
>>> +      ref = TREE_OPERAND (ref, 0);
>>> +      if (TREE_CODE (ref) == ADDR_EXPR)
>>> +	{
>>> +	  ref = TREE_OPERAND (ref, 0);
>>> +	  reftype = TREE_TYPE (ref);
>>> +	}
>>> +    }
>>> +  else
>>> +    while (TREE_CODE (ref) == ARRAY_REF)
>>> +      {
>>> +	reftype = TREE_TYPE (ref);
>>> +	ref = TREE_OPERAND (ref, 0);
>>> +      }
>>> +
>>> +  if (TREE_CODE (ref) == COMPONENT_REF)
>>> +    reftype = TREE_TYPE (ref);
>>> +
>>> +  while (TREE_CODE (reftype) == ARRAY_TYPE)
>>> +    {
>>> +      tree next = TREE_TYPE (reftype);
>>> +      if (TREE_CODE (next) == INTEGER_TYPE)
>>> +	{
>>> +	  if (tree size = TYPE_SIZE_UNIT (reftype))
>>> +	    if (tree_fits_uhwi_p (size))
>>> +	      array_elts = tree_to_uhwi (size);
>>
>> so array_elts is measued in bytes.
> So I'm guessing your comment about the code looking way too complicated
> is referring to the code *after* your comment.  That code is not in the
> v2 patch.  At least not 01/06 which addresses the codegen/opt issues.  I
> haven't checked the full kit to see if this code appears in a subsequent
> patch.
> 

No I meant the code between
/* When ARG refers to an aggregate (of arrays) try to determine
    the size of the character array within the aggregate.  */

and here.  It is wrong (assuming C semantic on GIMPLE).

> 
>>
>>> +	  break;
>>> +	}
>>> +
>>> +      reftype = TREE_TYPE (reftype);
>>> +    }
>>> +
>>> +  if (decl)
>>> +    *decl = array;
>>> +
>>>     /* Avoid returning a string that doesn't fit in the array
>>>        it is stored in, like
>>>        const char a[4] = "abcde";
>>> @@ -11427,7 +11478,9 @@ string_constant (tree arg, tree *ptr_offset)
>>>     unsigned HOST_WIDE_INT length = TREE_STRING_LENGTH (init);
>>>     length = string_length (TREE_STRING_POINTER (init), charsize,
>>>   			  length / charsize);
>>
>> Some callers especially those where the wrong code happens, expect
>> to be able to access the STRING_CST up to TREE_STRING_LENGTH,
>> But using string_length assume thy stop at the first nul char.
> Example please?
> 

tree-ssa-forwprop.c:
               str1 = string_constant (src1, &off1);
               if (str1 == NULL_TREE)
                 break;
               if (!tree_fits_uhwi_p (off1)
                   || compare_tree_int (off1, TREE_STRING_LENGTH (str1) - 1) > 0
                   || compare_tree_int (len1, TREE_STRING_LENGTH (str1)
                                              - tree_to_uhwi (off1)) > 0



Bernd.
Jeff Law Aug. 24, 2018, 4:04 p.m. UTC | #11
On 08/24/2018 06:27 AM, Bernd Edlinger wrote:
[ Lots of snipping throughout ]


>>>
>>>> +
>>>>     if (TREE_CODE (src) == COND_EXPR
>>>>         && (only_value || !TREE_SIDE_EFFECTS (TREE_OPERAND (src, 0))))
>>>>       {
>>>> -      tree len1, len2;
>>>> -
>>>> -      len1 = c_strlen (TREE_OPERAND (src, 1), only_value);
>>>> -      len2 = c_strlen (TREE_OPERAND (src, 2), only_value);
>>>> +      tree len1 = c_strlen (TREE_OPERAND (src, 1), only_value, arrs);
>>>> +      tree len2 = c_strlen (TREE_OPERAND (src, 2), only_value, arrs + 1);
>>>>         if (tree_int_cst_equal (len1, len2))
>>>> -	return len1;
>>>> +	{
>>>
>>> funny, if called with NULL *arr and arrs[0] alias each other.
>>
>>>
>>>> +	  *arr = arrs[0] ? arrs[0] : arrs[1];
>>>> +	  return len1;
>>>> +	}
>>>>       }
>>
>> And in this case it looks like your comment is before the code you're
>> commenting about.  It was fairly obvious in this case because the code
>> prior to your "funny, if called..." comment didn't reference arr or arrs
>> at all.
>>
>> And more importantly, why are you concerned about the aliasing?
>>
> 
> It is just *arr = arrs[0] does nothing, but it looks like the author
> was not aware of it.  It may be okay, but causes head-scratching.
> If you don't have the context you will think this does something different.
*arr = arrs[0] ? arrs[0] : arrs[1];

Yes, there's potentially a dead store when arr points to arrs[0] because
of the earlier initialization when NULL was passed for arr.  But
otherwise we'd be checking repeatedly that arr != NULL.


>>>>     /* PTR can point to the byte representation of any string type, including
>>>>        char* and wchar_t*.  */
>>>>     const char *ptr = TREE_STRING_POINTER (src);
>>>> @@ -650,7 +709,8 @@ c_strlen (tree src, int only_value)
>>>>         offsave = fold_convert (ssizetype, offsave);
>>>>         tree condexp = fold_build2_loc (loc, LE_EXPR, boolean_type_node, offsave,
>>>>   				      build_int_cst (ssizetype, len * eltsize));
>>>
>>> this computation is wrong, it computes not in units of eltsize,
>>> I am however not sure if it is really good that this function tries to
>>> compute strlen of wide character strings.
>> Note that in the second version of Martin's patch eltsize will always be
>> 1 when we get here.  Furthermore, the multiplication by eltsize is gone.
>>   It looks like you switched back to commenting after the code for that
>> comment, but then immediately thereafter...
>>
> 
> Acutally I had no idea that the second patch did resolve some of my comments
> and which those were.
It happens.  Multiple long threads make it difficult to follow.

> 
> I had the impression that it is just splitting up of a large patch into
> several smaller without reworking at the same time.
> 
> Once again, a summary what was changed would be helpful.
Agreed.


>>> What are you fixing here, I think that was another bug.
>>> If this fixes something then it should be in a different patch,
>>> just handling this.
>>>
>>>>     unsigned len = string_length (ptr + eltoff * eltsize, eltsize,
>>>> -				maxelts - eltoff);
>>>> +				strelts - eltoff);
>>>>   
>>>>     return ssize_int (len);
>>>>   }
>> I'm guessing the comment in this case refers to the code after.
>> Presumably questioning the change from maxelts to strelts.
>>
>>
> 
> Yes.  I was thinking that could be a patch of its own.
Funny.  I found myself in agreement and was going to extract this out
and run it through the usual tests and get it onto the trunk
immediately.  Then I realized you'd already fixed this in the patch that
added the eltsize paramter to c_strlen which has already been committed
:-)



> 
> Well, yes, although I changed my mind on strlen(L"abc") meanwhile.
> 
> This may appear as if I contradict myself but, it is more like I learn.
> 
> The installed patch for the ELTSIZE parameter was in part inspired by the
> thought about "not folding invalid stuff" that was forming in my mind
> at that time, even before I wrote it down and sent it to this list.
ACK.  And I think there seems to be consensus forming around that
concept which is good.

If we ultimately decide not to fold strlen of a wide character string,
then that'll be an easy enough change to make.  In the mean time bring
consistent with how the C library strlen works is a good thing IMHO.


>>>> @@ -8255,19 +8333,30 @@ fold_builtin_classify_type (tree arg)
>>>>     return build_int_cst (integer_type_node, type_to_class (TREE_TYPE (arg)));
>>>>   }
>>>>   
>>>> -/* Fold a call to __builtin_strlen with argument ARG.  */
>>>> +/* Fold a strlen call to FNDECL of TYPE, and with argument ARG.  */
>>>>   
>>>>   static tree
>>>> -fold_builtin_strlen (location_t loc, tree type, tree arg)
>>>> +fold_builtin_strlen (location_t loc, tree fndecl, tree type, tree arg)
>>>>   {
>>>>     if (!validate_arg (arg, POINTER_TYPE))
>>>>       return NULL_TREE;
>>>>     else
>>>>       {
>>>> -      tree len = c_strlen (arg, 0);
>>>> -
>>>> +      tree arr = NULL_TREE;
>>>> +      tree len = c_strlen (arg, 0, &arr);
>>>
>>> Is it possible to write a test case where strlen(L"test") reaches this point?
>>> what will c_strlen return then?
>> Given your fix to have c_strlen count bytes by default, I think we're OK
>> here.
>>
> 
> Yes.  But my fix was still incomplete.
> So incremental progress is necessary here.
That's fine.  I'm generally a fan of incremental improvements.

>>>
>>>>       {
>>>>         *ptr_offset = fold_convert (sizetype, offset);
>>>> +      if (nulterm)
>>>> +	*nulterm = true;
>>>> +      if (decl)
>>>> +	*decl = NULL_TREE;
>>>>         return array;
>>>>       }
>> So your comment seems to be referring to the code above the comment as
>> well as below.  Confusing.  Consistency really helps.
>>
>> Are we at a point where we're ready to declare STRING_CSTs as always
>> being properly terminated?  If not the hunk of code above needs some
>> rethinking since we can't guarantee the string is properly terminated.
>>
> 
> No.  That is still in the flux.
> And I am not sure if the "properly terminated" will survive.
> 
> I am digging deep to the ground now.
> And the correctness of the code in questing depends heavily on the
> results.  Therefore I would like to fix this from the ground.
> 
> Adding lots of new code that is based on wrong assumptions
> will not help.
OK.  So clearly this hunk needs rethinking.  I'm not sure if/how
critical the code above is -- we may be able to get away with deferring
this chunk.  I'll have to look at that more deeply.

> 
> 
>>
>>>>   
>>>> @@ -11414,6 +11422,49 @@ string_constant (tree arg, tree *ptr_offset)
>>>>     if (!array_size || TREE_CODE (array_size) != INTEGER_CST)
>>>>       return NULL_TREE;
>>>>   
>>>> +  unsigned HOST_WIDE_INT array_elts = tree_to_uhwi (array_size);
>>>> +
>>>
>>> I don't understand why this is necessary at all.
>>> It looks way too complicated, to say the least.
>>>
>>> TREE_TYPE (init) has already the type of the member.
>>>
>>>> +  /* When ARG refers to an aggregate (of arrays) try to determine
>>>> +     the size of the character array within the aggregate.  */
>>>> +  tree ref = arg;
>>>> +  tree reftype = TREE_TYPE (arg);
>>>> +
>>>> +  if (TREE_CODE (ref) == MEM_REF)
>>>> +    {
>>>> +      ref = TREE_OPERAND (ref, 0);
>>>> +      if (TREE_CODE (ref) == ADDR_EXPR)
>>>> +	{
>>>> +	  ref = TREE_OPERAND (ref, 0);
>>>> +	  reftype = TREE_TYPE (ref);
>>>> +	}
>>>> +    }
>>>> +  else
>>>> +    while (TREE_CODE (ref) == ARRAY_REF)
>>>> +      {
>>>> +	reftype = TREE_TYPE (ref);
>>>> +	ref = TREE_OPERAND (ref, 0);
>>>> +      }
>>>> +
>>>> +  if (TREE_CODE (ref) == COMPONENT_REF)
>>>> +    reftype = TREE_TYPE (ref);
>>>> +
>>>> +  while (TREE_CODE (reftype) == ARRAY_TYPE)
>>>> +    {
>>>> +      tree next = TREE_TYPE (reftype);
>>>> +      if (TREE_CODE (next) == INTEGER_TYPE)
>>>> +	{
>>>> +	  if (tree size = TYPE_SIZE_UNIT (reftype))
>>>> +	    if (tree_fits_uhwi_p (size))
>>>> +	      array_elts = tree_to_uhwi (size);
>>>
>>> so array_elts is measued in bytes.
>> So I'm guessing your comment about the code looking way too complicated
>> is referring to the code *after* your comment.  That code is not in the
>> v2 patch.  At least not 01/06 which addresses the codegen/opt issues.  I
>> haven't checked the full kit to see if this code appears in a subsequent
>> patch.
>>
> 
> No I meant the code between
> /* When ARG refers to an aggregate (of arrays) try to determine
>     the size of the character array within the aggregate.  */
> 
> and here.  It is wrong (assuming C semantic on GIMPLE).
Ah.  Yea.  I think we're broadly in agreement we can't do that for
anything which affects optimization/codegen.  Given those bits aren't in
Martin's v2 patch I think we can move on.


> 
>>
>>>
>>>> +	  break;
>>>> +	}
>>>> +
>>>> +      reftype = TREE_TYPE (reftype);
>>>> +    }
>>>> +
>>>> +  if (decl)
>>>> +    *decl = array;
>>>> +
>>>>     /* Avoid returning a string that doesn't fit in the array
>>>>        it is stored in, like
>>>>        const char a[4] = "abcde";
>>>> @@ -11427,7 +11478,9 @@ string_constant (tree arg, tree *ptr_offset)
>>>>     unsigned HOST_WIDE_INT length = TREE_STRING_LENGTH (init);
>>>>     length = string_length (TREE_STRING_POINTER (init), charsize,
>>>>   			  length / charsize);
>>>
>>> Some callers especially those where the wrong code happens, expect
>>> to be able to access the STRING_CST up to TREE_STRING_LENGTH,
>>> But using string_length assume thy stop at the first nul char.
>> Example please?
>>
> 
> tree-ssa-forwprop.c:
>                str1 = string_constant (src1, &off1);
>                if (str1 == NULL_TREE)
>                  break;
>                if (!tree_fits_uhwi_p (off1)
>                    || compare_tree_int (off1, TREE_STRING_LENGTH (str1) - 1) > 0
>                    || compare_tree_int (len1, TREE_STRING_LENGTH (str1)
>                                               - tree_to_uhwi (off1)) > 0
I understand that you're concerned about reading past the end of the
returned STRING_CST via the subsequent memcpy in tree-ssa-forwprop.c
which starts at src1+off1 and reads len1 bytes.

But I'm not sure how that can happen in the V2 patch from Martin.  In
fact, one of the fundamental goals of the V2 patch is to avoid this
exact problem.

Jeff
Jeff Law Aug. 24, 2018, 4:51 p.m. UTC | #12
On 08/02/2018 07:26 AM, Bernd Edlinger wrote:
> 
>> -  if (compare_tree_int (array_size, length + 1) < 0)
>> +  if (nulterm)
> but here you compare bytes with length which is measued un chars.
> 
>> +    *nulterm = array_elts > length;
>> +  else if (array_elts <= length)
>>      return NULL_TREE;
>>  
>>    *ptr_offset = offset;
Actually in the V2 patch length is measured by element size.  So I think
this is a non-issue.



>> -      /* Compute and store the length of the substring at OFFSET.
>> +      /* Compute and store the size of the substring at OFFSET.
>>  	 All offsets past the initial length refer to null strings.  */
>> -      if (offset <= string_length)
>> -	*strlen = string_length - offset;
> this should be offset < string_length.
> 
>> +      if (offset <= string_size)
>> +	*strsize = string_size - offset;
>>        else
>> -	*strlen = 0;
> this should be 1, you may access the NUL byte of "".
> 
>> +	*strsize = 0;
>>      }
Agreed in both cases based on the defined behavior for the function (NUL
is included).


>>  
>> -  const char *string = TREE_STRING_POINTER (src);
>> -
>> -  if (string_length == 0
>> -      || offset >= string_size)
>> +  if (string_size == 0
>> +      || offset >= array_size)
>>      return NULL;
>>  
>> -  if (strsize)
>> -    {
>> -      /* Support even constant character arrays that aren't proper
>> -	 NUL-terminated strings.  */
>> -      *strsize = string_size;
>> -    }
>> -  else if (string[string_length - 1] != '\0')
> Well, this is broken for wide character strings.
> but I hope we can get rid of STRING_CST which are
> not explicitly null terminated.
This hunk is gone in the V2 patch.  So I'm not going to worry about it
right now.



> 
>> +  if (!nulterm && string[string_size - 1] != '\0')
>>      {
>> -      /* Support only properly NUL-terminated strings but handle
>> -	 consecutive strings within the same array, such as the six
>> -	 substrings in "1\0002\0003".  */
>> +      /* When NULTERM is null, support only properly nul-terminated
>> +	 strings but handle consecutive strings within the same array,
>> +	 such as the six substrings in "1\0002\0003".  Otherwise, let
>> +	 the caller deal with non-nul-terminated arrays.  */
>>        return NULL;
>>      }
>>  
>> -  return offset <= string_length ? string + offset : "";
> this should be offset < string_size.
> 
>> +  return offset <= string_size ? string + offset : "";
Agreed.

jeff
Bernd Edlinger Aug. 24, 2018, 5:26 p.m. UTC | #13
On 08/24/18 18:51, Jeff Law wrote:
> On 08/02/2018 07:26 AM, Bernd Edlinger wrote:
>>
>>> -  if (compare_tree_int (array_size, length + 1) < 0)
>>> +  if (nulterm)
>> but here you compare bytes with length which is measued un chars.
>>
>>> +    *nulterm = array_elts > length;
>>> +  else if (array_elts <= length)
>>>       return NULL_TREE;
>>>   
>>>     *ptr_offset = offset;
> Actually in the V2 patch length is measured by element size.  So I think
> this is a non-issue.
> 
> 
> 
>>> -      /* Compute and store the length of the substring at OFFSET.
>>> +      /* Compute and store the size of the substring at OFFSET.
>>>   	 All offsets past the initial length refer to null strings.  */
>>> -      if (offset <= string_length)
>>> -	*strlen = string_length - offset;
>> this should be offset < string_length.
>>
>>> +      if (offset <= string_size)
>>> +	*strsize = string_size - offset;
>>>         else
>>> -	*strlen = 0;
>> this should be 1, you may access the NUL byte of "".
>>
>>> +	*strsize = 0;
>>>       }
> Agreed in both cases based on the defined behavior for the function (NUL
> is included).
> 
> 
>>>   
>>> -  const char *string = TREE_STRING_POINTER (src);
>>> -
>>> -  if (string_length == 0
>>> -      || offset >= string_size)
>>> +  if (string_size == 0
>>> +      || offset >= array_size)
>>>       return NULL;
>>>   
>>> -  if (strsize)
>>> -    {
>>> -      /* Support even constant character arrays that aren't proper
>>> -	 NUL-terminated strings.  */
>>> -      *strsize = string_size;
>>> -    }
>>> -  else if (string[string_length - 1] != '\0')
>> Well, this is broken for wide character strings.
>> but I hope we can get rid of STRING_CST which are
>> not explicitly null terminated.

I am afraid that is not going to happen.
Maybe we can get STRING_CST that are never longer
than the TYPE_UNIT_SIZE, but c_strlen and c_getstr
need to take care that the string is zero-terminated.

string_constant, should not promise the string is zero terminated.
But instead it can promise that:
1) the STRING_CST is valid up to TREE_STRING_LENGTH
2) mem_size is >= TREE_STRING_LENGTH
3) memory between TREE_STRING_LENGTH and mem_size is ZERO.

It will not guarantee anything about zero termination any more.

But that will again need an adjustment to the string_constant
and likely to my 86711/86714/87053 patch set.

Sorry for all this back and forth.

> This hunk is gone in the V2 patch.  So I'm not going to worry about it
> right now.
> 
> 

Hmm.  In my tree I wanted to drop this check first, but that
did not work right. So I skip the check if strsize is not NULL.
But if it is NULL, I check the element size is 1, if it is
not 1 return NULL, because it is no character string.

I had several iterations here, and it might still be dependent
on semantical properties of STRING_CST that may not be be granted
in the light of my recent discoveries.


In the end, the best approach might be to either merge my patch
with Martins, or step-wise, first fixing wrong code, and then
implementing warnings without fixing wrong code.

Anyway, the smaller the patch the better for the review.


Bernd.
Bernd Edlinger Aug. 24, 2018, 9:56 p.m. UTC | #14
On 08/24/18 18:04, Jeff Law wrote:
> On 08/24/2018 06:27 AM, Bernd Edlinger wrote:
> [ Lots of snipping throughout ]
> 
> 
>>>>
>>>>> +
>>>>>      if (TREE_CODE (src) == COND_EXPR
>>>>>          && (only_value || !TREE_SIDE_EFFECTS (TREE_OPERAND (src, 0))))
>>>>>        {
>>>>> -      tree len1, len2;
>>>>> -
>>>>> -      len1 = c_strlen (TREE_OPERAND (src, 1), only_value);
>>>>> -      len2 = c_strlen (TREE_OPERAND (src, 2), only_value);
>>>>> +      tree len1 = c_strlen (TREE_OPERAND (src, 1), only_value, arrs);
>>>>> +      tree len2 = c_strlen (TREE_OPERAND (src, 2), only_value, arrs + 1);
>>>>>          if (tree_int_cst_equal (len1, len2))
>>>>> -	return len1;
>>>>> +	{
>>>>
>>>> funny, if called with NULL *arr and arrs[0] alias each other.
>>>
>>>>
>>>>> +	  *arr = arrs[0] ? arrs[0] : arrs[1];
>>>>> +	  return len1;
>>>>> +	}
>>>>>        }
>>>
>>> And in this case it looks like your comment is before the code you're
>>> commenting about.  It was fairly obvious in this case because the code
>>> prior to your "funny, if called..." comment didn't reference arr or arrs
>>> at all.
>>>
>>> And more importantly, why are you concerned about the aliasing?
>>>
>>
>> It is just *arr = arrs[0] does nothing, but it looks like the author
>> was not aware of it.  It may be okay, but causes head-scratching.
>> If you don't have the context you will think this does something different.
> *arr = arrs[0] ? arrs[0] : arrs[1];
> 
> Yes, there's potentially a dead store when arr points to arrs[0] because
> of the earlier initialization when NULL was passed for arr.  But
> otherwise we'd be checking repeatedly that arr != NULL.
> 
> 
>>>>>      /* PTR can point to the byte representation of any string type, including
>>>>>         char* and wchar_t*.  */
>>>>>      const char *ptr = TREE_STRING_POINTER (src);
>>>>> @@ -650,7 +709,8 @@ c_strlen (tree src, int only_value)
>>>>>          offsave = fold_convert (ssizetype, offsave);
>>>>>          tree condexp = fold_build2_loc (loc, LE_EXPR, boolean_type_node, offsave,
>>>>>    				      build_int_cst (ssizetype, len * eltsize));
>>>>
>>>> this computation is wrong, it computes not in units of eltsize,
>>>> I am however not sure if it is really good that this function tries to
>>>> compute strlen of wide character strings.
>>> Note that in the second version of Martin's patch eltsize will always be
>>> 1 when we get here.  Furthermore, the multiplication by eltsize is gone.
>>>    It looks like you switched back to commenting after the code for that
>>> comment, but then immediately thereafter...
>>>
>>
>> Acutally I had no idea that the second patch did resolve some of my comments
>> and which those were.
> It happens.  Multiple long threads make it difficult to follow.
> 
>>
>> I had the impression that it is just splitting up of a large patch into
>> several smaller without reworking at the same time.
>>
>> Once again, a summary what was changed would be helpful.
> Agreed.
> 
> 
>>>> What are you fixing here, I think that was another bug.
>>>> If this fixes something then it should be in a different patch,
>>>> just handling this.
>>>>
>>>>>      unsigned len = string_length (ptr + eltoff * eltsize, eltsize,
>>>>> -				maxelts - eltoff);
>>>>> +				strelts - eltoff);
>>>>>    
>>>>>      return ssize_int (len);
>>>>>    }
>>> I'm guessing the comment in this case refers to the code after.
>>> Presumably questioning the change from maxelts to strelts.
>>>
>>>
>>
>> Yes.  I was thinking that could be a patch of its own.
> Funny.  I found myself in agreement and was going to extract this out
> and run it through the usual tests and get it onto the trunk
> immediately.  Then I realized you'd already fixed this in the patch that
> added the eltsize paramter to c_strlen which has already been committed
> :-)
> 
> 
> 
>>
>> Well, yes, although I changed my mind on strlen(L"abc") meanwhile.
>>
>> This may appear as if I contradict myself but, it is more like I learn.
>>
>> The installed patch for the ELTSIZE parameter was in part inspired by the
>> thought about "not folding invalid stuff" that was forming in my mind
>> at that time, even before I wrote it down and sent it to this list.
> ACK.  And I think there seems to be consensus forming around that
> concept which is good.
> 
> If we ultimately decide not to fold strlen of a wide character string,
> then that'll be an easy enough change to make.  In the mean time bring
> consistent with how the C library strlen works is a good thing IMHO.
> 
> 
>>>>> @@ -8255,19 +8333,30 @@ fold_builtin_classify_type (tree arg)
>>>>>      return build_int_cst (integer_type_node, type_to_class (TREE_TYPE (arg)));
>>>>>    }
>>>>>    
>>>>> -/* Fold a call to __builtin_strlen with argument ARG.  */
>>>>> +/* Fold a strlen call to FNDECL of TYPE, and with argument ARG.  */
>>>>>    
>>>>>    static tree
>>>>> -fold_builtin_strlen (location_t loc, tree type, tree arg)
>>>>> +fold_builtin_strlen (location_t loc, tree fndecl, tree type, tree arg)
>>>>>    {
>>>>>      if (!validate_arg (arg, POINTER_TYPE))
>>>>>        return NULL_TREE;
>>>>>      else
>>>>>        {
>>>>> -      tree len = c_strlen (arg, 0);
>>>>> -
>>>>> +      tree arr = NULL_TREE;
>>>>> +      tree len = c_strlen (arg, 0, &arr);
>>>>
>>>> Is it possible to write a test case where strlen(L"test") reaches this point?
>>>> what will c_strlen return then?
>>> Given your fix to have c_strlen count bytes by default, I think we're OK
>>> here.
>>>
>>
>> Yes.  But my fix was still incomplete.
>> So incremental progress is necessary here.
> That's fine.  I'm generally a fan of incremental improvements.
> 
>>>>
>>>>>        {
>>>>>          *ptr_offset = fold_convert (sizetype, offset);
>>>>> +      if (nulterm)
>>>>> +	*nulterm = true;
>>>>> +      if (decl)
>>>>> +	*decl = NULL_TREE;
>>>>>          return array;
>>>>>        }
>>> So your comment seems to be referring to the code above the comment as
>>> well as below.  Confusing.  Consistency really helps.
>>>
>>> Are we at a point where we're ready to declare STRING_CSTs as always
>>> being properly terminated?  If not the hunk of code above needs some
>>> rethinking since we can't guarantee the string is properly terminated.
>>>
>>
>> No.  That is still in the flux.
>> And I am not sure if the "properly terminated" will survive.
>>
>> I am digging deep to the ground now.
>> And the correctness of the code in questing depends heavily on the
>> results.  Therefore I would like to fix this from the ground.
>>
>> Adding lots of new code that is based on wrong assumptions
>> will not help.
> OK.  So clearly this hunk needs rethinking.  I'm not sure if/how
> critical the code above is -- we may be able to get away with deferring
> this chunk.  I'll have to look at that more deeply.
> 

Yes. Note one interesting thing.  The new version of the STRING_CST
consistency check in varasm.c does some magic.
This time I think Richard generally agreed on the approach.

The surprise is, it fixes both PR86711/86714 without any other patch.

I really think that is the preferred way to go ahead.

If you prevent inconsistent input to the string_constant function
then it is a lot easier to prevent inconsistent output.

So I start to think that at least my proposed fix for PR86711/86714
seems like it was not yet at the root cause of the problem.
The root cause was probably the semantical differences between
different uses of STRING_CSTs.


>>
>>
>>>
>>>>>    
>>>>> @@ -11414,6 +11422,49 @@ string_constant (tree arg, tree *ptr_offset)
>>>>>      if (!array_size || TREE_CODE (array_size) != INTEGER_CST)
>>>>>        return NULL_TREE;
>>>>>    
>>>>> +  unsigned HOST_WIDE_INT array_elts = tree_to_uhwi (array_size);
>>>>> +
>>>>
>>>> I don't understand why this is necessary at all.
>>>> It looks way too complicated, to say the least.
>>>>
>>>> TREE_TYPE (init) has already the type of the member.
>>>>
>>>>> +  /* When ARG refers to an aggregate (of arrays) try to determine
>>>>> +     the size of the character array within the aggregate.  */
>>>>> +  tree ref = arg;
>>>>> +  tree reftype = TREE_TYPE (arg);
>>>>> +
>>>>> +  if (TREE_CODE (ref) == MEM_REF)
>>>>> +    {
>>>>> +      ref = TREE_OPERAND (ref, 0);
>>>>> +      if (TREE_CODE (ref) == ADDR_EXPR)
>>>>> +	{
>>>>> +	  ref = TREE_OPERAND (ref, 0);
>>>>> +	  reftype = TREE_TYPE (ref);
>>>>> +	}
>>>>> +    }
>>>>> +  else
>>>>> +    while (TREE_CODE (ref) == ARRAY_REF)
>>>>> +      {
>>>>> +	reftype = TREE_TYPE (ref);
>>>>> +	ref = TREE_OPERAND (ref, 0);
>>>>> +      }
>>>>> +
>>>>> +  if (TREE_CODE (ref) == COMPONENT_REF)
>>>>> +    reftype = TREE_TYPE (ref);
>>>>> +
>>>>> +  while (TREE_CODE (reftype) == ARRAY_TYPE)
>>>>> +    {
>>>>> +      tree next = TREE_TYPE (reftype);
>>>>> +      if (TREE_CODE (next) == INTEGER_TYPE)
>>>>> +	{
>>>>> +	  if (tree size = TYPE_SIZE_UNIT (reftype))
>>>>> +	    if (tree_fits_uhwi_p (size))
>>>>> +	      array_elts = tree_to_uhwi (size);
>>>>
>>>> so array_elts is measued in bytes.
>>> So I'm guessing your comment about the code looking way too complicated
>>> is referring to the code *after* your comment.  That code is not in the
>>> v2 patch.  At least not 01/06 which addresses the codegen/opt issues.  I
>>> haven't checked the full kit to see if this code appears in a subsequent
>>> patch.
>>>
>>
>> No I meant the code between
>> /* When ARG refers to an aggregate (of arrays) try to determine
>>      the size of the character array within the aggregate.  */
>>
>> and here.  It is wrong (assuming C semantic on GIMPLE).
> Ah.  Yea.  I think we're broadly in agreement we can't do that for
> anything which affects optimization/codegen.  Given those bits aren't in
> Martin's v2 patch I think we can move on.
> 
> 

Yes, with the new semantics of STRING_CST we can guarantee the
STRING_CST have a _valid_ TYPE_UNIT_SIZE, even in cases of flexible array members
where previously the TYPE_UNIT_SIZE was NULL.  That makes a big difference here.


So I would wait with this topic until we have the semantic issues resolved.



>>
>>>
>>>>
>>>>> +	  break;
>>>>> +	}
>>>>> +
>>>>> +      reftype = TREE_TYPE (reftype);
>>>>> +    }
>>>>> +
>>>>> +  if (decl)
>>>>> +    *decl = array;
>>>>> +
>>>>>      /* Avoid returning a string that doesn't fit in the array
>>>>>         it is stored in, like
>>>>>         const char a[4] = "abcde";
>>>>> @@ -11427,7 +11478,9 @@ string_constant (tree arg, tree *ptr_offset)
>>>>>      unsigned HOST_WIDE_INT length = TREE_STRING_LENGTH (init);
>>>>>      length = string_length (TREE_STRING_POINTER (init), charsize,
>>>>>    			  length / charsize);
>>>>
>>>> Some callers especially those where the wrong code happens, expect
>>>> to be able to access the STRING_CST up to TREE_STRING_LENGTH,
>>>> But using string_length assume thy stop at the first nul char.
>>> Example please?
>>>
>>
>> tree-ssa-forwprop.c:
>>                 str1 = string_constant (src1, &off1);
>>                 if (str1 == NULL_TREE)
>>                   break;
>>                 if (!tree_fits_uhwi_p (off1)
>>                     || compare_tree_int (off1, TREE_STRING_LENGTH (str1) - 1) > 0
>>                     || compare_tree_int (len1, TREE_STRING_LENGTH (str1)
>>                                                - tree_to_uhwi (off1)) > 0
> I understand that you're concerned about reading past the end of the
> returned STRING_CST via the subsequent memcpy in tree-ssa-forwprop.c
> which starts at src1+off1 and reads len1 bytes.
> 
> But I'm not sure how that can happen in the V2 patch from Martin.  In
> fact, one of the fundamental goals of the V2 patch is to avoid this
> exact problem.
> 

here I quote form martins 1/6 patch:
> +  /* Compute the lower bound number of elements (not bytes) in the array
> +     that the string is used to initialize.  The actual size of the array
> +     will be may be greater if the string is shorter, but the important
> +     data point is whether the literal, including the terminating nul,
> +     fits in the array. */
> +  unsigned HOST_WIDE_INT array_elts
> +    = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (init))) / charsize;
> +
> +  /* Compute the string length in (wide) characters.  */

So prior to my STRING_CST patch this will ICE on a flexible array member,
because those have TYPE_SIZE_UNIT = NULL, and tree_to_uhwi will ICE.

I used:

  compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (init)),
                    TREE_STRING_LENGTH (init))
  
and this will not ICE with NULL, but consider it like infinity,
and return 1.

So my version did not ICE in that case.

Once we have the new STRING_CST semantics in place it will make
this defined, but it is at the same time completely unnecessary.


Bernd.
Jeff Law Aug. 24, 2018, 11:54 p.m. UTC | #15
On 08/24/2018 11:26 AM, Bernd Edlinger wrote:
> On 08/24/18 18:51, Jeff Law wrote:
>>> Well, this is broken for wide character strings.
>>> but I hope we can get rid of STRING_CST which are
>>> not explicitly null terminated.
> 
> I am afraid that is not going to happen.
> Maybe we can get STRING_CST that are never longer
> than the TYPE_UNIT_SIZE, but c_strlen and c_getstr
> need to take care that the string is zero-terminated.
> 
> string_constant, should not promise the string is zero terminated.
> But instead it can promise that:
> 1) the STRING_CST is valid up to TREE_STRING_LENGTH
> 2) mem_size is >= TREE_STRING_LENGTH
> 3) memory between TREE_STRING_LENGTH and mem_size is ZERO.
> 
> It will not guarantee anything about zero termination any more.
Interesting because those conditions would be sufficient to deal with a
regression I stumbled over after fixing Martin's patch to not assume
that all STRING_CSTs are NUL terminated.

But I need to think about this a bit more.  Essentially the question
we'd need to ask is whether or not these are sufficient in general or
just in specific cases.

I tend to think they're not sufficient in general. If a string returned
by string_constant that didn't have a terminating NUL, but which did
pass the tests above were ultimately passed to the runtime's str*
routines, then the call may run off the end of the string.  We'd like to
be able to warn for that.

So ISTM those rules are only valid in contexts where we know the result
isn't going to be passed to str* and friends within the C library.

I do think they're sufficient to avoid problems with the
tree-ssa-forwprop code we've looked at.  So what may make the most sense
is to have that routine indicate it's willing to accept unterminated
strings, then check the conditions above before optimizing the code.

> 
> In the end, the best approach might be to either merge my patch
> with Martins, or step-wise, first fixing wrong code, and then
> implementing warnings without fixing wrong code.
Unsure at this time.  I've been working with both.  I suspect that if we
went with yours that we'd then turn around and layer Martin's on top of
it because of the desire to signal to callers that we have an
unterminated string and have the callers take appropriate action.  Which
begs the question of whether or not we just go with Martin's -- ie, is
there really any value in using both.  I haven't seen indications there
is value in that approach, but I'm still poking at things.

Jeff
Bernd Edlinger Aug. 25, 2018, 6:32 a.m. UTC | #16
On 08/25/18 01:54, Jeff Law wrote:
> On 08/24/2018 11:26 AM, Bernd Edlinger wrote:
>> On 08/24/18 18:51, Jeff Law wrote:
>>>> Well, this is broken for wide character strings.
>>>> but I hope we can get rid of STRING_CST which are
>>>> not explicitly null terminated.
>>
>> I am afraid that is not going to happen.
>> Maybe we can get STRING_CST that are never longer
>> than the TYPE_UNIT_SIZE, but c_strlen and c_getstr
>> need to take care that the string is zero-terminated.
>>
>> string_constant, should not promise the string is zero terminated.
>> But instead it can promise that:
>> 1) the STRING_CST is valid up to TREE_STRING_LENGTH
>> 2) mem_size is >= TREE_STRING_LENGTH
>> 3) memory between TREE_STRING_LENGTH and mem_size is ZERO.
>>
>> It will not guarantee anything about zero termination any more.
> Interesting because those conditions would be sufficient to deal with a
> regression I stumbled over after fixing Martin's patch to not assume
> that all STRING_CSTs are NUL terminated.
> 
> But I need to think about this a bit more.  Essentially the question
> we'd need to ask is whether or not these are sufficient in general or
> just in specific cases.
> 
> I tend to think they're not sufficient in general. If a string returned
> by string_constant that didn't have a terminating NUL, but which did
> pass the tests above were ultimately passed to the runtime's str*
> routines, then the call may run off the end of the string.  We'd like to
> be able to warn for that.
> 
> So ISTM those rules are only valid in contexts where we know the result
> isn't going to be passed to str* and friends within the C library.
> 
> I do think they're sufficient to avoid problems with the
> tree-ssa-forwprop code we've looked at.  So what may make the most sense
> is to have that routine indicate it's willing to accept unterminated
> strings, then check the conditions above before optimizing the code.
> 

There are not too many callers of string_constant.
Not all need zero termination.

But I think if the are interested in zero-termination
they should simply call c_strlen or c_getstr.


>>
>> In the end, the best approach might be to either merge my patch
>> with Martins, or step-wise, first fixing wrong code, and then
>> implementing warnings without fixing wrong code.
> Unsure at this time.  I've been working with both.  I suspect that if we
> went with yours that we'd then turn around and layer Martin's on top of
> it because of the desire to signal to callers that we have an
> unterminated string and have the callers take appropriate action.  Which
> begs the question of whether or not we just go with Martin's -- ie, is
> there really any value in using both.  I haven't seen indications there
> is value in that approach, but I'm still poking at things.
> 

Well, ya call it "layer one patch over the other"
I call it "incremental improvements".


Bernd.
Jeff Law Aug. 25, 2018, 5:32 p.m. UTC | #17
On 08/25/2018 12:32 AM, Bernd Edlinger wrote:
> On 08/25/18 01:54, Jeff Law wrote:
>> On 08/24/2018 11:26 AM, Bernd Edlinger wrote:
>>> On 08/24/18 18:51, Jeff Law wrote:
>>>>> Well, this is broken for wide character strings.
>>>>> but I hope we can get rid of STRING_CST which are
>>>>> not explicitly null terminated.
>>>
>>> I am afraid that is not going to happen.
>>> Maybe we can get STRING_CST that are never longer
>>> than the TYPE_UNIT_SIZE, but c_strlen and c_getstr
>>> need to take care that the string is zero-terminated.
>>>
>>> string_constant, should not promise the string is zero terminated.
>>> But instead it can promise that:
>>> 1) the STRING_CST is valid up to TREE_STRING_LENGTH
>>> 2) mem_size is >= TREE_STRING_LENGTH
>>> 3) memory between TREE_STRING_LENGTH and mem_size is ZERO.
>>>
>>> It will not guarantee anything about zero termination any more.
>> Interesting because those conditions would be sufficient to deal with a
>> regression I stumbled over after fixing Martin's patch to not assume
>> that all STRING_CSTs are NUL terminated.
>>
>> But I need to think about this a bit more.  Essentially the question
>> we'd need to ask is whether or not these are sufficient in general or
>> just in specific cases.
>>
>> I tend to think they're not sufficient in general. If a string returned
>> by string_constant that didn't have a terminating NUL, but which did
>> pass the tests above were ultimately passed to the runtime's str*
>> routines, then the call may run off the end of the string.  We'd like to
>> be able to warn for that.
>>
>> So ISTM those rules are only valid in contexts where we know the result
>> isn't going to be passed to str* and friends within the C library.
>>
>> I do think they're sufficient to avoid problems with the
>> tree-ssa-forwprop code we've looked at.  So what may make the most sense
>> is to have that routine indicate it's willing to accept unterminated
>> strings, then check the conditions above before optimizing the code.
>>
> 
> There are not too many callers of string_constant.
> Not all need zero termination.
Right.  And in retrospect we probably should have avoided default
parameter overloads and just fixed the callers.  But that can be a
follow-up.

> 
> But I think if the are interested in zero-termination
> they should simply call c_strlen or c_getstr.
Perhaps.


> 
> 
>>>
>>> In the end, the best approach might be to either merge my patch
>>> with Martins, or step-wise, first fixing wrong code, and then
>>> implementing warnings without fixing wrong code.
>> Unsure at this time.  I've been working with both.  I suspect that if we
>> went with yours that we'd then turn around and layer Martin's on top of
>> it because of the desire to signal to callers that we have an
>> unterminated string and have the callers take appropriate action.  Which
>> begs the question of whether or not we just go with Martin's -- ie, is
>> there really any value in using both.  I haven't seen indications there
>> is value in that approach, but I'm still poking at things.
>>
> 
> Well, ya call it "layer one patch over the other"
> I call it "incremental improvements".
It is (of course) a case by case basis.  The way I try to look at these
things is to ask whether or not the first patch under consideration
would have any value/purpose after the second patch was installed.  If
so, then it may make sense to include both.  If not, then we really just
want one patch.

Jeff
Bernd Edlinger Aug. 25, 2018, 6:36 p.m. UTC | #18
On 08/25/18 19:32, Jeff Law wrote:
> On 08/25/2018 12:32 AM, Bernd Edlinger wrote:
>> On 08/25/18 01:54, Jeff Law wrote:
>>> On 08/24/2018 11:26 AM, Bernd Edlinger wrote:
>>>> On 08/24/18 18:51, Jeff Law wrote:
>>>>>> Well, this is broken for wide character strings.
>>>>>> but I hope we can get rid of STRING_CST which are
>>>>>> not explicitly null terminated.
>>>>
>>>> I am afraid that is not going to happen.
>>>> Maybe we can get STRING_CST that are never longer
>>>> than the TYPE_UNIT_SIZE, but c_strlen and c_getstr
>>>> need to take care that the string is zero-terminated.
>>>>
>>>> string_constant, should not promise the string is zero terminated.
>>>> But instead it can promise that:
>>>> 1) the STRING_CST is valid up to TREE_STRING_LENGTH
>>>> 2) mem_size is >= TREE_STRING_LENGTH
>>>> 3) memory between TREE_STRING_LENGTH and mem_size is ZERO.
>>>>
>>>> It will not guarantee anything about zero termination any more.
>>> Interesting because those conditions would be sufficient to deal with a
>>> regression I stumbled over after fixing Martin's patch to not assume
>>> that all STRING_CSTs are NUL terminated.
>>>
>>> But I need to think about this a bit more.  Essentially the question
>>> we'd need to ask is whether or not these are sufficient in general or
>>> just in specific cases.
>>>
>>> I tend to think they're not sufficient in general. If a string returned
>>> by string_constant that didn't have a terminating NUL, but which did
>>> pass the tests above were ultimately passed to the runtime's str*
>>> routines, then the call may run off the end of the string.  We'd like to
>>> be able to warn for that.
>>>
>>> So ISTM those rules are only valid in contexts where we know the result
>>> isn't going to be passed to str* and friends within the C library.
>>>
>>> I do think they're sufficient to avoid problems with the
>>> tree-ssa-forwprop code we've looked at.  So what may make the most sense
>>> is to have that routine indicate it's willing to accept unterminated
>>> strings, then check the conditions above before optimizing the code.
>>>
>>
>> There are not too many callers of string_constant.
>> Not all need zero termination.
> Right.  And in retrospect we probably should have avoided default
> parameter overloads and just fixed the callers.  But that can be a
> follow-up.
> 
>>
>> But I think if the are interested in zero-termination
>> they should simply call c_strlen or c_getstr.
> Perhaps.
> 
> 
>>
>>
>>>>
>>>> In the end, the best approach might be to either merge my patch
>>>> with Martins, or step-wise, first fixing wrong code, and then
>>>> implementing warnings without fixing wrong code.
>>> Unsure at this time.  I've been working with both.  I suspect that if we
>>> went with yours that we'd then turn around and layer Martin's on top of
>>> it because of the desire to signal to callers that we have an
>>> unterminated string and have the callers take appropriate action.  Which
>>> begs the question of whether or not we just go with Martin's -- ie, is
>>> there really any value in using both.  I haven't seen indications there
>>> is value in that approach, but I'm still poking at things.
>>>
>>
>> Well, ya call it "layer one patch over the other"
>> I call it "incremental improvements".
> It is (of course) a case by case basis.  The way I try to look at these
> things is to ask whether or not the first patch under consideration
> would have any value/purpose after the second patch was installed.  If
> so, then it may make sense to include both.  If not, then we really just
> want one patch.
> 

Agreed.  I think the question is which of the possible STRING_CST
semantics we want to have in the end (the middle-end).
Everything builds on top of the semantic properties of STRING_CSTs.

My first attempt of fix the STRING_CST semantic was trying to make
string_constant happy.

My second attempt is trying to make Richard happy.  And when I look
at both patches, I think the second one is better, and more simple.


BTW I need to correct on statement in my last e-mail:

On 08/24/18 23:55, Bernd Edlinger wrote:>
> here I quote form martins 1/6 patch:
>> +  /* Compute the lower bound number of elements (not bytes) in the array
>> +     that the string is used to initialize.  The actual size of the array
>> +     will be may be greater if the string is shorter, but the important
>> +     data point is whether the literal, including the terminating nul,
>> +     fits in the array. */
>> +  unsigned HOST_WIDE_INT array_elts
>> +    = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (init))) / charsize;
>> +
>> +  /* Compute the string length in (wide) characters.  */
> 
> So prior to my STRING_CST patch this will ICE on a flexible array member,
> because those have TYPE_SIZE_UNIT = NULL, and tree_to_uhwi will ICE.
> 
> I used:
> 
>   compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (init)),
>                     TREE_STRING_LENGTH (init))
> 
> and this will not ICE with NULL, but consider it like infinity,
> and return 1.
> 
> So my version did not ICE in that case.
> 

Oooops,

actually both versions will likely ICE on TYPE_SIZE_UNIT == NULL.
I actually tried to test that case, but have done something wrong.

I hope we can get rid if the incomplete types in the middle-end.

So maybe just inject "if (!tree_fits_uhwi_p (...)) return NULL_TREE;"
here?   Or maybe just defer until we have clarity about the semantics.


Bernd.
Jeff Law Aug. 25, 2018, 7:02 p.m. UTC | #19
On 08/25/2018 12:36 PM, Bernd Edlinger wrote:

>>>>
>>>
>>> Well, ya call it "layer one patch over the other"
>>> I call it "incremental improvements".
>> It is (of course) a case by case basis.  The way I try to look at these
>> things is to ask whether or not the first patch under consideration
>> would have any value/purpose after the second patch was installed.  If
>> so, then it may make sense to include both.  If not, then we really just
>> want one patch.
>>
> 
> Agreed.  I think the question is which of the possible STRING_CST
> semantics we want to have in the end (the middle-end).
> Everything builds on top of the semantic properties of STRING_CSTs.
This certainly plays a role.  I bumped pretty hard against the
STRING_CST semantics issue with Martin's patch.  I'm hoping that making
those more consistent will ultimately simplify things and avoid the
problems I'm stumbling over.

Of course, that means more delays in getting this sorted out.  I really
thought I had a viable plan a couple days ago, but I'm having to rethink
in light of some of the issues raised.


> 
> My first attempt of fix the STRING_CST semantic was trying to make
> string_constant happy.
> 
> My second attempt is trying to make Richard happy.  And when I look
> at both patches, I think the second one is better, and more simple.
In general I've found that Richie's advice generally results in a
cleaner implementation ;-)
> 
> 
> BTW I need to correct on statement in my last e-mail:
> 
> On 08/24/18 23:55, Bernd Edlinger wrote:>
>> here I quote form martins 1/6 patch:
>>> +  /* Compute the lower bound number of elements (not bytes) in the array
>>> +     that the string is used to initialize.  The actual size of the array
>>> +     will be may be greater if the string is shorter, but the important
>>> +     data point is whether the literal, including the terminating nul,
>>> +     fits in the array. */
>>> +  unsigned HOST_WIDE_INT array_elts
>>> +    = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (init))) / charsize;
>>> +
>>> +  /* Compute the string length in (wide) characters.  */
>>
>> So prior to my STRING_CST patch this will ICE on a flexible array member,
>> because those have TYPE_SIZE_UNIT = NULL, and tree_to_uhwi will ICE.
>>
>> I used:
>>
>>   compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (init)),
>>                     TREE_STRING_LENGTH (init))
>>
>> and this will not ICE with NULL, but consider it like infinity,
>> and return 1.
>>
>> So my version did not ICE in that case.
>>
> 
> Oooops,
> 
> actually both versions will likely ICE on TYPE_SIZE_UNIT == NULL.
> I actually tried to test that case, but have done something wrong.
> 
> I hope we can get rid if the incomplete types in the middle-end.
> 
> So maybe just inject "if (!tree_fits_uhwi_p (...)) return NULL_TREE;"
> here?   Or maybe just defer until we have clarity about the semantics.
Not sure.  I've tried largely to not let VLA issues drive anything here.
 I'm not a fan of them for a variety of reasons and thus I tend to look
at all the VLA stuff as exceptional cases.

jeff
Bernd Edlinger Aug. 25, 2018, 7:32 p.m. UTC | #20
On 08/25/18 21:02, Jeff Law wrote:
> On 08/25/2018 12:36 PM, Bernd Edlinger wrote:
> 
>>>>>
>>>>
>>>> Well, ya call it "layer one patch over the other"
>>>> I call it "incremental improvements".
>>> It is (of course) a case by case basis.  The way I try to look at these
>>> things is to ask whether or not the first patch under consideration
>>> would have any value/purpose after the second patch was installed.  If
>>> so, then it may make sense to include both.  If not, then we really just
>>> want one patch.
>>>
>>
>> Agreed.  I think the question is which of the possible STRING_CST
>> semantics we want to have in the end (the middle-end).
>> Everything builds on top of the semantic properties of STRING_CSTs.
> This certainly plays a role.  I bumped pretty hard against the
> STRING_CST semantics issue with Martin's patch.  I'm hoping that making
> those more consistent will ultimately simplify things and avoid the
> problems I'm stumbling over.
> 
> Of course, that means more delays in getting this sorted out.  I really
> thought I had a viable plan a couple days ago, but I'm having to rethink
> in light of some of the issues raised.
> 

I think we should slow down.

> 
>>
>> My first attempt of fix the STRING_CST semantic was trying to make
>> string_constant happy.
>>
>> My second attempt is trying to make Richard happy.  And when I look
>> at both patches, I think the second one is better, and more simple.
> In general I've found that Richie's advice generally results in a
> cleaner implementation ;-)
>>
>>
>> BTW I need to correct on statement in my last e-mail:
>>
>> On 08/24/18 23:55, Bernd Edlinger wrote:>
>>> here I quote form martins 1/6 patch:
>>>> +  /* Compute the lower bound number of elements (not bytes) in the array
>>>> +     that the string is used to initialize.  The actual size of the array
>>>> +     will be may be greater if the string is shorter, but the important
>>>> +     data point is whether the literal, including the terminating nul,
>>>> +     fits in the array. */
>>>> +  unsigned HOST_WIDE_INT array_elts
>>>> +    = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (init))) / charsize;
>>>> +
>>>> +  /* Compute the string length in (wide) characters.  */
>>>
>>> So prior to my STRING_CST patch this will ICE on a flexible array member,
>>> because those have TYPE_SIZE_UNIT = NULL, and tree_to_uhwi will ICE.
>>>
>>> I used:
>>>
>>>    compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (init)),
>>>                      TREE_STRING_LENGTH (init))
>>>
>>> and this will not ICE with NULL, but consider it like infinity,
>>> and return 1.
>>>
>>> So my version did not ICE in that case.
>>>
>>
>> Oooops,
>>
>> actually both versions will likely ICE on TYPE_SIZE_UNIT == NULL.
>> I actually tried to test that case, but have done something wrong.
>>
>> I hope we can get rid if the incomplete types in the middle-end.
>>
>> So maybe just inject "if (!tree_fits_uhwi_p (...)) return NULL_TREE;"
>> here?   Or maybe just defer until we have clarity about the semantics.
> Not sure.  I've tried largely to not let VLA issues drive anything here.
>   I'm not a fan of them for a variety of reasons and thus I tend to look
> at all the VLA stuff as exceptional cases.
> 

We should have a test case with flexible array members, those behave
slightly different than VLAs.

struct {
   int i;
   char x[];
} s;

const struct s s = { 1, "test" };

int f()
{
   return strlen(s.x);
}


By the way, when you change so much on Martin's patch it might be
good to post it again on this list, when it's ready, so maybe I can have a
look at it and help out with some review comments?  (please put me on CC)


Bernd.
Martin Sebor Aug. 25, 2018, 8:42 p.m. UTC | #21
On 08/25/2018 01:32 PM, Bernd Edlinger wrote:
> On 08/25/18 21:02, Jeff Law wrote:
>> On 08/25/2018 12:36 PM, Bernd Edlinger wrote:
>>
>>>>>>
>>>>>
>>>>> Well, ya call it "layer one patch over the other"
>>>>> I call it "incremental improvements".
>>>> It is (of course) a case by case basis.  The way I try to look at these
>>>> things is to ask whether or not the first patch under consideration
>>>> would have any value/purpose after the second patch was installed.  If
>>>> so, then it may make sense to include both.  If not, then we really just
>>>> want one patch.
>>>>
>>>
>>> Agreed.  I think the question is which of the possible STRING_CST
>>> semantics we want to have in the end (the middle-end).
>>> Everything builds on top of the semantic properties of STRING_CSTs.
>> This certainly plays a role.  I bumped pretty hard against the
>> STRING_CST semantics issue with Martin's patch.  I'm hoping that making
>> those more consistent will ultimately simplify things and avoid the
>> problems I'm stumbling over.
>>
>> Of course, that means more delays in getting this sorted out.  I really
>> thought I had a viable plan a couple days ago, but I'm having to rethink
>> in light of some of the issues raised.
>>
>
> I think we should slow down.

That's coming from someone whose been piling on revisions upon
revisions of your own work here as you change your mind about
whether STRING_CST should or shouldn't have a nul byte at
the end.  You've been doing nothing but slowing us down for
the last five weeks.

There's nothing in this basic patch that cannot be easily adjusted
if something changes in the future.  But there is quite a bit of
useful work already that builds on the basic infrastructure in it
that could move forward and that wouldn't be significantly affected
by changes to the underlying representation.

Martin
Jeff Law Aug. 25, 2018, 11:22 p.m. UTC | #22
On 08/25/2018 01:32 PM, Bernd Edlinger wrote:
> On 08/25/18 21:02, Jeff Law wrote:
>> On 08/25/2018 12:36 PM, Bernd Edlinger wrote:
>>
>>>>>>
>>>>>
>>>>> Well, ya call it "layer one patch over the other"
>>>>> I call it "incremental improvements".
>>>> It is (of course) a case by case basis.  The way I try to look at these
>>>> things is to ask whether or not the first patch under consideration
>>>> would have any value/purpose after the second patch was installed.  If
>>>> so, then it may make sense to include both.  If not, then we really just
>>>> want one patch.
>>>>
>>>
>>> Agreed.  I think the question is which of the possible STRING_CST
>>> semantics we want to have in the end (the middle-end).
>>> Everything builds on top of the semantic properties of STRING_CSTs.
>> This certainly plays a role.  I bumped pretty hard against the
>> STRING_CST semantics issue with Martin's patch.  I'm hoping that making
>> those more consistent will ultimately simplify things and avoid the
>> problems I'm stumbling over.
>>
>> Of course, that means more delays in getting this sorted out.  I really
>> thought I had a viable plan a couple days ago, but I'm having to rethink
>> in light of some of the issues raised.
>>
> 
> I think we should slow down.
> 
>>
>>>
>>> My first attempt of fix the STRING_CST semantic was trying to make
>>> string_constant happy.
>>>
>>> My second attempt is trying to make Richard happy.  And when I look
>>> at both patches, I think the second one is better, and more simple.
>> In general I've found that Richie's advice generally results in a
>> cleaner implementation ;-)
>>>
>>>
>>> BTW I need to correct on statement in my last e-mail:
>>>
>>> On 08/24/18 23:55, Bernd Edlinger wrote:>
>>>> here I quote form martins 1/6 patch:
>>>>> +  /* Compute the lower bound number of elements (not bytes) in the array
>>>>> +     that the string is used to initialize.  The actual size of the array
>>>>> +     will be may be greater if the string is shorter, but the important
>>>>> +     data point is whether the literal, including the terminating nul,
>>>>> +     fits in the array. */
>>>>> +  unsigned HOST_WIDE_INT array_elts
>>>>> +    = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (init))) / charsize;
>>>>> +
>>>>> +  /* Compute the string length in (wide) characters.  */
>>>>
>>>> So prior to my STRING_CST patch this will ICE on a flexible array member,
>>>> because those have TYPE_SIZE_UNIT = NULL, and tree_to_uhwi will ICE.
>>>>
>>>> I used:
>>>>
>>>>    compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (init)),
>>>>                      TREE_STRING_LENGTH (init))
>>>>
>>>> and this will not ICE with NULL, but consider it like infinity,
>>>> and return 1.
>>>>
>>>> So my version did not ICE in that case.
>>>>
>>>
>>> Oooops,
>>>
>>> actually both versions will likely ICE on TYPE_SIZE_UNIT == NULL.
>>> I actually tried to test that case, but have done something wrong.
>>>
>>> I hope we can get rid if the incomplete types in the middle-end.
>>>
>>> So maybe just inject "if (!tree_fits_uhwi_p (...)) return NULL_TREE;"
>>> here?   Or maybe just defer until we have clarity about the semantics.
>> Not sure.  I've tried largely to not let VLA issues drive anything here.
>>   I'm not a fan of them for a variety of reasons and thus I tend to look
>> at all the VLA stuff as exceptional cases.
>>
> 
> We should have a test case with flexible array members, those behave
> slightly different than VLAs.
> 
> struct {
>    int i;
>    char x[];
> } s;
> 
> const struct s s = { 1, "test" };
> 
> int f()
> {
>    return strlen(s.x);
> }
> 
> 
> By the way, when you change so much on Martin's patch it might be
> good to post it again on this list, when it's ready, so maybe I can have a
> look at it and help out with some review comments?  (please put me on CC)
Actually very little changed other than mechanical stuff.  It's one
chunk of code that assumed STRING_CSTs are always NUL terminated that's
problem with that code in isolation.

But I also have to evaluate your patches in the same area and also look
at the known follow-ups to see how the various patches are likely to
interact.

And each time core assumptions/issues are reexamined parts of the
evaluation have to be redone.  In some cases they may simplify, in
others they make the analysis more complex.

Jeff
> 
> 
> Bernd.
>
Bernd Edlinger Aug. 26, 2018, 10:20 a.m. UTC | #23
On 08/25/18 22:42, Martin Sebor wrote:
> On 08/25/2018 01:32 PM, Bernd Edlinger wrote:
>> On 08/25/18 21:02, Jeff Law wrote:
>>> On 08/25/2018 12:36 PM, Bernd Edlinger wrote:
>>>
>>>>>>>
>>>>>>
>>>>>> Well, ya call it "layer one patch over the other"
>>>>>> I call it "incremental improvements".
>>>>> It is (of course) a case by case basis.  The way I try to look at these
>>>>> things is to ask whether or not the first patch under consideration
>>>>> would have any value/purpose after the second patch was installed.  If
>>>>> so, then it may make sense to include both.  If not, then we really just
>>>>> want one patch.
>>>>>
>>>>
>>>> Agreed.  I think the question is which of the possible STRING_CST
>>>> semantics we want to have in the end (the middle-end).
>>>> Everything builds on top of the semantic properties of STRING_CSTs.
>>> This certainly plays a role.  I bumped pretty hard against the
>>> STRING_CST semantics issue with Martin's patch.  I'm hoping that making
>>> those more consistent will ultimately simplify things and avoid the
>>> problems I'm stumbling over.
>>>
>>> Of course, that means more delays in getting this sorted out.  I really
>>> thought I had a viable plan a couple days ago, but I'm having to rethink
>>> in light of some of the issues raised.
>>>
>>
>> I think we should slow down.
> 
> That's coming from someone whose been piling on revisions upon
> revisions of your own work here as you change your mind about
> whether STRING_CST should or shouldn't have a nul byte at
> the end.  You've been doing nothing but slowing us down for
> the last five weeks.
> 

Yes, you know, I may be stubborn, but my point of view regarding
these matters is not set in stone.

And it happens that I listen to what others say, including you.

And if after careful consideration I find myself in agreement with
what was said, I just take the freedom to change my point of view.

I for one have learned a lot out of this discussion and out of
writing and repeatedly revising my patches.

I truly hope others have as well gained some new insights.


Thanks
Bernd.
Jeff Law Aug. 29, 2018, 5:17 p.m. UTC | #24
On 08/02/2018 12:56 PM, Bernd Edlinger wrote:
> On 08/02/18 15:26, Bernd Edlinger wrote:
>>>
>>>    /* If the length can be computed at compile-time, return it.  */
>>> -  len = c_strlen (src, 0);
>>> +  tree array;
>>> +  tree len = c_strlen (src, 0, &array);
>>
>> You know the c_strlen tries to compute wide character sizes,
>> but strlen does not do that, strlen (L"abc") should give 1
>> (or 0 on a BE machine)
>> I wonder if that is correct.
>>
> [snip]
>>>
>>>  static tree
>>> -fold_builtin_strlen (location_t loc, tree type, tree arg)
>>> +fold_builtin_strlen (location_t loc, tree fndecl, tree type, tree arg)
>>>  {
>>>    if (!validate_arg (arg, POINTER_TYPE))
>>>      return NULL_TREE;
>>>    else
>>>      {
>>> -      tree len = c_strlen (arg, 0);
>>> -
>>> +      tree arr = NULL_TREE;
>>> +      tree len = c_strlen (arg, 0, &arr);
>>
>> Is it possible to write a test case where strlen(L"test") reaches this point?
>> what will c_strlen return then?
>>
> 
> Yes, of course it is:
> 
> $ cat y.c
> int f(char *x)
> {
>    return __builtin_strlen(x);
> }
> 
> int main ()
> {
>    return f((char*)&L"abcdef"[0]);
> }
FWIW, I've twiddled this a bit and included it in Martin's patch for
86711/86714.  THe proper return value is 0 or 1 depending on endianness.


Jeff
diff mbox series

Patch

PR tree-optimization/86714 - tree-ssa-forwprop.c confused by too long initializer
PR tree-optimization/86711 - wrong folding of memchr
PR tree-optimization/86552 - missing warning for reading past the end of non-string arrays

gcc/ChangeLog:

	PR tree-optimization/86714
	PR tree-optimization/86711
	PR tree-optimization/86552
	* builtins.h (warn_string_no_nul): Declare..
	(c_strlen): Add argument.
	* builtins.c (warn_string_no_nul): New function.
	(fold_builtin_strlen): Add argument.  Detect missing nul.
	(fold_builtin_1): Adjust.
	(string_length): Add argument and use it.
	(c_strlen): Same.
	(expand_builtin_strlen): Detect missing nul.
	* expr.c (string_constant): Add arguments.  Detect missing nul
	terminator and outermost declaration it's missing in.
	* expr.h (string_constant): Add argument.
	* fold-const.c (c_getstr): Change argument to bool*, rename
	other arguments.
	* fold-const-call.c (fold_const_call): Detect missing nul.
	* gimple-fold.c (get_range_strlen): Add argument.
	(get_maxval_strlen): Adjust.
	* gimple-fold.h (get_range_strlen): Add argument.

gcc/testsuite/ChangeLog:

	PR tree-optimization/86714
	PR tree-optimization/86711
	PR tree-optimization/86552
	* gcc.c-torture/execute/memchr-1.c: New test.
	* gcc.c-torture/execute/pr86714.c: New test.
	* gcc.dg/warn-strlen-no-nul.c: New test.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index aa3e0d8..f4924d5 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -150,7 +150,7 @@  static tree stabilize_va_list_loc (location_t, tree, int);
 static rtx expand_builtin_expect (tree, rtx);
 static tree fold_builtin_constant_p (tree);
 static tree fold_builtin_classify_type (tree);
-static tree fold_builtin_strlen (location_t, tree, tree);
+static tree fold_builtin_strlen (location_t, tree, tree, tree);
 static tree fold_builtin_inf (location_t, tree, int);
 static tree rewrite_call_expr (location_t, tree, int, tree, int, ...);
 static bool validate_arg (const_tree, enum tree_code code);
@@ -550,6 +550,36 @@  string_length (const void *ptr, unsigned eltsize, unsigned maxelts)
   return n;
 }
 
+/* For a call expression EXP to a function that expects a string argument,
+   issue a diagnostic due to it being a called with an argument NONSTR
+   that is a character array with no terminating NUL.  */
+
+void
+warn_string_no_nul (location_t loc, tree exp, tree fndecl, tree nonstr)
+{
+  loc = expansion_point_location_if_in_system_header (loc);
+
+  bool warned;
+  if (exp)
+    {
+      if (!fndecl)
+	fndecl = get_callee_fndecl (exp);
+      warned = warning_at (loc, OPT_Wstringop_overflow_,
+			   "%K%qD argument missing terminating nul",
+			   exp, fndecl);
+    }
+  else
+    {
+      gcc_assert (fndecl);
+      warned = warning_at (loc, OPT_Wstringop_overflow_,
+			   "%qD argument missing terminating nul",
+			   fndecl);
+    }
+
+  if (warned && DECL_P (nonstr))
+    inform (DECL_SOURCE_LOCATION (nonstr), "referenced argument declared here");
+}
+
 /* Compute the length of a null-terminated character string or wide
    character string handling character sizes of 1, 2, and 4 bytes.
    TREE_STRING_LENGTH is not the right way because it evaluates to
@@ -567,37 +597,60 @@  string_length (const void *ptr, unsigned eltsize, unsigned maxelts)
    accesses.  Note that this implies the result is not going to be emitted
    into the instruction stream.
 
+   When ARR is non-null and the string is not properly nul-terminated,
+   set *ARR to the declaration of the outermost constant object whose
+   initializer (or one of its elements) is not nul-terminated.
+
    The value returned is of type `ssizetype'.
 
    Unfortunately, string_constant can't access the values of const char
    arrays with initializers, so neither can we do so here.  */
 
 tree
-c_strlen (tree src, int only_value)
+c_strlen (tree src, int only_value, tree *arr /* = NULL */)
 {
   STRIP_NOPS (src);
+
+  /* Used to detect non-nul-terminated strings in subexpressions
+     of a conditional expression.  When ARR is null, point it at
+     one of the elements for simplicity.  */
+  tree arrs[] = { NULL_TREE, NULL_TREE };
+  if (!arr)
+    arr = arrs;
+
   if (TREE_CODE (src) == COND_EXPR
       && (only_value || !TREE_SIDE_EFFECTS (TREE_OPERAND (src, 0))))
     {
-      tree len1, len2;
-
-      len1 = c_strlen (TREE_OPERAND (src, 1), only_value);
-      len2 = c_strlen (TREE_OPERAND (src, 2), only_value);
+      tree len1 = c_strlen (TREE_OPERAND (src, 1), only_value, arrs);
+      tree len2 = c_strlen (TREE_OPERAND (src, 2), only_value, arrs + 1);
       if (tree_int_cst_equal (len1, len2))
-	return len1;
+	{
+	  *arr = arrs[0] ? arrs[0] : arrs[1];
+	  return len1;
+	}
     }
 
   if (TREE_CODE (src) == COMPOUND_EXPR
       && (only_value || !TREE_SIDE_EFFECTS (TREE_OPERAND (src, 0))))
-    return c_strlen (TREE_OPERAND (src, 1), only_value);
+    return c_strlen (TREE_OPERAND (src, 1), only_value, arr);
 
   location_t loc = EXPR_LOC_OR_LOC (src, input_location);
 
   /* Offset from the beginning of the string in bytes.  */
   tree byteoff;
-  src = string_constant (src, &byteoff);
-  if (src == 0)
-    return NULL_TREE;
+  /* Set if array is nul-terminated, false otherwise.  */
+  bool nulterm;
+  src = string_constant (src, &byteoff, &nulterm, arr);
+  if (!src)
+    {
+      *arr = arrs[0] ? arrs[0] : arrs[1];
+      return NULL_TREE;
+    }
+
+  /* Clear *ARR when the string is nul-terminated.  It should be
+     of no interest to callers.  */
+  if (nulterm)
+    *arr = NULL_TREE;
 
   /* Determine the size of the string element.  */
   unsigned eltsize
@@ -621,6 +674,12 @@  c_strlen (tree src, int only_value)
 	maxelts = maxelts / eltsize - 1;
       }
 
+  /* Unless the caller is prepared to handle it by passing in a non-null
+     ARR, fail if the terminating nul doesn't fit in the array the string
+     is stored in (as in const char a[3] = "123";  */
+  if (!arr && maxelts < strelts)
+    return NULL_TREE;
+
   /* PTR can point to the byte representation of any string type, including
      char* and wchar_t*.  */
   const char *ptr = TREE_STRING_POINTER (src);
@@ -650,7 +709,8 @@  c_strlen (tree src, int only_value)
       offsave = fold_convert (ssizetype, offsave);
       tree condexp = fold_build2_loc (loc, LE_EXPR, boolean_type_node, offsave,
 				      build_int_cst (ssizetype, len * eltsize));
-      tree lenexp = size_diffop_loc (loc, ssize_int (strelts * eltsize), offsave);
+      tree lenexp = size_diffop_loc (loc, ssize_int (strelts * eltsize),
+				     offsave);
       return fold_build3_loc (loc, COND_EXPR, ssizetype, condexp, lenexp,
 			      build_zero_cst (ssizetype));
     }
@@ -690,7 +750,7 @@  c_strlen (tree src, int only_value)
      Since ELTOFF is our starting index into the string, no further
      calculation is needed.  */
   unsigned len = string_length (ptr + eltoff * eltsize, eltsize,
-				maxelts - eltoff);
+				strelts - eltoff);
 
   return ssize_int (len);
 }
@@ -2855,7 +2915,6 @@  expand_builtin_strlen (tree exp, rtx target,
 
   struct expand_operand ops[4];
   rtx pat;
-  tree len;
   tree src = CALL_EXPR_ARG (exp, 0);
   rtx src_reg;
   rtx_insn *before_strlen;
@@ -2864,20 +2923,39 @@  expand_builtin_strlen (tree exp, rtx target,
   unsigned int align;
 
   /* If the length can be computed at compile-time, return it.  */
-  len = c_strlen (src, 0);
+  tree array;
+  tree len = c_strlen (src, 0, &array);
   if (len)
-    return expand_expr (len, target, target_mode, EXPAND_NORMAL);
+    {
+      if (array)
+	{
+	  /* Array refers to the non-nul terminated constant array
+	     whose length is attempted to be computed.  */
+	  warn_string_no_nul (EXPR_LOCATION (exp), exp, NULL_TREE, array);
+	  return NULL_RTX;
+	}
+      return expand_expr (len, target, target_mode, EXPAND_NORMAL);
+    }
 
   /* If the length can be computed at compile-time and is constant
      integer, but there are side-effects in src, evaluate
      src for side-effects, then return len.
      E.g. x = strlen (i++ ? "xfoo" + 1 : "bar");
      can be optimized into: i++; x = 3;  */
-  len = c_strlen (src, 1);
-  if (len && TREE_CODE (len) == INTEGER_CST)
+  len = c_strlen (src, 1, &array);
+  if (len)
     {
-      expand_expr (src, const0_rtx, VOIDmode, EXPAND_NORMAL);
-      return expand_expr (len, target, target_mode, EXPAND_NORMAL);
+      if (array)
+	{
+	  warn_string_no_nul (EXPR_LOCATION (exp), exp, NULL_TREE, array);
+	  return NULL_RTX;
+	}
+
+      if (TREE_CODE (len) == INTEGER_CST)
+	{
+	  expand_expr (src, const0_rtx, VOIDmode, EXPAND_NORMAL);
+	  return expand_expr (len, target, target_mode, EXPAND_NORMAL);
+	}
     }
 
   align = get_pointer_alignment (src) / BITS_PER_UNIT;
@@ -8255,19 +8333,30 @@  fold_builtin_classify_type (tree arg)
   return build_int_cst (integer_type_node, type_to_class (TREE_TYPE (arg)));
 }
 
-/* Fold a call to __builtin_strlen with argument ARG.  */
+/* Fold a strlen call to FNDECL of TYPE, and with argument ARG.  */
 
 static tree
-fold_builtin_strlen (location_t loc, tree type, tree arg)
+fold_builtin_strlen (location_t loc, tree fndecl, tree type, tree arg)
 {
   if (!validate_arg (arg, POINTER_TYPE))
     return NULL_TREE;
   else
     {
-      tree len = c_strlen (arg, 0);
-
+      tree arr = NULL_TREE;
+      tree len = c_strlen (arg, 0, &arr);
       if (len)
-	return fold_convert_loc (loc, type, len);
+	{
+	  if (loc == UNKNOWN_LOCATION && EXPR_HAS_LOCATION (arg))
+	    loc = EXPR_LOCATION (arg);
+
+	  /* To avoid warning multiple times about non-nul-terminated
+	     strings only warn if their length has been determined
+	     and it's being folded.  */
+	  if (arr)
+	    warn_string_no_nul (loc, NULL_TREE, fndecl, arr);
+
+	  return fold_convert_loc (loc, type, len);
+	}
 
       return NULL_TREE;
     }
@@ -9175,7 +9264,7 @@  fold_builtin_1 (location_t loc, tree fndecl, tree arg0)
       return fold_builtin_classify_type (arg0);
 
     case BUILT_IN_STRLEN:
-      return fold_builtin_strlen (loc, type, arg0);
+      return fold_builtin_strlen (loc, fndecl, type, arg0);
 
     CASE_FLT_FN (BUILT_IN_FABS):
     CASE_FLT_FN_FLOATN_NX (BUILT_IN_FABS):
diff --git a/gcc/builtins.h b/gcc/builtins.h
index 2e0a2f9..73b0b0b 100644
--- a/gcc/builtins.h
+++ b/gcc/builtins.h
@@ -58,7 +58,7 @@  extern bool get_pointer_alignment_1 (tree, unsigned int *,
 				     unsigned HOST_WIDE_INT *);
 extern unsigned int get_pointer_alignment (tree);
 extern unsigned string_length (const void*, unsigned, unsigned);
-extern tree c_strlen (tree, int);
+extern tree c_strlen (tree, int, tree * = NULL);
 extern void expand_builtin_setjmp_setup (rtx, rtx);
 extern void expand_builtin_setjmp_receiver (rtx);
 extern void expand_builtin_update_setjmp_buf (rtx);
@@ -103,7 +103,7 @@  extern bool target_char_cst_p (tree t, char *p);
 
 extern internal_fn associated_internal_fn (tree);
 extern internal_fn replacement_internal_fn (gcall *);
-
+extern void warn_string_no_nul (location_t, tree, tree, tree);
 extern tree max_object_size ();
 
 #endif /* GCC_BUILTINS_H */
diff --git a/gcc/expr.c b/gcc/expr.c
index de6709d..edbd7f8 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -11271,10 +11271,14 @@  is_aligning_offset (const_tree offset, const_tree exp)
 /* Return the tree node if an ARG corresponds to a string constant or zero
    if it doesn't.  If we return nonzero, set *PTR_OFFSET to the (possibly
    non-constant) offset in bytes within the string that ARG is accessing.
+   If NULTERM is non-null, consider valid even sequences of characters that
+   aren't nul-terminated strings.  In that case, set NULTERM if ARG refers
+   to such a sequence and clear it otherwise.
    The type of the offset is sizetype.  */
 
 tree
-string_constant (tree arg, tree *ptr_offset)
+string_constant (tree arg, tree *ptr_offset, bool *nulterm /* = NULL */,
+		 tree *decl /* = NULL */)
 {
   tree array;
   STRIP_NOPS (arg);
@@ -11328,7 +11332,7 @@  string_constant (tree arg, tree *ptr_offset)
 	return NULL_TREE;
 
       tree offset;
-      if (tree str = string_constant (arg0, &offset))
+      if (tree str = string_constant (arg0, &offset, nulterm, decl))
 	{
 	  /* Avoid pointers to arrays (see bug 86622).  */
 	  if (POINTER_TYPE_P (TREE_TYPE (arg))
@@ -11368,6 +11372,10 @@  string_constant (tree arg, tree *ptr_offset)
   if (TREE_CODE (array) == STRING_CST)
     {
       *ptr_offset = fold_convert (sizetype, offset);
+      if (nulterm)
+	*nulterm = true;
+      if (decl)
+	*decl = NULL_TREE;
       return array;
     }
 
@@ -11414,6 +11422,49 @@  string_constant (tree arg, tree *ptr_offset)
   if (!array_size || TREE_CODE (array_size) != INTEGER_CST)
     return NULL_TREE;
 
+  unsigned HOST_WIDE_INT array_elts = tree_to_uhwi (array_size);
+
+  /* When ARG refers to an aggregate (of arrays) try to determine
+     the size of the character array within the aggregate.  */
+  tree ref = arg;
+  tree reftype = TREE_TYPE (arg);
+
+  if (TREE_CODE (ref) == MEM_REF)
+    {
+      ref = TREE_OPERAND (ref, 0);
+      if (TREE_CODE (ref) == ADDR_EXPR)
+	{
+	  ref = TREE_OPERAND (ref, 0);
+	  reftype = TREE_TYPE (ref);
+	}
+    }
+  else
+    while (TREE_CODE (ref) == ARRAY_REF)
+      {
+	reftype = TREE_TYPE (ref);
+	ref = TREE_OPERAND (ref, 0);
+      }
+
+  if (TREE_CODE (ref) == COMPONENT_REF)
+    reftype = TREE_TYPE (ref);
+
+  while (TREE_CODE (reftype) == ARRAY_TYPE)
+    {
+      tree next = TREE_TYPE (reftype);
+      if (TREE_CODE (next) == INTEGER_TYPE)
+	{
+	  if (tree size = TYPE_SIZE_UNIT (reftype))
+	    if (tree_fits_uhwi_p (size))
+	      array_elts = tree_to_uhwi (size);
+	  break;
+	}
+
+      reftype = TREE_TYPE (reftype);
+    }
+
+  if (decl)
+    *decl = array;
+
   /* Avoid returning a string that doesn't fit in the array
      it is stored in, like
      const char a[4] = "abcde";
@@ -11427,7 +11478,9 @@  string_constant (tree arg, tree *ptr_offset)
   unsigned HOST_WIDE_INT length = TREE_STRING_LENGTH (init);
   length = string_length (TREE_STRING_POINTER (init), charsize,
 			  length / charsize);
-  if (compare_tree_int (array_size, length + 1) < 0)
+  if (nulterm)
+    *nulterm = array_elts > length;
+  else if (array_elts <= length)
     return NULL_TREE;
 
   *ptr_offset = offset;
diff --git a/gcc/expr.h b/gcc/expr.h
index cf047d4..e630979 100644
--- a/gcc/expr.h
+++ b/gcc/expr.h
@@ -288,7 +288,7 @@  expand_normal (tree exp)
 
 /* Return the tree node and offset if a given argument corresponds to
    a string constant.  */
-extern tree string_constant (tree, tree *);
+extern tree string_constant (tree, tree *, bool * = NULL, tree * = NULL);
 
 /* Two different ways of generating switch statements.  */
 extern int try_casesi (tree, tree, tree, tree, rtx, rtx, rtx, profile_probability);
diff --git a/gcc/fold-const-call.c b/gcc/fold-const-call.c
index 06a42060..849a443 100644
--- a/gcc/fold-const-call.c
+++ b/gcc/fold-const-call.c
@@ -1199,9 +1199,14 @@  fold_const_call (combined_fn fn, tree type, tree arg)
   switch (fn)
     {
     case CFN_BUILT_IN_STRLEN:
-      if (const char *str = c_getstr (arg))
-	return build_int_cst (type, strlen (str));
-      return NULL_TREE;
+      {
+	bool nulterm;
+	if (const char *str = c_getstr (arg, NULL, &nulterm))
+	  if (nulterm)
+	    return build_int_cst (type, strlen (str));
+
+	return NULL_TREE;
+      }
 
     CASE_CFN_NAN:
     CASE_FLT_FN_FLOATN_NX (CFN_BUILT_IN_NAN):
diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index b318fc77..ecbc38c 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -14577,23 +14577,23 @@  fold_build_pointer_plus_hwi_loc (location_t loc, tree ptr, HOST_WIDE_INT off)
 /* Return a pointer P to a NUL-terminated string representing the sequence
    of constant characters referred to by SRC (or a subsequence of such
    characters within it if SRC is a reference to a string plus some
-   constant offset).  If STRLEN is non-null, store stgrlen(P) in *STRLEN.
-   If STRSIZE is non-null, store in *STRSIZE the size of the array
-   the string is stored in; in that case, even though P points to a NUL
-   terminated string, SRC need not refer to one.  This can happen when
-   SRC refers to a constant character array initialized to all non-NUL
-   values, as in the C declaration: char a[4] = "1234";  */
+   constant offset).  If STRSIZE is non-null, store the size of the string
+   literal in *STRSIZE, including any embedded or terminating nuls.  If
+   If NULLTERM is non-null, set *NULLTERM if the referenced string is
+   guaranteed to contain a terminating NUL.  Otherwise clear it.  This
+   can happen in the case of valid C declarations such as:
+   const char a[3] = "123";  */
 
 const char *
-c_getstr (tree src, unsigned HOST_WIDE_INT *strlen /* = NULL */,
-	  unsigned HOST_WIDE_INT *strsize /* = NULL */)
+c_getstr (tree src, unsigned HOST_WIDE_INT *strsize /* = NULL */,
+	  bool *nulterm /* = NULL */)
 {
   tree offset_node;
 
-  if (strlen)
-    *strlen = 0;
+  if (strsize)
+    *strsize = 0;
 
-  src = string_constant (src, &offset_node);
+  src = string_constant (src, &offset_node, nulterm);
   if (src == 0)
     return NULL;
 
@@ -14606,47 +14606,42 @@  c_getstr (tree src, unsigned HOST_WIDE_INT *strlen /* = NULL */,
 	offset = tree_to_uhwi (offset_node);
     }
 
-  /* STRING_LENGTH is the size of the string literal, including any
-     embedded NULs.  STRING_SIZE is the size of the array the string
+  /* STRING_SIZE is the size of the string literal, including any
+     embedded NULs.  ARRAY_SIZE is the size of the array the string
      literal is stored in.  */
-  unsigned HOST_WIDE_INT string_length = TREE_STRING_LENGTH (src);
-  unsigned HOST_WIDE_INT string_size = string_length;
+  unsigned HOST_WIDE_INT string_size = TREE_STRING_LENGTH (src);
+  unsigned HOST_WIDE_INT array_size = string_size;
   tree type = TREE_TYPE (src);
   if (tree size = TYPE_SIZE_UNIT (type))
     if (tree_fits_shwi_p (size))
-      string_size = tree_to_uhwi (size);
+      array_size = tree_to_uhwi (size);
+
+  const char *string = TREE_STRING_POINTER (src);
 
-  if (strlen)
+  if (strsize)
     {
-      /* Compute and store the length of the substring at OFFSET.
+      /* Compute and store the size of the substring at OFFSET.
 	 All offsets past the initial length refer to null strings.  */
-      if (offset <= string_length)
-	*strlen = string_length - offset;
+      if (offset <= string_size)
+	*strsize = string_size - offset;
       else
-	*strlen = 0;
+	*strsize = 0;
     }
 
-  const char *string = TREE_STRING_POINTER (src);
-
-  if (string_length == 0
-      || offset >= string_size)
+  if (string_size == 0
+      || offset >= array_size)
     return NULL;
 
-  if (strsize)
-    {
-      /* Support even constant character arrays that aren't proper
-	 NUL-terminated strings.  */
-      *strsize = string_size;
-    }
-  else if (string[string_length - 1] != '\0')
+  if (!nulterm && string[string_size - 1] != '\0')
     {
-      /* Support only properly NUL-terminated strings but handle
-	 consecutive strings within the same array, such as the six
-	 substrings in "1\0002\0003".  */
+      /* When NULTERM is null, support only properly nul-terminated
+	 strings but handle consecutive strings within the same array,
+	 such as the six substrings in "1\0002\0003".  Otherwise, let
+	 the caller deal with non-nul-terminated arrays.  */
       return NULL;
     }
 
-  return offset <= string_length ? string + offset : "";
+  return offset <= string_size ? string + offset : "";
 }
 
 /* Given a tree T, compute which bits in T may be nonzero.  */
diff --git a/gcc/fold-const.h b/gcc/fold-const.h
index 1b9ccc0..a58a4a2 100644
--- a/gcc/fold-const.h
+++ b/gcc/fold-const.h
@@ -188,7 +188,7 @@  extern tree const_unop (enum tree_code, tree, tree);
 extern tree const_binop (enum tree_code, tree, tree, tree);
 extern bool negate_mathfn_p (combined_fn);
 extern const char *c_getstr (tree, unsigned HOST_WIDE_INT * = NULL,
-			     unsigned HOST_WIDE_INT * = NULL);
+			     bool * = NULL);
 extern wide_int tree_nonzero_bits (const_tree);
 
 /* Return OFF converted to a pointer offset type suitable as offset for
diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index c3fa570..9eefb37 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -1275,11 +1275,13 @@  gimple_fold_builtin_memset (gimple_stmt_iterator *gsi, tree c, tree len)
    Set *FLEXP to true if the range of the string lengths has been
    obtained from the upper bound of an array at the end of a struct.
    Such an array may hold a string that's longer than its upper bound
-   due to it being used as a poor-man's flexible array member.  */
+   due to it being used as a poor-man's flexible array member.
+   Clear *NULTERM if ARG refers to a constant array that is known
+   not be nul-terminated.  */
 
 static bool
 get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
-		  int fuzzy, bool *flexp)
+		  int fuzzy, bool *flexp, bool *nulterm)
 {
   tree var, val = NULL_TREE;
   gimple *def_stmt;
@@ -1301,7 +1303,8 @@  get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
 	      if (TREE_CODE (aop0) == INDIRECT_REF
 		  && TREE_CODE (TREE_OPERAND (aop0, 0)) == SSA_NAME)
 		return get_range_strlen (TREE_OPERAND (aop0, 0),
-					 length, visited, type, fuzzy, flexp);
+					 length, visited, type, fuzzy, flexp,
+					 nulterm);
 	    }
 	  else if (TREE_CODE (TREE_OPERAND (op, 0)) == COMPONENT_REF && fuzzy)
 	    {
@@ -1329,13 +1332,18 @@  get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
 	    return false;
 	}
       else
-	val = c_strlen (arg, 1);
+	{
+	  tree arr;
+	  val = c_strlen (arg, 1, &arr);
+	  if (val && arr)
+	    *nulterm = false;
+	}
 
       if (!val && fuzzy)
 	{
 	  if (TREE_CODE (arg) == ADDR_EXPR)
 	    return get_range_strlen (TREE_OPERAND (arg, 0), length,
-				     visited, type, fuzzy, flexp);
+				     visited, type, fuzzy, flexp, nulterm);
 
 	  if (TREE_CODE (arg) == ARRAY_REF)
 	    {
@@ -1477,7 +1485,8 @@  get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
             || gimple_assign_unary_nop_p (def_stmt))
           {
             tree rhs = gimple_assign_rhs1 (def_stmt);
-	    return get_range_strlen (rhs, length, visited, type, fuzzy, flexp);
+	    return get_range_strlen (rhs, length, visited, type, fuzzy, flexp,
+				     nulterm);
           }
 	else if (gimple_assign_rhs_code (def_stmt) == COND_EXPR)
 	  {
@@ -1486,7 +1495,7 @@  get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
 
 	    for (unsigned int i = 0; i < 2; i++)
 	      if (!get_range_strlen (ops[i], length, visited, type, fuzzy,
-				     flexp))
+				     flexp, nulterm))
 		{
 		  if (fuzzy == 2)
 		    *maxlen = build_all_ones_cst (size_type_node);
@@ -1513,7 +1522,8 @@  get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
             if (arg == gimple_phi_result (def_stmt))
               continue;
 
-	    if (!get_range_strlen (arg, length, visited, type, fuzzy, flexp))
+	    if (!get_range_strlen (arg, length, visited, type, fuzzy, flexp,
+				   nulterm))
 	      {
 		if (fuzzy == 2)
 		  *maxlen = build_all_ones_cst (size_type_node);
@@ -1545,19 +1555,27 @@  get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
    and false if PHIs and COND_EXPRs are to be handled optimistically,
    if we can determine string length minimum and maximum; it will use
    the minimum from the ones where it can be determined.
-   STRICT false should be only used for warning code.  */
+   STRICT false should be only used for warning code.
+   When non-null, clear *NULTERM if ARG refers to a constant array
+   that is known not be nul-terminated.  Otherwise set it to true.  */
 
 bool
-get_range_strlen (tree arg, tree minmaxlen[2], bool strict)
+get_range_strlen (tree arg, tree minmaxlen[2], bool strict /* = false */,
+		  bool *nulterm /* = NULL */)
 {
   bitmap visited = NULL;
 
   minmaxlen[0] = NULL_TREE;
   minmaxlen[1] = NULL_TREE;
 
+  bool nultermbuf;
+  if (!nulterm)
+    nulterm = &nultermbuf;
+  *nulterm = true;
+
   bool flexarray = false;
   if (!get_range_strlen (arg, minmaxlen, &visited, 1, strict ? 1 : 2,
-			 &flexarray))
+			 &flexarray, nulterm))
     {
       minmaxlen[0] = NULL_TREE;
       minmaxlen[1] = NULL_TREE;
@@ -1576,7 +1594,7 @@  get_maxval_strlen (tree arg, int type)
   tree len[2] = { NULL_TREE, NULL_TREE };
 
   bool dummy;
-  if (!get_range_strlen (arg, len, &visited, type, 0, &dummy))
+  if (!get_range_strlen (arg, len, &visited, type, 0, &dummy, NULL))
     len[1] = NULL_TREE;
   if (visited)
     BITMAP_FREE (visited);
@@ -3496,12 +3514,14 @@  static bool
 gimple_fold_builtin_strlen (gimple_stmt_iterator *gsi)
 {
   gimple *stmt = gsi_stmt (*gsi);
+  tree arg = gimple_call_arg (stmt, 0);
 
   wide_int minlen;
   wide_int maxlen;
 
   tree lenrange[2];
-  if (!get_range_strlen (gimple_call_arg (stmt, 0), lenrange, true)
+  bool nulterm;
+  if (!get_range_strlen (arg, lenrange, true, &nulterm)
       && lenrange[0] && TREE_CODE (lenrange[0]) == INTEGER_CST
       && lenrange[1] && TREE_CODE (lenrange[1]) == INTEGER_CST)
     {
@@ -3523,6 +3543,10 @@  gimple_fold_builtin_strlen (gimple_stmt_iterator *gsi)
 
   if (minlen == maxlen)
     {
+      if (!nulterm)
+	warn_string_no_nul (gimple_location (stmt), NULL_TREE,
+			    gimple_call_fndecl (stmt), arg);
+
       lenrange[0] = force_gimple_operand_gsi (gsi, lenrange[0], true, NULL,
 					      true, GSI_SAME_STMT);
       replace_call_with_value (gsi, lenrange[0]);
diff --git a/gcc/gimple-fold.h b/gcc/gimple-fold.h
index 04e9bfa..fe11728 100644
--- a/gcc/gimple-fold.h
+++ b/gcc/gimple-fold.h
@@ -25,7 +25,7 @@  along with GCC; see the file COPYING3.  If not see
 extern tree create_tmp_reg_or_ssa_name (tree, gimple *stmt = NULL);
 extern tree canonicalize_constructor_val (tree, tree);
 extern tree get_symbol_constant_value (tree);
-extern bool get_range_strlen (tree, tree[2], bool = false);
+extern bool get_range_strlen (tree, tree[2], bool = false, bool * = NULL);
 extern tree get_maxval_strlen (tree, int);
 extern void gimplify_and_update_call_from_tree (gimple_stmt_iterator *, tree);
 extern bool fold_stmt (gimple_stmt_iterator *);
diff --git a/gcc/testsuite/gcc.c-torture/execute/memchr-1.c b/gcc/testsuite/gcc.c-torture/execute/memchr-1.c
new file mode 100644
index 0000000..2614bee
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/memchr-1.c
@@ -0,0 +1,43 @@ 
+/* PR tree-optimization/86711 - wrong folding of memchr
+
+   Verify that memchr() of arrays initialized with string literals
+   where the nul doesn't fit in the array doesn't find the nul.  */
+
+extern void* memchr (const void*, int, __SIZE_TYPE__);
+
+#define A(expr) \
+  ((expr) \
+  ? (void)0							\
+  : (__builtin_printf ("assertion failed on line %i: %s\n",	\
+		       __LINE__, #expr), \
+     __builtin_abort ()))
+
+static const char a1[4] = "1234";
+static const char a2[2][4] = { "1234", "5678" };
+
+static const char a3[2][4] = { "1234", "567" };
+
+int main ()
+{
+  volatile int i = 0;
+
+  A (memchr (a1, 0, sizeof a1) == 0);
+  A (memchr (a1 + 1, 0, sizeof a1 - 1) == 0);
+  A (memchr (a1 + 3, 0, sizeof a1 - 3) == 0);
+  A (memchr (a1 + i, 0, sizeof a1) == 0);
+
+  A (memchr (a2, 0, sizeof a2) == 0);
+
+  A (memchr (a2[0], 0, sizeof a2[0]) == 0);
+  A (memchr (a2[1], 0, sizeof a2[1]) == 0);
+
+  A (memchr (a2[0] + 1, 0, sizeof a2[0] - 1) == 0);
+  A (memchr (a2[1] + 2, 0, sizeof a2[1] - 2) == 0);
+  A (memchr (a2[1] + 3, 0, sizeof a2[1] - 3) == 0);
+
+  A (memchr (a2[i], 0, sizeof a2[i]) == 0);
+  A (memchr (a2[i] + 1, 0, sizeof a2[i] - 1) == 0);
+
+  /* This one must find it.  */
+  A (memchr (a3, 0, sizeof a3) == &a3[1][3]);
+}
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr86714.c b/gcc/testsuite/gcc.c-torture/execute/pr86714.c
new file mode 100644
index 0000000..3ad6852
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr86714.c
@@ -0,0 +1,26 @@ 
+/* PR tree-optimization/86714 - tree-ssa-forwprop.c confused by too
+   long initializer
+
+   The excessively long initializer for a[0] is undefined but this
+   test verifies that the excess elements are not considered a part
+   of the value of the array as a matter of QoI.  */
+
+const char a[2][3] = { "1234", "xyz" };
+char b[6];
+
+void *pb = b;
+
+int main ()
+{
+   __builtin_memcpy (b, a, 4);
+   __builtin_memset (b + 4, 'a', 2);
+
+   if (b[0] != '1' || b[1] != '2' || b[2] != '3'
+       || b[3] != 'x' || b[4] != 'a' || b[5] != 'a')
+     __builtin_abort ();
+
+   if (__builtin_memcmp (pb, "123xaa", 6))
+     __builtin_abort ();
+
+   return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/warn-strlen-no-nul.c b/gcc/testsuite/gcc.dg/warn-strlen-no-nul.c
new file mode 100644
index 0000000..838528f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/warn-strlen-no-nul.c
@@ -0,0 +1,239 @@ 
+/* PR tree-optimization/86552 - missing warning for reading past the end
+   of non-string arrays
+   { dg-do compile }
+   { dg-options "-O2 -Wall -ftrack-macro-expansion=0" } */
+
+extern __SIZE_TYPE__ strlen (const char*);
+
+const char a[5] = "12345";   /* { dg-message "declared here" } */
+
+int i0 = 0;
+int i1 = 1;
+
+void sink (int, ...);
+
+#define CONCAT(a, b)   a ## b
+#define CAT(a, b)      CONCAT(a, b)
+
+#define T(str)					\
+  __attribute__ ((noipa))			\
+  void CAT (test_, __LINE__) (void) {		\
+    sink (strlen (str));			\
+  } typedef void dummy_type
+
+T (a);                /* { dg-warning "argument missing terminating nul" }  */
+T (&a[0]);            /* { dg-warning "nul" }  */
+T (&a[0] + 1);        /* { dg-warning "nul" }  */
+T (&a[1]);            /* { dg-warning "nul" }  */
+T (&a[i0]);           /* { dg-warning "nul" }  */
+T (&a[i0] + 1);       /* { dg-warning "nul" }  */
+
+
+const char b[][5] = { /* { dg-message "declared here" } */
+  "12", "123", "1234", "54321"
+};
+
+T (b[0]);
+T (b[1]);
+T (b[2]);
+T (b[3]);             /* { dg-warning "nul" }  */
+T (b[i0]);
+
+T (&b[2][1]);
+T (&b[2][1] + 1);
+T (&b[2][i0]);
+T (&b[2][1] + i0);
+
+T (&b[3][1]);         /* { dg-warning "nul" }  */
+T (&b[3][1] + 1);     /* { dg-warning "nul" }  */
+T (&b[3][i0]);        /* { dg-warning "nul" }  */
+T (&b[3][1] + i0);    /* { dg-warning "nul" }  */
+T (&b[3][i0] + i1);   /* { dg-warning "nul" }  */
+
+T (i0 ? "" : b[0]);
+T (i0 ? "" : b[1]);
+T (i0 ? "" : b[2]);
+T (i0 ? "" : b[3]);               /* { dg-warning "nul" }  */
+T (i0 ? b[0] : "");
+T (i0 ? b[1] : "");
+T (i0 ? b[2] : "");
+T (i0 ? b[3] : "");               /* { dg-warning "nul" }  */
+
+T (i0 ? "1234" : b[3]);           /* { dg-warning "nul" }  */
+T (i0 ? b[3] : "1234");           /* { dg-warning "nul" }  */
+
+T (i0 ? a : b[3]);                /* { dg-warning "nul" }  */
+T (i0 ? b[0] : b[2]);
+T (i0 ? b[2] : b[3]);             /* { dg-warning "nul" }  */
+T (i0 ? b[3] : b[2]);             /* { dg-warning "nul" }  */
+
+T (i0 ? b[0] : &b[3][0] + 1);     /* { dg-warning "nul" }  */
+T (i0 ? b[1] : &b[3][1] + i0);    /* { dg-warning "nul" }  */
+
+/* It's possible to detect the missing nul in the following two
+   expressions but GCC doesn't do it yet.  */
+T (i0 ? &b[3][1] + i0 : b[2]);    /* { dg-warning "nul" "bug" { xfail *-*-* } }  */
+T (i0 ? &b[3][i0] : &b[3][i1]);   /* { dg-warning "nul" "bug" { xfail *-*-* } }  */
+
+
+struct A { char a[5], b[5]; };
+
+const struct A s = { "1234", "12345" };
+
+T (s.a);
+T (&s.a[0]);
+T (&s.a[0] + 1);
+T (&s.a[0] + i0);
+T (&s.a[1]);
+T (&s.a[1] + 1);
+T (&s.a[1] + i0);
+
+T (s.b);              /* { dg-warning "nul" }  */
+T (&s.b[0]);          /* { dg-warning "nul" }  */
+T (&s.b[0] + 1);      /* { dg-warning "nul" }  */
+T (&s.b[0] + i0);     /* { dg-warning "nul" }  */
+T (&s.b[1]);          /* { dg-warning "nul" }  */
+T (&s.b[1] + 1);      /* { dg-warning "nul" }  */
+T (&s.b[1] + i0);     /* { dg-warning "nul" }  */
+
+struct B { struct A a[2]; };
+
+const struct B ba[] = {
+  { { { "123", "12345" }, { "12345", "123" } } },
+  { { { "12345", "123" }, { "123", "12345" } } },
+  { { { "1", "12" },      { "123", "1234" } } },
+  { { { "123", "1234" },  { "12345", "12" } } }
+};
+
+T (ba[0].a[0].a);
+T (&ba[0].a[0].a[0]);
+T (&ba[0].a[0].a[0] + 1);
+T (&ba[0].a[0].a[0] + i0);
+T (&ba[0].a[0].a[1]);
+T (&ba[0].a[0].a[1] + 1);
+T (&ba[0].a[0].a[1] + i0);
+
+T (ba[0].a[0].b);           /* { dg-warning "nul" }  */
+T (&ba[0].a[0].b[0]);       /* { dg-warning "nul" }  */
+T (&ba[0].a[0].b[0] + 1);   /* { dg-warning "nul" }  */
+T (&ba[0].a[0].b[0] + i0);  /* { dg-warning "nul" }  */
+T (&ba[0].a[0].b[1]);       /* { dg-warning "nul" }  */
+T (&ba[0].a[0].b[1] + 1);   /* { dg-warning "nul" }  */
+T (&ba[0].a[0].b[1] + i0);  /* { dg-warning "nul" }  */
+
+T (ba[0].a[1].a);           /* { dg-warning "nul" }  */
+T (&ba[0].a[1].a[0]);       /* { dg-warning "nul" }  */
+T (&ba[0].a[1].a[0] + 1);   /* { dg-warning "nul" }  */
+T (&ba[0].a[1].a[0] + i0);  /* { dg-warning "nul" }  */
+T (&ba[0].a[1].a[1]);       /* { dg-warning "nul" }  */
+T (&ba[0].a[1].a[1] + 1);   /* { dg-warning "nul" }  */
+T (&ba[0].a[1].a[1] + i0);  /* { dg-warning "nul" }  */
+
+T (ba[0].a[1].b);
+T (&ba[0].a[1].b[0]);
+T (&ba[0].a[1].b[0] + 1);
+T (&ba[0].a[1].b[0] + i0);
+T (&ba[0].a[1].b[1]);
+T (&ba[0].a[1].b[1] + 1);
+T (&ba[0].a[1].b[1] + i0);
+
+
+T (ba[1].a[0].a);           /* { dg-warning "nul" }  */
+T (&ba[1].a[0].a[0]);       /* { dg-warning "nul" }  */
+T (&ba[1].a[0].a[0] + 1);   /* { dg-warning "nul" }  */
+T (&ba[1].a[0].a[0] + i0);  /* { dg-warning "nul" }  */
+T (&ba[1].a[0].a[1]);       /* { dg-warning "nul" }  */
+T (&ba[1].a[0].a[1] + 1);   /* { dg-warning "nul" }  */
+T (&ba[1].a[0].a[1] + i0);  /* { dg-warning "nul" }  */
+
+T (ba[1].a[0].b);
+T (&ba[1].a[0].b[0]);
+T (&ba[1].a[0].b[0] + 1);
+T (&ba[1].a[0].b[0] + i0);
+T (&ba[1].a[0].b[1]);
+T (&ba[1].a[0].b[1] + 1);
+T (&ba[1].a[0].b[1] + i0);
+
+T (ba[1].a[1].a);
+T (&ba[1].a[1].a[0]);
+T (&ba[1].a[1].a[0] + 1);
+T (&ba[1].a[1].a[0] + i0);
+T (&ba[1].a[1].a[1]);
+T (&ba[1].a[1].a[1] + 1);
+T (&ba[1].a[1].a[1] + i0);
+
+T (ba[1].a[1].b);           /* { dg-warning "nul" }  */
+T (&ba[1].a[1].b[0]);       /* { dg-warning "nul" }  */
+T (&ba[1].a[1].b[0] + 1);   /* { dg-warning "nul" }  */
+T (&ba[1].a[1].b[0] + i0);  /* { dg-warning "nul" }  */
+T (&ba[1].a[1].b[1]);       /* { dg-warning "nul" }  */
+T (&ba[1].a[1].b[1] + 1);   /* { dg-warning "nul" }  */
+T (&ba[1].a[1].b[1] + i0);  /* { dg-warning "nul" }  */
+
+
+T (ba[2].a[0].a);
+T (&ba[2].a[0].a[0]);
+T (&ba[2].a[0].a[0] + 1);
+T (&ba[2].a[0].a[0] + i0);
+T (&ba[2].a[0].a[1]);
+T (&ba[2].a[0].a[1] + 1);
+T (&ba[2].a[0].a[1] + i0);
+
+T (ba[2].a[0].b);
+T (&ba[2].a[0].b[0]);
+T (&ba[2].a[0].b[0] + 1);
+T (&ba[2].a[0].b[0] + i0);
+T (&ba[2].a[0].b[1]);
+T (&ba[2].a[0].b[1] + 1);
+T (&ba[2].a[0].b[1] + i0);
+
+T (ba[2].a[1].a);
+T (&ba[2].a[1].a[0]);
+T (&ba[2].a[1].a[0] + 1);
+T (&ba[2].a[1].a[0] + i0);
+T (&ba[2].a[1].a[1]);
+T (&ba[2].a[1].a[1] + 1);
+T (&ba[2].a[1].a[1] + i0);
+
+
+T (ba[3].a[0].a);
+T (&ba[3].a[0].a[0]);
+T (&ba[3].a[0].a[0] + 1);
+T (&ba[3].a[0].a[0] + i0);
+T (&ba[3].a[0].a[1]);
+T (&ba[3].a[0].a[1] + 1);
+T (&ba[3].a[0].a[1] + i0);
+
+T (ba[3].a[0].b);
+T (&ba[3].a[0].b[0]);
+T (&ba[3].a[0].b[0] + 1);
+T (&ba[3].a[0].b[0] + i0);
+T (&ba[3].a[0].b[1]);
+T (&ba[3].a[0].b[1] + 1);
+T (&ba[3].a[0].b[1] + i0);
+
+T (ba[3].a[1].a);           /* { dg-warning "nul" }  */
+T (&ba[3].a[1].a[0]);	    /* { dg-warning "nul" }  */
+T (&ba[3].a[1].a[0] + 1);   /* { dg-warning "nul" }  */
+T (&ba[3].a[1].a[0] + i0);  /* { dg-warning "nul" }  */
+T (&ba[3].a[1].a[1]);	    /* { dg-warning "nul" }  */
+T (&ba[3].a[1].a[1] + 1);   /* { dg-warning "nul" }  */
+T (&ba[3].a[1].a[1] + i0);  /* { dg-warning "nul" }  */
+
+T (ba[3].a[1].b);
+T (&ba[3].a[1].b[0]);	
+T (&ba[3].a[1].b[0] + 1);
+T (&ba[3].a[1].b[0] + i0);
+T (&ba[3].a[1].b[1]);	
+T (&ba[3].a[1].b[1] + 1);
+T (&ba[3].a[1].b[1] + i0);
+
+
+T (i0 ? ba[0].a[0].a : ba[0].a[0].b);           /* { dg-warning "nul" }  */
+T (i0 ? ba[0].a[0].a : ba[0].a[0].b);           /* { dg-warning "nul" }  */
+
+T (i0 ? &ba[0].a[0].a[0] : &ba[3].a[1].a[0]);   /* { dg-warning "nul" }  */
+T (i0 ? &ba[3].a[1].a[1] :  ba[0].a[0].a);      /* { dg-warning "nul" }  */
+
+T (i0 ? ba[0].a[0].a : ba[0].a[1].b);
+T (i0 ? ba[0].a[1].b : ba[0].a[0].a);