handle multibyte stores larger than char in strlen (PR 91183, 86888)
diff mbox series

Message ID 753cb634-2790-2aec-99bb-9126f88d3ca7@gmail.com
State New
Headers show
Series
  • handle multibyte stores larger than char in strlen (PR 91183, 86888)
Related show

Commit Message

Martin Sebor July 19, 2019, 10:04 p.m. UTC
On targets with permissive alignment requirements GCC sometimes
lowers stores of short (between two and 16 bytes), power-of-two
char sequences  to single integer stores of the corresponding
width.  This happens for sequences of ordinary character stores
as well as for some  calls to memcpy.

However, the strlen pass is only prepared to handle character
stores and not those of wider integers.  As a result, the strlen
optimization tends to get defeated in cases when it could benefit
the most: very short strings.  I counted 1544 instances where
the strlen optimization was disabled in a GCC build on x86_64
due to this sort of early store merging, and over two thousand
in a build of the Linux kernel.

In addition, -Wstringop-overflow only considers calls to string
functions and is ineffective against past-the-end accesses by
these merged multibyte stores.

To improve the effectiveness of both the optimization as well
as the warning the attached patch enhances the strlen pass to
consider these wide accesses.  Since the infrastructure for doing
this is already in place (strlen can compute multibyte accesses
via MEM_REFs of character arrays), the enhancement isn't very
intrusive.  It relies on the native_encode_expr function to
determine the encoding of an expression and its "length".

I tested the patch on x86_64.  I expect the tests the patch
adds will need some adjustment for big-endian and strictly
aligned targets.

Martin

Comments

Jeff Law July 22, 2019, 9:33 p.m. UTC | #1
On 7/19/19 4:04 PM, Martin Sebor wrote:
> On targets with permissive alignment requirements GCC sometimes
> lowers stores of short (between two and 16 bytes), power-of-two
> char sequences  to single integer stores of the corresponding
> width.  This happens for sequences of ordinary character stores
> as well as for some  calls to memcpy.
> 
> However, the strlen pass is only prepared to handle character
> stores and not those of wider integers.  As a result, the strlen
> optimization tends to get defeated in cases when it could benefit
> the most: very short strings.  I counted 1544 instances where
> the strlen optimization was disabled in a GCC build on x86_64
> due to this sort of early store merging, and over two thousand
> in a build of the Linux kernel.
> 
> In addition, -Wstringop-overflow only considers calls to string
> functions and is ineffective against past-the-end accesses by
> these merged multibyte stores.
> 
> To improve the effectiveness of both the optimization as well
> as the warning the attached patch enhances the strlen pass to
> consider these wide accesses.  Since the infrastructure for doing
> this is already in place (strlen can compute multibyte accesses
> via MEM_REFs of character arrays), the enhancement isn't very
> intrusive.  It relies on the native_encode_expr function to
> determine the encoding of an expression and its "length".
> 
> I tested the patch on x86_64.  I expect the tests the patch
> adds will need some adjustment for big-endian and strictly
> aligned targets.
> 
> Martin
> 
> gcc-91183.diff
> 
> PR tree-optimization/91183 - strlen of a strcpy result with a conditional source not folded
> PR tree-optimization/86688 - missing -Wstringop-overflow using a non-string local array in strnlen with excessive bound
> 
> gcc/ChangeLog:
> 
> 	PR tree-optimization/91183
> 	PR tree-optimization/86688
> 	* builtins.c (compute_objsize): Handle MEM_REF.
> 	* tree-ssa-strlen.c (class ssa_name_limit_t): New.
> 	(get_min_string_length): Remove.
> 	(count_nonzero_bytes): New function.
> 	(handle_char_store): Rename...
> 	(handle_store): to this.  Handle multibyte stores via integer types.
> 	(strlen_check_and_optimize_stmt): Adjust conditional and the called
> 	function name.
> 
> gcc/testsuite/ChangeLog:
> 
> 	PR tree-optimization/91183
> 	PR tree-optimization/86688
> 	* gcc.dg/attr-nonstring-2.c: Remove xfails.
> 	* gcc.dg/strlenopt-70.c: New test.
> 	* gcc.dg/strlenopt-71.c: New test.
> 	* gcc.dg/strlenopt-72.c: New test.
> 	* gcc.dg/strlenopt-8.c: Remove xfails.
> 


> +
>    if (TREE_CODE (dest) != ADDR_EXPR)
>      return NULL_TREE;
>  
> diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
> index 88b6bd7869e..ca1aca3ed9e 100644
> --- a/gcc/tree-ssa-strlen.c
> +++ b/gcc/tree-ssa-strlen.c

> +
> +/* If the SSA_NAME has already been "seen" return a positive value.
> +   Otherwise add it to VISITED.  If the SSA_NAME limit has been
> +   reached, return a negative value.  Otherwise return zero.  */
> +
> +int ssa_name_limit_t::next_ssa_name (tree ssa_name)
Nit.  Return type on a different line than the function's name.  The
point behind that convention is to make it easier to search for a
function's definition.


> +/* Determine the minimum and maximum number of leading non-zero bytes
> +   in the representation of EXP and set LENRANGE[0] and LENRANGE[1]
> +   to each.  Set LENRANGE[2] to the total number of bytes in
> +   the representation.  Set *NULTREM if the representation contains
> +   a zero byte, and set *ALLNUL if all the bytes are zero.  Avoid
> +   recursing deeper than the limits in SNLIM allow.  Return true
> +   on success and false otherwise.  */
S/NULTREM/NULTERM/






>  
>    if (si != NULL)
>      {
> -      int cmp = compare_nonzero_chars (si, offset);
> -      gcc_assert (offset == 0 || cmp >= 0);
> -      if (storing_all_zeros_p && cmp == 0 && si->full_string_p)
> +      /* Set to 1 if the string described by SI is being written into
> +	 before the terminating nul (if it has one), to zero if the nul
> +	 is being overwritten but not beyond, or negative otherwise.  */
> +      int store_b4_nul[2];
Umm "store_b4_nul"?  Are you really trying to save 3 characters in the
name by writing it this way?  :-)

You've got two entries in the array, but the comment reads as if there's
just one element.  What is the difference between the two array elements?

I didn't see anything terribly concerning so far, but I do want to look
at handle_store again after the comment is fixed so that I'm sure I'm
interpreting things correctly.

Jeff
Martin Sebor July 22, 2019, 11:17 p.m. UTC | #2
On 7/22/19 3:33 PM, Jeff Law wrote:
> On 7/19/19 4:04 PM, Martin Sebor wrote:
>> On targets with permissive alignment requirements GCC sometimes
>> lowers stores of short (between two and 16 bytes), power-of-two
>> char sequences  to single integer stores of the corresponding
>> width.  This happens for sequences of ordinary character stores
>> as well as for some  calls to memcpy.
>>
>> However, the strlen pass is only prepared to handle character
>> stores and not those of wider integers.  As a result, the strlen
>> optimization tends to get defeated in cases when it could benefit
>> the most: very short strings.  I counted 1544 instances where
>> the strlen optimization was disabled in a GCC build on x86_64
>> due to this sort of early store merging, and over two thousand
>> in a build of the Linux kernel.
>>
>> In addition, -Wstringop-overflow only considers calls to string
>> functions and is ineffective against past-the-end accesses by
>> these merged multibyte stores.
>>
>> To improve the effectiveness of both the optimization as well
>> as the warning the attached patch enhances the strlen pass to
>> consider these wide accesses.  Since the infrastructure for doing
>> this is already in place (strlen can compute multibyte accesses
>> via MEM_REFs of character arrays), the enhancement isn't very
>> intrusive.  It relies on the native_encode_expr function to
>> determine the encoding of an expression and its "length".
>>
>> I tested the patch on x86_64.  I expect the tests the patch
>> adds will need some adjustment for big-endian and strictly
>> aligned targets.
>>
>> Martin
>>
>> gcc-91183.diff
>>
>> PR tree-optimization/91183 - strlen of a strcpy result with a conditional source not folded
>> PR tree-optimization/86688 - missing -Wstringop-overflow using a non-string local array in strnlen with excessive bound
>>
>> gcc/ChangeLog:
>>
>> 	PR tree-optimization/91183
>> 	PR tree-optimization/86688
>> 	* builtins.c (compute_objsize): Handle MEM_REF.
>> 	* tree-ssa-strlen.c (class ssa_name_limit_t): New.
>> 	(get_min_string_length): Remove.
>> 	(count_nonzero_bytes): New function.
>> 	(handle_char_store): Rename...
>> 	(handle_store): to this.  Handle multibyte stores via integer types.
>> 	(strlen_check_and_optimize_stmt): Adjust conditional and the called
>> 	function name.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 	PR tree-optimization/91183
>> 	PR tree-optimization/86688
>> 	* gcc.dg/attr-nonstring-2.c: Remove xfails.
>> 	* gcc.dg/strlenopt-70.c: New test.
>> 	* gcc.dg/strlenopt-71.c: New test.
>> 	* gcc.dg/strlenopt-72.c: New test.
>> 	* gcc.dg/strlenopt-8.c: Remove xfails.
>>
> 
> 
>> +
>>     if (TREE_CODE (dest) != ADDR_EXPR)
>>       return NULL_TREE;
>>   
>> diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
>> index 88b6bd7869e..ca1aca3ed9e 100644
>> --- a/gcc/tree-ssa-strlen.c
>> +++ b/gcc/tree-ssa-strlen.c
> 
>> +
>> +/* If the SSA_NAME has already been "seen" return a positive value.
>> +   Otherwise add it to VISITED.  If the SSA_NAME limit has been
>> +   reached, return a negative value.  Otherwise return zero.  */
>> +
>> +int ssa_name_limit_t::next_ssa_name (tree ssa_name)
> Nit.  Return type on a different line than the function's name.  The
> point behind that convention is to make it easier to search for a
> function's definition.
> 
> 
>> +/* Determine the minimum and maximum number of leading non-zero bytes
>> +   in the representation of EXP and set LENRANGE[0] and LENRANGE[1]
>> +   to each.  Set LENRANGE[2] to the total number of bytes in
>> +   the representation.  Set *NULTREM if the representation contains
>> +   a zero byte, and set *ALLNUL if all the bytes are zero.  Avoid
>> +   recursing deeper than the limits in SNLIM allow.  Return true
>> +   on success and false otherwise.  */
> S/NULTREM/NULTERM/
> 
> 
> 
> 
> 
> 
>>   
>>     if (si != NULL)
>>       {
>> -      int cmp = compare_nonzero_chars (si, offset);
>> -      gcc_assert (offset == 0 || cmp >= 0);
>> -      if (storing_all_zeros_p && cmp == 0 && si->full_string_p)
>> +      /* Set to 1 if the string described by SI is being written into
>> +	 before the terminating nul (if it has one), to zero if the nul
>> +	 is being overwritten but not beyond, or negative otherwise.  */
>> +      int store_b4_nul[2];
> Umm "store_b4_nul"?  Are you really trying to save 3 characters in the
> name by writing it this way?  :-)

I'm actually saving four characters over "store_before_nul" ;-)

It's just a name I picked because I like it.  Would you prefer to
go back to the original 'cmp' instead?  It doesn't get much shorter
than that, or less descriptive, especially when there is no comment
explaining what it's for.  (FWIW, I renamed it because I found myself
going back to the description of compare_nonzero_chars to remember
what cmp actually meant.)

> You've got two entries in the array, but the comment reads as if there's
> just one element.  What is the difference between the two array elements?

Since handle_store deals with sequences of one or more bytes some
of the optimizations it implements(*) need to consider both where
the first byte of the sequence is stored and where the last one is.
The first element of this array is for the first byte and the second
one is for the last character.  The comment a few lines down is meant
to make the distinction clear but we can expand the comment above
the variable even more.

	  /* The offset of the last stored byte.  */
	  unsigned HOST_WIDE_INT endoff = offset + lenrange[2] - 1;

(lenrange[2] is the size of the store.)

[*] E.g., the one we discussed not so long ago (PR90989) or the one
that removes a store of '\0' over the terminating '\0'.

> 
> I didn't see anything terribly concerning so far, but I do want to look
> at handle_store again after the comment is fixed so that I'm sure I'm
> interpreting things correctly.

Does this help?

   /* The corresponding element is set to 1 if the first and last
      element, respectively, of the sequence of characters being
      written over the string described by SI ends before
      the terminating nul (if it has one), to zero if the nul is
      being overwritten but not beyond, or negative otherwise.  */

Martin
Jeff Law July 24, 2019, 5:06 p.m. UTC | #3
On 7/22/19 5:17 PM, Martin Sebor wrote:

>> Umm "store_b4_nul"?  Are you really trying to save 3 characters in the
>> name by writing it this way?  :-)
> 
> I'm actually saving four characters over "store_before_nul" ;-)
:-)

> 
> It's just a name I picked because I like it.  Would you prefer to
> go back to the original 'cmp' instead?  It doesn't get much shorter
> than that, or less descriptive, especially when there is no comment
> explaining what it's for.  (FWIW, I renamed it because I found myself
> going back to the description of compare_nonzero_chars to remember
> what cmp actually meant.)
Fair enough.  Though "b4" reads like it belongs in a text message to me.
  I don't want to get nitty over this.  Will s/b4/before/ work for you?

If you wanted to add a comment before the variable, that would be good
as well, particularly since we don't have a good name.


> 
>> You've got two entries in the array, but the comment reads as if there's
>> just one element.  What is the difference between the two array elements?
> 
> Since handle_store deals with sequences of one or more bytes some
> of the optimizations it implements(*) need to consider both where
> the first byte of the sequence is stored and where the last one is.
> The first element of this array is for the first byte and the second
> one is for the last character.  The comment a few lines down is meant
> to make the distinction clear but we can expand the comment above
> the variable even more.
I think  my brain locked up with the "b4".  Maybe it went into a mode
where I'm trying to parse texts from my teenager which is clearly
incompatible with reading code. :-)

> 
>       /* The offset of the last stored byte.  */
>       unsigned HOST_WIDE_INT endoff = offset + lenrange[2] - 1;
> 
> (lenrange[2] is the size of the store.)
> 
> [*] E.g., the one we discussed not so long ago (PR90989) or the one
> that removes a store of '\0' over the terminating '\0'.
> 
>>
>> I didn't see anything terribly concerning so far, but I do want to look
>> at handle_store again after the comment is fixed so that I'm sure I'm
>> interpreting things correctly.
> 
> Does this help?
> 
>   /* The corresponding element is set to 1 if the first and last
>      element, respectively, of the sequence of characters being
>      written over the string described by SI ends before
>      the terminating nul (if it has one), to zero if the nul is
>      being overwritten but not beyond, or negative otherwise.  */
Yea.  I also note a /* */ empty comment in handle_store, you probably
wanted to write something there :-)

OK with the nit fixes noted earlier, variable name fix and comment fixes.


jeff
Martin Sebor July 25, 2019, 12:31 a.m. UTC | #4
On 7/24/19 11:06 AM, Jeff Law wrote:
> On 7/22/19 5:17 PM, Martin Sebor wrote:
> 
>>> Umm "store_b4_nul"?  Are you really trying to save 3 characters in the
>>> name by writing it this way?  :-)
>>
>> I'm actually saving four characters over "store_before_nul" ;-)
> :-)
> 
>>
>> It's just a name I picked because I like it.  Would you prefer to
>> go back to the original 'cmp' instead?  It doesn't get much shorter
>> than that, or less descriptive, especially when there is no comment
>> explaining what it's for.  (FWIW, I renamed it because I found myself
>> going back to the description of compare_nonzero_chars to remember
>> what cmp actually meant.)
> Fair enough.  Though "b4" reads like it belongs in a text message to me.
>    I don't want to get nitty over this.  Will s/b4/before/ work for you?

If it's distracting I'll change it.

> 
> If you wanted to add a comment before the variable, that would be good
> as well, particularly since we don't have a good name.
> 
> 
>>
>>> You've got two entries in the array, but the comment reads as if there's
>>> just one element.  What is the difference between the two array elements?
>>
>> Since handle_store deals with sequences of one or more bytes some
>> of the optimizations it implements(*) need to consider both where
>> the first byte of the sequence is stored and where the last one is.
>> The first element of this array is for the first byte and the second
>> one is for the last character.  The comment a few lines down is meant
>> to make the distinction clear but we can expand the comment above
>> the variable even more.
> I think  my brain locked up with the "b4".  Maybe it went into a mode
> where I'm trying to parse texts from my teenager which is clearly
> incompatible with reading code. :-)

That's a good enough argument to change it :)

>>        /* The offset of the last stored byte.  */
>>        unsigned HOST_WIDE_INT endoff = offset + lenrange[2] - 1;
>>
>> (lenrange[2] is the size of the store.)
>>
>> [*] E.g., the one we discussed not so long ago (PR90989) or the one
>> that removes a store of '\0' over the terminating '\0'.
>>
>>>
>>> I didn't see anything terribly concerning so far, but I do want to look
>>> at handle_store again after the comment is fixed so that I'm sure I'm
>>> interpreting things correctly.
>>
>> Does this help?
>>
>>    /* The corresponding element is set to 1 if the first and last
>>       element, respectively, of the sequence of characters being
>>       written over the string described by SI ends before
>>       the terminating nul (if it has one), to zero if the nul is
>>       being overwritten but not beyond, or negative otherwise.  */
> Yea.  I also note a /* */ empty comment in handle_store, you probably
> wanted to write something there :-)

I did initially want to write something there but then I wasn't
sure the conditional was quite right.  I went to investigate and
forgot to fix the conditional.  There was an unnecessarily test
there so I removed both it and the comment placeholder.

> 
> OK with the nit fixes noted earlier, variable name fix and comment fixes.

Committed in r273783 after retesting and including a test for
the warning that I had left out of the patch I posted here.

Martin

PS I suspect some of the tests I added might need tweaking on
big-endian systems.  I'll deal with them tomorrow.
Martin Sebor July 25, 2019, 2:17 a.m. UTC | #5
> Committed in r273783 after retesting and including a test for
> the warning that I had left out of the patch I posted here.
> 
> Martin
> 
> PS I suspect some of the tests I added might need tweaking on
> big-endian systems.  I'll deal with them tomorrow.

And maybe also strictly aligned targets.  A sparc-solaris2.11 cross
shows these failures.  It looks like it doesn't like something about
some of the 64 bit stores in the tests.

FAIL: gcc.dg/strlenopt-70.c scan-tree-dump-times optimized "strlen" 0
FAIL: gcc.dg/strlenopt-70.c scan-tree-dump-times optimized 
"_not_eliminated_" 0
FAIL: gcc.dg/strlenopt-71.c (test for excess errors)
FAIL: gcc.dg/strlenopt-72.c scan-tree-dump-times optimized "strlen" 0
FAIL: gcc.dg/strlenopt-72.c scan-tree-dump-times optimized 
"_not_eliminated_" 0
Jeff Law July 25, 2019, 2:39 a.m. UTC | #6
On 7/24/19 8:17 PM, Martin Sebor wrote:
>> Committed in r273783 after retesting and including a test for
>> the warning that I had left out of the patch I posted here.
>>
>> Martin
>>
>> PS I suspect some of the tests I added might need tweaking on
>> big-endian systems.  I'll deal with them tomorrow.
> 
> And maybe also strictly aligned targets.  A sparc-solaris2.11 cross
> shows these failures.  It looks like it doesn't like something about
> some of the 64 bit stores in the tests.
> 
> FAIL: gcc.dg/strlenopt-70.c scan-tree-dump-times optimized "strlen" 0
> FAIL: gcc.dg/strlenopt-70.c scan-tree-dump-times optimized
> "_not_eliminated_" 0
> FAIL: gcc.dg/strlenopt-71.c (test for excess errors)
> FAIL: gcc.dg/strlenopt-72.c scan-tree-dump-times optimized "strlen" 0
> FAIL: gcc.dg/strlenopt-72.c scan-tree-dump-times optimized
> "_not_eliminated_" 0


msp430-elf failures:


> New tests that FAIL (5 tests):
> 
> msp430-sim: gcc.dg/strlenopt-70.c (test for excess errors)
> msp430-sim: gcc.dg/strlenopt-70.c scan-tree-dump-times optimized "_not_eliminated_" 0
> msp430-sim: gcc.dg/strlenopt-71.c (test for excess errors)
> msp430-sim: gcc.dg/strlenopt-72.c scan-tree-dump-times optimized "_not_eliminated_" 0
> msp430-sim: gcc.dg/strlenopt-72.c scan-tree-dump-times optimized "strlen" 0
Jeff Law July 25, 2019, 2:41 a.m. UTC | #7
On 7/24/19 8:17 PM, Martin Sebor wrote:
>> Committed in r273783 after retesting and including a test for
>> the warning that I had left out of the patch I posted here.
>>
>> Martin
>>
>> PS I suspect some of the tests I added might need tweaking on
>> big-endian systems.  I'll deal with them tomorrow.
> 
> And maybe also strictly aligned targets.  A sparc-solaris2.11 cross
> shows these failures.  It looks like it doesn't like something about
> some of the 64 bit stores in the tests.
> 
> FAIL: gcc.dg/strlenopt-70.c scan-tree-dump-times optimized "strlen" 0
> FAIL: gcc.dg/strlenopt-70.c scan-tree-dump-times optimized
> "_not_eliminated_" 0
> FAIL: gcc.dg/strlenopt-71.c (test for excess errors)
> FAIL: gcc.dg/strlenopt-72.c scan-tree-dump-times optimized "strlen" 0
> FAIL: gcc.dg/strlenopt-72.c scan-tree-dump-times optimized
> "_not_eliminated_" 0

visium-elf:

> New tests that FAIL (16 tests):
> 
> visium-sim: gcc.dg/Wstringop-overflow-14.c  (test for warnings, line 24)
> visium-sim: gcc.dg/Wstringop-overflow-14.c (test for excess errors)
> visium-sim: gcc.dg/strlenopt-70.c scan-tree-dump-times optimized "_not_eliminated_" 0
> visium-sim: gcc.dg/strlenopt-70.c scan-tree-dump-times optimized "strlen" 0
> visium-sim: gcc.dg/strlenopt-72.c scan-tree-dump-times optimized "_not_eliminated_" 0
> visium-sim: gcc.dg/strlenopt-72.c scan-tree-dump-times optimized "strlen" 0
Martin Sebor July 25, 2019, 7:42 p.m. UTC | #8
On 7/24/19 8:39 PM, Jeff Law wrote:
> On 7/24/19 8:17 PM, Martin Sebor wrote:
>>> Committed in r273783 after retesting and including a test for
>>> the warning that I had left out of the patch I posted here.
>>>
>>> Martin
>>>
>>> PS I suspect some of the tests I added might need tweaking on
>>> big-endian systems.  I'll deal with them tomorrow.
>>
>> And maybe also strictly aligned targets.  A sparc-solaris2.11 cross
>> shows these failures.  It looks like it doesn't like something about
>> some of the 64 bit stores in the tests.
>>
>> FAIL: gcc.dg/strlenopt-70.c scan-tree-dump-times optimized "strlen" 0
>> FAIL: gcc.dg/strlenopt-70.c scan-tree-dump-times optimized
>> "_not_eliminated_" 0
>> FAIL: gcc.dg/strlenopt-71.c (test for excess errors)
>> FAIL: gcc.dg/strlenopt-72.c scan-tree-dump-times optimized "strlen" 0
>> FAIL: gcc.dg/strlenopt-72.c scan-tree-dump-times optimized
>> "_not_eliminated_" 0
> 
> 
> msp430-elf failures:
> 
> 
>> New tests that FAIL (5 tests):
>>
>> msp430-sim: gcc.dg/strlenopt-70.c (test for excess errors)
>> msp430-sim: gcc.dg/strlenopt-70.c scan-tree-dump-times optimized "_not_eliminated_" 0

This one is due to bugs in the endian macros the test defines
to try to handle both big and little-endian.  I've fixed those/

>> msp430-sim: gcc.dg/strlenopt-71.c (test for excess errors)

Same here, though this is a runtime test so it will also fail
to link outside of an emulator.  (Changing the test harness to
report these tests as UNSUPPORTED instead would avoid this sort
of ambiguity.)

>> msp430-sim: gcc.dg/strlenopt-72.c scan-tree-dump-times optimized "_not_eliminated_" 0
>> msp430-sim: gcc.dg/strlenopt-72.c scan-tree-dump-times optimized "strlen" 0

This failure is due to to a combination of the absence of early
store merging in pr83821.  The former prevents the stores below

   char a[4];
   a[0] = 'X';
   a[1] = 0;
   a[2] = a[3] = 0;

from turning into:

   MEM <unsigned int> [(char * {ref-all})&b] = 49;

pr83821 causes the strlen pass to invalidate strlen information
when it sees the assignment to a[2] (or beyond).  I need to dust
off and resubmit my patch for that from last year.

Until that patch is approved I have disabled the test on strictly
aligned targets.

The failure in gcc.dg/Wstringop-overflow-14.c on visium-elf was
because of a difference between strictly aligned targets and
the rest, which triggers different warnings between the two sets.
I disabled the unexpected warning and the test passes.

I've just committed r273812 and r273814 with these changes.

Martin
Jeff Law July 26, 2019, 3:10 a.m. UTC | #9
On 7/25/19 1:42 PM, Martin Sebor wrote:
> On 7/24/19 8:39 PM, Jeff Law wrote:
>> On 7/24/19 8:17 PM, Martin Sebor wrote:
>>>> Committed in r273783 after retesting and including a test for
>>>> the warning that I had left out of the patch I posted here.
>>>>
>>>> Martin
>>>>
>>>> PS I suspect some of the tests I added might need tweaking on
>>>> big-endian systems.  I'll deal with them tomorrow.
>>>
>>> And maybe also strictly aligned targets.  A sparc-solaris2.11 cross
>>> shows these failures.  It looks like it doesn't like something about
>>> some of the 64 bit stores in the tests.
>>>
>>> FAIL: gcc.dg/strlenopt-70.c scan-tree-dump-times optimized "strlen" 0
>>> FAIL: gcc.dg/strlenopt-70.c scan-tree-dump-times optimized
>>> "_not_eliminated_" 0
>>> FAIL: gcc.dg/strlenopt-71.c (test for excess errors)
>>> FAIL: gcc.dg/strlenopt-72.c scan-tree-dump-times optimized "strlen" 0
>>> FAIL: gcc.dg/strlenopt-72.c scan-tree-dump-times optimized
>>> "_not_eliminated_" 0
>>
>>
>> msp430-elf failures:
>>
>>
>>> New tests that FAIL (5 tests):
>>>
>>> msp430-sim: gcc.dg/strlenopt-70.c (test for excess errors)
>>> msp430-sim: gcc.dg/strlenopt-70.c scan-tree-dump-times optimized
>>> "_not_eliminated_" 0
> 
> This one is due to bugs in the endian macros the test defines
> to try to handle both big and little-endian.  I've fixed those/
> 
>>> msp430-sim: gcc.dg/strlenopt-71.c (test for excess errors)
> 
> Same here, though this is a runtime test so it will also fail
> to link outside of an emulator.  (Changing the test harness to
> report these tests as UNSUPPORTED instead would avoid this sort
> of ambiguity.)
Well, the "test for excess errors" should be an indicator that something
went wrong before trying to execute.  Typically these are compile-time
warnings/errors or occasionally a link time error due to an undefined
symbol.

I don't have access to the tester, so I can't really look at it until I
return though.

> 
>>> msp430-sim: gcc.dg/strlenopt-72.c scan-tree-dump-times optimized
>>> "_not_eliminated_" 0
>>> msp430-sim: gcc.dg/strlenopt-72.c scan-tree-dump-times optimized
>>> "strlen" 0
> 
> This failure is due to to a combination of the absence of early
> store merging in pr83821.  The former prevents the stores below
> 
>   char a[4];
>   a[0] = 'X';
>   a[1] = 0;
>   a[2] = a[3] = 0;
> 
> from turning into:
> 
>   MEM <unsigned int> [(char * {ref-all})&b] = 49;
> 
> pr83821 causes the strlen pass to invalidate strlen information
> when it sees the assignment to a[2] (or beyond).  I need to dust
> off and resubmit my patch for that from last year.
> 
> Until that patch is approved I have disabled the test on strictly
> aligned targets.
> 
> The failure in gcc.dg/Wstringop-overflow-14.c on visium-elf was
> because of a difference between strictly aligned targets and
> the rest, which triggers different warnings between the two sets.
> I disabled the unexpected warning and the test passes.
> 
> I've just committed r273812 and r273814 with these changes.
No problem.  When I'm back from PTO I'll review state in the tester and
pass along anything else related.

Jeff

Patch
diff mbox series

PR tree-optimization/91183 - strlen of a strcpy result with a conditional source not folded
PR tree-optimization/86688 - missing -Wstringop-overflow using a non-string local array in strnlen with excessive bound

gcc/ChangeLog:

	PR tree-optimization/91183
	PR tree-optimization/86688
	* builtins.c (compute_objsize): Handle MEM_REF.
	* tree-ssa-strlen.c (class ssa_name_limit_t): New.
	(get_min_string_length): Remove.
	(count_nonzero_bytes): New function.
	(handle_char_store): Rename...
	(handle_store): to this.  Handle multibyte stores via integer types.
	(strlen_check_and_optimize_stmt): Adjust conditional and the called
	function name.

gcc/testsuite/ChangeLog:

	PR tree-optimization/91183
	PR tree-optimization/86688
	* gcc.dg/attr-nonstring-2.c: Remove xfails.
	* gcc.dg/strlenopt-70.c: New test.
	* gcc.dg/strlenopt-71.c: New test.
	* gcc.dg/strlenopt-72.c: New test.
	* gcc.dg/strlenopt-8.c: Remove xfails.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index e5a9261e84c..695a9d191af 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -3652,6 +3652,20 @@  compute_objsize (tree dest, int ostype)
   if (!ostype)
     return NULL_TREE;
 
+  if (TREE_CODE (dest) == MEM_REF)
+    {
+      tree ref = TREE_OPERAND (dest, 0);
+      tree off = TREE_OPERAND (dest, 1);
+      if (tree size = compute_objsize (ref, ostype))
+	{
+	  if (tree_int_cst_lt (off, size))
+	    return fold_build2 (MINUS_EXPR, size_type_node, size, off);
+	  return integer_zero_node;
+	}
+
+      return NULL_TREE;
+    }
+
   if (TREE_CODE (dest) != ADDR_EXPR)
     return NULL_TREE;
 
diff --git a/gcc/testsuite/gcc.dg/attr-nonstring-2.c b/gcc/testsuite/gcc.dg/attr-nonstring-2.c
index 246a3729a2a..ef2144d6207 100644
--- a/gcc/testsuite/gcc.dg/attr-nonstring-2.c
+++ b/gcc/testsuite/gcc.dg/attr-nonstring-2.c
@@ -73,8 +73,8 @@  void test_strnlen_string_cst (void)
   T (3, "12",  3, 1);
   T (3, "12",  3, 9);
   T (3, "123", 3, 1);
-  T (3, "123", 3, 4);               /* { dg-warning "argument 1 declared attribute .nonstring. is smaller than the specified bound 4" "bug 86688" { xfail *-*-* } } */
-  T (3, "123", 3, 9);               /* { dg-warning "argument 1 declared attribute .nonstring. is smaller than the specified bound 9" "bug 86688" { xfail *-*-* } } */
+  T (3, "123", 3, 4);               /* { dg-warning "argument 1 declared attribute .nonstring. is smaller than the specified bound 4" } */
+  T (3, "123", 3, 9);               /* { dg-warning "argument 1 declared attribute .nonstring. is smaller than the specified bound 9" } */
 
   T (5, "1",   2, 1);
   T (5, "1",   2, 2);
@@ -110,6 +110,6 @@  void test_strnlen_string_range (void)
 {
   T (3, "1",   2, UR (0, 1));
   T (3, "1",   2, UR (3, 9));
-  T (3, "123", 3, UR (4, 5));       /* { dg-warning "argument 1 declared attribute .nonstring. is smaller than the specified bound \\\[4, 5]" "bug 86688" { xfail *-*-* } } */
-  T (3, "123", 3, UR (5, 9));       /* { dg-warning "argument 1 declared attribute .nonstring. is smaller than the specified bound \\\[5, 9]" "bug 86688" { xfail *-*-* } } */
+  T (3, "123", 3, UR (4, 5));       /* { dg-warning "argument 1 declared attribute .nonstring. is smaller than the specified bound \\\[4, 5]" } */
+  T (3, "123", 3, UR (5, 9));       /* { dg-warning "argument 1 declared attribute .nonstring. is smaller than the specified bound \\\[5, 9]" } */
 }
diff --git a/gcc/testsuite/gcc.dg/strlenopt-70.c b/gcc/testsuite/gcc.dg/strlenopt-70.c
new file mode 100644
index 00000000000..59e1081c9b5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/strlenopt-70.c
@@ -0,0 +1,288 @@ 
+/* PR tree-optimization/91183 - strlen of a strcpy result with a conditional
+   source not folded
+   Test to verify that strlen can determine string lengths from wider stores
+   than narrow characters.  This matters because on targets that can handle
+   unaligned stores and where GCC lowers multi-character stores into smaller
+   numbers of wider stores.
+   { dg-do compile }
+   { dg-options "-O2 -fdump-tree-optimized" }  */
+
+#include "strlenopt.h"
+
+#define CHAR_BIT __CHAR_BIT__
+
+typedef __INT16_TYPE__ int16_t;
+typedef __INT32_TYPE__ int32_t;
+typedef __INT64_TYPE__  int64_t;
+typedef __UINT64_TYPE__ uint64_t;
+typedef __uint128_t     uint128_t;
+
+#define CAT(x, y) x ## y
+#define CONCAT(x, y) CAT (x, y)
+#define FAILNAME(name) CONCAT (call_ ## name ##_on_line_, __LINE__)
+
+#define FAIL(name) do {				\
+    extern void FAILNAME (name) (void);		\
+    FAILNAME (name)();				\
+  } while (0)
+
+/* Macros to emit a call to function named
+     call_failed_to_be_eliminated_on_line_NNN()
+   for each call that's expected to be eliminated.  The dg-final
+   scan-tree-dump-time directive at the bottom of the test verifies
+   that no such call appears in output.  */
+#define ELIM(expr)					\
+  if ((expr)) FAIL (not_eliminated); else (void)0
+
+/* Verify that 'strlen (A) EXPECT' is folded to true.  When non-null,
+   the first sizeof (INIT) - 1 bytes of the INIT arrray are stored
+   in A first, followed by *(TYPE*)A = ASSIGN.  */
+#define T(init, type, off, assign, expect) do {			\
+    char a[32];							\
+    memcpy (a, init ? init : "", init ? sizeof init - 1 : 0);	\
+    *(type*)(a + off) = assign;					\
+    ELIM (!(strlen (a) expect));				\
+  } while (0)
+
+/* Same as above but the assignment consisting of the two quadwords
+   QW1 and QW2 to support int128_t.  */
+#define T2(init, type, off, qw0, qw1, expect) do {			\
+    char a[32];								\
+    memcpy (a, init ? init : "", init ? sizeof init - 1: 0);		\
+    type assign = ((type)qw0 << (sizeof (type) * CHAR_BIT / 2)) | (type)qw1; \
+    *(type*)(a + off) = assign;						\
+    ELIM (!(strlen (a) expect));					\
+  } while (0)
+
+/* Same as T but works around the optimizer dropping the initializing
+   store before the assignment and defeating the strlen optimization.  */
+#define TX(init, type, off, assign, expect) do {		\
+    char a[32];							\
+    strcpy (a, init + 2);					\
+    strcat (a, init + sizeof (init) - 3);			\
+    *(type*)(a + off) = assign;					\
+    ELIM (!(strlen (a) expect));				\
+  } while (0)
+
+
+#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+#  define I16(s) ((s[0] << 8) + s[1])
+#  define I32(s) ((s[0] << 24) + (s[1] << 16) + (s[2] << 8) + s[3])
+#  define I64(s)						\
+  (((uint64_t)s[0] << 56)					\
+   + ((uint64_t)s[1] << 48)					\
+   + ((uint64_t)s[2] << 40)					\
+   + ((uint64_t)s[3] << 32)					\
+   + ((uint64_t)s[4] << 24)					\
+   + ((uint64_t)s[5] << 16)					\
+   + ((uint64_t)s[6] << 8)					\
+   + s[7])
+#elif __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+#  define I16(s) ((s[1] << 8) + s[0])
+#  define I32(s) ((s[3] << 24) + (s[2] << 16) + (s[1] << 8) + s[0])
+#  define I64(s)						\
+  (((uint64_t)s[7] << 56)					\
+   + ((uint64_t)s[6] << 48)					\
+   + ((uint64_t)s[5] << 40)					\
+   + ((uint64_t)s[4] << 32)					\
+   + ((uint64_t)s[3] << 24)					\
+   + ((uint64_t)s[2] << 16)					\
+   + ((uint64_t)s[1] << 8)					\
+   + s[0])
+#endif
+
+#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+
+void store_16bit_be (void)
+{
+  T ("xxx", int16_t, 0, 0x0001, == 0);
+  T ("xxx", int16_t, 0, 0x0010, == 0);
+  T ("xxx", int16_t, 0, 0x0011, == 0);
+  T ("xxx", int16_t, 0, 0x0100, == 1);
+  T ("xxx", int16_t, 0, 0x1000, == 1);
+  T ("xxx", int16_t, 0, 0x1100, == 1);
+}
+
+#elif __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+
+void store_16bit_le (int i)
+{
+  int16_t x0000 = I16 ("\0\0");
+  int16_t x0001 = 0x0001;
+  int16_t x0010 = 0x0010;
+  int16_t x0011 = 0x0011;
+  int16_t x0100 = 0x0100;
+  int16_t x1000 = 0x1000;
+  int16_t x1100 = 0x1100;
+
+  T (0,        int16_t, 0, x0000, == 0);
+  T ("x",      int16_t, 0, x0000, == 0);
+  T ("xx",     int16_t, 0, x0000, == 0);
+  T ("xxxx",   int16_t, 0, x0000, == 0);
+  T (0,        int16_t, 0, x0001, == 1);
+  T ("\0\0\0", int16_t, 0, x0001, == 1);
+  T (0,        int16_t, 0, x0010, == 1);
+  T ("x\0\0",  int16_t, 0, x0010, == 1);
+  T (0,        int16_t, 0, x0011, == 1);
+  T ("xx\0",   int16_t, 0, x0011, == 1);
+  T (0,        int16_t, 0, x0100, == 0);
+  T ("\0\0\0", int16_t, 0, x0100, == 0);
+  T (0,        int16_t, 0, x1000, == 0);
+  T ("x\0\0",  int16_t, 0, x1000, == 0);
+  T (0,        int16_t, 0, x1100, == 0);
+  T ("xx\0",   int16_t, 0, x1100, == 0);
+
+  // FIXME: This fails because of the next test but succeeds on its own.
+  // T (0,        int16_t, 0, i ? x0001 : x0010, == 1);
+  T ("xxx",    int16_t, 0, i ? x0100 : x1100, == 0);
+}
+
+#endif
+
+void store_32bit (volatile int i)
+{
+  T (0,      int32_t, 0, 0, == 0);
+  T ("x",    int32_t, 0, 0, == 0);
+  T ("xx",   int32_t, 0, 0, == 0);
+  T ("xxx",  int32_t, 0, 0, == 0);
+  T ("xxxx", int32_t, 0, 0, == 0);
+
+  T ("\0",   int32_t, 1, 0, == 0);
+  T ("x",    int32_t, 1, 0, == 1);
+  T ("xx",   int32_t, 2, 0, == 2);
+  T ("xxx",  int32_t, 3, 0, == 3);
+
+  T ("xxx",  int32_t, 0, I32 ("\01\0\0\0"), == 1);
+  T ("xxx",  int32_t, 0, I32 ("\0\01\0\0"), == 0);
+  T ("xxx",  int32_t, 0, I32 ("\0\0\01\0"), == 0);
+  T ("xxx",  int32_t, 0, I32 ("\0\0\0\01"), == 0);
+
+  T ("xxx",  int32_t, 0, I32 ("\1\2\0\0"), == 2);
+  T ("xxx",  int32_t, 0, I32 ("\0\1\2\0"), == 0);
+  T ("xxx",  int32_t, 0, I32 ("\0\0\1\2"), == 0);
+
+  T ("xxx",  int32_t, 0, I32 ("\1\2\3\0"), == 3);
+  T ("xxx",  int32_t, 0, I32 ("\0\1\2\3"), == 0);
+
+  int32_t x00332211 = I32 ("123\0");
+  int32_t x00002211 = I32 ("12\0\0");
+  int32_t x00000011 = I32 ("1\0\0\0");
+
+  T ("xxxx", int32_t, 0, i ? x00332211 : x00002211, <= 3);
+  T ("xxxx", int32_t, 0, i ? x00332211 : x00002211, >= 2);
+  T ("xxxx", int32_t, 0, i ? x00332211 : x00000011, <= 3);
+  T ("xxxx", int32_t, 0, i ? x00332211 : x00000011, >= 1);
+
+  TX ("abcde",  int32_t, 0, i ? I32 ("1234") : I32 ("1235"), == 5);
+  TX ("abcde",  int32_t, 1, i ? I32 ("1234") : I32 ("1235"), == 5);
+
+  TX ("abcdef", int32_t, 0, i ? I32 ("1235") : I32 ("1234"), == 6);
+  TX ("abcdef", int32_t, 1, i ? I32 ("1235") : I32 ("1234"), == 6);
+  TX ("abcdef", int32_t, 2, i ? I32 ("1235") : I32 ("1234"), == 6);
+  TX ("abcdef", int32_t, 3, i ? I32 ("124\0") : I32 ("123\0"), == 6);
+  TX ("abcdef", int32_t, 3, i ? I32 ("12\0\0") : I32 ("13\0\0"), == 5);
+
+  TX ("abcdef", int32_t, 3, i ? I32 ("12\0\0") : I32 ("123\0"), >= 5);
+  TX ("abcdef", int32_t, 3, i ? I32 ("12\0\0") : I32 ("123\0"), < 7);
+}
+
+void store_64bit (int i)
+{
+  T2 ("xxxxxxx", int64_t, 0,                0, I32 ("\1\0\0\0"), == 1);
+  T2 ("xxxxxxx", int64_t, 0,                0, I32 ("\0\1\0\0"), == 0);
+  T2 ("xxxxxxx", int64_t, 0,                0, I32 ("\0\0\1\0"), == 0);
+  T2 ("xxxxxxx", int64_t, 0,                0, I32 ("\0\00\0\1"), == 0);
+  T2 ("xxxxxxx", int64_t, 0, I32 ("\1\0\0\0"), 0, == 0);
+  T2 ("xxxxxxx", int64_t, 0, I32 ("\0\1\0\0"), 0, == 0);
+  T2 ("xxxxxxx", int64_t, 0, I32 ("\0\0\1\0"), 0, == 0);
+  T2 ("xxxxxxx", int64_t, 0, I32 ("\0\0\0\1"), 0, == 0);
+
+  T2 ("xxxxxxx", int64_t, 0, 0, I32 ("\1\2\0\0"), == 2);
+  T2 ("xxxxxxx", int64_t, 0, 0, I32 ("\0\1\2\0"), == 0);
+  T2 ("xxxxxxx", int64_t, 0, 0, I32 ("\0\0\1\2"), == 0);
+
+  T2 ("xxxxxxx", int64_t, 0, 0, I32 ("\1\2\3\0"), == 3);
+  T2 ("xxxxxxx", int64_t, 0, 0, I32 ("\0\1\2\3"), == 0);
+
+  T2 ("xxxxxxx", int64_t, 0, 0, I32 ("\1\2\3\4"), == 4);
+  T2 ("xxxxxxx", int64_t, 0, I32 ("\5\0\0\0"), I32 ("\1\2\3\4"), == 5);
+  T2 ("xxxxxxx", int64_t, 0, I32 ("\5\6\0\0"), I32 ("\1\2\3\4"), == 6);
+  T2 ("xxxxxxx", int64_t, 0, I32 ("\5\6\7\0"), I32 ("\1\2\3\4"), == 7);
+
+  int64_t x7777777 = I64 ("\7\7\7\7\7\7\7");
+  int64_t x666666 = I64 ("\6\6\6\6\6\6\0");
+  int64_t x4444 = I64 ("\4\4\4\4\0\0\0");
+  int64_t x3333 = I64 ("\3\3\3\3\0\0\0");
+  int64_t x1 = I64 ("\1\0\0\0\0\0\0");
+
+  T ("x\0xxxxxx", int64_t, 0, i ? x7777777 : x666666, <= 7);
+  T ("xx\0xxxxx", int64_t, 0, i ? x7777777 : x666666, >= 6);
+  T ("xxx\0xxxx", int64_t, 0, i ? x666666 : x1, <= 6);
+  T ("xxxx\0xxx", int64_t, 0, i ? x666666 : x1, >= 1);
+  T ("xxxxxx\0x", int64_t, 0, i ? x4444 : x3333, == 4);
+}
+
+void store_128bit (void)
+{
+  uint64_t x1 = I64 ("\1\0\0\0\0\0\0\0");
+  uint64_t x01 = I64 ("\0\1\0\0\0\0\0\0");
+  uint64_t x001 = I64 ("\0\0\1\0\0\0\0\0");
+  uint64_t x0001 = I64 ("\0\0\0\1\0\0\0\0");
+  uint64_t x00001 = I64 ("\0\0\0\0\1\0\0\0");
+  uint64_t x000001 = I64 ("\0\0\0\0\0\1\0\0");
+  uint64_t x0000001 = I64 ("\0\0\0\0\0\0\1\0");
+  uint64_t x00000001 = I64 ("\0\0\0\0\0\0\0\1");
+
+  T2 ("xxxxxxx", uint128_t, 0, 0,        x1, == 1);
+  T2 ("xxxxxxx", uint128_t, 0, 0,       x01, == 0);
+  T2 ("xxxxxxx", uint128_t, 0, 0,      x001, == 0);
+  T2 ("xxxxxxx", uint128_t, 0, 0,     x0001, == 0);
+  T2 ("xxxxxxx", uint128_t, 0, 0,    x00001, == 0);
+  T2 ("xxxxxxx", uint128_t, 0, 0,   x000001, == 0);
+  T2 ("xxxxxxx", uint128_t, 0, 0,  x0000001, == 0);
+  T2 ("xxxxxxx", uint128_t, 0, 0, x00000001, == 0);
+
+  T2 ("xxxxxxx", uint128_t, 0,        x1, 0, == 0);
+  T2 ("xxxxxxx", uint128_t, 0,       x01, 0, == 0);
+  T2 ("xxxxxxx", uint128_t, 0,      x001, 0, == 0);
+  T2 ("xxxxxxx", uint128_t, 0,     x0001, 0, == 0);
+  T2 ("xxxxxxx", uint128_t, 0,    x00001, 0, == 0);
+  T2 ("xxxxxxx", uint128_t, 0,   x000001, 0, == 0);
+  T2 ("xxxxxxx", uint128_t, 0,  x0000001, 0, == 0);
+  T2 ("xxxxxxx", uint128_t, 0, x00000001, 0, == 0);
+
+  T2 ("xxxxxxx", uint128_t, 0, 0, I64 ("\2\1\0\0\0\0\0\0"), == 2);
+  T2 ("xxxxxxx", uint128_t, 0, 0, I64 ("\0\2\1\0\0\0\0\0"), == 0);
+
+  T2 ("xxxxxxx", uint128_t, 0, 0, I64 ("\3\2\1\0\0\0\0\0"), == 3);
+  T2 ("xxxxxxx", uint128_t, 0, 0, I64 ("\0\3\2\1\0\0\0\0"), == 0);
+
+  uint64_t x4321     = I64 ("\4\3\2\1\0\0\0\0");
+  uint64_t x54321    = I64 ("\5\4\3\2\1\0\0\0");
+  uint64_t x654321   = I64 ("\6\5\4\3\2\1\0\0");
+  uint64_t x7654321  = I64 ("\7\6\5\4\3\2\1\0");
+  uint64_t x87654321 = I64 ("8\7\6\5\4\3\2\1");
+  uint64_t x9        = I64 ("9\0\0\0\0\0\0\0");
+  uint64_t xa9       = I64 ("a9\0\0\0\0\0\0");
+  uint64_t xba9      = I64 ("ba9\0\0\0\0\0\0");
+  uint64_t xcba9     = I64 ("cba9\0\0\0\0\0");
+  uint64_t xdcba9    = I64 ("dcba9\0\0\0\0");
+  uint64_t xedcba9   = I64 ("edcba9\0\0\0\0");
+  uint64_t xfedcba9  = I64 ("fedcba9\0\0\0");
+
+  T2 (0, uint128_t, 0,        0,     x4321, ==  4);
+  T2 (0, uint128_t, 0,        0,    x54321, ==  5);
+  T2 (0, uint128_t, 0,        0,   x654321, ==  6);
+  T2 (0, uint128_t, 0,        0,  x7654321, ==  7);
+  T2 (0, uint128_t, 0,        0, x87654321, ==  8);
+  T2 (0, uint128_t, 0,       x9, x87654321, ==  9);
+  T2 (0, uint128_t, 0,      xa9, x87654321, == 10);
+  T2 (0, uint128_t, 0,     xba9, x87654321, == 11);
+  T2 (0, uint128_t, 0,    xcba9, x87654321, == 12);
+  T2 (0, uint128_t, 0,   xdcba9, x87654321, == 13);
+  T2 (0, uint128_t, 0,  xedcba9, x87654321, == 14);
+  T2 (0, uint128_t, 0, xfedcba9, x87654321, == 15);
+}
+
+/* { dg-final { scan-tree-dump-times "strlen" 0 "optimized" } }
+   { dg-final { scan-tree-dump-times "_not_eliminated_" 0 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/strlenopt-71.c b/gcc/testsuite/gcc.dg/strlenopt-71.c
new file mode 100644
index 00000000000..0e6485c52cc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/strlenopt-71.c
@@ -0,0 +1,214 @@ 
+/* PR tree-optimization/91183 - strlen of a strcpy result with a conditional
+   source not folded
+   Runtime test to verify that multibyte stores are handled correctly.
+   { dg-do run }
+   { dg-options "-O2 -Wall" }  */
+
+#include "strlenopt.h"
+
+#define CHAR_BIT __CHAR_BIT__
+
+typedef __INT16_TYPE__ int16_t;
+typedef __INT32_TYPE__ int32_t;
+typedef __INT64_TYPE__ int64_t;
+typedef __uint128_t    uint128_t;
+
+#define NOIPA __attribute__ ((noclone, noinline, noipa))
+
+/* Prevent the optimizer from detemining invariants from prior tests.  */
+NOIPA void terminate (void)
+{
+  __builtin_abort ();
+}
+
+#define VERIFY(expr, str)						\
+  do {									\
+    const unsigned expect = strlen (str);				\
+    const unsigned len = strlen (expr);					\
+    if (len != expect)							\
+      {									\
+	__builtin_printf ("line %i: strlen(%s) == %u failed: "		\
+			  "got %u with a = \"%.*s\"\n",			\
+			  __LINE__, #expr, expect, len,			\
+			  (int)sizeof a, a);				\
+	terminate ();							\
+      }									\
+    if (memcmp (a, str, expect + 1))					\
+      {									\
+	__builtin_printf ("line %i: expected string \"%s\", "		\
+			  "got a = \"%.*s\"\n",				\
+			  __LINE__, str, (int)sizeof a, a);		\
+	terminate ();							\
+      }									\
+  } while (0)
+
+#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+#  define I16(s) ((s[0] << 8) + s[1])
+#  define I32(s) ((s[0] << 24) + (s[1] << 16) + (s[2] << 8) + s[3])
+#elif __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+#  define I16(s) ((s[1] << 8) + s[0])
+#  define I32(s) ((s[3] << 24) + (s[2] << 16) + (s[1] << 8) + s[0])
+#endif
+
+char a[32];
+
+NOIPA void
+i16_1 (void)
+{
+  *(int16_t*)a = I16 ("12");
+  *(int16_t*)(a + 2) = I16 ("3");
+  VERIFY (a, "123");
+
+  *(int16_t*)(a + 1) = I16 ("23");
+  VERIFY (a, "123");
+
+  *(int16_t*)(a) = I16 ("12");
+  VERIFY (a, "123");
+
+  *(int16_t*)(a + 1) = I16 ("2");
+  VERIFY (a, "12");
+
+  *(int16_t*)(a + 3) = I16 ("45");
+  *(int16_t*)(a + 2) = I16 ("34");
+  VERIFY (a, "12345");
+}
+
+NOIPA void
+i16_2 (void)
+{
+  strcpy (a, "12");
+  strcat (a, "34");
+
+  *(int16_t*)a = I16 ("12");
+  VERIFY (a, "1234");
+
+  *(int16_t*)(a + 1) = I16 ("12");
+  VERIFY (a, "1124");
+
+  *(int16_t*)(a + 2) = I16 ("12");
+  VERIFY (a, "1112");
+
+  *(int16_t*)(a + 3) = I16 ("12");
+  VERIFY (a, "11112");
+
+  *(int16_t*)(a + 4) = I16 ("12");
+  VERIFY (a, "111112");
+}
+
+
+NOIPA void
+i32_1 (void)
+{
+  *(int32_t*)a = I32 ("1234");
+  VERIFY (a, "1234");
+
+  *(int32_t*)(a + 1) = I32 ("2345");
+  VERIFY (a, "12345");
+}
+
+NOIPA void
+i32_2 (void)
+{
+  strcpy (a, "12");
+  strcat (a, "34");
+
+  *(int32_t*)a = I32 ("1234");
+  VERIFY (a, "1234");
+
+  *(int32_t*)(a + 4) = I32 ("567");
+  VERIFY (a, "1234567");
+
+  *(int32_t*)(a + 7) = I32 ("89\0");
+  VERIFY (a, "123456789");
+
+  *(int32_t*)(a + 3) = I32 ("4567");
+  VERIFY (a, "123456789");
+
+  *(int32_t*)(a + 2) = I32 ("3456");
+  VERIFY (a, "123456789");
+
+  *(int32_t*)(a + 1) = I32 ("2345");
+  VERIFY (a, "123456789");
+}
+
+
+NOIPA void
+i32_3 (void)
+{
+  strcpy (a, "1234");
+  strcat (a, "5678");
+
+  *(int32_t*)a = I32 ("1234");
+  VERIFY (a, "12345678");
+
+  *(int32_t*)(a + 1) = I32 ("234");
+  VERIFY (a, "1234");
+
+  *(int32_t*)(a + 2) = I32 ("3456");
+  VERIFY (a, "12345678");
+
+  *(int32_t*)(a + 3) = I32 ("4567");
+  VERIFY (a, "12345678");
+
+  *(int32_t*)(a + 4) = I32 ("5678");
+  VERIFY (a, "12345678");
+
+  *(int32_t*)(a + 5) = I32 ("6789");
+  VERIFY (a, "123456789");
+
+  *(int32_t*)(a + 6) = I32 ("789A");
+  VERIFY (a, "123456789A");
+}
+
+volatile int vzero = 0;
+
+NOIPA void
+i32_4 (void)
+{
+  strcpy (a, "1234");
+  strcat (a, "5678");
+
+  *(int32_t*)a = vzero ? I32 ("1\0\0\0") : I32 ("1234");
+  VERIFY (a, "12345678");
+
+  *(int32_t*)a = vzero ? I32 ("12\0\0") : I32 ("1234");
+  VERIFY (a, "12345678");
+
+  *(int32_t*)a = vzero ? I32 ("123\0") : I32 ("1234");
+  VERIFY (a, "12345678");
+
+  *(int32_t*)a = vzero ? I32 ("1234") : I32 ("1234");
+  VERIFY (a, "12345678");
+
+  *(int32_t*)a = vzero ? I32 ("1235") : I32 ("1234");
+  VERIFY (a, "12345678");
+
+  *(int32_t*)a = vzero ? I32 ("1234") : I32 ("123\0");
+  VERIFY (a, "123");
+
+  *(int32_t*)(a + 3) = vzero ? I32 ("456\0") : I32 ("4567");
+  VERIFY (a, "12345678");
+}
+
+
+int main ()
+{
+  memset (a, 0, sizeof a);
+  i16_1 ();
+
+  memset (a, 0, sizeof a);
+  i16_2 ();
+
+
+  memset (a, 0, sizeof a);
+  i32_1 ();
+
+  memset (a, 0, sizeof a);
+  i32_2 ();
+
+  memset (a, 0, sizeof a);
+  i32_3 ();
+
+  memset (a, 0, sizeof a);
+  i32_4 ();
+}
diff --git a/gcc/testsuite/gcc.dg/strlenopt-72.c b/gcc/testsuite/gcc.dg/strlenopt-72.c
new file mode 100644
index 00000000000..9a565af9f07
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/strlenopt-72.c
@@ -0,0 +1,71 @@ 
+/* PR tree-optimization/91183 - strlen of a strcpy result with a conditional
+   source not folded
+   Test to verify that strlen can determine string lengths from wider stores
+   than narrow characters.  This matters because on targets that can handle
+   unaligned stores and where GCC lowers multi-character stores into smaller
+   numbers of wider stores.
+   { dg-do compile }
+   { dg-options "-O2 -fdump-tree-optimized" }  */
+
+#include "strlenopt.h"
+
+#define CHAR_BIT __CHAR_BIT__
+
+typedef __INT16_TYPE__ int16_t;
+typedef __INT32_TYPE__ int32_t;
+typedef __INT64_TYPE__ int64_t;
+typedef __uint128_t    uint128_t;
+
+#define CAT(x, y) x ## y
+#define CONCAT(x, y) CAT (x, y)
+#define FAILNAME(name) CONCAT (call_ ## name ##_on_line_, __LINE__)
+
+#define FAIL(name) do {				\
+    extern void FAILNAME (name) (void);		\
+    FAILNAME (name)();				\
+  } while (0)
+
+/* Macros to emit a call to function named
+     call_failed_to_be_eliminated_on_line_NNN()
+   for each call that's expected to be eliminated.  The dg-final
+   scan-tree-dump-time directive at the bottom of the test verifies
+   that no such call appears in output.  */
+#define ELIM(expr)					\
+  if ((expr)) FAIL (not_eliminated); else (void)0
+
+#undef T
+#define T(N, ncpy, expect, assign) do {				\
+    char a[N], b[N];						\
+    assign;							\
+    memcpy (b, a, ncpy);					\
+    ELIM (!(expect == strlen (b)));				\
+  } while (0)
+
+void test_copy (void)
+{
+  T (2, 1, 0, (a[0] = 0));
+  T (2, 2, 0, (a[0] = 0, a[1] = 0));
+  T (2, 2, 1, (a[0] = '1', a[1] = 0));
+  T (4, 3, 2, (a[0] = '1', a[1] = '2', a[2] = 0));
+  // Not handled due to pr83821:
+  // T (4, 3, 1, (a[0] = '1', a[1] = 0, a[2] = '2'));
+  T (4, 2, 1, (a[0] = '1', a[1] = 0,   a[2] = 0,   a[3] = 0));
+  // Not handled due to pr83821:
+  // T (4, 3, 1, (a[0] = '1', a[1] = 0,   a[2] = 0,   a[3] = 0));
+  T (4, 4, 1, (a[0] = '1', a[1] = 0,   a[2] = 0,   a[3] = 0));
+  T (4, 3, 2, (a[0] = '1', a[1] = '2', a[2] = 0,   a[3] = 0));
+  T (4, 4, 2, (a[0] = '1', a[1] = '2', a[2] = 0,   a[3] = 0));
+  T (4, 4, 3, (a[0] = '1', a[1] = '2', a[2] = '3', a[3] = 0));
+  T (5, 4, 1, (a[0] = '1', a[1] = 0,   a[2] = 0,   a[3] = 0));
+  T (5, 4, 2, (a[0] = '1', a[1] = '2', a[2] = 0,   a[3] = 0));
+  T (5, 4, 3, (a[0] = '1', a[1] = '2', a[2] = '3', a[3] = 0));
+  // Note handled:
+  // T (5, 5, 1, (a[0] = '1', a[1] = 0,   a[2] = 0,   a[3] = 0,   a[4] = 0));
+  // T (5, 5, 2, (a[0] = '1', a[1] = '2', a[2] = 0,   a[3] = 0,   a[4] = 0));
+  // T (5, 5, 3, (a[0] = '1', a[1] = '2', a[2] = '3', a[3] = 0,   a[4] = 0));
+  T (5, 5, 4, (a[0] = '1', a[1] = '2', a[2] = '3', a[3] = '4', a[4] = 0));
+}
+
+
+/* { dg-final { scan-tree-dump-times "strlen" 0 "optimized" } }
+   { dg-final { scan-tree-dump-times "_not_eliminated_" 0 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/strlenopt-8.c b/gcc/testsuite/gcc.dg/strlenopt-8.c
index 85c6d38a965..f43b809ed02 100644
--- a/gcc/testsuite/gcc.dg/strlenopt-8.c
+++ b/gcc/testsuite/gcc.dg/strlenopt-8.c
@@ -43,13 +43,7 @@  main ()
   return 0;
 }
 
-/* On non-strict-align targets we inline the memcpy that strcat is turned
-   into and end up with a short typed load / store which strlenopt is not
-   able to analyze.  */
-
-/* { dg-final { scan-tree-dump-times "strlen \\(" 0 "strlen" { xfail non_strict_align } } } */
-/* { dg-final { scan-tree-dump-times "memcpy \\(" 2 "strlen" { target { non_strict_align } } } } */
-/* { dg-final { scan-tree-dump-times "memcpy \\(" 4 "strlen" { target { ! non_strict_align } } } }  */
+/* { dg-final { scan-tree-dump-times "strlen \\(" 0 "strlen" } } */
 /* { dg-final { scan-tree-dump-times "strcpy \\(" 0 "strlen" } } */
 /* { dg-final { scan-tree-dump-times "strcat \\(" 0 "strlen" } } */
 /* { dg-final { scan-tree-dump-times "strchr \\(" 0 "strlen" } } */
diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
index 88b6bd7869e..ca1aca3ed9e 100644
--- a/gcc/tree-ssa-strlen.c
+++ b/gcc/tree-ssa-strlen.c
@@ -3328,78 +3328,289 @@  handle_pointer_plus (gimple_stmt_iterator *gsi)
     }
 }
 
-/* If RHS, either directly or indirectly, refers to a string of constant
-   length, return the length.  Otherwise, if it refers to a character
-   constant, return 1 if the constant is non-zero and 0 if it is nul.
-   Otherwise, return a negative value.  */
+/* Describes recursion limits used by count_nonzero_bytes.  */
 
-static HOST_WIDE_INT
-get_min_string_length (tree rhs, bool *full_string_p)
+class ssa_name_limit_t
 {
-  if (INTEGRAL_TYPE_P (TREE_TYPE (rhs)))
+  bitmap visited;         /* Bitmap of visited SSA_NAMEs.  */
+  unsigned ssa_def_max;   /* Longest chain of SSA_NAMEs to follow.  */
+
+  /* Not copyable or assignable.  */
+  ssa_name_limit_t (ssa_name_limit_t&);
+  void operator= (ssa_name_limit_t&);
+
+ public:
+
+  ssa_name_limit_t ()
+    : visited (NULL),
+    ssa_def_max (PARAM_VALUE (PARAM_SSA_NAME_DEF_CHAIN_LIMIT)) { }
+
+  int next_ssa_name (tree);
+
+  ~ssa_name_limit_t ()
     {
-      if (tree_expr_nonzero_p (rhs))
+      if (visited)
+	BITMAP_FREE (visited);
+    }
+};
+
+/* If the SSA_NAME has already been "seen" return a positive value.
+   Otherwise add it to VISITED.  If the SSA_NAME limit has been
+   reached, return a negative value.  Otherwise return zero.  */
+
+int ssa_name_limit_t::next_ssa_name (tree ssa_name)
+{
+  if (!visited)
+    visited = BITMAP_ALLOC (NULL);
+
+  /* Return a positive value if SSA_NAME has already been visited.  */
+  if (!bitmap_set_bit (visited, SSA_NAME_VERSION (ssa_name)))
+    return 1;
+
+  /* Return a negative value to let caller avoid recursing beyond
+     the specified limit.  */
+  if (ssa_def_max == 0)
+    return -1;
+
+  --ssa_def_max;
+
+  return 0;
+}
+
+/* Determine the minimum and maximum number of leading non-zero bytes
+   in the representation of EXP and set LENRANGE[0] and LENRANGE[1]
+   to each.  Set LENRANGE[2] to the total number of bytes in
+   the representation.  Set *NULTREM if the representation contains
+   a zero byte, and set *ALLNUL if all the bytes are zero.  Avoid
+   recursing deeper than the limits in SNLIM allow.  Return true
+   on success and false otherwise.  */
+
+static bool
+count_nonzero_bytes (tree exp, unsigned lenrange[3], bool *nulterm,
+		     bool *allnul, bool *allnonnul, ssa_name_limit_t &snlim)
+{
+  if (TREE_CODE (exp) == SSA_NAME)
+    {
+      /* Handle a single-character specially.  */
+      tree type = TREE_TYPE (exp);
+      if (TREE_CODE (type) == INTEGER_TYPE
+	  && TYPE_MODE (type) == TYPE_MODE (char_type_node)
+	  && TYPE_PRECISION (type) == TYPE_PRECISION (char_type_node))
 	{
-	  *full_string_p = false;
-	  return 1;
+	  /* Determine if the character EXP is known to be non-zero
+	     (even if its exact value is not known) and if so, recurse
+	     once to set the range, etc.  */
+	  if (tree_expr_nonzero_p (exp))
+	    return count_nonzero_bytes (build_int_cst (type, 1), lenrange,
+					nulterm, allnul, allnonnul, snlim);
+	  /* Don't know whether EXP is or isn't nonzero.  */
+	  return false;
 	}
 
-      *full_string_p = true;
-      return 0;
+      gimple *stmt = SSA_NAME_DEF_STMT (exp);
+      if (gimple_code (stmt) != GIMPLE_PHI)
+	return false;
+
+      /* Avoid processing an SSA_NAME that has already been visited
+	 or if an SSA_NAME limit has been reached.  Indicate success
+	 if the former and failure if the latter.  */
+      if (int res = snlim.next_ssa_name (exp))
+	return res > 0;
+
+      /* Determine the minimum and maximum from the PHI arguments.  */
+      unsigned int n = gimple_phi_num_args (stmt);
+      for (unsigned i = 0; i != n; i++)
+	{
+	  tree def = gimple_phi_arg_def (stmt, i);
+	  if (!count_nonzero_bytes (def, lenrange, nulterm, allnul, allnonnul,
+				    snlim))
+	    return false;
+	}
+
+      return true;
     }
 
-  if (TREE_CODE (rhs) == MEM_REF
-      && integer_zerop (TREE_OPERAND (rhs, 1)))
+  /* Offset from the beginning of the representation bytes, a pointer
+     to the representation, and the number of bytes of the representation
+     to consider (may be less than the object size for MEM_REF).  */
+  unsigned HOST_WIDE_INT offset = 0;
+  const char *prep = NULL;
+  unsigned nbytes = 0;
+
+  if (TREE_CODE (exp) == MEM_REF)
     {
-      rhs = TREE_OPERAND (rhs, 0);
-      if (TREE_CODE (rhs) == ADDR_EXPR)
-	{
-	  tree rhs_addr = rhs;
+      /* If the MEM_REF operand is the address of an object such as
+	 a string or integer, extract it and the offset into it.  */
+      tree arg = TREE_OPERAND (exp, 0);
+      if (TREE_CODE (arg) != ADDR_EXPR)
+	return false;
+
+      tree off = TREE_OPERAND (exp, 1);
+      if (TREE_CODE (off) != INTEGER_CST
+	  || !tree_fits_uhwi_p (off))
+	return false;
 
-	  rhs = TREE_OPERAND (rhs, 0);
-	  if (TREE_CODE (rhs) != STRING_CST)
+      offset = tree_to_uhwi (off);
+      if (INT_MAX < offset)
+	return false;
+
+      /* The size of the MEM_REF access determines the number of bytes.  */
+      tree type = TREE_TYPE (exp);
+      if (tree typesize = TYPE_SIZE_UNIT (type))
+	nbytes = tree_to_uhwi (typesize);
+
+      if (offset == 0 && TREE_CODE (exp) != STRING_CST)
+	{
+	  int idx = get_stridx (arg);
+	  if (idx > 0)
 	    {
-	      int idx = get_stridx (rhs_addr);
-	      if (idx > 0)
+	      strinfo *si = get_strinfo (idx);
+	      if (si && tree_fits_shwi_p (si->nonzero_chars))
 		{
-		  strinfo *si = get_strinfo (idx);
-		  if (si
-		      && tree_fits_shwi_p (si->nonzero_chars))
-		    {
-		      *full_string_p = si->full_string_p;
-		      return tree_to_shwi (si->nonzero_chars);
-		    }
+		  unsigned len = tree_to_shwi (si->nonzero_chars);
+		  if (len < lenrange[0])
+		    lenrange[0] = len;
+		  if (lenrange[1] < len)
+		    lenrange[1] = len;
+		  if (lenrange[2] < len + 1)
+		    lenrange[2] = len + 1;
+
+		  if (!si->full_string_p)
+		    *nulterm = false;
+
+		  /* Since only the length of the string are known and
+		     its contents, clear ALLNUL and ALLNONNUL purely on
+		     the basis of the length.  */
+		  if (len)
+		    *allnul = false;
+		  else
+		    *allnonnul = false;
+		  return true;
 		}
 	    }
 	}
+
+      /* Proceed to extract the object representation below.  */
+      exp = TREE_OPERAND (arg, 0);
     }
 
-  if (TREE_CODE (rhs) == VAR_DECL
-      && TREE_READONLY (rhs))
-    rhs = DECL_INITIAL (rhs);
+  if (TREE_CODE (exp) == VAR_DECL && TREE_READONLY (exp))
+    {
+      exp = DECL_INITIAL (exp);
+      if (!exp)
+	return false;
+    }
 
-  if (rhs && TREE_CODE (rhs) == STRING_CST)
+  if (TREE_CODE (exp) == STRING_CST)
     {
-      HOST_WIDE_INT len = strlen (TREE_STRING_POINTER (rhs));
-      *full_string_p = len < TREE_STRING_LENGTH (rhs);
-      return len;
+      /* Set PREP and NBYTES to the string representation.  */
+      gcc_assert (offset <= INT_MAX);
+
+      if (!nbytes)
+	{
+	  /* Unless NBYTES has already been determined above from
+	     MEM_REF, set it here.  It includes all internal nuls,
+	     including the terminating one if the string has one.  */
+	  nbytes = TREE_STRING_LENGTH (exp);
+	  if (nbytes <= offset)
+	    return false;
+	}
+
+      prep = TREE_STRING_POINTER (exp) + offset;
     }
 
-  return -1;
+  unsigned char buf[256];
+  if (!prep)
+    {
+      /* Try to extract the representation of the constant object.  */
+      nbytes = native_encode_expr (exp, buf, sizeof buf, -1);
+      if (!nbytes)
+	return false;
+
+      prep = reinterpret_cast <char *>(buf);
+    }
+
+  /* Compute the number of leading nonzero bytes in the representation
+     and update the minimum and maximum.  */
+  unsigned n = strnlen (prep, nbytes);
+
+  if (n < lenrange[0])
+    lenrange[0] = n;
+  if (lenrange[1] < n)
+    lenrange[1] = n;
+
+  /* Set the size of the representation.  */
+  if (lenrange[2] < nbytes)
+    lenrange[2] = nbytes;
+
+  /* Clear NULTERM if none of the bytes is zero.  */
+  if (n == nbytes)
+    *nulterm = false;
+
+  if (n)
+    {
+      /* When the initial number of non-zero bytes N is non-zero, reset
+       *ALLNUL; if N is less than that the size of the representation
+       also clear *ALLNONNUL.  */
+      *allnul = false;
+      if (n < nbytes)
+	*allnonnul = false;
+    }
+  else if (*allnul || *allnonnul)
+    {
+      *allnonnul = false;
+
+      if (*allnul)
+	{
+	  /* When either ALLNUL is set and N is zero, also determine
+	     whether all subsequent bytes after the first one (which
+	     is nul) are zero or nonzero and clear ALLNUL if not.  */
+	  for (const char *p = prep; p != prep + nbytes; ++p)
+	    if (*p)
+	      {
+		*allnul = false;
+		break;
+	      }
+	}
+    }
+
+  return true;
+}
+
+/* Same as above except with an implicit SSA_NAME limit.  */
+
+static bool
+count_nonzero_bytes (tree exp, unsigned lenrange[3], bool *nulterm,
+		     bool *allnul, bool *allnonnul)
+{
+  /* Set to optimistic values so the caller doesn't have to worry about
+     initializing these and to what.  On success, the function will clear
+     these if it determines their values are different but being recursive
+     it never sets either to true.  On failure, their values are
+     unspecified.  */
+  *nulterm = true;
+  *allnul = true;
+  *allnonnul = true;
+
+  ssa_name_limit_t snlim;
+  return count_nonzero_bytes (exp, lenrange, nulterm, allnul, allnonnul, snlim);
 }
 
-/* Handle a single or multiple character store either by single
-   character assignment or by multi-character assignment from
-   MEM_REF.  */
+/* Handle a single or multibyte store other than by a built-in function,
+   either via a single character assignment or by multi-byte assignment
+   either via MEM_REF or via a type other than char (such as in
+   '*(int*)a = 12345').  Return true when handled.  */
 
 static bool
-handle_char_store (gimple_stmt_iterator *gsi)
+handle_store (gimple_stmt_iterator *gsi)
 {
   int idx = -1;
   strinfo *si = NULL;
   gimple *stmt = gsi_stmt (*gsi);
   tree ssaname = NULL_TREE, lhs = gimple_assign_lhs (stmt);
   tree rhs = gimple_assign_rhs1 (stmt);
+
+  /* The offset of the first byte in LHS modified by the the store.  */
   unsigned HOST_WIDE_INT offset = 0;
 
   if (TREE_CODE (lhs) == MEM_REF
@@ -3428,23 +3639,75 @@  handle_char_store (gimple_stmt_iterator *gsi)
 	si = get_strinfo (idx);
     }
 
+  /* Minimum and maximum leading non-zero bytes and the size of the store.  */
+  unsigned lenrange[] = { UINT_MAX, 0, 0 };
+
+  /* Set to the minimum length of the string being assigned if known.  */
+  unsigned HOST_WIDE_INT rhs_minlen;
+
   /* STORING_NONZERO_P is true iff not all stored characters are zero.
+     STORING_ALL_NONZERO_P is true if all stored characters are zero.
      STORING_ALL_ZEROS_P is true iff all stored characters are zero.
      Both are false when it's impossible to determine which is true.  */
   bool storing_nonzero_p;
-  bool storing_all_zeros_p = initializer_zerop (rhs, &storing_nonzero_p);
-  if (!storing_nonzero_p)
-    storing_nonzero_p = tree_expr_nonzero_p (rhs);
-  bool full_string_p = storing_all_zeros_p;
+  bool storing_all_nonzero_p;
+  bool storing_all_zeros_p;
+  /* FULL_STRING_P is set when the stored sequence of characters form
+     a nul-terminated string.  */
+  bool full_string_p;
 
-  /* Set to the length of the string being assigned if known.  */
-  HOST_WIDE_INT rhslen;
+  const bool ranges_valid
+    = count_nonzero_bytes (rhs, lenrange, &full_string_p,
+			   &storing_all_zeros_p, &storing_all_nonzero_p);
+  if (ranges_valid)
+    {
+      rhs_minlen = lenrange[0];
+      storing_nonzero_p = lenrange[1] > 0;
+
+      if (tree dstsize = compute_objsize (lhs, 1))
+	if (compare_tree_int (dstsize, lenrange[2]) < 0)
+	  warning_n (gimple_location (stmt), OPT_Wstringop_overflow_,
+		     lenrange[2],
+		     "%Gwriting %u byte into a region of size %E",
+		     "%Gwriting %u bytes into a region of size %E",
+		     stmt, lenrange[2], dstsize);
+    }
+  else
+    {
+      rhs_minlen = HOST_WIDE_INT_M1U;
+      full_string_p = false;
+      storing_nonzero_p = false;
+      storing_all_zeros_p = false;
+      storing_all_nonzero_p = false;
+    }
 
   if (si != NULL)
     {
-      int cmp = compare_nonzero_chars (si, offset);
-      gcc_assert (offset == 0 || cmp >= 0);
-      if (storing_all_zeros_p && cmp == 0 && si->full_string_p)
+      /* Set to 1 if the string described by SI is being written into
+	 before the terminating nul (if it has one), to zero if the nul
+	 is being overwritten but not beyond, or negative otherwise.  */
+      int store_b4_nul[2];
+      if (ranges_valid)
+	{
+	  /* The offset of the last stored byte.  */
+	  unsigned HOST_WIDE_INT endoff = offset + lenrange[2] - 1;
+	  store_b4_nul[0] = compare_nonzero_chars (si, offset);
+	  if (endoff == offset)
+	    store_b4_nul[1] = store_b4_nul[0];
+	  else
+	    store_b4_nul[1] = compare_nonzero_chars (si, endoff);
+	}
+      else
+	{
+	  store_b4_nul[0] = compare_nonzero_chars (si, offset);
+	  store_b4_nul[1] = store_b4_nul[0];
+	  gcc_assert (offset == 0 || store_b4_nul[0] >= 0);
+	}
+
+      if (storing_all_zeros_p
+	  && store_b4_nul[0] == 0
+	  && store_b4_nul[1] == 0
+	  && si->full_string_p)
 	{
 	  /* When overwriting a '\0' with a '\0', the store can be removed
 	     if we know it has been stored in the current function.  */
@@ -3463,16 +3726,23 @@  handle_char_store (gimple_stmt_iterator *gsi)
 	    }
 	}
 
-      if (cmp > 0
+      if (store_b4_nul[1] > 0
 	  && storing_nonzero_p
+	  /* */
+	  && (!ranges_valid
+	      || (lenrange[0] == lenrange[1]
+		  && lenrange[0] == lenrange[2]))
 	  && TREE_CODE (TREE_TYPE (rhs)) == INTEGER_TYPE)
 	{
-	  /* Handle a single non-nul character store.
+	  /* Handle a store of one or more non-nul characters that ends
+	     before the terminating nul of the destination and so does
+	     not affect its length
 	     If si->nonzero_chars > OFFSET, we aren't overwriting '\0',
-	     and if we aren't storing '\0', we know that the length of the
-	     string and any other zero terminated string in memory remains
-	     the same.  In that case we move to the next gimple statement and
-	     return to signal the caller that it shouldn't invalidate anything.
+	     and if we aren't storing '\0', we know that the length of
+	     the string and any other zero terminated string in memory
+	     remains the same.  In that case we move to the next gimple
+	     statement and return to signal the caller that it shouldn't
+	     invalidate anything.
 
 	     This is benefical for cases like:
 
@@ -3493,7 +3763,7 @@  handle_char_store (gimple_stmt_iterator *gsi)
 
       if (storing_all_zeros_p
 	  || storing_nonzero_p
-	  || (offset != 0 && cmp > 0))
+	  || (offset != 0 && store_b4_nul[1] > 0))
 	{
 	  /* When STORING_NONZERO_P, we know that the string will start
 	     with at least OFFSET + 1 nonzero characters.  If storing
@@ -3506,22 +3776,15 @@  handle_char_store (gimple_stmt_iterator *gsi)
 	     OFFSET characters long.
 
 	     Otherwise, we're storing an unknown value at offset OFFSET,
-	     so need to clip the nonzero_chars to OFFSET.  */
-	  bool full_string_p = storing_all_zeros_p;
-	  HOST_WIDE_INT len = 1;
-	  if (storing_nonzero_p)
-	    {
-	      /* Try to get the minimum length of the string (or
-		 individual character) being stored.  If it fails,
-		 STORING_NONZERO_P guarantees it's at least 1.  */
-	      len = get_min_string_length (rhs, &full_string_p);
-	      if (len < 0)
-		len = 1;
-	    }
-
+	     so need to clip the nonzero_chars to OFFSET.
+	     Use the minimum length of the string (or individual character)
+	     being stored if it's known.  Otherwise, STORING_NONZERO_P
+	     guarantees it's at least 1.  */
+	  HOST_WIDE_INT len
+	    = storing_nonzero_p && ranges_valid ? lenrange[0] : 1;
 	  location_t loc = gimple_location (stmt);
 	  tree oldlen = si->nonzero_chars;
-	  if (cmp == 0 && si->full_string_p)
+	  if (store_b4_nul[1] == 0 && si->full_string_p)
 	    /* We're overwriting the nul terminator with a nonzero or
 	       unknown character.  If the previous stmt was a memcpy,
 	       its length may be decreased.  */
@@ -3567,8 +3830,7 @@  handle_char_store (gimple_stmt_iterator *gsi)
 	  HOST_WIDE_INT slen = (storing_all_zeros_p
 				? 0
 				: (storing_nonzero_p
-				   ? get_min_string_length (rhs, &full_string_p)
-				   : -1));
+				   && ranges_valid ? lenrange[0] : -1));
 	  tree len = (slen <= 0
 		      ? size_zero_node
 		      : build_int_cst (size_type_node, slen));
@@ -3583,18 +3845,18 @@  handle_char_store (gimple_stmt_iterator *gsi)
 	}
     }
   else if (idx == 0
-	   && (rhslen = get_min_string_length (rhs, &full_string_p)) >= 0
+	   && rhs_minlen < HOST_WIDE_INT_M1U
 	   && ssaname == NULL_TREE
 	   && TREE_CODE (TREE_TYPE (lhs)) == ARRAY_TYPE)
     {
       HOST_WIDE_INT a = int_size_in_bytes (TREE_TYPE (lhs));
-      if (a > 0 && (unsigned HOST_WIDE_INT) a > (unsigned HOST_WIDE_INT) rhslen)
+      if (a > 0 && (unsigned HOST_WIDE_INT) a > rhs_minlen)
 	{
 	  int idx = new_addr_stridx (lhs);
 	  if (idx != 0)
 	    {
 	      si = new_strinfo (build_fold_addr_expr (lhs), idx,
-				build_int_cst (size_type_node, rhslen),
+				build_int_cst (size_type_node, rhs_minlen),
 				full_string_p);
 	      set_strinfo (idx, si);
 	      si->dont_invalidate = true;
@@ -3707,6 +3969,16 @@  fold_strstr_to_strncmp (tree rhs1, tree rhs2, gimple *stmt)
     }
 }
 
+/* Return true if TYPE corresponds to a narrow character type. */
+
+static bool
+is_char_type (tree type)
+{
+  return (TREE_CODE (type) == INTEGER_TYPE
+	  && TYPE_MODE (type) == TYPE_MODE (char_type_node)
+	  && TYPE_PRECISION (type) == TYPE_PRECISION (char_type_node));
+}
+
 /* Attempt to check for validity of the performed access a single statement
    at *GSI using string length knowledge, and to optimize it.
    If the given basic block needs clean-up of EH, CLEANUP_EH is set to
@@ -3907,18 +4179,32 @@  strlen_check_and_optimize_stmt (gimple_stmt_iterator *gsi, bool *cleanup_eh)
 	  }
       }
     else if (TREE_CODE (lhs) != SSA_NAME && !TREE_SIDE_EFFECTS (lhs))
-	{
-	  tree type = TREE_TYPE (lhs);
-	  if (TREE_CODE (type) == ARRAY_TYPE)
-	    type = TREE_TYPE (type);
-	  if (TREE_CODE (type) == INTEGER_TYPE
-	      && TYPE_MODE (type) == TYPE_MODE (char_type_node)
-	      && TYPE_PRECISION (type) == TYPE_PRECISION (char_type_node))
-	    {
-	      if (! handle_char_store (gsi))
-		return false;
-	    }
-	}
+      {
+	tree type = TREE_TYPE (lhs);
+	if (TREE_CODE (type) == ARRAY_TYPE)
+	  type = TREE_TYPE (type);
+
+	bool is_char_store = is_char_type (type);
+	if (!is_char_store && TREE_CODE (lhs) == MEM_REF)
+	  {
+	    /* To consider stores into char objects via integer types
+	       other than char but not those to non-character objects,
+	       determine the type of the destination rather than just
+	       the type of the access.  */
+	    tree ref = TREE_OPERAND (lhs, 0);
+	    type = TREE_TYPE (ref);
+	    if (TREE_CODE (type) == POINTER_TYPE)
+	      type = TREE_TYPE (type);
+	    if (TREE_CODE (type) == ARRAY_TYPE)
+	      type = TREE_TYPE (type);
+	    if (is_char_type (type))
+	      is_char_store = true;
+	  }
+
+	/* Handle a single or multibyte assignment.  */
+	if (is_char_store && !handle_store (gsi))
+	  return false;
+      }
     }
   else if (gcond *cond = dyn_cast<gcond *> (stmt))
     {