Message ID | df25c3ab-2e13-38c2-3883-25a00edf4d96@gmail.com |
---|---|
State | New |
Headers | show |
Series | avoid assuming known string length is constant (PR 83896) | expand |
Martin Sebor <msebor@gmail.com> writes: > Recent improvements to the strlen pass introduced the assumption > that when the length of a string has been recorded by the pass > the length is necessarily constant. Bug 83896 shows that this > assumption is not always true, and that GCC fails with an ICE > when it doesn't hold. To avoid the ICE the attached patch > removes the assumption. > > x86_64-linux bootstrap successful, regression test in progress. > > Martin > > PR tree-optimization/83896 - ice in get_string_len on a call to strlen with non-constant length > > gcc/ChangeLog: > > PR tree-optimization/83896 > * tree-ssa-strlen.c (get_string_len): Avoid assuming length is constant. > > gcc/testsuite/ChangeLog: > > PR tree-optimization/83896 > * gcc.dg/strlenopt-43.c: New test. > > Index: gcc/tree-ssa-strlen.c > =================================================================== > --- gcc/tree-ssa-strlen.c (revision 256752) > +++ gcc/tree-ssa-strlen.c (working copy) > @@ -2772,7 +2772,9 @@ handle_pointer_plus (gimple_stmt_iterator *gsi) > } > } > > -/* Check if RHS is string_cst possibly wrapped by mem_ref. */ > +/* If RHS, either directly or indirectly, refers to a string of constant > + length, return it. Otherwise return a negative value. */ > + > static int > get_string_len (tree rhs) > { I think this should be returning HOST_WIDE_INT given the unconstrained tree_to_shwi return. Same type change for rhslen in the caller. (Not my call, but it might be better to have a more specific function name, given that the file already had "get_string_length" before this function was added.) > @@ -2789,7 +2791,8 @@ get_string_len (tree rhs) > if (idx > 0) > { > strinfo *si = get_strinfo (idx); > - if (si && si->full_string_p) > + if (si && si->full_string_p > + && TREE_CODE (si->nonzero_chars) == INTEGER_CST) > return tree_to_shwi (si->nonzero_chars); tree_fits_shwi_p? Thanks, Richard > } > } > Index: gcc/testsuite/gcc.dg/strlenopt-43.c > =================================================================== > --- gcc/testsuite/gcc.dg/strlenopt-43.c (nonexistent) > +++ gcc/testsuite/gcc.dg/strlenopt-43.c (working copy) > @@ -0,0 +1,13 @@ > +/* PR tree-optimization/83896 - ice in get_string_len on a call to strlen > + with non-constant length > + { dg-do compile } > + { dg-options "-O2 -Wall" } */ > + > +extern char a[5]; > +extern char b[]; > + > +void f (void) > +{ > + if (__builtin_strlen (b) != 4) > + __builtin_memcpy (a, b, sizeof a); > +}
On Tue, Jan 16, 2018 at 07:37:30PM +0000, Richard Sandiford wrote: > > -/* Check if RHS is string_cst possibly wrapped by mem_ref. */ > > +/* If RHS, either directly or indirectly, refers to a string of constant > > + length, return it. Otherwise return a negative value. */ > > + > > static int > > get_string_len (tree rhs) > > { > > I think this should be returning HOST_WIDE_INT given the unconstrained > tree_to_shwi return. Same type change for rhslen in the caller. > > (Not my call, but it might be better to have a more specific function name, > given that the file already had "get_string_length" before this function > was added.) Yeah, certainly for both. > > @@ -2789,7 +2791,8 @@ get_string_len (tree rhs) > > if (idx > 0) > > { > > strinfo *si = get_strinfo (idx); > > - if (si && si->full_string_p) > > + if (si && si->full_string_p > > + && TREE_CODE (si->nonzero_chars) == INTEGER_CST) > > return tree_to_shwi (si->nonzero_chars); > > tree_fits_shwi_p? Surely that instead of TREE_CODE check, but even that will not make sure it fits into host int, so yes, it should be HOST_WIDE_INT and the code should make sure it is also >= 0. Jakub
On 01/16/2018 12:37 PM, Richard Sandiford wrote: > Martin Sebor <msebor@gmail.com> writes: >> Recent improvements to the strlen pass introduced the assumption >> that when the length of a string has been recorded by the pass >> the length is necessarily constant. Bug 83896 shows that this >> assumption is not always true, and that GCC fails with an ICE >> when it doesn't hold. To avoid the ICE the attached patch >> removes the assumption. >> >> x86_64-linux bootstrap successful, regression test in progress. >> >> Martin >> >> PR tree-optimization/83896 - ice in get_string_len on a call to strlen with non-constant length >> >> gcc/ChangeLog: >> >> PR tree-optimization/83896 >> * tree-ssa-strlen.c (get_string_len): Avoid assuming length is constant. >> >> gcc/testsuite/ChangeLog: >> >> PR tree-optimization/83896 >> * gcc.dg/strlenopt-43.c: New test. >> >> Index: gcc/tree-ssa-strlen.c >> =================================================================== >> --- gcc/tree-ssa-strlen.c (revision 256752) >> +++ gcc/tree-ssa-strlen.c (working copy) >> @@ -2772,7 +2772,9 @@ handle_pointer_plus (gimple_stmt_iterator *gsi) >> } >> } >> >> -/* Check if RHS is string_cst possibly wrapped by mem_ref. */ >> +/* If RHS, either directly or indirectly, refers to a string of constant >> + length, return it. Otherwise return a negative value. */ >> + >> static int >> get_string_len (tree rhs) >> { > > I think this should be returning HOST_WIDE_INT given the unconstrained > tree_to_shwi return. Same type change for rhslen in the caller. Thanks for looking at it! I confess it's not completely clear to me in what type the pass tracks string lengths. For string constants, get_stridx() returns an int with the their length bit-flipped. I tried to maintain that invariant in the change I introduced in the block toward the end of the function (in a different patch). But then in other places the pass works with HOST_WIDE_INT, so it looks like it would be appropriate to use here as well. I tried to come up with a test case that would exercise this conversion but couldn't. If you (or someone else) have an idea for one I'd be more than happy to add it to the test suite. > (Not my call, but it might be better to have a more specific function name, > given that the file already had "get_string_length" before this function > was added.) I renamed it (again), this time to get_string_cst_length(). Nothing better came to mind. > >> @@ -2789,7 +2791,8 @@ get_string_len (tree rhs) >> if (idx > 0) >> { >> strinfo *si = get_strinfo (idx); >> - if (si && si->full_string_p) >> + if (si && si->full_string_p >> + && TREE_CODE (si->nonzero_chars) == INTEGER_CST) >> return tree_to_shwi (si->nonzero_chars); > > tree_fits_shwi_p? Sigh. Yes. I still keep forgetting about all these gotchas. Dealing with integers is so painfully error-prone in GCC (as evident from all the bug reports with ICEs for these things). It would be much simpler and safer if tree_to_shwi() returned true on success and false for error (e.g., null, non-const, or overflow) and took an extra argument for the result. Then the code would become: HOST_WIDE_INT result; if (si && tree_to_shwi (&result, si->nonzero_chars)) return result; and it would be nearly impossible to forget to check for bad input. Anyway, attached is an updated patch. Martin > > Thanks, > Richard > >> } >> } >> Index: gcc/testsuite/gcc.dg/strlenopt-43.c >> =================================================================== >> --- gcc/testsuite/gcc.dg/strlenopt-43.c (nonexistent) >> +++ gcc/testsuite/gcc.dg/strlenopt-43.c (working copy) >> @@ -0,0 +1,13 @@ >> +/* PR tree-optimization/83896 - ice in get_string_len on a call to strlen >> + with non-constant length >> + { dg-do compile } >> + { dg-options "-O2 -Wall" } */ >> + >> +extern char a[5]; >> +extern char b[]; >> + >> +void f (void) >> +{ >> + if (__builtin_strlen (b) != 4) >> + __builtin_memcpy (a, b, sizeof a); >> +} PR tree-optimization/83896 - ice in get_string_len on a call to strlen with non-constant length gcc/ChangeLog: PR tree-optimization/83896 * tree-ssa-strlen.c (get_string_len): Rename... (get_string_cst_length): ...to this. Return HOST_WIDE_INT. Avoid assuming length is constant. (handle_char_store): Use HOST_WIDE_INT for string length. gcc/testsuite/ChangeLog: PR tree-optimization/83896 * gcc.dg/strlenopt-43.c: New test. Index: gcc/tree-ssa-strlen.c =================================================================== --- gcc/tree-ssa-strlen.c (revision 256752) +++ gcc/tree-ssa-strlen.c (working copy) @@ -2772,16 +2772,20 @@ handle_pointer_plus (gimple_stmt_iterator *gsi) } } -/* Check if RHS is string_cst possibly wrapped by mem_ref. */ -static int -get_string_len (tree rhs) +/* If RHS, either directly or indirectly, refers to a string of constant + length, return it. Otherwise return a negative value. */ + +static HOST_WIDE_INT +get_string_cst_length (tree rhs) { if (TREE_CODE (rhs) == MEM_REF && integer_zerop (TREE_OPERAND (rhs, 1))) { - tree rhs_addr = rhs = TREE_OPERAND (rhs, 0); + rhs = TREE_OPERAND (rhs, 0); if (TREE_CODE (rhs) == ADDR_EXPR) { + tree rhs_addr = rhs; + rhs = TREE_OPERAND (rhs, 0); if (TREE_CODE (rhs) != STRING_CST) { @@ -2789,7 +2793,8 @@ handle_pointer_plus (gimple_stmt_iterator *gsi) if (idx > 0) { strinfo *si = get_strinfo (idx); - if (si && si->full_string_p) + if (si && si->full_string_p + && tree_fits_shwi_p (si->nonzero_chars)) return tree_to_shwi (si->nonzero_chars); } } @@ -2801,10 +2806,7 @@ handle_pointer_plus (gimple_stmt_iterator *gsi) rhs = DECL_INITIAL (rhs); if (rhs && TREE_CODE (rhs) == STRING_CST) - { - unsigned HOST_WIDE_INT ilen = strlen (TREE_STRING_POINTER (rhs)); - return ilen <= INT_MAX ? ilen : -1; - } + return strlen (TREE_STRING_POINTER (rhs)); return -1; } @@ -2822,7 +2824,7 @@ handle_char_store (gimple_stmt_iterator *gsi) unsigned HOST_WIDE_INT offset = 0; /* Set to the length of the string being assigned if known. */ - int rhslen; + HOST_WIDE_INT rhslen; if (TREE_CODE (lhs) == MEM_REF && TREE_CODE (TREE_OPERAND (lhs, 0)) == SSA_NAME) @@ -2967,7 +2969,7 @@ handle_char_store (gimple_stmt_iterator *gsi) } } else if (idx == 0 - && (rhslen = get_string_len (gimple_assign_rhs1 (stmt))) >= 0 + && (rhslen = get_string_cst_length (gimple_assign_rhs1 (stmt))) >= 0 && ssaname == NULL_TREE && TREE_CODE (TREE_TYPE (lhs)) == ARRAY_TYPE) { Index: gcc/testsuite/gcc.dg/strlenopt-43.c =================================================================== --- gcc/testsuite/gcc.dg/strlenopt-43.c (nonexistent) +++ gcc/testsuite/gcc.dg/strlenopt-43.c (working copy) @@ -0,0 +1,13 @@ +/* PR tree-optimization/83896 - ice in get_string_len on a call to strlen + with non-constant length + { dg-do compile } + { dg-options "-O2 -Wall -fdump-tree-optimized" } */ + +extern char a[5]; +extern char b[]; + +void f (void) +{ + if (__builtin_strlen (b) != 4) + __builtin_memcpy (a, b, sizeof a); +}
On 01/16/2018 02:26 PM, Jakub Jelinek wrote: > On Tue, Jan 16, 2018 at 07:37:30PM +0000, Richard Sandiford wrote: >>> -/* Check if RHS is string_cst possibly wrapped by mem_ref. */ >>> +/* If RHS, either directly or indirectly, refers to a string of constant >>> + length, return it. Otherwise return a negative value. */ >>> + >>> static int >>> get_string_len (tree rhs) >>> { >> >> I think this should be returning HOST_WIDE_INT given the unconstrained >> tree_to_shwi return. Same type change for rhslen in the caller. >> >> (Not my call, but it might be better to have a more specific function name, >> given that the file already had "get_string_length" before this function >> was added.) > > Yeah, certainly for both. > >>> @@ -2789,7 +2791,8 @@ get_string_len (tree rhs) >>> if (idx > 0) >>> { >>> strinfo *si = get_strinfo (idx); >>> - if (si && si->full_string_p) >>> + if (si && si->full_string_p >>> + && TREE_CODE (si->nonzero_chars) == INTEGER_CST) >>> return tree_to_shwi (si->nonzero_chars); >> >> tree_fits_shwi_p? > > Surely that instead of TREE_CODE check, but even that will not make sure it > fits into host int, so yes, it should be HOST_WIDE_INT and the code should > make sure it is also >= 0. I made these changes except for the last part: How/when can the length be negative? Martin
On Tue, Jan 16, 2018 at 03:20:24PM -0700, Martin Sebor wrote: > Thanks for looking at it! I confess it's not completely clear > to me in what type the pass tracks string lengths. For string > constants, get_stridx() returns an int with the their length > bit-flipped. I tried to maintain that invariant in the change That is because TREE_STRING_LENGTH is an int, so gcc doesn't allow string literals longer than 2GB. All other length are tracked as tree. Jakub
On Tue, Jan 16, 2018 at 03:20:24PM -0700, Martin Sebor wrote: > gcc/ChangeLog: > > PR tree-optimization/83896 > * tree-ssa-strlen.c (get_string_len): Rename... > (get_string_cst_length): ...to this. Return HOST_WIDE_INT. > Avoid assuming length is constant. > (handle_char_store): Use HOST_WIDE_INT for string length. > > gcc/testsuite/ChangeLog: > > PR tree-optimization/83896 > * gcc.dg/strlenopt-43.c: New test. ... > > if (TREE_CODE (rhs) == MEM_REF > && integer_zerop (TREE_OPERAND (rhs, 1))) For the future, there is no reason it couldn't handle also non-zero offsets if the to be returned length is bigger or equal than that offset, where the offset can be then subtracted. But let's defer that for GCC9. > @@ -2789,7 +2793,8 @@ handle_pointer_plus (gimple_stmt_iterator *gsi) > if (idx > 0) > { > strinfo *si = get_strinfo (idx); > - if (si && si->full_string_p) > + if (si && si->full_string_p > + && tree_fits_shwi_p (si->nonzero_chars)) If a && or || using condition doesn't fit onto a single line, it should be split into one condition per line, so please change the above into: if (si && si->full_string_p && tree_fits_shwi_p (si->nonzero_chars)) > @@ -2822,7 +2824,7 @@ handle_char_store (gimple_stmt_iterator *gsi) > unsigned HOST_WIDE_INT offset = 0; > > /* Set to the length of the string being assigned if known. */ > - int rhslen; > + HOST_WIDE_INT rhslen; Please remove the empty line above the above comment, the function starts with definitions of multiple variables, rhslen isn't even initialized and there is nothing special on it compared to others, so they should be in one block. Ok for trunk with those changes. > --- gcc/testsuite/gcc.dg/strlenopt-43.c (nonexistent) > +++ gcc/testsuite/gcc.dg/strlenopt-43.c (working copy) > @@ -0,0 +1,13 @@ > +/* PR tree-optimization/83896 - ice in get_string_len on a call to strlen > + with non-constant length > + { dg-do compile } > + { dg-options "-O2 -Wall -fdump-tree-optimized" } */ > + > +extern char a[5]; > +extern char b[]; > + > +void f (void) > +{ > + if (__builtin_strlen (b) != 4) > + __builtin_memcpy (a, b, sizeof a); > +} Most of the strlenopt*.c tests use the strlenopt.h header and then use the non-__builtin_ function names, if you want to change that too, please do so, but not a requirement. Jakub
On 01/22/2018 09:33 AM, Jakub Jelinek wrote: > On Tue, Jan 16, 2018 at 03:20:24PM -0700, Martin Sebor wrote: >> gcc/ChangeLog: >> >> PR tree-optimization/83896 >> * tree-ssa-strlen.c (get_string_len): Rename... >> (get_string_cst_length): ...to this. Return HOST_WIDE_INT. >> Avoid assuming length is constant. >> (handle_char_store): Use HOST_WIDE_INT for string length. >> >> gcc/testsuite/ChangeLog: >> >> PR tree-optimization/83896 >> * gcc.dg/strlenopt-43.c: New test. > ... >> >> if (TREE_CODE (rhs) == MEM_REF >> && integer_zerop (TREE_OPERAND (rhs, 1))) > > For the future, there is no reason it couldn't handle > also non-zero offsets if the to be returned length is bigger or > equal than that offset, where the offset can be then subtracted. > But let's defer that for GCC9. I opened bug 84069 for this oversight. > >> @@ -2789,7 +2793,8 @@ handle_pointer_plus (gimple_stmt_iterator *gsi) >> if (idx > 0) >> { >> strinfo *si = get_strinfo (idx); >> - if (si && si->full_string_p) >> + if (si && si->full_string_p >> + && tree_fits_shwi_p (si->nonzero_chars)) > > If a && or || using condition doesn't fit onto a single line, it should be > split into one condition per line, so please change the above into: > if (si > && si->full_string_p > && tree_fits_shwi_p (si->nonzero_chars)) > >> @@ -2822,7 +2824,7 @@ handle_char_store (gimple_stmt_iterator *gsi) >> unsigned HOST_WIDE_INT offset = 0; >> >> /* Set to the length of the string being assigned if known. */ >> - int rhslen; >> + HOST_WIDE_INT rhslen; > > Please remove the empty line above the above comment, the function > starts with definitions of multiple variables, rhslen isn't even > initialized and there is nothing special on it compared to others, > so they should be in one block. > > Ok for trunk with those changes. I committed the patch with the tweaks suggested above and below in 257100. Martin > >> --- gcc/testsuite/gcc.dg/strlenopt-43.c (nonexistent) >> +++ gcc/testsuite/gcc.dg/strlenopt-43.c (working copy) >> @@ -0,0 +1,13 @@ >> +/* PR tree-optimization/83896 - ice in get_string_len on a call to strlen >> + with non-constant length >> + { dg-do compile } >> + { dg-options "-O2 -Wall -fdump-tree-optimized" } */ >> + >> +extern char a[5]; >> +extern char b[]; >> + >> +void f (void) >> +{ >> + if (__builtin_strlen (b) != 4) >> + __builtin_memcpy (a, b, sizeof a); >> +} > > Most of the strlenopt*.c tests use the strlenopt.h header and then > use the non-__builtin_ function names, if you want to change that too, > please do so, but not a requirement. > > Jakub >
PR tree-optimization/83896 - ice in get_string_len on a call to strlen with non-constant length gcc/ChangeLog: PR tree-optimization/83896 * tree-ssa-strlen.c (get_string_len): Avoid assuming length is constant. gcc/testsuite/ChangeLog: PR tree-optimization/83896 * gcc.dg/strlenopt-43.c: New test. Index: gcc/tree-ssa-strlen.c =================================================================== --- gcc/tree-ssa-strlen.c (revision 256752) +++ gcc/tree-ssa-strlen.c (working copy) @@ -2772,7 +2772,9 @@ handle_pointer_plus (gimple_stmt_iterator *gsi) } } -/* Check if RHS is string_cst possibly wrapped by mem_ref. */ +/* If RHS, either directly or indirectly, refers to a string of constant + length, return it. Otherwise return a negative value. */ + static int get_string_len (tree rhs) { @@ -2789,7 +2791,8 @@ get_string_len (tree rhs) if (idx > 0) { strinfo *si = get_strinfo (idx); - if (si && si->full_string_p) + if (si && si->full_string_p + && TREE_CODE (si->nonzero_chars) == INTEGER_CST) return tree_to_shwi (si->nonzero_chars); } } Index: gcc/testsuite/gcc.dg/strlenopt-43.c =================================================================== --- gcc/testsuite/gcc.dg/strlenopt-43.c (nonexistent) +++ gcc/testsuite/gcc.dg/strlenopt-43.c (working copy) @@ -0,0 +1,13 @@ +/* PR tree-optimization/83896 - ice in get_string_len on a call to strlen + with non-constant length + { dg-do compile } + { dg-options "-O2 -Wall" } */ + +extern char a[5]; +extern char b[]; + +void f (void) +{ + if (__builtin_strlen (b) != 4) + __builtin_memcpy (a, b, sizeof a); +}