Message ID | 5ee56fe9-bf96-196f-6bd6-6ecbda2d1ca0@gmail.com |
---|---|
State | New |
Headers | show |
Series | print full STRING_CST in Gimple dumps (PR 87052) | expand |
On Wed, Aug 22, 2018 at 4:56 AM Martin Sebor <msebor@gmail.com> wrote: > > In the discussion of the fallout from the enhancement for pr71625 > it was pointed out that STRING_CSts in Gimple dumps extend only > to the first nul and don't include any subsequent characters, > and that this makes the dumps harder to read and might give rise > to the question whether the code is correct. > > In the attached patch I enhance the pretty_print_string() function > to print the full contents of a STRING_CST, including any embedded > nuls to make the dumps clearer. I got rid of the single digit > escapes like '\1' since they make the string look ambiguous. > If TREE_STRING_LENGTH (node) is guaranteed to be non-zero these > days the test for it being so may be redundant but I figured it's > better to be safe than sorry. > > A further enhancement might be to also distinguish the type of > the STRING_CST. And somehow indicate whether it is \0 terminated (just thinking of the GIMPLE FE and how it parses string literals simply by relying on libcpp). Can you write a not \0 terminated string literal in C? The patch is OK. Thanks, Richard. > > Martin >
On 08/22/2018 06:48 AM, Richard Biener wrote: > On Wed, Aug 22, 2018 at 4:56 AM Martin Sebor <msebor@gmail.com> wrote: >> >> In the discussion of the fallout from the enhancement for pr71625 >> it was pointed out that STRING_CSts in Gimple dumps extend only >> to the first nul and don't include any subsequent characters, >> and that this makes the dumps harder to read and might give rise >> to the question whether the code is correct. >> >> In the attached patch I enhance the pretty_print_string() function >> to print the full contents of a STRING_CST, including any embedded >> nuls to make the dumps clearer. I got rid of the single digit >> escapes like '\1' since they make the string look ambiguous. >> If TREE_STRING_LENGTH (node) is guaranteed to be non-zero these >> days the test for it being so may be redundant but I figured it's >> better to be safe than sorry. >> >> A further enhancement might be to also distinguish the type of >> the STRING_CST. > > And somehow indicate whether it is \0 terminated (just thinking of > the GIMPLE FE and how it parses string literals simply by relying > on libcpp). > > Can you write a not \0 terminated string literal in C? Yes: char a[2] = "12"; I briefly considered making the terminating nul visible when there was one but there's at least one test that scans GIMPLE for the expected string and it failed so I didn't pursue this idea further. The tests can certainly be changed to look for the new pattern if we should decide to make a change here. I'm not sure what would be best. Printing "12\x00" for an ordinary nul-terminated literal looks like there might be an extra nul after the first one. Leaving it out doesn't distinguish it from the unterminated array. Using the braced-list notation (i.e., a = { '1', '2' } for unterminated arrays doesn't capture the fact that it's represented as STRING_CST). I'm open to suggestions. Martin
On Wed, Aug 22, 2018 at 5:05 PM Martin Sebor <msebor@gmail.com> wrote: > > On 08/22/2018 06:48 AM, Richard Biener wrote: > > On Wed, Aug 22, 2018 at 4:56 AM Martin Sebor <msebor@gmail.com> wrote: > >> > >> In the discussion of the fallout from the enhancement for pr71625 > >> it was pointed out that STRING_CSts in Gimple dumps extend only > >> to the first nul and don't include any subsequent characters, > >> and that this makes the dumps harder to read and might give rise > >> to the question whether the code is correct. > >> > >> In the attached patch I enhance the pretty_print_string() function > >> to print the full contents of a STRING_CST, including any embedded > >> nuls to make the dumps clearer. I got rid of the single digit > >> escapes like '\1' since they make the string look ambiguous. > >> If TREE_STRING_LENGTH (node) is guaranteed to be non-zero these > >> days the test for it being so may be redundant but I figured it's > >> better to be safe than sorry. > >> > >> A further enhancement might be to also distinguish the type of > >> the STRING_CST. > > > > And somehow indicate whether it is \0 terminated (just thinking of > > the GIMPLE FE and how it parses string literals simply by relying > > on libcpp). > > > > Can you write a not \0 terminated string literal in C? > > Yes: char a[2] = "12"; I thought they are fully defined in translation phase #1 ... > I briefly considered making the terminating nul visible when > there was one but there's at least one test that scans GIMPLE > for the expected string and it failed so I didn't pursue this > idea further. Yes, I think it would just confuse people at the moment. I thought of somehow marking non-terminated literals, like "12"[nt] or so ... or by using an alternate quote "12' (eh). Anyway, not too important I guess if you for a moment exclude the -fdump-XXX-gimple and re-parsing with the GIMPLE FE case. Richard. > The tests can certainly be changed to look for the new pattern > if we should decide to make a change here. I'm not sure what > would be best. Printing "12\x00" for an ordinary nul-terminated > literal looks like there might be an extra nul after the first > one. Leaving it out doesn't distinguish it from the unterminated > array. Using the braced-list notation (i.e., a = { '1', '2' } > for unterminated arrays doesn't capture the fact that it's > represented as STRING_CST). > > I'm open to suggestions. > > Martin
Hi, On Thu, 23 Aug 2018, Richard Biener wrote: > > > Can you write a not \0 terminated string literal in C? > > > > Yes: char a[2] = "12"; > > I thought they are fully defined in translation phase #1 ... No, you can't write a string literal which is not zero terminated, because in translation phase 7 a zero code is appended to all character sequences resulting from string literals, which is then used to allocate and initialize a static (wide) character array of just the right size, including the zero code. The above construct uses that static char[3] array from the string literal to initialize a char[2] array (which is explicitely allowed), and _that_ one is not zero terminated. But it's also no string literal. (Of course, due to as-if the intermediate char[3] array won't usually be explicitely constructed.) Ciao, Michael.
On 08/23/2018 08:07 AM, Michael Matz wrote: > Hi, > > On Thu, 23 Aug 2018, Richard Biener wrote: > >>>> Can you write a not \0 terminated string literal in C? >>> >>> Yes: char a[2] = "12"; >> >> I thought they are fully defined in translation phase #1 ... > > No, you can't write a string literal which is not zero terminated, because > in translation phase 7 a zero code is appended to all character sequences > resulting from string literals, which is then used to allocate and > initialize a static (wide) character array of just the right size, > including the zero code. > > The above construct uses that static char[3] array from the string literal > to initialize a char[2] array (which is explicitely allowed), and _that_ > one is not zero terminated. But it's also no string literal. Yes, you're right. I misinterpreted Richard's question as asking if one could construct an unterminated array of chars using a string literal. The distinction that I thought would be useful to capture in Gimple is between "12" that initializes an array of two elements (and where the nul doesn't fit) vs "12" that initializes one of three elements (and where the nul does fit). It's probably not terribly important when the array type also appears in Gimple like in the case above but there are (perhaps corner) cases where it doesn't, as in the second one below: char a[2] = "12"; char *p = (char[2]){ "12" }; which ends up represented as: char a[2]; char * p; char D.1910[2]; try { a = "12"; D.1910 = "12"; p = &D.1910; } Martin
On Thu, Aug 23, 2018 at 5:13 PM Martin Sebor <msebor@gmail.com> wrote: > > On 08/23/2018 08:07 AM, Michael Matz wrote: > > Hi, > > > > On Thu, 23 Aug 2018, Richard Biener wrote: > > > >>>> Can you write a not \0 terminated string literal in C? > >>> > >>> Yes: char a[2] = "12"; > >> > >> I thought they are fully defined in translation phase #1 ... > > > > No, you can't write a string literal which is not zero terminated, because > > in translation phase 7 a zero code is appended to all character sequences > > resulting from string literals, which is then used to allocate and > > initialize a static (wide) character array of just the right size, > > including the zero code. > > > > The above construct uses that static char[3] array from the string literal > > to initialize a char[2] array (which is explicitely allowed), and _that_ > > one is not zero terminated. But it's also no string literal. > > Yes, you're right. I misinterpreted Richard's question as > asking if one could construct an unterminated array of chars > using a string literal. > > The distinction that I thought would be useful to capture in > Gimple is between "12" that initializes an array of two elements > (and where the nul doesn't fit) vs "12" that initializes one of > three elements (and where the nul does fit). It's probably not > terribly important when the array type also appears in Gimple > like in the case above but there are (perhaps corner) cases > where it doesn't, as in the second one below: > > char a[2] = "12"; > char *p = (char[2]){ "12" }; > > which ends up represented as: > > char a[2]; > char * p; > char D.1910[2]; > > try > { > a = "12"; > D.1910 = "12"; > p = &D.1910; > } Yeah, we currently require useless_type_conversion_p between the LHS type and the RHS type (type of a vs. type of "12") which implies TYPE_SIZE matches (if constant). That means GIMPLE expects at least the type of "12" to not include the NUL. Richard. > Martin
PR middle-end/87052 - STRING_CST printing incomplete in Gimple dumps gcc/testsuite/ChangeLog: PR middle-end/87052 * gcc.dg/pr87052.c: New test. gcc/ChangeLog: PR middle-end/87052 * tree-pretty-print.c (pretty_print_string): Add argument. (dump_generic_node): Call to pretty_print_string with string size. Index: gcc/tree-pretty-print.c =================================================================== --- gcc/tree-pretty-print.c (revision 263754) +++ gcc/tree-pretty-print.c (working copy) @@ -37,7 +37,7 @@ along with GCC; see the file COPYING3. If not see /* Local functions, macros and variables. */ static const char *op_symbol (const_tree); -static void pretty_print_string (pretty_printer *, const char*); +static void pretty_print_string (pretty_printer *, const char*, unsigned); static void newline_and_indent (pretty_printer *, int); static void maybe_init_pretty_print (FILE *); static void print_struct_decl (pretty_printer *, const_tree, int, dump_flags_t); @@ -1800,10 +1800,13 @@ dump_generic_node (pretty_printer *pp, tree node, break; case STRING_CST: - pp_string (pp, "\""); - pretty_print_string (pp, TREE_STRING_POINTER (node)); - pp_string (pp, "\""); - break; + { + pp_string (pp, "\""); + if (unsigned nbytes = TREE_STRING_LENGTH (node)) + pretty_print_string (pp, TREE_STRING_POINTER (node), nbytes); + pp_string (pp, "\""); + break; + } case VECTOR_CST: { @@ -3865,15 +3868,16 @@ print_call_name (pretty_printer *pp, tree node, du } } -/* Parses the string STR and replaces new-lines by '\n', tabs by '\t', ... */ +/* Print the first N characters in the array STR, replacing non-printable + characters (including embedded nuls) with unambiguous escape sequences. */ static void -pretty_print_string (pretty_printer *pp, const char *str) +pretty_print_string (pretty_printer *pp, const char *str, unsigned n) { if (str == NULL) return; - while (*str) + for ( ; n; --n, ++str) { switch (str[0]) { @@ -3913,48 +3917,20 @@ static void pp_string (pp, "\\'"); break; - /* No need to handle \0; the loop terminates on \0. */ - - case '\1': - pp_string (pp, "\\1"); - break; - - case '\2': - pp_string (pp, "\\2"); - break; - - case '\3': - pp_string (pp, "\\3"); - break; - - case '\4': - pp_string (pp, "\\4"); - break; - - case '\5': - pp_string (pp, "\\5"); - break; - - case '\6': - pp_string (pp, "\\6"); - break; - - case '\7': - pp_string (pp, "\\7"); - break; - default: - if (!ISPRINT (str[0])) + if (str[0] || n > 1) { - char buf[5]; - sprintf (buf, "\\x%x", (unsigned char)str[0]); - pp_string (pp, buf); + if (!ISPRINT (str[0])) + { + char buf[5]; + sprintf (buf, "\\x%02x", (unsigned char)str[0]); + pp_string (pp, buf); + } + else + pp_character (pp, str[0]); + break; } - else - pp_character (pp, str[0]); - break; } - str++; } } Index: gcc/testsuite/gcc.dg/pr87052.c =================================================================== --- gcc/testsuite/gcc.dg/pr87052.c (nonexistent) +++ gcc/testsuite/gcc.dg/pr87052.c (working copy) @@ -0,0 +1,41 @@ +/* PR middle-end/87052 - STRING_CST printing incomplete in Gimple dumps + { dg-do compile } + { dg-options "-fdump-tree-gimple" } */ + +void sink (const void*, ...); + +void test (void) +{ + const char a[3] = "\000ab"; + + /* Expect the following in the dump: + a = "\x00ab"; */ + + const char b[] = { 'a', 0, 'b', 'c' }; + + /* Expect the following: + b = "a\x00bc"; */ + + const char c[] = ""; + + /* Expect the following: + c = ""; */ + + const char d[0] = { }; + + /* Expect the following: + d = ""; */ + + const char e[0] = ""; + + /* Expect nothing. */ + + sink (a, b, c, d, e); +} + +/* { dg-final { scan-tree-dump-times "a = \"\\\\x00ab\";" 1 "gimple" } } + { dg-final { scan-tree-dump-times "b = \"a\\\\x00bc\";" 1 "gimple" } } + { dg-final { scan-tree-dump-times "c = \"\";" 1 "gimple" } } + { dg-final { scan-tree-dump-times "d = { *};" 1 "gimple" } } + { dg-final { scan-tree-dump-times "e = " 1 "gimple" } } + { dg-final { scan-tree-dump-times "e = {CLOBBER}" 1 "gimple" } } */