Fix not properly nul-terminated string constants in JIT

Message ID AM5PR0701MB2657D814EBE588723929D667E4210@AM5PR0701MB2657.eurprd07.prod.outlook.com
State New
Headers show
Series
  • Fix not properly nul-terminated string constants in JIT
Related show

Commit Message

Bernd Edlinger Aug. 5, 2018, 4:59 p.m.
Hi!


My other patch with adds assertions to varasm.c regarding correct
nul termination of sting literals did make these incorrect string
constants in JIT frontend fail.

The string constants are not nul terminated if their length exceeds
200 characters.  The test cases do not use strings of that size where
that would make a difference.  But using a fixed index type is clearly
wrong.

This patch removes the fixed char[200] array type from playback::context,
and uses build_string_literal instead of using build_string directly.


Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
Is it OK for trunk?


Thanks
Bernd.

Comments

Bernd Edlinger Aug. 26, 2018, 7:40 p.m. | #1
Ping...

This is just plain wrong, independent
of any STRING_CST semantic issues.

The original patch (retested on current trunk) is
here: https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00370.html


On 08/05/18 18:59, Bernd Edlinger wrote:
> Hi!
> 
> 
> My other patch with adds assertions to varasm.c regarding correct
> nul termination of sting literals did make these incorrect string
> constants in JIT frontend fail.
> 
> The string constants are not nul terminated if their length exceeds
> 200 characters.  The test cases do not use strings of that size where
> that would make a difference.  But using a fixed index type is clearly
> wrong.
> 
> This patch removes the fixed char[200] array type from playback::context,
> and uses build_string_literal instead of using build_string directly.
> 
> 
> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
> Is it OK for trunk?
> 
> 
> Thanks
> Bernd.
Bernd Edlinger Oct. 10, 2018, 7:43 p.m. | #2
Ping...


On 08/26/18 21:40, Bernd Edlinger wrote:
> Ping...
> 
> This is just plain wrong, independent
> of any STRING_CST semantic issues.
> 
> The original patch (retested on current trunk) is
> here: https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00370.html
> 
> 
> On 08/05/18 18:59, Bernd Edlinger wrote:
>> Hi!
>>
>>
>> My other patch with adds assertions to varasm.c regarding correct
>> nul termination of sting literals did make these incorrect string
>> constants in JIT frontend fail.
>>
>> The string constants are not nul terminated if their length exceeds
>> 200 characters.  The test cases do not use strings of that size where
>> that would make a difference.  But using a fixed index type is clearly
>> wrong.
>>
>> This patch removes the fixed char[200] array type from playback::context,
>> and uses build_string_literal instead of using build_string directly.
>>
>>
>> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
>> Is it OK for trunk?
>>
>>
>> Thanks
>> Bernd.
Bernd Edlinger Nov. 1, 2018, 3:09 p.m. | #3
Ping...

On 10/10/18 9:43 PM, Bernd Edlinger wrote:
> Ping...
> 
> 
> On 08/26/18 21:40, Bernd Edlinger wrote:
>> Ping...
>>
>> This is just plain wrong, independent
>> of any STRING_CST semantic issues.
>>
>> The original patch (retested on current trunk) is
>> here: https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00370.html
>>
>>
>> On 08/05/18 18:59, Bernd Edlinger wrote:
>>> Hi!
>>>
>>>
>>> My other patch with adds assertions to varasm.c regarding correct
>>> nul termination of sting literals did make these incorrect string
>>> constants in JIT frontend fail.
>>>
>>> The string constants are not nul terminated if their length exceeds
>>> 200 characters.  The test cases do not use strings of that size where
>>> that would make a difference.  But using a fixed index type is clearly
>>> wrong.
>>>
>>> This patch removes the fixed char[200] array type from playback::context,
>>> and uses build_string_literal instead of using build_string directly.
>>>
>>>
>>> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
>>> Is it OK for trunk?
>>>
>>>
>>> Thanks
>>> Bernd.
David Malcolm Nov. 2, 2018, 8:40 p.m. | #4
On Sun, 2018-08-05 at 16:59 +0000, Bernd Edlinger wrote:
> Hi!
> 
> 
> My other patch with adds assertions to varasm.c regarding correct
> nul termination of sting literals did make these incorrect string
> constants in JIT frontend fail.
> 
> The string constants are not nul terminated if their length exceeds
> 200 characters.  The test cases do not use strings of that size where
> that would make a difference.  But using a fixed index type is
> clearly
> wrong.
> 
> This patch removes the fixed char[200] array type from
> playback::context,
> and uses build_string_literal instead of using build_string directly.
> 
> 
> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
> Is it OK for trunk?

Sorry for the belated response.

Was this tested with --enable-host-shared and --enable-languages=jit ?
Note that "jit" is not included in --enable-languages=all.

The patch seems reasonable, but I'm a little confused over the meaning
of "len" in build_string_literal and build_string: does it refer to the
length or the size of the string?

> @@ -617,16 +616,9 @@ playback::rvalue *
>  playback::context::
>  new_string_literal (const char *value)
>  {
> -  tree t_str = build_string (strlen (value), value);
> -  gcc_assert (m_char_array_type_node);
> -  TREE_TYPE (t_str) = m_char_array_type_node;
> -
> -  /* Convert to (const char*), loosely based on
> -     c/c-typeck.c: array_to_pointer_conversion,
> -     by taking address of start of string.  */
> -  tree t_addr = build1 (ADDR_EXPR, m_const_char_ptr, t_str);
> +  tree t_str = build_string_literal (strlen (value) + 1, value);
>  
> -  return new rvalue (this, t_addr);
> +  return new rvalue (this, t_str);
>  }

In the above, the call to build_string with strlen is replaced with
build_string_literal with strlen + 1.


build_string's comment says:

"Note that for a C string literal, LEN should include the trailing
NUL."

but has:

  length = len + offsetof (struct tree_string, str) + 1;

and:

  TREE_STRING_LENGTH (s) = len;
  memcpy (s->string.str, str, len);
  s->string.str[len] = '\0';

suggesting that the "len" parameter is in fact the length *without* the
trailing NUL, and that a trailing NUL is added by build_string.

However build_string_literal has:

  t = build_string (len, str);
  elem = build_type_variant (char_type_node, 1, 0);
  index = build_index_type (size_int (len - 1));

suggesting that the len is passed directly to build_string (and thus
ought to be strlen), but the build_index_type uses len - 1 (which
suggests that len is the size of the string, rather than its length).

What's the intended meaning of len in these functions?

Thanks
Dave
Bernd Edlinger Nov. 2, 2018, 9:48 p.m. | #5
On 11/2/18 9:40 PM, David Malcolm wrote:
> On Sun, 2018-08-05 at 16:59 +0000, Bernd Edlinger wrote:
>> Hi!
>>
>>
>> My other patch with adds assertions to varasm.c regarding correct
>> nul termination of sting literals did make these incorrect string
>> constants in JIT frontend fail.
>>
>> The string constants are not nul terminated if their length exceeds
>> 200 characters.  The test cases do not use strings of that size where
>> that would make a difference.  But using a fixed index type is
>> clearly
>> wrong.
>>
>> This patch removes the fixed char[200] array type from
>> playback::context,
>> and uses build_string_literal instead of using build_string directly.
>>
>>
>> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
>> Is it OK for trunk?
> 
> Sorry for the belated response.
> 
> Was this tested with --enable-host-shared and --enable-languages=jit ?
> Note that "jit" is not included in --enable-languages=all.
> 

Yes, of course.  The test suite contains a few string constants, just
all of them are shorter than 200 characters.  But I think removing this
artificial limit enables the existing test cases to test that the
shorter string is in fact zero terminated.

> The patch seems reasonable, but I'm a little confused over the meaning
> of "len" in build_string_literal and build_string: does it refer to the
> length or the size of the string?
> 

build_string_literal:
For languages that use zero-terminated strings, len is strlen(str)+1, and
str is a zero terminated single-byte character string.
For languages that don't use zero-terminated strings, len is the size of
the string and str is not zero terminated.

build_string:
constructs a STRING_CST tree object, which is usable as is in some contexts,
like for asm constraints, but as a string literal it is incomplete, and
needs an index type.  The index type defines the memory size which must
be larger than the string precision.  Excess memory is implicitly cleared.

This means currently all jit strings shorter than 200 characters
are filled with zero up to the limit of 200 chars as imposed by
m_char_array_type_node.  Strings of exactly 200 chars are not zero terminated,
and larger strings should result in an assertion (excess precision was previously
allowed, but no zero termination was appended, when that is not part of
the original string constant).

Previously it was allowed to have memory size less than the string len, which
had complicated the STRING_CST semantics in the middle-end, but with the
string_cst semantic rework I did for gcc-9 this is no longer allowed and
results in (checking) assertions in varasm.c.

>> @@ -617,16 +616,9 @@ playback::rvalue *
>>   playback::context::
>>   new_string_literal (const char *value)
>>   {
>> -  tree t_str = build_string (strlen (value), value);
>> -  gcc_assert (m_char_array_type_node);
>> -  TREE_TYPE (t_str) = m_char_array_type_node;
>> -
>> -  /* Convert to (const char*), loosely based on
>> -     c/c-typeck.c: array_to_pointer_conversion,
>> -     by taking address of start of string.  */
>> -  tree t_addr = build1 (ADDR_EXPR, m_const_char_ptr, t_str);
>> +  tree t_str = build_string_literal (strlen (value) + 1, value);
>>   
>> -  return new rvalue (this, t_addr);
>> +  return new rvalue (this, t_str);
>>   }
> 
> In the above, the call to build_string with strlen is replaced with
> build_string_literal with strlen + 1.
> 
> 
> build_string's comment says:
> 
> "Note that for a C string literal, LEN should include the trailing
> NUL."
> 
> but has:
> 
>    length = len + offsetof (struct tree_string, str) + 1;
> 
> and:
> 
>    TREE_STRING_LENGTH (s) = len;
>    memcpy (s->string.str, str, len);
>    s->string.str[len] = '\0';
> 
> suggesting that the "len" parameter is in fact the length *without* the
> trailing NUL, and that a trailing NUL is added by build_string.
> 

Yes, string constants in tree objects have another zero termiation,
but varasm.c does something different, there the index range takes
precedence.
The index range is built in build_string_literal as follows:

   elem = build_type_variant (char_type_node, 1, 0);
   index = build_index_type (size_int (len - 1));
   type = build_array_type (elem, index);

therefore the string constant hast the type char[0..len-1]
thus only len bytes are significant for code generation, the extra
nul is just for "convenience".

> However build_string_literal has:
> 
>    t = build_string (len, str);
>    elem = build_type_variant (char_type_node, 1, 0);
>    index = build_index_type (size_int (len - 1));
> 
> suggesting that the len is passed directly to build_string (and thus
> ought to be strlen), but the build_index_type uses len - 1 (which
> suggests that len is the size of the string, rather than its length).
> 
> What's the intended meaning of len in these functions?
> 

I hope this helps.


Thanks
Bernd.


> Thanks
> Dave
>
Bernd Edlinger Nov. 9, 2018, 4:52 p.m. | #6
Hi Dave,

is the patch OK, or do you still have questions?


Thanks
Bernd.

On 11/2/18 10:48 PM, Bernd Edlinger wrote:
> On 11/2/18 9:40 PM, David Malcolm wrote:
>> On Sun, 2018-08-05 at 16:59 +0000, Bernd Edlinger wrote:
>>> Hi!
>>>
>>>
>>> My other patch with adds assertions to varasm.c regarding correct
>>> nul termination of sting literals did make these incorrect string
>>> constants in JIT frontend fail.
>>>
>>> The string constants are not nul terminated if their length exceeds
>>> 200 characters.  The test cases do not use strings of that size where
>>> that would make a difference.  But using a fixed index type is
>>> clearly
>>> wrong.
>>>
>>> This patch removes the fixed char[200] array type from
>>> playback::context,
>>> and uses build_string_literal instead of using build_string directly.
>>>
>>>
>>> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
>>> Is it OK for trunk?
>>
>> Sorry for the belated response.
>>
>> Was this tested with --enable-host-shared and --enable-languages=jit ?
>> Note that "jit" is not included in --enable-languages=all.
>>
> 
> Yes, of course.  The test suite contains a few string constants, just
> all of them are shorter than 200 characters.  But I think removing this
> artificial limit enables the existing test cases to test that the
> shorter string is in fact zero terminated.
> 
>> The patch seems reasonable, but I'm a little confused over the meaning
>> of "len" in build_string_literal and build_string: does it refer to the
>> length or the size of the string?
>>
> 
> build_string_literal:
> For languages that use zero-terminated strings, len is strlen(str)+1, and
> str is a zero terminated single-byte character string.
> For languages that don't use zero-terminated strings, len is the size of
> the string and str is not zero terminated.
> 
> build_string:
> constructs a STRING_CST tree object, which is usable as is in some contexts,
> like for asm constraints, but as a string literal it is incomplete, and
> needs an index type.  The index type defines the memory size which must
> be larger than the string precision.  Excess memory is implicitly cleared.
> 
> This means currently all jit strings shorter than 200 characters
> are filled with zero up to the limit of 200 chars as imposed by
> m_char_array_type_node.  Strings of exactly 200 chars are not zero terminated,
> and larger strings should result in an assertion (excess precision was previously
> allowed, but no zero termination was appended, when that is not part of
> the original string constant).
> 
> Previously it was allowed to have memory size less than the string len, which
> had complicated the STRING_CST semantics in the middle-end, but with the
> string_cst semantic rework I did for gcc-9 this is no longer allowed and
> results in (checking) assertions in varasm.c.
> 
>>> @@ -617,16 +616,9 @@ playback::rvalue *
>>>   playback::context::
>>>   new_string_literal (const char *value)
>>>   {
>>> -  tree t_str = build_string (strlen (value), value);
>>> -  gcc_assert (m_char_array_type_node);
>>> -  TREE_TYPE (t_str) = m_char_array_type_node;
>>> -
>>> -  /* Convert to (const char*), loosely based on
>>> -     c/c-typeck.c: array_to_pointer_conversion,
>>> -     by taking address of start of string.  */
>>> -  tree t_addr = build1 (ADDR_EXPR, m_const_char_ptr, t_str);
>>> +  tree t_str = build_string_literal (strlen (value) + 1, value);
>>> -  return new rvalue (this, t_addr);
>>> +  return new rvalue (this, t_str);
>>>   }
>>
>> In the above, the call to build_string with strlen is replaced with
>> build_string_literal with strlen + 1.
>>
>>
>> build_string's comment says:
>>
>> "Note that for a C string literal, LEN should include the trailing
>> NUL."
>>
>> but has:
>>
>>    length = len + offsetof (struct tree_string, str) + 1;
>>
>> and:
>>
>>    TREE_STRING_LENGTH (s) = len;
>>    memcpy (s->string.str, str, len);
>>    s->string.str[len] = '\0';
>>
>> suggesting that the "len" parameter is in fact the length *without* the
>> trailing NUL, and that a trailing NUL is added by build_string.
>>
> 
> Yes, string constants in tree objects have another zero termiation,
> but varasm.c does something different, there the index range takes
> precedence.
> The index range is built in build_string_literal as follows:
> 
>    elem = build_type_variant (char_type_node, 1, 0);
>    index = build_index_type (size_int (len - 1));
>    type = build_array_type (elem, index);
> 
> therefore the string constant hast the type char[0..len-1]
> thus only len bytes are significant for code generation, the extra
> nul is just for "convenience".
> 
>> However build_string_literal has:
>>
>>    t = build_string (len, str);
>>    elem = build_type_variant (char_type_node, 1, 0);
>>    index = build_index_type (size_int (len - 1));
>>
>> suggesting that the len is passed directly to build_string (and thus
>> ought to be strlen), but the build_index_type uses len - 1 (which
>> suggests that len is the size of the string, rather than its length).
>>
>> What's the intended meaning of len in these functions?
>>
> 
> I hope this helps.
> 
> 
> Thanks
> Bernd.
> 
> 
>> Thanks
>> Dave
>>

Patch

2018-08-05  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	* jit-playback.c (playback::context::context): Remove
	m_char_array_type_node.
	(playback::context::new_string_literal): Use
	build_string_literal.
	(playback::context::replay): Remove m_char_array_type_node.
	* jit-playback.h (playback::context::m_char_array_type_node): Remove.

diff -pur gcc/jit/jit-playback.c gcc/jit/jit-playback.c
--- gcc/jit/jit-playback.c	2018-06-28 09:08:01.000000000 +0200
+++ gcc/jit/jit-playback.c	2018-08-05 15:58:15.815403219 +0200
@@ -81,7 +81,6 @@  playback::context::context (recording::c
   : log_user (ctxt->get_logger ()),
     m_recording_ctxt (ctxt),
     m_tempdir (NULL),
-    m_char_array_type_node (NULL),
     m_const_char_ptr (NULL)
 {
   JIT_LOG_SCOPE (get_logger ());
@@ -617,16 +616,9 @@  playback::rvalue *
 playback::context::
 new_string_literal (const char *value)
 {
-  tree t_str = build_string (strlen (value), value);
-  gcc_assert (m_char_array_type_node);
-  TREE_TYPE (t_str) = m_char_array_type_node;
-
-  /* Convert to (const char*), loosely based on
-     c/c-typeck.c: array_to_pointer_conversion,
-     by taking address of start of string.  */
-  tree t_addr = build1 (ADDR_EXPR, m_const_char_ptr, t_str);
+  tree t_str = build_string_literal (strlen (value) + 1, value);
 
-  return new rvalue (this, t_addr);
+  return new rvalue (this, t_str);
 }
 
 /* Construct a playback::rvalue instance (wrapping a tree) for a
@@ -2633,10 +2625,6 @@  playback::context::
 replay ()
 {
   JIT_LOG_SCOPE (get_logger ());
-  /* Adapted from c-common.c:c_common_nodes_and_builtins.  */
-  tree array_domain_type = build_index_type (size_int (200));
-  m_char_array_type_node
-    = build_array_type (char_type_node, array_domain_type);
 
   m_const_char_ptr
     = build_pointer_type (build_qualified_type (char_type_node,
diff -pur gcc/jit/jit-playback.h gcc/jit/jit-playback.h
--- gcc/jit/jit-playback.h	2018-01-03 11:03:58.000000000 +0100
+++ gcc/jit/jit-playback.h	2018-08-05 15:58:52.988918367 +0200
@@ -316,7 +316,6 @@  private:
 
   auto_vec<function *> m_functions;
   auto_vec<tree> m_globals;
-  tree m_char_array_type_node;
   tree m_const_char_ptr;
 
   /* Source location handling.  */