diff mbox

preprocessor/58580 - preprocessor goes OOM with warning for zero literals

Message ID 87k3gnqj7k.fsf@redhat.com
State New
Headers show

Commit Message

Dodji Seketeli Nov. 5, 2013, 9:41 a.m. UTC
Bernd Edlinger <bernd.edlinger@hotmail.de> writes:

[...]

>> if (!string_len)
>> {
>> string_len = 200;
>> - string = XNEWVEC (char, string_len);
>> + string = XCNEWVEC (char, string_len);
>> }
>> + else
>> + memset (string, 0, string_len);
>
> Is this memset still necessary?

Of course not ...

[...]

> If "ptr" is passed to get_line it will try to reallocate it,
> which must fail, right?
>
> Maybe, this line of code is unreachable?
>
> Who is responsible for reallocating "string" get_line or read_line?

Correct, these are real concerns.


I am wondering what I was thinking.  Actually, I think read_line should
almost just call get_line now.  Like what is done in the new version of
the patch below; basically if there is a line to return, read_line just
gets it (the static buffer containing the line) from get_line and
returns it, otherwise the static buffer containing the last read line is
left untouched and read_line returns a NULL constant.


I guess this resolves the valid concern that you raised below:

    > If the previous invocation of read_line already had read some
    > characters of the following line, how is that information
    > recovered? How is it detected if another file is to be read this
    > time?

Thank you very much for this thorough review.

Here is the updated patch that I am bootstrapping:

gcc/ChangeLog:

	* input.h (location_get_source_line): Take an additional line_size
	parameter.
	* input.c (get_line): New static function definition.
	(read_line): Take an additional line_length output parameter to be
	set to the size of the line.  Use the new get_line function do the
	actual line reading.
	(location_get_source_line): Take an additional output line_len
	parameter.  Update the use of read_line to pass it the line_len
	parameter.
	* diagnostic.c (adjust_line): Take an additional input parameter
	for the length of the line, rather than calculating it with
	strlen.
	(diagnostic_show_locus): Adjust the use of
	location_get_source_line and adjust_line with respect to their new
	signature.  While displaying a line now, do not stop at the first
	null byte.  Rather, display the zero byte as a space and keep
	going until we reach the size of the line.

gcc/testsuite/ChangeLog:

	* c-c++-common/cpp/warning-zero-in-literals-1.c: New test file.
---
 gcc/diagnostic.c                                   |  17 ++--
 gcc/input.c                                        | 111 ++++++++++++++++-----
 gcc/input.h                                        |   3 +-
 .../c-c++-common/cpp/warning-zero-in-literals-1.c  | Bin 0 -> 240 bytes
 4 files changed, 97 insertions(+), 34 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/cpp/warning-zero-in-literals-1.c

diff --git a/gcc/testsuite/c-c++-common/cpp/warning-zero-in-literals-1.c b/gcc/testsuite/c-c++-common/cpp/warning-zero-in-literals-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..ff2ed962ac96e47ae05b0b040f4e10b8e09637e2
GIT binary patch
literal 240
zcmdPbSEyD<N!LxuS12e-Ehx%QPAx80sO92PVo*}h*HVDUmM0eFW#*+TDCL#r<R~O(
UBo-wmm!uXcDby-x=?^KT09Xk|)&Kwi

literal 0
HcmV?d00001

Comments

Bernd Edlinger Nov. 5, 2013, 11:07 a.m. UTC | #1
Hi,

you're welcome.
Just one more thought on the design.

If you want to have at least a chance to survive something like:


dd if=/dev/zero of=test.c bs=10240 count=10000000

gcc -Wall test.c


Then you should change the implementation of read_line to
_not_ returning something like 100GB of zeros.

IMHO it would be nice to limit lines returned to 10.000 bytes,
maybe add "..." or "<line too long>" if the limit is reached.
Just skip over-sized bytes until the newline is consumed, to make
the line numbers consistent.

And maybe it would make the life of read_line's callers lots easier
if the zero-chars are silently replaced with spaces in the returned
line buffer.

That would allow to keep the current interface, and somehow
reduce the complexity of this patch.

What do you think?

Regards
Bernd.



On Tue, 5 Nov 2013 10:41:19, Dodji Seketeli wrote:
>
> Bernd Edlinger <bernd.edlinger@hotmail.de> writes:
>
> [...]
>
>>> if (!string_len)
>>> {
>>> string_len = 200;
>>> - string = XNEWVEC (char, string_len);
>>> + string = XCNEWVEC (char, string_len);
>>> }
>>> + else
>>> + memset (string, 0, string_len);
>>
>> Is this memset still necessary?
>
> Of course not ...
>
> [...]
>
>> If "ptr" is passed to get_line it will try to reallocate it,
>> which must fail, right?
>>
>> Maybe, this line of code is unreachable?
>>
>> Who is responsible for reallocating "string" get_line or read_line?
>
> Correct, these are real concerns.
>
>
> I am wondering what I was thinking. Actually, I think read_line should
> almost just call get_line now. Like what is done in the new version of
> the patch below; basically if there is a line to return, read_line just
> gets it (the static buffer containing the line) from get_line and
> returns it, otherwise the static buffer containing the last read line is
> left untouched and read_line returns a NULL constant.
>
>
> I guess this resolves the valid concern that you raised below:
>
>> If the previous invocation of read_line already had read some
>> characters of the following line, how is that information
>> recovered? How is it detected if another file is to be read this
>> time?
>
> Thank you very much for this thorough review.
>
> Here is the updated patch that I am bootstrapping:
>
> gcc/ChangeLog:
>
> * input.h (location_get_source_line): Take an additional line_size
> parameter.
> * input.c (get_line): New static function definition.
> (read_line): Take an additional line_length output parameter to be
> set to the size of the line. Use the new get_line function do the
> actual line reading.
> (location_get_source_line): Take an additional output line_len
> parameter. Update the use of read_line to pass it the line_len
> parameter.
> * diagnostic.c (adjust_line): Take an additional input parameter
> for the length of the line, rather than calculating it with
> strlen.
> (diagnostic_show_locus): Adjust the use of
> location_get_source_line and adjust_line with respect to their new
> signature. While displaying a line now, do not stop at the first
> null byte. Rather, display the zero byte as a space and keep
> going until we reach the size of the line.
>
> gcc/testsuite/ChangeLog:
>
> * c-c++-common/cpp/warning-zero-in-literals-1.c: New test file.
> ---
> gcc/diagnostic.c | 17 ++--
> gcc/input.c | 111 ++++++++++++++++-----
> gcc/input.h | 3 +-
> .../c-c++-common/cpp/warning-zero-in-literals-1.c | Bin 0 -> 240 bytes
> 4 files changed, 97 insertions(+), 34 deletions(-)
> create mode 100644 gcc/testsuite/c-c++-common/cpp/warning-zero-in-literals-1.c
>
> diff --git a/gcc/diagnostic.c b/gcc/diagnostic.c
> index 36094a1..e0c5d9d 100644
> --- a/gcc/diagnostic.c
> +++ b/gcc/diagnostic.c
> @@ -259,12 +259,13 @@ diagnostic_build_prefix (diagnostic_context *context,
> MAX_WIDTH by some margin, then adjust the start of the line such
> that the COLUMN is smaller than MAX_WIDTH minus the margin. The
> margin is either 10 characters or the difference between the column
> - and the length of the line, whatever is smaller. */
> + and the length of the line, whatever is smaller. The length of
> + LINE is given by LINE_WIDTH. */
> static const char *
> -adjust_line (const char *line, int max_width, int *column_p)
> +adjust_line (const char *line, int line_width,
> + int max_width, int *column_p)
> {
> int right_margin = 10;
> - int line_width = strlen (line);
> int column = *column_p;
>
> right_margin = MIN (line_width - column, right_margin);
> @@ -284,6 +285,7 @@ diagnostic_show_locus (diagnostic_context * context,
> const diagnostic_info *diagnostic)
> {
> const char *line;
> + int line_width;
> char *buffer;
> expanded_location s;
> int max_width;
> @@ -297,22 +299,25 @@ diagnostic_show_locus (diagnostic_context * context,
>
> context->last_location = diagnostic->location;
> s = expand_location_to_spelling_point (diagnostic->location);
> - line = location_get_source_line (s);
> + line = location_get_source_line (s, &line_width);
> if (line == NULL)
> return;
>
> max_width = context->caret_max_width;
> - line = adjust_line (line, max_width, &(s.column));
> + line = adjust_line (line, line_width, max_width, &(s.column));
>
> pp_newline (context->printer);
> saved_prefix = pp_get_prefix (context->printer);
> pp_set_prefix (context->printer, NULL);
> pp_space (context->printer);
> - while (max_width> 0 && *line != '\0')
> + while (max_width> 0 && line_width> 0)
> {
> char c = *line == '\t' ? ' ' : *line;
> + if (c == '\0')
> + c = ' ';
> pp_character (context->printer, c);
> max_width--;
> + line_width--;
> line++;
> }
> pp_newline (context->printer);
> diff --git a/gcc/input.c b/gcc/input.c
> index a141a92..9526d88 100644
> --- a/gcc/input.c
> +++ b/gcc/input.c
> @@ -87,53 +87,110 @@ expand_location_1 (source_location loc,
> return xloc;
> }
>
> -/* Reads one line from file into a static buffer. */
> -static const char *
> -read_line (FILE *file)
> +/* This function reads a line that might contain bytes whose value is
> + zero. It returns the number of bytes read. The 'end-of-line'
> + character found at the end of the line is not contained in the
> + returned buffer. Note that this function has been adapted from
> + getline() and _IO_getdelim() GNU C library functions. It's been
> + duplicated here because the getline() function is not necessarily
> + present on all platforms.
> +
> + LINEPTR points to a buffer that is to contain the line read.
> +
> + N points to the size of the the LINEPTR buffer.
> +
> + FP points to the file to consider. */
> +
> +static ssize_t
> +get_line (char **lineptr, size_t *n, FILE *fp)
> {
> - static char *string;
> - static size_t string_len;
> - size_t pos = 0;
> - char *ptr;
> + ssize_t cur_len = 0, len;
> + char buf[16384];
> +
> + if (lineptr == NULL || n == NULL)
> + return -1;
>
> - if (!string_len)
> + if (*lineptr == NULL || *n == 0)
> {
> - string_len = 200;
> - string = XNEWVEC (char, string_len);
> + *n = 120;
> + *lineptr = XNEWVEC (char, *n);
> }
>
> - while ((ptr = fgets (string + pos, string_len - pos, file)))
> - {
> - size_t len = strlen (string + pos);
> + len = fread (buf, 1, sizeof buf, fp);
> + if (ferror (fp))
> + return -1;
>
> - if (string[pos + len - 1] == '\n')
> + for (;;)
> + {
> + size_t needed;
> + char *t = (char*) memchr (buf, '\n', len);
> + if (t != NULL) len = (t - buf);
> + if (__builtin_expect (len>= SSIZE_MAX - cur_len, 0))
> + return -1;
> + needed = cur_len + len + 1;
> + if (needed> *n)
> {
> - string[pos + len - 1] = 0;
> - return string;
> + char *new_lineptr;
> + if (needed < 2 * *n)
> + needed = 2 * *n;
> + new_lineptr = XRESIZEVEC (char, *lineptr, needed);
> + *lineptr = new_lineptr;
> + *n = needed;
> }
> - pos += len;
> - string = XRESIZEVEC (char, string, string_len * 2);
> - string_len *= 2;
> + memcpy (*lineptr + cur_len, buf, len);
> + cur_len += len;
> + if (t != NULL)
> + break;
> + len = fread (buf, 1, sizeof buf, fp);
> + if (ferror (fp))
> + return -1;
> + if (len == 0)
> + break;
> }
> -
> - return pos ? string : NULL;
> +
> + if (cur_len)
> + (*lineptr)[cur_len] = '\0';
> + return cur_len;
> +}
> +
> +/* Reads one line from FILE into a static buffer. LINE_LENGTH is set
> + by this function to the length of the returned line. Note that the
> + returned line can contain several zero bytes. Also note that the
> + returned string is allocated in static storage that is going to be
> + re-used by subsequent invocations of read_line. */
> +static const char *
> +read_line (FILE *file, int *line_length)
> +{
> + static char *string;
> + static size_t string_len;
> +
> + *line_length = get_line (&string, &string_len, file);
> + return *line_length ? string : NULL;
> }
>
> /* Return the physical source line that corresponds to xloc in a
> buffer that is statically allocated. The newline is replaced by
> - the null character. */
> + the null character. Note that the line can contain several null
> + characters, so LINE_LEN, if non-null, points to the actual length
> + of the line. */
>
> const char *
> -location_get_source_line (expanded_location xloc)
> +location_get_source_line (expanded_location xloc,
> + int *line_len)
> {
> - const char *buffer;
> - int lines = 1;
> + const char *buffer = NULL, *ptr;
> + int lines = 0, len = 0;
> FILE *stream = xloc.file ? fopen (xloc.file, "r") : NULL;
> if (!stream)
> return NULL;
>
> - while ((buffer = read_line (stream)) && lines < xloc.line)
> - lines++;
> + while ((ptr = read_line (stream, &len)) && lines < xloc.line)
> + {
> + buffer = ptr;
> + lines++;
> + if (line_len)
> + *line_len = len;
> + }
>
> fclose (stream);
> return buffer;
> diff --git a/gcc/input.h b/gcc/input.h
> index 8fdc7b2..128e28c 100644
> --- a/gcc/input.h
> +++ b/gcc/input.h
> @@ -37,7 +37,8 @@ extern char builtins_location_check[(BUILTINS_LOCATION
> < RESERVED_LOCATION_COUNT) ? 1 : -1];
>
> extern expanded_location expand_location (source_location);
> -extern const char *location_get_source_line (expanded_location xloc);
> +extern const char *location_get_source_line (expanded_location xloc,
> + int *line_size);
> extern expanded_location expand_location_to_spelling_point (source_location);
> extern source_location expansion_point_location_if_in_system_header (source_location);
>
> diff --git a/gcc/testsuite/c-c++-common/cpp/warning-zero-in-literals-1.c b/gcc/testsuite/c-c++-common/cpp/warning-zero-in-literals-1.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..ff2ed962ac96e47ae05b0b040f4e10b8e09637e2
> GIT binary patch
> literal 240
> zcmdPbSEyD<N!LxuS12e-Ehx%QPAx80sO92PVo*}h*HVDUmM0eFW#*+TDCL#r<R~O(
> UBo-wmm!uXcDby-x=?^KT09Xk|)&Kwi
>
> literal 0
> HcmV?d00001
>
> --
> Dodji
Dodji Seketeli Nov. 5, 2013, 11:39 a.m. UTC | #2
Bernd Edlinger <bernd.edlinger@hotmail.de> writes:

> If you want to have at least a chance to survive something like:
>
>
> dd if=/dev/zero of=test.c bs=10240 count=10000000
>
> gcc -Wall test.c
>
>
> Then you should change the implementation of read_line to
> _not_ returning something like 100GB of zeros.

I'd say that in that case, we'd rather just die in an OOM condition and
be done with it.  Otherwise, If fear that read_line might become too
slow; you'd have to detect that the content is just zeros, for instance.

> IMHO it would be nice to limit lines returned to 10.000 bytes,
> maybe add "..." or "<line too long>" if the limit is reached.

In general, setting a limit for pathological cases like this is a good
idea, I think.  But that seems a bit ouf of the scope of this particular
bug fix; we'd need to e.g, define a new command line argument to extend
that limit if need be, for instance.  If people really feel strongly
about this I can propose a later patch to set a limit in get_line and
define a command like argument that would override that parameter.

> And maybe it would make the life of read_line's callers lots easier
> if the zero-chars are silently replaced with spaces in the returned
> line buffer.

As speed seemed to be a concern (even if, in my opinion, we are dealing
with diagnostics that are being emitted when the compilation has been
halted anyway, so we shouldn't be too concerned, unless we are talking
about pathological cases), I think that read_line should be fast by
default.  If a particular caller doesn't want to see the zeros (and thus
is ready to pay the speed price) then it can replace the zeros with
white space.  Otherwise, let's have read_line be as fast as possible.

Also keep in mind that in subsequent patches, read_line might be re-used
by e.g, gcov in nominal contexts where we don't have zeros in the middle
of the line.  In that case, speed can be a concern.

Thanks for the helpful thoughts.
Bernd Edlinger Nov. 6, 2013, 10:22 p.m. UTC | #3
Sorry Dodji,

I still do not see how this is supposed to work:

If the previous invocation of get_line already had read some
characters of the following line(s), how is that information
recovered?

I see it is copied behind lineptr[cur_len].
But when get_line is re-entered, cur_len is set to zero again.
and all that contents up to 16K are forgotten. Right?

If an empty line of just a new-line is read, the return value
of get_line is 0, and string is "". But the return value of
read_line is NULL in that case. Now the function
location_get_source_line will leave the while loop.
But there may be more lines, propably not just empty ones?

How did you test your patch?


Regards
Bernd.
diff mbox

Patch

diff --git a/gcc/diagnostic.c b/gcc/diagnostic.c
index 36094a1..e0c5d9d 100644
--- a/gcc/diagnostic.c
+++ b/gcc/diagnostic.c
@@ -259,12 +259,13 @@  diagnostic_build_prefix (diagnostic_context *context,
    MAX_WIDTH by some margin, then adjust the start of the line such
    that the COLUMN is smaller than MAX_WIDTH minus the margin.  The
    margin is either 10 characters or the difference between the column
-   and the length of the line, whatever is smaller.  */
+   and the length of the line, whatever is smaller.  The length of
+   LINE is given by LINE_WIDTH.  */
 static const char *
-adjust_line (const char *line, int max_width, int *column_p)
+adjust_line (const char *line, int line_width,
+	     int max_width, int *column_p)
 {
   int right_margin = 10;
-  int line_width = strlen (line);
   int column = *column_p;
 
   right_margin = MIN (line_width - column, right_margin);
@@ -284,6 +285,7 @@  diagnostic_show_locus (diagnostic_context * context,
 		       const diagnostic_info *diagnostic)
 {
   const char *line;
+  int line_width;
   char *buffer;
   expanded_location s;
   int max_width;
@@ -297,22 +299,25 @@  diagnostic_show_locus (diagnostic_context * context,
 
   context->last_location = diagnostic->location;
   s = expand_location_to_spelling_point (diagnostic->location);
-  line = location_get_source_line (s);
+  line = location_get_source_line (s, &line_width);
   if (line == NULL)
     return;
 
   max_width = context->caret_max_width;
-  line = adjust_line (line, max_width, &(s.column));
+  line = adjust_line (line, line_width, max_width, &(s.column));
 
   pp_newline (context->printer);
   saved_prefix = pp_get_prefix (context->printer);
   pp_set_prefix (context->printer, NULL);
   pp_space (context->printer);
-  while (max_width > 0 && *line != '\0')
+  while (max_width > 0 && line_width > 0)
     {
       char c = *line == '\t' ? ' ' : *line;
+      if (c == '\0')
+	c = ' ';
       pp_character (context->printer, c);
       max_width--;
+      line_width--;
       line++;
     }
   pp_newline (context->printer);
diff --git a/gcc/input.c b/gcc/input.c
index a141a92..9526d88 100644
--- a/gcc/input.c
+++ b/gcc/input.c
@@ -87,53 +87,110 @@  expand_location_1 (source_location loc,
   return xloc;
 }
 
-/* Reads one line from file into a static buffer.  */
-static const char *
-read_line (FILE *file)
+/* This function reads a line that might contain bytes whose value is
+   zero.  It returns the number of bytes read.  The 'end-of-line'
+   character found at the end of the line is not contained in the
+   returned buffer.  Note that this function has been adapted from
+   getline() and _IO_getdelim() GNU C library functions.  It's been
+   duplicated here because the getline() function is not necessarily
+   present on all platforms.
+
+   LINEPTR points to a buffer that is to contain the line read.
+
+   N points to the size of the the LINEPTR buffer.
+
+   FP points to the file to consider.  */
+
+static ssize_t
+get_line (char **lineptr, size_t *n, FILE *fp)
 {
-  static char *string;
-  static size_t string_len;
-  size_t pos = 0;
-  char *ptr;
+  ssize_t cur_len = 0, len;
+  char buf[16384];
+
+  if (lineptr == NULL || n == NULL)
+    return -1;
 
-  if (!string_len)
+  if (*lineptr == NULL || *n == 0)
     {
-      string_len = 200;
-      string = XNEWVEC (char, string_len);
+      *n = 120;
+      *lineptr = XNEWVEC (char, *n);
     }
 
-  while ((ptr = fgets (string + pos, string_len - pos, file)))
-    {
-      size_t len = strlen (string + pos);
+  len = fread (buf, 1, sizeof buf, fp);
+  if (ferror (fp))
+    return -1;
 
-      if (string[pos + len - 1] == '\n')
+  for (;;)
+    {
+      size_t needed;
+      char *t = (char*) memchr (buf, '\n', len);
+      if (t != NULL) len = (t - buf);
+      if (__builtin_expect (len >= SSIZE_MAX - cur_len, 0))
+	return -1;
+      needed = cur_len + len + 1;
+      if (needed > *n)
 	{
-	  string[pos + len - 1] = 0;
-	  return string;
+	  char *new_lineptr;
+	  if (needed < 2 * *n)
+	    needed = 2 * *n;
+	  new_lineptr = XRESIZEVEC (char, *lineptr, needed);
+	  *lineptr = new_lineptr;
+	  *n = needed;
 	}
-      pos += len;
-      string = XRESIZEVEC (char, string, string_len * 2);
-      string_len *= 2;
+      memcpy (*lineptr + cur_len, buf, len);
+      cur_len += len;
+      if (t != NULL)
+	break;
+      len = fread (buf, 1, sizeof buf, fp);
+      if (ferror (fp))
+	return -1;
+      if (len == 0)
+	break;
     }
-      
-  return pos ? string : NULL;
+
+  if (cur_len)
+    (*lineptr)[cur_len] = '\0';
+  return cur_len;
+}
+
+/* Reads one line from FILE into a static buffer.  LINE_LENGTH is set
+   by this function to the length of the returned line.  Note that the
+   returned line can contain several zero bytes.  Also note that the
+   returned string is allocated in static storage that is going to be
+   re-used by subsequent invocations of read_line.  */
+static const char *
+read_line (FILE *file, int *line_length)
+{
+  static char *string;
+  static size_t string_len;
+
+  *line_length = get_line (&string, &string_len, file);
+  return *line_length ? string : NULL;
 }
 
 /* Return the physical source line that corresponds to xloc in a
    buffer that is statically allocated.  The newline is replaced by
-   the null character.  */
+   the null character.  Note that the line can contain several null
+   characters, so LINE_LEN, if non-null, points to the actual length
+   of the line.  */
 
 const char *
-location_get_source_line (expanded_location xloc)
+location_get_source_line (expanded_location xloc,
+			  int *line_len)
 {
-  const char *buffer;
-  int lines = 1;
+  const char *buffer = NULL, *ptr;
+  int lines = 0, len = 0;
   FILE *stream = xloc.file ? fopen (xloc.file, "r") : NULL;
   if (!stream)
     return NULL;
 
-  while ((buffer = read_line (stream)) && lines < xloc.line)
-    lines++;
+  while ((ptr = read_line (stream, &len)) && lines < xloc.line)
+    {
+      buffer = ptr;
+      lines++;
+      if (line_len)
+	*line_len = len;
+    }
 
   fclose (stream);
   return buffer;
diff --git a/gcc/input.h b/gcc/input.h
index 8fdc7b2..128e28c 100644
--- a/gcc/input.h
+++ b/gcc/input.h
@@ -37,7 +37,8 @@  extern char builtins_location_check[(BUILTINS_LOCATION
 				     < RESERVED_LOCATION_COUNT) ? 1 : -1];
 
 extern expanded_location expand_location (source_location);
-extern const char *location_get_source_line (expanded_location xloc);
+extern const char *location_get_source_line (expanded_location xloc,
+					     int *line_size);
 extern expanded_location expand_location_to_spelling_point (source_location);
 extern source_location expansion_point_location_if_in_system_header (source_location);