Patchwork preprocessor/58580 - preprocessor goes OOM with warning for zero literals

login
register
mail settings
Submitter Dodji Seketeli
Date Oct. 31, 2013, 12:46 p.m.
Message ID <871u311ucp.fsf@redhat.com>
Download mbox | patch
Permalink /patch/287484/
State New
Headers show

Comments

Dodji Seketeli - Oct. 31, 2013, 12:46 p.m.
Hello,

In this problem report, the compiler is fed a (bogus) translation unit
in which some literals contains bytes whose value is zero.  The
preprocessor detects that and proceeds to emit diagnostics for that
king of bogus literals.  But then when the diagnostics machinery
re-reads the input file again to display the bogus literals with a
caret, it attempts to calculate the length of each of the lines it got
using fgets.  The line length calculation is done using strlen.  But
that doesn't work well when the content of the line can have several
zero bytes.  The result is that the read_line never sees the end of
the line because strlen repeatedly reports that the line ends before
the end-of-line character; so read_line thinks its buffer for reading
the line is too small; it thus increases the buffer, leading to a huge
memory consumption, pain and disaster.

The patch below introduces a new string_length() function that can
return the length of a string contained in a buffer even if the string
contains zero bytes; it does so by starting from the end of the buffer
and stops when it encounters the first non-null byte; for that to
work, the buffer must have been totally zeroed before getting data.
read_line is then modified to return the length of the line along
with the line itself, as the line can now contain zero bytes.  Callers
of read_line are adjusted consequently.

diagnostic_show_locus() is modified to consider that a line can have
characters of value zero, and so just show a white space when
instructed to display one of these characters.

Tested on x86_64-unknown-linux-gnu against trunk.

I realize this is diagnostics code and I am supposed to be a maintainer
for it, but I'd appreciate a review for it nonetheless.

Thanks.

gcc/ChangeLog:

	* input.h (location_get_source_line): Take an additional line_size
	parameter by reference.
	* input.c (string_length): New static function definition.
	(read_line): Take an additional line_length output parameter to be
	set to the size of the line.  Use the new string_length function
	to compute the size of the line returned by fgets, rather than
	using strlen.  Ensure that the buffer is initially zeroed; ensure
	that when growing the buffer too.
	(location_get_source_line): Take an additional output line_len
	parameter.  Update the use of read_line to pass it the line_len
	parameter.
	* diagnostic.c (adjust_line): Take an additional input parameter
	for the length of the line, rather than calculating it with
	strlen.
	(diagnostic_show_locus): Adjust the use of
	location_get_source_line and adjust_line with respect to their new
	signature.  While displaying a line now, do not stop at the first
	null byte.  Rather, display the zero byte as a space and keep
	going until we reach the size of the line.

gcc/testsuite/ChangeLog:

	* c-c++-common/cpp/warning-zero-in-literals-1.c: New test file.
---
 gcc/diagnostic.c                                   |  17 +++---
 gcc/input.c                                        |  60 +++++++++++++++++----
 gcc/input.h                                        |   3 +-
 .../c-c++-common/cpp/warning-zero-in-literals-1.c  | Bin 0 -> 240 bytes
 4 files changed, 62 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/cpp/warning-zero-in-literals-1.c

diff --git a/gcc/testsuite/c-c++-common/cpp/warning-zero-in-literals-1.c b/gcc/testsuite/c-c++-common/cpp/warning-zero-in-literals-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..ff2ed962ac96e47ae05b0b040f4e10b8e09637e2
GIT binary patch
literal 240
zcmdPbSEyD<N!LxuS12e-Ehx%QPAx80sO92PVo*}h*HVDUmM0eFW#*+TDCL#r<R~O(
UBo-wmm!uXcDby-x=?^KT09Xk|)&Kwi

literal 0
HcmV?d00001
Manuel López-Ibáñez - Oct. 31, 2013, 4:52 p.m.
On 31 October 2013 05:46, Dodji Seketeli <dodji@redhat.com> wrote:
> +*/
> +static size_t
> +string_length (const char* buf, size_t buf_size)
> +{
> +  for (int i = buf_size - 1; i > 0; --i)
> +    {
> +      if (buf[i] != 0)
> +       return i + 1;
> +
> +      if (buf[i - 1] != 0)
> +       return i;
> +    }
> +  return 0;
> +}

Why do you check both buf[i] and buf[i - 1] within the loop?

Cheers,

Manuel.

Patch

diff --git a/gcc/diagnostic.c b/gcc/diagnostic.c
index 36094a1..0ca7081 100644
--- a/gcc/diagnostic.c
+++ b/gcc/diagnostic.c
@@ -259,12 +259,13 @@  diagnostic_build_prefix (diagnostic_context *context,
    MAX_WIDTH by some margin, then adjust the start of the line such
    that the COLUMN is smaller than MAX_WIDTH minus the margin.  The
    margin is either 10 characters or the difference between the column
-   and the length of the line, whatever is smaller.  */
+   and the length of the line, whatever is smaller.  The length of
+   LINE is given by LINE_WIDTH.  */
 static const char *
-adjust_line (const char *line, int max_width, int *column_p)
+adjust_line (const char *line, int line_width,
+	     int max_width, int *column_p)
 {
   int right_margin = 10;
-  int line_width = strlen (line);
   int column = *column_p;
 
   right_margin = MIN (line_width - column, right_margin);
@@ -284,6 +285,7 @@  diagnostic_show_locus (diagnostic_context * context,
 		       const diagnostic_info *diagnostic)
 {
   const char *line;
+  int line_width;
   char *buffer;
   expanded_location s;
   int max_width;
@@ -297,22 +299,25 @@  diagnostic_show_locus (diagnostic_context * context,
 
   context->last_location = diagnostic->location;
   s = expand_location_to_spelling_point (diagnostic->location);
-  line = location_get_source_line (s);
+  line = location_get_source_line (s, line_width);
   if (line == NULL)
     return;
 
   max_width = context->caret_max_width;
-  line = adjust_line (line, max_width, &(s.column));
+  line = adjust_line (line, line_width, max_width, &(s.column));
 
   pp_newline (context->printer);
   saved_prefix = pp_get_prefix (context->printer);
   pp_set_prefix (context->printer, NULL);
   pp_space (context->printer);
-  while (max_width > 0 && *line != '\0')
+  while (max_width > 0 && line_width > 0)
     {
       char c = *line == '\t' ? ' ' : *line;
+      if (c == '\0')
+	c = ' ';
       pp_character (context->printer, c);
       max_width--;
+      line_width--;
       line++;
     }
   pp_newline (context->printer);
diff --git a/gcc/input.c b/gcc/input.c
index a141a92..b72183c 100644
--- a/gcc/input.c
+++ b/gcc/input.c
@@ -87,9 +87,37 @@  expand_location_1 (source_location loc,
   return xloc;
 }
 
-/* Reads one line from file into a static buffer.  */
+/* Returns the length of a string contained in a buffer that was
+   initially totally zeroed.  The length of the string is the number
+   of bytes up to the last non-null byte of the buffer.  Note that
+   with this definition, the string can contain bytes which value is
+   zero.
+
+   BUF is a pointer to the beginning of the buffer containing the
+   string to consider.
+
+   BUF_SIZE is the size of the buffer containing the string to
+   consider.  The string length is thus at most BUF_SIZE.
+*/
+static size_t
+string_length (const char* buf, size_t buf_size)
+{
+  for (int i = buf_size - 1; i > 0; --i)
+    {
+      if (buf[i] != 0)
+	return i + 1;
+
+      if (buf[i - 1] != 0)
+	return i;
+    }
+  return 0;
+}
+
+/* Reads one line from FILE into a static buffer.  LINE_LENGTH is set
+   by this function to the length of the returned line.  Note that the
+   returned line can contain several zero bytes.  */
 static const char *
-read_line (FILE *file)
+read_line (FILE *file, int& line_length)
 {
   static char *string;
   static size_t string_len;
@@ -99,32 +127,42 @@  read_line (FILE *file)
   if (!string_len)
     {
       string_len = 200;
-      string = XNEWVEC (char, string_len);
+      string = XCNEWVEC (char, string_len);
     }
+  else
+    memset (string, 0, string_len);
 
   while ((ptr = fgets (string + pos, string_len - pos, file)))
     {
-      size_t len = strlen (string + pos);
+      size_t len = string_length (string + pos, string_len - pos);
 
       if (string[pos + len - 1] == '\n')
 	{
 	  string[pos + len - 1] = 0;
+	  line_length = len;
 	  return string;
 	}
       pos += len;
-      string = XRESIZEVEC (char, string, string_len * 2);
-      string_len *= 2;
-    }
-      
+      size_t string_len2 = string_len * 2;
+      char *string2 = XCNEWVEC (char, string_len2);
+      memmove (string2, string, string_len);
+      XDELETE (string);;
+      string = string2;
+      string_len = string_len2;
+     }
+
+  line_length = pos ? string_len : 0;
   return pos ? string : NULL;
 }
 
 /* Return the physical source line that corresponds to xloc in a
    buffer that is statically allocated.  The newline is replaced by
-   the null character.  */
+   the null character.  Note that the line can contain several null
+   characters, so LINE_LEN contains the actual length of the line.  */
 
 const char *
-location_get_source_line (expanded_location xloc)
+location_get_source_line (expanded_location xloc,
+			  int& line_len)
 {
   const char *buffer;
   int lines = 1;
@@ -132,7 +170,7 @@  location_get_source_line (expanded_location xloc)
   if (!stream)
     return NULL;
 
-  while ((buffer = read_line (stream)) && lines < xloc.line)
+  while ((buffer = read_line (stream, line_len)) && lines < xloc.line)
     lines++;
 
   fclose (stream);
diff --git a/gcc/input.h b/gcc/input.h
index 8fdc7b2..79b3a10 100644
--- a/gcc/input.h
+++ b/gcc/input.h
@@ -37,7 +37,8 @@  extern char builtins_location_check[(BUILTINS_LOCATION
 				     < RESERVED_LOCATION_COUNT) ? 1 : -1];
 
 extern expanded_location expand_location (source_location);
-extern const char *location_get_source_line (expanded_location xloc);
+extern const char *location_get_source_line (expanded_location xloc,
+					     int& line_size);
 extern expanded_location expand_location_to_spelling_point (source_location);
 extern source_location expansion_point_location_if_in_system_header (source_location);