diff mbox series

[committed,part2] PR fortran/95090 - ICE: identifier overflow

Message ID trinity-b0708fbd-6e5c-4fc0-b74f-4af90740e364-1590780482832@3c-app-gmx-bap60
State New
Headers show
Series [committed,part2] PR fortran/95090 - ICE: identifier overflow | expand

Commit Message

Harald Anlauf May 29, 2020, 7:28 p.m. UTC
The initial patch for this PR had some fallout which for unknown reason
did only show up on i686, but not on x86_64.  With initial guidance by
Manfred Schwarb three further locations exhibiting buffer overrun could
be identified in a gdb session and were fixed.

Committed as 'obvious' to master.

Thanks,
Harald


PR fortran/95090 - ICE: identifier overflow

The initial fix for this PR uncovered several latent issues with further
too small string buffers which showed up only when testing on i686.
Provide sufficiently large temporaries.

2020-05-29  Harald Anlauf  <anlauf@gmx.de>

gcc/fortran/
	PR fortran/95090
	* class.c (get_unique_type_string): Enlarge temporary for
	name-mangling.  Use strncpy to prevent buffer overrun.
	(get_unique_hashed_string): Enlarge temporary.
	(gfc_hash_value): Enlarge temporary for name-mangling.

Comments

H.J. Lu May 29, 2020, 9:57 p.m. UTC | #1
On Fri, May 29, 2020 at 1:39 PM Harald Anlauf <anlauf@gmx.de> wrote:
>
> The initial patch for this PR had some fallout which for unknown reason
> did only show up on i686, but not on x86_64.  With initial guidance by
> Manfred Schwarb three further locations exhibiting buffer overrun could
> be identified in a gdb session and were fixed.
>
> Committed as 'obvious' to master.
>
> Thanks,
> Harald
>
>
> PR fortran/95090 - ICE: identifier overflow
>
> The initial fix for this PR uncovered several latent issues with further
> too small string buffers which showed up only when testing on i686.
> Provide sufficiently large temporaries.
>
> 2020-05-29  Harald Anlauf  <anlauf@gmx.de>
>
> gcc/fortran/
>         PR fortran/95090
>         * class.c (get_unique_type_string): Enlarge temporary for
>         name-mangling.  Use strncpy to prevent buffer overrun.
>         (get_unique_hashed_string): Enlarge temporary.
>         (gfc_hash_value): Enlarge temporary for name-mangling.

This breaks bootstrap:

https://gcc.gnu.org/pipermail/gcc-regression/2020-May/072642.html

../../src-master/gcc/fortran/class.c:487:13: error: ‘char*
strncpy(char*, const char*, size_t)’ specified bound 67 equals
destination size [-Werror=stringop-truncation]
  487 |     strncpy (dt_name, gfc_dt_upper_string (derived->name),
sizeof (dt_name));
Harald Anlauf May 29, 2020, 10:20 p.m. UTC | #2
Hi H.J.,

> Gesendet: Freitag, 29. Mai 2020 um 23:57 Uhr
> Von: "H.J. Lu" <hjl.tools@gmail.com>

> This breaks bootstrap:
> 
> https://gcc.gnu.org/pipermail/gcc-regression/2020-May/072642.html
> 
> ../../src-master/gcc/fortran/class.c:487:13: error: ‘char*
> strncpy(char*, const char*, size_t)’ specified bound 67 equals
> destination size [-Werror=stringop-truncation]
>   487 |     strncpy (dt_name, gfc_dt_upper_string (derived->name),
> sizeof (dt_name));

what is the right way to deal with that?

I haven't seen any use of strlcpy in the gcc sources.  This would do
the right thing and would fit here.

So should one use the clumsy way:

           strncpy(buf, str, buflen - 1);
           if (buflen > 0)
               buf[buflen - 1]= '\0';

Or is there something more convenient that keeps gcc happy?

Harald
H.J. Lu May 29, 2020, 10:23 p.m. UTC | #3
On Fri, May 29, 2020 at 3:20 PM Harald Anlauf <anlauf@gmx.de> wrote:
>
> Hi H.J.,
>
> > Gesendet: Freitag, 29. Mai 2020 um 23:57 Uhr
> > Von: "H.J. Lu" <hjl.tools@gmail.com>
>
> > This breaks bootstrap:
> >
> > https://gcc.gnu.org/pipermail/gcc-regression/2020-May/072642.html
> >
> > ../../src-master/gcc/fortran/class.c:487:13: error: ‘char*
> > strncpy(char*, const char*, size_t)’ specified bound 67 equals
> > destination size [-Werror=stringop-truncation]
> >   487 |     strncpy (dt_name, gfc_dt_upper_string (derived->name),
> > sizeof (dt_name));
>
> what is the right way to deal with that?
>
> I haven't seen any use of strlcpy in the gcc sources.  This would do
> the right thing and would fit here.
>
> So should one use the clumsy way:
>
>            strncpy(buf, str, buflen - 1);
>            if (buflen > 0)
>                buf[buflen - 1]= '\0';
>
> Or is there something more convenient that keeps gcc happy?

dt_name[sizeof (dt_name) - 1] = '\0';
strncpy (dt_name, gfc_dt_upper_string (derived->name), sizeof (dt_name) - 1);
Jakub Jelinek May 29, 2020, 11:10 p.m. UTC | #4
On Sat, May 30, 2020 at 12:20:19AM +0200, Harald Anlauf wrote:
> > Gesendet: Freitag, 29. Mai 2020 um 23:57 Uhr
> > Von: "H.J. Lu" <hjl.tools@gmail.com>
> 
> > This breaks bootstrap:
> > 
> > https://gcc.gnu.org/pipermail/gcc-regression/2020-May/072642.html
> > 
> > ../../src-master/gcc/fortran/class.c:487:13: error: ‘char*
> > strncpy(char*, const char*, size_t)’ specified bound 67 equals
> > destination size [-Werror=stringop-truncation]
> >   487 |     strncpy (dt_name, gfc_dt_upper_string (derived->name),
> > sizeof (dt_name));
> 
> what is the right way to deal with that?
> 
> I haven't seen any use of strlcpy in the gcc sources.  This would do
> the right thing and would fit here.
> 
> So should one use the clumsy way:
> 
>            strncpy(buf, str, buflen - 1);
>            if (buflen > 0)
>                buf[buflen - 1]= '\0';
> 
> Or is there something more convenient that keeps gcc happy?

Depends on what exactly you want to achieve, but strncpy is pretty much
always the wrong answer.
Do you really want to clear the rest of buf, when gfc_dt_upper_string
is much shorter?  That is one thing that is only seldom useful from strncpy
behavior.  The other is that it doesn't zero terminate on truncation.
I'd suggest e.g. remember gfc_dt_upper_string result in a temporary,
compute len = strnlen (res, sizeof (dt_name) - 1);
and then you can just memcpy and zero terminate buf[len] = '\0';
Or do you ever want to truncate?

	Jakub
Harald Anlauf May 30, 2020, 12:48 p.m. UTC | #5
> > > This breaks bootstrap:
> > > 
> > > https://gcc.gnu.org/pipermail/gcc-regression/2020-May/072642.html
> > > 
> > > ../../src-master/gcc/fortran/class.c:487:13: error: ‘char*
> > > strncpy(char*, const char*, size_t)’ specified bound 67 equals
> > > destination size [-Werror=stringop-truncation]
> > >   487 |     strncpy (dt_name, gfc_dt_upper_string (derived->name),
> > > sizeof (dt_name));
> > 
> > So should one use the clumsy way:
> > 
> >            strncpy(buf, str, buflen - 1);
> >            if (buflen > 0)
> >                buf[buflen - 1]= '\0';
> > 
> > Or is there something more convenient that keeps gcc happy?
> 
> Depends on what exactly you want to achieve, but strncpy is pretty much
> always the wrong answer.
> Do you really want to clear the rest of buf, when gfc_dt_upper_string
> is much shorter?  That is one thing that is only seldom useful from strncpy
> behavior.  The other is that it doesn't zero terminate on truncation.
> I'd suggest e.g. remember gfc_dt_upper_string result in a temporary,
> compute len = strnlen (res, sizeof (dt_name) - 1);
> and then you can just memcpy and zero terminate buf[len] = '\0';
> Or do you ever want to truncate?

I'ld like to detect the situation that when somebody modifies name-mangling in
a way that generates a buffer overrun during regtesting so that the temporaries
to adjust are easier to find.

After thinking about your and H.J.'s suggestions, the shortest solution
I came up with is:

diff --git a/gcc/fortran/class.c b/gcc/fortran/class.c
index db395624a16..6d0924da2b8 100644
--- a/gcc/fortran/class.c
+++ b/gcc/fortran/class.c
@@ -484,7 +484,12 @@ get_unique_type_string (char *string, gfc_symbol *derived)
   if (derived->attr.unlimited_polymorphic)
     strcpy (dt_name, "STAR");
   else
-    strncpy (dt_name, gfc_dt_upper_string (derived->name), sizeof (dt_name));
+    {
+      dt_name[sizeof (dt_name)-1] = '\0';
+      strcpy (dt_name, gfc_dt_upper_string (derived->name));
+      if (dt_name[sizeof (dt_name)-1] != '\0')
+       gfc_internal_error ("get_unique_type_string: identifier overflow");
+    }
   if (derived->attr.unlimited_polymorphic)
     sprintf (string, "_%s", dt_name);
   else if (derived->module)

That would have given me a useful error on x86_64.

Is this OK for master?

Harald
Jakub Jelinek May 30, 2020, 12:54 p.m. UTC | #6
On Sat, May 30, 2020 at 02:48:32PM +0200, Harald Anlauf wrote:
> I'ld like to detect the situation that when somebody modifies name-mangling in
> a way that generates a buffer overrun during regtesting so that the temporaries
> to adjust are easier to find.
> 
> After thinking about your and H.J.'s suggestions, the shortest solution
> I came up with is:
> 
> diff --git a/gcc/fortran/class.c b/gcc/fortran/class.c
> index db395624a16..6d0924da2b8 100644
> --- a/gcc/fortran/class.c
> +++ b/gcc/fortran/class.c
> @@ -484,7 +484,12 @@ get_unique_type_string (char *string, gfc_symbol *derived)
>    if (derived->attr.unlimited_polymorphic)
>      strcpy (dt_name, "STAR");
>    else
> -    strncpy (dt_name, gfc_dt_upper_string (derived->name), sizeof (dt_name));
> +    {
> +      dt_name[sizeof (dt_name)-1] = '\0';
> +      strcpy (dt_name, gfc_dt_upper_string (derived->name));
> +      if (dt_name[sizeof (dt_name)-1] != '\0')
> +       gfc_internal_error ("get_unique_type_string: identifier overflow");
> +    }
>    if (derived->attr.unlimited_polymorphic)
>      sprintf (string, "_%s", dt_name);
>    else if (derived->module)
> 
> That would have given me a useful error on x86_64.
> 
> Is this OK for master?

That is detecting it after the buffer overflow has happened already, that is
too late, after UB anything can happen.
+    {
+      const char *upper = gfc_dt_upper_string (derived->name);
+      size_t len = strnlen (upper, sizeof (dt_name));
+      gcc_assert (len < sizeof (dt_name));
+      memcpy (dt_name, upper, len);
+      dt_name[len] = '\0';
+    }
does detect it before overflowing it.

	Jakub
Andreas Schwab May 30, 2020, 6:32 p.m. UTC | #7
On Mai 30 2020, Harald Anlauf wrote:

> diff --git a/gcc/fortran/class.c b/gcc/fortran/class.c
> index db395624a16..6d0924da2b8 100644
> --- a/gcc/fortran/class.c
> +++ b/gcc/fortran/class.c
> @@ -484,7 +484,12 @@ get_unique_type_string (char *string, gfc_symbol *derived)
>    if (derived->attr.unlimited_polymorphic)
>      strcpy (dt_name, "STAR");
>    else
> -    strncpy (dt_name, gfc_dt_upper_string (derived->name), sizeof (dt_name));
> +    {
> +      dt_name[sizeof (dt_name)-1] = '\0';
> +      strcpy (dt_name, gfc_dt_upper_string (derived->name));
> +      if (dt_name[sizeof (dt_name)-1] != '\0')
> +       gfc_internal_error ("get_unique_type_string: identifier overflow");
> +    }

Why do you need to create the copy in the first place?

Please revert, if you cannot fix that soon.

Andreas.
Harald Anlauf May 30, 2020, 6:57 p.m. UTC | #8
> That is detecting it after the buffer overflow has happened already, that is
> too late, after UB anything can happen.
> +    {
> +      const char *upper = gfc_dt_upper_string (derived->name);
> +      size_t len = strnlen (upper, sizeof (dt_name));
> +      gcc_assert (len < sizeof (dt_name));
> +      memcpy (dt_name, upper, len);
> +      dt_name[len] = '\0';
> +    }
> does detect it before overflowing it.

OK.  Here's what I committed:


    PR fortran/95090 - ICE: identifier overflow

    Implement buffer overrun check for temporary that holds mangled names.

    2020-05-30  Harald Anlauf  <anlauf@gmx.de>

    gcc/fortran/
            PR fortran/95090
            * class.c (get_unique_type_string): Use buffer overrun check.

diff --git a/gcc/fortran/class.c b/gcc/fortran/class.c
index db395624a16..afd8885a1ea 100644
--- a/gcc/fortran/class.c
+++ b/gcc/fortran/class.c
@@ -484,7 +484,14 @@ get_unique_type_string (char *string, gfc_symbol *derived)
   if (derived->attr.unlimited_polymorphic)
     strcpy (dt_name, "STAR");
   else
-    strncpy (dt_name, gfc_dt_upper_string (derived->name), sizeof (dt_name));
+    {
+      const char *upper = gfc_dt_upper_string (derived->name);
+      size_t len = strnlen (upper, sizeof (dt_name));
+      if (len >= sizeof (dt_name))
+       gfc_internal_error ("get_unique_type_string: identifier overflow");
+      memcpy (dt_name, upper, len);
+      dt_name[len] = '\0';
+    }
   if (derived->attr.unlimited_polymorphic)
     sprintf (string, "_%s", dt_name);
   else if (derived->module)


Harald
Jakub Jelinek May 30, 2020, 7:11 p.m. UTC | #9
On Sat, May 30, 2020 at 08:57:46PM +0200, Harald Anlauf wrote:
> > That is detecting it after the buffer overflow has happened already, that is
> > too late, after UB anything can happen.
> > +    {
> > +      const char *upper = gfc_dt_upper_string (derived->name);
> > +      size_t len = strnlen (upper, sizeof (dt_name));
> > +      gcc_assert (len < sizeof (dt_name));
> > +      memcpy (dt_name, upper, len);
> > +      dt_name[len] = '\0';
> > +    }
> > does detect it before overflowing it.
> 
> OK.  Here's what I committed:
> 
> 
>     PR fortran/95090 - ICE: identifier overflow
> 
>     Implement buffer overrun check for temporary that holds mangled names.
> 
>     2020-05-30  Harald Anlauf  <anlauf@gmx.de>
> 
>     gcc/fortran/
>             PR fortran/95090
>             * class.c (get_unique_type_string): Use buffer overrun check.

Thanks.  Though, see Andreas' mail, I think he is right that there is really
no reason to have that dt_name local buffer at all, just changing
dt_name from an array to const char *dt_name; and changing the strcpy to
dt_name = "STAR";
and the former strncpy (dt_name, ...) to
dt_name = gfc_dt_upper_string (derived->name);
should do it.

There is a possible buffer overflow in the string with or without that
change but to fix that I think it would be desirable to pass not just the
string buffer to the function but also the length of the buffer and in the
function verify it will not overflow.  There is no reason to use sprintf
which is fairly expensive, and could be even simplified.

So, once dt_name is const char *, change that
  if (derived->attr.unlimited_polymorphic)
    sprintf (string, "_%s", dt_name);
  else if (derived->module)
    sprintf (string, "%s_%s", derived->module, dt_name);
  else if (derived->ns->proc_name)
    sprintf (string, "%s_%s", derived->ns->proc_name->name, dt_name);
  else
    sprintf (string, "_%s", dt_name);
to something like:
  const char *first = "";
  if (!derived->attr.unlimited_polymorphic)
    {
      if (derived->module)
	first = derived->module;
      else if (derived->ns->proc_name)
	first = derived->ns->proc_name->name;
    }
  size_t len1 = strlen (first), len2 = strlen (dt_name);
  if (len1 + 1 + len2 + 1 >= len) // len being the new passed argument - length of the buffer pointed by string
    gfc_internal_error (...);
  memcpy (string, first, len1);
  string[len1] = '_';
  memcpy (string + len1 + 1, dt_name, len2 + 1);

	Jakub
Jakub Jelinek May 30, 2020, 7:28 p.m. UTC | #10
On Sat, May 30, 2020 at 09:11:23PM +0200, Jakub Jelinek via Gcc-patches wrote:
> There is a possible buffer overflow in the string with or without that
> change but to fix that I think it would be desirable to pass not just the
> string buffer to the function but also the length of the buffer and in the
> function verify it will not overflow.  There is no reason to use sprintf
> which is fairly expensive, and could be even simplified.
> 
> So, once dt_name is const char *, change that
>   if (derived->attr.unlimited_polymorphic)
>     sprintf (string, "_%s", dt_name);
>   else if (derived->module)
>     sprintf (string, "%s_%s", derived->module, dt_name);
>   else if (derived->ns->proc_name)
>     sprintf (string, "%s_%s", derived->ns->proc_name->name, dt_name);
>   else
>     sprintf (string, "_%s", dt_name);
> to something like:
>   const char *first = "";
>   if (!derived->attr.unlimited_polymorphic)
>     {
>       if (derived->module)
> 	first = derived->module;
>       else if (derived->ns->proc_name)
> 	first = derived->ns->proc_name->name;
>     }
>   size_t len1 = strlen (first), len2 = strlen (dt_name);
>   if (len1 + 1 + len2 + 1 >= len) // len being the new passed argument - length of the buffer pointed by string
>     gfc_internal_error (...);
>   memcpy (string, first, len1);
>   string[len1] = '_';
>   memcpy (string + len1 + 1, dt_name, len2 + 1);

Or if you prefer replace everything starting with len1 above
with snprintf (string, len, "%s_%s", first, dt_name);
which will truncate (and if you need, you could
  if ((size_t) snprintf (string, len, "%s_%s", first, dt_name) >= len)
    gfc_internal_error (...);

	Jakub
Harald Anlauf May 30, 2020, 7:30 p.m. UTC | #11
> Thanks.  Though, see Andreas' mail, I think he is right that there is really
> no reason to have that dt_name local buffer at all, just changing
> dt_name from an array to const char *dt_name; and changing the strcpy to
> dt_name = "STAR";
> and the former strncpy (dt_name, ...) to
> dt_name = gfc_dt_upper_string (derived->name);
> should do it.

Certainly.

The functions in question existed long before I became involved in gfortran
development.  AFAIU is only the (rather) receently modified name mangling
to take into account newer Fortran language features that caused the ICE
when the maximum allowed symbol lengths are used.
This is hopefully now fully exerted in the testcase.

Symbol lengths are checked elsewhere.  A user program should not be able to
trigger an ICE here.  (Fingers crossed)

Of course one could always rewrite gfortran to use a more modern C style.
We're all looking for more volunteers!
(I'm much better at Fortran than at C).

Nevertheless, thanks for your patience and your explanations.  They might
positively influence my future contribution.

Harald
diff mbox series

Patch

diff --git a/gcc/fortran/class.c b/gcc/fortran/class.c
index 9aa3eb7282c..db395624a16 100644
--- a/gcc/fortran/class.c
+++ b/gcc/fortran/class.c
@@ -479,11 +479,12 @@  gfc_class_initializer (gfc_typespec *ts, gfc_expr *init_expr)
 static void
 get_unique_type_string (char *string, gfc_symbol *derived)
 {
-  char dt_name[GFC_MAX_SYMBOL_LEN+1];
+  /* Provide sufficient space to hold "Pdtsymbol".  */
+  char dt_name[GFC_MAX_SYMBOL_LEN+4];
   if (derived->attr.unlimited_polymorphic)
     strcpy (dt_name, "STAR");
   else
-    strcpy (dt_name, gfc_dt_upper_string (derived->name));
+    strncpy (dt_name, gfc_dt_upper_string (derived->name), sizeof (dt_name));
   if (derived->attr.unlimited_polymorphic)
     sprintf (string, "_%s", dt_name);
   else if (derived->module)
@@ -501,7 +502,8 @@  get_unique_type_string (char *string, gfc_symbol *derived)
 static void
 get_unique_hashed_string (char *string, gfc_symbol *derived)
 {
-  char tmp[2*GFC_MAX_SYMBOL_LEN+2];
+  /* Provide sufficient space to hold "symbol_Pdtsymbol".  */
+  char tmp[2*GFC_MAX_SYMBOL_LEN+5];
   get_unique_type_string (&tmp[0], derived);
   /* If string is too long, use hash value in hex representation (allow for
      extra decoration, cf. gfc_build_class_symbol & gfc_find_derived_vtab).
@@ -523,7 +525,8 @@  unsigned int
 gfc_hash_value (gfc_symbol *sym)
 {
   unsigned int hash = 0;
-  char c[2*(GFC_MAX_SYMBOL_LEN+1)];
+  /* Provide sufficient space to hold "symbol_Pdtsymbol".  */
+  char c[2*GFC_MAX_SYMBOL_LEN+5];
   int i, len;

   get_unique_type_string (&c[0], sym);