diff mbox series

stdio: Remove memory leak from multibyte convertion (BZ#25691)

Message ID 20200319155202.12366-1-adhemerval.zanella@linaro.org
State New
Headers show
Series stdio: Remove memory leak from multibyte convertion (BZ#25691) | expand

Commit Message

develop--- via Libc-alpha March 19, 2020, 3:52 p.m. UTC
Both default and wide printf functions might leak memory when
manipulate multibyte characters conversion depending of the size
of the input (whether __libc_use_alloca trigger or not the fallback
heap allocation).

This patch fixes it by issuing the required memory free operation
and also change the alloca/malloc strategy to a scratch buffer.

The testcase uses input argument size that trigger memory leaks
on unpatched code, using a scratch buffer the threashold to use
heap allocation is lower.

Checked on x86_64-linux-gnu and i686-linux-gnu.
---
 stdio-common/Makefile             |   9 ++-
 stdio-common/tst-printf-bz25691.c | 110 ++++++++++++++++++++++++++++++
 stdio-common/vfprintf-internal.c  | 108 +++++++++++++----------------
 3 files changed, 163 insertions(+), 64 deletions(-)
 create mode 100644 stdio-common/tst-printf-bz25691.c

Comments

Florian Weimer March 19, 2020, 3:52 p.m. UTC | #1
* Adhemerval Zanella via Libc-alpha:

> Both default and wide printf functions might leak memory when
> manipulate multibyte characters conversion depending of the size
> of the input (whether __libc_use_alloca trigger or not the fallback
> heap allocation).
>
> This patch fixes it by issuing the required memory free operation
> and also change the alloca/malloc strategy to a scratch buffer.
>
> The testcase uses input argument size that trigger memory leaks
> on unpatched code, using a scratch buffer the threashold to use
> heap allocation is lower.

Should we remove the malloc altogether?

I think we already have patches for that.
develop--- via Libc-alpha March 19, 2020, 3:59 p.m. UTC | #2
On 19/03/2020 12:52, Florian Weimer wrote:
> * Adhemerval Zanella via Libc-alpha:
> 
>> Both default and wide printf functions might leak memory when
>> manipulate multibyte characters conversion depending of the size
>> of the input (whether __libc_use_alloca trigger or not the fallback
>> heap allocation).
>>
>> This patch fixes it by issuing the required memory free operation
>> and also change the alloca/malloc strategy to a scratch buffer.
>>
>> The testcase uses input argument size that trigger memory leaks
>> on unpatched code, using a scratch buffer the threashold to use
>> heap allocation is lower.
> 
> Should we remove the malloc altogether?
> 
> I think we already have patches for that.
> 

Which malloc exactly (there are still some in printf) and which 
patches are you referring ?
Florian Weimer March 19, 2020, 4:49 p.m. UTC | #3
* Adhemerval Zanella via Libc-alpha:

> On 19/03/2020 12:52, Florian Weimer wrote:
>> * Adhemerval Zanella via Libc-alpha:
>> 
>>> Both default and wide printf functions might leak memory when
>>> manipulate multibyte characters conversion depending of the size
>>> of the input (whether __libc_use_alloca trigger or not the fallback
>>> heap allocation).
>>>
>>> This patch fixes it by issuing the required memory free operation
>>> and also change the alloca/malloc strategy to a scratch buffer.
>>>
>>> The testcase uses input argument size that trigger memory leaks
>>> on unpatched code, using a scratch buffer the threashold to use
>>> heap allocation is lower.
>> 
>> Should we remove the malloc altogether?
>> 
>> I think we already have patches for that.
>> 
>
> Which malloc exactly (there are still some in printf) and which 
> patches are you referring ?

This seems to be it:

<https://sourceware.org/pipermail/libc-alpha/2017-June/082098.html>
develop--- via Libc-alpha March 19, 2020, 8:53 p.m. UTC | #4
On 19/03/2020 13:49, Florian Weimer wrote:
> * Adhemerval Zanella via Libc-alpha:
> 
>> On 19/03/2020 12:52, Florian Weimer wrote:
>>> * Adhemerval Zanella via Libc-alpha:
>>>
>>>> Both default and wide printf functions might leak memory when
>>>> manipulate multibyte characters conversion depending of the size
>>>> of the input (whether __libc_use_alloca trigger or not the fallback
>>>> heap allocation).
>>>>
>>>> This patch fixes it by issuing the required memory free operation
>>>> and also change the alloca/malloc strategy to a scratch buffer.
>>>>
>>>> The testcase uses input argument size that trigger memory leaks
>>>> on unpatched code, using a scratch buffer the threashold to use
>>>> heap allocation is lower.
>>>
>>> Should we remove the malloc altogether?
>>>
>>> I think we already have patches for that.
>>>
>>
>> Which malloc exactly (there are still some in printf) and which 
>> patches are you referring ?
> 
> This seems to be it:
> 
> <https://sourceware.org/pipermail/libc-alpha/2017-June/082098.html>
> 

Hum I thought you have pushed it, but maybe you had assumed my last
message was a blocker. I will resend your version with the modifications
discussed in thread along with the testcase.
diff mbox series

Patch

diff --git a/stdio-common/Makefile b/stdio-common/Makefile
index 95af0c12d7..0a8d66b846 100644
--- a/stdio-common/Makefile
+++ b/stdio-common/Makefile
@@ -66,6 +66,7 @@  tests := tstscanf test_rdwr test-popen tstgetln test-fseek \
 	 tst-scanf-round \
 	 tst-renameat2 tst-bz11319 tst-bz11319-fortify2 \
 	 scanf14a scanf16a \
+	 tst-printf-bz25691
 
 
 test-srcs = tst-unbputc tst-printf tst-printfsz-islongdouble
@@ -75,10 +76,12 @@  tests-special += $(objpfx)tst-unbputc.out $(objpfx)tst-printf.out \
 		 $(objpfx)tst-printf-bz18872-mem.out \
 		 $(objpfx)tst-setvbuf1-cmp.out \
 		 $(objpfx)tst-vfprintf-width-prec-mem.out \
-		 $(objpfx)tst-printfsz-islongdouble.out
+		 $(objpfx)tst-printfsz-islongdouble.out \
+		 $(objpfx)tst-printf-bz25691-mem.out
 generated += tst-printf-bz18872.c tst-printf-bz18872.mtrace \
 	     tst-printf-bz18872-mem.out \
-	     tst-vfprintf-width-prec.mtrace tst-vfprintf-width-prec-mem.out
+	     tst-vfprintf-width-prec.mtrace tst-vfprintf-width-prec-mem.out \
+	     tst-printf-bz25691.mtrace tst-printf-bz25691-mem.out
 endif
 
 include ../Rules
@@ -100,6 +103,8 @@  endif
 tst-printf-bz18872-ENV = MALLOC_TRACE=$(objpfx)tst-printf-bz18872.mtrace
 tst-vfprintf-width-prec-ENV = \
   MALLOC_TRACE=$(objpfx)tst-vfprintf-width-prec.mtrace
+tst-printf-bz25691-ENV = \
+  MALLOC_TRACE=$(objpfx)tst-printf-bz25691.mtrace
 
 $(objpfx)tst-unbputc.out: tst-unbputc.sh $(objpfx)tst-unbputc
 	$(SHELL) $< $(common-objpfx) '$(test-program-prefix)' > $@; \
diff --git a/stdio-common/tst-printf-bz25691.c b/stdio-common/tst-printf-bz25691.c
new file mode 100644
index 0000000000..4ebb91fb44
--- /dev/null
+++ b/stdio-common/tst-printf-bz25691.c
@@ -0,0 +1,110 @@ 
+/* Test for memory leak with large width (BZ#25691).
+   Copyright (C) 2020 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <wchar.h>
+#include <stdint.h>
+#include <locale.h>
+
+#include <mcheck.h>
+#include <support/check.h>
+#include <support/support.h>
+
+static int
+do_test (void)
+{
+  mtrace ();
+
+  /* For 's' conversion specifier with 'l' modifier the array must be
+     converted to multibyte characters up to the precision specific
+     value.  This might require a heap allocated temporary buffer for
+     large precisions.  */
+  {
+    /* The input size value is to force a heap allocation on temporary
+       buffer.  */
+    const size_t winputsize = 64 * 1024 + 1;
+    wchar_t *winput = xmalloc (winputsize * sizeof (wchar_t));
+    wmemset (winput, L'a', winputsize - 1);
+    winput[winputsize - 1] = L'\0';
+
+    char result[9];
+    const char expected[] = "aaaaaaaa";
+    int ret;
+
+    ret = snprintf (result, sizeof (result), "%.65537ls", winput);
+    TEST_COMPARE (ret, winputsize - 1);
+    TEST_COMPARE_BLOB (result, sizeof (result), expected, sizeof (expected));
+
+    ret = snprintf (result, sizeof (result), "%ls", winput);
+    TEST_COMPARE (ret, winputsize - 1);
+    TEST_COMPARE_BLOB (result, sizeof (result), expected, sizeof (expected));
+
+    free (winput);
+  }
+
+  /* For 's' converstion specifier the array is interpreted as a multibyte
+     character sequence and converted to wide characters up to the precision
+     specific value.  This might require a heap allocated temporary buffer
+     for large precisions.  */
+  {
+    /* The input size value is to force a heap allocation on temporary
+       buffer.  */
+    const size_t mbssize = 32 * 1024;
+    char *mbs = xmalloc (mbssize);
+    memset (mbs, 'a', mbssize - 1);
+    mbs[mbssize - 1] = '\0';
+
+    const size_t expectedsize = 32 * 1024;
+    wchar_t *expected = xmalloc (expectedsize * sizeof (wchar_t));
+    wmemset (expected, L'a', expectedsize - 1);
+    expected[expectedsize-1] = L'\0';
+
+    const size_t resultsize = mbssize * sizeof (wchar_t);
+    wchar_t *result = xmalloc (resultsize);
+    int ret;
+
+    ret = swprintf (result, resultsize, L"%.65537s", mbs);
+    TEST_COMPARE (ret, mbssize - 1);
+    TEST_COMPARE_BLOB (result, (ret + 1) * sizeof (wchar_t),
+		       expected, expectedsize * sizeof (wchar_t));
+
+    ret = swprintf (result, resultsize, L"%1$.65537s", mbs);
+    TEST_COMPARE (ret, mbssize - 1);
+    TEST_COMPARE_BLOB (result, (ret + 1) * sizeof (wchar_t),
+		       expected, expectedsize * sizeof (wchar_t));
+
+    /* Same test, but with an invalid multibyte sequence.  */
+    mbs[mbssize - 2] = 0xff;
+
+    ret = swprintf (result, resultsize, L"%.65537s", mbs);
+    TEST_COMPARE (ret, -1);
+
+    ret = swprintf (result, resultsize, L"%1$.65537s", mbs);
+    TEST_COMPARE (ret, -1);
+
+    free (mbs);
+    free (result);
+    free (expected);
+  }
+
+  return 0;
+}
+
+#include <support/test-driver.c>
diff --git a/stdio-common/vfprintf-internal.c b/stdio-common/vfprintf-internal.c
index 3be92d4b6e..4ec246e081 100644
--- a/stdio-common/vfprintf-internal.c
+++ b/stdio-common/vfprintf-internal.c
@@ -1020,10 +1020,6 @@  static const uint8_t jump_table[] =
       break;								      \
 									      \
     LABEL (form_string):						      \
-      {									      \
-	size_t len;							      \
-	int string_malloced;						      \
-									      \
 	/* The string argument could in fact be `char *' or `wchar_t *'.      \
 	   But this should not make a difference here.  */		      \
 	if (fspec == NULL)						      \
@@ -1032,9 +1028,12 @@  static const uint8_t jump_table[] =
 	  string = (CHAR_T *) args_value[fspec->data_arg].pa_wstring;	      \
 									      \
 	/* Entry point for printing other strings.  */			      \
-      LABEL (print_string):						      \
+    LABEL (print_string):						      \
+      {									      \
+	size_t len;							      \
+	struct scratch_buffer stringbuf;				      \
+	scratch_buffer_init (&stringbuf);				      \
 									      \
-	string_malloced = 0;						      \
 	if (string == NULL)						      \
 	  {								      \
 	    /* Write "(null)" if there's space.  */			      \
@@ -1061,28 +1060,20 @@  static const uint8_t jump_table[] =
 	    /* Allocate dynamically an array which definitely is long	      \
 	       enough for the wide character version.  Each byte in the	      \
 	       multi-byte string can produce at most one wide character.  */  \
-	    if (__glibc_unlikely (len > SIZE_MAX / sizeof (wchar_t)))	      \
-	      {								      \
-		__set_errno (EOVERFLOW);				      \
-		done = -1;						      \
-		goto all_done;						      \
-	      }								      \
-	    else if (__libc_use_alloca (len * sizeof (wchar_t)))	      \
-	      string = (CHAR_T *) alloca (len * sizeof (wchar_t));	      \
-	    else if ((string = (CHAR_T *) malloc (len * sizeof (wchar_t)))    \
-		     == NULL)						      \
+	    if (scratch_buffer_set_array_size (&stringbuf, len,		      \
+					       sizeof (wchar_t)))	      \
 	      {								      \
-		done = -1;						      \
-		goto all_done;						      \
+	        string = stringbuf.data;				      \
+	        memset (&mbstate, '\0', sizeof (mbstate_t));		      \
+	        len = __mbsrtowcs (string, &mbs, len, &mbstate);	      \
 	      }								      \
-	    else							      \
-	      string_malloced = 1;					      \
+            else							      \
+	      len = -1;							      \
 									      \
-	    memset (&mbstate, '\0', sizeof (mbstate_t));		      \
-	    len = __mbsrtowcs (string, &mbs, len, &mbstate);		      \
 	    if (len == (size_t) -1)					      \
 	      {								      \
 		/* Illegal multibyte character.  */			      \
+		scratch_buffer_free (&stringbuf);			      \
 		done = -1;						      \
 		goto all_done;						      \
 	      }								      \
@@ -1100,16 +1091,16 @@  static const uint8_t jump_table[] =
 	if ((width -= len) < 0)						      \
 	  {								      \
 	    outstring (string, len);					      \
-	    break;							      \
 	  }								      \
-									      \
-	if (!left)							      \
-	  PAD (L' ');							      \
-	outstring (string, len);					      \
-	if (left)							      \
-	  PAD (L' ');							      \
-	if (__glibc_unlikely (string_malloced))				      \
-	  free (string);						      \
+	else								      \
+	  {								      \
+	    if (!left)							      \
+	      PAD (L' ');						      \
+	    outstring (string, len);					      \
+	    if (left)							      \
+	      PAD (L' ');						      \
+	  }								      \
+	scratch_buffer_free (&stringbuf);				      \
       }									      \
       break;
 #else
@@ -1156,10 +1147,6 @@  static const uint8_t jump_table[] =
       break;								      \
 									      \
     LABEL (form_string):						      \
-      {									      \
-	size_t len;							      \
-	int string_malloced;						      \
-									      \
 	/* The string argument could in fact be `char *' or `wchar_t *'.      \
 	   But this should not make a difference here.  */		      \
 	if (fspec == NULL)						      \
@@ -1168,9 +1155,12 @@  static const uint8_t jump_table[] =
 	  string = (char *) args_value[fspec->data_arg].pa_string;	      \
 									      \
 	/* Entry point for printing other strings.  */			      \
-      LABEL (print_string):						      \
+    LABEL (print_string):						      \
+      {									      \
+	size_t len;							      \
+	struct scratch_buffer stringbuf;				      \
+	scratch_buffer_init (&stringbuf);				      \
 									      \
-	string_malloced = 0;						      \
 	if (string == NULL)						      \
 	  {								      \
 	    /* Write "(null)" if there's space.  */			      \
@@ -1201,19 +1191,16 @@  static const uint8_t jump_table[] =
 									      \
 	    memset (&mbstate, '\0', sizeof (mbstate_t));		      \
 									      \
+	    len = -1;							      \
 	    if (prec >= 0)						      \
 	      {								      \
 		/* The string `s2' might not be NUL terminated.  */	      \
-		if (__libc_use_alloca (prec))				      \
-		  string = (char *) alloca (prec);			      \
-		else if ((string = (char *) malloc (prec)) == NULL)	      \
+		if (scratch_buffer_set_array_size (&stringbuf, prec,	      \
+						   sizeof (char)))	      \
 		  {							      \
-		    done = -1;						      \
-		    goto all_done;					      \
+		    string = stringbuf.data;				      \
+		    len = __wcsrtombs (string, &s2, prec, &mbstate);	      \
 		  }							      \
-		else							      \
-		  string_malloced = 1;					      \
-		len = __wcsrtombs (string, &s2, prec, &mbstate);	      \
 	      }								      \
 	    else							      \
 	      {								      \
@@ -1222,22 +1209,19 @@  static const uint8_t jump_table[] =
 		  {							      \
 		    assert (__mbsinit (&mbstate));			      \
 		    s2 = (const wchar_t *) string;			      \
-		    if (__libc_use_alloca (len + 1))			      \
-		      string = (char *) alloca (len + 1);		      \
-		    else if ((string = (char *) malloc (len + 1)) == NULL)    \
+		    if (scratch_buffer_set_array_size (&stringbuf, len + 1,   \
+						       sizeof (char)))	      \
 		      {							      \
-			done = -1;					      \
-			goto all_done;					      \
+			string = stringbuf.data;			      \
+		        __wcsrtombs (string, &s2, len + 1, &mbstate);	      \
 		      }							      \
-		    else						      \
-		      string_malloced = 1;				      \
-		    (void) __wcsrtombs (string, &s2, len + 1, &mbstate);      \
 		  }							      \
 	      }								      \
 									      \
 	    if (len == (size_t) -1)					      \
 	      {								      \
 		/* Illegal wide-character string.  */			      \
+		scratch_buffer_free (&stringbuf);			      \
 		done = -1;						      \
 		goto all_done;						      \
 	      }								      \
@@ -1246,16 +1230,16 @@  static const uint8_t jump_table[] =
 	if ((width -= len) < 0)						      \
 	  {								      \
 	    outstring (string, len);					      \
-	    break;							      \
 	  }								      \
-									      \
-	if (!left)							      \
-	  PAD (' ');							      \
-	outstring (string, len);					      \
-	if (left)							      \
-	  PAD (' ');							      \
-	if (__glibc_unlikely (string_malloced))			              \
-	  free (string);						      \
+	else								      \
+	  {								      \
+	    if (!left)							      \
+	      PAD (' ');						      \
+	    outstring (string, len);					      \
+	    if (left)							      \
+	      PAD (' ');						      \
+	  }								      \
+        scratch_buffer_free (&stringbuf);				      \
       }									      \
       break;
 #endif