manual: Update alloca and variable length array documentation

Message ID 63922f93-860a-249a-c14c-7efea3a3189b@redhat.com
State New

Commit Message

Florian Weimer Aug. 11, 2017, 9:53 a.m.
I'm resubmitting my old documentation patch (from early 2016) for
reconsideration.

I reinstated the mention of GNU compiler compatibility for strdupa and
strndupa.

Thanks,
Florian

Comments

Carlos O'Donell Aug. 11, 2017, 12:41 p.m. | #1
On 08/11/2017 05:53 AM, Florian Weimer wrote:
> I'm resubmitting my old documentation patch (from early 2016) for
> reconsideration.
> 
> I reinstated the mention of GNU compiler compatibility for strdupa and
> strndupa.

Can you please provide a reference to previous reviews and discussions so
I can look over what was covered or rejected?

Thanks.

c.
Florian Weimer Aug. 11, 2017, 12:52 p.m. | #2
On 08/11/2017 02:41 PM, Carlos O'Donell wrote:
> On 08/11/2017 05:53 AM, Florian Weimer wrote:
>> I'm resubmitting my old documentation patch (from early 2016) for
>> reconsideration.
>>
>> I reinstated the mention of GNU compiler compatibility for strdupa and
>> strndupa.
> 
> Can you please provide a reference to previous reviews and discussions so
> I can look over what was covered or rejected?

I think this was the old thread:

  <https://sourceware.org/ml/libc-alpha/2016-01/msg00019.html>

Florian
Carlos O'Donell Aug. 11, 2017, 1:11 p.m. | #3
On 08/11/2017 05:53 AM, Florian Weimer wrote:
> I'm resubmitting my old documentation patch (from early 2016) for
> reconsideration.
> 
> I reinstated the mention of GNU compiler compatibility for strdupa and
> strndupa.

Thanks for the updated reference to the previous conversation with Paul Eggert.
I read that as a reference for this review.

The changes look good to me, they are better than what we had before.

At a high level we are removing the limitation about use in parameter lists,
and adding warnings about security, all good to me.

Cheers,
Carlos.
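
For illustration, the size check that the new security warning in the patch recommends could look roughly like the sketch below. It assumes a glibc-style <alloca.h>; the names joined_equals and ALLOCA_LIMIT are hypothetical and appear in neither the patch nor glibc, and 4096 is just the arbitrary limit the patch text mentions. Small requests use alloca, larger ones fall back to malloc, where failure can be detected.

```c
#include <alloca.h>
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Arbitrary limit for on-stack allocations (assumption: 4096, the
   example value used in the patch text).  */
#define ALLOCA_LIMIT 4096

/* Hypothetical helper: returns 1 if str1 followed by str2 equals
   expected, 0 if it does not, and -1 if no buffer can be allocated.  */
static int joined_equals(const char *str1, const char *str2,
                         const char *expected)
{
  assert(str1 != NULL && str2 != NULL && expected != NULL);
  size_t len1 = strlen(str1);
  size_t len2 = strlen(str2);
  if (len2 >= SIZE_MAX - len1)
    return -1;                  /* len1 + len2 + 1 would overflow */
  size_t size = len1 + len2 + 1;

  int on_heap = size > ALLOCA_LIMIT;
  char *name;
  if (on_heap)
    {
      name = malloc(size);      /* large request: failure is detectable */
      if (name == NULL)
        return -1;
    }
  else
    name = alloca(size);        /* small request: bounded stack use */

  memcpy(name, str1, len1);
  memcpy(name + len1, str2, len2 + 1);   /* copies the terminating NUL */
  int result = strcmp(name, expected) == 0;

  if (on_heap)
    free(name);                 /* only heap memory needs an explicit free */
  return result;
}
```

Note that alloca has to be called directly in the function that uses the memory; wrapping the stack branch in a separate allocation helper would release the block as soon as that helper returns.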
Florian Weimer Aug. 11, 2017, 1:28 p.m. | #4
On 08/11/2017 03:11 PM, Carlos O'Donell wrote:
> On 08/11/2017 05:53 AM, Florian Weimer wrote:
>> I'm resubmitting my old documentation patch (from early 2016) for
>> reconsideration.
>>
>> I reinstated the mention of GNU compiler compatibility for strdupa and
>> strndupa.
> 
> Thanks for the updated reference to the previous conversation with Paul Eggert.
> I read that as a reference for this review.
> 
> The changes look good to me, they are better than what we had before.
> 
> At a high level we are removing the limitation about use in parameter lists,
> and adding warnings about security, all good to me.

What about this node name?

@c Node name preserved for backwards compatibility; the correct
@c terminology is ``variable length array''.
@node GNU C Variable-Size Arrays
@subsubsection ISO C Variable Length Arrays
@cindex variable length arrays

Should I change it, potentially invalidating hyperlinks?

The node name is quite visible even in the HTML version.

Thanks,
Florian
Paul Eggert Aug. 11, 2017, 6:28 p.m. | #5
Thanks, this looks good, with one quibble:
> +Compared to @code{malloc}, variable length arrays share the same
> +advantages and disadvantages as @code{alloca}.  In particular, there is
> +no error checking (and security vulnerabilities can result from large
> +allocation requests), and some @nongnusystems{} do not support variable
> +length arrays because they only support earlier versions of ISO C which
> +do not include variable length arrays.

The last 2 lines are obsolete. Although C99 required VLAs, a standard 
C11 implementation that defines __STDC_NO_VLA__ need not support VLAs. I 
suggest removing " because they only support earlier versions of ISO C 
which do not include variable length arrays".
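
As a rough sketch of the feature test this refers to (assuming a C11 or later compiler): code can check __STDC_NO_VLA__ and use a fixed-size buffer when variable length arrays are unavailable. The names joined_length and SCRATCH_MAX are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Arbitrary cap on the scratch buffer (an assumption for this sketch).  */
#define SCRATCH_MAX 256

/* Hypothetical helper: length of str1 followed by str2, built in a
   stack buffer.  */
static size_t joined_length(const char *str1, const char *str2)
{
  size_t size = strlen(str1) + strlen(str2) + 1;
  assert(size <= SCRATCH_MAX);  /* bound the stack allocation */
#ifndef __STDC_NO_VLA__
  char name[size];              /* variable length array */
#else
  char name[SCRATCH_MAX];       /* fixed-size fallback without VLAs */
#endif
  strcpy(name, str1);
  strcat(name, str2);
  return strlen(name);
}
```

Either branch still needs the explicit bound, since neither form of stack allocation reports failure.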
Carlos O'Donell Aug. 14, 2017, 4:27 p.m. | #6
On 08/11/2017 09:28 AM, Florian Weimer wrote:
> On 08/11/2017 03:11 PM, Carlos O'Donell wrote:
>> On 08/11/2017 05:53 AM, Florian Weimer wrote:
>>> I'm resubmitting my old documentation patch (from early 2016) for
>>> reconsideration.
>>>
>>> I reinstated the mention of GNU compiler compatibility for strdupa and
>>> strndupa.
>>
>> Thanks for the updated reference to the previous conversation with Paul Eggert.
>> I read that as a reference for this review.
>>
>> The changes look good to me, they are better than what we had before.
>>
>> At a high level we are removing the limitation about use in parameter lists,
>> and adding warnings about security, all good to me.
> 
> What about this node name?
> 
> @c Node name preserved for backwards compatibility; the correct
> @c terminology is ``variable length array''.
> @node GNU C Variable-Size Arrays
> @subsubsection ISO C Variable Length Arrays
> @cindex variable length arrays
> 
> Should I change it, potentially invalidating hyperlinks?
> 
> The node name is quite visible even in the HTML version.

Yes, please change it to match.

I'm not worried about preserving html hyperlinks to individual
sections.

We need to be flexible when it comes to restructuring and improving
our documentation.

Cheers,
Carlos.

Patch

manual: Update alloca and variable length array documentation

2017-08-11  Florian Weimer  <fweimer@redhat.com>

	* manual/memory.texi (Variable Size Automatic): Document
	interaction between alloca and variable length arrays.  Mention
	function inlining.  Remove obsolete warning about alloca in
	function parameter lists.
	(Advantages of Alloca): Note that alloca is async-signal-safe.
	Mention C++ destructors and lack of length checking in open2.
	(Disadvantages of Alloca): Clarify consequences of the lack of
	error checking.  Do not mention the nonexistent alloca emulation.
	(GNU C Variable-Size Arrays): Switch terminology from GNU C
	variable-sized arrays to ISO C variable length arrays.  Mention
	security aspect and aliasing violations.  Clarify loop behavior.
	Remove NB, now part of the alloca documentation.

	* manual/string.texi (Copying Strings and Arrays): Add warning
	about alloca and length checking to strdupa.  Rephrase restriction
	to GNU CC.
	(Truncating Strings): Add warning to strndupa.  Rephrase
	restriction to GNU CC.

diff --git a/manual/memory.texi b/manual/memory.texi
index 82f473806c..a5707e1e91 100644
--- a/manual/memory.texi
+++ b/manual/memory.texi
@@ -2802,13 +2802,28 @@  The function @code{alloca} supports a kind of half-dynamic allocation in
 which blocks are allocated dynamically but freed automatically.
 
 Allocating a block with @code{alloca} is an explicit action; you can
-allocate as many blocks as you wish, and compute the size at run time.  But
-all the blocks are freed when you exit the function that @code{alloca} was
-called from, just as if they were automatic variables declared in that
-function.  There is no way to free the space explicitly.
+allocate as many blocks as you wish, and compute the size at run time.
+Memory allocated this way is freed automatically, at some point after
+the scope which contains the @code{alloca} call is left:
+
+@itemize @bullet
+@item
+@cindex variable length arrays
+If the scope calling @code{alloca} contains a variable length array, or
+is nested in such a scope, then the object allocated with @code{alloca}
+is deallocated when the closest enclosing scope which defines a variable
+length array is left.
+
+@item
+If no enclosing scope with a variable length array exists, the allocated
+object is deallocated when the function is exited, either normally or
+abnormally (for example, by throwing a C++ exception).  The lifetime of
+such objects is not extended by function inlining.
+@end itemize
 
 The prototype for @code{alloca} is in @file{stdlib.h}.  This function is
-a BSD extension.
+a BSD extension.  It requires special support from the compiler, but
+most compilers (including the GNU compilers) support it.
 @pindex stdlib.h
 
 @deftypefun {void *} alloca (size_t @var{size})
@@ -2819,21 +2834,11 @@  The return value of @code{alloca} is the address of a block of @var{size}
 bytes of memory, allocated in the stack frame of the calling function.
 @end deftypefun
 
-Do not use @code{alloca} inside the arguments of a function call---you
-will get unpredictable results, because the stack space for the
-@code{alloca} would appear on the stack in the middle of the space for
-the function arguments.  An example of what to avoid is @code{foo (x,
-alloca (4), y)}.
-@c This might get fixed in future versions of GCC, but that won't make
-@c it safe with compilers generally.
-
 @menu
 * Alloca Example::              Example of using @code{alloca}.
 * Advantages of Alloca::        Reasons to use @code{alloca}.
 * Disadvantages of Alloca::     Reasons to avoid @code{alloca}.
-* GNU C Variable-Size Arrays::  Only in GNU C, here is an alternative
-				 method of allocating dynamically and
-				 freeing automatically.
+* GNU C Variable-Size Arrays::  On-stack dynamic allocation in ISO C.
 @end menu
 
 @node Alloca Example
@@ -2891,6 +2896,14 @@  blocks, space used for any size block can be reused for any other size.
 @code{alloca} does not cause memory fragmentation.
 
 @item
+@cindex mmap
+The @code{alloca} function can be safely called from a signal handler.
+But signal handlers may run with little stack space available, so
+it is unclear how much memory can be safely allocated with @code{alloca}.
+This means that robust code may have to use @code{mmap} instead.
+@xref{Memory-mapped I/O}.
+
+@item
 @cindex longjmp
 Nonlocal exits done with @code{longjmp} (@pxref{Non-Local Exits})
 automatically free the space allocated with @code{alloca} when they exit
@@ -2922,7 +2935,13 @@  freed even when an error occurs, with no special effort required.
 By contrast, the previous definition of @code{open2} (which uses
 @code{malloc} and @code{free}) would develop a memory leak if it were
 changed in this way.  Even if you are willing to make more changes to
-fix it, there is no easy way to do so.
+fix it, there is no easy way to do so (except to switch to C++ and
+destructors).
+
+Note that the @code{open2} example with @code{alloca} is incorrect if
+@code{str1} and @code{str2} can be very long strings because
+@code{alloca} does not fail gracefully in case too many bytes are
+requested (see below).
 @end itemize
 
 @node Disadvantages of Alloca
@@ -2936,22 +2955,38 @@  These are the disadvantages of @code{alloca} in comparison with
 @itemize @bullet
 @item
 If you try to allocate more memory than the machine can provide, you
-don't get a clean error message.  Instead you get a fatal signal like
-the one you would get from an infinite recursion; probably a
-segmentation violation (@pxref{Program Error Signals}).
+don't get a clean error message.  Instead, you end up with undefined
+behavior.  In many cases, the program will just crash (which can still
+result in a denial-of-service vulnerability), but sometimes, it is
+possible to abuse an unbounded @code{alloca} to cause other security
+vulnerabilities such as information disclosure or arbitrary code
+execution.
 
 @item
 Some @nongnusystems{} fail to support @code{alloca}, so it is less
-portable.  However, a slower emulation of @code{alloca} written in C
-is available for use on systems with this deficiency.
+portable.
 @end itemize
 
+Due to lack of error checking, security-sensitive code must ensure that
+no large objects are allocated with @code{alloca}.  In general this
+means that the size argument is checked against an arbitrary limit (say,
+4096), and an error is returned if it is exceeded, or a fallback to
+@code{malloc} is performed.
+
+Extra care is required when @code{alloca} is called from within a loop
+or from a function called recursively.  In these cases, depending on the
+loop iteration count or the depth of the recursion, smaller allocation
+sizes can exhaust the stack and trigger undefined behavior.  This
+problem exists with callback functions as well.
+
+@c Node name preserved for backwards compatibility; the correct
+@c terminology is ``variable length array''.
 @node GNU C Variable-Size Arrays
-@subsubsection GNU C Variable-Size Arrays
-@cindex variable-sized arrays
+@subsubsection ISO C Variable Length Arrays
+@cindex variable length arrays
 
-In GNU C, you can replace most uses of @code{alloca} with an array of
-variable size.  Here is how @code{open2} would look then:
+In ISO C, you can replace most uses of @code{alloca} with an array of
+variable length.  Here is how @code{open2} would look then:
 
 @smallexample
 int open2 (char *str1, char *str2, int flags, int mode)
@@ -2962,26 +2997,40 @@  int open2 (char *str1, char *str2, int flags, int mode)
 @}
 @end smallexample
 
+Compared to @code{malloc}, variable length arrays share the same
+advantages and disadvantages as @code{alloca}.  In particular, there is
+no error checking (and security vulnerabilities can result from large
+allocation requests), and some @nongnusystems{} do not support variable
+length arrays because they only support earlier versions of ISO C which
+do not include variable length arrays.
+
+The variable length array version of @code{open2}, as shown above, still
+suffers from the same problem as the @code{alloca}-based variant: It
+does not check that the strings are short enough to avoid the undefined
+behavior that results from large allocation requests.
+
 But @code{alloca} is not always equivalent to a variable-sized array, for
 several reasons:
 
 @itemize @bullet
 @item
-A variable size array's space is freed at the end of the scope of the
-name of the array.  The space allocated with @code{alloca}
-remains until the end of the function.
+Memory returned by @code{alloca} is untyped.  A variable length array
+always has a specific type (even if it is an array of characters), and
+using it with another type can introduce aliasing violations into the
+program.
 
 @item
-It is possible to use @code{alloca} within a loop, allocating an
-additional block on each iteration.  This is impossible with
-variable-sized arrays.
+A variable length array is deallocated at the end of the scope of the
+name of the array.  The space allocated with @code{alloca} remains until
+the end of the function or the closest enclosing scope which defines any
+variable length array.
 @end itemize
 
-@strong{NB:} If you mix use of @code{alloca} and variable-sized arrays
-within one function, exiting a scope in which a variable-sized array was
-declared frees all blocks allocated with @code{alloca} during the
-execution of that scope.
-
+The second difference is most pronounced in loops: With @code{alloca},
+the allocated object can be referenced from later iterations and after
+the loop body has been exited.  But a loop with a variable length array
+can execute an arbitrary number of times, without exhausting the
+available stack, as long as the individual arrays are short enough.
 
 @node Resizing the Data Segment
 @section Resizing the Data Segment
diff --git a/manual/string.texi b/manual/string.texi
index ac02c6d85e..d50527f585 100644
--- a/manual/string.texi
+++ b/manual/string.texi
@@ -626,24 +626,20 @@  The behavior of @code{wcpcpy} is undefined if the strings overlap.
 This macro is similar to @code{strdup} but allocates the new string
 using @code{alloca} instead of @code{malloc} (@pxref{Variable Size
 Automatic}).  This means of course the returned string has the same
-limitations as any block of memory allocated using @code{alloca}.
+limitations as any block of memory allocated using @code{alloca}, and
+@code{strdupa} can introduce security vulnerabilities due to the lack of
+failure checking.
 
-For obvious reasons @code{strdupa} is implemented only as a macro;
-you cannot get the address of this function.  Despite this limitation
-it is a useful function.  The following code shows a situation where
-using @code{malloc} would be a lot more expensive.
+For obvious reasons @code{strdupa} is implemented only as a macro; you
+cannot get the address of this function.  The following code shows an
+example of its use:
 
 @smallexample
 @include strdupa.c.texi
 @end smallexample
 
-Please note that calling @code{strtok} using @var{path} directly is
-invalid.  It is also not allowed to call @code{strdupa} in the argument
-list of @code{strtok} since @code{strdupa} uses @code{alloca}
-(@pxref{Variable Size Automatic}) can interfere with the parameter
-passing.
-
-This function is only available if GNU CC is used.
+The @code{strdupa} macro is only available with GNU-compatible
+compilers.
 @end deftypefn
 
 @deftypefun void bcopy (const void *@var{from}, void *@var{to}, size_t @var{size})
@@ -934,16 +930,16 @@  processing text.
 This function is similar to @code{strndup} but like @code{strdupa} it
 allocates the new string using @code{alloca} @pxref{Variable Size
 Automatic}.  The same advantages and limitations of @code{strdupa} are
-valid for @code{strndupa}, too.
+valid for @code{strndupa}.  In particular, @code{strndupa} can introduce
+security vulnerabilities due to the lack of error checking.
 
 This function is implemented only as a macro, just like @code{strdupa}.
-Just as @code{strdupa} this macro also must not be used inside the
-parameter list in a function call.
 
 As noted below, this function is generally a poor choice for
 processing text.
 
-@code{strndupa} is only available if GNU CC is used.
+The @code{strndupa} macro is only available with GNU-compatible
+compilers.
 @end deftypefn
 
 @deftypefun {char *} stpncpy (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size})