mention disabling GCC built-ins for customization

Message ID 2f2f96d3-5487-f791-8554-310beae0721b@gmail.com
State New

Commit Message

Martin Sebor June 13, 2018, 6:19 p.m.
The GCC sprintf optimization introduced in GCC 7 relies on
the conformance of the library implementation of formatted
I/O functions.  When a sprintf customization changes
the effects of the function in a way that's observable by
a strictly conforming program, it might interfere with
the optimization.

Since this may not be obvious to users, the attached patch
adds a couple of sentences to the manual to make it clear
that the sprintf built-in handling needs to be disabled
when customizing the function.

Martin

Comments

Florian Weimer June 13, 2018, 8:35 p.m. | #1
* Martin Sebor:

>  @strong{Portability Note:} The ability to extend the syntax of
>  @code{printf} template strings is a GNU extension.  ISO standard C has
> -nothing similar.
> +nothing similar.  When using the GNU C compiler or any other compiler
> +that interprets calls to standard I/O functions according to the rules
> +of the language standard it is necessary to disable such handling by
> +the appropriate compiler option.  Otherwise the behavior of a program
> +that relies on the extension is undefined.

Aren't there ISO extensions to C which define additional format
specifiers which GCC knows nothing about?  So maybe it makes more
sense to say that if the application uses format specifiers not known
by GCC, behavior is undefined (unless the compiler option is used).
Martin Sebor June 13, 2018, 8:55 p.m. | #2
On 06/13/2018 02:35 PM, Florian Weimer wrote:
> * Martin Sebor:
>
>>  @strong{Portability Note:} The ability to extend the syntax of
>>  @code{printf} template strings is a GNU extension.  ISO standard C has
>> -nothing similar.
>> +nothing similar.  When using the GNU C compiler or any other compiler
>> +that interprets calls to standard I/O functions according to the rules
>> +of the language standard it is necessary to disable such handling by
>> +the appropriate compiler option.  Otherwise the behavior of a program
>> +that relies on the extension is undefined.
>
> Aren't there ISO extensions to C which define additional format
> specifiers which GCC knows nothing about?  So maybe it makes more
> sense to say that if the application uses format specifiers not known
> by GCC, behavior is undefined (unless the compiler option is used).

The GCC optimization is disabled when the format string contains
invalid or unhandled specifiers/modifiers etc., so even though those
may still be undefined in Glibc, they aren't a problem for GCC.

What would cause a problem for the GCC optimization is a change
to the behavior of one of the standard conversions, like %i, or
%s.  One example would be changing the number of bytes output by
the conversion.  Another example, one that would lead to undefined
behavior under a future GCC optimization, is a hook that modifies
the string argument to %s (once GCC starts to assume that
the argument is not clobbered by a sprintf call).
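
To make the byte-count point concrete, here is a minimal sketch (not
part of the patch; the helper name decimal_length is made up for
illustration).  It states the invariant a compiler may rely on: for
a standard %i conversion the number of bytes produced is fully
determined by the argument, so a call like this can in principle be
computed at compile time.

```c
#include <stdio.h>

/* Hypothetical helper: returns how many bytes a standard %i
   conversion produces for VALUE.  snprintf with a null buffer and
   size 0 writes nothing and only reports the length (C99).  */
int decimal_length(int value)
{
    return snprintf(NULL, 0, "%i", value);
}
```

A hook that made %i emit, say, thousands separators would change
this count at run time, while code compiled with the built-in
handling enabled might already have assumed the standard count.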

Martin
Florian Weimer June 13, 2018, 9:01 p.m. | #3
* Martin Sebor:

> On 06/13/2018 02:35 PM, Florian Weimer wrote:
>> * Martin Sebor:
>>
>>>  @strong{Portability Note:} The ability to extend the syntax of
>>>  @code{printf} template strings is a GNU extension.  ISO standard C has
>>> -nothing similar.
>>> +nothing similar.  When using the GNU C compiler or any other compiler
>>> +that interprets calls to standard I/O functions according to the rules
>>> +of the language standard it is necessary to disable such handling by
>>> +the appropriate compiler option.  Otherwise the behavior of a program
>>> +that relies on the extension is undefined.
>>
>> Aren't there ISO extensions to C which define additional format
>> specifiers which GCC knows nothing about?  So maybe it makes more
>> sense to say that if the application uses format specifiers not known
>> by GCC, behavior is undefined (unless the compiler option is used).
>
> The GCC optimization is disabled when the format string contains
> invalid or unhandled specifiers/modifiers etc, so even those may
> still be undefined in Glibc they aren't a problem for GCC.

Good.

> What would cause a problem for the GCC optimization is a change
> to the behavior of one of the standard conversions, like %i, or
> %s.  One example would be changing the number of bytes output by
> the conversion.  Another example of a future GCC optimization
> that would lead to undefined behavior is a hook that modified
> the string argument to %s (when GCC starts to assume that
> the argument is not clobbered by a sprintf call).

So it's not so much about extending the syntax, but altering the
behavior of existing syntax, right?
Martin Sebor June 13, 2018, 9:30 p.m. | #4
On 06/13/2018 03:01 PM, Florian Weimer wrote:
> * Martin Sebor:
>
>> On 06/13/2018 02:35 PM, Florian Weimer wrote:
>>> * Martin Sebor:
>>>
>>>>  @strong{Portability Note:} The ability to extend the syntax of
>>>>  @code{printf} template strings is a GNU extension.  ISO standard C has
>>>> -nothing similar.
>>>> +nothing similar.  When using the GNU C compiler or any other compiler
>>>> +that interprets calls to standard I/O functions according to the rules
>>>> +of the language standard it is necessary to disable such handling by
>>>> +the appropriate compiler option.  Otherwise the behavior of a program
>>>> +that relies on the extension is undefined.
>>>
>>> Aren't there ISO extensions to C which define additional format
>>> specifiers which GCC knows nothing about?  So maybe it makes more
>>> sense to say that if the application uses format specifiers not known
>>> by GCC, behavior is undefined (unless the compiler option is used).
>>
>> The GCC optimization is disabled when the format string contains
>> invalid or unhandled specifiers/modifiers etc, so even those may
>> still be undefined in Glibc they aren't a problem for GCC.
>
> Good.
>
>> What would cause a problem for the GCC optimization is a change
>> to the behavior of one of the standard conversions, like %i, or
>> %s.  One example would be changing the number of bytes output by
>> the conversion.  Another example of a future GCC optimization
>> that would lead to undefined behavior is a hook that modified
>> the string argument to %s (when GCC starts to assume that
>> the argument is not clobbered by a sprintf call).
>
> So it's not so much about extending the syntax, but altering the
> behavior of existing syntax, right?

Yes, that's probably pretty close.

Just to be clear, it extends beyond changes to the behavior of
printf directives.  A %s hook, for example, cannot rely on being called
for every %s conversion, even if it doesn't change the conversion's
behavior (say, if all it did was count its occurrences).  This is because
GCC transforms printf("%s\n", s) to puts(s) and sprintf(d, "%s", s)
to strcpy(d, s).
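
A small sketch of the equivalence in question (the two function
names are made up for illustration).  Under standard semantics both
functions leave the same bytes in dst and report the same length,
which is why a compiler is free to emit the second form for the
first and never reach the library's %s handling at all.

```c
#include <stdio.h>
#include <string.h>

/* Copies SRC into DST through the formatted-output interface and
   returns the number of bytes written, excluding the terminator.  */
int copy_with_sprintf(char *dst, const char *src)
{
    return sprintf(dst, "%s", src);
}

/* What a compiler may emit instead: no format string is parsed at
   run time, so a registered %s handler would never be called.  */
int copy_with_strcpy(char *dst, const char *src)
{
    strcpy(dst, src);
    return (int)strlen(dst);
}
```

Building with GCC's -fno-builtin-sprintf (or -fno-builtin-printf for
the printf case) is the documented way to ask the compiler not to
treat these calls specially.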

But adding a hook for a new/undefined conversion specification
that doesn't match an existing one in any way should be
okay.
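
For reference, a minimal sketch of such a hook, assuming Glibc's
<printf.h> and its register_printf_specifier function.  The %Y
conversion and all function names below are invented for this
example, and the handler ignores flags, width and precision.

```c
#include <printf.h>
#include <stdio.h>

/* Handler for the hypothetical %Y conversion: writes "yes" or "no"
   and returns the number of characters written.  */
static int print_yes_no(FILE *stream, const struct printf_info *info,
                        const void *const *args)
{
    (void)info;  /* flags, width and precision are ignored here */
    int value = *(const int *)args[0];
    return fprintf(stream, "%s", value ? "yes" : "no");
}

/* Tells the library that %Y consumes one int argument.  */
static int yes_no_arginfo(const struct printf_info *info, size_t n,
                          int *argtypes, int *size)
{
    (void)info;
    (void)size;  /* only relevant for user-defined argument types */
    if (n > 0)
        argtypes[0] = PA_INT;
    return 1;
}

/* Registers %Y; returns 0 on success and -1 on failure.  */
int register_yes_no(void)
{
    return register_printf_specifier('Y', print_yes_no, yes_no_arginfo);
}

/* Formats VALUE through a caller-supplied template.  Passing the
   template as a parameter keeps the unknown %Y out of a literal
   format string, which would otherwise draw a -Wformat warning.  */
int format_flag(char *buf, size_t size, const char *fmt, int value)
{
    return snprintf(buf, size, fmt, value);
}
```

After register_yes_no() succeeds, format_flag(buf, sizeof buf, "%Y", 1)
is expected to store "yes" in buf.  Since %Y collides with no
standard conversion, nothing a compiler assumes about the standard
directives is affected.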

Martin
Andreas Schwab June 14, 2018, 7:25 a.m. | #5
On Jun 13 2018, Martin Sebor <msebor@gmail.com> wrote:

> diff --git a/manual/stdio.texi b/manual/stdio.texi
> index 38be236..d945955 100644
> --- a/manual/stdio.texi
> +++ b/manual/stdio.texi
> @@ -2963,7 +2963,11 @@ The facilities of this section are declared in the header file
>  
>  @strong{Portability Note:} The ability to extend the syntax of
>  @code{printf} template strings is a GNU extension.  ISO standard C has
> -nothing similar.
> +nothing similar.  When using the GNU C compiler or any other compiler
> +that interprets calls to standard I/O functions according to the rules
> +of the language standard it is necessary to disable such handling by
> +the appropriate compiler option.  Otherwise the behavior of a program
> +that relies on the extension is undefined.

The manual already says that redefining existing conversions causes
problems:

    You can redefine the standard output conversions, but this is probably
    not a good idea because of the potential for confusion.  Library routines
    written by other people could break if you do this.

We should extend that with stronger language, independent of any
compiler behaviour.

Andreas.
Martin Sebor June 14, 2018, 7:11 p.m. | #6
On 06/14/2018 01:25 AM, Andreas Schwab wrote:
> On Jun 13 2018, Martin Sebor <msebor@gmail.com> wrote:
>
>> diff --git a/manual/stdio.texi b/manual/stdio.texi
>> index 38be236..d945955 100644
>> --- a/manual/stdio.texi
>> +++ b/manual/stdio.texi
>> @@ -2963,7 +2963,11 @@ The facilities of this section are declared in the header file
>>
>>  @strong{Portability Note:} The ability to extend the syntax of
>>  @code{printf} template strings is a GNU extension.  ISO standard C has
>> -nothing similar.
>> +nothing similar.  When using the GNU C compiler or any other compiler
>> +that interprets calls to standard I/O functions according to the rules
>> +of the language standard it is necessary to disable such handling by
>> +the appropriate compiler option.  Otherwise the behavior of a program
>> +that relies on the extension is undefined.
>
> The manual already says that redefining existing conversions causes
> problems:
>
>     You can redefine the standard output conversions, but this is probably
>     not a good idea because of the potential for confusion.  Library routines
>     written by other people could break if you do this.
>
> We should extend that with a stronger language, independent of any
> compiler behaviour.

That sounds fine to me.  Attached is an updated patch that also
adds text to this paragraph.  I added a Portability Note before
the text to highlight the portability impact, similarly to
the prior paragraph.

Martin
diff --git a/manual/stdio.texi b/manual/stdio.texi
index 38be236..26a570f 100644
--- a/manual/stdio.texi
+++ b/manual/stdio.texi
@@ -2963,7 +2963,11 @@ The facilities of this section are declared in the header file
 
 @strong{Portability Note:} The ability to extend the syntax of
 @code{printf} template strings is a GNU extension.  ISO standard C has
-nothing similar.
+nothing similar.  When using the GNU C compiler or any other compiler
+that interprets calls to standard I/O functions according to the rules
+of the language standard it is necessary to disable such handling by
+the appropriate compiler option.  Otherwise the behavior of a program
+that relies on the extension is undefined.
 
 @node Registering New Conversions
 @subsection Registering New Conversions
@@ -3017,9 +3021,13 @@ function when this format specifier appears in the format string.
 The return value is @code{0} on success, and @code{-1} on failure
 (which occurs if @var{spec} is out of range).
 
-You can redefine the standard output conversions, but this is probably
-not a good idea because of the potential for confusion.  Library routines
-written by other people could break if you do this.
+@strong{Portability Note:} It is possible to redefine the standard output
+conversions but doing so is strongly discouraged because it may interfere
+with the behavior of programs and compiler implementations that assume
+the effects of the conversions conform to the relevant language standards.
+In addition, conforming compilers need not guarantee that the function
+registered for a standard conversion will be called for each such
+conversion in every format string in a program.
 @end deftypefun
 
 @node Conversion Specifier Options
Andreas Schwab June 18, 2018, 7:41 a.m. | #7
On Jun 14 2018, Martin Sebor <msebor@gmail.com> wrote:

> diff --git a/manual/stdio.texi b/manual/stdio.texi
> index 38be236..26a570f 100644
> --- a/manual/stdio.texi
> +++ b/manual/stdio.texi
> @@ -2963,7 +2963,11 @@ The facilities of this section are declared in the header file
>  
>  @strong{Portability Note:} The ability to extend the syntax of
>  @code{printf} template strings is a GNU extension.  ISO standard C has
> -nothing similar.
> +nothing similar.  When using the GNU C compiler or any other compiler
> +that interprets calls to standard I/O functions according to the rules
> +of the language standard it is necessary to disable such handling by
> +the appropriate compiler option.  Otherwise the behavior of a program
> +that relies on the extension is undefined.

Is that really needed?  Didn't you say that GCC disables its
optimisations when an unknown format is encountered?

Andreas.
Martin Sebor June 19, 2018, 3 a.m. | #8
On 06/18/2018 01:41 AM, Andreas Schwab wrote:
> On Jun 14 2018, Martin Sebor <msebor@gmail.com> wrote:
>
>> diff --git a/manual/stdio.texi b/manual/stdio.texi
>> index 38be236..26a570f 100644
>> --- a/manual/stdio.texi
>> +++ b/manual/stdio.texi
>> @@ -2963,7 +2963,11 @@ The facilities of this section are declared in the header file
>>
>>  @strong{Portability Note:} The ability to extend the syntax of
>>  @code{printf} template strings is a GNU extension.  ISO standard C has
>> -nothing similar.
>> +nothing similar.  When using the GNU C compiler or any other compiler
>> +that interprets calls to standard I/O functions according to the rules
>> +of the language standard it is necessary to disable such handling by
>> +the appropriate compiler option.  Otherwise the behavior of a program
>> +that relies on the extension is undefined.
>
> Is that really needed?  Didn't you say that GCC disables its
> optimisations when an unknown format is encountered?

Saying the behavior is undefined is the conservative thing to do.
I would not be comfortable providing a stronger guarantee for GCC
but if it were thought necessary to say more, the place to go into
the details would be the GCC manual where they could be kept in sync
with the implementation.  IMO, the Glibc manual (and the manual
of any other I/O library) should defer to the compiler manual,
and that's what this text does.

Martin
Martin Sebor June 27, 2018, 11:37 p.m. | #9
Ping: https://sourceware.org/ml/libc-alpha/2018-06/msg00428.html

If there are no further comments I'd like to commit this change
later this week.

Martin Sebor June 29, 2018, 5 p.m. | #10
I have committed the change:
   http://tinyurl.com/y7jbvfs4

Martin

Patch

ChangeLog:
	* manual/stdio.texi (Customizing printf): Add a note to disable
	built-in handling.

diff --git a/manual/stdio.texi b/manual/stdio.texi
index 38be236..d945955 100644
--- a/manual/stdio.texi
+++ b/manual/stdio.texi
@@ -2963,7 +2963,11 @@  The facilities of this section are declared in the header file
 
 @strong{Portability Note:} The ability to extend the syntax of
 @code{printf} template strings is a GNU extension.  ISO standard C has
-nothing similar.
+nothing similar.  When using the GNU C compiler or any other compiler
+that interprets calls to standard I/O functions according to the rules
+of the language standard it is necessary to disable such handling by
+the appropriate compiler option.  Otherwise the behavior of a program
+that relies on the extension is undefined.
 
 @node Registering New Conversions
 @subsection Registering New Conversions