stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string

Message ID 20231108221638.37101-2-alx@kernel.org
State New

Commit Message

Alejandro Colomar Nov. 8, 2023, 10:17 p.m. UTC
These copy *from* a string.  But the destination is a simple character
sequence within an array; not a string.

Suggested-by: DJ Delorie <dj@redhat.com>
Cc: Jonny Grant <jg@jguk.org>
Cc: Matthew House <mattlloydhouse@gmail.com>
Cc: Oskari Pirhonen <xxc3ncoredxx@gmail.com>
Cc: Thorsten Kukuk <kukuk@suse.com>
Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Cc: Zack Weinberg <zack@owlfolio.org>
Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
Cc: Carlos O'Donell <carlos@redhat.com>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
---

Resending, including the mailing lists, which I forgot.

 man3/stpncpy.3        | 17 +++++++++++++----
 man7/string_copying.7 | 20 ++++++++++----------
 2 files changed, 23 insertions(+), 14 deletions(-)

Comments

Paul Eggert Nov. 8, 2023, 11:06 p.m. UTC | #1
On 11/8/23 14:17, Alejandro Colomar wrote:
> These copy *from* a string

Not necessarily. For example, in strncpy (DST, SRC, N), SRC need not be 
a string.

By the way, have you looked at the recent (i.e., this-year) changes to 
the glibc manual's string section? They're relevant.
DJ Delorie Nov. 8, 2023, 11:28 p.m. UTC | #2
Paul Eggert <eggert@cs.ucla.edu> writes:
> Not necessarily. For example, in strncpy (DST, SRC, N), SRC need not be 
> a string.

But it will be treated as one, for the purposes of this function.
Alejandro Colomar Nov. 9, 2023, 12:24 a.m. UTC | #3
Hi Paul,

On Wed, Nov 08, 2023 at 03:06:40PM -0800, Paul Eggert wrote:
> On 11/8/23 14:17, Alejandro Colomar wrote:
> > These copy *from* a string
> 
> Not necessarily. For example, in strncpy (DST, SRC, N), SRC need not be a
> string.

Pedantically, true.  But since it's quite rare to copy from one
fixed-width null-padded array into another, I didn't want to waste
space on that and possibly confuse readers.  In such a case, the source
buffer must be at least as large as the destination buffer, and will
likely be the same size (with fixed-width fields, there is little reason
to make them differ), so memcpy(3) will probably be simpler.

> 
> By the way, have you looked at the recent (i.e., this-year) changes to the
> glibc manual's string section? They're relevant.

I hadn't; after your message, I have.
<https://sourceware.org/glibc/manual/2.38/html_mono/libc.html#String-and-Array-Utilities>

I like how it connects all the functions, and it explains the concepts
and gives advice (e.g., avoid truncation as it's usually evil), and
compares the different functions.

However, I think it misses a few things:

-  strncpy(3) and strncat(3) are not related at all.  They don't have
   the same relation that strcpy(3) and strcat(3) have.  You can't
   write the following code in any case:

	strncpy(dst, foo, sizeof(dst));
	strncat(dst, bar, sizeof(dst));

   as you would with strcpy(3) or strlcpy(3).

   strncpy(3) and strncat(3) are opposite functions: the former reads
   from a string and writes to a fixed-width null-padded buffer, and the
   latter reads from a fixed-width buffer and writes to a string.  (You
   can use them in other cases, pedantically, as you said above, but
   those cases are rather unreal.)

-  strncpy(3) is in a section that starts by saying:

   > The functions described in this section copy or concatenate the
   > possibly-truncated contents of a string or array to another

   This may mislead programmers to believe it is useful for producing
   strings, when it's not.

In general, I would like the manual to put some more distance between
these functions and the term "string".  As DJ mentioned, it might be
useful to mention utmp(5) and tar(1) as niche use cases for
st[rp]ncpy(3).

And now for a typo:

-  In the following sentence under "5.2 String and Array Conventions":

   > The array arguments and return values for these functions have type
   > void * or wchar_t.

   I believe it meant `void *` or `wchar_t *`.


Cheers,

Alex
Oskari Pirhonen Nov. 9, 2023, 7:23 a.m. UTC | #4
On Wed, Nov 08, 2023 at 23:17:07 +0100, Alejandro Colomar wrote:
> These copy *from* a string.  But the destination is a simple character
> sequence within an array; not a string.
> 
> Suggested-by: DJ Delorie <dj@redhat.com>
> Cc: Jonny Grant <jg@jguk.org>
> Cc: Matthew House <mattlloydhouse@gmail.com>
> Cc: Oskari Pirhonen <xxc3ncoredxx@gmail.com>
> Cc: Thorsten Kukuk <kukuk@suse.com>
> Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
> Cc: Zack Weinberg <zack@owlfolio.org>
> Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
> Cc: Carlos O'Donell <carlos@redhat.com>
> Signed-off-by: Alejandro Colomar <alx@kernel.org>
> ---

I like the "with bytes from a string" wording. Good call.

- Oskari

> 
> Resending, including the mailing lists, which I forgot.
> 
>  man3/stpncpy.3        | 17 +++++++++++++----
>  man7/string_copying.7 | 20 ++++++++++----------
>  2 files changed, 23 insertions(+), 14 deletions(-)
> 
> diff --git a/man3/stpncpy.3 b/man3/stpncpy.3
> index b6bbfd0a3..f86ff8c29 100644
> --- a/man3/stpncpy.3
> +++ b/man3/stpncpy.3
> @@ -6,9 +6,8 @@
>  .TH stpncpy 3 (date) "Linux man-pages (unreleased)"
>  .SH NAME
>  stpncpy, strncpy
> -\- zero a fixed-width buffer and
> -copy a string into a character sequence with truncation
> -and zero the rest of it
> +\-
> +fill a fixed-width null-padded buffer with bytes from a string
>  .SH LIBRARY
>  Standard C library
>  .RI ( libc ", " \-lc )
> @@ -37,7 +36,7 @@ .SH SYNOPSIS
>          _GNU_SOURCE
>  .fi
>  .SH DESCRIPTION
> -These functions copy the string pointed to by
> +These functions copy bytes from the string pointed to by
>  .I src
>  into a null-padded character sequence at the fixed-width buffer pointed to by
>  .IR dst .
> @@ -110,6 +109,16 @@ .SH CAVEATS
>  These functions produce a null-padded character sequence,
>  not a string (see
>  .BR string_copying (7)).
> +For example:
> +.P
> +.in +4n
> +.EX
> +strncpy(buf, "1", 5);       // { \[aq]1\[aq],   0,   0,   0,   0 }
> +strncpy(buf, "1234", 5);    // { \[aq]1\[aq], \[aq]2\[aq], \[aq]3\[aq], \[aq]4\[aq],   0 }
> +strncpy(buf, "12345", 5);   // { \[aq]1\[aq], \[aq]2\[aq], \[aq]3\[aq], \[aq]4\[aq], \[aq]5\[aq] }
> +strncpy(buf, "123456", 5);  // { \[aq]1\[aq], \[aq]2\[aq], \[aq]3\[aq], \[aq]4\[aq], \[aq]5\[aq] }
> +.EE
> +.in
>  .P
>  It's impossible to distinguish truncation by the result of the call,
>  from a character sequence that just fits the destination buffer;
> diff --git a/man7/string_copying.7 b/man7/string_copying.7
> index cadf1c539..0e179ba34 100644
> --- a/man7/string_copying.7
> +++ b/man7/string_copying.7
> @@ -41,15 +41,11 @@ .SS Strings
>  .\" ----- SYNOPSIS :: Null-padded character sequences --------/
>  .SS Null-padded character sequences
>  .nf
> -// Zero a fixed-width buffer, and
> -// copy a string into a character sequence with truncation.
> -.BI "char *stpncpy(char " dst "[restrict ." sz "], \
> +// Fill a fixed-width null-padded buffer with bytes from a string.
> +.BI "char *strncpy(char " dst "[restrict ." sz "], \
>  const char *restrict " src ,
>  .BI "               size_t " sz );
> -.P
> -// Zero a fixed-width buffer, and
> -// copy a string into a character sequence with truncation.
> -.BI "char *strncpy(char " dst "[restrict ." sz "], \
> +.BI "char *stpncpy(char " dst "[restrict ." sz "], \
>  const char *restrict " src ,
>  .BI "               size_t " sz );
>  .P
> @@ -240,14 +236,18 @@ .SS Truncate or not?
>  .\" ----- DESCRIPTION :: Null-padded character sequences --------------/
>  .SS Null-padded character sequences
>  For historic reasons,
> -some standard APIs,
> +some standard APIs and file formats,
>  such as
> -.BR utmpx (5),
> +.BR utmpx (5)
> +and
> +.BR tar (1),
>  use null-padded character sequences in fixed-width buffers.
>  To interface with them,
>  specialized functions need to be used.
>  .P
> -To copy strings into them, use
> +To copy bytes from strings into these buffers, use
> +.BR strncpy (3)
> +or
>  .BR stpncpy (3).
>  .P
>  To copy from an unterminated string within a fixed-width buffer into a string,
> -- 
> 2.42.0
Jonny Grant Nov. 9, 2023, 2:11 p.m. UTC | #5
On 08/11/2023 23:06, Paul Eggert wrote:
> On 11/8/23 14:17, Alejandro Colomar wrote:
>> These copy *from* a string
> 
> Not necessarily. For example, in strncpy (DST, SRC, N), SRC need not be a string.
> 
> By the way, have you looked at the recent (i.e., this-year) changes to the glibc manual's string section? They're relevant.

That's a great reference page Paul, lots of useful information in the manual.
https://www.gnu.org/software/libc/manual/html_node/String-and-Array-Utilities.html

Re this man page:

https://man7.org/linux/man-pages/man3/string.3.html

 Obsolete functions
       char *strncpy(char dest[restrict .n], const char src[restrict .n],
                     size_t n);
              Copy at most n bytes from string src to dest, returning a
              pointer to the start of dest.


It could clarify
"Copy at most n bytes from string src to ARRAY dest, returning a
pointer to the start of ARRAY dest."

(caps for my emphasis in this email)

Kind regards
Jonny
Alejandro Colomar Nov. 9, 2023, 2:35 p.m. UTC | #6
Hi Jonny,

On Thu, Nov 09, 2023 at 02:11:14PM +0000, Jonny Grant wrote:
> On 08/11/2023 23:06, Paul Eggert wrote:
> > On 11/8/23 14:17, Alejandro Colomar wrote:
> >> These copy *from* a string
> > 
> > Not necessarily. For example, in strncpy (DST, SRC, N), SRC need not be a string.
> > 
> > By the way, have you looked at the recent (i.e., this-year) changes to the glibc manual's string section? They're relevant.
> 
> That's a great reference page Paul, lots of useful information in the manual.
> https://www.gnu.org/software/libc/manual/html_node/String-and-Array-Utilities.html
> 
> Re this man page:
> 
> https://man7.org/linux/man-pages/man3/string.3.html
> 
>  Obsolete functions
>        char *strncpy(char dest[restrict .n], const char src[restrict .n],
>                      size_t n);
>               Copy at most n bytes from string src to dest, returning a
>               pointer to the start of dest.

Uh, I forgot about that page.  I'll have a look at it and update it.  At
least, I need to remove that "Obsolete functions".

> 
> 
> It could clarify
> "Copy at most n bytes from string src to ARRAY dest, returning a
> pointer to the start of ARRAY dest."

I think I prefer DJ's suggestion:

"Fill a fixed‐width null‐padded buffer with bytes from a string."

Thanks!
Alex

> 
> (caps for my emphasis in this email)
> 
> Kind regards
> Jonny
Jonny Grant Nov. 9, 2023, 2:47 p.m. UTC | #7
On 09/11/2023 14:35, Alejandro Colomar wrote:
> Hi Jonny,
> 
> On Thu, Nov 09, 2023 at 02:11:14PM +0000, Jonny Grant wrote:
>> On 08/11/2023 23:06, Paul Eggert wrote:
>>> On 11/8/23 14:17, Alejandro Colomar wrote:
>>>> These copy *from* a string
>>>
>>> Not necessarily. For example, in strncpy (DST, SRC, N), SRC need not be a string.
>>>
>>> By the way, have you looked at the recent (i.e., this-year) changes to the glibc manual's string section? They're relevant.
>>
>> That's a great reference page Paul, lots of useful information in the manual.
>> https://www.gnu.org/software/libc/manual/html_node/String-and-Array-Utilities.html
>>
>> Re this man page:
>>
>> https://man7.org/linux/man-pages/man3/string.3.html
>>
>>  Obsolete functions
>>        char *strncpy(char dest[restrict .n], const char src[restrict .n],
>>                      size_t n);
>>               Copy at most n bytes from string src to dest, returning a
>>               pointer to the start of dest.
> 
> Uh, I forgot about that page.  I'll have a look at it and update it.  At
> least, I need to remove that "Obsolete functions".
> 
>>
>>
>> It could clarify
>> "Copy at most n bytes from string src to ARRAY dest, returning a
>> pointer to the start of ARRAY dest."
> 
> I think I prefer DJ's suggestion:
> 
> "Fill a fixed‐width null‐padded buffer with bytes from a string."

Better to make it clear it's null-padded after?

"Fill a fixed‐width buffer with bytes from a string and pad with null bytes."

I'll leave it with you.

Kind regards
Jonny
Alejandro Colomar Nov. 9, 2023, 3:02 p.m. UTC | #8
On Thu, Nov 09, 2023 at 02:47:05PM +0000, Jonny Grant wrote:
> >> It could clarify
> >> "Copy at most n bytes from string src to ARRAY dest, returning a
> >> pointer to the start of ARRAY dest."
> > 
> > I think I prefer DJ's suggestion:
> > 
> > "Fill a fixed‐width null‐padded buffer with bytes from a string."
> 
> Better to make it clear it's null-padded after?
> 
> "Fill a fixed‐width buffer with bytes from a string and pad with null bytes."

Yes, that looks even better.  And I wasn't very happy with "bytes".
Maybe:

"Fill a fixed-width buffer with characters from a string and pad with
null bytes."

Thanks,
Alex

> 
> I'll leave it with you.
> 
> Kind regards
> Jonny
DJ Delorie Nov. 9, 2023, 5:30 p.m. UTC | #9
Alejandro Colomar <alx@kernel.org> writes:
> "Fill a fixed-width buffer with characters from a string and pad with
> null bytes."

The pedant in me says it should be NUL bytes (or NUL's), not null bytes.
nul/NUL is a character, null/NULL is a pointer.
Andreas Schwab Nov. 9, 2023, 5:54 p.m. UTC | #10
On Nov 09 2023, DJ Delorie wrote:

> The pedant in me says it should be NUL bytes (or NUL's), not null bytes.
> nul/NUL is a character, null/NULL is a pointer.

NUL is the ASCII abbreviation for Null (see RFC 20).
Alejandro Colomar Nov. 9, 2023, 6 p.m. UTC | #11
Hi DJ,

On Thu, Nov 09, 2023 at 12:30:17PM -0500, DJ Delorie wrote:
> Alejandro Colomar <alx@kernel.org> writes:
> > "Fill a fixed-width buffer with characters from a string and pad with
> > null bytes."
> 
> The pedant in me says it should be NUL bytes (or NUL's), not null bytes.
> nul/NUL is a character, null/NULL is a pointer.

Here's what man-pages(7) (written by Michael Kerrisk) says:

   NULL, NUL, null pointer, and null byte
     A null pointer is a pointer that points to nothing, and  is  nor‐
     mally  indicated by the constant NULL.  On the other hand, NUL is
     the null byte, a byte with the value 0, represented in C via  the
     character constant '\0'.

     The  preferred  term  for the pointer is "null pointer" or simply
     "NULL"; avoid writing "NULL pointer".

     The preferred term for the byte is "null  byte".   Avoid  writing
     "NUL",  since  it is too easily confused with "NULL".  Avoid also
     the terms "zero byte" and "null character".  The byte that termi‐
     nates a C string should be described  as  "the  terminating  null
     byte";  strings  may be described as "null‐terminated", but avoid
     the use of "NUL‐terminated".


I don't necessarily agree with all of that, but mostly.  I don't agree
with avoiding "null character": since we also have the null wide
character (L'\0'), using "null character" for '\0' makes the terms
symmetric.

Other than that, I mostly agree with Michael.  Here's what I think of
these terms:

-  NULL is a null pointer constant (as well as 0 is another null pointer
   constant).

-  A null pointer is a more generic term that includes a run-time null
   pointer as well. 

-  The null byte is 0.

-  The null character, '\0', is composed of a null byte.

-  The null wide character, L'\0' is composed of several null bytes.

-  NUL is the ASCII name of the null byte, or is it maybe the null
   character here?  It's a bit muddy.

I use null byte for padding, and null character for the string
terminator, to make a stronger difference between strings and
null-padded fixed-width arrays.  I need to review string_copying(7) to
make sure I was consistent in this regard.

Colloquially, I find it fine to write NULL instead of null pointer (even
for non-constant cases), and NUL instead of any of "null character",
"null byte", or "null wide character", but for being precise, I prefer
"null something".

Cheers,
Alex
Jonny Grant Nov. 9, 2023, 7:42 p.m. UTC | #12
On 09/11/2023 17:30, DJ Delorie wrote:
> Alejandro Colomar <alx@kernel.org> writes:
>> "Fill a fixed-width buffer with characters from a string and pad with
>> null bytes."
> 
> The pedant in me says it should be NUL bytes (or NUL's), not null bytes.
> nul/NUL is a character, null/NULL is a pointer.
> 

NUL would be a big improvement.

Kind regards, Jonny

Patch

diff --git a/man3/stpncpy.3 b/man3/stpncpy.3
index b6bbfd0a3..f86ff8c29 100644
--- a/man3/stpncpy.3
+++ b/man3/stpncpy.3
@@ -6,9 +6,8 @@ 
 .TH stpncpy 3 (date) "Linux man-pages (unreleased)"
 .SH NAME
 stpncpy, strncpy
-\- zero a fixed-width buffer and
-copy a string into a character sequence with truncation
-and zero the rest of it
+\-
+fill a fixed-width null-padded buffer with bytes from a string
 .SH LIBRARY
 Standard C library
 .RI ( libc ", " \-lc )
@@ -37,7 +36,7 @@  .SH SYNOPSIS
         _GNU_SOURCE
 .fi
 .SH DESCRIPTION
-These functions copy the string pointed to by
+These functions copy bytes from the string pointed to by
 .I src
 into a null-padded character sequence at the fixed-width buffer pointed to by
 .IR dst .
@@ -110,6 +109,16 @@  .SH CAVEATS
 These functions produce a null-padded character sequence,
 not a string (see
 .BR string_copying (7)).
+For example:
+.P
+.in +4n
+.EX
+strncpy(buf, "1", 5);       // { \[aq]1\[aq],   0,   0,   0,   0 }
+strncpy(buf, "1234", 5);    // { \[aq]1\[aq], \[aq]2\[aq], \[aq]3\[aq], \[aq]4\[aq],   0 }
+strncpy(buf, "12345", 5);   // { \[aq]1\[aq], \[aq]2\[aq], \[aq]3\[aq], \[aq]4\[aq], \[aq]5\[aq] }
+strncpy(buf, "123456", 5);  // { \[aq]1\[aq], \[aq]2\[aq], \[aq]3\[aq], \[aq]4\[aq], \[aq]5\[aq] }
+.EE
+.in
 .P
 It's impossible to distinguish truncation by the result of the call,
 from a character sequence that just fits the destination buffer;
diff --git a/man7/string_copying.7 b/man7/string_copying.7
index cadf1c539..0e179ba34 100644
--- a/man7/string_copying.7
+++ b/man7/string_copying.7
@@ -41,15 +41,11 @@  .SS Strings
 .\" ----- SYNOPSIS :: Null-padded character sequences --------/
 .SS Null-padded character sequences
 .nf
-// Zero a fixed-width buffer, and
-// copy a string into a character sequence with truncation.
-.BI "char *stpncpy(char " dst "[restrict ." sz "], \
+// Fill a fixed-width null-padded buffer with bytes from a string.
+.BI "char *strncpy(char " dst "[restrict ." sz "], \
 const char *restrict " src ,
 .BI "               size_t " sz );
-.P
-// Zero a fixed-width buffer, and
-// copy a string into a character sequence with truncation.
-.BI "char *strncpy(char " dst "[restrict ." sz "], \
+.BI "char *stpncpy(char " dst "[restrict ." sz "], \
 const char *restrict " src ,
 .BI "               size_t " sz );
 .P
@@ -240,14 +236,18 @@  .SS Truncate or not?
 .\" ----- DESCRIPTION :: Null-padded character sequences --------------/
 .SS Null-padded character sequences
 For historic reasons,
-some standard APIs,
+some standard APIs and file formats,
 such as
-.BR utmpx (5),
+.BR utmpx (5)
+and
+.BR tar (1),
 use null-padded character sequences in fixed-width buffers.
 To interface with them,
 specialized functions need to be used.
 .P
-To copy strings into them, use
+To copy bytes from strings into these buffers, use
+.BR strncpy (3)
+or
 .BR stpncpy (3).
 .P
 To copy from an unterminated string within a fixed-width buffer into a string,