manual: Drop incorrect statement on PIPE_BUF and blocking writes

Message ID 20240325085927.2041034-1-stepnem@smrk.net
State New
Series manual: Drop incorrect statement on PIPE_BUF and blocking writes

Commit Message

Štěpán Němec March 25, 2024, 8:59 a.m. UTC
Typical Linux defaults are 4096 bytes of PIPE_BUF and 65536 bytes of
kernel pipe buffer; the latter, not the former, limits the amount
of data writable without blocking.  E.g., observe the different
behavior of the following two command lines (assuming the above
defaults):

{ dd if=/dev/zero bs=65536 count=1 2>/dev/null;
  echo 'all written' >&2; } |
    { sleep 1; wc -c; }

{ dd if=/dev/zero bs=65537 count=1 2>/dev/null;
  echo 'all written' >&2; } |
    { sleep 1; wc -c; }

Only the latter waits 1s before printing 'all written', due to the
number of bytes being written exceeding the kernel pipe buffer.
PIPE_BUF (still only 4096 bytes) is irrelevant here.
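
The same point can be checked without dd.  The following is only a
rough sketch in Python (the helper name fill_pipe_nonblocking is made
up for illustration): it puts the write end into non-blocking mode and
counts how many bytes the pipe accepts before a write would block, then
reports that next to PIPE_BUF.  With the Linux defaults assumed above,
one would expect the count to match the pipe capacity rather than
PIPE_BUF; other systems may differ.

```python
import os


def fill_pipe_nonblocking(chunk_size=4096):
    """Return (PIPE_BUF, bytes accepted before a write would block)."""
    r, w = os.pipe()
    try:
        # In non-blocking mode a write that cannot proceed raises
        # BlockingIOError instead of putting the process to sleep.
        os.set_blocking(w, False)
        pipe_buf = os.fpathconf(w, "PC_PIPE_BUF")
        chunk = b"\0" * chunk_size
        accepted = 0
        while True:
            try:
                accepted += os.write(w, chunk)
            except BlockingIOError:
                break  # the pipe is full; nothing is reading from it
        return pipe_buf, accepted
    finally:
        os.close(r)
        os.close(w)


if __name__ == "__main__":
    pipe_buf, accepted = fill_pipe_nonblocking()
    print(f"PIPE_BUF={pipe_buf}, accepted before blocking={accepted}")
```

The read end is kept open but never read, so the count reflects how
much the kernel buffers on its own.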

From pipe(7):

  Before  Linux 2.6.11, the capacity of a pipe was the same as the system
  page size (e.g., 4096 bytes on i386).  Since Linux 2.6.11, the pipe ca‐
  pacity  is 16 pages (i.e., 65,536 bytes in a system with a page size of
  4096 bytes).  Since Linux 2.6.35,  the  default  pipe  capacity  is  16
  pages,  but  the  capacity  can  be  queried and set using the fcntl(2)
  F_GETPIPE_SZ and F_SETPIPE_SZ operations.  See fcntl(2) for more infor‐
  mation.
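
For reference, a minimal sketch of the F_GETPIPE_SZ query mentioned in
that excerpt, written in Python.  It assumes Linux; the helper name
pipe_capacity is hypothetical, and the numeric fallback 1032 is the
value of F_GETPIPE_SZ in the Linux headers, used only when the running
Python does not export the constant.

```python
import fcntl
import os

# Linux-specific fcntl command.  Newer Python versions export it by name;
# the numeric fallback is the value from <linux/fcntl.h>.
F_GETPIPE_SZ = getattr(fcntl, "F_GETPIPE_SZ", 1032)


def pipe_capacity(fd):
    """Return the pipe capacity in bytes, or None if it cannot be queried."""
    try:
        return fcntl.fcntl(fd, F_GETPIPE_SZ)
    except OSError:
        # Not Linux, or fd does not refer to a pipe.
        return None


if __name__ == "__main__":
    r, w = os.pipe()
    print("PIPE_BUF:", os.fpathconf(w, "PC_PIPE_BUF"))
    print("capacity:", pipe_capacity(w))
    os.close(r)
    os.close(w)
```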
---
 manual/pipe.texi | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)


base-commit: dc1a77269c971652a8a5167ec366792eae052e65

Comments

Florian Weimer March 25, 2024, 11:46 a.m. UTC | #1
* Štěpán Němec:

> diff --git a/manual/pipe.texi b/manual/pipe.texi
> index 483c40c5c3dd..8a9a275cafe7 100644
> --- a/manual/pipe.texi
> +++ b/manual/pipe.texi
> @@ -312,8 +312,7 @@
>  
>  Reading or writing a larger amount of data may not be atomic; for
>  example, output data from other processes sharing the descriptor may be
> -interspersed.  Also, once @code{PIPE_BUF} characters have been written,
> -further writes will block until some characters are read.
> +interspersed.

Maybe “further may block” instead?  I think the reference to PIPE_BUF
and blocking could still be helpful, except that it's not a guarantee,
as you correctly point out.

Do you have copyright assignment?  If not, please add Signed-off-by: in
a second submission of the patch.

Thanks,
Florian
Štěpán Němec March 25, 2024, 12:13 p.m. UTC | #2
On Mon, 25 Mar 2024 12:46:47 +0100
Florian Weimer wrote:

> * Štěpán Němec:
>
>> diff --git a/manual/pipe.texi b/manual/pipe.texi
>> index 483c40c5c3dd..8a9a275cafe7 100644
>> --- a/manual/pipe.texi
>> +++ b/manual/pipe.texi
>> @@ -312,8 +312,7 @@
>>  
>>  Reading or writing a larger amount of data may not be atomic; for
>>  example, output data from other processes sharing the descriptor may be
>> -interspersed.  Also, once @code{PIPE_BUF} characters have been written,
>> -further writes will block until some characters are read.
>> +interspersed.
>
> Maybe “further may block” instead?  I think the reference to PIPE_BUF
> and blocking could still be helpful, except that it's not a guarantee,
> as you correctly point out.

(Assuming you meant “further writes may block”, i.e., just
s/will/may/ in the pre-patch text.)

Ignoring the fact that the sentence seems simply wrong, at
least in environments where the vast majority of glibc
installations run (Linux with the relevant parameters as
described in my commit message), I don't find the sentence
particularly helpful, as the section focuses on _atomicity_,
not blocking, so I find the sudden side note on blocking
somewhat out of place here in any case.

And as for your particular suggestion (if I understood it
correctly), I would find that formulation _very_ unhelpful,
unless supplemented by additional details (i.e., under what
conditions "may" the blocking happen; but again, why talk
about this at all in a section titled "Pipe Atomicity"?).

> Do you have copyright assignment?

I do not, and I thought it wasn't necessary for this kind of
change.

> If not, please add Signed-off-by: in a second submission
> of the patch.

Will do (if the result of the discussion calls for it, i.e.,
some version of my patch turns out acceptable).

Thanks,

  Štěpán
Zack Weinberg March 25, 2024, 4:20 p.m. UTC | #3
On Mon, Mar 25, 2024, at 8:13 AM, Štěpán Němec wrote:
>>>  Reading or writing a larger amount of data may not be atomic; for
>>>  example, output data from other processes sharing the descriptor may be
>>> -interspersed.  Also, once @code{PIPE_BUF} characters have been written,
>>> -further writes will block until some characters are read.
>>> +interspersed.
>>
>> Maybe “further may block” instead?  I think the reference to PIPE_BUF
>> and blocking could still be helpful, except that it's not a guarantee,
>> as you correctly point out.

It's not correct to say that a write of 65536 bytes will _never_
block.  Rather, the pipe capacity on Linux is (by default) 65536
bytes, and, if nothing is reading, _any write_ that tries to put a
65537th byte into the pipe will block.  For example, both of these
will wait 1s before printing "all written":

{ dd if=/dev/zero bs=1 count=1 status=none;
  dd if=/dev/zero bs=65536 count=1 status=none;
  echo 'all written' >&2; } |
    { sleep 1; wc -c; }

{ dd if=/dev/zero bs=1 count=1 status=none;
  dd if=/dev/zero bs=65535 count=1 status=none;
  dd if=/dev/zero bs=1 count=1 status=none;
  echo 'all written' >&2; } |
    { sleep 1; wc -c; }

I agree that it is weird to talk about this in a section that's
nominally about atomicity.  But I think we shouldn't be calling the
"no interspersed data from other processes" behavior that we're trying
to describe here "atomicity" at all!  Quoting
<https://pubs.opengroup.org/onlinepubs/9699919799/functions/write.html>:

# Write requests to a pipe or FIFO shall be handled in the same way as
# a regular file with the following exceptions:
...
# * Write requests of {PIPE_BUF} bytes or less shall not be
#   interleaved with data from other processes doing writes on the
#   same pipe. Writes of greater than {PIPE_BUF} bytes may have data
#   interleaved, on arbitrary boundaries, with writes by other
#   processes, whether or not the O_NONBLOCK flag of the file status
#   flags is set.

This is a weak statement.  It does *not* guarantee "that nothing else
in the system can observe a state in which it is partially complete,"
as the manual currently puts it.  Nor does it guarantee anything
about how much a process reading from the pipe will receive if it
does a larger read than the write.  (To put that another way, if you
write data packets to a pipe, the reader cannot use the return value
of read() to tell how big the packets were.)
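
A small Python sketch of that last point (the function name is made up
for illustration): two separate writes are followed by one larger read,
and the read returns the bytes of both writes with no indication of
where one ended and the other began.  This is the behavior observed on
Linux for ordinary pipes; it is an illustration, not a statement of
what every system must do.

```python
import os


def read_after_two_writes():
    """Write two "packets", then read them back with one larger read."""
    r, w = os.pipe()
    try:
        os.write(w, b"abc")    # first packet, 3 bytes
        os.write(w, b"defgh")  # second packet, 5 bytes
        # The read size exceeds both writes; the packet boundary is lost.
        return os.read(r, 100)
    finally:
        os.close(r)
        os.close(w)


if __name__ == "__main__":
    print(read_after_two_writes())
```

A reader that needs packet boundaries has to encode them in the data
itself, for example with a length prefix.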

Also, it's not clear to me from what you wrote, whether Linux extends
the no-interleaved-data guarantee to writes larger than PIPE_BUF as long
as they are smaller than the pipe capacity, but if it does, we should
say so only in a way that makes it clear it's not portable to rely on
that.

So I propose the appended revision to pipe.texi instead of what you
proposed.  It moves all this discussion to the beginning of the
chapter and explains everything more thoroughly, and hopefully
also correctly.

zw

diff --git a/manual/pipe.texi b/manual/pipe.texi
index 483c40c5c3..92c1733c75 100644
--- a/manual/pipe.texi
+++ b/manual/pipe.texi
@@ -9,30 +9,58 @@ handled in a first-in, first-out (FIFO) order.  The pipe has no name; it
 is created for one use and both ends must be inherited from the single
 process which created the pipe.
 
+@cindex FIFO
 @cindex FIFO special file
-A @dfn{FIFO special file} is similar to a pipe, but instead of being an
-anonymous, temporary connection, a FIFO has a name or names like any
-other file.  Processes open the FIFO by name in order to communicate
-through it.
+A @dfn{FIFO special file}, commonly shortened to @dfn{FIFO}, is
+similar to a pipe, but instead of being an anonymous, temporary
+connection, a FIFO has a name or names like any other file.
+Processes open the FIFO by name in order to communicate through it.
 
-A pipe or FIFO has to be open at both ends simultaneously.  If you read
-from a pipe or FIFO file that doesn't have any processes writing to it
+A pipe or FIFO has to be open at both ends simultaneously.  If you
+read from a pipe or FIFO that doesn't have any processes writing to it
 (perhaps because they have all closed the file, or exited), the read
 returns end-of-file.  Writing to a pipe or FIFO that doesn't have a
 reading process is treated as an error condition; it generates a
 @code{SIGPIPE} signal, and fails with error code @code{EPIPE} if the
 signal is handled or blocked.
 
-Neither pipes nor FIFO special files allow file positioning.  Both
-reading and writing operations happen sequentially; reading from the
-beginning of the file and writing at the end.
+Neither pipes nor FIFOs allow file positioning.  Both reading and
+writing operations happen sequentially; reading from the beginning of
+the file and writing at the end.
+
+If two or more processes are writing to the same pipe or FIFO, the
+data written by each process may be interleaved arbitrarily with data
+written by the others.  There is only one exception: Each time a
+process makes a call to @code{write}, @code{writev}, or other
+primitive I/O function (@pxref{I/O Primitives}) that writes, in total,
+no more than @code{PIPE_BUF} bytes of data, @emph{that data} will not
+be split by data written by other processes.  But data written by
+other processes could appear immediately before or afterward.
+
+@xref{Limits for Files}, for information about the @code{PIPE_BUF}
+parameter.  Note that @code{PIPE_BUF} is usually smaller than the
+default buffer size used by I/O on streams (i.e.@: @code{BUFSIZ});
+@xref{Stream Buffering}, for how to control the stream buffer size.
+
+Pipes and FIFOs may have a limit on the amount of data that's been
+written, but not yet read, that they can store.  This limit is called
+the @dfn{capacity} of the pipe or FIFO.  A write that would overfill
+the pipe---put more data into it than its capacity---will block until
+something reads from the pipe (unless the @code{O_NONBLOCK} flag is
+set; @pxref{Operating Modes}).  If the write is smaller than
+@code{PIPE_BUF}, none of the data will enter the pipe until all of it
+can; if the write is larger, there is no guarantee about how much
+data enters the pipe and when.
+
+The capacity must be @emph{at least} @code{PIPE_BUF}.  Often it is
+bigger.  Some systems provide a way to query what the capacity is,
+or to set it for individual pipes and FIFOs.
 
 @menu
 * Creating a Pipe::             Making a pipe with the @code{pipe} function.
 * Pipe to a Subprocess::        Using a pipe to communicate with a
 				 child process.
 * FIFO Special Files::          Making a FIFO special file.
-* Pipe Atomicity::		When pipe (or FIFO) I/O is atomic.
 @end menu
 
 @node Creating a Pipe
@@ -106,6 +134,16 @@ The advantage of using @code{popen} and @code{pclose} is that the
 interface is much simpler and easier to use.  But it doesn't offer as
 much flexibility as using the low-level functions directly.
 
+When using pipes to receive data from a subprocess, either with the
+low-level functions or with @code{popen} and @code{pclose}, you must
+make sure to read all the data @emph{before} you wait for the
+subprocess to complete (by calling @code{pclose}, or any of the
+functions described in @ref{Process Completion}).  This is because,
+if the subprocess writes more data than the pipe's capacity, it will
+block until you read some of it.  If you're waiting for the subprocess
+to complete, you're not doing any reading, so the subprocess will
+never exit, and you'll never read any data---a deadlock condition.
+
 @deftypefun {FILE *} popen (const char *@var{command}, const char *@var{mode})
 @standards{POSIX.2, stdio.h}
 @standards{SVID, stdio.h}
@@ -299,21 +337,3 @@ The directory that would contain the file resides on a read-only file
 system.
 @end table
 @end deftypefun
-
-@node Pipe Atomicity
-@section Atomicity of Pipe I/O
-
-Reading or writing pipe data is @dfn{atomic} if the size of data written
-is not greater than @code{PIPE_BUF}.  This means that the data transfer
-seems to be an instantaneous unit, in that nothing else in the system
-can observe a state in which it is partially complete.  Atomic I/O may
-not begin right away (it may need to wait for buffer space or for data),
-but once it does begin it finishes immediately.
-
-Reading or writing a larger amount of data may not be atomic; for
-example, output data from other processes sharing the descriptor may be
-interspersed.  Also, once @code{PIPE_BUF} characters have been written,
-further writes will block until some characters are read.
-
-@xref{Limits for Files}, for information about the @code{PIPE_BUF}
-parameter.
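
The read-before-wait ordering described in the proposed popen paragraph
can be sketched as follows.  This is an illustrative Python example,
not text for the manual; the function name and the 200000-byte payload
are arbitrary choices, the payload merely being larger than the default
Linux pipe capacity discussed in this thread.

```python
import subprocess
import sys


def read_then_wait(nbytes=200_000):
    """Run a child that writes nbytes to stdout; drain it, then reap it."""
    child_code = f"import sys; sys.stdout.write('x' * {nbytes})"
    proc = subprocess.Popen([sys.executable, "-c", child_code],
                            stdout=subprocess.PIPE)
    # Drain the pipe first.  Calling proc.wait() before this read could
    # hang: the child blocks once the pipe is full, and the parent would
    # be waiting for a child that can never finish writing.
    data = proc.stdout.read()
    proc.stdout.close()
    status = proc.wait()  # the child has already finished writing
    return len(data), status


if __name__ == "__main__":
    print(read_then_wait())
```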
Štěpán Němec March 25, 2024, 9:32 p.m. UTC | #4
On Mon, 25 Mar 2024 12:20:14 -0400
Zack Weinberg wrote:

> On Mon, Mar 25, 2024, at 8:13 AM, Štěpán Němec wrote:
>>>>  Reading or writing a larger amount of data may not be atomic; for
>>>>  example, output data from other processes sharing the descriptor may be
>>>> -interspersed.  Also, once @code{PIPE_BUF} characters have been written,
>>>> -further writes will block until some characters are read.
>>>> +interspersed.
>>>
>>> Maybe “further may block” instead?  I think the reference to PIPE_BUF
>>> and blocking could still be helpful, except that it's not a guarantee,
>>> as you correctly point out.
>
> It's not correct to say that a write of 65536 bytes will _never_
> block.  Rather, the pipe capacity on Linux is (by default) 65536
> bytes, and, if nothing is reading, _any write_ that tries to put a
> 65537th byte into the pipe will block.  For example, both of these
> will wait 1s before printing "all written":
>
> { dd if=/dev/zero bs=1 count=1 status=none;
>   dd if=/dev/zero bs=65536 count=1 status=none;
>   echo 'all written' >&2; } |
>     { sleep 1; wc -c; }
>
> { dd if=/dev/zero bs=1 count=1 status=none;
>   dd if=/dev/zero bs=65535 count=1 status=none;
>   dd if=/dev/zero bs=1 count=1 status=none;
>   echo 'all written' >&2; } |
>     { sleep 1; wc -c; }

This seems correct and perhaps interesting, but how is it
relevant?  I did not "say that a write of 65536 bytes will
_never_ block".  I used a simple example to illustrate why
the statement in the manual about PIPE_BUF being the factor
causing blocking write was incorrect.

> I agree that it is weird to talk about this in a section that's
> nominally about atomicity.  But I think we shouldn't be calling the
> "no interspersed data from other processes" behavior that we're trying
> to describe here "atomicity" at all!

Why?  The very document you cite below (POSIX write(2))
makes the _atomic_ ("A write is atomic if the whole amount
written in one operation is not interleaved with data from
any other process. [...] This volume of POSIX.1-2017 does
not say whether write requests for more than {PIPE_BUF}
bytes are atomic, but requires that writes of {PIPE_BUF} or
fewer bytes shall be atomic.") vs _blocking_ ("The effective
size of a pipe or FIFO (the maximum amount that can be
written in one operation without blocking) may vary
dynamically, depending on the implementation, so it is not
possible to specify a fixed value for it.") distinction
right at the beginning of RATIONALE, not to mention that the
terminology seems well established, and etymologically
fitting (ἄτομος meaning “indivisible”).
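
To make the "not interleaved" sense of atomicity concrete, here is a
rough Python sketch (all names are hypothetical).  Two forked writers
each emit fixed-size records of 512 bytes, which is no more than
PIPE_BUF on any POSIX system, and the parent checks that every record
in the combined stream consists of a single writer's byte.  The records
of the two writers may alternate in any order; the check is only that
no record is split.

```python
import os

RECORD = 512   # POSIX requires PIPE_BUF to be at least 512
COUNT = 200    # records per writer; the total exceeds the default capacity


def interleaving_check(tags=(b"A", b"B")):
    """Fork one writer per tag; return True if no record was split."""
    r, w = os.pipe()
    pids = []
    for tag in tags:
        pid = os.fork()
        if pid == 0:  # child: write whole records, then exit
            status = 1
            try:
                os.close(r)
                for _ in range(COUNT):
                    os.write(w, tag * RECORD)
                status = 0
            finally:
                os._exit(status)
        pids.append(pid)
    os.close(w)  # the parent must not hold the write end, or EOF never comes
    chunks = []
    while True:
        chunk = os.read(r, 65536)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(r)
    for pid in pids:
        os.waitpid(pid, 0)
    stream = b"".join(chunks)
    records = [stream[i:i + RECORD] for i in range(0, len(stream), RECORD)]
    return (len(stream) == len(tags) * COUNT * RECORD
            and all(rec == rec[:1] * RECORD for rec in records))


if __name__ == "__main__":
    print("no record split:", interleaving_check())
```

Note that the writers block whenever the pipe is full, which is the
separate capacity question; the atomicity check passes regardless.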

> Quoting
> <https://pubs.opengroup.org/onlinepubs/9699919799/functions/write.html>:
>
> # Write requests to a pipe or FIFO shall be handled in the same way as
> # a regular file with the following exceptions:
> ...
> # * Write requests of {PIPE_BUF} bytes or less shall not be
> #   interleaved with data from other processes doing writes on the
> #   same pipe. Writes of greater than {PIPE_BUF} bytes may have data
> #   interleaved, on arbitrary boundaries, with writes by other
> #   processes, whether or not the O_NONBLOCK flag of the file status
> #   flags is set.
>
> This is a weak statement.

How so?  See here for the definition of "shall" in POSIXspeak:
https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap01.html#tag_01_05_05

> It does *not* guarantee "that nothing else in the system
> can observe a state in which it is partially complete," as
> the manual currently puts it.

I admit I'm unable to extract much useful meaning from the
vague "nothing else in the system", but if we restrict our
perspective to the two ends of a pipe, "can[not] observe a
state in which it is partially complete" sounds about right,
doesn't it?

> Nor does it guarantee anything about how much a process
> reading from the pipe will receive if it does a larger
> read than the write.  (To put that another way, if you
> write data packets to a pipe, the reader cannot use the
> return value of read() to tell how big the packets were.)

This seems to be confusing atomicity with blocking again.

> Also, it's not clear to me from what you wrote, whether Linux extends
> the no-interleaved-data guarantee to writes larger than PIPE_BUF as long
> as they are smaller than the pipe capacity,

I don't know about any such guarantee.

> but if it does, we should say so only in a way that makes
> it clear it's not portable to rely on that.
>
> So I propose the appended revision to pipe.texi instead of what you
> proposed.  It moves all this discussion to the beginning of the
> chapter and explains everything more thoroughly, and hopefully
> also correctly.

FWIW, I find your proposed text clear, helpful and matching
my understanding, and would welcome it to supersede my patch.

Thanks,

  Štěpán

Patch

diff --git a/manual/pipe.texi b/manual/pipe.texi
index 483c40c5c3dd..8a9a275cafe7 100644
--- a/manual/pipe.texi
+++ b/manual/pipe.texi
@@ -312,8 +312,7 @@ 
 
 Reading or writing a larger amount of data may not be atomic; for
 example, output data from other processes sharing the descriptor may be
-interspersed.  Also, once @code{PIPE_BUF} characters have been written,
-further writes will block until some characters are read.
+interspersed.
 
 @xref{Limits for Files}, for information about the @code{PIPE_BUF}
 parameter.