
libio: Always use _IO_BUFSIZ for stream buffers [BZ #4099]

Message ID 56E17C8E.1070209@redhat.com
State New

Commit Message

Florian Weimer March 10, 2016, 1:54 p.m. UTC
Separating this change, as requested by Roland.

Thanks,
Florian

Comments

Roland McGrath March 11, 2016, 9:52 p.m. UTC | #1
Justify with clear rationale.
Florian Weimer March 14, 2016, 11:08 a.m. UTC | #2
On 03/11/2016 10:52 PM, Roland McGrath wrote:
> Justify with clear rationale.

It fixes bug 4099.  We need an arbitrary limit for that.

The libstdc++ buffer size is 8192 (or 8191), so this makes buffering
more consistent across the system.

The PostgreSQL people did extensive benchmarks to determine their
block/page size, and settled on 8192 (but they do not use stdio
streams, for obvious reasons).

<stdio.h> documents BUFSIZ as the default buffer size. The new
implementation matches that.

Additional memory consumption is limited because file descriptors are a
scarce resource.

I can do some benchmarking, but I don't expect any compelling results.

Florian
Carlos O'Donell March 14, 2016, 7:31 p.m. UTC | #3
On 03/14/2016 07:08 AM, Florian Weimer wrote:
> On 03/11/2016 10:52 PM, Roland McGrath wrote:
>> Justify with clear rationale.
> 
> It fixes bug 4099.  We need an arbitrary limit for that.
> 
> The libstdc++ buffer size is 8192 (or 8191), so this makes buffering
> more consistent across the system.
> 
> The PostgreSQL people did extensive benchmarks to determine their
> block/page size, and settled on 8192 (but they do not use stdio
> streams, for obvious reasons).
> 
> <stdio.h> documents BUFSIZ as the default buffer size. The new
> implementation matches that.
> 
> Additional memory consumption is limited because file descriptors are a
> scarce resource.
> 
> I can do some benchmarking, but I don't expect any compelling results.

I don't know that benchmarking is required, Roland just asked for clear
rationale.

However, it would be wonderful if you added a microbenchmark just to make
sure we don't actually cause any unforeseen problems. This way people can
run such a benchmark again on their remote filesystems and give us results.

Your answer seems clear enough to me. I agree with it too. The advertised
st_blksize is useful only in the abstract. The runtime has to pick something
which works well with the current implementation as a whole.

The only objection I might see is that this is actually a Linux-specific
tuning that you've done. Nobody knows if this tuning has any impact on
Hurd or not.

I would consider this OK to check in only if you provide a detailed comment
that talks about the tradeoffs being made here and why _IO_BUFSIZ was
chosen.

In summary:
- Add comment just above setting _IO_BUFSIZ about tradeoff [Required]
- Add microbenchmark to avoid surprises [Optional]
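The optional microbenchmark could start from something like the following sketch, which times a sequential getc loop over a file for a given stdio buffer size. The helper name bench_getc and the choice of a getc loop are assumptions for illustration, not part of the patch. On a warm page cache the numbers probably reflect per-call overhead more than filesystem behavior, so runs on remote filesystems are where differences would likely show up.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical microbenchmark helper: read the whole file at PATH with
   getc through a stdio stream whose buffer is BUFSIZE bytes.  Returns
   the elapsed wall-clock time in seconds (or -1.0 on error) and stores
   the number of bytes read in *NBYTES.  */
static double
bench_getc (const char *path, size_t bufsize, long *nbytes)
{
  FILE *fp = fopen (path, "rb");
  if (fp == NULL)
    return -1.0;

  /* glibc's setvbuf ignores the size argument when the buffer pointer
     is NULL, so pass an explicit buffer of the size under test.  */
  char *buf = malloc (bufsize);
  if (buf == NULL || setvbuf (fp, buf, _IOFBF, bufsize) != 0)
    {
      fclose (fp);
      free (buf);
      return -1.0;
    }

  struct timespec start, end;
  clock_gettime (CLOCK_MONOTONIC, &start);
  long count = 0;
  while (getc (fp) != EOF)
    count++;
  clock_gettime (CLOCK_MONOTONIC, &end);

  fclose (fp);   /* Close the stream before freeing its buffer.  */
  free (buf);
  *nbytes = count;
  return (double) (end.tv_sec - start.tv_sec)
         + (double) (end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Calling it for several sizes on the same file (for example 4096, 8192 and 65536) gives a rough comparison; the file should be large relative to the biggest buffer for the results to mean much.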
Roland McGrath March 18, 2016, 10:52 p.m. UTC | #4
> On 03/11/2016 10:52 PM, Roland McGrath wrote:
> > Justify with clear rationale.
> 
> It fixes bug 4099.  We need an arbitrary limit for that.

That is justification for imposing an arbitrary maximum on the
automatically-chosen size.  Similar logic on the other side of the coin is
justification for imposing an arbitrary minimum on the automatically-chosen
size.  Neither is justification for always using a single fixed size.
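The alternative described here, keeping the kernel's value but clamping it, might look like the sketch below. The bounds (4096 and 65536) and the fallback of 8192 are placeholders chosen for illustration; the thread does not settle on specific values, and the helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/stat.h>

#define STDIO_BUF_MIN     4096   /* hypothetical lower bound */
#define STDIO_BUF_MAX     65536  /* hypothetical upper bound */
#define STDIO_BUF_DEFAULT 8192   /* hypothetical fallback when st_blksize is unusable */

/* Clamp a kernel-reported block size into [STDIO_BUF_MIN, STDIO_BUF_MAX].  */
static size_t
clamp_blksize (long blksize)
{
  if (blksize <= 0)
    return STDIO_BUF_DEFAULT;
  if (blksize < STDIO_BUF_MIN)
    return STDIO_BUF_MIN;
  if (blksize > STDIO_BUF_MAX)
    return STDIO_BUF_MAX;
  return (size_t) blksize;
}

/* Pick a stream buffer size for FD by following st_blksize within bounds.  */
static size_t
stream_buffer_size (int fd)
{
  struct stat st;
  if (fd < 0 || fstat (fd, &st) != 0)
    return STDIO_BUF_DEFAULT;
  return clamp_blksize ((long) st.st_blksize);
}
```

With these placeholder bounds, a filesystem reporting 2 MiB would get a 64 KiB buffer, while one reporting 16 KiB would still be followed exactly.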

> The libstdc++ buffer size is 8192 (or 8191), so this makes buffering
> more consistent across the system.

That's an internal implementation choice in libstdc++.  There is no reason
to expect it to stay the same, nor special reason to think that just
because libstdc++ chose it that it's ideal.

> The PostgreSQL people did extensive benchmarks to determine their
> block/page size, and settled on 8192 (but they do not use stdio
> streams, for obvious reasons).

That's lovely.  They can inform the implementors of whatever filesystem(s)
they were using in their benchmarks that st_blksize=8192 is what they
should be reporting.

> <stdio.h> documents BUFSIZ as the default buffer size. The new
> implementation matches that.

It's the default in the sense that it's what setbuf uses.  So it's a
permanent part of the ABI and therefore can't be changed easily regardless
of whether it's a desirable value.  If the comments or other documentation
are unclear as to the true (very tiny) significance of BUFSIZ, they should
be fixed.
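To make the setbuf point concrete: C specifies setbuf (fp, buf) with a non-null buf as equivalent to setvbuf (fp, buf, _IOFBF, BUFSIZ), so every program that passes a BUFSIZ-sized array to setbuf has the value compiled in. The helper open_with_buffer below is a hypothetical sketch showing that any other size has to be requested explicitly through setvbuf.

```c
#include <stdio.h>

/* Open PATH for writing with a caller-supplied buffer BUF of SIZE bytes.
   For SIZE == BUFSIZ, setbuf is enough because its size is implicitly
   BUFSIZ; any other size must go through setvbuf.  */
static FILE *
open_with_buffer (const char *path, char *buf, size_t size)
{
  FILE *fp = fopen (path, "w");
  if (fp == NULL)
    return NULL;
  if (size == BUFSIZ)
    setbuf (fp, buf);                     /* Size is implicitly BUFSIZ.  */
  else if (setvbuf (fp, buf, _IOFBF, size) != 0)
    {
      fclose (fp);
      return NULL;
    }
  return fp;
}
```

The buffer must outlive the stream, so the caller has to fclose the stream before the array goes out of scope.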

> Additional memory consumption is limited because file descriptors are a
> scarce resource.

There is no reason to consider file descriptors scarce.
The per-process limit is fungible.
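One way to see that the limit is fungible: an unprivileged process may raise its soft RLIMIT_NOFILE limit up to the hard limit with getrlimit/setrlimit (raising the hard limit itself requires privilege, such as CAP_SYS_RESOURCE on Linux). The helper below is a hypothetical sketch, not something proposed in the thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/resource.h>

/* Raise the soft descriptor limit to the hard limit.  On success,
   store the new limits in *RESULT and return 0; return -1 on failure.  */
static int
raise_nofile_soft_limit (struct rlimit *result)
{
  struct rlimit rl;
  if (getrlimit (RLIMIT_NOFILE, &rl) != 0)
    return -1;
  rl.rlim_cur = rl.rlim_max;   /* No privilege needed up to the hard limit.  */
  if (setrlimit (RLIMIT_NOFILE, &rl) != 0)
    return -1;
  *result = rl;
  return 0;
}
```

The call can fail depending on the platform and sandboxing, so callers would treat it as best effort.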

> I can do some benchmarking, but I don't expect any compelling results.

Whatever the results, they would not IMHO be relevant here.

POSIX specifies that st_blksize is the "preferred I/O block size for this
object".  It's the kernel's responsibility to give userland good advice
through this channel.  If there are common buggy kernels that give bad
advice, that is a reason to apply upper and lower limits to the advice from
the kernel.  But the expectation should be that the kernel gets fixed to
give good advice, and the optimal thing to do with a good kernel is to
follow its advice.  
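The advice in question is the st_blksize field that fstat returns for an open file. A minimal sketch of reading it for a stdio stream follows; the helper name preferred_blksize is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>

/* Return the kernel's "preferred I/O block size" for the file behind
   stream FP, or -1 if it cannot be determined.  */
static long
preferred_blksize (FILE *fp)
{
  struct stat st;
  int fd = fileno (fp);
  if (fd < 0 || fstat (fd, &st) != 0)
    return -1;
  return (long) st.st_blksize;
}
```

The code the patch removes used any positive st_blksize unchanged as the malloc size, so on a filesystem reporting 2 MiB (the figure mentioned later in the thread), every stream on it would get a 2 MiB buffer.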

Since the recommended use of st_blksize in this way is a standard user
feature and not just what stdio's implementation happens to do, there is an
argument to be made that the limiting of the value should be done in the
*stat functions' reported st_blksize values rather than in stdio's use of
them.  (I'm ambivalent about this point.)


Thanks,
Roland
Florian Weimer March 31, 2016, 10:14 a.m. UTC | #5
On 03/18/2016 11:52 PM, Roland McGrath wrote:

> Whatever the results, they would not IMHO be relevant here.
> 
> POSIX specifies that st_blksize is the "preferred I/O block size for this
> object".  It's the kernel's responsibility to give userland good advice
> through this channel.  If there are common buggy kernels that give bad
> advice, that is a reason to apply upper and lower limits to the advice from
> the kernel.  But the expectation should be that the kernel gets fixed to
> give good advice, and the optimal thing to do with a good kernel is to
> follow its advice.  
> 
> Since the recommended use of st_blksize in this way is a standard user
> feature and not just what stdio's implementation happens to do, there is an
> argument to be made that the limiting of the value should be done in the
> *stat functions' reported st_blksize values rather than in stdio's use of
> them.  (I'm ambivalent about this point.)

That's a good point.  I'll try to get feedback from kernel file system
developers on this matter.

Thanks,
Florian
Rich Felker April 1, 2016, 6:19 p.m. UTC | #6
On Fri, Mar 18, 2016 at 03:52:58PM -0700, Roland McGrath wrote:
> > I can do some benchmarking, but I don't expect any compelling results.
> 
> Whatever the results, they would not IMHO be relevant here.
> 
> POSIX specifies that st_blksize is the "preferred I/O block size for this
> object".  It's the kernel's responsibility to give userland good advice
> through this channel.  If there are common buggy kernels that give bad
> advice, that is a reason to apply upper and lower limits to the advice from
> the kernel.  But the expectation should be that the kernel gets fixed to
> give good advice, and the optimal thing to do with a good kernel is to
> follow its advice.  
> 
> Since the recommended use of st_blksize in this way is a standard user
> feature and not just what stdio's implementation happens to do, there is an
> argument to be made that the limiting of the value should be done in the
> *stat functions' reported st_blksize values rather than in stdio's use of
> them.  (I'm ambivalent about this point.)

Regardless of st_blksize being "the preferred size", it's not suitable
for stdio, at least not for read purposes, because sparse/random
access reads are a valid application usage for stdio. Reading an
unboundedly large "optimal" block size, only to use one byte and throw
the rest away, is unacceptably pessimistic behavior and is the whole
point behind bug 4099.
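A rough worked example, under the simplifying assumption that each random one-byte access lands outside the current buffer and forces a full refill, for 1000 such accesses:

```latex
\begin{aligned}
\text{st\_blksize} = 2\,\text{MiB}:&\quad 1000 \times 2\,097\,152\ \text{B} = 2\,097\,152\,000\ \text{B} \approx 2.1\ \text{GB}\\
\text{8 KiB buffer}:&\quad 1000 \times 8\,192\ \text{B} = 8\,192\,000\ \text{B} \approx 8.2\ \text{MB}\\
\text{ratio}:&\quad 2\,097\,152 / 8\,192 = 256
\end{aligned}
```

In both cases the application uses only 1000 bytes; the larger buffer multiplies the wasted transfer by the ratio of the buffer sizes.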

If you insist on keeping unboundedly large buffers honoring
st_blksize, one option would be to only use the full buffer for
writing, and limit it to 4k or 8k for reading. But I think it's best
to just ignore st_blksize and use a reasonable buffer size all the
time.
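The asymmetric option, honoring st_blksize for writing but capping it for reading, could be sketched as follows. The enum, helper name, the 8192-byte read cap (one point in the suggested 4k to 8k range) and the fallback size are all assumptions made for illustration.

```c
#include <stddef.h>

enum stream_use { STREAM_READ, STREAM_WRITE };

#define READ_BUF_CAP  8192  /* hypothetical cap within the 4k-8k range */
#define FALLBACK_SIZE 8192  /* hypothetical size when st_blksize is unusable */

/* Choose a buffer size from BLKSIZE: follow it for writes, cap it for
   reads, where the application may discard most of what is read.  */
static size_t
buffer_size_for (enum stream_use use, long blksize)
{
  size_t size = blksize > 0 ? (size_t) blksize : FALLBACK_SIZE;
  if (use == STREAM_READ && size > READ_BUF_CAP)
    size = READ_BUF_CAP;
  return size;
}
```

A stream opened for both reading and writing (for example with "r+") shares one buffer in stdio, so such a policy would presumably have to apply the read cap to it as well.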

Rich
Carlos O'Donell April 2, 2016, 1:15 a.m. UTC | #7
On 04/01/2016 02:19 PM, Rich Felker wrote:
> Regardless of st_blksize being "the preferred size", it's not suitable
> for stdio, at least not for read purposes, because sparse/random
> access reads are a valid application usage for stdio. Reading an
> unboundedly large "optimal" block size, only to use one byte and throw
> the rest away, is unacceptably pessimistic behavior and is the whole
> point behind bug 4099.
> 
> If you insist on keeping unboundedly large buffers honoring
> st_blksize, one option would be to only use the full buffer for
> writing, and limit it to 4k or 8k for reading. But I think it's best
> to just ignore st_blksize and use a reasonable buffer size all the
> time.

I think Roland has a good point, though: if the kernel is going to buffer
for you using filesystem-based knowledge, then why doesn't it just report
an st_blksize that's small, say 8192 bytes, given the implementation?
What purpose does it serve to set st_blksize to 2MB?
Rich Felker April 2, 2016, 1:59 a.m. UTC | #8
On Fri, Apr 01, 2016 at 09:15:21PM -0400, Carlos O'Donell wrote:
> I think Roland has a good point, though: if the kernel is going to buffer
> for you using filesystem-based knowledge, then why doesn't it just report
> an st_blksize that's small, say 8192 bytes, given the implementation?
> What purpose does it serve to set st_blksize to 2MB?

Oh, I totally agree that the kernel is being stupid here. But it's
also stupid for glibc to honor values that obviously do not make sense
for the usage case at hand (reading when you can't know whether you'll
throw most of the result away).

Rich
Roland McGrath April 2, 2016, 2:13 a.m. UTC | #9
> Oh, I totally agree that the kernel is being stupid here. But it's
> also stupid for glibc to honor values that obviously do not make sense
> for the usage case at hand (reading when you can't know whether you'll
> throw most of the result away).

Hence the only actual suggestion I made: apply fixed lower and upper bounds
to the st_blksize value.

Patch

2016-03-08  Florian Weimer  <fweimer@redhat.com>

	[BZ #4099]
	* libio/filedoalloc.c (_IO_file_doallocate): Always use _IO_BUFSIZ
	as the buffer size.

diff --git a/libio/filedoalloc.c b/libio/filedoalloc.c
index 4f9d738..74ff79b 100644
--- a/libio/filedoalloc.c
+++ b/libio/filedoalloc.c
@@ -56,8 +56,6 @@ 
 /* Modified for GNU iostream by Per Bothner 1991, 1992. */
 
 #include "libioP.h"
-#include <device-nrs.h>
-#include <sys/stat.h>
 #include <stdlib.h>
 #include <unistd.h>
 
@@ -72,36 +70,17 @@  local_isatty (int fd)
 }
 
 /* Allocate a file buffer, or switch to unbuffered I/O.  Streams for
-   TTY devices default to line buffered.  */
+ * TTY devices default to line buffered.  */
 int
 _IO_file_doallocate (_IO_FILE *fp)
 {
-  _IO_size_t size;
-  char *p;
-  struct stat64 st;
-
-  size = _IO_BUFSIZ;
-  if (fp->_fileno >= 0 && __builtin_expect (_IO_SYSSTAT (fp, &st), 0) >= 0)
-    {
-      if (S_ISCHR (st.st_mode))
-	{
-	  /* Possibly a tty.  */
-	  if (
-#ifdef DEV_TTY_P
-	      DEV_TTY_P (&st) ||
-#endif
-	      local_isatty (fp->_fileno))
-	    fp->_flags |= _IO_LINE_BUF;
-	}
-#if _IO_HAVE_ST_BLKSIZE
-      if (st.st_blksize > 0)
-	size = st.st_blksize;
-#endif
-    }
-  p = malloc (size);
+  /* Switch to line buffering for TTYs.  */
+  if (fp->_fileno >= 0 && local_isatty (fp->_fileno))
+    fp->_flags |= _IO_LINE_BUF;
+  char *p = malloc (_IO_BUFSIZ);
   if (__glibc_unlikely (p == NULL))
     return EOF;
-  _IO_setb (fp, p, p + size, 1);
+  _IO_setb (fp, p, p + _IO_BUFSIZ, 1);
   return 1;
 }
 libc_hidden_def (_IO_file_doallocate)
-- 
2.4.3
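To check which buffer size a C library actually picks for a regular file, before or after a change like this one, __fbufsize from <stdio_ext.h> (a non-standard extension provided by glibc) can be used. The helper default_read_bufsize is a hypothetical sketch. Going by the diff, with this patch applied it should report _IO_BUFSIZ regardless of st_blksize, while the removed code would report a positive st_blksize (where _IO_HAVE_ST_BLKSIZE is defined).

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdio_ext.h>

/* Return the buffer size stdio allocates for a freshly opened read
   stream on PATH, or 0 on error.  Buffers are allocated lazily, so one
   character is read first to force the allocation.  */
static size_t
default_read_bufsize (const char *path)
{
  FILE *fp = fopen (path, "r");
  if (fp == NULL)
    return 0;
  (void) getc (fp);   /* In glibc, this ends up in _IO_file_doallocate.  */
  size_t size = __fbufsize (fp);
  fclose (fp);
  return size;
}
```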