diff mbox

libio: Always use _IO_BUFSIZE for stream buffers [BZ #4099]

Message ID dd5e4979-233b-76ea-806f-75422571fd6a@redhat.com
State New
Headers show

Commit Message

Florian Weimer May 19, 2016, 2:57 p.m. UTC
On 03/31/2016 12:14 PM, Florian Weimer wrote:
> On 03/18/2016 11:52 PM, Roland McGrath wrote:
>
>> Whatever the results, they would not IMHO be relevant here.
>>
>> POSIX specifies that st_blksize is the "preferred I/O block size for this
>> object".  It's the kernel's responsibility to give userland good advice
>> through this channel.  If there are common buggy kernels that give bad
>> advice, that is a reason to apply upper and lower limits to the advice from
>> the kernel.  But the expectation should be that the kernel gets fixed to
>> give good advice, and the optimal thing to do with a good kernel is to
>> follow its advice.
>>
>> Since the recommended use of st_blksize in this way is a standard user
>> feature and not just what stdio's implementation happens to do, there is an
>> argument to be made that the limiting of the value should be done in the
>> *stat functions reported st_blksize values rather than in stdio's use of
>> them.  (I'm ambivalent about this point.)
>
> That's a good point.  I'll try to get feedback from kernel file system
> developers on this matter.

I wasn't able to get any feedback.  Based on Rich's point about random 
I/O and Roland's earlier suggestion, I'm just capping the reported 
buffer size to 8192 in the attached patch.

Thanks,
Florian

Comments

Florian Weimer June 24, 2016, 3:35 p.m. UTC | #1
On 05/19/2016 04:57 PM, Florian Weimer wrote:
> On 03/31/2016 12:14 PM, Florian Weimer wrote:
>> On 03/18/2016 11:52 PM, Roland McGrath wrote:
>>
>>> Whatever the results, they would not IMHO be relevant here.
>>>
>>> POSIX specifies that st_blksize is the "preferred I/O block size for
>>> this
>>> object".  It's the kernel's responsibility to give userland good advice
>>> through this channel.  If there are common buggy kernels that give bad
>>> advice, that is a reason to apply upper and lower limits to the
>>> advice from
>>> the kernel.  But the expectation should be that the kernel gets fixed to
>>> give good advice, and the optimal thing to do with a good kernel is to
>>> follow its advice.
>>>
>>> Since the recommended use of st_blksize in this way is a standard user
>>> feature and not just what stdio's implementation happens to do, there
>>> is an
>>> argument to be made that the limiting of the value should be done in the
>>> *stat functions reported st_blksize values rather than in stdio's use of
>>> them.  (I'm ambivalent about this point.)
>>
>> That's a good point.  I'll try to get feedback from kernel file system
>> developers on this matter.
>
> I wasn't able to get any feedback.  Based on Rich's point about random
> I/O and Roland's earlier suggestion, I'm just capping the reported
> buffer size to 8192 in the attached patch.

Ping?

   <https://sourceware.org/ml/libc-alpha/2016-05/msg00428.html>

Thanks,
Florian
Florian Weimer Nov. 29, 2016, 4:50 p.m. UTC | #2
On 06/24/2016 05:35 PM, Florian Weimer wrote:
> On 05/19/2016 04:57 PM, Florian Weimer wrote:
>> On 03/31/2016 12:14 PM, Florian Weimer wrote:
>>> On 03/18/2016 11:52 PM, Roland McGrath wrote:
>>>
>>>> Whatever the results, they would not IMHO be relevant here.
>>>>
>>>> POSIX specifies that st_blksize is the "preferred I/O block size for
>>>> this
>>>> object".  It's the kernel's responsibility to give userland good advice
>>>> through this channel.  If there are common buggy kernels that give bad
>>>> advice, that is a reason to apply upper and lower limits to the
>>>> advice from
>>>> the kernel.  But the expectation should be that the kernel gets
>>>> fixed to
>>>> give good advice, and the optimal thing to do with a good kernel is to
>>>> follow its advice.
>>>>
>>>> Since the recommended use of st_blksize in this way is a standard user
>>>> feature and not just what stdio's implementation happens to do, there
>>>> is an
>>>> argument to be made that the limiting of the value should be done in
>>>> the
>>>> *stat functions reported st_blksize values rather than in stdio's
>>>> use of
>>>> them.  (I'm ambivalent about this point.)
>>>
>>> That's a good point.  I'll try to get feedback from kernel file system
>>> developers on this matter.
>>
>> I wasn't able to get any feedback.  Based on Rich's point about random
>> I/O and Roland's earlier suggestion, I'm just capping the reported
>> buffer size to 8192 in the attached patch.
>
> Ping?
>
>   <https://sourceware.org/ml/libc-alpha/2016-05/msg00428.html>

Ping?

Thanks,
Florian
Carlos O'Donell Nov. 30, 2016, 1:15 p.m. UTC | #3
On 11/29/2016 11:50 AM, Florian Weimer wrote:
> On 06/24/2016 05:35 PM, Florian Weimer wrote:
>> On 05/19/2016 04:57 PM, Florian Weimer wrote:
>>> On 03/31/2016 12:14 PM, Florian Weimer wrote:
>>>> On 03/18/2016 11:52 PM, Roland McGrath wrote:
>>>>
>>>>> Whatever the results, they would not IMHO be relevant here.
>>>>>
>>>>> POSIX specifies that st_blksize is the "preferred I/O block size for
>>>>> this
>>>>> object".  It's the kernel's responsibility to give userland good advice
>>>>> through this channel.  If there are common buggy kernels that give bad
>>>>> advice, that is a reason to apply upper and lower limits to the
>>>>> advice from
>>>>> the kernel.  But the expectation should be that the kernel gets
>>>>> fixed to
>>>>> give good advice, and the optimal thing to do with a good kernel is to
>>>>> follow its advice.
>>>>>
>>>>> Since the recommended use of st_blksize in this way is a standard user
>>>>> feature and not just what stdio's implementation happens to do, there
>>>>> is an
>>>>> argument to be made that the limiting of the value should be done in
>>>>> the
>>>>> *stat functions reported st_blksize values rather than in stdio's
>>>>> use of
>>>>> them.  (I'm ambivalent about this point.)
>>>>
>>>> That's a good point.  I'll try to get feedback from kernel file system
>>>> developers on this matter.
>>>
>>> I wasn't able to get any feedback.  Based on Rich's point about random
>>> I/O and Roland's earlier suggestion, I'm just capping the reported
>>> buffer size to 8192 in the attached patch.
>>
>> Ping?
>>
>>   <https://sourceware.org/ml/libc-alpha/2016-05/msg00428.html>
> 
> Ping?

Just to confirm the user can still call stat themselves and adjust the
buffer size up if they want?

If that's the case then this looks good to me and users have a way
to re-enable the old behaviour.

This kind of tuning can be changed at will from release to release
or patched by the distributions if they have problems with the change.

I don't see any real issue here, the kernel is returning "optimal" values
but this conflicts with what the user themselves want to be doing. Thus
providing a fixed cap for the default is fine (as long as the user can undo
the choice).
Florian Weimer Nov. 30, 2016, 1:17 p.m. UTC | #4
On 11/30/2016 02:15 PM, Carlos O'Donell wrote:
> On 11/29/2016 11:50 AM, Florian Weimer wrote:
>> On 06/24/2016 05:35 PM, Florian Weimer wrote:
>>> On 05/19/2016 04:57 PM, Florian Weimer wrote:
>>>> On 03/31/2016 12:14 PM, Florian Weimer wrote:
>>>>> On 03/18/2016 11:52 PM, Roland McGrath wrote:
>>>>>
>>>>>> Whatever the results, they would not IMHO be relevant here.
>>>>>>
>>>>>> POSIX specifies that st_blksize is the "preferred I/O block size for
>>>>>> this
>>>>>> object".  It's the kernel's responsibility to give userland good advice
>>>>>> through this channel.  If there are common buggy kernels that give bad
>>>>>> advice, that is a reason to apply upper and lower limits to the
>>>>>> advice from
>>>>>> the kernel.  But the expectation should be that the kernel gets
>>>>>> fixed to
>>>>>> give good advice, and the optimal thing to do with a good kernel is to
>>>>>> follow its advice.
>>>>>>
>>>>>> Since the recommended use of st_blksize in this way is a standard user
>>>>>> feature and not just what stdio's implementation happens to do, there
>>>>>> is an
>>>>>> argument to be made that the limiting of the value should be done in
>>>>>> the
>>>>>> *stat functions reported st_blksize values rather than in stdio's
>>>>>> use of
>>>>>> them.  (I'm ambivalent about this point.)
>>>>>
>>>>> That's a good point.  I'll try to get feedback from kernel file system
>>>>> developers on this matter.
>>>>
>>>> I wasn't able to get any feedback.  Based on Rich's point about random
>>>> I/O and Roland's earlier suggestion, I'm just capping the reported
>>>> buffer size to 8192 in the attached patch.
>>>
>>> Ping?
>>>
>>>   <https://sourceware.org/ml/libc-alpha/2016-05/msg00428.html>
>>
>> Ping?
>
> Just to confirm the user can still call stat themselves and adjust the
> buffer size up if they want?

They can still install their own buffer, yes.

> If that's the case then this looks good to me and users have a way
> to re-enable the old behaviour.

Thanks,
Florian
Carlos O'Donell Nov. 30, 2016, 9:14 p.m. UTC | #5
On 11/30/2016 08:17 AM, Florian Weimer wrote:
> They can still install their own buffer, yes.

The only tradeoff I see is that you're making wasteful
write syscalls while the kernel waits for enough data
to optimally send to the target device.

You really have to know a lot about your device to have
enough information to do this optimally.

xstat()
https://lwn.net/Articles/686106/
~~~
Howells noted that Dave Chinner wanted more I/O 
parameters (e.g. preferred read and write sizes, 
erase block size). There were five to seven 
different numbers that Chinner wanted, but those 
could always be added later, he said. 
~~~

And the sysfs information is useless for my SATA disks...

[carlos@athas ~]$ cat /sys/block/sda/queue/optimal_io_size 
0

So we aren't yet at any kind of utopia where there is
enough information to make the right choice. Even then
the "right" choice depends on what you're trying to do
e.g. small write, large write, throughput, low-latency etc.
diff mbox

Patch

libio: Limit buffer size to 8192 bytes [BZ #4099]

This avoids overly large buffers with network file systems which report
very large block sizes.

2016-05-19  Florian Weimer  <fweimer@redhat.com>

	[BZ #4099]
	* libio/filedoalloc.c (_IO_file_doallocate): Limit buffer size to
	_IO_BUFSIZ (8192).

diff --git a/NEWS b/NEWS
index b3fd3cc..dec9757 100644
--- a/NEWS
+++ b/NEWS
@@ -33,6 +33,12 @@  Version 2.24
   group: files [SUCCESS=merge] nis
   Implemented by Stephen Gallagher (Red Hat).
 
+* The buffer size for byte-oriented stdio streams is now limited to 8192
+  bytes by default.  Previously, on Linux, the default buffer size on most
+  file systems was 4096 bytes (and thus remains unchanged), except on
+  network file systems, where the buffer size was unpredictable and could be
+  as large as several megabytes.
+
 Security related changes:
 
 * An unnecessary stack copy in _nss_dns_getnetbyname_r was removed.  It
diff --git a/libio/filedoalloc.c b/libio/filedoalloc.c
index 4f9d738..ded0725 100644
--- a/libio/filedoalloc.c
+++ b/libio/filedoalloc.c
@@ -94,7 +94,7 @@  _IO_file_doallocate (_IO_FILE *fp)
 	    fp->_flags |= _IO_LINE_BUF;
 	}
 #if _IO_HAVE_ST_BLKSIZE
-      if (st.st_blksize > 0)
+      if (st.st_blksize > 0 && st.st_blksize < _IO_BUFSIZ)
 	size = st.st_blksize;
 #endif
     }