Message ID | 4E48B0C3.2010203@ctc-g.co.jp |
---|---|
State | Rejected, archived |
Delegated to: | David Miller |
You suggested to use non-blocking writes, but we think we have to rewrite the Apache code if doing so. That is, we have to make a modification to Apache that depends on the architecture.

By using this patch, it can be handled by changing the configuration a little bit on the kernel side for such applications that it is difficult to do so on application side.

(2011/08/15 14:47), David Miller wrote:
> From: "Jun.Kondo"<jun.kondo@ctc-g.co.jp>
> Date: Mon, 15 Aug 2011 14:38:11 +0900
>
>> 2. to limit the block time of the write in order to
>>    prevent the timeout of upper layer applications
>>    even when the connection has low throughput, such
>>    as low rate streaming
>
> Use non-blocking writes if you want this behavior.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
From: "Jun.Kondo" <jun.kondo@ctc-g.co.jp>
Date: Fri, 19 Aug 2011 18:28:45 +0900

> You suggested to use non-blocking writes, but we think
> we have to rewrite the Apache code if doing so.
> That is, we have to make a modification to Apache that
> depends on the architecture.
> By using this patch, it can be handled by changing the
> configuration a little bit on the kernel side for such
> applications that it is difficult to do so on application
> side.

The kernel provides the facilities necessary to achieve your goals. It is a userspace problem.
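The non-blocking approach suggested above can be sketched in C as follows, assuming Linux and POSIX sockets. `set_nonblocking()` and `send_with_idle_timeout()` are hypothetical helper names, and the policy shown (give up only after `idle_ms` milliseconds with no progress at all, instead of after a fixed total time) is one possible application-side choice, not something prescribed in this thread. Readiness reported by `poll()` still follows the kernel's own free-space threshold, so this changes how the application measures its timeout rather than when the kernel wakes it.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Put fd into non-blocking mode. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Hypothetical helper: send len bytes on a non-blocking socket, giving up
 * only when no progress has been made for idle_ms milliseconds. Returns the
 * number of bytes sent (possibly short, with errno set) or -1 if nothing
 * could be sent.
 */
ssize_t send_with_idle_timeout(int fd, const void *buf, size_t len, int idle_ms)
{
    assert(idle_ms >= 0);   /* a negative timeout would make poll() wait forever */
    const char *p = buf;
    size_t done = 0;

    while (done < len) {
        /* MSG_NOSIGNAL (Linux): report EPIPE instead of raising SIGPIPE. */
        ssize_t n = send(fd, p + done, len - done, MSG_NOSIGNAL);
        if (n > 0) {
            done += (size_t)n;      /* progress: the idle timer restarts */
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK)
            return done > 0 ? (ssize_t)done : -1;   /* e.g. ECONNRESET */

        /* Send buffer is full: wait for POLLOUT for at most idle_ms. */
        struct pollfd pfd = { .fd = fd, .events = POLLOUT };
        int r = poll(&pfd, 1, idle_ms);
        if (r == 0) {
            errno = ETIMEDOUT;      /* no progress at all for idle_ms */
            return done > 0 ? (ssize_t)done : -1;
        }
        if (r < 0 && errno != EINTR)
            return done > 0 ? (ssize_t)done : -1;
    }
    return (ssize_t)done;
}
```

A caller would switch the connection to non-blocking mode once and then use the helper in place of a blocking `write()`; a short return with `errno == ETIMEDOUT` means the peer accepted nothing for `idle_ms`, while a slow but moving connection keeps resetting the timer.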
By using this patch, we want to prevent "timeout occurred over the network that is low throughput but available".

But in the current implementation, both blocking and non-blocking, user processes can't recognize the reason in detail when failed to write to socket buffer, we think.

is it (really) network problem ?
or is wmem not enough free to write?

As stated above, we think it is difficult for user processes to handle timeout of writing socket buffer, when wmem is configured large value (to ensure high throughput over the high latency network, like 3G).

(2011/08/19 18:43), David Miller wrote:
> From: "Jun.Kondo"<jun.kondo@ctc-g.co.jp>
> Date: Fri, 19 Aug 2011 18:28:45 +0900
>
>> You suggested to use non-blocking writes, but we think
>> we have to rewrite the Apache code if doing so.
>> That is, we have to make a modification to Apache that
>> depends on the architecture.
>> By using this patch, it can be handled by changing the
>> configuration a little bit on the kernel side for such
>> applications that it is difficult to do so on application
>> side.
>
> The kernel provides the facilities necessary to achieve your
> goals. It is a userspace problem.
On Mon, 22 Aug 2011 09:33:52 +0900, "Jun.Kondo" wrote:

> By using this patch, we want to prevent "timeout occurred over the network
> that is low throughput but available".
>
> But in the current implementation, both blocking and non-blocking,
> user processes can't recognize the reason in detail
> when failed to write to socket buffer, we think.

For your application it should not matter WHY the data cannot be written to the peer. It can happen that the peer closes the window, that there is some scheduling bottleneck, or whatever else. A blocking socket means for you that some data is in the pipe, waiting for transmission. This is the knowledge that you require, and you should deal with it. A blocking socket does not mean FAILED; a failure is returned via ECONNRESET or otherwise. So everything is fine when your socket blocks. Probably you should adjust your Apache timeouts or other parts of the program logic.

> As stated above, we think it is difficult for user processes to handle
> timeout of writing socket buffer,
> when wmem is configured large value (to ensure high throughput over the
> high latency network, like 3G).

No, you should adjust your code and account for the fact that the socket has data in the pipe. That's all. Changing tcp_lowat
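As a hedged illustration of "data in the pipe": on Linux, tcp(7) documents the `SIOCOUTQ` ioctl, and its synonym `TIOCOUTQ` from `<sys/ioctl.h>`, for reading how much data is still in a socket's send queue; the kernel header describes it for TCP as data not yet sent plus data not yet acknowledged. Sampling it twice, some seconds apart, is one way an application could tell a slow but progressing connection from a stalled one. `queued_bytes()` and `queue_is_draining()` are hypothetical names, and the sampling interval is left to the caller.

```c
#include <assert.h>
#include <sys/ioctl.h>   /* TIOCOUTQ, which tcp(7) describes as a synonym for SIOCOUTQ */
#include <sys/socket.h>

/* Bytes the kernel still holds in this socket's send queue, or -1 on error. */
int queued_bytes(int fd)
{
    int n = 0;
    if (ioctl(fd, TIOCOUTQ, &n) < 0)
        return -1;
    return n;
}

/*
 * Hypothetical progress check: a writer blocked on a slow but live
 * connection should see the queue shrink between two samples taken some
 * time apart; on a stalled connection it stays put.
 */
int queue_is_draining(int before, int after)
{
    assert(before >= 0 && after >= 0);
    return after < before;
}
```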
From: "Jun.Kondo" <jun.kondo@ctc-g.co.jp>
Date: Mon, 22 Aug 2011 09:33:52 +0900

> is it (really) network problem ?
> or is wmem not enough free to write?

Oh yes you can indeed make this determination, by using the socket timeouts via the SO_RCVTIMEO and SO_SNDTIMEO socket options.

Timeouts, when hit, will return -EINTR, whereas lack of buffer space on a non-blocking socket will return -EAGAIN.

I think you simply are unaware of the facilities available in the BSD socket API.
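A minimal sketch of the `SO_SNDTIMEO` approach, assuming Linux; `set_send_timeout()` and `classify_send()` are hypothetical names. One caveat when classifying errors: the socket(7) manual page documents that an expired send timeout returns -1 with `errno` set to `EAGAIN` or `EWOULDBLOCK` (or a short count if some data was already queued), while signal(7) lists `EINTR` for a send interrupted by a signal handler when such a timeout is set. On a blocking socket with `SO_SNDTIMEO`, `EAGAIN` therefore means the timeout expired; a non-blocking socket reports the same value for a full buffer, so the application tells the two apart by the mode it chose.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Set a send timeout (SO_SNDTIMEO) of ms milliseconds. Returns 0 or -1. */
int set_send_timeout(int fd, long ms)
{
    assert(ms >= 0);
    struct timeval tv = { .tv_sec = ms / 1000, .tv_usec = (ms % 1000) * 1000 };
    return setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof tv);
}

/* Hypothetical classification of one send() on a *blocking* socket. */
enum send_outcome { SENT_ALL, SENT_PARTIAL, TIMED_OUT, INTERRUPTED, FAILED };

enum send_outcome classify_send(ssize_t n, size_t len, int err)
{
    if (n >= 0)   /* a short count: the timeout or a signal hit after some data was queued */
        return (size_t)n == len ? SENT_ALL : SENT_PARTIAL;
    if (err == EAGAIN || err == EWOULDBLOCK)
        return TIMED_OUT;       /* SO_SNDTIMEO expired before any byte was queued */
    if (err == EINTR)
        return INTERRUPTED;     /* a signal handler interrupted the wait */
    return FAILED;              /* a real error such as ECONNRESET or EPIPE */
}
```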
Currently, once the transmission buffer becomes full, it is not possible to write again unless there is one third of free space in the transmission buffer. Our modification request is not intended to change the behavior of the OS itself, but to make the value "one third" configurable rather than fixed. It would thus still be possible to set the value to 1/3. So, could you please tell us why it is not acceptable to make it configurable, and what the reason is for insisting on the value of 1/3?
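To see where "one third" comes from, here is a small userspace model (not kernel code) of the writability test around `sk_stream_min_wspace()` in the patch below, assuming, as in kernels of that era, that a blocked writer becomes writable when `sk_sndbuf - sk_wmem_queued >= sk_wmem_queued >> shift`. `shift` is the constant 1 today and would be the proposed `tcp_lowat` value; `free_space_at_wakeup()` is a hypothetical name. Solving the inequality gives a free fraction of 1/(2^shift + 1): 1/2 for shift 0, 1/3 for shift 1, 1/5 for shift 2, 1/9 for shift 3. Since the knob is a shift count, values of 32 or more would be undefined behaviour in C, and the patch's plain `proc_dointvec` handler does not bound the value.

```c
#include <assert.h>

/*
 * Userspace model of the writability test
 *     sndbuf - queued >= (queued >> shift)
 * where shift is 1 in the current kernel and would be the proposed
 * tcp_lowat value. Returns the free space in bytes at the moment a
 * writer blocked on a full buffer becomes writable again.
 */
unsigned long free_space_at_wakeup(unsigned long sndbuf, unsigned int shift)
{
    /* Shifting by the operand's width or more is undefined behaviour. */
    assert(shift < 8 * sizeof(unsigned long));

    unsigned long queued = sndbuf;          /* start from a full buffer */
    while (sndbuf - queued < (queued >> shift))
        queued--;                           /* drain one byte at a time */
    return sndbuf - queued;
}
```

For example, `free_space_at_wakeup(3000, 1)` is 1000 (one third) and `free_space_at_wakeup(5000, 2)` is 1000 (one fifth).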
From: "Jun.Kondo" <jun.kondo@ctc-g.co.jp>
Date: Thu, 25 Aug 2011 13:46:58 +0900

> Currently, once the transmission buffer becomes full, it is not
> possible to write again unless there is one third of free space
> in the transmission buffer.

Then use a non-blocking socket if you don't want to block.

We're talking in circles, and will walk down the same discussions again.

You have still not shown what real limitation is created by the way things work currently.

I've said everything that I can, and I will thus recuse myself from the rest of this discussion since I really can't add anything more.
The clients of this system are cellular phones, and the condition of the communication line to a client varies widely according to its location and congestion. In terms of line speed, it can be around 9 Mbps when fast but 8 kbps when slow. The customer's requirement is to provide stable service in both situations:

- In the normal situation, acquire a large default transmission buffer value, and ensure high throughput from the beginning of the TCP connection
- On the other hand, even when the connection has low throughput, such as low rate streaming, transmit data without timeout

However, when the throughput is low, it takes a long time for the transmission buffer to drain, and a timeout occurs during that period. Of course, the connection would not be dropped if the application's timeout were extended, but an end user would not wait patiently for as long as 1 minute, so we do not want to extend the timeout value. By making configurable the threshold at which writing becomes possible again after the buffer has blocked, and setting it to a small value, it will be possible to return data to the client without a timeout. So we think the issue can be solved with this modification.
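A back-of-the-envelope model of the slow-link case above, assuming a constant drain rate and that the blocked writer is woken once 1/(2^shift + 1) of the buffer is free (1/3 today). The 256 KiB buffer size is purely illustrative, since the thread does not say which wmem value was configured, and `seconds_until_writable()` is a hypothetical name.

```c
#include <assert.h>

/*
 * Seconds a writer blocked on a full send buffer waits before it may write
 * again, assuming the queue drains at a constant rate and the writer is
 * woken once sndbuf / (2^shift + 1) bytes are free (shift = 1 is the
 * current one-third rule).
 */
double seconds_until_writable(double sndbuf_bytes, double rate_bits_per_s,
                              unsigned int shift)
{
    assert(rate_bits_per_s > 0.0 && shift < 31);
    double must_drain = sndbuf_bytes / ((double)(1u << shift) + 1.0);
    return must_drain * 8.0 / rate_bits_per_s;
}
```

With a 256 KiB buffer, a writer at 8 kbps waits roughly 87 s for one third of the buffer to drain, versus about 0.08 s at 9 Mbps. Because the wait is proportional to the buffer size, shrinking the buffer lowers it just as much as raising the shift: a shift of 4 (1/17) at 8 kbps gives about 15 s, and so would a buffer 17/3 times smaller under the current rule.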
From: "Jun.Kondo" <jun.kondo@ctc-g.co.jp>
Date: Fri, 09 Sep 2011 10:33:58 +0900

> - In normal situation, acquire large default transmission
>   buffer value, and ensure high throughput from the
>   beginning of tcp connection

You should never do this.

You should use the default buffer sizes and as a result the kernel's TCP stack automatically adjusts the send and receive buffers in response to the link characteristics.

When you set explicit buffer sizes, this turns off the TCP stack's auto-tuning mechanism.

Every argument made in support of your proposed feature is based upon a false premise of one kind or another, and this is yet another example of this.
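A minimal Linux-specific sketch of following that advice: leave `SO_SNDBUF` alone and, if needed, inspect the auto-tuning limits in `/proc/sys/net/ipv4/tcp_wmem` (three values, min, default and max, as described in tcp(7)) together with the size the kernel currently reports for a socket. `read_tcp_wmem()`, `get_sndbuf()` and `struct wmem_limits` are hypothetical names. The sketch deliberately never calls `setsockopt(SO_SNDBUF)`, since an explicit size is what switches off auto-tuning for that socket.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/socket.h>

struct wmem_limits { long min, def, max; };

/* Read net.ipv4.tcp_wmem: the bounds within which TCP auto-tunes the send
 * buffer of sockets that never set SO_SNDBUF. Returns 0 on success. */
int read_tcp_wmem(struct wmem_limits *w)
{
    FILE *f = fopen("/proc/sys/net/ipv4/tcp_wmem", "r");
    if (!f)
        return -1;
    int n = fscanf(f, "%ld %ld %ld", &w->min, &w->def, &w->max);
    fclose(f);
    return n == 3 ? 0 : -1;
}

/* Send buffer size the kernel currently reports for fd, or -1 on error. */
int get_sndbuf(int fd)
{
    int val = 0;
    socklen_t len = sizeof val;
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &val, &len) < 0)
        return -1;
    return val;
}
```

On a connected socket that is actively sending, repeated `get_sndbuf()` calls let you watch the kernel adjust the size within those bounds; socket(7) also notes that a value set explicitly with `SO_SNDBUF` is doubled by the kernel to allow for bookkeeping overhead.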
--- linux-mainline/include/net/sock.h.orig	2011-07-27 14:26:43.000000000 +0900
+++ linux-mainline/include/net/sock.h	2011-08-15 11:40:20.000000000 +0900
@@ -604,9 +604,11 @@ static inline int sk_acceptq_is_full(str
 /*
  * Compute minimal free write space needed to queue new packets.
  */
+extern __u32 sysctl_tcp_lowat;
+
 static inline int sk_stream_min_wspace(struct sock *sk)
 {
-	return sk->sk_wmem_queued >> 1;
+	return sk->sk_wmem_queued >> sysctl_tcp_lowat;
 }
 
 static inline int sk_stream_wspace(struct sock *sk)
--- linux-mainline/net/core/sock.c.orig	2011-07-24 05:04:06.000000000 +0900
+++ linux-mainline/net/core/sock.c	2011-08-15 11:34:27.000000000 +0900
@@ -217,6 +217,9 @@ __u32 sysctl_rmem_max __read_mostly = SK
 __u32 sysctl_wmem_default __read_mostly = SK_WMEM_MAX;
 __u32 sysctl_rmem_default __read_mostly = SK_RMEM_MAX;
 
+__u32 sysctl_tcp_lowat = 1;
+EXPORT_SYMBOL(sysctl_tcp_lowat);
+
 /* Maximal space eaten by iovec or ancillary data plus some space */
 int sysctl_optmem_max __read_mostly = sizeof(unsigned long)*(2*UIO_MAXIOV+512);
 EXPORT_SYMBOL(sysctl_optmem_max);
@@ -1330,6 +1333,8 @@ void __init sk_init(void)
 		sysctl_wmem_max = 131071;
 		sysctl_rmem_max = 131071;
 	}
+
+	sysctl_tcp_lowat = 1;
 }
 
 /*
--- linux-mainline/net/core/sysctl_net_core.c.orig	2011-05-29 06:01:16.000000000 +0900
+++ linux-mainline/net/core/sysctl_net_core.c	2011-08-15 11:05:38.000000000 +0900
@@ -168,6 +168,13 @@ static struct ctl_table net_core_table[]
 		.proc_handler	= rps_sock_flow_sysctl
 	},
 #endif
+	{
+		.procname	= "tcp_lowat",
+		.data		= &sysctl_tcp_lowat,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec
+	},
 #endif /* CONFIG_NET */
 	{
 		.procname	= "netdev_budget",