Message ID | 4967DF10.2010107@cosmosbay.com |
---|---|
State | Accepted, archived |
Delegated to: | David Miller |
From: Eric Dumazet <dada1@cosmosbay.com>
Date: Sat, 10 Jan 2009 00:34:40 +0100

> David, do you think we still must call __tcp_splice_read() only once
> in tcp_splice_read() if SPLICE_F_NONBLOCK is set ?

Eric, I'll get to this thread as soon as I can, perhaps tomorrow.
I want to get all of the build fallout and bug fixes for 2.6.29-rcX
sorted before everyone heads off to LCA in the next week or so :-)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Eric Dumazet <dada1@cosmosbay.com>
Date: Sat, 10 Jan 2009 00:34:40 +0100

> David, do you think we still must call __tcp_splice_read() only once
> in tcp_splice_read() if SPLICE_F_NONBLOCK is set ?

You seem to be working that out in another thread :-)

> [PATCH] tcp: splice as many packets as possible at once
>
> As spotted by Willy Tarreau, current splice() from tcp socket to pipe is not
> optimal. It processes at most one segment per call.
> This results in low performance and very high overhead due to syscall rate
> when splicing from interfaces which do not support LRO.
>
> Willy provided a patch inside tcp_splice_read(), but a better fix
> is to let tcp_read_sock() process as many segments as possible, so
> that tcp_rcv_space_adjust() and tcp_cleanup_rbuf() are called less
> often.
>
> With this change, splice() behaves like tcp_recvmsg(), being able
> to consume many skbs in one system call. With typical 1460 bytes
> of payload per frame, that means splice(SPLICE_F_NONBLOCK) can return
> 16*1460 = 23360 bytes.
>
> Signed-off-by: Willy Tarreau <w@1wt.eu>
> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>

I've applied this, thanks!
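The 16*1460 figure in the quoted changelog can be reproduced with a short calculation. The sketch below is illustrative only and rests on two assumptions that the thread does not state: that 16 is the number of buffer slots in a default pipe on kernels of that era (PIPE_BUFFERS), with each spliced frame occupying one slot, and that 1460 is a 1500-byte MTU minus 20 bytes of IPv4 header and 20 bytes of TCP header with no options. The names `PIPE_SLOTS`, `tcp_payload_per_frame()` and `max_nonblocking_splice()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: a default pipe holds 16 buffers (PIPE_BUFFERS at the time). */
enum { PIPE_SLOTS = 16 };

/* Assumption: IPv4 and TCP headers without options, 20 bytes each. */
enum { IPV4_HDR_LEN = 20, TCP_HDR_LEN = 20 };

/* Payload bytes carried by one full-sized frame for a given MTU. */
static size_t tcp_payload_per_frame(size_t mtu)
{
	assert(mtu > IPV4_HDR_LEN + TCP_HDR_LEN);
	return mtu - IPV4_HDR_LEN - TCP_HDR_LEN;
}

/* Upper bound on one non-blocking splice() if each frame fills one slot. */
static size_t max_nonblocking_splice(size_t payload_per_frame)
{
	return (size_t)PIPE_SLOTS * payload_per_frame;
}
```

With a 1500-byte MTU, `tcp_payload_per_frame(1500)` gives 1460 and `max_nonblocking_splice(1460)` gives 23360, matching the changelog. With TCP options such as timestamps the per-frame payload would be smaller.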
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index bd6ff90..1233835 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -522,8 +522,12 @@ static int tcp_splice_data_recv(read_descriptor_t *rd_desc, struct sk_buff *skb,
 				unsigned int offset, size_t len)
 {
 	struct tcp_splice_state *tss = rd_desc->arg.data;
+	int ret;
 
-	return skb_splice_bits(skb, offset, tss->pipe, tss->len, tss->flags);
+	ret = skb_splice_bits(skb, offset, tss->pipe, rd_desc->count, tss->flags);
+	if (ret > 0)
+		rd_desc->count -= ret;
+	return ret;
 }
 
 static int __tcp_splice_read(struct sock *sk, struct tcp_splice_state *tss)
@@ -531,6 +535,7 @@ static int __tcp_splice_read(struct sock *sk, struct tcp_splice_state *tss)
 	/* Store TCP splice context information in read_descriptor_t. */
 	read_descriptor_t rd_desc = {
 		.arg.data = tss,
+		.count = tss->len,
 	};
 
 	return tcp_read_sock(sk, &rd_desc, tcp_splice_data_recv);
@@ -611,11 +616,13 @@ ssize_t tcp_splice_read(struct socket *sock, loff_t *ppos,
 		tss.len -= ret;
 		spliced += ret;
 
+		if (!timeo)
+			break;
 		release_sock(sk);
 		lock_sock(sk);
 
 		if (sk->sk_err || sk->sk_state == TCP_CLOSE ||
-		    (sk->sk_shutdown & RCV_SHUTDOWN) || !timeo ||
+		    (sk->sk_shutdown & RCV_SHUTDOWN) ||
 		    signal_pending(current))
 			break;
 	}
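
The budget handling in the first two hunks can be modelled outside the kernel. The following is a minimal userspace sketch, not kernel code: `struct read_budget` stands in for `read_descriptor_t`, `consume_segment()` stands in for `skb_splice_bits()`, and `read_segments()` is a simplified model of the loop in `tcp_read_sock()`, which as far as I understand it stops once the descriptor's `count` reaches zero. All of these names are hypothetical, and segments are reduced to plain lengths.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for read_descriptor_t: only the remaining budget. */
struct read_budget {
	size_t count;	/* bytes the caller still wants */
};

/* Hypothetical stand-in for skb_splice_bits(): moves at most `want` bytes
 * out of a segment holding `seg_len` bytes and returns the amount moved. */
static long consume_segment(size_t seg_len, size_t want)
{
	return (long)(seg_len < want ? seg_len : want);
}

/* Mirrors the patched tcp_splice_data_recv(): ask for the remaining
 * budget, then shrink the budget by whatever was actually consumed. */
static long segment_recv(struct read_budget *rd, size_t seg_len)
{
	long ret = consume_segment(seg_len, rd->count);

	if (ret > 0)
		rd->count -= (size_t)ret;
	return ret;
}

/* Simplified model of the receive loop: walk the queued segments until
 * the budget is exhausted, the queue is empty, or a segment yields nothing. */
static size_t read_segments(const size_t *seg_lens, size_t nsegs, size_t len)
{
	struct read_budget rd = { .count = len };	/* like .count = tss->len */
	size_t copied = 0;

	assert(seg_lens != NULL || nsegs == 0);
	for (size_t i = 0; i < nsegs && rd.count > 0; i++) {
		long used = segment_recv(&rd, seg_lens[i]);

		if (used <= 0)
			break;
		copied += (size_t)used;
	}
	return copied;
}
```

Given three queued segments of 1460 bytes each, a request for 4000 bytes takes two full segments and 1080 bytes of the third, while a request for 10000 bytes returns all 4380 queued bytes in one pass. In this model, starting with a budget of zero makes the loop do no work at all, which is why initialising the budget from the requested length matters.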