From patchwork Thu Apr 5 13:05:35 2012 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Eric Dumazet X-Patchwork-Id: 150954 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 0DA31B705A for ; Thu, 5 Apr 2012 23:05:44 +1000 (EST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752912Ab2DENFm (ORCPT ); Thu, 5 Apr 2012 09:05:42 -0400 Received: from mail-wg0-f44.google.com ([74.125.82.44]:63662 "EHLO mail-wg0-f44.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751840Ab2DENFl (ORCPT ); Thu, 5 Apr 2012 09:05:41 -0400 Received: by wgbdr13 with SMTP id dr13so1300396wgb.1 for ; Thu, 05 Apr 2012 06:05:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=subject:from:to:cc:in-reply-to:references:content-type:date :message-id:mime-version:x-mailer:content-transfer-encoding; bh=MuyoyNu0TMZ7/PbENZKo7KVym+OtiL3CWUBQVfJMOOg=; b=mkMmzvNW2pVe6N5GjH57OSncyxazHbROlUThzlYchNleWLtlMKNEUTniwuErHnDEJM ZiuunFb4wlsjvVwdzWZh6CbzcGlkRXKdXn915ogEWnPVsr7YObUfp85pc2F6240S+DA+ NF1iWWcx8PmDgg/ocAsmqrSAPtptgRlgFl2ETyvmiCp6l1T+WD4eMGOp4MKoS6H5tj1W JxBBzV9T/iyS3mnt6YFBxlshP8MyawgAFccyCMorkAw1nrFaA2lD2Ttt5t+9dqqn/fTy K+d8uft50t8pC1w6UZ50d5x+mRZzxA77UXWAZqAbd+XGrrI2F0RTrBRkAjlxFkQuhsba sMzQ== Received: by 10.180.104.230 with SMTP id gh6mr4713446wib.22.1333631139545; Thu, 05 Apr 2012 06:05:39 -0700 (PDT) Received: from [172.28.91.239] ([74.125.122.49]) by mx.google.com with ESMTPS id j3sm16897307wiw.1.2012.04.05.06.05.36 (version=SSLv3 cipher=OTHER); Thu, 05 Apr 2012 06:05:38 -0700 (PDT) Subject: Re: [PATCH] tcp: allow splice() to build full TSO packets From: Eric Dumazet To: David Miller Cc: netdev@vger.kernel.org, ncardwell@google.com, therbert@google.com, ycheng@google.com, hkchu@google.com, maze@google.com, maheshb@google.com, ilpo.jarvinen@helsinki.fi, nanditad@google.com In-Reply-To: <20120403.173614.962252876842659412.davem@davemloft.net> References: <1333481821.18626.322.camel@edumazet-glaptop> <20120403.172126.672236532461758456.davem@davemloft.net> <1333488689.18626.331.camel@edumazet-glaptop> <20120403.173614.962252876842659412.davem@davemloft.net> Date: Thu, 05 Apr 2012 15:05:35 +0200 Message-ID: <1333631135.18626.606.camel@edumazet-glaptop> Mime-Version: 1.0 X-Mailer: Evolution 2.28.3 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org On Tue, 2012-04-03 at 17:36 -0400, David Miller wrote: > From: Eric Dumazet > Date: Tue, 03 Apr 2012 23:31:29 +0200 > > > The code in tcp_sendmsg() and do_tcp_sendpages() is similar (actually > > probably copy/pasted) but the thing is tcp_sendmsg() is called once per > > sendmsg() call (and the push logic is OK at the end of it), while a > > single splice() system call can call do_tcp_sendpages() 16 times (or > > even more if pipe buffer was extended by fcntl(F_SETPIPE_SZ)) > > Ok, so this means that in essence the tcp_mark_push should also only > be done in the final sendpage call. > > And since I'm wholly convinced that the URG stuff is a complete > "don't care" for this path, I'm convinced your patch is the right > thing to do. > > Applied to 'net' and queued up for -stable, thanks Eric. Hmm, thinking again about this, I did more tests and it appears we need to differentiate the SPLICE_F_MORE flag (user request) and the internal marker provided by splice logic (handling a batch of pages) A program doing splice(... SPLICE_F_MORE) should really call tcp_push() at the end of its work. Thanks [PATCH] tcp: tcp_sendpages() should call tcp_push() once commit 2f533844242 (tcp: allow splice() to build full TSO packets) added a regression for splice() calls using SPLICE_F_MORE. We need to call tcp_flush() at the end of the last page processed in tcp_sendpages(), or else transmits can be deferred and future sends stall. Add a new internal flag, MSG_SENDPAGE_NOTLAST, acting like MSG_MORE, but with different semantic. For all sendpage() providers, its a transparent change. Only sock_sendpage() and tcp_sendpages() can differentiate the two different flags provided by pipe_to_sendpage() Reported-by: Tom Herbert Cc: Nandita Dukkipati Cc: Neal Cardwell Cc: Tom Herbert Cc: Yuchung Cheng Cc: H.K. Jerry Chu Cc: Maciej Żenczykowski Cc: Mahesh Bandewar Cc: Ilpo Järvinen Signed-off-by: Eric Dumazet com> --- fs/splice.c | 5 ++++- include/linux/socket.h | 2 +- net/ipv4/tcp.c | 2 +- net/socket.c | 6 +++--- 4 files changed, 9 insertions(+), 6 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/fs/splice.c b/fs/splice.c index 5f883de..f847684 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -30,6 +30,7 @@ #include #include #include +#include /* * Attempt to steal a page from a pipe buffer. This should perhaps go into @@ -690,7 +691,9 @@ static int pipe_to_sendpage(struct pipe_inode_info *pipe, if (!likely(file->f_op && file->f_op->sendpage)) return -EINVAL; - more = (sd->flags & SPLICE_F_MORE) || sd->len < sd->total_len; + more = (sd->flags & SPLICE_F_MORE) ? MSG_MORE : 0; + if (sd->len < sd->total_len) + more |= MSG_SENDPAGE_NOTLAST; return file->f_op->sendpage(file, buf->page, buf->offset, sd->len, &pos, more); } diff --git a/include/linux/socket.h b/include/linux/socket.h index da2d3e2..b84bbd4 100644 --- a/include/linux/socket.h +++ b/include/linux/socket.h @@ -265,7 +265,7 @@ struct ucred { #define MSG_NOSIGNAL 0x4000 /* Do not generate SIGPIPE */ #define MSG_MORE 0x8000 /* Sender will send more */ #define MSG_WAITFORONE 0x10000 /* recvmmsg(): block until 1+ packets avail */ - +#define MSG_SENDPAGE_NOTLAST 0x20000 /* sendpage() internal : not the last page */ #define MSG_EOF MSG_FIN #define MSG_CMSG_CLOEXEC 0x40000000 /* Set close_on_exit for file diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 2ff6f45..5d54ed3 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -860,7 +860,7 @@ wait_for_memory: } out: - if (copied && !(flags & MSG_MORE)) + if (copied && !(flags & MSG_SENDPAGE_NOTLAST)) tcp_push(sk, flags, mss_now, tp->nonagle); return copied; diff --git a/net/socket.c b/net/socket.c index 484cc69..851edcd 100644 --- a/net/socket.c +++ b/net/socket.c @@ -811,9 +811,9 @@ static ssize_t sock_sendpage(struct file *file, struct page *page, sock = file->private_data; - flags = !(file->f_flags & O_NONBLOCK) ? 0 : MSG_DONTWAIT; - if (more) - flags |= MSG_MORE; + flags = (file->f_flags & O_NONBLOCK) ? MSG_DONTWAIT : 0; + /* more is a combination of MSG_MORE and MSG_SENDPAGE_NOTLAST */ + flags |= more; return kernel_sendpage(sock, page, offset, size, flags); }