Message ID | 1340981690.25226.3.camel@gurkel.linbit |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
On Fri, 2012-06-29 at 16:54 +0200, Andreas Gruenbacher wrote:
> The MSG_NEW_PACKET flag indicates to sendmsg / sendpage that the message or
> page should be put into a new packet even when there is still room left in the
> previous packet.
>
> In the tcp protocol, messages which are not sent immediately are queued. When
> more data is sent, it will be added to the last segment in that queue until
> that segment is "full" whenever possible; only then is a new segment added.
> Right now, there is no way to indicate when tcp should start a new segment.
> The new flag allows to control that.
>
> Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
> ---

I don't understand how maintaining any message boundaries at sender can
prevent any middlebox or the receiver to coalesce frames to any
boundaries it prefers ?

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

On Fri, 2012-06-29 at 17:11 +0200, Eric Dumazet wrote:
> On Fri, 2012-06-29 at 16:54 +0200, Andreas Gruenbacher wrote:
> > The MSG_NEW_PACKET flag indicates to sendmsg / sendpage that the message or
> > page should be put into a new packet even when there is still room left in the
> > previous packet.
> >
> > In the tcp protocol, messages which are not sent immediately are queued. When
> > more data is sent, it will be added to the last segment in that queue until
> > that segment is "full" whenever possible; only then is a new segment added.
> > Right now, there is no way to indicate when tcp should start a new segment.
> > The new flag allows to control that.
> >
> > Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
> > ---
>
> I don't understand how maintaining any message boundaries at sender can
> prevent any middlebox or the receiver to coalesce frames to any
> boundaries it prefers ?

The primary use case is fast Gigabit (10 or more) Ethernet connections
with jumbo frames and switches that support them. There, frames will go
through unchanged and you can zero-copy receive all the time.

Not sure how well the approach scales to other kinds of connections; it
may work often enough to be worth it. When things get distorted between
the sender and the receiver and tcp_recvbio() fails, the data can still
be copied out of the socket as before.

Andreas

On Fri, 2012-06-29 at 17:38 +0200, Andreas Gruenbacher wrote:
> The primary use case is fast Gigabit (10 or more) Ethernet connections
> with jumbo frames and switches that support them. There, frames will go
> through unchanged and you can zero-copy receive all the time.
>
> Not sure how well the approach scales to other kinds of connections; it
> may work often enough to be worth it. When things get distorted between
> the sender and the receiver and tcp_recvbio() fails, the data can still
> be copied out of the socket as before.

If you have a packet loss, receiver can and will coalesce frames.

On Fri, 2012-06-29 at 19:15 +0200, Eric Dumazet wrote:
> On Fri, 2012-06-29 at 17:38 +0200, Andreas Gruenbacher wrote:
>
> > The primary use case is fast Gigabit (10 or more) Ethernet connections
> > with jumbo frames and switches that support them. There, frames will go
> > through unchanged and you can zero-copy receive all the time.
> >
> > Not sure how well the approach scales to other kinds of connections; it
> > may work often enough to be worth it. When things get distorted between
> > the sender and the receiver and tcp_recvbio() fails, the data can still
> > be copied out of the socket as before.
>
> If you have a packet loss, receiver can and will coalesce frames.

That's alright as long as we'll get "back to normal" eventually; the
only effect will be that we'll copy data out of the socket receive
buffers for a while. There will be extremely little packet loss on the
kinds of networks that we want to use this on.

Thanks,
Andreas

diff --git a/include/linux/socket.h b/include/linux/socket.h
index 25d6322..be166de 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -266,6 +266,7 @@ struct ucred {
 #define MSG_MORE	0x8000	/* Sender will send more */
 #define MSG_WAITFORONE	0x10000	/* recvmmsg(): block until 1+ packets avail */
 #define MSG_SENDPAGE_NOTLAST 0x20000 /* sendpage() internal : not the last page */
+#define MSG_NEW_PACKET	0x40000	/* tcp: try to put message into a new packet */
 #define MSG_EOF         MSG_FIN
 
 #define MSG_CMSG_CLOEXEC 0x40000000	/* Set close_on_exit for file
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 3ba605f..148aebe 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -854,7 +854,8 @@ static ssize_t do_tcp_sendpages(struct sock *sk, struct page **pages, int poffse
 		int size = min_t(size_t, psize, PAGE_SIZE - offset);
 		bool can_coalesce;
 
-		if (!tcp_send_head(sk) || (copy = size_goal - skb->len) <= 0) {
+		if (!tcp_send_head(sk) || (copy = size_goal - skb->len) <= 0 ||
+		    (flags & MSG_NEW_PACKET)) {
 new_segment:
 			if (!sk_stream_memory_free(sk))
 				goto wait_for_sndbuf;
@@ -1044,7 +1045,7 @@ int tcp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 				copy = max - skb->len;
 			}
 
-			if (copy <= 0) {
+			if (copy <= 0 || (flags & MSG_NEW_PACKET)) {
 new_segment:
 				/* Allocate new segment. If the interface is SG,
 				 * allocate skb fitting to single page.

The MSG_NEW_PACKET flag indicates to sendmsg / sendpage that the message or
page should be put into a new packet even when there is still room left in the
previous packet.

In the tcp protocol, messages which are not sent immediately are queued. When
more data is sent, it will be added to the last segment in that queue until
that segment is "full" whenever possible; only then is a new segment added.
Right now, there is no way to indicate when tcp should start a new segment.
The new flag allows to control that.

Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
---
 include/linux/socket.h |    1 +
 net/ipv4/tcp.c         |    5 +++--
 2 files changed, 4 insertions(+), 2 deletions(-)
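
The queueing rule described in the commit message can be illustrated with a
small user-space model. This is only a sketch under stated assumptions, not
kernel code: toy_queue, toy_send, SIZE_GOAL, MAX_SEGS and TOY_MSG_NEW_PACKET
are hypothetical names, the 1448-byte limit is an arbitrary stand-in for the
kernel's size goal, and the model ignores transmission, memory accounting and
everything else tcp_sendmsg does. It mirrors just the patched condition: start
a new segment when nothing is queued, when the last segment is full, or when
the flag is set.

```c
#include <assert.h>
#include <stddef.h>

#define SIZE_GOAL 1448              /* assumed per-segment payload limit (hypothetical) */
#define MAX_SEGS 64                 /* toy queue capacity (hypothetical) */
#define TOY_MSG_NEW_PACKET 0x40000  /* same bit value as proposed in the patch */

/* Toy stand-in for the queue of not-yet-sent segments. */
struct toy_queue {
	size_t seg_len[MAX_SEGS];   /* bytes queued in each segment */
	size_t nsegs;               /* number of segments in the queue */
};

/*
 * Queue len bytes. Data is appended to the last segment until it is
 * "full"; a new segment is started when the queue is empty, the last
 * segment has no room left, or TOY_MSG_NEW_PACKET is set in flags.
 * Returns 0 on success, -1 if the toy queue runs out of slots.
 */
int toy_send(struct toy_queue *q, size_t len, int flags)
{
	while (len > 0) {
		size_t room = 0;
		size_t chunk;

		if (q->nsegs > 0)
			room = SIZE_GOAL - q->seg_len[q->nsegs - 1];

		if (q->nsegs == 0 || room == 0 || (flags & TOY_MSG_NEW_PACKET)) {
			if (q->nsegs == MAX_SEGS)
				return -1;
			q->seg_len[q->nsegs++] = 0;  /* start a new segment */
			room = SIZE_GOAL;
		}

		chunk = len < room ? len : room;
		q->seg_len[q->nsegs - 1] += chunk;
		assert(q->seg_len[q->nsegs - 1] <= SIZE_GOAL);
		len -= chunk;
	}
	return 0;
}
```

In this model, queueing 100 and then 200 bytes without the flag leaves a single
300-byte segment, while a further 50 bytes sent with TOY_MSG_NEW_PACKET start a
second segment even though the first still has room. As the replies in the
thread point out, such sender-side boundaries are not guaranteed to survive
middleboxes or receiver-side coalescing.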