[net] ipv4: set transport header earlier

Message ID 1373943799.10804.96.camel@edumazet-glaptop
State Accepted, archived
Delegated to: David Miller

Commit Message

Eric Dumazet July 16, 2013, 3:03 a.m. UTC
From: Eric Dumazet <edumazet@google.com>

commit 45f00f99d6e ("ipv4: tcp: clean up tcp_v4_early_demux()") introduced a
performance regression for non-GRO traffic, effectively disabling
IP early demux.

The IPv6 stack resets the transport header in ip6_rcv() before calling
IP early demux in ip6_rcv_finish(), while IPv4 does this only in
ip_local_deliver_finish(), _after_ IP early demux.

GRO traffic happened to keep IP early demux working because the transport
header is also set in inet_gro_receive().

Instead of reverting the faulty commit, we can make IPv4 and IPv6 behave the
same: transport_header should be set in ip_rcv() instead of
ip_local_deliver_finish().

ip_local_deliver_finish() can also use skb_network_header_len(), which is
faster than ip_hdrlen().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
---
 net/ipv4/ip_input.c |    7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Sergei Shtylyov July 16, 2013, 1:24 p.m. UTC | #1
Hello.

On 16-07-2013 7:03, Eric Dumazet wrote:

> From: Eric Dumazet <edumazet@google.com>

> commit 45f00f99d6e ("ipv4: tcp: clean up tcp_v4_early_demux()") added a
> performance regression for non GRO traffic, basically disabling
> IP early demux.

> IPv6 stack resets transport header in ip6_rcv() before calling
> IP early demux in ip6_rcv_finish(), while IPv4 does this only in
> ip_local_deliver_finish(), _after_ IP early demux.

> GRO traffic happened to enable IP early demux because transport header
> is also set in inet_gro_receive()

> Instead of reverting the faulty commit, we can make IPv4/IPv6 behave the
> same : transport_header should be set in ip_rcv() instead of
> ip_local_deliver_finish()

> ip_local_deliver_finish() can also use skb_network_header_len() which is
> faster than ip_hdrlen()

> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Neal Cardwell <ncardwell@google.com>
> Cc: Tom Herbert <therbert@google.com>
> ---
>   net/ipv4/ip_input.c |    7 +++----
>   1 file changed, 3 insertions(+), 4 deletions(-)

> diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
> index 3da817b..15e3e68 100644
> --- a/net/ipv4/ip_input.c
> +++ b/net/ipv4/ip_input.c
[...]
> @@ -437,6 +434,8 @@ int ip_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt,
>   		goto drop;
>   	}
>
> +	skb->transport_header = skb->network_header + iph->ihl*4;

    Spaces around * wouldn't hurt, for consistency with the rest of the
statement and with Linux coding style in general.

WBR, Sergei

Eric Dumazet July 16, 2013, 1:56 p.m. UTC | #2
On Tue, 2013-07-16 at 17:24 +0400, Sergei Shtylyov wrote:
> > +	skb->transport_header = skb->network_header + iph->ihl*4;
> 
>     Spaces around * wouldn't hurt, to be consistent with the rest of the 
> statement and the Linux style in common.

I am well aware of this; I chose the convention already used in this
function and file.

# grep ihl net/ipv4/ip_input.c
	opt->optlen = iph->ihl*4 - sizeof(struct iphdr);
	if (iph->ihl > 5 && ip_rcv_options(skb))
	if (iph->ihl < 5 || iph->version != 4)
	if (!pskb_may_pull(skb, iph->ihl*4))
	if (unlikely(ip_fast_csum((u8 *)iph, iph->ihl)))
	} else if (len < (iph->ihl*4))

Thanks

David Miller July 16, 2013, 8 p.m. UTC | #3
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 15 Jul 2013 20:03:19 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> commit 45f00f99d6e ("ipv4: tcp: clean up tcp_v4_early_demux()") added a
> performance regression for non GRO traffic, basically disabling
> IP early demux.
> 
> IPv6 stack resets transport header in ip6_rcv() before calling
> IP early demux in ip6_rcv_finish(), while IPv4 does this only in
> ip_local_deliver_finish(), _after_ IP early demux.
> 
> GRO traffic happened to enable IP early demux because transport header
> is also set in inet_gro_receive()
> 
> Instead of reverting the faulty commit, we can make IPv4/IPv6 behave the
> same : transport_header should be set in ip_rcv() instead of
> ip_local_deliver_finish()
> 
> ip_local_deliver_finish() can also use skb_network_header_len() which is
> faster than ip_hdrlen()
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied and queued up for -stable, thanks Eric.

Patch

diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
index 3da817b..15e3e68 100644
--- a/net/ipv4/ip_input.c
+++ b/net/ipv4/ip_input.c
@@ -190,10 +190,7 @@  static int ip_local_deliver_finish(struct sk_buff *skb)
 {
 	struct net *net = dev_net(skb->dev);
 
-	__skb_pull(skb, ip_hdrlen(skb));
-
-	/* Point into the IP datagram, just past the header. */
-	skb_reset_transport_header(skb);
+	__skb_pull(skb, skb_network_header_len(skb));
 
 	rcu_read_lock();
 	{
@@ -437,6 +434,8 @@  int ip_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt,
 		goto drop;
 	}
 
+	skb->transport_header = skb->network_header + iph->ihl*4;
+
 	/* Remove any debris in the socket control block */
 	memset(IPCB(skb), 0, sizeof(struct inet_skb_parm));
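
For readers who want to check the arithmetic, here is a minimal userspace sketch of the two hunks above. It is not kernel code: struct toy_skb and the toy_* helpers are hypothetical stand-ins, and header positions are modeled as plain byte offsets into a buffer. It only illustrates that, once the transport header offset has been set from the IHL field as in ip_rcv(), the difference between the two offsets equals the length that ip_hdrlen() would read from the packet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the sk_buff fields the patch touches.
 * Header positions are modeled as byte offsets from the start of the buffer. */
struct toy_skb {
	const uint8_t *head;      /* start of the packet buffer */
	size_t network_header;    /* offset of the IPv4 header */
	size_t transport_header;  /* offset of the transport (TCP/UDP) header */
};

/* Models ip_hdrlen(): reads the IHL nibble from the packet data itself.
 * IHL is the low 4 bits of the first IPv4 header byte, in 32-bit words. */
static size_t toy_ip_hdrlen(const struct toy_skb *skb)
{
	uint8_t ihl = skb->head[skb->network_header] & 0x0f;

	return (size_t)ihl * 4;
}

/* Models skb_network_header_len(): offset arithmetic only, no packet read. */
static size_t toy_network_header_len(const struct toy_skb *skb)
{
	return skb->transport_header - skb->network_header;
}

/* Models the line added to ip_rcv():
 *	skb->transport_header = skb->network_header + iph->ihl*4;
 */
static void toy_set_transport_header(struct toy_skb *skb)
{
	uint8_t ihl = skb->head[skb->network_header] & 0x0f;

	assert(ihl >= 5);	/* ip_rcv() has already rejected ihl < 5 */
	skb->transport_header = skb->network_header + (size_t)ihl * 4;
}
```

With a buffer whose IPv4 header starts at offset 14 and whose first header byte is 0x46 (version 4, IHL 6), toy_set_transport_header() places the transport header at offset 38, and both length helpers return 24. This equivalence is what allows ip_local_deliver_finish() to call skb_network_header_len() in place of ip_hdrlen().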