
net: Cache route in IP tunnels

Message ID 1386880196.19078.95.camel@edumazet-glaptop2.roam.corp.google.com
State RFC, archived
Delegated to: David Miller

Commit Message

Eric Dumazet Dec. 12, 2013, 8:29 p.m. UTC
On Thu, 2013-12-12 at 12:19 -0800, Tom Herbert wrote:

> btw, I needed to disable gro on the tun interface. With it enabled,
> performance in TCP_RR is abysmal. Looking into that.

This is a known problem.

Given that Jerry is working on GRO, I was not really planning to fix
this...

That's basically a revert of
( http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=693019e90ca45d881109d32c0c6d29adf03f6447 ) 




Comments

Tom Herbert Dec. 12, 2013, 8:56 p.m. UTC | #1
On Thu, Dec 12, 2013 at 12:29 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2013-12-12 at 12:19 -0800, Tom Herbert wrote:
>
>> btw, I needed to disable gro on the tun interface. With it enabled,
>> performance in TCP_RR is abysmal. Looking into that.
>
> This is a known problem.
>
> Given that Jerry is working on GRO, I was not really planning to fix
> this...
>
> That's basically a revert of
> ( http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=693019e90ca45d881109d32c0c6d29adf03f6447 )

Reverting that patch doesn't fix the problem. We are hitting the
spinlock in gro_cells_receive a lot though.


> diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
> index 42ffbc8d65c6..dc7de784e934 100644
> --- a/net/ipv4/ip_tunnel_core.c
> +++ b/net/ipv4/ip_tunnel_core.c
> @@ -111,7 +111,6 @@ int iptunnel_pull_header(struct sk_buff *skb, int hdr_len, __be16 inner_proto)
>                 skb->rxhash = 0;
>         skb_dst_drop(skb);
>         skb->vlan_tci = 0;
> -       skb_set_queue_mapping(skb, 0);
>         skb->pkt_type = PACKET_HOST;
>         return 0;
>  }
>
>
Eric Dumazet Dec. 12, 2013, 9:11 p.m. UTC | #2
On Thu, 2013-12-12 at 12:56 -0800, Tom Herbert wrote:

> Reverting that patch doesn't fix the problem. We are hitting the
> spinlock in gro_cells_receive a lot though.

Well, something broke then.

I was able to reach line rate at the time I did this work.



Tom Herbert Dec. 12, 2013, 10:54 p.m. UTC | #3
On Thu, Dec 12, 2013 at 1:11 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2013-12-12 at 12:56 -0800, Tom Herbert wrote:
>
>> Reverting that patch doesn't fix the problem. We are hitting the
>> spinlock in gro_cells_receive a lot though.
>
> Well, something broke then.
>

The test I'm running is over a single tunnel between two hosts, so all
packets will be hitting the same RX queue since we only steer by
3-tuple hash. Clearing or not clearing the hash will have no effect.

Nevertheless, I don't see how gro_cells_receive (called via
ip_tunnel_rcv) could ever have worked right for tunnels. By the time
this function is called, we've probably already spread the packets
from the RX queue across multiple CPUs via RPS/RFS. So putting them
back on a queue selected by RX queue number would be a source of
contention. It seems like we should be using another per-CPU backlog.
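
For context, here is a simplified sketch of the enqueue path being
discussed. It is not the exact kernel code; the struct and field names
roughly follow include/net/gro_cells.h of that era, and the details are
paraphrased:

#include <linux/skbuff.h>
#include <linux/netdevice.h>

struct gro_cell {
	struct sk_buff_head	napi_skbs;
	struct napi_struct	napi;
};

struct gro_cells {
	unsigned int		gro_cells_mask;
	struct gro_cell		*cells;
};

/* Simplified sketch: the cell is picked from the *recorded RX queue*,
 * so once RPS/RFS has already spread packets from one RX queue across
 * several CPUs, all of those CPUs funnel back into the same cell and
 * contend on its queue lock.
 */
static void gro_cells_receive_sketch(struct gro_cells *gcells,
				     struct sk_buff *skb)
{
	struct gro_cell *cell = gcells->cells;

	if (skb_rx_queue_recorded(skb))
		cell += skb_get_rx_queue(skb) & gcells->gro_cells_mask;

	spin_lock(&cell->napi_skbs.lock);	/* the contended lock */
	__skb_queue_tail(&cell->napi_skbs, skb);
	if (skb_queue_len(&cell->napi_skbs) == 1)
		napi_schedule(&cell->napi);	/* NAPI later runs GRO on this queue */
	spin_unlock(&cell->napi_skbs.lock);
}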

> I was able to reach line rate at the time I did this work.
>
>
>
Eric Dumazet Dec. 12, 2013, 11:24 p.m. UTC | #4
On Thu, 2013-12-12 at 14:54 -0800, Tom Herbert wrote:

> 
> The test I'm running is over a single tunnel between two hosts, so all
> packets will be hitting the same RX queue since we only steer by
> 3-tuple hash. Clearing or not clearing the hash will have no effect.

If a single RX queue is used, how can you have contention on a
spinlock? A single CPU should be used.


> 
> Nevertheless, I don't see how gro_cells_receive (called via
> ip_tunnel_rcv) could ever have worked right for tunnels. By the time
> this function is called, we've probably already spread the packets
> from the RX queue across multiple CPUs via RPS/RFS. So putting them
> back on a queue selected by RX queue number would be a source of
> contention. It seems like we should be using another per-CPU backlog.

This is explained in the c9e6bc644e557338221e75c242ab12c275a67d1b changelog.

I explicitly explained all this.

The NAPI _is_ the backlog. But you need to avoid crossing CPUs,
obviously.

If you spread the load across multiple CPUs, you need to instruct the
gro_cell layer to use a proper key to spread the load across different
NAPI cells.
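
A minimal sketch of what such a key could look like, reusing the
simplified structures from the earlier sketch (this is hypothetical, not
an actual patch):

/* Hypothetical variation on the cell selection above: key the cell on a
 * flow hash such as skb->rxhash instead of the recorded RX queue, so
 * traffic that RPS/RFS has spread across CPUs also spreads across NAPI
 * cells instead of colliding on a single cell's lock.
 */
static struct gro_cell *gro_cell_pick_by_hash(struct gro_cells *gcells,
					      struct sk_buff *skb)
{
	u32 key = skb->rxhash;

	if (!key && skb_rx_queue_recorded(skb))
		key = skb_get_rx_queue(skb);

	return gcells->cells + (key & gcells->gro_cells_mask);
}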

At the time I did this work, I stated that the best performance would be
reached without playing with RPS and those awful IPIs. Scheduler guys
have a hard time with RPS/RFS.

This is all moot, as Jerry is working to do the GRO without having this
extra layer.

And with RPS being done _after_ GRO, if you receive traffic on a single RX
queue, you are stuck. It's actually easy to play with the outer IP headers
to get proper hash dispersion.
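
As an illustration of the kind of thing that could be done on the sending
side (hypothetical; the helper, pool and sizing below are made up and are
not what any tunnel driver actually did):

#include <linux/types.h>

/* Hypothetical illustration only: pick the outer source address from a
 * small locally configured pool, indexed by the inner flow hash, so the
 * receiver's RSS hash over the outer addresses spreads a single tunnel's
 * traffic across RX queues.
 */
#define OUTER_SADDR_POOL_SIZE	4

static __be32 tunnel_pick_outer_saddr(const __be32 pool[OUTER_SADDR_POOL_SIZE],
				      u32 inner_hash)
{
	return pool[inner_hash & (OUTER_SADDR_POOL_SIZE - 1)];
}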




Patch

diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
index 42ffbc8d65c6..dc7de784e934 100644
--- a/net/ipv4/ip_tunnel_core.c
+++ b/net/ipv4/ip_tunnel_core.c
@@ -111,7 +111,6 @@  int iptunnel_pull_header(struct sk_buff *skb, int hdr_len, __be16 inner_proto)
 		skb->rxhash = 0;
 	skb_dst_drop(skb);
 	skb->vlan_tci = 0;
-	skb_set_queue_mapping(skb, 0);
 	skb->pkt_type = PACKET_HOST;
 	return 0;
 }