| Message ID | 1386880196.19078.95.camel@edumazet-glaptop2.roam.corp.google.com |
| --- | --- |
| State | RFC, archived |
| Delegated to | David Miller |
On Thu, Dec 12, 2013 at 12:29 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2013-12-12 at 12:19 -0800, Tom Herbert wrote:
>
>> btw, I needed to disable gro on the tun interface. With it enabled,
>> performance in TCP_RR is abysmal. Looking into that.
>
> This is a known problem.
>
> Given that Jerry is working on GRO I was not really wanting to fix
> this...
>
> That's basically a revert of
> ( http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=693019e90ca45d881109d32c0c6d29adf03f6447 )

Reverting that patch doesn't fix the problem. We are hitting the
spinlock in gro_cells_receive a lot though.

> diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
> index 42ffbc8d65c6..dc7de784e934 100644
> --- a/net/ipv4/ip_tunnel_core.c
> +++ b/net/ipv4/ip_tunnel_core.c
> @@ -111,7 +111,6 @@ int iptunnel_pull_header(struct sk_buff *skb, int hdr_len, __be16 inner_proto)
>  	skb->rxhash = 0;
>  	skb_dst_drop(skb);
>  	skb->vlan_tci = 0;
> -	skb_set_queue_mapping(skb, 0);
>  	skb->pkt_type = PACKET_HOST;
>  	return 0;
>  }
On Thu, 2013-12-12 at 12:56 -0800, Tom Herbert wrote:

> Reverting that patch doesn't fix the problem. We are hitting the
> spinlock in gro_cells_receive a lot though.

Well, something broke then.

I was able to reach line rate at the time I did this work.
On Thu, Dec 12, 2013 at 1:11 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2013-12-12 at 12:56 -0800, Tom Herbert wrote:
>
>> Reverting that patch doesn't fix the problem. We are hitting the
>> spinlock in gro_cells_receive a lot though.
>
> Well, something broke then.

The test I'm running is over a single tunnel between two hosts, so all
packets will be hitting the same RX queue since we only steer by
3-tuple hash. Clearing or not clearing the hash will have no effect.

Nevertheless, I don't see how gro_cells_receive (called via
ip_tunnel_rcv) could ever have worked right for tunnels. By the time
this function is called, we've probably already spread the packets
from the RX queue across multiple CPUs via RPS/RFS. So putting them
back on a queue based on the RX queue would be a source of contention.
It seems like we should be using another per-CPU backlog.

> I was able to reach line rate at the time I did this work.
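For context, the receive path Tom is describing looked roughly like the
sketch below, abridged from include/net/gro_cells.h of that era (details
may differ from the exact tree under discussion). The cell is selected by
the skb's recorded RX queue, so once RPS/RFS has fanned packets out across
CPUs, they all funnel back into one cell and contend on its queue lock:

struct gro_cell {
	struct sk_buff_head	napi_skbs;
	struct napi_struct	napi;
} ____cacheline_aligned_in_smp;

static inline void gro_cells_receive(struct gro_cells *gcells,
				     struct sk_buff *skb)
{
	struct gro_cell *cell = gcells->cells;
	struct net_device *dev = skb->dev;

	/* Fall back to the normal path if GRO is off or the skb is shared */
	if (!cell || skb_cloned(skb) || !(dev->features & NETIF_F_GRO)) {
		netif_rx(skb);
		return;
	}

	/* Cell choice is keyed on the recorded RX queue, not the current CPU */
	if (skb_rx_queue_recorded(skb))
		cell += skb_get_rx_queue(skb) & gcells->gro_cells_mask;

	if (skb_queue_len(&cell->napi_skbs) > netdev_max_backlog) {
		atomic_long_inc(&dev->rx_dropped);
		kfree_skb(skb);
		return;
	}

	/* Every CPU that RPS/RFS picked for this RX queue meets here */
	spin_lock(&cell->napi_skbs.lock);

	__skb_queue_tail(&cell->napi_skbs, skb);
	if (skb_queue_len(&cell->napi_skbs) == 1)
		napi_schedule(&cell->napi);

	spin_unlock(&cell->napi_skbs.lock);
}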
On Thu, 2013-12-12 at 14:54 -0800, Tom Herbert wrote:
>
> The test I'm running is over a single tunnel between two hosts, so all
> packets will be hitting the same RX queue since we only steer by
> 3-tuple hash. Clearing or not clearing the hash will have no effect.

If a single RX queue is used, how can you have contention on a
spinlock? A single CPU should be used.

>
> Nevertheless, I don't see how gro_cells_receive (called via
> ip_tunnel_rcv) could ever have worked right for tunnels. By the time
> this function is called, we've probably already spread the packets
> from the RX queue across multiple CPUs via RPS/RFS. So putting them
> back on a queue based on the RX queue would be a source of contention.
> It seems like we should be using another per-CPU backlog.

This is explained in the changelog of commit
c9e6bc644e557338221e75c242ab12c275a67d1b, where I spelled all of this
out.

The NAPI _is_ the backlog. But you need to avoid crossing CPUs,
obviously.

If you spread the load on multiple CPUs, you need to instruct the
gro_cell layer to use a proper key to spread the load on different NAPI
cells.

At the time I did this work, I stated that the best performance would be
reached without playing with RPS and these awful IPIs. Scheduler guys
have a hard time with RPS/RFS.

This is all moot, as Jerry is working to do the GRO without this extra
layer.

And RPS being done _after_ GRO, if you receive traffic on a single RX
queue, you are stuck. It's actually easy to play on outer IP headers to
get proper hash dispersion.
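To illustrate Eric's point about the key, a hypothetical variant (a
sketch, not a proposed patch) could pick the cell by flow hash rather
than by recorded RX queue, so flows that RPS/RFS spread across CPUs also
land on different NAPI cells. skb_get_rxhash() here is the pre-3.14
helper that computes skb->rxhash on demand:

static inline struct gro_cell *gro_cell_pick(struct gro_cells *gcells,
					     struct sk_buff *skb)
{
	/* Key by flow hash so flows spread over CPUs also spread over
	 * cells; gro_cells_mask sizes the cell array to a power of two.
	 */
	return gcells->cells +
	       (skb_get_rxhash(skb) & gcells->gro_cells_mask);
}

Eric's caveat still applies: unless the chosen cell's NAPI then runs on
the same CPU that queued the skb, this only moves the cross-CPU traffic
around rather than eliminating it.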
diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
index 42ffbc8d65c6..dc7de784e934 100644
--- a/net/ipv4/ip_tunnel_core.c
+++ b/net/ipv4/ip_tunnel_core.c
@@ -111,7 +111,6 @@ int iptunnel_pull_header(struct sk_buff *skb, int hdr_len, __be16 inner_proto)
 	skb->rxhash = 0;
 	skb_dst_drop(skb);
 	skb->vlan_tci = 0;
-	skb_set_queue_mapping(skb, 0);
 	skb->pkt_type = PACKET_HOST;
 	return 0;
 }
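For comparison, Tom's per-CPU backlog suggestion, sketched in gro_cells
terms: one cell per CPU selected with this_cpu_ptr(), queued from BH
context on whichever CPU RPS/RFS already chose, so no lock is contended
across CPUs. This is an illustration under assumed names (the __percpu
cells array and gro_cells_receive_percpu are hypothetical), not the
patch that came out of this thread:

static inline void gro_cells_receive_percpu(struct gro_cells *gcells,
					    struct sk_buff *skb)
{
	/* One gro_cell per CPU; gcells->cells is assumed __percpu here */
	struct gro_cell *cell = this_cpu_ptr(gcells->cells);

	if (skb_queue_len(&cell->napi_skbs) > netdev_max_backlog) {
		atomic_long_inc(&skb->dev->rx_dropped);
		kfree_skb(skb);
		return;
	}

	/* Producer and NAPI poller both run in BH on this CPU,
	 * so the unlocked __skb_queue_tail() is safe.
	 */
	__skb_queue_tail(&cell->napi_skbs, skb);
	if (skb_queue_len(&cell->napi_skbs) == 1)
		napi_schedule(&cell->napi);
}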