Message ID: 1375818663-12318-1-git-send-email-sbohrer@rgmadvisors.com
State: Rejected, archived
Delegated to: David Miller
On Tue, 2013-08-06 at 14:51 -0500, Shawn Bohrer wrote:
> Set the napi id for each socket in the multicast path to enable
> low-latency/polling support.
>
> Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
> ---
> v2 include ipv6 support

This might help your workload, but I doubt it is generic enough.

One UDP socket is supposed to receive traffic from many endpoints,
so we have no guarantee all the received traffic will end up on a
single RX queue on the NIC.

That's the same logic as RFS here.

The sk_mark_napi_id() calls in UDP are wrong IMHO.

It should be guarded by the following test in
__udp_queue_rcv_skb():

	if (inet_sk(sk)->inet_daddr) {
		sock_rps_save_rxhash(sk, skb);
		sk_mark_napi_id(sk, skb);
	}

(This should occur only for connected UDP sockets, where we are 100%
sure all packets will use this same rxhash/RX queue.)

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On 07/08/2013 23:22, Eric Dumazet wrote:
> On Tue, 2013-08-06 at 14:51 -0500, Shawn Bohrer wrote:
>> Set the napi id for each socket in the multicast path to enable
>> low-latency/polling support.
>>
>> Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
>> ---
>> v2 include ipv6 support
>
> This might help your workload, but I doubt it is generic enough.
>
> One UDP socket is supposed to receive traffic from many endpoints,
> so we have no guarantee all the received traffic will end up on a
> single RX queue on the NIC.
>
> That's the same logic as RFS here.
>
> The sk_mark_napi_id() calls in UDP are wrong IMHO.
>
> It should be guarded by the following test in
> __udp_queue_rcv_skb():
>
> 	if (inet_sk(sk)->inet_daddr) {
> 		sock_rps_save_rxhash(sk, skb);
> 		sk_mark_napi_id(sk, skb);
> 	}
>
> (This should occur only for connected UDP sockets, where we are 100%
> sure all packets will use this same rxhash/RX queue.)

This would also be safe if there is only one NIC and said NIC was
programmed to always place this socket's data on the same queue.

I don't have a good suggestion on how to detect this.
On Thu, 2013-08-08 at 11:46 +0300, Eliezer Tamir wrote:
> On 07/08/2013 23:22, Eric Dumazet wrote:
> > On Tue, 2013-08-06 at 14:51 -0500, Shawn Bohrer wrote:
> >> Set the napi id for each socket in the multicast path to enable
> >> low-latency/polling support.
> >>
> >> Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
> >> ---
> >> v2 include ipv6 support
> >
> > This might help your workload, but I doubt it is generic enough.
> >
> > One UDP socket is supposed to receive traffic from many endpoints,
> > so we have no guarantee all the received traffic will end up on a
> > single RX queue on the NIC.
> >
> > That's the same logic as RFS here.
> >
> > The sk_mark_napi_id() calls in UDP are wrong IMHO.
> >
> > It should be guarded by the following test in
> > __udp_queue_rcv_skb():
> >
> > 	if (inet_sk(sk)->inet_daddr) {
> > 		sock_rps_save_rxhash(sk, skb);
> > 		sk_mark_napi_id(sk, skb);
> > 	}
> >
> > (This should occur only for connected UDP sockets, where we are 100%
> > sure all packets will use this same rxhash/RX queue.)
>
> This would also be safe if there is only one NIC and said NIC was
> programmed to always place this socket's data on the same queue.
>
> I don't have a good suggestion on how to detect this.

Well, this stuff relies on flows being correctly steered.

The TCP stack performs much better if we avoid reorders ;)
On 09/08/2013 02:55, Eric Dumazet wrote:
> On Thu, 2013-08-08 at 11:46 +0300, Eliezer Tamir wrote:
>> On 07/08/2013 23:22, Eric Dumazet wrote:
>>> The sk_mark_napi_id() calls in UDP are wrong IMHO.
>>>
>>> It should be guarded by the following test in
>>> __udp_queue_rcv_skb():
>>>
>>> 	if (inet_sk(sk)->inet_daddr) {
>>> 		sock_rps_save_rxhash(sk, skb);
>>> 		sk_mark_napi_id(sk, skb);
>>> 	}
>>>
>>> (This should occur only for connected UDP sockets, where we are 100%
>>> sure all packets will use this same rxhash/RX queue.)
>>
>> This would also be safe if there is only one NIC and said NIC was
>> programmed to always place this socket's data on the same queue.
>>
>> I don't have a good suggestion on how to detect this.
>
> Well, this stuff relies on flows being correctly steered.
>
> The TCP stack performs much better if we avoid reorders ;)

Let's limit the discussion to UDP for now. If you are getting packets
on multiple queues for a TCP socket, it's hard to see how you can
avoid causing reordering.

Maybe it would be enough to have a socket option that the user can set
to tell the stack "I know what I'm doing, please allow busy polling on
this UDP socket, even though it's not bound."
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 766e6ba..0d0da17 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1596,6 +1596,7 @@ static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 	dif = skb->dev->ifindex;
 	sk = udp_v4_mcast_next(net, sk, uh->dest, daddr, uh->source, saddr, dif);
 	while (sk) {
+		sk_mark_napi_id(sk, skb);
 		stack[count++] = sk;
 		sk = udp_v4_mcast_next(net, sk_nulls_next(sk), uh->dest,
 				       daddr, uh->source, saddr, dif);
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index f405815..82be372 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -756,6 +756,7 @@ static int __udp6_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 	dif = inet6_iif(skb);
 	sk = udp_v6_mcast_next(net, sk, uh->dest, daddr, uh->source, saddr, dif);
 	while (sk) {
+		sk_mark_napi_id(sk, skb);
 		stack[count++] = sk;
 		sk = udp_v6_mcast_next(net, sk_nulls_next(sk), uh->dest,
 				       daddr, uh->source, saddr, dif);
Set the napi id for each socket in the multicast path to enable
low-latency/polling support.

Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
---
v2 include ipv6 support

 net/ipv4/udp.c | 1 +
 net/ipv6/udp.c | 1 +
 2 files changed, 2 insertions(+), 0 deletions(-)