Message ID | 1340739826-3363-1-git-send-email-subramanian.vijay@gmail.com |
---|---|
State | Rejected, archived |
Delegated to: | David Miller |
Headers | show |
On Tue, 2012-06-26 at 12:43 -0700, Vijay Subramanian wrote: > Based on the recent TCP socket early demux code, this patch provides similar > support for UDP. > > Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com> > --- > This has been tested on x86 with UDP iperf flows and seemed to work. If this is > accepted, I plan to submit one more patch moving common code from TCP and UDP > early demux code into common helper functions. > Thanks in advance for feedback. Hmm... I cant see how it can work. Have you tested a router can still route DNS packets if you also have a DNS server on it, listening on 0.0.0.0:53 ? Most UDP applications are using unconnected sockets. (In fact few programmers are aware UDP sockets can be connected) Try to bench how this is going to work with random IP sources, instead of UDP iperf flows using a single IP ? And what about multicast ? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
You can't do this. If the UDP socket has wildcards, that means the source address of the route will not be validated. This means we will start accepting spoofed packets. It also means the route you are caching is going to be the wrong route since the keys are variable. You can only do an early demux where all the keys are fully specified and there are no wildcards. That why for TCP we only early demux for established sockets. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Thanks for the reviews and feedback, Dave and Eric. They were very helpful. On 26 June 2012 14:34, David Miller <davem@davemloft.net> wrote: > > You can't do this. > > If the UDP socket has wildcards, that means the source address of the > route will not be validated. This means we will start accepting > spoofed packets. It also means the route you are caching is going > to be the wrong route since the keys are variable. Thanks for this explanation. I see why Eric wanted me to test with DNS server bound to wildcard address. I guess the same issue exists with multicast too which I had not even considered. > > You can only do an early demux where all the keys are fully specified > and there are no wildcards. That why for TCP we only early demux for > established sockets. I guess for UDP, early demux will work only if server binds to a specific address and port instead of wildcard. But I believe a lot of UDP servers use wildcard addresses, so use of early demux for UDP may be limited. Thanks, Vijay -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jun 27, 2012 at 5:34 AM, David Miller <davem@davemloft.net> wrote: > > You can't do this. > > If the UDP socket has wildcards, that means the source address of the > route will not be validated. This means we will start accepting > spoofed packets. It also means the route you are caching is going > to be the wrong route since the keys are variable. > > You can only do an early demux where all the keys are fully specified > and there are no wildcards. That why for TCP we only early demux for > established sockets. How about implementing the whole early demux infrastructure on iptables/netfilter like rpfilter and TPROXY, then users have more controls.
From: Changli Gao <xiaosuo@gmail.com> Date: Wed, 27 Jun 2012 10:29:19 +0800 > How about implementing the whole early demux infrastructure on > iptables/netfilter like rpfilter and TPROXY, then users have more > controls. Sorry, no. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/include/net/udp.h b/include/net/udp.h index 065f379..e0ed11d 100644 --- a/include/net/udp.h +++ b/include/net/udp.h @@ -175,6 +175,7 @@ extern int udp_lib_get_port(struct sock *sk, unsigned short snum, unsigned int hash2_nulladdr); /* net/ipv4/udp.c */ +extern int udp_v4_early_demux(struct sk_buff *skb); extern int udp_get_port(struct sock *sk, unsigned short snum, int (*saddr_cmp)(const struct sock *, const struct sock *)); diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index 07a02f6..c7d40db 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -1531,6 +1531,7 @@ static const struct net_protocol tcp_protocol = { }; static const struct net_protocol udp_protocol = { + .early_demux = udp_v4_early_demux, .handler = udp_rcv, .err_handler = udp_err, .gso_send_check = udp4_ufo_send_check, diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index db017ef..bd85cd8 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -520,10 +520,10 @@ static inline struct sock *__udp4_lib_lookup_skb(struct sk_buff *skb, __be16 sport, __be16 dport, struct udp_table *udptable) { - struct sock *sk; + struct sock *sk = skb_steal_sock(skb); const struct iphdr *iph = ip_hdr(skb); - if (unlikely(sk = skb_steal_sock(skb))) + if (sk) return sk; else return __udp4_lib_lookup(dev_net(skb_dst(skb)->dev), iph->saddr, sport, @@ -1403,6 +1403,19 @@ int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb) int rc; int is_udplite = IS_UDPLITE(sk); + if (sk->sk_rx_dst) { + struct dst_entry *dst = sk->sk_rx_dst; + if (unlikely(dst->obsolete)) { + if (dst->ops->check(dst, 0) == NULL) { + dst_release(dst); + sk->sk_rx_dst = NULL; + } + } + } + + if (unlikely(sk->sk_rx_dst == NULL)) + sk->sk_rx_dst = dst_clone(skb_dst(skb)); + /* * Charge it to the socket, dropping if the queue is full. */ @@ -1622,6 +1635,56 @@ static inline int udp4_csum_init(struct sk_buff *skb, struct udphdr *uh, return 0; } +int udp_v4_early_demux(struct sk_buff *skb) +{ + struct net *net = dev_net(skb->dev); + const struct iphdr *iph; + const struct udphdr *uh; + struct net_device *dev; + struct sock *sk; + unsigned short ulen; + int err; + + err = -ENOENT; + if (skb->pkt_type != PACKET_HOST) + goto out_err; + + if (!pskb_may_pull(skb, ip_hdrlen(skb) + sizeof(struct udphdr))) + goto out_err; + + iph = ip_hdr(skb); + uh = (struct udphdr *)((char *)iph + ip_hdrlen(skb)); + ulen = ntohs(uh->len); + + if (ulen > skb->len) + goto out_err; + + dev = skb->dev; + + sk = udp4_lib_lookup(net, iph->saddr, uh->source, iph->daddr, + uh->dest, dev->ifindex); + + if (sk) { + struct dst_entry *dst = sk->sk_rx_dst; + skb->sk = sk; + skb->destructor = sock_edemux; + + if (dst) + dst = dst_check(dst, 0); + if (dst) { + struct rtable *rt = (struct rtable *)dst; + + if (rt->rt_iif == dev->ifindex) { + skb_dst_set_noref(skb, dst); + err = 0; + } + } + } + +out_err: + return err; +} + /* * All we need to do is get the socket, and then do a checksum. */
Based on the recent TCP socket early demux code, this patch provides similar support for UDP. Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com> --- This has been tested on x86 with UDP iperf flows and seemed to work. If this is accepted, I plan to submit one more patch moving common code from TCP and UDP early demux code into common helper functions. Thanks in advance for feedback. include/net/udp.h | 1 + net/ipv4/af_inet.c | 1 + net/ipv4/udp.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++- 3 files changed, 67 insertions(+), 2 deletions(-)