diff mbox

[net-next] udp: Add socket early demux support

Message ID 1340739826-3363-1-git-send-email-subramanian.vijay@gmail.com
State Rejected, archived
Delegated to: David Miller
Headers show

Commit Message

Vijay Subramanian June 26, 2012, 7:43 p.m. UTC
Based on the recent TCP socket early demux code, this patch provides similar
support for UDP.

Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
---
This has been tested on x86 with UDP iperf flows and seemed to work. If this is
accepted, I plan to submit one more patch moving common code from TCP and UDP
early demux code into common helper functions.
Thanks in advance for feedback.

 include/net/udp.h  |    1 +
 net/ipv4/af_inet.c |    1 +
 net/ipv4/udp.c     |   67 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 67 insertions(+), 2 deletions(-)

Comments

Eric Dumazet June 26, 2012, 8:49 p.m. UTC | #1
On Tue, 2012-06-26 at 12:43 -0700, Vijay Subramanian wrote:
> Based on the recent TCP socket early demux code, this patch provides similar
> support for UDP.
> 
> Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
> ---
> This has been tested on x86 with UDP iperf flows and seemed to work. If this is
> accepted, I plan to submit one more patch moving common code from TCP and UDP
> early demux code into common helper functions.
> Thanks in advance for feedback.

Hmm... I cant see how it can work.

Have you tested a router can still route DNS packets if you also have a
DNS server on it, listening on 0.0.0.0:53 ?

Most UDP applications are using unconnected sockets.
(In fact few programmers are aware UDP sockets can be connected)

Try to bench how this is going to work with random IP sources, instead
of UDP iperf flows using a single IP ?

And what about multicast ?



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller June 26, 2012, 9:34 p.m. UTC | #2
You can't do this.

If the UDP socket has wildcards, that means the source address of the
route will not be validated.  This means we will start accepting
spoofed packets.  It also means the route you are caching is going
to be the wrong route since the keys are variable.

You can only do an early demux where all the keys are fully specified
and there are no wildcards.  That why for TCP we only early demux for
established sockets.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vijay Subramanian June 27, 2012, 12:06 a.m. UTC | #3
Thanks for the reviews and feedback, Dave and Eric. They were very helpful.

On 26 June 2012 14:34, David Miller <davem@davemloft.net> wrote:
>
> You can't do this.
>
> If the UDP socket has wildcards, that means the source address of the
> route will not be validated.  This means we will start accepting
> spoofed packets.  It also means the route you are caching is going
> to be the wrong route since the keys are variable.

Thanks for this explanation.
I see why Eric wanted me to test with DNS server bound to wildcard
address. I guess the same issue exists with multicast too which I had
not even considered.

>
> You can only do an early demux where all the keys are fully specified
> and there are no wildcards.  That why for TCP we only early demux for
> established sockets.

I guess for UDP, early demux will work only if server binds to a
specific address and port instead of wildcard. But I believe
a lot of UDP servers use wildcard addresses, so use of early demux for
UDP may be limited.

Thanks,
Vijay
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Changli Gao June 27, 2012, 2:29 a.m. UTC | #4
On Wed, Jun 27, 2012 at 5:34 AM, David Miller <davem@davemloft.net> wrote:
>
> You can't do this.
>
> If the UDP socket has wildcards, that means the source address of the
> route will not be validated.  This means we will start accepting
> spoofed packets.  It also means the route you are caching is going
> to be the wrong route since the keys are variable.
>
> You can only do an early demux where all the keys are fully specified
> and there are no wildcards.  That why for TCP we only early demux for
> established sockets.

How about implementing the whole early demux infrastructure on
iptables/netfilter like rpfilter and TPROXY, then users have more
controls.
David Miller June 27, 2012, 3:59 a.m. UTC | #5
From: Changli Gao <xiaosuo@gmail.com>
Date: Wed, 27 Jun 2012 10:29:19 +0800

> How about implementing the whole early demux infrastructure on
> iptables/netfilter like rpfilter and TPROXY, then users have more
> controls.

Sorry, no.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/net/udp.h b/include/net/udp.h
index 065f379..e0ed11d 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -175,6 +175,7 @@  extern int udp_lib_get_port(struct sock *sk, unsigned short snum,
 			    unsigned int hash2_nulladdr);
 
 /* net/ipv4/udp.c */
+extern int udp_v4_early_demux(struct sk_buff *skb);
 extern int udp_get_port(struct sock *sk, unsigned short snum,
 			int (*saddr_cmp)(const struct sock *,
 					 const struct sock *));
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 07a02f6..c7d40db 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1531,6 +1531,7 @@  static const struct net_protocol tcp_protocol = {
 };
 
 static const struct net_protocol udp_protocol = {
+	.early_demux =	udp_v4_early_demux,
 	.handler =	udp_rcv,
 	.err_handler =	udp_err,
 	.gso_send_check = udp4_ufo_send_check,
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index db017ef..bd85cd8 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -520,10 +520,10 @@  static inline struct sock *__udp4_lib_lookup_skb(struct sk_buff *skb,
 						 __be16 sport, __be16 dport,
 						 struct udp_table *udptable)
 {
-	struct sock *sk;
+	struct sock *sk = skb_steal_sock(skb);
 	const struct iphdr *iph = ip_hdr(skb);
 
-	if (unlikely(sk = skb_steal_sock(skb)))
+	if (sk)
 		return sk;
 	else
 		return __udp4_lib_lookup(dev_net(skb_dst(skb)->dev), iph->saddr, sport,
@@ -1403,6 +1403,19 @@  int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 	int rc;
 	int is_udplite = IS_UDPLITE(sk);
 
+	if (sk->sk_rx_dst) {
+		struct dst_entry *dst = sk->sk_rx_dst;
+		if (unlikely(dst->obsolete)) {
+			if (dst->ops->check(dst, 0) == NULL) {
+				dst_release(dst);
+				sk->sk_rx_dst = NULL;
+			}
+		}
+	}
+
+	if (unlikely(sk->sk_rx_dst == NULL))
+		sk->sk_rx_dst = dst_clone(skb_dst(skb));
+
 	/*
 	 *	Charge it to the socket, dropping if the queue is full.
 	 */
@@ -1622,6 +1635,56 @@  static inline int udp4_csum_init(struct sk_buff *skb, struct udphdr *uh,
 	return 0;
 }
 
+int udp_v4_early_demux(struct sk_buff *skb)
+{
+	struct net *net = dev_net(skb->dev);
+	const struct iphdr *iph;
+	const struct udphdr *uh;
+	struct net_device *dev;
+	struct sock *sk;
+	unsigned short ulen;
+	int err;
+
+	err = -ENOENT;
+	if (skb->pkt_type != PACKET_HOST)
+		goto out_err;
+
+	if (!pskb_may_pull(skb, ip_hdrlen(skb) + sizeof(struct udphdr)))
+		goto out_err;
+
+	iph = ip_hdr(skb);
+	uh = (struct udphdr *)((char *)iph + ip_hdrlen(skb));
+	ulen = ntohs(uh->len);
+
+	if (ulen > skb->len)
+		goto out_err;
+
+	dev = skb->dev;
+
+	sk = udp4_lib_lookup(net, iph->saddr, uh->source, iph->daddr,
+			     uh->dest, dev->ifindex);
+
+	if (sk) {
+		struct dst_entry *dst = sk->sk_rx_dst;
+		skb->sk = sk;
+		skb->destructor = sock_edemux;
+
+		if (dst)
+			dst = dst_check(dst, 0);
+		if (dst) {
+			struct rtable *rt = (struct rtable *)dst;
+
+			if (rt->rt_iif == dev->ifindex) {
+				skb_dst_set_noref(skb, dst);
+				err = 0;
+			}
+		}
+	}
+
+out_err:
+	return err;
+}
+
 /*
  *	All we need to do is get the socket, and then do a checksum.
  */