[net-next] inetpeer: Add support for VRFs

Message ID 1440339964-16075-1-git-send-email-dsa@cumulusnetworks.com
State Changes Requested, archived
Delegated to: David Miller

Commit Message

David Ahern Aug. 23, 2015, 2:26 p.m. UTC
inetpeer caches based on address only, so duplicate IP addresses within
a namespace return the same cached entry. Similar to IP fragments, handle
duplicate addresses across VRFs by adding the VRF master device index to
the lookup.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
 include/net/inetpeer.h | 11 ++++++++++-
 net/ipv4/icmp.c        |  3 ++-
 net/ipv4/inetpeer.c    |  5 +++++
 net/ipv4/ip_fragment.c |  3 ++-
 net/ipv4/route.c       |  7 +++++--
 5 files changed, 24 insertions(+), 5 deletions(-)

Comments

Thomas Graf Aug. 24, 2015, 12:15 a.m. UTC | #1
On 08/23/15 at 08:26am, David Ahern wrote:
> inetpeer caches based on address only, so duplicate IP addresses within
> a namespace return the same cached entry. Similar to IP fragments handle
> duplicate addresses across VRFs by adding the VRF master device index to
> the lookup.

We have a lot of other places which use the address only. Are you
going to add the VRF id to all these places as well?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Ahern Aug. 24, 2015, 2:01 a.m. UTC | #2
On 8/23/15 6:15 PM, Thomas Graf wrote:
> On 08/23/15 at 08:26am, David Ahern wrote:
>> inetpeer caches based on address only, so duplicate IP addresses within
>> a namespace return the same cached entry. Similar to IP fragments handle
>> duplicate addresses across VRFs by adding the VRF master device index to
>> the lookup.
>
> We have a lot of other places which use the address only. Are you
> going to add the VRF id to all these places as well?
>

If appropriate, yes. I have fixed IP fragments and this patch fixes 
inetpeer cache. In both cases (L3 artifacts) the vrf device index 
provides the means to uniquely identify duplicate IP addresses within a 
namespace. If you know of other code that might be impacted I will 
investigate and fix as needed.

Thanks,
David
Thomas Graf Aug. 25, 2015, 5:37 p.m. UTC | #3
On 08/23/15 at 08:01pm, David Ahern wrote:
> On 8/23/15 6:15 PM, Thomas Graf wrote:
> >On 08/23/15 at 08:26am, David Ahern wrote:
> >>inetpeer caches based on address only, so duplicate IP addresses within
> >>a namespace return the same cached entry. Similar to IP fragments handle
> >>duplicate addresses across VRFs by adding the VRF master device index to
> >>the lookup.
> >
> >We have a lot of other places which use the address only. Are you
> >going to add the VRF id to all these places as well?
> >
> 
> If appropriate, yes. I have fixed IP fragments and this patch fixes inetpeer
> cache. In both cases (L3 artifacts) the vrf device index provides the means
> to uniquely identify duplicate IP addresses within a namespace. If you know
> of other code that might be impacted I will investigate and fix as needed.

OK, then the question is what do you consider appropriate? ;-) An obvious
example is netfilter conntrack but eventually any decision based on an
address would require the VRF id if you want to go all the way.

I see the advantages over netns-based VRF right now due to the lightweight
nature, but if this turns out to require a new field in practically every
address data structure, then that does not seem to be what we want.
David Miller Aug. 25, 2015, 8:47 p.m. UTC | #4
From: David Ahern <dsa@cumulusnetworks.com>
Date: Sun, 23 Aug 2015 20:01:34 -0600

> On 8/23/15 6:15 PM, Thomas Graf wrote:
>> On 08/23/15 at 08:26am, David Ahern wrote:
>>> inetpeer caches based on address only, so duplicate IP addresses
>>> within
>>> a namespace return the same cached entry. Similar to IP fragments
>>> handle
>>> duplicate addresses across VRFs by adding the VRF master device index
>>> to
>>> the lookup.
>>
>> We have a lot of other places which use the address only. Are you
>> going to add the VRF id to all these places as well?
>>
> 
> If appropriate, yes. I have fixed IP fragments and this patch fixes
> inetpeer cache. In both cases (L3 artifacts) the vrf device index
> provides the means to uniquely identify duplicate IP addresses within
> a namespace. If you know of other code that might be impacted I will
> investigate and fix as needed.

Anyways, what this inetpeer patch is doing is the wrong abstraction.

The key is really "daddr + netdev" so make a helper that works using
those arguments.

Then it is clear as we propagate this around that addresses need to
be coupled with the device in question in order to be keyed properly.
David Ahern Aug. 25, 2015, 10:41 p.m. UTC | #5
On 8/25/15 1:47 PM, David Miller wrote:
> From: David Ahern <dsa@cumulusnetworks.com>
> Date: Sun, 23 Aug 2015 20:01:34 -0600
>
>> On 8/23/15 6:15 PM, Thomas Graf wrote:
>>> On 08/23/15 at 08:26am, David Ahern wrote:
>>>> inetpeer caches based on address only, so duplicate IP addresses
>>>> within
>>>> a namespace return the same cached entry. Similar to IP fragments
>>>> handle
>>>> duplicate addresses across VRFs by adding the VRF master device index
>>>> to
>>>> the lookup.
>>>
>>> We have a lot of other places which use the address only. Are you
>>> going to add the VRF id to all these places as well?
>>>
>>
>> If appropriate, yes. I have fixed IP fragments and this patch fixes
>> inetpeer cache. In both cases (L3 artifacts) the vrf device index
>> provides the means to uniquely identify duplicate IP addresses within
>> a namespace. If you know of other code that might be impacted I will
>> investigate and fix as needed.
>
> Anyways, what this inetpeer patch is doing is the wrong abstraction.
>
> The key is really "daddr + netdev" so make a helper that works using
> those arguments.

That's what I have here:

struct inetpeer_addr {
         struct inetpeer_addr_base       addr;
         __u16                           family;
#if IS_ENABLED(CONFIG_NET_VRF)
         int vif;
#endif
};

the addr_compare then checks the vif (VRF device index) after the N-word 
address compare.
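
A minimal userspace sketch of the comparison described above, for illustration only. The names `peer_key`, `peer_key_compare`, and the `PEER_FAMILY_*` constants are hypothetical stand-ins, not kernel identifiers; the real code operates on struct inetpeer_addr and is conditional on CONFIG_NET_VRF.

```c
#include <stdint.h>

/* Hypothetical family tags; the kernel uses AF_INET / AF_INET6. */
enum { PEER_FAMILY_V4 = 4, PEER_FAMILY_V6 = 6 };

/* Hypothetical userspace stand-in for struct inetpeer_addr. */
struct peer_key {
	uint32_t addr[4];	/* IPv4 uses addr[0]; IPv6 uses all four words */
	uint16_t family;	/* PEER_FAMILY_V4 or PEER_FAMILY_V6 */
	int vif;		/* VRF master device index, 0 when unset */
};

/*
 * Three-way compare: the address words are compared first (one word
 * for IPv4, four for IPv6); vif only breaks ties between keys whose
 * addresses are equal. Returns -1, 0 or 1.
 */
static int peer_key_compare(const struct peer_key *a,
			    const struct peer_key *b)
{
	int i, n = (a->family == PEER_FAMILY_V4 ? 1 : 4);

	for (i = 0; i < n; i++) {
		if (a->addr[i] == b->addr[i])
			continue;
		return a->addr[i] < b->addr[i] ? -1 : 1;
	}

	if (a->vif != b->vif)
		return a->vif < b->vif ? -1 : 1;

	return 0;
}
```

With this ordering, two keys holding the same address but different vif values compare as distinct, so a lookup structure built on the comparison would keep one entry per VRF.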

>
> Then it is clear as we propagate this around that addresses need to
> be coupled with the device in question in order to be keyed properly.
>

Meaning rename struct inetpeer_addr to struct inetpeer_key and 
addr_compare to entry_compare or key_compare? Everything else still 
treats the address + VRF device as the key.

David
David Miller Aug. 25, 2015, 10:52 p.m. UTC | #6
From: David Ahern <dsa@cumulusnetworks.com>
Date: Tue, 25 Aug 2015 15:41:36 -0700

> Meaning rename struct inetpeer_addr to struct inetpeer_key and
> addr_compare to entry_compare or key_compare?

I'm not talking about inetpeer specifically, but generally speaking
everywhere you're going to have to handle this including inetpeer.

So something like "inet4_daddr_key" which is a __be32 and the ifindex.
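
One way such a generic key could look, as a userspace sketch under stated assumptions. Only the name inet4_daddr_key and its two members (a __be32 address and an ifindex) come from the suggestion above; the field names and the helpers `inet4_daddr_key_init` and `inet4_daddr_key_equal` are hypothetical, and uint32_t stands in for the kernel's __be32.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of a generic IPv4 lookup key: address coupled with a device. */
struct inet4_daddr_key {
	uint32_t daddr;		/* IPv4 address in network byte order */
	int ifindex;		/* device index that scopes the address */
};

/* Hypothetical helper: build a key from its two components. */
static inline struct inet4_daddr_key
inet4_daddr_key_init(uint32_t daddr, int ifindex)
{
	struct inet4_daddr_key key = { .daddr = daddr, .ifindex = ifindex };

	return key;
}

/* Hypothetical helper: keys match only if address and device both match. */
static inline bool inet4_daddr_key_equal(const struct inet4_daddr_key *a,
					 const struct inet4_daddr_key *b)
{
	return a->daddr == b->daddr && a->ifindex == b->ifindex;
}
```

Passing the pair around as one object makes it harder for a caller to key a lookup on the address alone.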

Patch

diff --git a/include/net/inetpeer.h b/include/net/inetpeer.h
index 002f0bd27001..a75b648b8545 100644
--- a/include/net/inetpeer.h
+++ b/include/net/inetpeer.h
@@ -26,6 +26,9 @@  struct inetpeer_addr_base {
 struct inetpeer_addr {
 	struct inetpeer_addr_base	addr;
 	__u16				family;
+#if IS_ENABLED(CONFIG_NET_VRF)
+	int vif;
+#endif
 };
 
 struct inet_peer {
@@ -78,12 +81,15 @@  struct inet_peer *inet_getpeer(struct inet_peer_base *base,
 
 static inline struct inet_peer *inet_getpeer_v4(struct inet_peer_base *base,
 						__be32 v4daddr,
-						int create)
+						int vif, int create)
 {
 	struct inetpeer_addr daddr;
 
 	daddr.addr.a4 = v4daddr;
 	daddr.family = AF_INET;
+#if IS_ENABLED(CONFIG_NET_VRF)
+	daddr.vif = vif;
+#endif
 	return inet_getpeer(base, &daddr, create);
 }
 
@@ -95,6 +101,9 @@  static inline struct inet_peer *inet_getpeer_v6(struct inet_peer_base *base,
 
 	daddr.addr.in6 = *v6daddr;
 	daddr.family = AF_INET6;
+#if IS_ENABLED(CONFIG_NET_VRF)
+	daddr.vif = 0;   /* placeholder until VRF support is added to IPv6 */
+#endif
 	return inet_getpeer(base, &daddr, create);
 }
 
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index f16488efa1c8..79fe05befcae 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -309,9 +309,10 @@  static bool icmpv4_xrlim_allow(struct net *net, struct rtable *rt,
 
 	rc = false;
 	if (icmp_global_allow()) {
+		int vif = vrf_master_ifindex(dst->dev);
 		struct inet_peer *peer;
 
-		peer = inet_getpeer_v4(net->ipv4.peers, fl4->daddr, 1);
+		peer = inet_getpeer_v4(net->ipv4.peers, fl4->daddr, vif, 1);
 		rc = inet_peer_xrlim_allow(peer,
 					   net->ipv4.sysctl_icmp_ratelimit);
 		if (peer)
diff --git a/net/ipv4/inetpeer.c b/net/ipv4/inetpeer.c
index 241afd743d2c..b5f268a3ea6b 100644
--- a/net/ipv4/inetpeer.c
+++ b/net/ipv4/inetpeer.c
@@ -170,6 +170,11 @@  static int addr_compare(const struct inetpeer_addr *a,
 		return 1;
 	}
 
+#if IS_ENABLED(CONFIG_NET_VRF)
+	if (a->vif != b->vif)
+		return a->vif < b->vif ? -1 : 1;
+#endif
+
 	return 0;
 }
 
diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index 15762e758861..fa7f15305f9a 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -151,7 +151,8 @@  static void ip4_frag_init(struct inet_frag_queue *q, const void *a)
 	qp->vif = arg->vif;
 	qp->user = arg->user;
 	qp->peer = sysctl_ipfrag_max_dist ?
-		inet_getpeer_v4(net->ipv4.peers, arg->iph->saddr, 1) : NULL;
+		inet_getpeer_v4(net->ipv4.peers, arg->iph->saddr, arg->vif, 1) :
+		NULL;
 }
 
 static void ip4_frag_free(struct inet_frag_queue *q)
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 2403e85107f0..6805d57152b9 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -838,6 +838,7 @@  void ip_rt_send_redirect(struct sk_buff *skb)
 	struct inet_peer *peer;
 	struct net *net;
 	int log_martians;
+	int vif;
 
 	rcu_read_lock();
 	in_dev = __in_dev_get_rcu(rt->dst.dev);
@@ -846,10 +847,11 @@  void ip_rt_send_redirect(struct sk_buff *skb)
 		return;
 	}
 	log_martians = IN_DEV_LOG_MARTIANS(in_dev);
+	vif = vrf_master_ifindex_rcu(rt->dst.dev);
 	rcu_read_unlock();
 
 	net = dev_net(rt->dst.dev);
-	peer = inet_getpeer_v4(net->ipv4.peers, ip_hdr(skb)->saddr, 1);
+	peer = inet_getpeer_v4(net->ipv4.peers, ip_hdr(skb)->saddr, vif, 1);
 	if (!peer) {
 		icmp_send(skb, ICMP_REDIRECT, ICMP_REDIR_HOST,
 			  rt_nexthop(rt, ip_hdr(skb)->daddr));
@@ -938,7 +940,8 @@  static int ip_error(struct sk_buff *skb)
 		break;
 	}
 
-	peer = inet_getpeer_v4(net->ipv4.peers, ip_hdr(skb)->saddr, 1);
+	peer = inet_getpeer_v4(net->ipv4.peers, ip_hdr(skb)->saddr,
+			       vrf_master_ifindex(skb->dev), 1);
 
 	send = true;
 	if (peer) {