[v3] vrf: Fix conntrack-dnat conflict in vrf-device PREROUTING hook

Message ID 1547251399-15997-1-git-send-email-wenxu@ucloud.cn
State Not Applicable
Delegated to: David Miller
Headers show
Series
  • [v3] vrf: Fix conntrack-dnat conflict in vrf-device PREROUTING hook
Related show

Commit Message

wenxu Jan. 12, 2019, 12:03 a.m.
From: wenxu <wenxu@ucloud.cn>

In the ip_rcv the skb go through the PREROUTING hook first,
Then jump in vrf device go through the same hook again.
When conntrack dnat work with vrf, there will be some conflict for rules.
Because the package go through the hook twice with different nf status

ip link add user1 type vrf table 1
ip link add user2 type vrf table 2
ip l set dev tun1 master user1
ip l set dev tun2 master user2

nft add table firewall
nft add chain firewall zones { type filter hook prerouting  priority - 300 \; }
nft add rule firewall zones counter ct zone set iif map { "tun1" : 1, "tun2" : 2 }
nft add chain firewall rule-1000-ingress
nft add rule firewall rule-1000-ingress ct zone 1 tcp dport 22 ct state new counter accept
nft add rule firewall rule-1000-ingress counter drop
nft add chain firewall rule-1000-egress
nft add rule firewall rule-1000-egress tcp dport 22 ct state new counter drop
nft add rule firewall rule-1000-egress counter accept

nft add chain firewall rules-all { type filter hook prerouting priority - 150 \; }
nft add rule firewall rules-all ip daddr vmap { "2.2.2.11" : jump rule-1000-ingress }
nft add rule firewall rules-all ct zone vmap { 1 : jump rule-1000-egress }

nft add rule firewall dnat-all ct zone vmap { 1 : jump dnat-1000 }
nft add rule firewall dnat-1000 ip daddr 2.2.2.11 counter dnat to 10.0.0.7

For a package with ip daddr 2.2.2.11 and tcp dport 22, first time accept in the
rule-1000-ingress and dnat to 10.0.0.7. Then second time the packet goto the wrong
chain rule-1000-egress which leads the packet drop

So This patch avoid already dnat packet go through the prerouting hook for the
second time.

Fixes: 73e20b761acf8 ("net: vrf: Add support for PREROUTING rules on vrf
device")
Signed-off-by: wenxu <wenxu@ucloud.cn>
---
 drivers/net/vrf.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

Comments

Pablo Neira Ayuso Jan. 14, 2019, 9:40 p.m. | #1
On Sat, Jan 12, 2019 at 08:03:19AM +0800, wenxu@ucloud.cn wrote:
> From: wenxu <wenxu@ucloud.cn>
> 
> In the ip_rcv the skb go through the PREROUTING hook first,
> Then jump in vrf device go through the same hook again.
> When conntrack dnat work with vrf, there will be some conflict for rules.
> Because the package go through the hook twice with different nf status

Then, the first hook applies NAT, while the second is simply ignored.

> ip link add user1 type vrf table 1
> ip link add user2 type vrf table 2
> ip l set dev tun1 master user1
> ip l set dev tun2 master user2
> 
> nft add table firewall
> nft add chain firewall zones { type filter hook prerouting  priority - 300 \; }
> nft add rule firewall zones counter ct zone set iif map { "tun1" : 1, "tun2" : 2 }
> nft add chain firewall rule-1000-ingress
> nft add rule firewall rule-1000-ingress ct zone 1 tcp dport 22 ct state new counter accept
> nft add rule firewall rule-1000-ingress counter drop
> nft add chain firewall rule-1000-egress
> nft add rule firewall rule-1000-egress tcp dport 22 ct state new counter drop
> nft add rule firewall rule-1000-egress counter accept
> 
> nft add chain firewall rules-all { type filter hook prerouting priority - 150 \; }
> nft add rule firewall rules-all ip daddr vmap { "2.2.2.11" : jump rule-1000-ingress }
> nft add rule firewall rules-all ct zone vmap { 1 : jump rule-1000-egress }
> 
> nft add rule firewall dnat-all ct zone vmap { 1 : jump dnat-1000 }
> nft add rule firewall dnat-1000 ip daddr 2.2.2.11 counter dnat to 10.0.0.7
> 
> For a package with ip daddr 2.2.2.11 and tcp dport 22, first time accept in the
> rule-1000-ingress and dnat to 10.0.0.7. Then second time the packet goto the wrong
> chain rule-1000-egress which leads the packet drop
> 
> So This patch avoid already dnat packet go through the prerouting hook for the
> second time.
> 
> Fixes: 73e20b761acf8 ("net: vrf: Add support for PREROUTING rules on vrf
> device")
> Signed-off-by: wenxu <wenxu@ucloud.cn>
> ---
>  drivers/net/vrf.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
> index 95909e2..9b01ca871 100644
> --- a/drivers/net/vrf.c
> +++ b/drivers/net/vrf.c
> @@ -38,6 +38,10 @@
>  #include <net/fib_rules.h>
>  #include <net/netns/generic.h>
>  
> +#if IS_ENABLED(CONFIG_NF_CONNTRACK)
> +#include <net/netfilter/nf_conntrack.h>
> +#endif
> +
>  #define DRV_NAME	"vrf"
>  #define DRV_VERSION	"1.0"
>  
> @@ -898,6 +902,14 @@ static struct sk_buff *vrf_rcv_nfhook(u8 pf, unsigned int hook,
>  				      struct net_device *dev)
>  {
>  	struct net *net = dev_net(dev);
> +#if IS_ENABLED(CONFIG_NF_CONNTRACK)
> +	enum ip_conntrack_info ctinfo;
> +	struct nf_conn *ct;
> +
> +	ct = nf_ct_get(skb, &ctinfo);
> +	if (ct && (ct->status & IPS_DST_NAT))
> +		return skb;
> +#endif

I think we need to scrub the packet here, from the beginning of the
vrf path. The vrf represents a new virtual space, not sure we should
be sharing connection tracking information between different vrf.
Florian Westphal Jan. 14, 2019, 10:15 p.m. | #2
Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Sat, Jan 12, 2019 at 08:03:19AM +0800, wenxu@ucloud.cn wrote:
> > From: wenxu <wenxu@ucloud.cn>
> > 
> > In the ip_rcv the skb go through the PREROUTING hook first,
> > Then jump in vrf device go through the same hook again.
> > When conntrack dnat work with vrf, there will be some conflict for rules.
> > Because the package go through the hook twice with different nf status
> 
> Then, the first hook applies NAT, while the second is simply ignored.

Yes, but re-entry occurs with munged addresses in case DNAT was applied.
I'm not sure about this patch either though.

If vrf is used, then it seems its enough to add a 'meta iifname vr+ accept'
rule to prevent false matches/re-invocation.
If the name isn't enough, I think we should consider extending meta to
query 'interface is vrf' so userspace can add the 'don't re-do entire
ruleset for vrf' policy itself.

I am not sure kernel should auto-enforce bypass based on conntrack
state, there is no precedence for this and I don't like
arbitrarily-chosen behaviour.

In bridge case (ingress), the bridge path doesn't run the inet (ipv4/ipv6)
hooks, so we don't have a double-invocation if packet gets pushed up the
stack.  Same for bond/team.

> > +#if IS_ENABLED(CONFIG_NF_CONNTRACK)
> > +	enum ip_conntrack_info ctinfo;
> > +	struct nf_conn *ct;
> > +
> > +	ct = nf_ct_get(skb, &ctinfo);
> > +	if (ct && (ct->status & IPS_DST_NAT))
> > +		return skb;
> > +#endif
> 
> I think we need to scrub the packet here, from the beginning of the
> vrf path. The vrf represents a new virtual space, not sure we should
> be sharing connection tracking information between different vrf.

Hmm.  I see nf_reset calls, but apparently only on output.

If we would scrub (but no packet rewrite occured) we would re-lookup the
exactly same conntrack entry we just discarded, so I don't see the
point (we would also double-account each packet with nf_acct=1).

If re-write occured, I think we would mess up conntrack table:

1. Wire format: saddr A to daddr B.
2. NAT is applied, daddr B gets rewritten to daddr C (DNAT took place).
Then conntrack has: ORIGIN A,B; REPLY C,A            (expects replies to
come from the new destination to original source and further packets from
A to B).

If we re-enter conntrack, then packet (still the original packet!) is
'saddr A to daddr c'.

So, unless I've made a thinko here, conntrack picks up a new flow:
ORIGIN A,C ;  REPLY C,A .

Unlike netns, where the 'old' conntrack incarnation is invisible (netns
is part of the conntrack key), there will now be a clash/collision at
confirm time, because there already is a 'REPLY C,A' in the table.
Pablo Neira Ayuso Jan. 15, 2019, 6:56 a.m. | #3
On Mon, Jan 14, 2019 at 11:15:10PM +0100, Florian Westphal wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > On Sat, Jan 12, 2019 at 08:03:19AM +0800, wenxu@ucloud.cn wrote:
> > > From: wenxu <wenxu@ucloud.cn>
> > > 
> > > In the ip_rcv the skb go through the PREROUTING hook first,
> > > Then jump in vrf device go through the same hook again.
> > > When conntrack dnat work with vrf, there will be some conflict for rules.
> > > Because the package go through the hook twice with different nf status
> > 
> > Then, the first hook applies NAT, while the second is simply ignored.
> 
> Yes, but re-entry occurs with munged addresses in case DNAT was applied.
> I'm not sure about this patch either though.
> 
> If vrf is used, then it seems its enough to add a 'meta iifname vr+ accept'
> rule to prevent false matches/re-invocation.

Then this is a misconfiguration issue.

> If the name isn't enough, I think we should consider extending meta to
> query 'interface is vrf' so userspace can add the 'don't re-do entire
> ruleset for vrf' policy itself.

This sounds good, this problem is solved via policy.

> I am not sure kernel should auto-enforce bypass based on conntrack
> state, there is no precedence for this and I don't like
> arbitrarily-chosen behaviour.

Agreed.

Patch

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 95909e2..9b01ca871 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -38,6 +38,10 @@ 
 #include <net/fib_rules.h>
 #include <net/netns/generic.h>
 
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+#include <net/netfilter/nf_conntrack.h>
+#endif
+
 #define DRV_NAME	"vrf"
 #define DRV_VERSION	"1.0"
 
@@ -898,6 +902,14 @@  static struct sk_buff *vrf_rcv_nfhook(u8 pf, unsigned int hook,
 				      struct net_device *dev)
 {
 	struct net *net = dev_net(dev);
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+	enum ip_conntrack_info ctinfo;
+	struct nf_conn *ct;
+
+	ct = nf_ct_get(skb, &ctinfo);
+	if (ct && (ct->status & IPS_DST_NAT))
+		return skb;
+#endif
 
 	if (nf_hook(pf, hook, net, NULL, skb, dev, NULL, vrf_rcv_finish) != 1)
 		skb = NULL;    /* kfree_skb(skb) handled by nf code */