[ovs-dev,branch-2.7,16/25] datapath: Use inverted tuple in ovs_ct_find_existing() if NATted.

Message ID 1489620689-122370-17-git-send-email-jarno@ovn.org
State Superseded
Headers show

Commit Message

Jarno Rajahalme March 15, 2017, 11:31 p.m.
Upstream commit:

    commit 9ff464db50e437eef131f719cc2e9902eea9c607
    Author: Jarno Rajahalme <jarno@ovn.org>
    Date:   Thu Feb 9 11:21:53 2017 -0800

    openvswitch: Use inverted tuple in ovs_ct_find_existing() if NATted.

    The conntrack lookup for existing connections fails to invert the
    packet 5-tuple for NATted packets, and therefore fails to find the
    existing conntrack entry.  Conntrack only stores 5-tuples for incoming
    packets, and there are various situations where a lookup on a packet
    that has already been transformed by NAT needs to be made.  Looking up
    an existing conntrack entry upon executing packet received from the
    userspace is one of them.

    This patch fixes ovs_ct_find_existing() to invert the packet 5-tuple
    for the conntrack lookup whenever the packet has already been
    transformed by conntrack from its input form as evidenced by one of
    the NAT flags being set in the conntrack state metadata.

    Fixes: 05752523e565 ("openvswitch: Interface with NAT.")
    Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
    Acked-by: Joe Stringer <joe@ovn.org>
    Acked-by: Pravin B Shelar <pshelar@ovn.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

This patch also adds a test case to OVS system tests to verify the
behavior.

The following is a more thorough explanation of what is going on:

When we have evidence that an existing conntrack entry could exist, we
must invert the tuple if NAT has already been applied, as the current
packet headers do not match any tuple stored in conntrack.  For
example, if a packet from private address X to a public address B is
source-NATted to A, the conntrack entry will have the following tuples
(ignoring the protocol and port numbers) after the conntrack entry is
committed:

Original direction tuple: (X,B)
Reply direction tuple: (B,A)

Now, if a reply packet is already transformed back to the private
address space (e.g., with a CT(nat) action), the tuple corresponding
to the current packet headers is:

Current packet tuple: (B,X)

This does not match either of the conntrack tuples above.  Normally
this does not matter, as the conntrack lookup was already done using
the tuple (B,A), but if the current packet does not match any flow in
the OVS datapath, the packet is sent to userspace via an upcall,
during which the packet's skb is freed, and the conntrack entry
pointer in the skb is lost.  When the packet is reintroduced to the
datapath, any further conntrack action will need to perform a new
conntrack lookup to find the entry again.  Prior to this patch this
second lookup failed.  The datapath flow setup corresponding to the
upcall can succeed, however, allowing all further packets in the reply
direction to re-use the conntrack entry pointer in the skb, so
typically the lookup failure only causes a packet drop.

The solution is to invert the tuple derived from the current packet
headers in case the conntrack state stored in the packet metadata
indicates that the packet has been transformed by NAT:

Inverted tuple: (X,B)

With this the conntrack entry can be found, matching the original
direction tuple.

This same logic also works for the original direction packets:

Current packet tuple (after reverse NAT): (A,B)
Inverted tuple: (B,A)

While the current packet tuple (A,B) does not match either of the
conntrack tuples, the inverted one (B,A) does match the reply
direction tuple.

Since the inverted tuple matches the reverse direction tuple the
direction of the packet must be reversed as well.

Fixes: c5f6c06b58d6 ("datapath: Interface with NAT.")
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
---
 datapath/conntrack.c    | 24 +++++++++++++++++++++--
 tests/system-traffic.at | 52 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 74 insertions(+), 2 deletions(-)

Patch

diff --git a/datapath/conntrack.c b/datapath/conntrack.c
index 9e34af9..08e5eab 100644
--- a/datapath/conntrack.c
+++ b/datapath/conntrack.c
@@ -471,7 +471,7 @@  ovs_ct_get_info(const struct nf_conntrack_tuple_hash *h)
  */
 static struct nf_conn *
 ovs_ct_find_existing(struct net *net, const struct nf_conntrack_zone *zone,
-		     u8 l3num, struct sk_buff *skb)
+		     u8 l3num, struct sk_buff *skb, bool natted)
 {
 	struct nf_conntrack_l3proto *l3proto;
 	struct nf_conntrack_l4proto *l4proto;
@@ -494,6 +494,17 @@  ovs_ct_find_existing(struct net *net, const struct nf_conntrack_zone *zone,
 		return NULL;
 	}
 
+	/* Must invert the tuple if skb has been transformed by NAT. */
+	if (natted) {
+		struct nf_conntrack_tuple inverse;
+
+		if (!nf_ct_invert_tuple(&inverse, &tuple, l3proto, l4proto)) {
+			pr_debug("ovs_ct_find_existing: Inversion failed!\n");
+			return NULL;
+		}
+		tuple = inverse;
+	}
+
 	/* look for tuple match */
 	h = nf_conntrack_find_get(net, zone, &tuple);
 	if (!h)
@@ -501,6 +512,13 @@  ovs_ct_find_existing(struct net *net, const struct nf_conntrack_zone *zone,
 
 	ct = nf_ct_tuplehash_to_ctrack(h);
 
+	/* Inverted packet tuple matches the reverse direction conntrack tuple,
+	 * select the other tuplehash to get the right 'ctinfo' bits for this
+	 * packet.
+	 */
+	if (natted)
+		h = &ct->tuplehash[!h->tuple.dst.dir];
+
 	nf_ct_set(skb, ct, ovs_ct_get_info(h));
 	return ct;
 }
@@ -523,7 +541,9 @@  static bool skb_nfct_cached(struct net *net,
 	if (!ct && key->ct.state & OVS_CS_F_TRACKED &&
 	    !(key->ct.state & OVS_CS_F_INVALID) &&
 	    key->ct.zone == info->zone.id)
-		ct = ovs_ct_find_existing(net, &info->zone, info->family, skb);
+		ct = ovs_ct_find_existing(net, &info->zone, info->family, skb,
+					  !!(key->ct.state
+					     & OVS_CS_F_NAT_MASK));
 	if (!ct)
 		return false;
 	if (!net_eq(net, read_pnet(&ct->ct_net)))
diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index 29dd6d6..bb7a497 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -2314,6 +2314,58 @@  tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=<cleared>,dport=<cleared>),reply=(src=
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
+AT_SETUP([conntrack - SNAT with ct_mark change on reply])
+CHECK_CONNTRACK()
+CHECK_CONNTRACK_NAT()
+OVS_TRAFFIC_VSWITCHD_START()
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24")
+NS_CHECK_EXEC([at_ns0], [ip link set dev p0 address 80:88:88:88:88:88])
+ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24")
+
+dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from ns1->ns0.
+AT_DATA([flows.txt], [dnl
+in_port=1,ip,action=ct(commit,zone=1,nat(src=10.1.1.240-10.1.1.255)),2
+in_port=2,ct_state=-trk,ip,action=ct(table=0,zone=1,nat)
+dnl
+dnl Setting the mark fails if the datapath can't find the existing conntrack
+dnl entry after NAT has been reversed and the skb was lost due to an upcall.
+dnl
+in_port=2,ct_state=+trk,ct_zone=1,ip,action=ct(table=1,commit,zone=1,exec(set_field:1->ct_mark)),1
+table=1,in_port=2,ct_mark=1,ct_state=+rpl,ct_zone=1,ip,action=1
+dnl
+dnl ARP
+priority=100 arp arp_op=1 action=move:OXM_OF_ARP_TPA[[]]->NXM_NX_REG2[[]],resubmit(,8),goto_table:10
+priority=10 arp action=normal
+priority=0,action=drop
+dnl
+dnl MAC resolution table for IP in reg2, stores mac in OXM_OF_PKT_REG0
+table=8,reg2=0x0a0101f0/0xfffffff0,action=load:0x808888888888->OXM_OF_PKT_REG0[[]]
+table=8,priority=0,action=load:0->OXM_OF_PKT_REG0[[]]
+dnl ARP responder mac filled in at OXM_OF_PKT_REG0, or 0 for normal action.
+dnl TPA IP in reg2.
+dnl Swaps the fields of the ARP message to turn a query to a response.
+table=10 priority=100 arp xreg0=0 action=normal
+table=10 priority=10,arp,arp_op=1,action=load:2->OXM_OF_ARP_OP[[]],move:OXM_OF_ARP_SHA[[]]->OXM_OF_ARP_THA[[]],move:OXM_OF_PKT_REG0[[0..47]]->OXM_OF_ARP_SHA[[]],move:OXM_OF_ARP_SPA[[]]->OXM_OF_ARP_TPA[[]],move:NXM_NX_REG2[[]]->OXM_OF_ARP_SPA[[]],move:NXM_OF_ETH_SRC[[]]->NXM_OF_ETH_DST[[]],move:OXM_OF_PKT_REG0[[0..47]]->NXM_OF_ETH_SRC[[]],move:NXM_OF_IN_PORT[[]]->NXM_NX_REG3[[0..15]],load:0->NXM_OF_IN_PORT[[]],output:NXM_NX_REG3[[0..15]]
+table=10 priority=0 action=drop
+])
+
+AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
+
+dnl HTTP requests from p0->p1 should work fine.
+NETNS_DAEMONIZE([at_ns1], [[$PYTHON $srcdir/test-l7.py]], [http0.pid])
+NS_CHECK_EXEC([at_ns0], [ping -c 1 10.1.1.2 | FORMAT_PING], [0], [dnl
+1 packets transmitted, 1 received, 0% packet loss, time 0ms
+])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2) | sed -e 's/dst=10.1.1.2[[45]][[0-9]]/dst=10.1.1.2XX/'], [0], [dnl
+icmp,orig=(src=10.1.1.1,dst=10.1.1.2,id=<cleared>,type=8,code=0),reply=(src=10.1.1.2,dst=10.1.1.2XX,id=<cleared>,type=0,code=0),zone=1,mark=1
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
 
 AT_SETUP([conntrack - SNAT with port range])
 CHECK_CONNTRACK()