Message ID | 776b8819c85c83088478b933a35691133055347a.1430733932.git.daniel@iogearbox.net
---|---
State | Changes Requested
Delegated to | Pablo Neira
On Mon, May 04, 2015 at 12:23:41PM +0200, Daniel Borkmann wrote:
> This patch adds support for the possibility of doing NAT with
> conflicting IP address/ports tuples from multiple, isolated
> tenants, represented as network namespaces and netfilter zones.
> For such internal VRFs, traffic is directed to a single or shared
> pool of public IP address/port range for the external/public VRF.
>
> Or in other words, this allows for doing NAT *between* VRFs
> instead of *inside* VRFs without requiring each tenant to NAT
> twice or to use its own dedicated IP address to SNAT to, also
> with the side effect to not requiring to expose a unique marker
> per tenant in the data center to the public.
>
> Simplified example scheme:
>
>  +--- VRF A ---+  +--- CT Zone 1 --------+
>  | 10.1.1.1/8  +--+ 10.1.1.1 ESTABLISHED |
>  +-------------+  +--+-------------------+
>                      |
>                   +--+--+
>                   | L3  +-SNAT-[20.1.1.1:20000-40000]--eth0
>                   +--+--+
>                      |
>  +-- VRF B ----+  +--- CT Zone 2 --------+
>  | 10.1.1.1/8  +--+ 10.1.1.1 ESTABLISHED |
>  +-------------+  +----------------------+

So, it's the skb->mark that survives between the containers. I'm not
sure it makes sense to keep a zone 0 from the container that performs
SNAT. Instead, we can probably restore the zone based on the
skb->mark. The problem is that the existing zone is u16. In nftables,
Patrick already mentioned about supporting casting so we can do
something like:

  ct zone set (u16)meta mark

So you can reserve a part of the skb->mark to map it to the zone. I'm
not very convinced about this.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel"
in the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Pablo,

On 05/04/2015 12:34 PM, Pablo Neira Ayuso wrote:
> On Mon, May 04, 2015 at 12:23:41PM +0200, Daniel Borkmann wrote:
>> This patch adds support for the possibility of doing NAT with
>> conflicting IP address/ports tuples from multiple, isolated
>> tenants, represented as network namespaces and netfilter zones.
>> For such internal VRFs, traffic is directed to a single or shared
>> pool of public IP address/port range for the external/public VRF.
>>
>> Or in other words, this allows for doing NAT *between* VRFs
>> instead of *inside* VRFs without requiring each tenant to NAT
>> twice or to use its own dedicated IP address to SNAT to, also
>> with the side effect to not requiring to expose a unique marker
>> per tenant in the data center to the public.
>>
>> Simplified example scheme:
>>
>>  +--- VRF A ---+  +--- CT Zone 1 --------+
>>  | 10.1.1.1/8  +--+ 10.1.1.1 ESTABLISHED |
>>  +-------------+  +--+-------------------+
>>                      |
>>                   +--+--+
>>                   | L3  +-SNAT-[20.1.1.1:20000-40000]--eth0
>>                   +--+--+
>>                      |
>>  +-- VRF B ----+  +--- CT Zone 2 --------+
>>  | 10.1.1.1/8  +--+ 10.1.1.1 ESTABLISHED |
>>  +-------------+  +----------------------+
>
> So, it's the skb->mark that survives between the containers. I'm not
> sure it makes sense to keep a zone 0 from the container that performs
> SNAT. Instead, we can probably restore the zone based on the
> skb->mark. The problem is that the existing zone is u16. In nftables,
> Patrick already mentioned about supporting casting so we can do
> something like:
>
>   ct zone set (u16)meta mark
>
> So you can reserve a part of the skb->mark to map it to the zone. I'm
> not very convinced about this.

Thanks for the feedback! I'm not yet sure, though, that I have fully
understood how the above suggestion addresses the described problem,
i.e. how would replies to the SNATed traffic find the correct zone
again?

Our issue, simplified, basically boils down to this: given are two
zones, both use IP address <A>, and both zones want to talk to IP
address <B> in a third zone. To let those two with <A> talk to <B>,
connections are being routed + SNATed from a non-unique to a unique
address/port tuple [which the proposed approach solves], so they can
talk to <B>.

Best,
Daniel
On Mon, May 04, 2015 at 01:59:15PM +0200, Daniel Borkmann wrote:
> Hi Pablo,
>
> On 05/04/2015 12:34 PM, Pablo Neira Ayuso wrote:
> >On Mon, May 04, 2015 at 12:23:41PM +0200, Daniel Borkmann wrote:
> >>This patch adds support for the possibility of doing NAT with
> >>conflicting IP address/ports tuples from multiple, isolated
> >>tenants, represented as network namespaces and netfilter zones.
> >>For such internal VRFs, traffic is directed to a single or shared
> >>pool of public IP address/port range for the external/public VRF.
> >>
> >>Or in other words, this allows for doing NAT *between* VRFs
> >>instead of *inside* VRFs without requiring each tenant to NAT
> >>twice or to use its own dedicated IP address to SNAT to, also
> >>with the side effect to not requiring to expose a unique marker
> >>per tenant in the data center to the public.
> >>
> >>Simplified example scheme:
> >>
> >> +--- VRF A ---+  +--- CT Zone 1 --------+
> >> | 10.1.1.1/8  +--+ 10.1.1.1 ESTABLISHED |
> >> +-------------+  +--+-------------------+
> >>                     |
> >>                  +--+--+
> >>                  | L3  +-SNAT-[20.1.1.1:20000-40000]--eth0
> >>                  +--+--+
> >>                     |
> >> +-- VRF B ----+  +--- CT Zone 2 --------+
> >> | 10.1.1.1/8  +--+ 10.1.1.1 ESTABLISHED |
> >> +-------------+  +----------------------+
> >
> >So, it's the skb->mark that survives between the containers. I'm not
> >sure it makes sense to keep a zone 0 from the container that performs
> >SNAT. Instead, we can probably restore the zone based on the
> >skb->mark. The problem is that the existing zone is u16. In nftables,
> >Patrick already mentioned about supporting casting so we can do
> >something like:
> >
> > ct zone set (u16)meta mark
> >
> >So you can reserve a part of the skb->mark to map it to the zone. I'm
> >not very convinced about this.
>
> Thanks for the feedback! I'm not yet sure, though, that I have fully
> understood how the above suggestion addresses the described problem,
> i.e. how would replies to the SNATed traffic find the correct zone
> again?

From the original direction, you can set the zone based on the mark:

  -m mark --mark 1 -j CT --zone 1

Then, from the reply direction, you can restore it:

  -m conntrack --ctzone 1 -j MARK --set-mark 1
  ...

--ctzone is not supported though; it would need a new revision of the
conntrack match.

> Our issue, simplified, basically boils down to this: given are two
> zones, both use IP address <A>, and both zones want to talk to IP
> address <B> in a third zone. To let those two with <A> talk to <B>,
> connections are being routed + SNATed from a non-unique to a unique
> address/port tuple [which the proposed approach solves], so they can
> talk to <B>.
On 05/04/15 at 03:08pm, Pablo Neira Ayuso wrote:
> On Mon, May 04, 2015 at 01:59:15PM +0200, Daniel Borkmann wrote:
> > On 05/04/2015 12:34 PM, Pablo Neira Ayuso wrote:
> > >So, it's the skb->mark that survives between the containers. I'm not
> > >sure it makes sense to keep a zone 0 from the container that performs
> > >SNAT. Instead, we can probably restore the zone based on the
> > >skb->mark. The problem is that the existing zone is u16. In nftables,
> > >Patrick already mentioned about supporting casting so we can do
> > >something like:
> > >
> > > ct zone set (u16)meta mark
> > >
> > >So you can reserve a part of the skb->mark to map it to the zone. I'm
> > >not very convinced about this.
> >
> > Thanks for the feedback! I'm not yet sure, though, that I have fully
> > understood how the above suggestion addresses the described problem,
> > i.e. how would replies to the SNATed traffic find the correct zone
> > again?
>
> From the original direction, you can set the zone based on the mark:
>
>   -m mark --mark 1 -j CT --zone 1
>
> Then, from the reply direction, you can restore it:
>
>   -m conntrack --ctzone 1 -j MARK --set-mark 1
>   ...
>
> --ctzone is not supported though; it would need a new revision of the
> conntrack match.

Given that the multiple source zones which talk to a common
destination zone may have conflicting IPs, the SNAT must either occur
in the source zone, where the source address is still unique, or the
CT tuple must be made unique with a source zone identifier so that the
SNAT can occur in the destination zone.

Doing the SNAT in the source zone requires using a unique IP pool to
map to for each source zone, as otherwise IP sources may clash again
in the destination zone. We obviously can't do --SNAT -to 10.1.1.1 in
two namespaces and then just route into a third namespace. This
approach is not scalable in a container environment with 100s or even
1000s of containers, each in its own network namespace.

What we want to do instead is to do the SNAT in the destination zone,
where we can have a single SNAT rule which covers all source zones.
This allows inter-namespace communication in a /31 with minimal waste
of addresses.
On 05/04/2015 03:08 PM, Pablo Neira Ayuso wrote:
> On Mon, May 04, 2015 at 01:59:15PM +0200, Daniel Borkmann wrote:
>> Hi Pablo,
>>
>> On 05/04/2015 12:34 PM, Pablo Neira Ayuso wrote:
>>> On Mon, May 04, 2015 at 12:23:41PM +0200, Daniel Borkmann wrote:
>>>> This patch adds support for the possibility of doing NAT with
>>>> conflicting IP address/ports tuples from multiple, isolated
>>>> tenants, represented as network namespaces and netfilter zones.
>>>> For such internal VRFs, traffic is directed to a single or shared
>>>> pool of public IP address/port range for the external/public VRF.
>>>>
>>>> Or in other words, this allows for doing NAT *between* VRFs
>>>> instead of *inside* VRFs without requiring each tenant to NAT
>>>> twice or to use its own dedicated IP address to SNAT to, also
>>>> with the side effect to not requiring to expose a unique marker
>>>> per tenant in the data center to the public.
>>>>
>>>> Simplified example scheme:
>>>>
>>>>  +--- VRF A ---+  +--- CT Zone 1 --------+
>>>>  | 10.1.1.1/8  +--+ 10.1.1.1 ESTABLISHED |
>>>>  +-------------+  +--+-------------------+
>>>>                      |
>>>>                   +--+--+
>>>>                   | L3  +-SNAT-[20.1.1.1:20000-40000]--eth0
>>>>                   +--+--+
>>>>                      |
>>>>  +-- VRF B ----+  +--- CT Zone 2 --------+
>>>>  | 10.1.1.1/8  +--+ 10.1.1.1 ESTABLISHED |
>>>>  +-------------+  +----------------------+
>>>
>>> So, it's the skb->mark that survives between the containers. I'm not
>>> sure it makes sense to keep a zone 0 from the container that performs
>>> SNAT. Instead, we can probably restore the zone based on the
>>> skb->mark. The problem is that the existing zone is u16. In nftables,
>>> Patrick already mentioned about supporting casting so we can do
>>> something like:
>>>
>>> ct zone set (u16)meta mark
>>>
>>> So you can reserve a part of the skb->mark to map it to the zone. I'm
>>> not very convinced about this.
>>
>> Thanks for the feedback! I'm not yet sure, though, that I have fully
>> understood how the above suggestion addresses the described problem,
>> i.e. how would replies to the SNATed traffic find the correct zone
>> again?
>
> From the original direction, you can set the zone based on the mark:
>
>   -m mark --mark 1 -j CT --zone 1
>
> Then, from the reply direction, you can restore it:
>
>   -m conntrack --ctzone 1 -j MARK --set-mark 1
>   ...
>
> --ctzone is not supported though; it would need a new revision of the
> conntrack match.

Ok, thanks a lot, now I see what you mean. If I'm not missing
something, I see two problems with that: the first is that the zone
match would be linear, e.g. if we support 100 or more zones, we would
need to walk through the rules linearly until we find --mark 100,
right?

The other issue is that from the reply direction (when the packet
comes in with the translated address), we couldn't match in the
connection tracking table on the correct zone. The above restore rule
would assume that the match itself has already taken place and was
successful, no? (That is actually why we are direction based:
--flextuple ORIGINAL|REPLY.)

>> Our issue, simplified, basically boils down to this: given are two
>> zones, both use IP address <A>, and both zones want to talk to IP
>> address <B> in a third zone. To let those two with <A> talk to <B>,
>> connections are being routed + SNATed from a non-unique to a unique
>> address/port tuple [which the proposed approach solves], so they can
>> talk to <B>.
On Mon, May 04, 2015 at 03:47:33PM +0200, Thomas Graf wrote:
[...]
> Given that the multiple source zones which talk to a common
> destination zone may have conflicting IPs, the SNAT must either
> occur in the source zone, where the source address is still unique,
> or the CT tuple must be made unique with a source zone identifier
> so that the SNAT can occur in the destination zone.
>
> Doing the SNAT in the source zone requires using a unique IP pool
> to map to for each source zone, as otherwise IP sources may clash
> again in the destination zone. We obviously can't do --SNAT -to
> 10.1.1.1 in two namespaces and then just route into a third
> namespace. This approach is not scalable in a container environment
> with 100s or even 1000s of containers, each in its own network
> namespace.
>
> What we want to do instead is to do the SNAT in the destination
> zone, where we can have a single SNAT rule which covers all source
> zones. This allows inter-namespace communication in a /31 with
> minimal waste of addresses.

Thanks for explaining. So you need to allocate a unique tuple using
the mark to avoid clashes for the first packet that goes in the
original direction using the same pool. Then, the NAT engine will
allocate a unique tuple in the reply direction.

But what is the use case for -j CT --flextuple reply? By the time you
see the reply packet, the tuple has already been created.

Another question is whether it makes sense to have part of the flows
using your flextuple idea while some others do not, i.e.

  -s x.y.z.w/24 -j CT --flextuple original

so shouldn't this be a global switch that includes the skb->mark only
for packets coming in the original direction?

I also wonder how you're going to deal with port redirections. This
only seems to work for SNAT/masquerade to me, if the NAT happens from
the VRF side.
Hi Pablo,

On 05/06/2015 04:27 PM, Pablo Neira Ayuso wrote:
> On Mon, May 04, 2015 at 03:47:33PM +0200, Thomas Graf wrote:
> [...]
>> Given that the multiple source zones which talk to a common
>> destination zone may have conflicting IPs, the SNAT must either
>> occur in the source zone, where the source address is still unique,
>> or the CT tuple must be made unique with a source zone identifier
>> so that the SNAT can occur in the destination zone.
>>
>> Doing the SNAT in the source zone requires using a unique IP pool
>> to map to for each source zone, as otherwise IP sources may clash
>> again in the destination zone. We obviously can't do --SNAT -to
>> 10.1.1.1 in two namespaces and then just route into a third
>> namespace. This approach is not scalable in a container environment
>> with 100s or even 1000s of containers, each in its own network
>> namespace.
>>
>> What we want to do instead is to do the SNAT in the destination
>> zone, where we can have a single SNAT rule which covers all source
>> zones. This allows inter-namespace communication in a /31 with
>> minimal waste of addresses.
>
> Thanks for explaining. So you need to allocate a unique tuple using
> the mark to avoid clashes for the first packet that goes in the
> original direction using the same pool. Then, the NAT engine will
> allocate a unique tuple in the reply direction.

Yes, that's correct. In the original direction, due to the overlapping
tuple, the ct-mark is considered as well for the match, and in the
reply direction SNAT already chooses a unique tuple. That's
essentially the rationale for our use case with SNAT.

> But what is the use case for -j CT --flextuple reply? By the time you
> see the reply packet, the tuple has already been created.

Given this change is completely NAT agnostic, we can keep it as a
generic addition to the conntracker. Given that the mark is very
flexible, I think it could also be used for load balancing as a
different usage.

> Another question is whether it makes sense to have part of the flows
> using your flextuple idea while some others do not, i.e.
>
>   -s x.y.z.w/24 -j CT --flextuple original
>
> so shouldn't this be a global switch that includes the skb->mark
> only for packets coming in the original direction?

I first thought about a global sysctl switch, but eventually found
this configuration possibility from the iptables side much cleaner
and better integrated. I think if the environment is correctly
configured for that, such a partial flextuple scenario works, too.

> I also wonder how you're going to deal with port redirections. This
> only seems to work for SNAT/masquerade to me, if the NAT happens from
> the VRF side.

In our case, we'd like to use the flextuple when we're explicitly
configuring iptables with SNAT. For DNAT, one could reuse it in a
different, somewhat reversed example to the one we previously had,
together with mark-based routing and a match on the reply side.

Thanks a lot,
Daniel
Hi Daniel,

On Wed, May 06, 2015 at 08:00:42PM +0200, Daniel Borkmann wrote:
> On 05/06/2015 04:27 PM, Pablo Neira Ayuso wrote:
[...]
> >But what is the use case for -j CT --flextuple reply? By the time you
> >see the reply packet, the tuple has already been created.
>
> Given this change is completely NAT agnostic, we can keep it as a
> generic addition to the conntracker. Given that the mark is very
> flexible, I think it could also be used for load balancing as a
> different usage.

The original and reply tuples of the conntrack are set by the first
packet going in the original direction; at that time they are both
inserted into the hashes, and in this scenario it will use the default
mark (0). So if you set --flextuple reply, conntrack will include the
mark as part of the hash tuple for the first reply packet, but that
packet will not match the conntrack object that was created by the
first original packet, and it will result in a new conntrack assuming
that the reply is the original direction. This looks broken to me.

> >Another question is whether it makes sense to have part of the flows
> >using your flextuple idea while some others do not, i.e.
> >
> > -s x.y.z.w/24 -j CT --flextuple original
> >
> >so shouldn't this be a global switch that includes the skb->mark
> >only for packets coming in the original direction?
>
> I first thought about a global sysctl switch, but eventually found
> this configuration possibility from the iptables side much cleaner
> and better integrated. I think if the environment is correctly
> configured for that, such a partial flextuple scenario works, too.

This is consuming two ct status bits; these are exposed to userspace,
and we have a limited number of bits there. The one in the original
direction might be justified for the SNAT case in the specific
scenario that you show. I don't see yet how this can make sense in a
hybrid scenario. We may end up with a packet that can potentially
create and match two different flow objects if this is misconfigured.

> >I also wonder how you're going to deal with port redirections. This
> >only seems to work for SNAT/masquerade to me, if the NAT happens from
> >the VRF side.
>
> In our case, we'd like to use the flextuple when we're explicitly
> configuring iptables with SNAT. For DNAT, one could reuse it in a
> different, somewhat reversed example to the one we previously had,
> together with mark-based routing and a match on the reply side.

OK, so different VRFs with overlapping networks that are redirected to
the port of another destination. That might make sense. Still, the
hybrid scenario and the --flextuple reply need some thinking.

Thanks.
Hi Pablo,

On 05/06/2015 08:50 PM, Pablo Neira Ayuso wrote:
> On Wed, May 06, 2015 at 08:00:42PM +0200, Daniel Borkmann wrote:
>> On 05/06/2015 04:27 PM, Pablo Neira Ayuso wrote:
> [...]

Thanks for your feedback!

...

>>> Another question is whether it makes sense to have part of the flows
>>> using your flextuple idea while some others do not, i.e.
>>>
>>>   -s x.y.z.w/24 -j CT --flextuple original
>>>
>>> so shouldn't this be a global switch that includes the skb->mark
>>> only for packets coming in the original direction?
>>
>> I first thought about a global sysctl switch, but eventually found
>> this configuration possibility from the iptables side much cleaner
>> and better integrated. I think if the environment is correctly
>> configured for that, such a partial flextuple scenario works, too.
>
> This is consuming two ct status bits; these are exposed to userspace,
> and we have a limited number of bits there. The one in the original
> direction might be justified for the SNAT case in the specific
> scenario that you show.

Okay, agreed. I will respin the set with only the --flextuple ORIGINAL
direction allowed, where we'd for now only consume a single status
bit. If later on there's a need to extend this for REPLY (or even
hybrid), we still have the option to extend it.

Thanks,
Daniel
Hi Daniel,

On Thu, May 07, 2015 at 02:01:11PM +0200, Daniel Borkmann wrote:
> ...
> >>>Another question is whether it makes sense to have part of the flows
> >>>using your flextuple idea while some others do not, i.e.
> >>>
> >>> -s x.y.z.w/24 -j CT --flextuple original
> >>>
> >>>so shouldn't this be a global switch that includes the skb->mark
> >>>only for packets coming in the original direction?
> >>
> >>I first thought about a global sysctl switch, but eventually found
> >>this configuration possibility from the iptables side much cleaner
> >>and better integrated. I think if the environment is correctly
> >>configured for that, such a partial flextuple scenario works, too.
> >
> >This is consuming two ct status bits; these are exposed to userspace,
> >and we have a limited number of bits there. The one in the original
> >direction might be justified for the SNAT case in the specific
> >scenario that you show.
>
> Okay, agreed. I will respin the set with only the --flextuple ORIGINAL
> direction allowed, where we'd for now only consume a single status
> bit. If later on there's a need to extend this for REPLY (or even
> hybrid), we still have the option to extend it.

I would like to know if it makes sense to add this later on. Could you
elaborate on a DNAT scenario where this would be useful?

Thanks.
Hi Pablo,

On 05/07/2015 08:10 PM, Pablo Neira Ayuso wrote:
> On Thu, May 07, 2015 at 02:01:11PM +0200, Daniel Borkmann wrote:
>> ...
>>>>> Another question is whether it makes sense to have part of the
>>>>> flows using your flextuple idea while some others do not, i.e.
>>>>>
>>>>>   -s x.y.z.w/24 -j CT --flextuple original
>>>>>
>>>>> so shouldn't this be a global switch that includes the skb->mark
>>>>> only for packets coming in the original direction?
>>>>
>>>> I first thought about a global sysctl switch, but eventually found
>>>> this configuration possibility from the iptables side much cleaner
>>>> and better integrated. I think if the environment is correctly
>>>> configured for that, such a partial flextuple scenario works, too.
>>>
>>> This is consuming two ct status bits; these are exposed to
>>> userspace, and we have a limited number of bits there. The one in
>>> the original direction might be justified for the SNAT case in the
>>> specific scenario that you show.
>>
>> Okay, agreed. I will respin the set with only the --flextuple
>> ORIGINAL direction allowed, where we'd for now only consume a single
>> status bit. If later on there's a need to extend this for REPLY (or
>> even hybrid), we still have the option to extend it.
>
> I would like to know if it makes sense to add this later on. Could you
> elaborate on a DNAT scenario where this would be useful?

What comes to mind in case of hybrid usage for firewalling is that
flextuple in both directions would act similarly to zones; for
example, you could map things like a tunnel id into the u32 space and
include that in the matcher without many additional rules or memory
overhead.

For the reply-only case, I was thinking about a scenario where you'd
have multiple containers behind the DNAT, each with the same ip/port
that a server listens on, and you'd select one of the containers,
e.g. via the xt_statistic module, as a mark and do mark-based routing
behind the DNAT. But that setup still has the source in front of the
DNAT unique, so it wouldn't need mark inclusion in the reply case. So
for reply-only, I currently don't find an intuitive use case.

Best,
Daniel
diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
index 095433b..6d67ab4 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -51,6 +51,8 @@ union nf_conntrack_expect_proto {
 struct nf_conntrack_helper;
 
+#define NF_CT_DEFAULT_MARK 0
+
 /* Must be kept in sync with the classes defined by helpers */
 #define NF_CT_MAX_EXPECT_CLASSES 4
@@ -277,6 +279,28 @@ static inline int nf_ct_is_untracked(const struct nf_conn *ct)
         return test_bit(IPS_UNTRACKED_BIT, &ct->status);
 }
 
+static inline bool nf_ct_is_flextuple(const struct nf_conn *ct,
+                                      const enum ip_conntrack_dir dir)
+{
+        switch (dir) {
+        case IP_CT_DIR_ORIGINAL:
+                return test_bit(IPS_ORIG_FLEXTUPLE_BIT, &ct->status);
+        case IP_CT_DIR_REPLY:
+                return test_bit(IPS_REPL_FLEXTUPLE_BIT, &ct->status);
+        default:
+                return false;
+        }
+}
+
+static inline void nf_ct_init_flextuple(const struct nf_conn *tmpl,
+                                        struct nf_conn *ct)
+{
+        if (test_bit(IPS_ORIG_FLEXTUPLE_BIT, &tmpl->status))
+                __set_bit(IPS_ORIG_FLEXTUPLE_BIT, &ct->status);
+        if (test_bit(IPS_REPL_FLEXTUPLE_BIT, &tmpl->status))
+                __set_bit(IPS_REPL_FLEXTUPLE_BIT, &ct->status);
+}
+
 /* Packet is received from loopback */
 static inline bool nf_is_loopback_packet(const struct sk_buff *skb)
 {
diff --git a/include/net/netfilter/nf_conntrack_core.h b/include/net/netfilter/nf_conntrack_core.h
index f2f0fa3..b4fd6c6 100644
--- a/include/net/netfilter/nf_conntrack_core.h
+++ b/include/net/netfilter/nf_conntrack_core.h
@@ -52,7 +52,7 @@ bool nf_ct_invert_tuple(struct nf_conntrack_tuple *inverse,
 /* Find a connection corresponding to a tuple. */
 struct nf_conntrack_tuple_hash *
-nf_conntrack_find_get(struct net *net, u16 zone,
+nf_conntrack_find_get(struct net *net, u16 zone, u32 mark,
                       const struct nf_conntrack_tuple *tuple);
 
 int __nf_conntrack_confirm(struct sk_buff *skb);
diff --git a/include/uapi/linux/netfilter/nf_conntrack_common.h b/include/uapi/linux/netfilter/nf_conntrack_common.h
index 319f471..b242948 100644
--- a/include/uapi/linux/netfilter/nf_conntrack_common.h
+++ b/include/uapi/linux/netfilter/nf_conntrack_common.h
@@ -91,6 +91,13 @@ enum ip_conntrack_status {
         /* Conntrack got a helper explicitly attached via CT target. */
         IPS_HELPER_BIT = 13,
         IPS_HELPER = (1 << IPS_HELPER_BIT),
+
+        /* Entry is a flexible conntrack tuple match with mark */
+        IPS_ORIG_FLEXTUPLE_BIT = 14,
+        IPS_ORIG_FLEXTUPLE = (1 << IPS_ORIG_FLEXTUPLE_BIT),
+
+        IPS_REPL_FLEXTUPLE_BIT = 15,
+        IPS_REPL_FLEXTUPLE = (1 << IPS_REPL_FLEXTUPLE_BIT),
 };
 
 /* Connection tracking event types */
diff --git a/include/uapi/linux/netfilter/xt_CT.h b/include/uapi/linux/netfilter/xt_CT.h
index 5a688c1..5b3bff6 100644
--- a/include/uapi/linux/netfilter/xt_CT.h
+++ b/include/uapi/linux/netfilter/xt_CT.h
@@ -6,7 +6,12 @@
 enum {
         XT_CT_NOTRACK = 1 << 0,
         XT_CT_NOTRACK_ALIAS = 1 << 1,
-        XT_CT_MASK = XT_CT_NOTRACK | XT_CT_NOTRACK_ALIAS,
+        XT_CT_FLEX_ORIG = 1 << 2,
+        XT_CT_FLEX_REPL = 1 << 3,
+
+        /* Full option mask */
+        XT_CT_MASK = XT_CT_NOTRACK | XT_CT_NOTRACK_ALIAS |
+                     XT_CT_FLEX_ORIG | XT_CT_FLEX_REPL,
 };
 
 struct xt_ct_target_info {
diff --git a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
index 30ad955..13280fb 100644
--- a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
+++ b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
@@ -280,7 +280,8 @@ getorigdst(struct sock *sk, int optval, void __user *user, int *len)
                 return -EINVAL;
         }
 
-        h = nf_conntrack_find_get(sock_net(sk), NF_CT_DEFAULT_ZONE, &tuple);
+        h = nf_conntrack_find_get(sock_net(sk), NF_CT_DEFAULT_ZONE,
+                                  NF_CT_DEFAULT_MARK, &tuple);
         if (h) {
                 struct sockaddr_in sin;
                 struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h);
diff --git a/net/ipv4/netfilter/nf_conntrack_proto_icmp.c b/net/ipv4/netfilter/nf_conntrack_proto_icmp.c
index 80d5554..4fc1e83 100644
--- a/net/ipv4/netfilter/nf_conntrack_proto_icmp.c
+++ b/net/ipv4/netfilter/nf_conntrack_proto_icmp.c
@@ -160,7 +160,7 @@ icmp_error_message(struct net *net, struct nf_conn *tmpl, struct sk_buff *skb,
         *ctinfo = IP_CT_RELATED;
 
-        h = nf_conntrack_find_get(net, zone, &innertuple);
+        h = nf_conntrack_find_get(net, zone, skb->mark, &innertuple);
         if (!h) {
                 pr_debug("icmp_error_message: no match\n");
                 return -NF_ACCEPT;
diff --git a/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c b/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
index 4ba0c34..21af1a7 100644
--- a/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
+++ b/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
@@ -251,7 +251,8 @@ ipv6_getorigdst(struct sock *sk, int optval, void __user *user, int *len)
         if (*len < 0 || (unsigned int) *len < sizeof(sin6))
                 return -EINVAL;
 
-        h = nf_conntrack_find_get(sock_net(sk), NF_CT_DEFAULT_ZONE, &tuple);
+        h = nf_conntrack_find_get(sock_net(sk), NF_CT_DEFAULT_ZONE,
+                                  NF_CT_DEFAULT_MARK, &tuple);
         if (!h) {
                 pr_debug("IP6T_SO_ORIGINAL_DST: Can't find %pI6c/%u-%pI6c/%u.\n",
                          &tuple.src.u3.ip6, ntohs(tuple.src.u.tcp.port),
diff --git a/net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c b/net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c
index 90388d6..9d0a4ce 100644
--- a/net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c
+++ b/net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c
@@ -177,7 +177,7 @@ icmpv6_error_message(struct net *net, struct nf_conn *tmpl,
         *ctinfo = IP_CT_RELATED;
 
-        h = nf_conntrack_find_get(net, zone, &intuple);
+        h = nf_conntrack_find_get(net, zone, skb->mark, &intuple);
         if (!h) {
                 pr_debug("icmpv6_error: no match\n");
                 return -NF_ACCEPT;
diff --git a/net/netfilter/ipvs/ip_vs_nfct.c b/net/netfilter/ipvs/ip_vs_nfct.c
index 5882bbf..f27fc79 100644
--- a/net/netfilter/ipvs/ip_vs_nfct.c
+++ b/net/netfilter/ipvs/ip_vs_nfct.c
@@ -275,7 +275,7 @@ void ip_vs_conn_drop_conntrack(struct ip_vs_conn *cp)
                 __func__, ARG_TUPLE(&tuple), ARG_CONN(cp));
 
         h = nf_conntrack_find_get(ip_vs_conn_net(cp), NF_CT_DEFAULT_ZONE,
-                                  &tuple);
+                                  NF_CT_DEFAULT_MARK, &tuple);
         if (h) {
                 ct = nf_ct_tuplehash_to_ctrack(h);
                 /* Show what happens instead of calling nf_ct_kill() */
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 13fad86..6d15ee0 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -385,19 +385,29 @@ static void death_by_timeout(unsigned long ul_conntrack)
         nf_ct_delete((struct nf_conn *)ul_conntrack, 0, 0);
 }
 
+static inline bool nf_ct_probe_flex(const struct nf_conn *ct,
+                                    enum ip_conntrack_dir dir, u32 mark)
+{
+        return nf_ct_is_flextuple(ct, dir) ? ct->mark == mark : true;
+}
+
 static inline bool
 nf_ct_key_equal(struct nf_conntrack_tuple_hash *h,
                 const struct nf_conntrack_tuple *tuple,
-                u16 zone)
+                u16 zone, u32 mark)
 {
         struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h);
 
         /* A conntrack can be recreated with the equal tuple,
-         * so we need to check that the conntrack is confirmed
+         * so we need to check that the conntrack is confirmed.
+         *
+         * Probing for direction-based flex-tuple is last in
+         * order to filter out most mismatches first.
          */
         return nf_ct_tuple_equal(tuple, &h->tuple) &&
-               nf_ct_zone(ct) == zone &&
-               nf_ct_is_confirmed(ct);
+               nf_ct_zone(ct) == zone &&
+               nf_ct_is_confirmed(ct) &&
+               nf_ct_probe_flex(ct, NF_CT_DIRECTION(h), mark);
 }
 
 /*
@@ -406,7 +416,7 @@ nf_ct_key_equal(struct nf_conntrack_tuple_hash *h,
  * and recheck nf_ct_tuple_equal(tuple, &h->tuple)
  */
 static struct nf_conntrack_tuple_hash *
-____nf_conntrack_find(struct net *net, u16 zone,
+____nf_conntrack_find(struct net *net, u16 zone, u32 mark,
                       const struct nf_conntrack_tuple *tuple, u32 hash)
 {
         struct nf_conntrack_tuple_hash *h;
@@ -419,7 +429,7 @@ ____nf_conntrack_find(struct net *net, u16 zone,
         local_bh_disable();
 begin:
         hlist_nulls_for_each_entry_rcu(h, n, &net->ct.hash[bucket], hnnode) {
-                if (nf_ct_key_equal(h, tuple, zone)) {
+                if (nf_ct_key_equal(h, tuple, zone, mark)) {
                         NF_CT_STAT_INC(net, found);
                         local_bh_enable();
                         return h;
@@ -442,7 +452,7 @@ begin:
 /* Find a connection corresponding to a tuple. */
 static struct nf_conntrack_tuple_hash *
-__nf_conntrack_find_get(struct net *net, u16 zone,
+__nf_conntrack_find_get(struct net *net, u16 zone, u32 mark,
                         const struct nf_conntrack_tuple *tuple, u32 hash)
 {
         struct nf_conntrack_tuple_hash *h;
@@ -450,14 +460,14 @@ __nf_conntrack_find_get(struct net *net, u16 zone,
         rcu_read_lock();
 begin:
-        h = ____nf_conntrack_find(net, zone, tuple, hash);
+        h = ____nf_conntrack_find(net, zone, mark, tuple, hash);
         if (h) {
                 ct = nf_ct_tuplehash_to_ctrack(h);
                 if (unlikely(nf_ct_is_dying(ct) ||
                              !atomic_inc_not_zero(&ct->ct_general.use)))
                         h = NULL;
                 else {
-                        if (unlikely(!nf_ct_key_equal(h, tuple, zone))) {
+                        if (unlikely(!nf_ct_key_equal(h, tuple, zone, mark))) {
                                 nf_ct_put(ct);
                                 goto begin;
                         }
@@ -469,10 +479,10 @@ begin:
 }
 
 struct nf_conntrack_tuple_hash *
-nf_conntrack_find_get(struct net *net, u16 zone,
+nf_conntrack_find_get(struct net *net, u16 zone, u32 mark,
                       const struct nf_conntrack_tuple *tuple)
 {
-        return __nf_conntrack_find_get(net, zone, tuple,
+        return __nf_conntrack_find_get(net, zone, mark, tuple,
                                        hash_conntrack_raw(tuple, zone));
 }
 EXPORT_SYMBOL_GPL(nf_conntrack_find_get);
@@ -920,6 +930,9 @@ init_conntrack(struct net *net, struct nf_conn *tmpl,
                 nfct_synproxy_ext_add(ct);
         }
 
+        if (tmpl)
+                nf_ct_init_flextuple(tmpl, ct);
+
         timeout_ext = tmpl ? nf_ct_timeout_find(tmpl) : NULL;
         if (timeout_ext)
                 timeouts = NF_CT_TIMEOUT_EXT_DATA(timeout_ext);
@@ -1019,7 +1032,7 @@ resolve_normal_ct(struct net *net, struct nf_conn *tmpl,
         /* look for tuple match */
         hash = hash_conntrack_raw(&tuple, zone);
-        h = __nf_conntrack_find_get(net, zone, &tuple, hash);
+        h = __nf_conntrack_find_get(net, zone, skb->mark, &tuple, hash);
         if (!h) {
                 h = init_conntrack(net, tmpl, &tuple, l3proto,
                                    l4proto, skb, dataoff, hash);
diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index d1c2394..03265d21 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -1078,7 +1078,7 @@ ctnetlink_del_conntrack(struct sock *ctnl, struct sk_buff *skb,
         if (err < 0)
                 return err;
 
-        h = nf_conntrack_find_get(net, zone, &tuple);
+        h = nf_conntrack_find_get(net, zone, NF_CT_DEFAULT_MARK, &tuple);
         if (!h)
                 return -ENOENT;
@@ -1147,7 +1147,7 @@ ctnetlink_get_conntrack(struct sock *ctnl, struct sk_buff *skb,
         if (err < 0)
                 return err;
 
-        h = nf_conntrack_find_get(net, zone, &tuple);
+        h = nf_conntrack_find_get(net, zone, NF_CT_DEFAULT_MARK, &tuple);
         if (!h)
                 return -ENOENT;
@@ -1765,7 +1765,7 @@ ctnetlink_create_conntrack(struct net *net, u16 zone,
         if (err < 0)
                 goto err2;
 
-        master_h = nf_conntrack_find_get(net, zone, &master);
+        master_h = nf_conntrack_find_get(net, zone, NF_CT_DEFAULT_MARK, &master);
         if (master_h == NULL) {
                 err = -ENOENT;
                 goto err2;
@@ -1824,9 +1824,9 @@ ctnetlink_new_conntrack(struct sock *ctnl, struct sk_buff *skb,
         }
 
         if (cda[CTA_TUPLE_ORIG])
-                h = nf_conntrack_find_get(net, zone, &otuple);
+                h = nf_conntrack_find_get(net, zone, NF_CT_DEFAULT_MARK, &otuple);
         else if (cda[CTA_TUPLE_REPLY])
-                h =
nf_conntrack_find_get(net, zone, &rtuple); + h = nf_conntrack_find_get(net, zone, NF_CT_DEFAULT_MARK, &rtuple); if (h == NULL) { err = -ENOENT; @@ -2628,7 +2628,7 @@ static int ctnetlink_dump_exp_ct(struct sock *ctnl, struct sk_buff *skb, return err; } - h = nf_conntrack_find_get(net, zone, &tuple); + h = nf_conntrack_find_get(net, zone, NF_CT_DEFAULT_MARK, &tuple); if (!h) return -ENOENT; @@ -2960,7 +2960,7 @@ ctnetlink_create_expect(struct net *net, u16 zone, return err; /* Look for master conntrack of this expectation */ - h = nf_conntrack_find_get(net, zone, &master_tuple); + h = nf_conntrack_find_get(net, zone, NF_CT_DEFAULT_MARK, &master_tuple); if (!h) return -ENOENT; ct = nf_ct_tuplehash_to_ctrack(h); diff --git a/net/netfilter/nf_conntrack_pptp.c b/net/netfilter/nf_conntrack_pptp.c index 825c3e3..ce965af 100644 --- a/net/netfilter/nf_conntrack_pptp.c +++ b/net/netfilter/nf_conntrack_pptp.c @@ -150,7 +150,7 @@ static int destroy_sibling_or_exp(struct net *net, struct nf_conn *ct, pr_debug("trying to timeout ct or exp for tuple "); nf_ct_dump_tuple(t); - h = nf_conntrack_find_get(net, zone, t); + h = nf_conntrack_find_get(net, zone, ct->mark, t); if (h) { sibling = nf_ct_tuplehash_to_ctrack(h); pr_debug("setting timeout of conntrack %p to 0\n", sibling); diff --git a/net/netfilter/xt_CT.c b/net/netfilter/xt_CT.c index 75747ae..b1d9b27 100644 --- a/net/netfilter/xt_CT.c +++ b/net/netfilter/xt_CT.c @@ -228,6 +228,11 @@ static int xt_ct_tg_check(const struct xt_tgchk_param *par, goto err3; } + if (info->flags & XT_CT_FLEX_ORIG) + set_bit(IPS_ORIG_FLEXTUPLE_BIT, &ct->status); + if (info->flags & XT_CT_FLEX_REPL) + set_bit(IPS_REPL_FLEXTUPLE_BIT, &ct->status); + nf_conntrack_tmpl_insert(par->net, ct); out: info->ct = ct; diff --git a/net/netfilter/xt_connlimit.c b/net/netfilter/xt_connlimit.c index 29ba621..2fa6551 100644 --- a/net/netfilter/xt_connlimit.c +++ b/net/netfilter/xt_connlimit.c @@ -134,7 +134,7 @@ static bool add_hlist(struct hlist_head *head, static 
unsigned int check_hlist(struct net *net, struct hlist_head *head, const struct nf_conntrack_tuple *tuple, - u16 zone, + u16 zone, u32 mark, bool *addit) { const struct nf_conntrack_tuple_hash *found; @@ -148,7 +148,7 @@ static unsigned int check_hlist(struct net *net, /* check the saved connections */ hlist_for_each_entry_safe(conn, n, head, node) { - found = nf_conntrack_find_get(net, zone, &conn->tuple); + found = nf_conntrack_find_get(net, zone, mark, &conn->tuple); if (found == NULL) { hlist_del(&conn->node); kmem_cache_free(connlimit_conn_cachep, conn); @@ -201,7 +201,7 @@ static unsigned int count_tree(struct net *net, struct rb_root *root, const struct nf_conntrack_tuple *tuple, const union nf_inet_addr *addr, const union nf_inet_addr *mask, - u8 family, u16 zone) + u8 family, u16 zone, u32 mark) { struct xt_connlimit_rb *gc_nodes[CONNLIMIT_GC_MAX_NODES]; struct rb_node **rbnode, *parent; @@ -229,7 +229,8 @@ count_tree(struct net *net, struct rb_root *root, } else { /* same source network -> be counted! 
*/ unsigned int count; - count = check_hlist(net, &rbconn->hhead, tuple, zone, &addit); + count = check_hlist(net, &rbconn->hhead, tuple, zone, + mark, &addit); tree_nodes_free(root, gc_nodes, gc_count); if (!addit) @@ -245,7 +246,7 @@ count_tree(struct net *net, struct rb_root *root, continue; /* only used for GC on hhead, retval and 'addit' ignored */ - check_hlist(net, &rbconn->hhead, tuple, zone, &addit); + check_hlist(net, &rbconn->hhead, tuple, zone, mark, &addit); if (hlist_empty(&rbconn->hhead)) gc_nodes[gc_count++] = rbconn; } @@ -290,7 +291,7 @@ static int count_them(struct net *net, const struct nf_conntrack_tuple *tuple, const union nf_inet_addr *addr, const union nf_inet_addr *mask, - u_int8_t family, u16 zone) + u_int8_t family, u16 zone, u32 mark) { struct rb_root *root; int count; @@ -306,7 +307,7 @@ static int count_them(struct net *net, spin_lock_bh(&xt_connlimit_locks[hash % CONNLIMIT_LOCK_SLOTS]); - count = count_tree(net, root, tuple, addr, mask, family, zone); + count = count_tree(net, root, tuple, addr, mask, family, zone, mark); spin_unlock_bh(&xt_connlimit_locks[hash % CONNLIMIT_LOCK_SLOTS]); @@ -346,7 +347,7 @@ connlimit_mt(const struct sk_buff *skb, struct xt_action_param *par) } connections = count_them(net, info->data, tuple_ptr, &addr, - &info->mask, par->family, zone); + &info->mask, par->family, zone, skb->mark); if (connections == 0) /* kmalloc failed, drop it entirely */ goto hotdrop; diff --git a/net/sched/act_connmark.c b/net/sched/act_connmark.c index 8e47251..a0385f3 100644 --- a/net/sched/act_connmark.c +++ b/net/sched/act_connmark.c @@ -71,7 +71,8 @@ static int tcf_connmark(struct sk_buff *skb, const struct tc_action *a, proto, &tuple)) goto out; - thash = nf_conntrack_find_get(dev_net(skb->dev), ca->zone, &tuple); + thash = nf_conntrack_find_get(dev_net(skb->dev), ca->zone, + skb->mark, &tuple); if (!thash) goto out;