From patchwork Sat Sep 5 01:04:20 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Justin Pettit
X-Patchwork-Id: 514782
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Received: from archives.nicira.com (li376-54.members.linode.com
[96.126.127.54]) by ozlabs.org (Postfix) with ESMTP id 7914114029C
for ;
Sat, 5 Sep 2015 11:04:29 +1000 (AEST)
Received: from archives.nicira.com (localhost [127.0.0.1])
by archives.nicira.com (Postfix) with ESMTP id 39DD510C0C;
Fri, 4 Sep 2015 18:04:28 -0700 (PDT)
X-Original-To: dev@openvswitch.org
Delivered-To: dev@openvswitch.org
Received: from mx1e3.cudamail.com (mx1.cudamail.com [69.90.118.67])
by archives.nicira.com (Postfix) with ESMTPS id 2BBAB10C0A
for ; Fri, 4 Sep 2015 18:04:27 -0700 (PDT)
Received: from bar5.cudamail.com (localhost [127.0.0.1])
by mx1e3.cudamail.com (Postfix) with ESMTPS id 62B3F420326
for ; Fri, 4 Sep 2015 19:04:26 -0600 (MDT)
X-ASG-Debug-ID: 1441415065-09eadd069586c20001-byXFYA
Received: from mx1-pf1.cudamail.com ([192.168.24.1]) by bar5.cudamail.com
with
ESMTP id OeVp6kB6J56D6bV1 (version=TLSv1 cipher=DHE-RSA-AES256-SHA
bits=256 verify=NO) for ;
Fri, 04 Sep 2015 19:04:25 -0600 (MDT)
X-Barracuda-Envelope-From: jpettit@nicira.com
X-Barracuda-RBL-Trusted-Forwarder: 192.168.24.1
Received: from unknown (HELO mail-pa0-f52.google.com) (209.85.220.52)
by mx1-pf1.cudamail.com with ESMTPS (RC4-SHA encrypted);
5 Sep 2015 01:04:25 -0000
Received-SPF: unknown (mx1-pf1.cudamail.com: Multiple SPF records returned)
X-Barracuda-RBL-Trusted-Forwarder: 209.85.220.52
Received: by padhy16 with SMTP id hy16so36622292pad.1
for ; Fri, 04 Sep 2015 18:04:24 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
d=1e100.net; s=20130820;
h=x-gm-message-state:from:to:subject:date:message-id;
bh=9yuc6tihjz92TE6DSWQ5FGU8jbXv4+YsJSOkOChTu00=;
b=EEGxPxlpLeXzVUmLMEGTFdooNyq811ZkJDthOx6TmwJUCdkyox0pemMF8nqbTuZ2md
v5rPha7nOUXAzvL/o7iSqYsVIfcNHoL7zswStDAVlAdQh5Oxd3KyHxGx1uYOuLarN2HL
dr9DXwIP86pAd8YLtkBhN6w660mLQBR9NWrHdmhrQ87Y7ykep/IYdEwwH0rRd+tUabje
lxjS3Sc6jMTxWrJ6fSSCduIZUTeUmqccCuhaPyBkFZ+ufPUyEeCvVa1lexMcAM05XPsE
+xK7m0lPoMfaaYau6kqmSWVi1wreUANjFVsERxjJRqZH22IuoVW4znMsUOlCWb8Cs0Cn
xlIA==
X-Gm-Message-State:
ALoCoQls5J3/3VDLeaI5+z37SwROPUhtPBCQJGU/cos2bhNml/VjhkU46oKXkr6agF63ZPEEajU8
X-Received: by 10.66.66.40 with SMTP id c8mr14576711pat.70.1441415064505;
Fri, 04 Sep 2015 18:04:24 -0700 (PDT)
Received: from localhost.localdomain ([208.91.2.4])
by smtp.gmail.com with ESMTPSA id
eg2sm3894415pad.44.2015.09.04.18.04.23
for (version=TLSv1/SSLv3 cipher=OTHER);
Fri, 04 Sep 2015 18:04:23 -0700 (PDT)
X-CudaMail-Envelope-Sender: jpettit@nicira.com
X-Barracuda-Apparent-Source-IP: 208.91.2.4
From: Justin Pettit
To: dev@openvswitch.org
X-CudaMail-Whitelist-To: dev@openvswitch.org
X-CudaMail-MID: CM-E1-903079072
X-CudaMail-DTE: 090415
X-CudaMail-Originating-IP: 209.85.220.52
Date: Fri, 4 Sep 2015 18:04:20 -0700
X-ASG-Orig-Subj: [##CM-E1-903079072##][RFC PATCH] ovn: Add stateful ACL
support.
Message-Id: <1441415060-93194-1-git-send-email-jpettit@nicira.com>
X-Mailer: git-send-email 1.7.5.4
X-Barracuda-Connect: UNKNOWN[192.168.24.1]
X-Barracuda-Start-Time: 1441415065
X-Barracuda-Encrypted: DHE-RSA-AES256-SHA
X-Barracuda-URL: https://web.cudamail.com:443/cgi-mod/mark.cgi
X-ASG-Whitelist: Header =?UTF-8?B?eFwtY3VkYW1haWxcLXdoaXRlbGlzdFwtdG8=?=
X-Virus-Scanned: by bsmtpd at cudamail.com
X-Barracuda-BRTS-Status: 1
Subject: [ovs-dev] [RFC PATCH] ovn: Add stateful ACL support.
X-BeenThere: dev@openvswitch.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
MIME-Version: 1.0
Errors-To: dev-bounces@openvswitch.org
Sender: "dev"
Add support for the "allow-related" ACL action. This is dependent on
the OVS conntrack functionality, which is not available on all platforms
or kernel versions.
Here is a sample policy that will allow all tenants in logical switch
"ls0" to SSH to each other. Anyone can make an HTTP request to "lp0".
All other IP traffic is dropped:
ovn-nbctl acl-add ls0 from-lport 100 ip allow-related
ovn-nbctl acl-add ls0 to-lport 100 tcp.dst==22 allow-related
ovn-nbctl acl-add ls0 to-lport 100 "outport == \"lp0\" \
&& tcp.dst==80" allow-related
ovn-nbctl acl-add ls0 to-lport 1 ip drop
-=-=-=-=-=-=-=-=-=-
NOTE: This is an RFC. I would like some feedback on the overall design
and whether it works as expected. It has a number of dependencies on
features not yet available in the master of OVS. As such, it is
probably easiest to try this patch from the following repo:
https://github.com/justinpettit/ovs/tree/ovn-acl
Once the prerequisites make it to the main OVS repo, I'll send out a
non-RFC version.
---
ovn/TODO | 8 ++
ovn/controller/binding.c | 43 ++++++++++
ovn/controller/lflow.c | 13 +++-
ovn/controller/lflow.h | 4 +-
ovn/controller/ovn-controller.8.xml | 19 +++++
ovn/controller/ovn-controller.c | 50 +++++++++---
ovn/controller/ovn-controller.h | 7 ++
ovn/controller/physical.c | 16 ++++-
ovn/lib/actions.c | 46 +++++++++--
ovn/lib/actions.h | 13 ++--
ovn/northd/ovn-northd.c | 152 +++++++++++++++++++++++++++++------
ovn/ovn-architecture.7.xml | 8 ++
ovn/ovn-sb.xml | 39 +++++++--
tests/test-ovn.c | 8 +-
14 files changed, 363 insertions(+), 63 deletions(-)
diff --git a/ovn/TODO b/ovn/TODO
index 356b3ba..260ddaa 100644
--- a/ovn/TODO
+++ b/ovn/TODO
@@ -80,3 +80,11 @@
So far, both ovn-controller and ovn-controller-vtep only allow
chassis to have one tunnel encapsulation entry. We should extend
the implementation to support multiple tunnel encapsulations.
+
+* ACL
+
+** Support FTP ALGs.
+
+** Support reject action.
+
+** Support log option.
diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c
index fca2430..492dd39 100644
--- a/ovn/controller/binding.c
+++ b/ovn/controller/binding.c
@@ -16,6 +16,7 @@
#include
#include "binding.h"
+#include "lib/bitmap.h"
#include "lib/sset.h"
#include "lib/util.h"
#include "lib/vswitch-idl.h"
@@ -71,6 +72,46 @@ get_local_iface_ids(const struct ovsrec_bridge *br_int, struct sset *lports)
}
}
+static void
+update_ct_zones(struct controller_ctx *ctx, struct sset *lports)
+{
+ struct simap_node *ct_zone, *ct_zone_next;
+ const char *iface_id;
+
+ /* xxx This is wasteful to assign a zone to each port--even if no
+ * xxx security policy is applied. */
+
+ /* Delete any zones that are associated with removed ports. */
+ SIMAP_FOR_EACH_SAFE(ct_zone, ct_zone_next, &ctx->ct_zones) {
+ if (!sset_contains(lports, ct_zone->name)) {
+ bitmap_set0(ctx->ct_zone_bitmap, ct_zone->data);
+ simap_delete(&ctx->ct_zones, ct_zone);
+ }
+ }
+
+ /* Assign a unique zone id for each logical port. */
+ SSET_FOR_EACH(iface_id, lports) {
+ size_t zone;
+
+ if (simap_contains(&ctx->ct_zones, iface_id)) {
+ continue;
+ }
+
+ /* We assume that there are 64K zones and that we own them all. */
+ zone = bitmap_scan(ctx->ct_zone_bitmap, 0, 1, MAX_CT_ZONES + 1);
+ if (zone == MAX_CT_ZONES + 1) {
+ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
+ VLOG_WARN_RL(&rl, "exhausted all ct zones");
+ return;
+ }
+
+ bitmap_set1(ctx->ct_zone_bitmap, zone);
+ simap_put(&ctx->ct_zones, iface_id, zone);
+
+ /* xxx This should make call to erase any old entries for this zone. */
+ }
+}
+
void
binding_run(struct controller_ctx *ctx, const struct ovsrec_bridge *br_int,
const char *chassis_id)
@@ -97,6 +138,7 @@ binding_run(struct controller_ctx *ctx, const struct ovsrec_bridge *br_int,
/* We have no integration bridge, therefore no local logical ports.
* We'll remove our chassis from all port binding records below. */
}
+ update_ct_zones(ctx, &lports);
sset_clone(&all_lports, &lports);
ovsdb_idl_txn_add_comment(
@@ -141,6 +183,7 @@ binding_cleanup(struct controller_ctx *ctx, const char *chassis_id)
if (!chassis_id) {
return true;
}
+
const struct sbrec_chassis *chassis_rec
= get_chassis_by_name(ctx->ovnsb_idl, chassis_id);
if (!chassis_rec) {
diff --git a/ovn/controller/lflow.c b/ovn/controller/lflow.c
index 9246e61..fa61247 100644
--- a/ovn/controller/lflow.c
+++ b/ovn/controller/lflow.c
@@ -58,6 +58,15 @@ symtab_init(void)
MFF_LOG_REGS;
#undef MFF_LOG_REG
+ /* Connection tracking state. */
+ expr_symtab_add_field(&symtab, "ct_state", MFF_CT_STATE, NULL, false);
+ expr_symtab_add_predicate(&symtab, "ct.trk", "ct_state[7]");
+ expr_symtab_add_subfield(&symtab, "ct.new", "ct.trk", "ct_state[0]");
+ expr_symtab_add_subfield(&symtab, "ct.est", "ct.trk", "ct_state[1]");
+ expr_symtab_add_subfield(&symtab, "ct.rel", "ct.trk", "ct_state[2]");
+ expr_symtab_add_subfield(&symtab, "ct.inv", "ct.trk", "ct_state[5]");
+ expr_symtab_add_subfield(&symtab, "ct.rpl", "ct.trk", "ct_state[6]");
+
/* Data fields. */
expr_symtab_add_field(&symtab, "eth.src", MFF_ETH_SRC, NULL, false);
expr_symtab_add_field(&symtab, "eth.dst", MFF_ETH_DST, NULL, false);
@@ -284,8 +293,8 @@ lflow_run(struct controller_ctx *ctx, struct hmap *flow_table)
ofpbuf_use_stub(&ofpacts, ofpacts_stub, sizeof ofpacts_stub);
error = actions_parse_string(lflow->actions, &symtab, &ldp->ports,
- next_phys_table, output_phys_table,
- &ofpacts, &prereqs);
+ &ctx->ct_zones, next_phys_table,
+ output_phys_table, &ofpacts, &prereqs);
if (error) {
static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
VLOG_WARN_RL(&rl, "error parsing actions \"%s\": %s",
diff --git a/ovn/controller/lflow.h b/ovn/controller/lflow.h
index 59fe559..0e3e51e 100644
--- a/ovn/controller/lflow.h
+++ b/ovn/controller/lflow.h
@@ -55,6 +55,7 @@ struct uuid;
/* Logical fields. */
#define MFF_LOG_DATAPATH MFF_METADATA /* Logical datapath (64 bits). */
+#define MFF_LOG_CT_ZONE MFF_REG5 /* Logical conntrack zone (32 bits). */
#define MFF_LOG_INPORT MFF_REG6 /* Logical input port (32 bits). */
#define MFF_LOG_OUTPORT MFF_REG7 /* Logical output port (32 bits). */
@@ -66,8 +67,7 @@ struct uuid;
MFF_LOG_REG(MFF_REG1) \
MFF_LOG_REG(MFF_REG2) \
MFF_LOG_REG(MFF_REG3) \
- MFF_LOG_REG(MFF_REG4) \
- MFF_LOG_REG(MFF_REG5)
+ MFF_LOG_REG(MFF_REG4)
void lflow_init(void);
void lflow_run(struct controller_ctx *, struct hmap *flow_table);
diff --git a/ovn/controller/ovn-controller.8.xml b/ovn/controller/ovn-controller.8.xml
index e1cb6a2..acca510 100644
--- a/ovn/controller/ovn-controller.8.xml
+++ b/ovn/controller/ovn-controller.8.xml
@@ -119,4 +119,23 @@
This behavior should be changed.
+
+ RUNTIME MANAGEMENT COMMANDS
+
+ ovs-appctl
can send commands to a running
+ ovn-controller
process. The currently supported
+ commands are described below.
+
+ exit
+ -
+ Causes
ovn-controller
to gracefully terminate.
+
+
+ ct-zone-list
+ -
+ Lists each local logical port and its connection tracking zone.
+
+
+
+
diff --git a/ovn/controller/ovn-controller.c b/ovn/controller/ovn-controller.c
index 8e93a0f..8314e04 100644
--- a/ovn/controller/ovn-controller.c
+++ b/ovn/controller/ovn-controller.c
@@ -27,6 +27,7 @@
#include "compiler.h"
#include "daemon.h"
#include "dirs.h"
+#include "dynamic-string.h"
#include "openvswitch/vconn.h"
#include "openvswitch/vlog.h"
#include "ovn/lib/ovn-sb-idl.h"
@@ -49,6 +50,7 @@
VLOG_DEFINE_THIS_MODULE(main);
static unixctl_cb_func ovn_controller_exit;
+static unixctl_cb_func ct_zone_list;
#define DEFAULT_BRIDGE_NAME "br-int"
@@ -119,6 +121,7 @@ int
main(int argc, char *argv[])
{
struct unixctl_server *unixctl;
+ struct controller_ctx ctx;
bool exiting;
int retval;
@@ -134,6 +137,7 @@ main(int argc, char *argv[])
exit(EXIT_FAILURE);
}
unixctl_command_register("exit", "", 0, 0, ovn_controller_exit, &exiting);
+ unixctl_command_register("ct-zone-list", "", 0, 0, ct_zone_list, &ctx);
daemonize_complete();
@@ -162,15 +166,20 @@ main(int argc, char *argv[])
ovsdb_idl_create(ovnsb_remote, &sbrec_idl_class, true, true));
ovsdb_idl_get_initial_snapshot(ovnsb_idl_loop.idl);
+ /* Initialize connection tracking zones. */
+ simap_init(&ctx.ct_zones);
+ ctx.ct_zone_bitmap = bitmap_allocate(MAX_CT_ZONES);
+
+ /* We never use zone 0. */
+ bitmap_set1(ctx.ct_zone_bitmap, 0);
+
/* Main loop. */
exiting = false;
while (!exiting) {
- struct controller_ctx ctx = {
- .ovs_idl = ovs_idl_loop.idl,
- .ovs_idl_txn = ovsdb_idl_loop_run(&ovs_idl_loop),
- .ovnsb_idl = ovnsb_idl_loop.idl,
- .ovnsb_idl_txn = ovsdb_idl_loop_run(&ovnsb_idl_loop),
- };
+ ctx.ovs_idl = ovs_idl_loop.idl;
+ ctx.ovs_idl_txn = ovsdb_idl_loop_run(&ovs_idl_loop);
+ ctx.ovnsb_idl = ovnsb_idl_loop.idl;
+ ctx.ovnsb_idl_txn = ovsdb_idl_loop_run(&ovnsb_idl_loop);
const struct ovsrec_bridge *br_int = get_br_int(ctx.ovs_idl);
const char *chassis_id = get_chassis_id(ctx.ovs_idl);
@@ -213,12 +222,10 @@ main(int argc, char *argv[])
/* It's time to exit. Clean up the databases. */
bool done = false;
while (!done) {
- struct controller_ctx ctx = {
- .ovs_idl = ovs_idl_loop.idl,
- .ovs_idl_txn = ovsdb_idl_loop_run(&ovs_idl_loop),
- .ovnsb_idl = ovnsb_idl_loop.idl,
- .ovnsb_idl_txn = ovsdb_idl_loop_run(&ovnsb_idl_loop),
- };
+ ctx.ovs_idl = ovs_idl_loop.idl;
+ ctx.ovs_idl_txn = ovsdb_idl_loop_run(&ovs_idl_loop);
+ ctx.ovnsb_idl = ovnsb_idl_loop.idl;
+ ctx.ovnsb_idl_txn = ovsdb_idl_loop_run(&ovnsb_idl_loop);
const struct ovsrec_bridge *br_int = get_br_int(ctx.ovs_idl);
const char *chassis_id = get_chassis_id(ctx.ovs_idl);
@@ -241,6 +248,9 @@ main(int argc, char *argv[])
lflow_destroy();
ofctrl_destroy();
+ simap_destroy(&ctx.ct_zones);
+ bitmap_free(ctx.ct_zone_bitmap);
+
ovsdb_idl_loop_destroy(&ovs_idl_loop);
ovsdb_idl_loop_destroy(&ovnsb_idl_loop);
@@ -341,3 +351,19 @@ ovn_controller_exit(struct unixctl_conn *conn, int argc OVS_UNUSED,
unixctl_command_reply(conn, NULL);
}
+
+static void
+ct_zone_list(struct unixctl_conn *conn, int argc OVS_UNUSED,
+ const char *argv[] OVS_UNUSED, void *aux)
+{
+ const struct controller_ctx *ctx = aux;
+ struct ds ds = DS_EMPTY_INITIALIZER;
+ struct simap_node *zone;
+
+ SIMAP_FOR_EACH(zone, &ctx->ct_zones) {
+ ds_put_format(&ds, "%s %d\n", zone->name, zone->data);
+ }
+
+ unixctl_command_reply(conn, ds_cstr(&ds));
+ ds_destroy(&ds);
+}
diff --git a/ovn/controller/ovn-controller.h b/ovn/controller/ovn-controller.h
index be89b5f..932decd 100644
--- a/ovn/controller/ovn-controller.h
+++ b/ovn/controller/ovn-controller.h
@@ -17,14 +17,21 @@
#ifndef OVN_CONTROLLER_H
#define OVN_CONTROLLER_H 1
+#include "simap.h"
#include "ovn/lib/ovn-sb-idl.h"
+/* Linux supports a maximum of 64K zones, which seems like a fine default. */
+#define MAX_CT_ZONES 65535
+
struct controller_ctx {
struct ovsdb_idl *ovnsb_idl;
struct ovsdb_idl_txn *ovnsb_idl_txn;
struct ovsdb_idl *ovs_idl;
struct ovsdb_idl_txn *ovs_idl_txn;
+
+ struct simap ct_zones; /* Port to conntrack zone mappings. */
+ unsigned long *ct_zone_bitmap; /* Bitmap of assigned zones. */
};
static inline const struct sbrec_chassis *
diff --git a/ovn/controller/physical.c b/ovn/controller/physical.c
index 2ec0ba9..d12c5dc 100644
--- a/ovn/controller/physical.c
+++ b/ovn/controller/physical.c
@@ -235,6 +235,7 @@ physical_run(struct controller_ctx *ctx, enum mf_field_id mff_ovn_geneve,
struct match match;
if (!tun) {
+ int zone_id = simap_get(&ctx->ct_zones, binding->logical_port);
/* Packets that arrive from a vif can belong to a VM or
* to a container located inside that VM. Packets that
* arrive from containers have a tag (vlan) associated with them.
@@ -251,6 +252,7 @@ physical_run(struct controller_ctx *ctx, enum mf_field_id mff_ovn_geneve,
* input port, MFF_LOG_DATAPATH to the logical datapath, and
* resubmit into the logical ingress pipeline starting at table
* 16. */
+
match_init_catchall(&match);
ofpbuf_clear(&ofpacts);
match_set_in_port(&match, ofport);
@@ -258,6 +260,10 @@ physical_run(struct controller_ctx *ctx, enum mf_field_id mff_ovn_geneve,
match_set_dl_vlan(&match, htons(tag));
}
+ if (zone_id) {
+ put_load(zone_id, MFF_LOG_CT_ZONE, 0, 32, &ofpacts);
+ }
+
/* Set MFF_LOG_DATAPATH and MFF_LOG_INPORT. */
put_load(binding->datapath->tunnel_key, MFF_LOG_DATAPATH, 0, 64,
&ofpacts);
@@ -289,6 +295,10 @@ physical_run(struct controller_ctx *ctx, enum mf_field_id mff_ovn_geneve,
match_set_reg(&match, MFF_LOG_OUTPORT - MFF_REG0,
binding->tunnel_key);
+ if (zone_id) {
+ put_load(zone_id, MFF_LOG_CT_ZONE, 0, 32, &ofpacts);
+ }
+
/* Resubmit to table 34. */
put_resubmit(OFTABLE_DROP_LOOPBACK, &ofpacts);
ofctrl_add_flow(flow_table, OFTABLE_LOCAL_OUTPUT, 100, &match,
@@ -396,6 +406,10 @@ physical_run(struct controller_ctx *ctx, enum mf_field_id mff_ovn_geneve,
continue;
}
+ int zone_id = simap_get(&ctx->ct_zones, port->logical_port);
+ if (zone_id) {
+ put_load(zone_id, MFF_LOG_CT_ZONE, 0, 32, &ofpacts);
+ }
if (simap_contains(&lport_to_ofport, port->logical_port)) {
put_load(port->tunnel_key, MFF_LOG_OUTPORT, 0, 32, &ofpacts);
put_resubmit(OFTABLE_DROP_LOOPBACK, &ofpacts);
@@ -504,7 +518,7 @@ physical_run(struct controller_ctx *ctx, enum mf_field_id mff_ovn_geneve,
ofpbuf_clear(&ofpacts);
#define MFF_LOG_REG(ID) put_load(0, ID, 0, 32, &ofpacts);
MFF_LOG_REGS;
-#undef MFF_LOG_REGS
+ #undef MFF_LOG_REGS
put_resubmit(OFTABLE_LOG_EGRESS_PIPELINE, &ofpacts);
ofctrl_add_flow(flow_table, OFTABLE_DROP_LOOPBACK, 0, &match, &ofpacts);
diff --git a/ovn/lib/actions.c b/ovn/lib/actions.c
index 0a0158a..0f9675f 100644
--- a/ovn/lib/actions.c
+++ b/ovn/lib/actions.c
@@ -24,6 +24,7 @@
#include "lex.h"
#include "ofp-actions.h"
#include "ofpbuf.h"
+#include "simap.h"
/* Context maintained during actions_parse(). */
struct action_context {
@@ -33,6 +34,7 @@ struct action_context {
uint8_t next_table_id; /* OpenFlow table for 'next' to resubmit. */
uint8_t output_table_id; /* OpenFlow table for 'output' to resubmit. */
const struct simap *ports; /* Map from port name to number. */
+ const struct simap *ct_zones; /* Map from port name to conntrack zone. */
/* State. */
char *error; /* Error, if any, otherwise NULL. */
@@ -131,6 +133,30 @@ emit_resubmit(struct action_context *ctx, uint8_t table_id)
}
static void
+emit_ct(struct action_context *ctx, bool recirc, bool commit)
+{
+ struct ofpact_conntrack *ct = ofpact_put_CT(ctx->ofpacts);
+ ct->flags |= recirc ? NX_CT_F_RECIRC : 0;
+ ct->flags |= commit ? NX_CT_F_COMMIT : 0;
+ if (recirc) {
+ /* If "recirc" is set, we automatically go to the next table. */
+ ct->next_table = ctx->next_table_id;
+ }
+ /* xxx Should remove hard-coding reg5 if we refactor library. */
+ ct->src.field = mf_from_id(MFF_REG5);
+ ct->src.ofs = 0;
+ ct->src.n_bits = 32;
+
+ /* CT only works with IP, so set up a prerequisite. */
+ struct expr *expr;
+ char *error;
+
+ expr = expr_parse_string("ip", ctx->symtab, &error);
+ ovs_assert(!error);
+ ctx->prereqs = expr_combine(EXPR_T_AND, ctx->prereqs, expr);
+}
+
+static void
parse_actions(struct action_context *ctx)
{
/* "drop;" by itself is a valid (empty) set of actions, but it can't be
@@ -163,6 +189,10 @@ parse_actions(struct action_context *ctx)
}
} else if (lexer_match_id(ctx->lexer, "output")) {
emit_resubmit(ctx, ctx->output_table_id);
+ } else if (lexer_match_id(ctx->lexer, "ct_next")) {
+ emit_ct(ctx, true, false);
+ } else if (lexer_match_id(ctx->lexer, "ct_commit")) {
+ emit_ct(ctx, false, true);
} else {
action_syntax_error(ctx, "expecting action");
}
@@ -186,6 +216,8 @@ parse_actions(struct action_context *ctx)
* (as one would provide to expr_to_matches()). Strings used in the actions
* that are not in 'ports' are translated to zero.
*
+ * 'ct_zones' provides a map from a port name to its connection tracking zone.
+ *
* 'next_table_id' should be the OpenFlow table to which the "next" action will
* resubmit, or 0 to disable "next".
*
@@ -204,8 +236,9 @@ parse_actions(struct action_context *ctx)
*/
char * OVS_WARN_UNUSED_RESULT
actions_parse(struct lexer *lexer, const struct shash *symtab,
- const struct simap *ports, uint8_t next_table_id,
- uint8_t output_table_id, struct ofpbuf *ofpacts,
+ const struct simap *ports, const struct simap *ct_zones,
+ uint8_t next_table_id, uint8_t output_table_id,
+ struct ofpbuf *ofpacts,
struct expr **prereqsp)
{
size_t ofpacts_start = ofpacts->size;
@@ -214,6 +247,7 @@ actions_parse(struct lexer *lexer, const struct shash *symtab,
ctx.lexer = lexer;
ctx.symtab = symtab;
ctx.ports = ports;
+ ctx.ct_zones = ct_zones;
ctx.next_table_id = next_table_id;
ctx.output_table_id = output_table_id;
ctx.error = NULL;
@@ -236,16 +270,16 @@ actions_parse(struct lexer *lexer, const struct shash *symtab,
/* Like actions_parse(), but the actions are taken from 's'. */
char * OVS_WARN_UNUSED_RESULT
actions_parse_string(const char *s, const struct shash *symtab,
- const struct simap *ports, uint8_t next_table_id,
- uint8_t output_table_id, struct ofpbuf *ofpacts,
- struct expr **prereqsp)
+ const struct simap *ports, const struct simap *ct_zones,
+ uint8_t next_table_id, uint8_t output_table_id,
+ struct ofpbuf *ofpacts, struct expr **prereqsp)
{
struct lexer lexer;
char *error;
lexer_init(&lexer, s);
lexer_get(&lexer);
- error = actions_parse(&lexer, symtab, ports, next_table_id,
+ error = actions_parse(&lexer, symtab, ports, ct_zones, next_table_id,
output_table_id, ofpacts, prereqsp);
lexer_destroy(&lexer);
diff --git a/ovn/lib/actions.h b/ovn/lib/actions.h
index 74cd185..377b273 100644
--- a/ovn/lib/actions.h
+++ b/ovn/lib/actions.h
@@ -27,14 +27,15 @@ struct shash;
struct simap;
char *actions_parse(struct lexer *, const struct shash *symtab,
- const struct simap *ports, uint8_t next_table_id,
- uint8_t output_table_id, struct ofpbuf *ofpacts,
- struct expr **prereqsp)
+ const struct simap *ports, const struct simap *ct_zones,
+ uint8_t next_table_id, uint8_t output_table_id,
+ struct ofpbuf *ofpacts, struct expr **prereqsp)
OVS_WARN_UNUSED_RESULT;
char *actions_parse_string(const char *s, const struct shash *symtab,
- const struct simap *ports, uint8_t next_table_id,
- uint8_t output_table_id, struct ofpbuf *ofpacts,
- struct expr **prereqsp)
+ const struct simap *ports,
+ const struct simap *ct_zones,
+ uint8_t next_table_id, uint8_t output_table_id,
+ struct ofpbuf *ofpacts, struct expr **prereqsp)
OVS_WARN_UNUSED_RESULT;
#endif /* ovn/actions.h */
diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index d70ba39..f1dfa10 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -60,6 +60,8 @@ static const char *default_db(void);
* These must be listed in the order that the stages will be executed. */
#define INGRESS_STAGES \
INGRESS_STAGE(PORT_SEC, port_sec) \
+ INGRESS_STAGE(PRE_ACL, pre_acl) \
+ INGRESS_STAGE(ACL, acl) \
INGRESS_STAGE(L2_LKUP, l2_lkup)
enum ingress_stage {
@@ -72,8 +74,9 @@ enum ingress_stage {
/* Egress pipeline stages.
*
* These must be listed in the order that the stages will be executed. */
-#define EGRESS_STAGES \
- EGRESS_STAGE(ACL, acl) \
+#define EGRESS_STAGES \
+ EGRESS_STAGE(PRE_ACL, pre_acl) \
+ EGRESS_STAGE(ACL, acl) \
EGRESS_STAGE(PORT_SEC, port_sec)
enum egress_stage {
@@ -723,6 +726,118 @@ lport_is_enabled(const struct nbrec_logical_port *lport)
return !lport->enabled || *lport->enabled;
}
+static bool
+has_stateful_acl(struct ovn_datapath *od)
+{
+ for (size_t i = 0; i < od->nb->n_acls; i++) {
+ struct nbrec_acl *acl = od->nb->acls[i];
+ if (!strcmp(acl->action, "allow-related")) {
+ return true;
+ }
+ }
+
+ return false;
+}
+
+static void
+build_acls(struct ovn_datapath *od, struct hmap *lflows)
+{
+ /* Ingress and Egress Pre-ACL Table (Priority 0): Packets are
+ * allowed by default. */
+ ovn_lflow_add(lflows, od, P_IN, S_IN_PRE_ACL, 0, "1", "next;");
+ ovn_lflow_add(lflows, od, P_OUT, S_OUT_PRE_ACL, 0, "1", "next;");
+
+ /* Ingress and Egress ACL Table (Priority 0): Packets are allowed by
+ * default. */
+ ovn_lflow_add(lflows, od, P_IN, S_IN_ACL, 0, "1", "next;");
+ ovn_lflow_add(lflows, od, P_OUT, S_OUT_ACL, 0, "1", "next;");
+
+ /* If there are any stateful ACL rules, we must send all IP packets
+ * through the conntrack action, which handles defragmentation, in
+ * order to match L4 headers. */
+ if (has_stateful_acl(od)) {
+ /* Ingress and Egress Pre-ACL Table (Priority 100).
+ *
+ * Regardless of whether the ACL is "from-lport" or "to-lport",
+ * we need rules in both the ingress and egress table, because
+ * the return traffic needs to be followed. */
+ ovn_lflow_add(lflows, od, P_IN, S_IN_PRE_ACL, 100,
+ "ip", "ct_next;");
+ ovn_lflow_add(lflows, od, P_OUT, S_OUT_PRE_ACL, 100,
+ "ip", "ct_next;");
+
+ /* Ingress and Egress ACL Table (Priority 65535).
+ *
+ * Always drop traffic that's in an invalid state. This is
+ * enforced at a higher priority than ACLs can be defined. */
+ ovn_lflow_add(lflows, od, P_IN, S_IN_ACL, UINT16_MAX,
+ "ct.inv", "drop;");
+ ovn_lflow_add(lflows, od, P_OUT, S_OUT_ACL, UINT16_MAX,
+ "ct.inv", "drop;");
+
+ /* Ingress and Egress ACL Table (Priority 65535).
+ *
+ * Always allow traffic that is established to a committed
+ * conntrack entry. This is enforced at a higher priority than
+ * ACLs can be defined. */
+ ovn_lflow_add(lflows, od, P_IN, S_IN_ACL, UINT16_MAX,
+ "ct.est && !ct.rel && !ct.new && !ct.inv",
+ "next;");
+ ovn_lflow_add(lflows, od, P_OUT, S_OUT_ACL, UINT16_MAX,
+ "ct.est && !ct.rel && !ct.new && !ct.inv",
+ "next;");
+
+ /* Ingress and Egress ACL Table (Priority 65535).
+ *
+ * Always allow traffic that is related to an existing conntrack
+ * entry. This is enforced at a higher priority than ACLs can
+ * be defined.
+ *
+ * NOTE: This does not support related data sessions (eg,
+ * a dynamically negotiated FTP data channel), but will allow
+ * related traffic such as an ICMP Port Unreachable through
+ * that's generated from a non-listening UDP port. */
+ ovn_lflow_add(lflows, od, P_IN, S_IN_ACL, UINT16_MAX,
+ "!ct.est && ct.rel && !ct.new && !ct.inv",
+ "next;");
+ ovn_lflow_add(lflows, od, P_OUT, S_OUT_ACL, UINT16_MAX,
+ "!ct.est && ct.rel && !ct.new && !ct.inv",
+ "next;");
+ }
+
+ /* Ingress or Egress ACL Table (Various priorities). */
+ for (size_t i = 0; i < od->nb->n_acls; i++) {
+ struct nbrec_acl *acl = od->nb->acls[i];
+ bool ingress = !strcmp(acl->direction, "from-lport") ? true :false;
+ enum ovn_pipeline pipeline = ingress ? P_IN : P_OUT;
+ uint8_t stage = ingress ? S_IN_ACL : S_OUT_ACL;
+
+ if (!strcmp(acl->action, "allow")) {
+ ovn_lflow_add(lflows, od, pipeline, stage, acl->priority,
+ acl->match, "next;");
+ } else if (!strcmp(acl->action, "allow-related")) {
+ struct ds match = DS_EMPTY_INITIALIZER;
+
+ /* Commit the connection tracking entry, which allow all
+ * other traffic related to this entry to flow due to the
+ * 65535 priority flow defined earlier. */
+ ds_put_format(&match, "ct.new && %s", acl->match);
+ ovn_lflow_add(lflows, od, pipeline, stage, acl->priority,
+ ds_cstr(&match), "ct_commit; next;");
+
+ ds_destroy(&match);
+ } else if (!strcmp(acl->action, "drop")) {
+ ovn_lflow_add(lflows, od, pipeline, stage, acl->priority,
+ acl->match, "drop;");
+ } else if (!strcmp(acl->action, "reject")) {
+ /* xxx Need to support "reject". */
+ VLOG_INFO("reject is not a supported action");
+ ovn_lflow_add(lflows, od, pipeline, stage, acl->priority,
+ acl->match, "drop;");
+ }
+ }
+}
+
/* Updates the Logical_Flow and Multicast_Group tables in the OVN_SB database,
* constructing their contents based on the OVN_NB database. */
static void
@@ -764,7 +879,7 @@ build_lflows(struct northd_context *ctx, struct hmap *datapaths,
ds_destroy(&match);
}
- /* Ingress table 1: Destination lookup, broadcast and multicast handling
+ /* Ingress table 3: Destination lookup, broadcast and multicast handling
* (priority 100). */
HMAP_FOR_EACH (op, key_node, ports) {
if (lport_is_enabled(op->nb)) {
@@ -776,7 +891,7 @@ build_lflows(struct northd_context *ctx, struct hmap *datapaths,
"outport = \""MC_FLOOD"\"; output;");
}
- /* Ingress table 1: Destination lookup, unicast handling (priority 50), */
+ /* Ingress table 3: Destination lookup, unicast handling (priority 50), */
HMAP_FOR_EACH (op, key_node, ports) {
for (size_t i = 0; i < op->nb->n_macs; i++) {
uint8_t mac[ETH_ADDR_LEN];
@@ -807,7 +922,7 @@ build_lflows(struct northd_context *ctx, struct hmap *datapaths,
}
}
- /* Ingress table 1: Destination lookup for unknown MACs (priority 0). */
+ /* Ingress table 3: Destination lookup for unknown MACs (priority 0). */
HMAP_FOR_EACH (od, key_node, datapaths) {
if (od->has_unknown) {
ovn_lflow_add(&lflows, od, P_IN, S_IN_L2_LKUP, 0, "1",
@@ -815,31 +930,14 @@ build_lflows(struct northd_context *ctx, struct hmap *datapaths,
}
}
- /* Egress table 0: ACLs (any priority). */
- HMAP_FOR_EACH (od, key_node, datapaths) {
- for (size_t i = 0; i < od->nb->n_acls; i++) {
- const struct nbrec_acl *acl = od->nb->acls[i];
- const char *action;
-
- action = (!strcmp(acl->action, "allow") ||
- !strcmp(acl->action, "allow-related"))
- ? "next;" : "drop;";
- ovn_lflow_add(&lflows, od, P_OUT, S_OUT_ACL, acl->priority,
- acl->match, action);
- }
- }
- HMAP_FOR_EACH (od, key_node, datapaths) {
- ovn_lflow_add(&lflows, od, P_OUT, S_OUT_ACL, 0, "1", "next;");
- }
-
- /* Egress table 1: Egress port security multicast/broadcast (priority
+ /* Egress table 2: Egress port security multicast/broadcast (priority
* 100). */
HMAP_FOR_EACH (od, key_node, datapaths) {
ovn_lflow_add(&lflows, od, P_OUT, S_OUT_PORT_SEC, 100, "eth.dst[40]",
"output;");
}
- /* Egress table 1: Egress port security (priority 50). */
+ /* Egress table 2: Egress port security (priority 50). */
HMAP_FOR_EACH (op, key_node, ports) {
struct ds match;
@@ -857,6 +955,12 @@ build_lflows(struct northd_context *ctx, struct hmap *datapaths,
ds_destroy(&match);
}
+ /* Build pre-ACL and ACL tables for both ingress and egress.
+ * Ingress tables 1 and 2. Egress tables 0 and 1. */
+ HMAP_FOR_EACH (od, key_node, datapaths) {
+ build_acls(od, &lflows);
+ }
+
/* Push changes to the Logical_Flow table to database. */
const struct sbrec_logical_flow *sbflow, *next_sbflow;
SBREC_LOGICAL_FLOW_FOR_EACH_SAFE (sbflow, next_sbflow, ctx->ovnsb_idl) {
diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml
index 2424836..7252f5d 100644
--- a/ovn/ovn-architecture.7.xml
+++ b/ovn/ovn-architecture.7.xml
@@ -642,6 +642,14 @@
tunnels as part of the tunnel key.)
+ conntrack zone field
+
+ A field that denotes the connection tracking zone. The value only
+ has local significance and is not meaningful between chassis.
+ This is initialized to 0 at the beginning of the logical ingress
+ pipeline. OVN stores this in Nicira extension register number 5.
+
+
VLAN ID
The VLAN ID is used as an interface between OVN and containers nested
diff --git a/ovn/ovn-sb.xml b/ovn/ovn-sb.xml
index 8102eb3..94ef532 100644
--- a/ovn/ovn-sb.xml
+++ b/ovn/ovn-sb.xml
@@ -287,10 +287,11 @@
inport
to outport
; if they are equal, it treats
the output
as a no-op. In the common case, where they are
different, the packet enters the egress pipeline. This transition to the
- egress pipeline discards register data, e.g. reg0
- ... reg5
, to achieve uniform behavior regardless of whether
- the egress pipeline is on a different hypervisor (because registers
- aren't preserve across tunnel encapsulation).
+ egress pipeline discards register data, e.g. reg0
...
+ reg4
and connection tracking state, to achieve
+ uniform behavior regardless of whether the egress pipeline is on a
+ different hypervisor (because registers aren't preserve across
+ tunnel encapsulation).
@@ -674,7 +675,7 @@
- reg0
...reg5
+ reg0
...reg4
inport
outport
eth.src
eth.dst
eth.type
vlan.tci
vlan.vid
vlan.pcp
vlan.present
@@ -688,6 +689,12 @@
icmp4.type
icmp4.code
icmp6.type
icmp6.code
nd.target
nd.sll
nd.tll
+ ct_state
can take the following shortcuts:
+ ct.new
New flow
+ ct.est
Established flow
+ ct.rel
Related flow
+ ct.rpl
Reply flow
+ ct.inv
Connection entry in a bad state
@@ -769,6 +776,26 @@
pipeline.
+
+ ct_next;
+
+ Apply connection tracking to the flow. After a call to
+ ct_next
, the ct_state
field is
+ available to match. As a side effect, IP fragments will be
+ reassembled for matching. If a fragmented packet is output,
+ then it will be sent with any overlapping fragments squashed.
+ The connection tracking state is scoped by the logical port,
+ so overlapping addresses may be used. To allow traffic
+ related to the matched flow, ct_commit
must be
+ called. After this action is used, the next logical datapath
+ table will be executed.
+
+
+ ct_commit;
+
+ Commit the flow to the connection tracking entry associated
+ with it by a previous call to commit
.
+
@@ -784,8 +811,6 @@
learn
- conntrack
-
dec_ttl { action,
... } { action;
...};
decrement TTL; execute first set of actions if
diff --git a/tests/test-ovn.c b/tests/test-ovn.c
index 60b87de..4f8b80d 100644
--- a/tests/test-ovn.c
+++ b/tests/test-ovn.c
@@ -1121,7 +1121,7 @@ static void
test_parse_actions(struct ovs_cmdl_context *ctx OVS_UNUSED)
{
struct shash symtab;
- struct simap ports;
+ struct simap ports, ct_zones;
struct ds input;
create_symtab(&symtab);
@@ -1130,6 +1130,7 @@ test_parse_actions(struct ovs_cmdl_context *ctx OVS_UNUSED)
simap_put(&ports, "eth0", 5);
simap_put(&ports, "eth1", 6);
simap_put(&ports, "LOCAL", ofp_to_u16(OFPP_LOCAL));
+ simap_init(&ct_zones);
ds_init(&input);
while (!ds_get_test_line(&input, stdin)) {
@@ -1138,8 +1139,8 @@ test_parse_actions(struct ovs_cmdl_context *ctx OVS_UNUSED)
char *error;
ofpbuf_init(&ofpacts, 0);
- error = actions_parse_string(ds_cstr(&input), &symtab, &ports, 11, 64,
- &ofpacts, &prereqs);
+ error = actions_parse_string(ds_cstr(&input), &symtab, &ports,
+ &ct_zones, 11, 64, &ofpacts, &prereqs);
if (!error) {
struct ds output;
@@ -1165,6 +1166,7 @@ test_parse_actions(struct ovs_cmdl_context *ctx OVS_UNUSED)
ds_destroy(&input);
simap_destroy(&ports);
+ simap_destroy(&ct_zones);
expr_symtab_destroy(&symtab);
shash_destroy(&symtab);
}