From patchwork Sat Sep 5 01:04:20 2015
X-Patchwork-Submitter: Justin Pettit
X-Patchwork-Id: 514782
From: Justin Pettit
To: dev@openvswitch.org
Date: Fri, 4 Sep 2015 18:04:20 -0700
Message-Id: <1441415060-93194-1-git-send-email-jpettit@nicira.com>
Subject: [ovs-dev] [RFC PATCH] ovn: Add stateful ACL support.

Add support for the "allow-related" ACL action.
This is dependent on the OVS conntrack functionality, which is not
available on all platforms or kernel versions.

Here is a sample policy that will allow all tenants in logical switch
"ls0" to SSH to each other. Anyone can make an HTTP request to "lp0".
All other IP traffic is dropped:

    ovn-nbctl acl-add ls0 from-lport 100 ip allow-related
    ovn-nbctl acl-add ls0 to-lport 100 tcp.dst==22 allow-related
    ovn-nbctl acl-add ls0 to-lport 100 "outport == \"lp0\" \
        && tcp.dst==80" allow-related
    ovn-nbctl acl-add ls0 to-lport 1 ip drop

-=-=-=-=-=-=-=-=-=-

NOTE: This is an RFC. I would like some feedback on the overall
design and whether it works as expected. It has a number of
dependencies on features not yet available in the master of OVS. As
such, it is probably easiest to try this patch from the following
repo:

    https://github.com/justinpettit/ovs/tree/ovn-acl

Once the prerequisites make it to the main OVS repo, I'll send out a
non-RFC version.
---
 ovn/TODO                            |    8 ++
 ovn/controller/binding.c            |   43 ++++++++++
 ovn/controller/lflow.c              |   13 +++-
 ovn/controller/lflow.h              |    4 +-
 ovn/controller/ovn-controller.8.xml |   19 +++++
 ovn/controller/ovn-controller.c     |   50 +++++++++---
 ovn/controller/ovn-controller.h     |    7 ++
 ovn/controller/physical.c           |   16 ++++-
 ovn/lib/actions.c                   |   46 +++++++++--
 ovn/lib/actions.h                   |   13 ++--
 ovn/northd/ovn-northd.c             |  152 +++++++++++++++++++++++++++++------
 ovn/ovn-architecture.7.xml          |    8 ++
 ovn/ovn-sb.xml                      |   39 +++++++--
 tests/test-ovn.c                    |    8 +-
 14 files changed, 363 insertions(+), 63 deletions(-)

diff --git a/ovn/TODO b/ovn/TODO
index 356b3ba..260ddaa 100644
--- a/ovn/TODO
+++ b/ovn/TODO
@@ -80,3 +80,11 @@
 So far, both ovn-controller and ovn-controller-vtep only allow
 chassis to have one tunnel encapsulation entry. We should extend
 the implementation to support multiple tunnel encapsulations.
+
+* ACL
+
+** Support FTP ALGs.
+
+** Support reject action.
+
+** Support log option.
diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c index fca2430..492dd39 100644 --- a/ovn/controller/binding.c +++ b/ovn/controller/binding.c @@ -16,6 +16,7 @@ #include #include "binding.h" +#include "lib/bitmap.h" #include "lib/sset.h" #include "lib/util.h" #include "lib/vswitch-idl.h" @@ -71,6 +72,46 @@ get_local_iface_ids(const struct ovsrec_bridge *br_int, struct sset *lports) } } +static void +update_ct_zones(struct controller_ctx *ctx, struct sset *lports) +{ + struct simap_node *ct_zone, *ct_zone_next; + const char *iface_id; + + /* xxx This is wasteful to assign a zone to each port--even if no + * xxx security policy is applied. */ + + /* Delete any zones that are associated with removed ports. */ + SIMAP_FOR_EACH_SAFE(ct_zone, ct_zone_next, &ctx->ct_zones) { + if (!sset_contains(lports, ct_zone->name)) { + bitmap_set0(ctx->ct_zone_bitmap, ct_zone->data); + simap_delete(&ctx->ct_zones, ct_zone); + } + } + + /* Assign a unique zone id for each logical port. */ + SSET_FOR_EACH(iface_id, lports) { + size_t zone; + + if (simap_contains(&ctx->ct_zones, iface_id)) { + continue; + } + + /* We assume that there are 64K zones and that we own them all. */ + zone = bitmap_scan(ctx->ct_zone_bitmap, 0, 1, MAX_CT_ZONES + 1); + if (zone == MAX_CT_ZONES + 1) { + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1); + VLOG_WARN_RL(&rl, "exhausted all ct zones"); + return; + } + + bitmap_set1(ctx->ct_zone_bitmap, zone); + simap_put(&ctx->ct_zones, iface_id, zone); + + /* xxx This should make call to erase any old entries for this zone. */ + } +} + void binding_run(struct controller_ctx *ctx, const struct ovsrec_bridge *br_int, const char *chassis_id) @@ -97,6 +138,7 @@ binding_run(struct controller_ctx *ctx, const struct ovsrec_bridge *br_int, /* We have no integration bridge, therefore no local logical ports. * We'll remove our chassis from all port binding records below. 
*/ } + update_ct_zones(ctx, &lports); sset_clone(&all_lports, &lports); ovsdb_idl_txn_add_comment( @@ -141,6 +183,7 @@ binding_cleanup(struct controller_ctx *ctx, const char *chassis_id) if (!chassis_id) { return true; } + const struct sbrec_chassis *chassis_rec = get_chassis_by_name(ctx->ovnsb_idl, chassis_id); if (!chassis_rec) { diff --git a/ovn/controller/lflow.c b/ovn/controller/lflow.c index 9246e61..fa61247 100644 --- a/ovn/controller/lflow.c +++ b/ovn/controller/lflow.c @@ -58,6 +58,15 @@ symtab_init(void) MFF_LOG_REGS; #undef MFF_LOG_REG + /* Connection tracking state. */ + expr_symtab_add_field(&symtab, "ct_state", MFF_CT_STATE, NULL, false); + expr_symtab_add_predicate(&symtab, "ct.trk", "ct_state[7]"); + expr_symtab_add_subfield(&symtab, "ct.new", "ct.trk", "ct_state[0]"); + expr_symtab_add_subfield(&symtab, "ct.est", "ct.trk", "ct_state[1]"); + expr_symtab_add_subfield(&symtab, "ct.rel", "ct.trk", "ct_state[2]"); + expr_symtab_add_subfield(&symtab, "ct.inv", "ct.trk", "ct_state[5]"); + expr_symtab_add_subfield(&symtab, "ct.rpl", "ct.trk", "ct_state[6]"); + /* Data fields. */ expr_symtab_add_field(&symtab, "eth.src", MFF_ETH_SRC, NULL, false); expr_symtab_add_field(&symtab, "eth.dst", MFF_ETH_DST, NULL, false); @@ -284,8 +293,8 @@ lflow_run(struct controller_ctx *ctx, struct hmap *flow_table) ofpbuf_use_stub(&ofpacts, ofpacts_stub, sizeof ofpacts_stub); error = actions_parse_string(lflow->actions, &symtab, &ldp->ports, - next_phys_table, output_phys_table, - &ofpacts, &prereqs); + &ctx->ct_zones, next_phys_table, + output_phys_table, &ofpacts, &prereqs); if (error) { static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1); VLOG_WARN_RL(&rl, "error parsing actions \"%s\": %s", diff --git a/ovn/controller/lflow.h b/ovn/controller/lflow.h index 59fe559..0e3e51e 100644 --- a/ovn/controller/lflow.h +++ b/ovn/controller/lflow.h @@ -55,6 +55,7 @@ struct uuid; /* Logical fields. */ #define MFF_LOG_DATAPATH MFF_METADATA /* Logical datapath (64 bits). 
*/ +#define MFF_LOG_CT_ZONE MFF_REG5 /* Logical conntrack zone (32 bits). */ #define MFF_LOG_INPORT MFF_REG6 /* Logical input port (32 bits). */ #define MFF_LOG_OUTPORT MFF_REG7 /* Logical output port (32 bits). */ @@ -66,8 +67,7 @@ struct uuid; MFF_LOG_REG(MFF_REG1) \ MFF_LOG_REG(MFF_REG2) \ MFF_LOG_REG(MFF_REG3) \ - MFF_LOG_REG(MFF_REG4) \ - MFF_LOG_REG(MFF_REG5) + MFF_LOG_REG(MFF_REG4) void lflow_init(void); void lflow_run(struct controller_ctx *, struct hmap *flow_table); diff --git a/ovn/controller/ovn-controller.8.xml b/ovn/controller/ovn-controller.8.xml index e1cb6a2..acca510 100644 --- a/ovn/controller/ovn-controller.8.xml +++ b/ovn/controller/ovn-controller.8.xml @@ -119,4 +119,23 @@ This behavior should be changed.

+ +

RUNTIME MANAGEMENT COMMANDS

+

+ ovs-appctl can send commands to a running + ovn-controller process. The currently supported + commands are described below. +

+
exit
+
+ Causes ovn-controller to gracefully terminate. +
+ +
ct-zone-list
+
+ Lists each local logical port and its connection tracking zone. +
+
+

+ diff --git a/ovn/controller/ovn-controller.c b/ovn/controller/ovn-controller.c index 8e93a0f..8314e04 100644 --- a/ovn/controller/ovn-controller.c +++ b/ovn/controller/ovn-controller.c @@ -27,6 +27,7 @@ #include "compiler.h" #include "daemon.h" #include "dirs.h" +#include "dynamic-string.h" #include "openvswitch/vconn.h" #include "openvswitch/vlog.h" #include "ovn/lib/ovn-sb-idl.h" @@ -49,6 +50,7 @@ VLOG_DEFINE_THIS_MODULE(main); static unixctl_cb_func ovn_controller_exit; +static unixctl_cb_func ct_zone_list; #define DEFAULT_BRIDGE_NAME "br-int" @@ -119,6 +121,7 @@ int main(int argc, char *argv[]) { struct unixctl_server *unixctl; + struct controller_ctx ctx; bool exiting; int retval; @@ -134,6 +137,7 @@ main(int argc, char *argv[]) exit(EXIT_FAILURE); } unixctl_command_register("exit", "", 0, 0, ovn_controller_exit, &exiting); + unixctl_command_register("ct-zone-list", "", 0, 0, ct_zone_list, &ctx); daemonize_complete(); @@ -162,15 +166,20 @@ main(int argc, char *argv[]) ovsdb_idl_create(ovnsb_remote, &sbrec_idl_class, true, true)); ovsdb_idl_get_initial_snapshot(ovnsb_idl_loop.idl); + /* Initialize connection tracking zones. */ + simap_init(&ctx.ct_zones); + ctx.ct_zone_bitmap = bitmap_allocate(MAX_CT_ZONES); + + /* We never use zone 0. */ + bitmap_set1(ctx.ct_zone_bitmap, 0); + /* Main loop. */ exiting = false; while (!exiting) { - struct controller_ctx ctx = { - .ovs_idl = ovs_idl_loop.idl, - .ovs_idl_txn = ovsdb_idl_loop_run(&ovs_idl_loop), - .ovnsb_idl = ovnsb_idl_loop.idl, - .ovnsb_idl_txn = ovsdb_idl_loop_run(&ovnsb_idl_loop), - }; + ctx.ovs_idl = ovs_idl_loop.idl; + ctx.ovs_idl_txn = ovsdb_idl_loop_run(&ovs_idl_loop); + ctx.ovnsb_idl = ovnsb_idl_loop.idl; + ctx.ovnsb_idl_txn = ovsdb_idl_loop_run(&ovnsb_idl_loop); const struct ovsrec_bridge *br_int = get_br_int(ctx.ovs_idl); const char *chassis_id = get_chassis_id(ctx.ovs_idl); @@ -213,12 +222,10 @@ main(int argc, char *argv[]) /* It's time to exit. Clean up the databases. 
*/ bool done = false; while (!done) { - struct controller_ctx ctx = { - .ovs_idl = ovs_idl_loop.idl, - .ovs_idl_txn = ovsdb_idl_loop_run(&ovs_idl_loop), - .ovnsb_idl = ovnsb_idl_loop.idl, - .ovnsb_idl_txn = ovsdb_idl_loop_run(&ovnsb_idl_loop), - }; + ctx.ovs_idl = ovs_idl_loop.idl; + ctx.ovs_idl_txn = ovsdb_idl_loop_run(&ovs_idl_loop); + ctx.ovnsb_idl = ovnsb_idl_loop.idl; + ctx.ovnsb_idl_txn = ovsdb_idl_loop_run(&ovnsb_idl_loop); const struct ovsrec_bridge *br_int = get_br_int(ctx.ovs_idl); const char *chassis_id = get_chassis_id(ctx.ovs_idl); @@ -241,6 +248,9 @@ main(int argc, char *argv[]) lflow_destroy(); ofctrl_destroy(); + simap_destroy(&ctx.ct_zones); + bitmap_free(ctx.ct_zone_bitmap); + ovsdb_idl_loop_destroy(&ovs_idl_loop); ovsdb_idl_loop_destroy(&ovnsb_idl_loop); @@ -341,3 +351,19 @@ ovn_controller_exit(struct unixctl_conn *conn, int argc OVS_UNUSED, unixctl_command_reply(conn, NULL); } + +static void +ct_zone_list(struct unixctl_conn *conn, int argc OVS_UNUSED, + const char *argv[] OVS_UNUSED, void *aux) +{ + const struct controller_ctx *ctx = aux; + struct ds ds = DS_EMPTY_INITIALIZER; + struct simap_node *zone; + + SIMAP_FOR_EACH(zone, &ctx->ct_zones) { + ds_put_format(&ds, "%s %d\n", zone->name, zone->data); + } + + unixctl_command_reply(conn, ds_cstr(&ds)); + ds_destroy(&ds); +} diff --git a/ovn/controller/ovn-controller.h b/ovn/controller/ovn-controller.h index be89b5f..932decd 100644 --- a/ovn/controller/ovn-controller.h +++ b/ovn/controller/ovn-controller.h @@ -17,14 +17,21 @@ #ifndef OVN_CONTROLLER_H #define OVN_CONTROLLER_H 1 +#include "simap.h" #include "ovn/lib/ovn-sb-idl.h" +/* Linux supports a maximum of 64K zones, which seems like a fine default. */ +#define MAX_CT_ZONES 65535 + struct controller_ctx { struct ovsdb_idl *ovnsb_idl; struct ovsdb_idl_txn *ovnsb_idl_txn; struct ovsdb_idl *ovs_idl; struct ovsdb_idl_txn *ovs_idl_txn; + + struct simap ct_zones; /* Port to conntrack zone mappings. 
*/ + unsigned long *ct_zone_bitmap; /* Bitmap of assigned zones. */ }; static inline const struct sbrec_chassis * diff --git a/ovn/controller/physical.c b/ovn/controller/physical.c index 2ec0ba9..d12c5dc 100644 --- a/ovn/controller/physical.c +++ b/ovn/controller/physical.c @@ -235,6 +235,7 @@ physical_run(struct controller_ctx *ctx, enum mf_field_id mff_ovn_geneve, struct match match; if (!tun) { + int zone_id = simap_get(&ctx->ct_zones, binding->logical_port); /* Packets that arrive from a vif can belong to a VM or * to a container located inside that VM. Packets that * arrive from containers have a tag (vlan) associated with them. @@ -251,6 +252,7 @@ physical_run(struct controller_ctx *ctx, enum mf_field_id mff_ovn_geneve, * input port, MFF_LOG_DATAPATH to the logical datapath, and * resubmit into the logical ingress pipeline starting at table * 16. */ + match_init_catchall(&match); ofpbuf_clear(&ofpacts); match_set_in_port(&match, ofport); @@ -258,6 +260,10 @@ physical_run(struct controller_ctx *ctx, enum mf_field_id mff_ovn_geneve, match_set_dl_vlan(&match, htons(tag)); } + if (zone_id) { + put_load(zone_id, MFF_LOG_CT_ZONE, 0, 32, &ofpacts); + } + /* Set MFF_LOG_DATAPATH and MFF_LOG_INPORT. */ put_load(binding->datapath->tunnel_key, MFF_LOG_DATAPATH, 0, 64, &ofpacts); @@ -289,6 +295,10 @@ physical_run(struct controller_ctx *ctx, enum mf_field_id mff_ovn_geneve, match_set_reg(&match, MFF_LOG_OUTPORT - MFF_REG0, binding->tunnel_key); + if (zone_id) { + put_load(zone_id, MFF_LOG_CT_ZONE, 0, 32, &ofpacts); + } + /* Resubmit to table 34. 
*/ put_resubmit(OFTABLE_DROP_LOOPBACK, &ofpacts); ofctrl_add_flow(flow_table, OFTABLE_LOCAL_OUTPUT, 100, &match, @@ -396,6 +406,10 @@ physical_run(struct controller_ctx *ctx, enum mf_field_id mff_ovn_geneve, continue; } + int zone_id = simap_get(&ctx->ct_zones, port->logical_port); + if (zone_id) { + put_load(zone_id, MFF_LOG_CT_ZONE, 0, 32, &ofpacts); + } if (simap_contains(&lport_to_ofport, port->logical_port)) { put_load(port->tunnel_key, MFF_LOG_OUTPORT, 0, 32, &ofpacts); put_resubmit(OFTABLE_DROP_LOOPBACK, &ofpacts); @@ -504,7 +518,7 @@ physical_run(struct controller_ctx *ctx, enum mf_field_id mff_ovn_geneve, ofpbuf_clear(&ofpacts); #define MFF_LOG_REG(ID) put_load(0, ID, 0, 32, &ofpacts); MFF_LOG_REGS; -#undef MFF_LOG_REGS + #undef MFF_LOG_REGS put_resubmit(OFTABLE_LOG_EGRESS_PIPELINE, &ofpacts); ofctrl_add_flow(flow_table, OFTABLE_DROP_LOOPBACK, 0, &match, &ofpacts); diff --git a/ovn/lib/actions.c b/ovn/lib/actions.c index 0a0158a..0f9675f 100644 --- a/ovn/lib/actions.c +++ b/ovn/lib/actions.c @@ -24,6 +24,7 @@ #include "lex.h" #include "ofp-actions.h" #include "ofpbuf.h" +#include "simap.h" /* Context maintained during actions_parse(). */ struct action_context { @@ -33,6 +34,7 @@ struct action_context { uint8_t next_table_id; /* OpenFlow table for 'next' to resubmit. */ uint8_t output_table_id; /* OpenFlow table for 'output' to resubmit. */ const struct simap *ports; /* Map from port name to number. */ + const struct simap *ct_zones; /* Map from port name to conntrack zone. */ /* State. */ char *error; /* Error, if any, otherwise NULL. */ @@ -131,6 +133,30 @@ emit_resubmit(struct action_context *ctx, uint8_t table_id) } static void +emit_ct(struct action_context *ctx, bool recirc, bool commit) +{ + struct ofpact_conntrack *ct = ofpact_put_CT(ctx->ofpacts); + ct->flags |= recirc ? NX_CT_F_RECIRC : 0; + ct->flags |= commit ? NX_CT_F_COMMIT : 0; + if (recirc) { + /* If "recirc" is set, we automatically go to the next table. 
*/ + ct->next_table = ctx->next_table_id; + } + /* xxx Should remove hard-coding reg5 if we refactor library. */ + ct->src.field = mf_from_id(MFF_REG5); + ct->src.ofs = 0; + ct->src.n_bits = 32; + + /* CT only works with IP, so set up a prerequisite. */ + struct expr *expr; + char *error; + + expr = expr_parse_string("ip", ctx->symtab, &error); + ovs_assert(!error); + ctx->prereqs = expr_combine(EXPR_T_AND, ctx->prereqs, expr); +} + +static void parse_actions(struct action_context *ctx) { /* "drop;" by itself is a valid (empty) set of actions, but it can't be @@ -163,6 +189,10 @@ parse_actions(struct action_context *ctx) } } else if (lexer_match_id(ctx->lexer, "output")) { emit_resubmit(ctx, ctx->output_table_id); + } else if (lexer_match_id(ctx->lexer, "ct_next")) { + emit_ct(ctx, true, false); + } else if (lexer_match_id(ctx->lexer, "ct_commit")) { + emit_ct(ctx, false, true); } else { action_syntax_error(ctx, "expecting action"); } @@ -186,6 +216,8 @@ parse_actions(struct action_context *ctx) * (as one would provide to expr_to_matches()). Strings used in the actions * that are not in 'ports' are translated to zero. * + * 'ct_zones' provides a map from a port name to its connection tracking zone. + * * 'next_table_id' should be the OpenFlow table to which the "next" action will * resubmit, or 0 to disable "next". 
* @@ -204,8 +236,9 @@ parse_actions(struct action_context *ctx) */ char * OVS_WARN_UNUSED_RESULT actions_parse(struct lexer *lexer, const struct shash *symtab, - const struct simap *ports, uint8_t next_table_id, - uint8_t output_table_id, struct ofpbuf *ofpacts, + const struct simap *ports, const struct simap *ct_zones, + uint8_t next_table_id, uint8_t output_table_id, + struct ofpbuf *ofpacts, struct expr **prereqsp) { size_t ofpacts_start = ofpacts->size; @@ -214,6 +247,7 @@ actions_parse(struct lexer *lexer, const struct shash *symtab, ctx.lexer = lexer; ctx.symtab = symtab; ctx.ports = ports; + ctx.ct_zones = ct_zones; ctx.next_table_id = next_table_id; ctx.output_table_id = output_table_id; ctx.error = NULL; @@ -236,16 +270,16 @@ actions_parse(struct lexer *lexer, const struct shash *symtab, /* Like actions_parse(), but the actions are taken from 's'. */ char * OVS_WARN_UNUSED_RESULT actions_parse_string(const char *s, const struct shash *symtab, - const struct simap *ports, uint8_t next_table_id, - uint8_t output_table_id, struct ofpbuf *ofpacts, - struct expr **prereqsp) + const struct simap *ports, const struct simap *ct_zones, + uint8_t next_table_id, uint8_t output_table_id, + struct ofpbuf *ofpacts, struct expr **prereqsp) { struct lexer lexer; char *error; lexer_init(&lexer, s); lexer_get(&lexer); - error = actions_parse(&lexer, symtab, ports, next_table_id, + error = actions_parse(&lexer, symtab, ports, ct_zones, next_table_id, output_table_id, ofpacts, prereqsp); lexer_destroy(&lexer); diff --git a/ovn/lib/actions.h b/ovn/lib/actions.h index 74cd185..377b273 100644 --- a/ovn/lib/actions.h +++ b/ovn/lib/actions.h @@ -27,14 +27,15 @@ struct shash; struct simap; char *actions_parse(struct lexer *, const struct shash *symtab, - const struct simap *ports, uint8_t next_table_id, - uint8_t output_table_id, struct ofpbuf *ofpacts, - struct expr **prereqsp) + const struct simap *ports, const struct simap *ct_zones, + uint8_t next_table_id, uint8_t 
output_table_id, + struct ofpbuf *ofpacts, struct expr **prereqsp) OVS_WARN_UNUSED_RESULT; char *actions_parse_string(const char *s, const struct shash *symtab, - const struct simap *ports, uint8_t next_table_id, - uint8_t output_table_id, struct ofpbuf *ofpacts, - struct expr **prereqsp) + const struct simap *ports, + const struct simap *ct_zones, + uint8_t next_table_id, uint8_t output_table_id, + struct ofpbuf *ofpacts, struct expr **prereqsp) OVS_WARN_UNUSED_RESULT; #endif /* ovn/actions.h */ diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c index d70ba39..f1dfa10 100644 --- a/ovn/northd/ovn-northd.c +++ b/ovn/northd/ovn-northd.c @@ -60,6 +60,8 @@ static const char *default_db(void); * These must be listed in the order that the stages will be executed. */ #define INGRESS_STAGES \ INGRESS_STAGE(PORT_SEC, port_sec) \ + INGRESS_STAGE(PRE_ACL, pre_acl) \ + INGRESS_STAGE(ACL, acl) \ INGRESS_STAGE(L2_LKUP, l2_lkup) enum ingress_stage { @@ -72,8 +74,9 @@ enum ingress_stage { /* Egress pipeline stages. * * These must be listed in the order that the stages will be executed. */ -#define EGRESS_STAGES \ - EGRESS_STAGE(ACL, acl) \ +#define EGRESS_STAGES \ + EGRESS_STAGE(PRE_ACL, pre_acl) \ + EGRESS_STAGE(ACL, acl) \ EGRESS_STAGE(PORT_SEC, port_sec) enum egress_stage { @@ -723,6 +726,118 @@ lport_is_enabled(const struct nbrec_logical_port *lport) return !lport->enabled || *lport->enabled; } +static bool +has_stateful_acl(struct ovn_datapath *od) +{ + for (size_t i = 0; i < od->nb->n_acls; i++) { + struct nbrec_acl *acl = od->nb->acls[i]; + if (!strcmp(acl->action, "allow-related")) { + return true; + } + } + + return false; +} + +static void +build_acls(struct ovn_datapath *od, struct hmap *lflows) +{ + /* Ingress and Egress Pre-ACL Table (Priority 0): Packets are + * allowed by default. 
*/ + ovn_lflow_add(lflows, od, P_IN, S_IN_PRE_ACL, 0, "1", "next;"); + ovn_lflow_add(lflows, od, P_OUT, S_OUT_PRE_ACL, 0, "1", "next;"); + + /* Ingress and Egress ACL Table (Priority 0): Packets are allowed by + * default. */ + ovn_lflow_add(lflows, od, P_IN, S_IN_ACL, 0, "1", "next;"); + ovn_lflow_add(lflows, od, P_OUT, S_OUT_ACL, 0, "1", "next;"); + + /* If there are any stateful ACL rules, we must send all IP packets + * through the conntrack action, which handles defragmentation, in + * order to match L4 headers. */ + if (has_stateful_acl(od)) { + /* Ingress and Egress Pre-ACL Table (Priority 100). + * + * Regardless of whether the ACL is "from-lport" or "to-lport", + * we need rules in both the ingress and egress table, because + * the return traffic needs to be followed. */ + ovn_lflow_add(lflows, od, P_IN, S_IN_PRE_ACL, 100, + "ip", "ct_next;"); + ovn_lflow_add(lflows, od, P_OUT, S_OUT_PRE_ACL, 100, + "ip", "ct_next;"); + + /* Ingress and Egress ACL Table (Priority 65535). + * + * Always drop traffic that's in an invalid state. This is + * enforced at a higher priority than ACLs can be defined. */ + ovn_lflow_add(lflows, od, P_IN, S_IN_ACL, UINT16_MAX, + "ct.inv", "drop;"); + ovn_lflow_add(lflows, od, P_OUT, S_OUT_ACL, UINT16_MAX, + "ct.inv", "drop;"); + + /* Ingress and Egress ACL Table (Priority 65535). + * + * Always allow traffic that is established to a committed + * conntrack entry. This is enforced at a higher priority than + * ACLs can be defined. */ + ovn_lflow_add(lflows, od, P_IN, S_IN_ACL, UINT16_MAX, + "ct.est && !ct.rel && !ct.new && !ct.inv", + "next;"); + ovn_lflow_add(lflows, od, P_OUT, S_OUT_ACL, UINT16_MAX, + "ct.est && !ct.rel && !ct.new && !ct.inv", + "next;"); + + /* Ingress and Egress ACL Table (Priority 65535). + * + * Always allow traffic that is related to an existing conntrack + * entry. This is enforced at a higher priority than ACLs can + * be defined. 
+ * + * NOTE: This does not support related data sessions (eg, + * a dynamically negotiated FTP data channel), but will allow + * related traffic such as an ICMP Port Unreachable through + * that's generated from a non-listening UDP port. */ + ovn_lflow_add(lflows, od, P_IN, S_IN_ACL, UINT16_MAX, + "!ct.est && ct.rel && !ct.new && !ct.inv", + "next;"); + ovn_lflow_add(lflows, od, P_OUT, S_OUT_ACL, UINT16_MAX, + "!ct.est && ct.rel && !ct.new && !ct.inv", + "next;"); + } + + /* Ingress or Egress ACL Table (Various priorities). */ + for (size_t i = 0; i < od->nb->n_acls; i++) { + struct nbrec_acl *acl = od->nb->acls[i]; + bool ingress = !strcmp(acl->direction, "from-lport") ? true :false; + enum ovn_pipeline pipeline = ingress ? P_IN : P_OUT; + uint8_t stage = ingress ? S_IN_ACL : S_OUT_ACL; + + if (!strcmp(acl->action, "allow")) { + ovn_lflow_add(lflows, od, pipeline, stage, acl->priority, + acl->match, "next;"); + } else if (!strcmp(acl->action, "allow-related")) { + struct ds match = DS_EMPTY_INITIALIZER; + + /* Commit the connection tracking entry, which allow all + * other traffic related to this entry to flow due to the + * 65535 priority flow defined earlier. */ + ds_put_format(&match, "ct.new && %s", acl->match); + ovn_lflow_add(lflows, od, pipeline, stage, acl->priority, + ds_cstr(&match), "ct_commit; next;"); + + ds_destroy(&match); + } else if (!strcmp(acl->action, "drop")) { + ovn_lflow_add(lflows, od, pipeline, stage, acl->priority, + acl->match, "drop;"); + } else if (!strcmp(acl->action, "reject")) { + /* xxx Need to support "reject". */ + VLOG_INFO("reject is not a supported action"); + ovn_lflow_add(lflows, od, pipeline, stage, acl->priority, + acl->match, "drop;"); + } + } +} + /* Updates the Logical_Flow and Multicast_Group tables in the OVN_SB database, * constructing their contents based on the OVN_NB database. 
*/ static void @@ -764,7 +879,7 @@ build_lflows(struct northd_context *ctx, struct hmap *datapaths, ds_destroy(&match); } - /* Ingress table 1: Destination lookup, broadcast and multicast handling + /* Ingress table 3: Destination lookup, broadcast and multicast handling * (priority 100). */ HMAP_FOR_EACH (op, key_node, ports) { if (lport_is_enabled(op->nb)) { @@ -776,7 +891,7 @@ build_lflows(struct northd_context *ctx, struct hmap *datapaths, "outport = \""MC_FLOOD"\"; output;"); } - /* Ingress table 1: Destination lookup, unicast handling (priority 50), */ + /* Ingress table 3: Destination lookup, unicast handling (priority 50), */ HMAP_FOR_EACH (op, key_node, ports) { for (size_t i = 0; i < op->nb->n_macs; i++) { uint8_t mac[ETH_ADDR_LEN]; @@ -807,7 +922,7 @@ build_lflows(struct northd_context *ctx, struct hmap *datapaths, } } - /* Ingress table 1: Destination lookup for unknown MACs (priority 0). */ + /* Ingress table 3: Destination lookup for unknown MACs (priority 0). */ HMAP_FOR_EACH (od, key_node, datapaths) { if (od->has_unknown) { ovn_lflow_add(&lflows, od, P_IN, S_IN_L2_LKUP, 0, "1", @@ -815,31 +930,14 @@ build_lflows(struct northd_context *ctx, struct hmap *datapaths, } } - /* Egress table 0: ACLs (any priority). */ - HMAP_FOR_EACH (od, key_node, datapaths) { - for (size_t i = 0; i < od->nb->n_acls; i++) { - const struct nbrec_acl *acl = od->nb->acls[i]; - const char *action; - - action = (!strcmp(acl->action, "allow") || - !strcmp(acl->action, "allow-related")) - ? "next;" : "drop;"; - ovn_lflow_add(&lflows, od, P_OUT, S_OUT_ACL, acl->priority, - acl->match, action); - } - } - HMAP_FOR_EACH (od, key_node, datapaths) { - ovn_lflow_add(&lflows, od, P_OUT, S_OUT_ACL, 0, "1", "next;"); - } - - /* Egress table 1: Egress port security multicast/broadcast (priority + /* Egress table 2: Egress port security multicast/broadcast (priority * 100). 
*/ HMAP_FOR_EACH (od, key_node, datapaths) { ovn_lflow_add(&lflows, od, P_OUT, S_OUT_PORT_SEC, 100, "eth.dst[40]", "output;"); } - /* Egress table 1: Egress port security (priority 50). */ + /* Egress table 2: Egress port security (priority 50). */ HMAP_FOR_EACH (op, key_node, ports) { struct ds match; @@ -857,6 +955,12 @@ build_lflows(struct northd_context *ctx, struct hmap *datapaths, ds_destroy(&match); } + /* Build pre-ACL and ACL tables for both ingress and egress. + * Ingress tables 1 and 2. Egress tables 0 and 1. */ + HMAP_FOR_EACH (od, key_node, datapaths) { + build_acls(od, &lflows); + } + /* Push changes to the Logical_Flow table to database. */ const struct sbrec_logical_flow *sbflow, *next_sbflow; SBREC_LOGICAL_FLOW_FOR_EACH_SAFE (sbflow, next_sbflow, ctx->ovnsb_idl) { diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml index 2424836..7252f5d 100644 --- a/ovn/ovn-architecture.7.xml +++ b/ovn/ovn-architecture.7.xml @@ -642,6 +642,14 @@ tunnels as part of the tunnel key.) +
conntrack zone field
+
+ A field that denotes the connection tracking zone. The value only + has local significance and is not meaningful between chassis. + This is initialized to 0 at the beginning of the logical ingress + pipeline. OVN stores this in Nicira extension register number 5. +
+
VLAN ID
The VLAN ID is used as an interface between OVN and containers nested diff --git a/ovn/ovn-sb.xml b/ovn/ovn-sb.xml index 8102eb3..94ef532 100644 --- a/ovn/ovn-sb.xml +++ b/ovn/ovn-sb.xml @@ -287,10 +287,11 @@ inport to outport; if they are equal, it treats the output as a no-op. In the common case, where they are different, the packet enters the egress pipeline. This transition to the - egress pipeline discards register data, e.g. reg0 - ... reg5, to achieve uniform behavior regardless of whether - the egress pipeline is on a different hypervisor (because registers - aren't preserve across tunnel encapsulation). + egress pipeline discards register data, e.g. reg0 ... + reg4 and connection tracking state, to achieve + uniform behavior regardless of whether the egress pipeline is on a + different hypervisor (because registers aren't preserve across + tunnel encapsulation).

@@ -674,7 +675,7 @@

    -
  • reg0...reg5
  • +
  • reg0...reg4
  • inport outport
  • eth.src eth.dst eth.type
  • vlan.tci vlan.vid vlan.pcp vlan.present
  • @@ -688,6 +689,12 @@
  • icmp4.type icmp4.code
  • icmp6.type icmp6.code
  • nd.target nd.sll nd.tll
  • +
  • ct_state can take the following shortcuts:
  • +
  • ct.new New flow
  • +
  • ct.est Established flow
  • +
  • ct.rel Related flow
  • +
  • ct.rpl Reply flow
  • +
  • ct.inv Connection entry in a bad state
@@ -769,6 +776,26 @@ pipeline.

+ +
ct_next;
+
+ Apply connection tracking to the flow. After a call to + ct_next, the ct_state field is + available to match. As a side effect, IP fragments will be + reassembled for matching. If a fragmented packet is output, + then it will be sent with any overlapping fragments squashed. + The connection tracking state is scoped by the logical port, + so overlapping addresses may be used. To allow traffic + related to the matched flow, ct_commit must be + called. After this action is used, the next logical datapath + table will be executed. +
+ +
ct_commit;
+
+ Commit the flow to the connection tracking entry associated + with it by a previous call to ct_next. +

@@ -784,8 +811,6 @@

learn
-
conntrack
-
dec_ttl { action, ... } { action; ...};
decrement TTL; execute first set of actions if diff --git a/tests/test-ovn.c b/tests/test-ovn.c index 60b87de..4f8b80d 100644 --- a/tests/test-ovn.c +++ b/tests/test-ovn.c @@ -1121,7 +1121,7 @@ static void test_parse_actions(struct ovs_cmdl_context *ctx OVS_UNUSED) { struct shash symtab; - struct simap ports; + struct simap ports, ct_zones; struct ds input; create_symtab(&symtab); @@ -1130,6 +1130,7 @@ test_parse_actions(struct ovs_cmdl_context *ctx OVS_UNUSED) simap_put(&ports, "eth0", 5); simap_put(&ports, "eth1", 6); simap_put(&ports, "LOCAL", ofp_to_u16(OFPP_LOCAL)); + simap_init(&ct_zones); ds_init(&input); while (!ds_get_test_line(&input, stdin)) { @@ -1138,8 +1139,8 @@ test_parse_actions(struct ovs_cmdl_context *ctx OVS_UNUSED) char *error; ofpbuf_init(&ofpacts, 0); - error = actions_parse_string(ds_cstr(&input), &symtab, &ports, 11, 64, - &ofpacts, &prereqs); + error = actions_parse_string(ds_cstr(&input), &symtab, &ports, + &ct_zones, 11, 64, &ofpacts, &prereqs); if (!error) { struct ds output; @@ -1165,6 +1166,7 @@ test_parse_actions(struct ovs_cmdl_context *ctx OVS_UNUSED) ds_destroy(&input); simap_destroy(&ports); + simap_destroy(&ct_zones); expr_symtab_destroy(&symtab); shash_destroy(&symtab); }
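
The per-port zone bookkeeping that update_ct_zones() performs in binding.c can be sketched on its own, outside the OVS tree. This is a minimal sketch under stated assumptions, not the patch's implementation: `struct zone_pool` and the `pool_*` helpers are hypothetical names introduced here for illustration, and a hand-rolled bitmap stands in for lib/bitmap.h. As in the patch, zone 0 is reserved and never handed out, so 0 can double as the "exhausted" return value.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical pool of conntrack zone ids (not an OVS type). */
struct zone_pool {
    unsigned long *bitmap;   /* Bit i set means zone i is in use. */
    size_t n_zones;          /* Valid zone ids are 1..n_zones. */
};

/* Allocates the bitmap and reserves zone 0.  Returns false on OOM. */
bool pool_init(struct zone_pool *p, size_t n_zones)
{
    size_t n_bits = n_zones + 1;   /* Room for ids 0..n_zones inclusive. */
    size_t n_words = (n_bits + BITS_PER_WORD - 1) / BITS_PER_WORD;

    p->n_zones = n_zones;
    p->bitmap = calloc(n_words, sizeof *p->bitmap);
    if (!p->bitmap) {
        return false;
    }
    p->bitmap[0] |= 1UL;           /* Zone 0 is never assigned. */
    return true;
}

/* Returns the lowest free zone id and marks it used, or 0 if none is left. */
size_t pool_alloc(struct zone_pool *p)
{
    for (size_t z = 1; z <= p->n_zones; z++) {
        unsigned long *word = &p->bitmap[z / BITS_PER_WORD];
        unsigned long mask = 1UL << (z % BITS_PER_WORD);

        if (!(*word & mask)) {
            *word |= mask;
            return z;
        }
    }
    return 0;
}

/* Marks zone 'z' free again, e.g. when its logical port goes away. */
void pool_release(struct zone_pool *p, size_t z)
{
    if (z >= 1 && z <= p->n_zones) {
        p->bitmap[z / BITS_PER_WORD] &= ~(1UL << (z % BITS_PER_WORD));
    }
}

void pool_destroy(struct zone_pool *p)
{
    free(p->bitmap);
    p->bitmap = NULL;
}
```

A caller would invoke pool_alloc() once per newly seen logical port and pool_release() when a port disappears, mirroring the two loops in update_ct_zones(). The sketch sizes the bitmap for n_zones + 1 bits so that the highest id stays inside the allocation; the patch allocates MAX_CT_ZONES bits and scans up to MAX_CT_ZONES + 1, which may be worth a second look during review.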