Message ID | 20210506143446.638103-1-ihrachys@redhat.com |
---|---|
State | Superseded, archived |
Headers | show |
Series | [ovs-dev,v9] ovn-northd: introduce new allow-stateless ACL verb | expand |
Hi Ihar, Thanks for the patch, and it's great to see the significant performance improvement! Please see my comments below. On Thu, May 6, 2021 at 7:35 AM Ihar Hrachyshka <ihrachys@redhat.com> wrote: > > For allow-stateless ACLs, bypass connection tracking by avoiding > setting ct hints for matching traffic. Avoid sending all traffic to ct > when a stateful ACL is present. > > === > > Reusing an existing 'allow' verb for stateless matching would have its > drawbacks, specifically, when it comes to backwards incompatibility of > the new behavior with existing environments. When using "allow" ACLs > in mixed allow/allow-related environment, we still commit "allow" > traffic to conntrack. This unnecessarily hits performance when mixed > ACL action types were used for the same datapath. This is why we > introduce a new action verb to describe stateless behavior. > > Another complexity to consider is the fact that with stateless > matching, one would not be able to rely on 'related' magic that > guarantees that reply traffic is passed through. Instead, the user > would have to accurately define matching rules both for request and > reply directions of a protocol session. Specifically, when allowing > ICMP for a specific peer host, one has to define 'allow-stateless' > rules that would match against ip.dst for request direction and ip.src > for reply direction. Other protocols and scenarios will require their > own fine grained matching approaches implemented by the user. > > === > > For performance measurements, qperf was used. Tests were executed on two > setups: > > 1) ovn-fake-multinode virtual environment; > 2) Supermicro SYS-5039MS-H8TRF 3-node RH-OSP 16.x cluster with 2 compute > nodes, NIC: "Intel Corporation Ethernet Controller XXV710 for 25GbE > SFP28 (rev 02)", using fake_nova_driver to avoid qemu overhead. > > 1) ovn-fake-multinode: > > Performance measured between two virtual nodes, two ports > that belong to different LSs connected via router. Using qperf, > performance was measured for UDP, TCP, SCTP protocols (using > <proto>_lat and <proto>_bw tests). The qperf version used: > 0.4.9-16.fc31.x86_64. Each test scenario was executed five times and > averages compared. > > Tests were executed with `allow-stateless` rules for the tested > protocol and `allow-related` for another protocol set for both ports, > both directions, e.g. for TCP scenario, the following ACLs were > defined: > > ovn-nbctl acl-add sw0 to-lport 100 tcp allow-stateless > ovn-nbctl acl-add sw0 from-lport 100 tcp allow-stateless > ovn-nbctl acl-add sw1 to-lport 100 tcp allow-stateless > ovn-nbctl acl-add sw1 from-lport 100 tcp allow-stateless > > ovn-nbctl acl-add sw0 to-lport 100 sctp allow-related > ovn-nbctl acl-add sw0 from-lport 100 sctp allow-related > ovn-nbctl acl-add sw1 to-lport 100 sctp allow-related > ovn-nbctl acl-add sw1 from-lport 100 sctp allow-related > > In this particular environment, improvement was seen in send_bw, > latency, and msg_rate measurements, where applicable, for all three > protocols under test. > > for UDP, send_bw: 293.6 MB/sec => 313.2 MB/sec (+6.68%) > latency: 16 us => 14.08 us (-12%) > msg_rate: 62.56 K/sec => 71.06 K/sec (+13.59%) > > for TCP, latency: 18.6 us => 14.88 us (-20%) > msg_rate: 53.8 K/sec => 67.28 K/sec (+25.06%) > > for SCTP, latency: 21.98 us => 19.42 us (-11.65%) > msg_rate: 45.58 K/sec => 51.54 K/sec (+13.08%) > > Interestingly, some performance improvement was also seen for the same > scenarios with no ACLs set at all, albeit significantly more > negligible. > > for UDP, send_bw: 320.0 MB/sec => 338.6 MB/sec (+5.81%) > latency: 13.74 us => 12.88 us (-6.68%) > msg_rate: 73.02 K/sec => 77.84 K/sec (+6.6%) > > for TCP, latency: 15.62 us => 14.26 us (-9.54%) > msg_rate: 64.02 K/sec => 70.26 K/sec (+9.75%) > > for SCTP, latency: 19.56 us => 18.16 us (-7.16%) > msg_rate: 51.16 K/sec => 55.12 K/sec (+7.74%) > > 2) Supermicro RH-OSP cluster: > > Two scenarios executed: > - connectivity between two ports on the same switch > - connectivity between two ports on different switches connected via > router > > For the same switch, improvements are as follows: > - TCP latency: -5.3%, UDP latency: -13.5%, SCTP latency: -10.8% > - TCP bw: +0.8%, UDP bw, +12.25%, SCTP bw: +21.29% > > For different switches, improvements are as follows: > - TCP latency: -6%, UDP: -11.2%, SCTP: -0.2% > - TCP bw: -0.7%, UDP bw: +2%, SCTP bw: +9.9% > > The effect is more noticeable in same switch scenario. This is interesting. Any idea why with different switches the result is so different? I'd expect similar improvement just like in same switch, because in this kind of test the flow should be cached in kernel datapath (regardless of same/different logical switches) and in both cases the cost of traversing conntrack is saved. > > Comparable numbers can be captured with iperf. It may be useful to run > more tests in a more elaborate (bare metal) environment. > > === > > The patch takes inspiration from a now abandoned patch: > > "ovn-northd: Support mixing stateless/stateful ACLs with > Stateless_Filter." by Dumitru Ceara. > > The original patch assumed CMS doesn't require full flexibility of > matching rules for stateless matching (for example, to be used by > OpenShift). But other CMS interfaces may require the same > customizability for stateless as well as stateful matching, like in > OpenStack Neutron API. Which is why this patch reuses existing ACL > object type to describe stateless rules. > > Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com> > > --- > > v1: initial version. > v2: rebased after conflict. > v3: added ddlog implementation. > v3: apply stateless skip-hint-set rules to appropriate direction only. > v3: added more background as to implementation in commit message. > v3: test stateless scenarios with ddlog too. > v3: rebased after conflict. > v4: introduce a separate allow-stateless ACL match verb. > v5: rebased. > v6: updated docs for new allow-stateless approach. > v6: removed no longer valid comments. > v6: removed acl_is_stateless. > v7: bump north db schema version to 5.31.0 -> 5.32.0. > v8: fixed checkpatch failure on a too long line in dbschema. > v8: added perf data on baremetal lab testing. > v9: fixed ovsdb checksum. > --- > NEWS | 2 + > northd/ovn-northd.8.xml | 8 +- > northd/ovn-northd.c | 65 ++++++++- > northd/ovn_northd.dl | 31 ++++ > ovn-nb.ovsschema | 9 +- > ovn-nb.xml | 9 +- > tests/ovn-northd.at | 309 ++++++++++++++++++++++++++++++++++++++++ > utilities/ovn-nbctl.c | 6 +- > 8 files changed, 428 insertions(+), 11 deletions(-) > > diff --git a/NEWS b/NEWS > index 1ddde15f8..d72139e76 100644 > --- a/NEWS > +++ b/NEWS > @@ -11,6 +11,8 @@ Post-v21.03.0 > 'use_parallel_build' to enable it. It is disabled by default. > - Support vlan-passthru mode for tag=0 localnet ports. > - Support custom 802.11ad EthType for localnet ports. > + - Introduce a new "allow-stateless" ACL verb to always bypass connection > + tracking. The existing "allow" verb behavior is left intact. > > OVN v21.03.0 - 12 Mar 2021 > ------------------------- > diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml > index 54e88d3fa..08c4ea852 100644 > --- a/northd/ovn-northd.8.xml > +++ b/northd/ovn-northd.8.xml > @@ -419,7 +419,9 @@ > before eventually advancing to ingress table <code>ACLs</code>. If > special ports such as route ports or localnet ports can't use ct(), a > priority-110 flow is added to skip over stateful ACLs. IPv6 Neighbor > - Discovery and MLD traffic also skips stateful ACLs. > + Discovery and MLD traffic also skips stateful ACLs. For "allow-stateless" > + ACLs, a flow is added to bypass setting the hint for connection tracker > + processing. This approach clearly achieves the goal of bypassing conntrack and there is great dataplane performance gain, but it also doubles the ACL logical flows which in turn could significantly increase the control plane cost, especially in ovn-controller. In large scale environments where lots of ACLs referencing large port-group/address-sets are used, it is already a bottleneck of ovn-controller flow computing. It would be better if we could avoid doubling this cost. Instead of directly add the ACL match here, can we set a new hint (e.g. REGBIT_BYPASS_CT = 1), and in the later stage pre-stateful add a higher priority flow, if this hint is matched then directly "next" without going through CT? I hope this can achieve the same goal but not increasing control plane cost. What do you think? > </p> > > <p> > @@ -614,6 +616,10 @@ > for new connections and <code>reg0[1] = 1; next;</code> for existing > connections. > </li> > + <li> > + <code>allow-stateless</code> ACLs translate into logical > + flows with the <code>next;</code> action. > + </li> > <li> > <code>reject</code> ACLs translate into logical > flows with the > diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c > index 94fae5648..4032f1441 100644 > --- a/northd/ovn-northd.c > +++ b/northd/ovn-northd.c > @@ -4974,7 +4974,52 @@ skip_port_from_conntrack(struct ovn_datapath *od, struct ovn_port *op, > } > > static void > -build_pre_acls(struct ovn_datapath *od, struct hmap *lflows) > +build_stateless_filter(struct ovn_datapath *od, > + const struct nbrec_acl *acl, > + struct hmap *lflows) > +{ > + if (!strcmp(acl->direction, "from-lport")) { > + ovn_lflow_add_with_hint(lflows, od, S_SWITCH_IN_PRE_ACL, > + acl->priority + OVN_ACL_PRI_OFFSET, > + acl->match, > + "next;", > + &acl->header_); > + } else { > + ovn_lflow_add_with_hint(lflows, od, S_SWITCH_OUT_PRE_ACL, > + acl->priority + OVN_ACL_PRI_OFFSET, > + acl->match, > + "next;", > + &acl->header_); > + } > +} > + > +static void > +build_stateless_filters(struct ovn_datapath *od, struct hmap *port_groups, > + struct hmap *lflows) > +{ > + for (size_t i = 0; i < od->nbs->n_acls; i++) { > + const struct nbrec_acl *acl = od->nbs->acls[i]; > + if (!strcmp(acl->action, "allow-stateless")) { > + build_stateless_filter(od, acl, lflows); > + } > + } > + > + struct ovn_port_group *pg; > + HMAP_FOR_EACH (pg, key_node, port_groups) { > + if (ovn_port_group_ls_find(pg, &od->nbs->header_.uuid)) { > + for (size_t i = 0; i < pg->nb_pg->n_acls; i++) { > + const struct nbrec_acl *acl = pg->nb_pg->acls[i]; > + if (!strcmp(acl->action, "allow-stateless")) { > + build_stateless_filter(od, acl, lflows); > + } > + } > + } > + } > +} > + > +static void > +build_pre_acls(struct ovn_datapath *od, struct hmap *port_groups, > + struct hmap *lflows) > { > /* Ingress and Egress Pre-ACL Table (Priority 0): Packets are > * allowed by default. */ > @@ -5002,6 +5047,8 @@ build_pre_acls(struct ovn_datapath *od, struct hmap *lflows) > 110, lflows); > } > nit: there is a comment here several lines above this that needs to be updated: | /* If there are any stateful ACL rules in this datapath, we must | * send all IP packets through the conntrack action, which handles | * defragmentation, in order to match L4 headers. */ > + build_stateless_filters(od, port_groups, lflows); > + > /* Ingress and Egress Pre-ACL Table (Priority 110). > * > * Not to do conntrack on ND and ICMP destination > @@ -5369,7 +5416,8 @@ build_acl_log(struct ds *actions, const struct nbrec_acl *acl, > } else if (!strcmp(acl->action, "reject")) { > ds_put_cstr(actions, "verdict=reject, "); > } else if (!strcmp(acl->action, "allow") > - || !strcmp(acl->action, "allow-related")) { > + || !strcmp(acl->action, "allow-related") > + || !strcmp(acl->action, "allow-stateless")) { > ds_put_cstr(actions, "verdict=allow, "); > } > > @@ -5428,7 +5476,16 @@ consider_acl(struct hmap *lflows, struct ovn_datapath *od, > bool ingress = !strcmp(acl->direction, "from-lport") ? true :false; > enum ovn_stage stage = ingress ? S_SWITCH_IN_ACL : S_SWITCH_OUT_ACL; > > - if (!strcmp(acl->action, "allow") > + if (!strcmp(acl->action, "allow-stateless")) { > + struct ds actions = DS_EMPTY_INITIALIZER; > + build_acl_log(&actions, acl, meter_groups); > + ds_put_cstr(&actions, "next;"); > + ovn_lflow_add_with_hint(lflows, od, stage, > + acl->priority + OVN_ACL_PRI_OFFSET, > + acl->match, ds_cstr(&actions), > + &acl->header_); > + ds_destroy(&actions); > + } else if (!strcmp(acl->action, "allow") > || !strcmp(acl->action, "allow-related")) { > /* If there are any stateful flows, we must even commit "allow" > * actions. This is because, while the initiater's > @@ -6828,7 +6885,7 @@ build_lswitch_lflows_pre_acl_and_acl(struct ovn_datapath *od, > od->has_stateful_acl = ls_has_stateful_acl(od); > od->has_lb_vip = ls_has_lb_vip(od); > > - build_pre_acls(od, lflows); > + build_pre_acls(od, port_groups, lflows); > build_pre_lb(od, lflows, meter_groups, lbs); > build_pre_stateful(od, lflows); > build_acl_hints(od, lflows); > diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl > index 7953325aa..5ae52b3e8 100644 > --- a/northd/ovn_northd.dl > +++ b/northd/ovn_northd.dl > @@ -1824,6 +1824,27 @@ for (&Switch(.ls =ls)) { > .external_ids = map_empty()) > } > > +for (&SwitchACL(.sw = sw@&Switch{.ls = ls}, .acl = &acl, .has_fair_meter = fair_meter)) { > + if (sw.has_stateful_acl) { > + if (acl.action == "allow-stateless") { > + if (acl.direction == "from-lport") { > + Flow(.logical_datapath = ls._uuid, > + .stage = s_SWITCH_IN_PRE_ACL(), > + .priority = acl.priority + oVN_ACL_PRI_OFFSET(), > + .__match = acl.__match, > + .actions = "next;", > + .external_ids = stage_hint(acl._uuid)) > + } else { > + Flow(.logical_datapath = ls._uuid, > + .stage = s_SWITCH_OUT_PRE_ACL(), > + .priority = acl.priority + oVN_ACL_PRI_OFFSET(), > + .__match = acl.__match, > + .actions = "next;", > + .external_ids = stage_hint(acl._uuid)) > + } > + } > + } > +} > > /* If there are any stateful ACL rules in this datapath, we must > * send all IP packets through the conntrack action, which handles > @@ -2168,6 +2189,9 @@ function build_acl_log(acl: nb::ACL, fair_meter: bool): string = > "allow-related" -> { > strs.push("verdict=allow") > }, > + "allow-stateless" -> { > + strs.push("verdict=allow") > + }, > _ -> () > }; > match (acl.meter) { > @@ -2546,6 +2570,13 @@ for (&SwitchACL(.sw = sw@&Switch{.ls = ls}, .acl = &acl, .has_fair_meter = fair_ > .actions = "${acl_log}next;", > .external_ids = stage_hint) > } > + } else if (acl.action == "allow-stateless") { > + Flow(.logical_datapath = ls._uuid, > + .stage = stage, > + .priority = acl.priority + oVN_ACL_PRI_OFFSET(), > + .__match = acl.__match, > + .actions = "${acl_log}next;", > + .external_ids = stage_hint) > } else if (acl.action == "drop" or acl.action == "reject") { > /* The implementation of "drop" differs if stateful ACLs are in > * use for this datapath. In that case, the actions differ > diff --git a/ovn-nb.ovsschema b/ovn-nb.ovsschema > index 29019809c..faf619a1c 100644 > --- a/ovn-nb.ovsschema > +++ b/ovn-nb.ovsschema > @@ -1,7 +1,7 @@ > { > "name": "OVN_Northbound", > - "version": "5.31.0", > - "cksum": "2352750632 28701", > + "version": "5.32.0", > + "cksum": "204590300 28863", > "tables": { > "NB_Global": { > "columns": { > @@ -221,7 +221,10 @@ > "enum": ["set", ["from-lport", "to-lport"]]}}}, > "match": {"type": "string"}, > "action": {"type": {"key": {"type": "string", > - "enum": ["set", ["allow", "allow-related", "drop", "reject"]]}}}, > + "enum": ["set", > + ["allow", "allow-related", > + "allow-stateless", "drop", > + "reject"]]}}}, > "log": {"type": "boolean"}, > "severity": {"type": {"key": {"type": "string", > "enum": ["set", > diff --git a/ovn-nb.xml b/ovn-nb.xml > index feb38b5d3..5386c8fef 100644 > --- a/ovn-nb.xml > +++ b/ovn-nb.xml > @@ -1791,7 +1791,9 @@ > <p>The action to take when the ACL rule matches:</p> > <ul> > <li> > - <code>allow</code>: Forward the packet. > + <code>allow</code>: Forward the packet. It will also send the > + packets through connection tracking when > + <code>allow-related</code> rules exist on the logical switch. > </li> Thanks for updating the documentation. It is really helpful for understanding the difference between "allow" and "allow-stateless". I think it could be even more clear if we just say something like: when there is no "allow-related" rules exist on the logical switch, "allow" is equivalent to "allow-stateless". Otherwise, it is equivalent to "allow-related". (And the order of "allow-stateless" and "allow" may be switched) > > <li> > @@ -1799,6 +1801,11 @@ > (e.g. inbound replies to an outbound connection). > </li> > > + <li> > + <code>allow-stateless</code>: Always forward the packet in stateless > + manner. May require defining additional rules for inbound replies. > + </li> > + It would be better to clarify more on the "inbound replies" part. The commit message has more details, but it is better to make sure the users understand it by just reading this text. We may also emphasise that the packets will not go through conntrack table, even if there are other "allow-related" rules co-exist. Thanks, Han > <li> > <code>drop</code>: Silently drop the packet. > </li> > diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at > index 32afb4fa8..f62526fd3 100644 > --- a/tests/ovn-northd.at > +++ b/tests/ovn-northd.at > @@ -2592,6 +2592,315 @@ sed 's/reg8\[[0..15\]] == [[0-9]]*/reg8\[[0..15\]] == <cleared>/' | sort], [0], > AT_CLEANUP > ]) > > +OVN_FOR_EACH_NORTHD([ > +AT_SETUP([ovn -- ACL allow-stateless omit conntrack - Logical_Switch]) > +ovn_start > + > +ovn-nbctl ls-add ls > +ovn-nbctl lsp-add ls lsp1 > +ovn-nbctl lsp-set-addresses lsp1 00:00:00:00:00:01 > +ovn-nbctl lsp-add ls lsp2 > +ovn-nbctl lsp-set-addresses lsp2 00:00:00:00:00:02 > + > +for direction in from to; do > + ovn-nbctl acl-add ls ${direction}-lport 3 "tcp" allow-related > + ovn-nbctl acl-add ls ${direction}-lport 2 "udp" allow-related > + ovn-nbctl acl-add ls ${direction}-lport 1 "ip" drop > +done > +ovn-nbctl --wait=sb sync > + > +flow_eth='eth.src == 00:00:00:00:00:01 && eth.dst == 00:00:00:00:00:02' > +flow_ip='ip.ttl==64 && ip4.src == 42.42.42.1 && ip4.dst == 66.66.66.66' > +flow_tcp='tcp && tcp.dst == 80' > +flow_udp='udp && udp.dst == 80' > + > +lsp1_inport=$(fetch_column Port_Binding tunnel_key logical_port=lsp1) > + > +# TCP packets should go to conntrack. > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}" > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl > +# tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0 > +ct_next(ct_state=new|trk) { > + ct_next(ct_state=new|trk) { > + output("lsp2"); > + }; > +}; > +]) > + > +# UDP packets should go to conntrack. > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}" > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl > +# udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80 > +ct_next(ct_state=new|trk) { > + ct_next(ct_state=new|trk) { > + output("lsp2"); > + }; > +}; > +]) > + > +# Allow stateless for TCP. > +for direction in from to; do > + ovn-nbctl acl-add ls ${direction}-lport 1 tcp allow-stateless > +done > +ovn-nbctl --wait=sb sync > + > +# TCP packets should not go to conntrack anymore. > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}" > +AT_CHECK_UNQUOTED([ovn-trace --minimal ls "${flow}"], [0], [dnl > +# tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0 > +output("lsp2"); > +]) > + > +# UDP packets still go to conntrack. > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}" > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl > +# udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80 > +ct_next(ct_state=new|trk) { > + ct_next(ct_state=new|trk) { > + output("lsp2"); > + }; > +}; > +]) > + > +# Add a load balancer. > +ovn-nbctl lb-add lb-tcp 66.66.66.66:80 42.42.42.2:8080 tcp > +ovn-nbctl lb-add lb-udp 66.66.66.66:80 42.42.42.2:8080 udp > +ovn-nbctl ls-lb-add ls lb-tcp > +ovn-nbctl ls-lb-add ls lb-udp > + > +# Remove stateless for TCP. > +ovn-nbctl acl-del ls > +ovn-nbctl --wait=sb sync > + > +# TCP packets should go to conntrack. > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}" > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl > +# tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0 > +ct_next(ct_state=new|trk) { > + ct_lb { > + reg0[[6]] = 0; > + *** chk_lb_hairpin_reply action not implemented; > + reg0[[12]] = 0; > + ct_next(ct_state=new|trk) { > + output("lsp2"); > + }; > + }; > +}; > +]) > + > +# UDP packets should go to conntrack. > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}" > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl > +# udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80 > +ct_next(ct_state=new|trk) { > + ct_lb { > + reg0[[6]] = 0; > + *** chk_lb_hairpin_reply action not implemented; > + reg0[[12]] = 0; > + ct_next(ct_state=new|trk) { > + output("lsp2"); > + }; > + }; > +}; > +]) > + > +# Allow stateless for TCP. > +for direction in from to; do > + ovn-nbctl acl-add ls ${direction}-lport 1 tcp allow-stateless > +done > +ovn-nbctl --wait=sb sync > + > +# TCP packets should go to conntrack for load balancing. > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}" > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl > +# tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0 > +ct_next(ct_state=new|trk) { > + ct_lb { > + reg0[[6]] = 0; > + *** chk_lb_hairpin_reply action not implemented; > + reg0[[12]] = 0; > + ct_next(ct_state=new|trk) { > + output("lsp2"); > + }; > + }; > +}; > +]) > + > +# UDP packets still go to conntrack. > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}" > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl > +# udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80 > +ct_next(ct_state=new|trk) { > + ct_lb { > + reg0[[6]] = 0; > + *** chk_lb_hairpin_reply action not implemented; > + reg0[[12]] = 0; > + ct_next(ct_state=new|trk) { > + output("lsp2"); > + }; > + }; > +}; > +]) > + > +AT_CLEANUP > +]) > + > +OVN_FOR_EACH_NORTHD([ > +AT_SETUP([ovn -- ACL allow-stateless omit conntrack - Port_Group]) > +ovn_start > + > +ovn-nbctl ls-add ls > +ovn-nbctl lsp-add ls lsp1 > +ovn-nbctl lsp-set-addresses lsp1 00:00:00:00:00:01 > +ovn-nbctl lsp-add ls lsp2 > +ovn-nbctl lsp-set-addresses lsp2 00:00:00:00:00:02 > + > +ovn-nbctl pg-add pg lsp1 lsp2 > + > +for direction in from to; do > + ovn-nbctl acl-add pg ${direction}-lport 3 "tcp" allow-related > + ovn-nbctl acl-add pg ${direction}-lport 2 "udp" allow-related > + ovn-nbctl acl-add pg ${direction}-lport 1 "ip" drop > +done > +ovn-nbctl --wait=sb sync > + > +lsp1_inport=$(fetch_column Port_Binding tunnel_key logical_port=lsp1) > +echo $lsp1_inport > + > +flow_eth='eth.src == 00:00:00:00:00:01 && eth.dst == 00:00:00:00:00:02' > +flow_ip='ip.ttl==64 && ip4.src == 42.42.42.1 && ip4.dst == 66.66.66.66' > +flow_tcp='tcp && tcp.dst == 80' > +flow_udp='udp && udp.dst == 80' > + > +# TCP packets should go to conntrack. > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}" > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl > +# tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0 > +ct_next(ct_state=new|trk) { > + ct_next(ct_state=new|trk) { > + output("lsp2"); > + }; > +}; > +]) > + > +# UDP packets should go to conntrack. > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}" > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl > +# udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80 > +ct_next(ct_state=new|trk) { > + ct_next(ct_state=new|trk) { > + output("lsp2"); > + }; > +}; > +]) > + > +# Allow stateless for TCP. > +for direction in from to; do > + ovn-nbctl acl-add pg ${direction}-lport 1 tcp allow-stateless > +done > +ovn-nbctl --wait=sb sync > + > +# TCP packets should not go to conntrack anymore. > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}" > +AT_CHECK_UNQUOTED([ovn-trace --minimal ls "${flow}"], [0], [dnl > +# tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0 > +output("lsp2"); > +]) > + > +# UDP packets still go to conntrack. > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}" > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl > +# udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80 > +ct_next(ct_state=new|trk) { > + ct_next(ct_state=new|trk) { > + output("lsp2"); > + }; > +}; > +]) > + > +# Add a load balancer. > +ovn-nbctl lb-add lb-tcp 66.66.66.66:80 42.42.42.2:8080 tcp > +ovn-nbctl lb-add lb-udp 66.66.66.66:80 42.42.42.2:8080 udp > +ovn-nbctl ls-lb-add ls lb-tcp > +ovn-nbctl ls-lb-add ls lb-udp > + > +# Remove stateless for TCP. > +ovn-nbctl acl-del pg > +ovn-nbctl --wait=sb sync > + > +# TCP packets should go to conntrack. > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}" > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl > +# tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0 > +ct_next(ct_state=new|trk) { > + ct_lb { > + reg0[[6]] = 0; > + *** chk_lb_hairpin_reply action not implemented; > + reg0[[12]] = 0; > + ct_next(ct_state=new|trk) { > + output("lsp2"); > + }; > + }; > +}; > +]) > + > +# UDP packets should go to conntrack. > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}" > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl > +# udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80 > +ct_next(ct_state=new|trk) { > + ct_lb { > + reg0[[6]] = 0; > + *** chk_lb_hairpin_reply action not implemented; > + reg0[[12]] = 0; > + ct_next(ct_state=new|trk) { > + output("lsp2"); > + }; > + }; > +}; > +]) > + > +# Allow stateless for TCP. > +for direction in from to; do > + ovn-nbctl acl-add pg ${direction}-lport 1 tcp allow-stateless > +done > +ovn-nbctl --wait=sb sync > + > +# TCP packets should go to conntrack for load balancing. > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}" > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl > +# tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0 > +ct_next(ct_state=new|trk) { > + ct_lb { > + reg0[[6]] = 0; > + *** chk_lb_hairpin_reply action not implemented; > + reg0[[12]] = 0; > + ct_next(ct_state=new|trk) { > + output("lsp2"); > + }; > + }; > +}; > +]) > + > +# UDP packets still go to conntrack. > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}" > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl > +# udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80 > +ct_next(ct_state=new|trk) { > + ct_lb { > + reg0[[6]] = 0; > + *** chk_lb_hairpin_reply action not implemented; > + reg0[[12]] = 0; > + ct_next(ct_state=new|trk) { > + output("lsp2"); > + }; > + }; > +}; > +]) > + > +AT_CLEANUP > +]) > + > OVN_FOR_EACH_NORTHD([ > AT_SETUP([ovn -- check BFD config propagation to SBDB]) > AT_KEYWORDS([northd-bfd]) > diff --git a/utilities/ovn-nbctl.c b/utilities/ovn-nbctl.c > index 042c21002..48fd0b7ee 100644 > --- a/utilities/ovn-nbctl.c > +++ b/utilities/ovn-nbctl.c > @@ -2303,9 +2303,11 @@ nbctl_acl_add(struct ctl_context *ctx) > > /* Validate action. */ > if (strcmp(action, "allow") && strcmp(action, "allow-related") > - && strcmp(action, "drop") && strcmp(action, "reject")) { > + && strcmp(action, "allow-stateless") && strcmp(action, "drop") > + && strcmp(action, "reject")) { > ctl_error(ctx, "%s: action must be one of \"allow\", " > - "\"allow-related\", \"drop\", and \"reject\"", action); > + "\"allow-related\", \"allow-stateless\", \"drop\", " > + "and \"reject\"", action); > return; > } > > -- > 2.31.1 > > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
On 5/7/21 7:35 AM, Han Zhou wrote: > Hi Ihar, > > Thanks for the patch, and it's great to see the significant performance > improvement! > Please see my comments below. > > On Thu, May 6, 2021 at 7:35 AM Ihar Hrachyshka <ihrachys@redhat.com> wrote: >> >> For allow-stateless ACLs, bypass connection tracking by avoiding >> setting ct hints for matching traffic. Avoid sending all traffic to ct >> when a stateful ACL is present. >> >> === >> >> Reusing an existing 'allow' verb for stateless matching would have its >> drawbacks, specifically, when it comes to backwards incompatibility of >> the new behavior with existing environments. When using "allow" ACLs >> in mixed allow/allow-related environment, we still commit "allow" >> traffic to conntrack. This unnecessarily hits performance when mixed >> ACL action types were used for the same datapath. This is why we >> introduce a new action verb to describe stateless behavior. >> >> Another complexity to consider is the fact that with stateless >> matching, one would not be able to rely on 'related' magic that >> guarantees that reply traffic is passed through. Instead, the user >> would have to accurately define matching rules both for request and >> reply directions of a protocol session. Specifically, when allowing >> ICMP for a specific peer host, one has to define 'allow-stateless' >> rules that would match against ip.dst for request direction and ip.src >> for reply direction. Other protocols and scenarios will require their >> own fine grained matching approaches implemented by the user. >> >> === >> >> For performance measurements, qperf was used. Tests were executed on two >> setups: >> >> 1) ovn-fake-multinode virtual environment; >> 2) Supermicro SYS-5039MS-H8TRF 3-node RH-OSP 16.x cluster with 2 compute >> nodes, NIC: "Intel Corporation Ethernet Controller XXV710 for 25GbE >> SFP28 (rev 02)", using fake_nova_driver to avoid qemu overhead. >> >> 1) ovn-fake-multinode: >> >> Performance measured between two virtual nodes, two ports >> that belong to different LSs connected via router. Using qperf, >> performance was measured for UDP, TCP, SCTP protocols (using >> <proto>_lat and <proto>_bw tests). The qperf version used: >> 0.4.9-16.fc31.x86_64. Each test scenario was executed five times and >> averages compared. >> >> Tests were executed with `allow-stateless` rules for the tested >> protocol and `allow-related` for another protocol set for both ports, >> both directions, e.g. for TCP scenario, the following ACLs were >> defined: >> >> ovn-nbctl acl-add sw0 to-lport 100 tcp allow-stateless >> ovn-nbctl acl-add sw0 from-lport 100 tcp allow-stateless >> ovn-nbctl acl-add sw1 to-lport 100 tcp allow-stateless >> ovn-nbctl acl-add sw1 from-lport 100 tcp allow-stateless >> >> ovn-nbctl acl-add sw0 to-lport 100 sctp allow-related >> ovn-nbctl acl-add sw0 from-lport 100 sctp allow-related >> ovn-nbctl acl-add sw1 to-lport 100 sctp allow-related >> ovn-nbctl acl-add sw1 from-lport 100 sctp allow-related >> >> In this particular environment, improvement was seen in send_bw, >> latency, and msg_rate measurements, where applicable, for all three >> protocols under test. >> >> for UDP, send_bw: 293.6 MB/sec => 313.2 MB/sec (+6.68%) >> latency: 16 us => 14.08 us (-12%) >> msg_rate: 62.56 K/sec => 71.06 K/sec (+13.59%) >> >> for TCP, latency: 18.6 us => 14.88 us (-20%) >> msg_rate: 53.8 K/sec => 67.28 K/sec (+25.06%) >> >> for SCTP, latency: 21.98 us => 19.42 us (-11.65%) >> msg_rate: 45.58 K/sec => 51.54 K/sec (+13.08%) >> >> Interestingly, some performance improvement was also seen for the same >> scenarios with no ACLs set at all, albeit significantly more >> negligible. >> >> for UDP, send_bw: 320.0 MB/sec => 338.6 MB/sec (+5.81%) >> latency: 13.74 us => 12.88 us (-6.68%) >> msg_rate: 73.02 K/sec => 77.84 K/sec (+6.6%) >> >> for TCP, latency: 15.62 us => 14.26 us (-9.54%) >> msg_rate: 64.02 K/sec => 70.26 K/sec (+9.75%) >> >> for SCTP, latency: 19.56 us => 18.16 us (-7.16%) >> msg_rate: 51.16 K/sec => 55.12 K/sec (+7.74%) >> >> 2) Supermicro RH-OSP cluster: >> >> Two scenarios executed: >> - connectivity between two ports on the same switch >> - connectivity between two ports on different switches connected via >> router >> >> For the same switch, improvements are as follows: >> - TCP latency: -5.3%, UDP latency: -13.5%, SCTP latency: -10.8% >> - TCP bw: +0.8%, UDP bw, +12.25%, SCTP bw: +21.29% >> >> For different switches, improvements are as follows: >> - TCP latency: -6%, UDP: -11.2%, SCTP: -0.2% >> - TCP bw: -0.7%, UDP bw: +2%, SCTP bw: +9.9% >> >> The effect is more noticeable in same switch scenario. > > This is interesting. Any idea why with different switches the result is so > different? I'd expect similar improvement just like in same switch, because > in this kind of test the flow should be cached in kernel datapath > (regardless of same/different logical switches) and in both cases the cost > of traversing conntrack is saved. > >> >> Comparable numbers can be captured with iperf. It may be useful to run >> more tests in a more elaborate (bare metal) environment. >> >> === >> >> The patch takes inspiration from a now abandoned patch: >> >> "ovn-northd: Support mixing stateless/stateful ACLs with >> Stateless_Filter." by Dumitru Ceara. >> >> The original patch assumed CMS doesn't require full flexibility of >> matching rules for stateless matching (for example, to be used by >> OpenShift). But other CMS interfaces may require the same >> customizability for stateless as well as stateful matching, like in >> OpenStack Neutron API. Which is why this patch reuses existing ACL >> object type to describe stateless rules. >> >> Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com> >> >> --- >> >> v1: initial version. >> v2: rebased after conflict. >> v3: added ddlog implementation. >> v3: apply stateless skip-hint-set rules to appropriate direction only. >> v3: added more background as to implementation in commit message. >> v3: test stateless scenarios with ddlog too. >> v3: rebased after conflict. >> v4: introduce a separate allow-stateless ACL match verb. >> v5: rebased. >> v6: updated docs for new allow-stateless approach. >> v6: removed no longer valid comments. >> v6: removed acl_is_stateless. >> v7: bump north db schema version to 5.31.0 -> 5.32.0. >> v8: fixed checkpatch failure on a too long line in dbschema. >> v8: added perf data on baremetal lab testing. >> v9: fixed ovsdb checksum. >> --- >> NEWS | 2 + >> northd/ovn-northd.8.xml | 8 +- >> northd/ovn-northd.c | 65 ++++++++- >> northd/ovn_northd.dl | 31 ++++ >> ovn-nb.ovsschema | 9 +- >> ovn-nb.xml | 9 +- >> tests/ovn-northd.at | 309 ++++++++++++++++++++++++++++++++++++++++ >> utilities/ovn-nbctl.c | 6 +- >> 8 files changed, 428 insertions(+), 11 deletions(-) >> >> diff --git a/NEWS b/NEWS >> index 1ddde15f8..d72139e76 100644 >> --- a/NEWS >> +++ b/NEWS >> @@ -11,6 +11,8 @@ Post-v21.03.0 >> 'use_parallel_build' to enable it. It is disabled by default. >> - Support vlan-passthru mode for tag=0 localnet ports. >> - Support custom 802.11ad EthType for localnet ports. >> + - Introduce a new "allow-stateless" ACL verb to always bypass > connection >> + tracking. The existing "allow" verb behavior is left intact. >> >> OVN v21.03.0 - 12 Mar 2021 >> ------------------------- >> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml >> index 54e88d3fa..08c4ea852 100644 >> --- a/northd/ovn-northd.8.xml >> +++ b/northd/ovn-northd.8.xml >> @@ -419,7 +419,9 @@ >> before eventually advancing to ingress table <code>ACLs</code>. If >> special ports such as route ports or localnet ports can't use > ct(), a >> priority-110 flow is added to skip over stateful ACLs. IPv6 > Neighbor >> - Discovery and MLD traffic also skips stateful ACLs. >> + Discovery and MLD traffic also skips stateful ACLs. For > "allow-stateless" >> + ACLs, a flow is added to bypass setting the hint for connection > tracker >> + processing. > > This approach clearly achieves the goal of bypassing conntrack and there is > great dataplane performance gain, but it also doubles the ACL logical flows > which in turn could significantly increase the control plane cost, > especially in ovn-controller. In large scale environments where lots of > ACLs referencing large port-group/address-sets are used, it is already a > bottleneck of ovn-controller flow computing. It would be better if we could > avoid doubling this cost. Instead of directly add the ACL match here, can > we set a new hint (e.g. REGBIT_BYPASS_CT = 1), and in the later stage > pre-stateful add a higher priority flow, if this hint is matched then > directly "next" without going through CT? I hope this can achieve the same > goal but not increasing control plane cost. What do you think? > Wouldn't this break ACL priority? E.g., with the following ACLs: ACL1: from-lport, prio=100, ip.dst=1.1.1.1, allow-stateless ACL2: from-lport, prio=50, ip, drop If we don't add the lflow for ACL1 in the ls_in_acl stage then traffic will be dropped by the lflow added for ACL2. On the other hand, IIUC, this feature is designed to be used in combination with "allow-related" ACLs defined on the same switch, replacing "allow" ACLs in that case. Before this change an "allow" ACL would've been treated as "allow-related" if there was at least one other "allow-related" ACL applied on the logical switch. Essentially generating 2 logical flows per "allow" ACL: lflow1: "REGBIT_ACL_HINT_ALLOW_NEW==1 && acl.match" then "REGBIT_CONNTRACK_COMMIT=1, next" lflow2: "REGBIT_ACL_HINT_ALLOW==1 && acl.match" then "next" If I'm not wrong, changing this ACL to "allow-stateless" would also generate 2 logical flows, so at least from a number of logical flows perspective, the result is the same. Regards, Dumitru
On Fri, May 7, 2021 at 12:54 AM Dumitru Ceara <dceara@redhat.com> wrote: > > On 5/7/21 7:35 AM, Han Zhou wrote: > > Hi Ihar, > > > > Thanks for the patch, and it's great to see the significant performance > > improvement! > > Please see my comments below. > > > > On Thu, May 6, 2021 at 7:35 AM Ihar Hrachyshka <ihrachys@redhat.com> wrote: > >> > >> For allow-stateless ACLs, bypass connection tracking by avoiding > >> setting ct hints for matching traffic. Avoid sending all traffic to ct > >> when a stateful ACL is present. > >> > >> === > >> > >> Reusing an existing 'allow' verb for stateless matching would have its > >> drawbacks, specifically, when it comes to backwards incompatibility of > >> the new behavior with existing environments. When using "allow" ACLs > >> in mixed allow/allow-related environment, we still commit "allow" > >> traffic to conntrack. This unnecessarily hits performance when mixed > >> ACL action types were used for the same datapath. This is why we > >> introduce a new action verb to describe stateless behavior. > >> > >> Another complexity to consider is the fact that with stateless > >> matching, one would not be able to rely on 'related' magic that > >> guarantees that reply traffic is passed through. Instead, the user > >> would have to accurately define matching rules both for request and > >> reply directions of a protocol session. Specifically, when allowing > >> ICMP for a specific peer host, one has to define 'allow-stateless' > >> rules that would match against ip.dst for request direction and ip.src > >> for reply direction. Other protocols and scenarios will require their > >> own fine grained matching approaches implemented by the user. > >> > >> === > >> > >> For performance measurements, qperf was used. Tests were executed on two > >> setups: > >> > >> 1) ovn-fake-multinode virtual environment; > >> 2) Supermicro SYS-5039MS-H8TRF 3-node RH-OSP 16.x cluster with 2 compute > >> nodes, NIC: "Intel Corporation Ethernet Controller XXV710 for 25GbE > >> SFP28 (rev 02)", using fake_nova_driver to avoid qemu overhead. > >> > >> 1) ovn-fake-multinode: > >> > >> Performance measured between two virtual nodes, two ports > >> that belong to different LSs connected via router. Using qperf, > >> performance was measured for UDP, TCP, SCTP protocols (using > >> <proto>_lat and <proto>_bw tests). The qperf version used: > >> 0.4.9-16.fc31.x86_64. Each test scenario was executed five times and > >> averages compared. > >> > >> Tests were executed with `allow-stateless` rules for the tested > >> protocol and `allow-related` for another protocol set for both ports, > >> both directions, e.g. for TCP scenario, the following ACLs were > >> defined: > >> > >> ovn-nbctl acl-add sw0 to-lport 100 tcp allow-stateless > >> ovn-nbctl acl-add sw0 from-lport 100 tcp allow-stateless > >> ovn-nbctl acl-add sw1 to-lport 100 tcp allow-stateless > >> ovn-nbctl acl-add sw1 from-lport 100 tcp allow-stateless > >> > >> ovn-nbctl acl-add sw0 to-lport 100 sctp allow-related > >> ovn-nbctl acl-add sw0 from-lport 100 sctp allow-related > >> ovn-nbctl acl-add sw1 to-lport 100 sctp allow-related > >> ovn-nbctl acl-add sw1 from-lport 100 sctp allow-related > >> > >> In this particular environment, improvement was seen in send_bw, > >> latency, and msg_rate measurements, where applicable, for all three > >> protocols under test. > >> > >> for UDP, send_bw: 293.6 MB/sec => 313.2 MB/sec (+6.68%) > >> latency: 16 us => 14.08 us (-12%) > >> msg_rate: 62.56 K/sec => 71.06 K/sec (+13.59%) > >> > >> for TCP, latency: 18.6 us => 14.88 us (-20%) > >> msg_rate: 53.8 K/sec => 67.28 K/sec (+25.06%) > >> > >> for SCTP, latency: 21.98 us => 19.42 us (-11.65%) > >> msg_rate: 45.58 K/sec => 51.54 K/sec (+13.08%) > >> > >> Interestingly, some performance improvement was also seen for the same > >> scenarios with no ACLs set at all, albeit significantly more > >> negligible. > >> > >> for UDP, send_bw: 320.0 MB/sec => 338.6 MB/sec (+5.81%) > >> latency: 13.74 us => 12.88 us (-6.68%) > >> msg_rate: 73.02 K/sec => 77.84 K/sec (+6.6%) > >> > >> for TCP, latency: 15.62 us => 14.26 us (-9.54%) > >> msg_rate: 64.02 K/sec => 70.26 K/sec (+9.75%) > >> > >> for SCTP, latency: 19.56 us => 18.16 us (-7.16%) > >> msg_rate: 51.16 K/sec => 55.12 K/sec (+7.74%) > >> > >> 2) Supermicro RH-OSP cluster: > >> > >> Two scenarios executed: > >> - connectivity between two ports on the same switch > >> - connectivity between two ports on different switches connected via > >> router > >> > >> For the same switch, improvements are as follows: > >> - TCP latency: -5.3%, UDP latency: -13.5%, SCTP latency: -10.8% > >> - TCP bw: +0.8%, UDP bw, +12.25%, SCTP bw: +21.29% > >> > >> For different switches, improvements are as follows: > >> - TCP latency: -6%, UDP: -11.2%, SCTP: -0.2% > >> - TCP bw: -0.7%, UDP bw: +2%, SCTP bw: +9.9% > >> > >> The effect is more noticeable in same switch scenario. > > > > This is interesting. Any idea why with different switches the result is so > > different? I'd expect similar improvement just like in same switch, because > > in this kind of test the flow should be cached in kernel datapath > > (regardless of same/different logical switches) and in both cases the cost > > of traversing conntrack is saved. > > > >> > >> Comparable numbers can be captured with iperf. It may be useful to run > >> more tests in a more elaborate (bare metal) environment. > >> > >> === > >> > >> The patch takes inspiration from a now abandoned patch: > >> > >> "ovn-northd: Support mixing stateless/stateful ACLs with > >> Stateless_Filter." by Dumitru Ceara. > >> > >> The original patch assumed CMS doesn't require full flexibility of > >> matching rules for stateless matching (for example, to be used by > >> OpenShift). But other CMS interfaces may require the same > >> customizability for stateless as well as stateful matching, like in > >> OpenStack Neutron API. Which is why this patch reuses existing ACL > >> object type to describe stateless rules. > >> > >> Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com> > >> > >> --- > >> > >> v1: initial version. > >> v2: rebased after conflict. > >> v3: added ddlog implementation. > >> v3: apply stateless skip-hint-set rules to appropriate direction only. > >> v3: added more background as to implementation in commit message. > >> v3: test stateless scenarios with ddlog too. > >> v3: rebased after conflict. > >> v4: introduce a separate allow-stateless ACL match verb. > >> v5: rebased. > >> v6: updated docs for new allow-stateless approach. > >> v6: removed no longer valid comments. > >> v6: removed acl_is_stateless. > >> v7: bump north db schema version to 5.31.0 -> 5.32.0. > >> v8: fixed checkpatch failure on a too long line in dbschema. > >> v8: added perf data on baremetal lab testing. > >> v9: fixed ovsdb checksum. > >> --- > >> NEWS | 2 + > >> northd/ovn-northd.8.xml | 8 +- > >> northd/ovn-northd.c | 65 ++++++++- > >> northd/ovn_northd.dl | 31 ++++ > >> ovn-nb.ovsschema | 9 +- > >> ovn-nb.xml | 9 +- > >> tests/ovn-northd.at | 309 ++++++++++++++++++++++++++++++++++++++++ > >> utilities/ovn-nbctl.c | 6 +- > >> 8 files changed, 428 insertions(+), 11 deletions(-) > >> > >> diff --git a/NEWS b/NEWS > >> index 1ddde15f8..d72139e76 100644 > >> --- a/NEWS > >> +++ b/NEWS > >> @@ -11,6 +11,8 @@ Post-v21.03.0 > >> 'use_parallel_build' to enable it. It is disabled by default. > >> - Support vlan-passthru mode for tag=0 localnet ports. > >> - Support custom 802.11ad EthType for localnet ports. > >> + - Introduce a new "allow-stateless" ACL verb to always bypass > > connection > >> + tracking. The existing "allow" verb behavior is left intact. > >> > >> OVN v21.03.0 - 12 Mar 2021 > >> ------------------------- > >> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml > >> index 54e88d3fa..08c4ea852 100644 > >> --- a/northd/ovn-northd.8.xml > >> +++ b/northd/ovn-northd.8.xml > >> @@ -419,7 +419,9 @@ > >> before eventually advancing to ingress table <code>ACLs</code>. If > >> special ports such as route ports or localnet ports can't use > > ct(), a > >> priority-110 flow is added to skip over stateful ACLs. IPv6 > > Neighbor > >> - Discovery and MLD traffic also skips stateful ACLs. > >> + Discovery and MLD traffic also skips stateful ACLs. For > > "allow-stateless" > >> + ACLs, a flow is added to bypass setting the hint for connection > > tracker > >> + processing. > > > > This approach clearly achieves the goal of bypassing conntrack and there is > > great dataplane performance gain, but it also doubles the ACL logical flows > > which in turn could significantly increase the control plane cost, > > especially in ovn-controller. In large scale environments where lots of > > ACLs referencing large port-group/address-sets are used, it is already a > > bottleneck of ovn-controller flow computing. It would be better if we could > > avoid doubling this cost. Instead of directly add the ACL match here, can > > we set a new hint (e.g. REGBIT_BYPASS_CT = 1), and in the later stage > > pre-stateful add a higher priority flow, if this hint is matched then > > directly "next" without going through CT? I hope this can achieve the same > > goal but not increasing control plane cost. What do you think? > > > > Wouldn't this break ACL priority? E.g., with the following ACLs: > > ACL1: from-lport, prio=100, ip.dst=1.1.1.1, allow-stateless > ACL2: from-lport, prio=50, ip, drop > > If we don't add the lflow for ACL1 in the ls_in_acl stage then traffic > will be dropped by the lflow added for ACL2. > You are right. Actually I was thinking about matching the hint again in acl stage and just move next, but it would skip all the stateful ACLs that could have a higher priority. This would work only if the stateless ACLs can always be treated as higher priority than stateful ACLs, but it seems unreasonable. It seems all the ACL matches have to be evaluated at the same stage to ensure the priorities are followed. So, forget about this proposal. > On the other hand, IIUC, this feature is designed to be used in > combination with "allow-related" ACLs defined on the same switch, > replacing "allow" ACLs in that case. > > Before this change an "allow" ACL would've been treated as > "allow-related" if there was at least one other "allow-related" ACL > applied on the logical switch. Essentially generating 2 logical flows > per "allow" ACL: > > lflow1: "REGBIT_ACL_HINT_ALLOW_NEW==1 && acl.match" then > "REGBIT_CONNTRACK_COMMIT=1, next" > lflow2: "REGBIT_ACL_HINT_ALLOW==1 && acl.match" then "next" > > If I'm not wrong, changing this ACL to "allow-stateless" would also > generate 2 logical flows, so at least from a number of logical flows > perspective, the result is the same. > You are right. But this also means that when there is no stateful rules, "allow-stateless" is worse than "allow" because "allow" would generate only a single lflow in that case, while "allow-stateless" always generates doubled lflows. So, I think at least we could do the simple improvement to make "allow-stateless" perform the same as "allow" when there are NO stateful ACLs. (Otherwise, my comment on the documentation regarding '"allow-stateless" is equivalent to "allow" when there are no stateful ACLs' would be wrong). We can move the logic of adding the stateless matching flows of pre-acl stage into a if-block (only when has_stateful is true). Would this work? Thanks, Han
Hi Han, On Fri, May 7, 2021 at 1:58 AM Han Zhou <hzhou@ovn.org> wrote: > > Hi Ihar, > > Thanks for the patch, and it's great to see the significant performance improvement! > Please see my comments below. > > On Thu, May 6, 2021 at 7:35 AM Ihar Hrachyshka <ihrachys@redhat.com> wrote: > > > > For allow-stateless ACLs, bypass connection tracking by avoiding > > setting ct hints for matching traffic. Avoid sending all traffic to ct > > when a stateful ACL is present. > > > > === > > > > Reusing an existing 'allow' verb for stateless matching would have its > > drawbacks, specifically, when it comes to backwards incompatibility of > > the new behavior with existing environments. When using "allow" ACLs > > in mixed allow/allow-related environment, we still commit "allow" > > traffic to conntrack. This unnecessarily hits performance when mixed > > ACL action types were used for the same datapath. This is why we > > introduce a new action verb to describe stateless behavior. > > > > Another complexity to consider is the fact that with stateless > > matching, one would not be able to rely on 'related' magic that > > guarantees that reply traffic is passed through. Instead, the user > > would have to accurately define matching rules both for request and > > reply directions of a protocol session. Specifically, when allowing > > ICMP for a specific peer host, one has to define 'allow-stateless' > > rules that would match against ip.dst for request direction and ip.src > > for reply direction. Other protocols and scenarios will require their > > own fine grained matching approaches implemented by the user. > > > > === > > > > For performance measurements, qperf was used. Tests were executed on two > > setups: > > > > 1) ovn-fake-multinode virtual environment; > > 2) Supermicro SYS-5039MS-H8TRF 3-node RH-OSP 16.x cluster with 2 compute > > nodes, NIC: "Intel Corporation Ethernet Controller XXV710 for 25GbE > > SFP28 (rev 02)", using fake_nova_driver to avoid qemu overhead. > > > > 1) ovn-fake-multinode: > > > > Performance measured between two virtual nodes, two ports > > that belong to different LSs connected via router. Using qperf, > > performance was measured for UDP, TCP, SCTP protocols (using > > <proto>_lat and <proto>_bw tests). The qperf version used: > > 0.4.9-16.fc31.x86_64. Each test scenario was executed five times and > > averages compared. > > > > Tests were executed with `allow-stateless` rules for the tested > > protocol and `allow-related` for another protocol set for both ports, > > both directions, e.g. for TCP scenario, the following ACLs were > > defined: > > > > ovn-nbctl acl-add sw0 to-lport 100 tcp allow-stateless > > ovn-nbctl acl-add sw0 from-lport 100 tcp allow-stateless > > ovn-nbctl acl-add sw1 to-lport 100 tcp allow-stateless > > ovn-nbctl acl-add sw1 from-lport 100 tcp allow-stateless > > > > ovn-nbctl acl-add sw0 to-lport 100 sctp allow-related > > ovn-nbctl acl-add sw0 from-lport 100 sctp allow-related > > ovn-nbctl acl-add sw1 to-lport 100 sctp allow-related > > ovn-nbctl acl-add sw1 from-lport 100 sctp allow-related > > > > In this particular environment, improvement was seen in send_bw, > > latency, and msg_rate measurements, where applicable, for all three > > protocols under test. > > > > for UDP, send_bw: 293.6 MB/sec => 313.2 MB/sec (+6.68%) > > latency: 16 us => 14.08 us (-12%) > > msg_rate: 62.56 K/sec => 71.06 K/sec (+13.59%) > > > > for TCP, latency: 18.6 us => 14.88 us (-20%) > > msg_rate: 53.8 K/sec => 67.28 K/sec (+25.06%) > > > > for SCTP, latency: 21.98 us => 19.42 us (-11.65%) > > msg_rate: 45.58 K/sec => 51.54 K/sec (+13.08%) > > > > Interestingly, some performance improvement was also seen for the same > > scenarios with no ACLs set at all, albeit significantly more > > negligible. > > > > for UDP, send_bw: 320.0 MB/sec => 338.6 MB/sec (+5.81%) > > latency: 13.74 us => 12.88 us (-6.68%) > > msg_rate: 73.02 K/sec => 77.84 K/sec (+6.6%) > > > > for TCP, latency: 15.62 us => 14.26 us (-9.54%) > > msg_rate: 64.02 K/sec => 70.26 K/sec (+9.75%) > > > > for SCTP, latency: 19.56 us => 18.16 us (-7.16%) > > msg_rate: 51.16 K/sec => 55.12 K/sec (+7.74%) > > > > 2) Supermicro RH-OSP cluster: > > > > Two scenarios executed: > > - connectivity between two ports on the same switch > > - connectivity between two ports on different switches connected via > > router > > > > For the same switch, improvements are as follows: > > - TCP latency: -5.3%, UDP latency: -13.5%, SCTP latency: -10.8% > > - TCP bw: +0.8%, UDP bw, +12.25%, SCTP bw: +21.29% > > > > For different switches, improvements are as follows: > > - TCP latency: -6%, UDP: -11.2%, SCTP: -0.2% > > - TCP bw: -0.7%, UDP bw: +2%, SCTP bw: +9.9% > > > > The effect is more noticeable in same switch scenario. > > This is interesting. Any idea why with different switches the result is so different? I'd expect similar improvement just like in same switch, because in this kind of test the flow should be cached in kernel datapath (regardless of same/different logical switches) and in both cases the cost of traversing conntrack is saved. > Thanks for bringing this up. I've made some repeated measurements with more test iterations to avoid flukes, also noticed that sometimes qperf server loses connectivity to the client and reports incorrect data, so avoided spurious runs in repeated tests. After doing all that, I get numbers that are a lot closer for both scenarios, though still I see some reduction in latency (f.e. 13.9% improvement for UDP latency goes slightly down to 11.9%). One other thing I notice is that TCP, for some reason, gains a lot less from the changes than UDP. My knowledge is not enough to explain the difference though. :( > > > > Comparable numbers can be captured with iperf. It may be useful to run > > more tests in a more elaborate (bare metal) environment. > > > > === > > > > The patch takes inspiration from a now abandoned patch: > > > > "ovn-northd: Support mixing stateless/stateful ACLs with > > Stateless_Filter." by Dumitru Ceara. > > > > The original patch assumed CMS doesn't require full flexibility of > > matching rules for stateless matching (for example, to be used by > > OpenShift). But other CMS interfaces may require the same > > customizability for stateless as well as stateful matching, like in > > OpenStack Neutron API. Which is why this patch reuses existing ACL > > object type to describe stateless rules. > > > > Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com> > > > > --- > > > > v1: initial version. > > v2: rebased after conflict. > > v3: added ddlog implementation. > > v3: apply stateless skip-hint-set rules to appropriate direction only. > > v3: added more background as to implementation in commit message. > > v3: test stateless scenarios with ddlog too. > > v3: rebased after conflict. > > v4: introduce a separate allow-stateless ACL match verb. > > v5: rebased. > > v6: updated docs for new allow-stateless approach. > > v6: removed no longer valid comments. > > v6: removed acl_is_stateless. > > v7: bump north db schema version to 5.31.0 -> 5.32.0. > > v8: fixed checkpatch failure on a too long line in dbschema. > > v8: added perf data on baremetal lab testing. > > v9: fixed ovsdb checksum. > > --- > > NEWS | 2 + > > northd/ovn-northd.8.xml | 8 +- > > northd/ovn-northd.c | 65 ++++++++- > > northd/ovn_northd.dl | 31 ++++ > > ovn-nb.ovsschema | 9 +- > > ovn-nb.xml | 9 +- > > tests/ovn-northd.at | 309 ++++++++++++++++++++++++++++++++++++++++ > > utilities/ovn-nbctl.c | 6 +- > > 8 files changed, 428 insertions(+), 11 deletions(-) > > > > diff --git a/NEWS b/NEWS > > index 1ddde15f8..d72139e76 100644 > > --- a/NEWS > > +++ b/NEWS > > @@ -11,6 +11,8 @@ Post-v21.03.0 > > 'use_parallel_build' to enable it. It is disabled by default. > > - Support vlan-passthru mode for tag=0 localnet ports. > > - Support custom 802.11ad EthType for localnet ports. > > + - Introduce a new "allow-stateless" ACL verb to always bypass connection > > + tracking. The existing "allow" verb behavior is left intact. > > > > OVN v21.03.0 - 12 Mar 2021 > > ------------------------- > > diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml > > index 54e88d3fa..08c4ea852 100644 > > --- a/northd/ovn-northd.8.xml > > +++ b/northd/ovn-northd.8.xml > > @@ -419,7 +419,9 @@ > > before eventually advancing to ingress table <code>ACLs</code>. If > > special ports such as route ports or localnet ports can't use ct(), a > > priority-110 flow is added to skip over stateful ACLs. IPv6 Neighbor > > - Discovery and MLD traffic also skips stateful ACLs. > > + Discovery and MLD traffic also skips stateful ACLs. For "allow-stateless" > > + ACLs, a flow is added to bypass setting the hint for connection tracker > > + processing. > > This approach clearly achieves the goal of bypassing conntrack and there is great dataplane performance gain, but it also doubles the ACL logical flows which in turn could significantly increase the control plane cost, especially in ovn-controller. In large scale environments where lots of ACLs referencing large port-group/address-sets are used, it is already a bottleneck of ovn-controller flow computing. It would be better if we could avoid doubling this cost. Instead of directly add the ACL match here, can we set a new hint (e.g. REGBIT_BYPASS_CT = 1), and in the later stage pre-stateful add a higher priority flow, if this hint is matched then directly "next" without going through CT? I hope this can achieve the same goal but not increasing control plane cost. What do you think? I believe this is already addressed somewhere else in the thread. There should be no difference in the number of flows in general case, though I see your point about a potential optimization where we would treat allow-stateless as allow when no allow-related ACLs present. If possible, I would prefer to make it a follow-up for this patch because of close release. (I'll send the patch to implement the proposal tomorrow.) > > > </p> > > > > <p> > > @@ -614,6 +616,10 @@ > > for new connections and <code>reg0[1] = 1; next;</code> for existing > > connections. > > </li> > > + <li> > > + <code>allow-stateless</code> ACLs translate into logical > > + flows with the <code>next;</code> action. > > + </li> > > <li> > > <code>reject</code> ACLs translate into logical > > flows with the > > diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c > > index 94fae5648..4032f1441 100644 > > --- a/northd/ovn-northd.c > > +++ b/northd/ovn-northd.c > > @@ -4974,7 +4974,52 @@ skip_port_from_conntrack(struct ovn_datapath *od, struct ovn_port *op, > > } > > > > static void > > -build_pre_acls(struct ovn_datapath *od, struct hmap *lflows) > > +build_stateless_filter(struct ovn_datapath *od, > > + const struct nbrec_acl *acl, > > + struct hmap *lflows) > > +{ > > + if (!strcmp(acl->direction, "from-lport")) { > > + ovn_lflow_add_with_hint(lflows, od, S_SWITCH_IN_PRE_ACL, > > + acl->priority + OVN_ACL_PRI_OFFSET, > > + acl->match, > > + "next;", > > + &acl->header_); > > + } else { > > + ovn_lflow_add_with_hint(lflows, od, S_SWITCH_OUT_PRE_ACL, > > + acl->priority + OVN_ACL_PRI_OFFSET, > > + acl->match, > > + "next;", > > + &acl->header_); > > + } > > +} > > + > > +static void > > +build_stateless_filters(struct ovn_datapath *od, struct hmap *port_groups, > > + struct hmap *lflows) > > +{ > > + for (size_t i = 0; i < od->nbs->n_acls; i++) { > > + const struct nbrec_acl *acl = od->nbs->acls[i]; > > + if (!strcmp(acl->action, "allow-stateless")) { > > + build_stateless_filter(od, acl, lflows); > > + } > > + } > > + > > + struct ovn_port_group *pg; > > + HMAP_FOR_EACH (pg, key_node, port_groups) { > > + if (ovn_port_group_ls_find(pg, &od->nbs->header_.uuid)) { > > + for (size_t i = 0; i < pg->nb_pg->n_acls; i++) { > > + const struct nbrec_acl *acl = pg->nb_pg->acls[i]; > > + if (!strcmp(acl->action, "allow-stateless")) { > > + build_stateless_filter(od, acl, lflows); > > + } > > + } > > + } > > + } > > +} > > + > > +static void > > +build_pre_acls(struct ovn_datapath *od, struct hmap *port_groups, > > + struct hmap *lflows) > > { > > /* Ingress and Egress Pre-ACL Table (Priority 0): Packets are > > * allowed by default. */ > > @@ -5002,6 +5047,8 @@ build_pre_acls(struct ovn_datapath *od, struct hmap *lflows) > > 110, lflows); > > } > > > > nit: there is a comment here several lines above this that needs to be updated: thanks, adjusted it somewhat > > | /* If there are any stateful ACL rules in this datapath, we must > | * send all IP packets through the conntrack action, which handles > | * defragmentation, in order to match L4 headers. */ > > > + build_stateless_filters(od, port_groups, lflows); > > + > > /* Ingress and Egress Pre-ACL Table (Priority 110). > > * > > * Not to do conntrack on ND and ICMP destination > > @@ -5369,7 +5416,8 @@ build_acl_log(struct ds *actions, const struct nbrec_acl *acl, > > } else if (!strcmp(acl->action, "reject")) { > > ds_put_cstr(actions, "verdict=reject, "); > > } else if (!strcmp(acl->action, "allow") > > - || !strcmp(acl->action, "allow-related")) { > > + || !strcmp(acl->action, "allow-related") > > + || !strcmp(acl->action, "allow-stateless")) { > > ds_put_cstr(actions, "verdict=allow, "); > > } > > > > @@ -5428,7 +5476,16 @@ consider_acl(struct hmap *lflows, struct ovn_datapath *od, > > bool ingress = !strcmp(acl->direction, "from-lport") ? true :false; > > enum ovn_stage stage = ingress ? S_SWITCH_IN_ACL : S_SWITCH_OUT_ACL; > > > > - if (!strcmp(acl->action, "allow") > > + if (!strcmp(acl->action, "allow-stateless")) { > > + struct ds actions = DS_EMPTY_INITIALIZER; > > + build_acl_log(&actions, acl, meter_groups); > > + ds_put_cstr(&actions, "next;"); > > + ovn_lflow_add_with_hint(lflows, od, stage, > > + acl->priority + OVN_ACL_PRI_OFFSET, > > + acl->match, ds_cstr(&actions), > > + &acl->header_); > > + ds_destroy(&actions); > > + } else if (!strcmp(acl->action, "allow") > > || !strcmp(acl->action, "allow-related")) { > > /* If there are any stateful flows, we must even commit "allow" > > * actions. This is because, while the initiater's > > @@ -6828,7 +6885,7 @@ build_lswitch_lflows_pre_acl_and_acl(struct ovn_datapath *od, > > od->has_stateful_acl = ls_has_stateful_acl(od); > > od->has_lb_vip = ls_has_lb_vip(od); > > > > - build_pre_acls(od, lflows); > > + build_pre_acls(od, port_groups, lflows); > > build_pre_lb(od, lflows, meter_groups, lbs); > > build_pre_stateful(od, lflows); > > build_acl_hints(od, lflows); > > diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl > > index 7953325aa..5ae52b3e8 100644 > > --- a/northd/ovn_northd.dl > > +++ b/northd/ovn_northd.dl > > @@ -1824,6 +1824,27 @@ for (&Switch(.ls =ls)) { > > .external_ids = map_empty()) > > } > > > > +for (&SwitchACL(.sw = sw@&Switch{.ls = ls}, .acl = &acl, .has_fair_meter = fair_meter)) { > > + if (sw.has_stateful_acl) { > > + if (acl.action == "allow-stateless") { > > + if (acl.direction == "from-lport") { > > + Flow(.logical_datapath = ls._uuid, > > + .stage = s_SWITCH_IN_PRE_ACL(), > > + .priority = acl.priority + oVN_ACL_PRI_OFFSET(), > > + .__match = acl.__match, > > + .actions = "next;", > > + .external_ids = stage_hint(acl._uuid)) > > + } else { > > + Flow(.logical_datapath = ls._uuid, > > + .stage = s_SWITCH_OUT_PRE_ACL(), > > + .priority = acl.priority + oVN_ACL_PRI_OFFSET(), > > + .__match = acl.__match, > > + .actions = "next;", > > + .external_ids = stage_hint(acl._uuid)) > > + } > > + } > > + } > > +} > > > > /* If there are any stateful ACL rules in this datapath, we must > > * send all IP packets through the conntrack action, which handles > > @@ -2168,6 +2189,9 @@ function build_acl_log(acl: nb::ACL, fair_meter: bool): string = > > "allow-related" -> { > > strs.push("verdict=allow") > > }, > > + "allow-stateless" -> { > > + strs.push("verdict=allow") > > + }, > > _ -> () > > }; > > match (acl.meter) { > > @@ -2546,6 +2570,13 @@ for (&SwitchACL(.sw = sw@&Switch{.ls = ls}, .acl = &acl, .has_fair_meter = fair_ > > .actions = "${acl_log}next;", > > .external_ids = stage_hint) > > } > > + } else if (acl.action == "allow-stateless") { > > + Flow(.logical_datapath = ls._uuid, > > + .stage = stage, > > + .priority = acl.priority + oVN_ACL_PRI_OFFSET(), > > + .__match = acl.__match, > > + .actions = "${acl_log}next;", > > + .external_ids = stage_hint) > > } else if (acl.action == "drop" or acl.action == "reject") { > > /* The implementation of "drop" differs if stateful ACLs are in > > * use for this datapath. In that case, the actions differ > > diff --git a/ovn-nb.ovsschema b/ovn-nb.ovsschema > > index 29019809c..faf619a1c 100644 > > --- a/ovn-nb.ovsschema > > +++ b/ovn-nb.ovsschema > > @@ -1,7 +1,7 @@ > > { > > "name": "OVN_Northbound", > > - "version": "5.31.0", > > - "cksum": "2352750632 28701", > > + "version": "5.32.0", > > + "cksum": "204590300 28863", > > "tables": { > > "NB_Global": { > > "columns": { > > @@ -221,7 +221,10 @@ > > "enum": ["set", ["from-lport", "to-lport"]]}}}, > > "match": {"type": "string"}, > > "action": {"type": {"key": {"type": "string", > > - "enum": ["set", ["allow", "allow-related", "drop", "reject"]]}}}, > > + "enum": ["set", > > + ["allow", "allow-related", > > + "allow-stateless", "drop", > > + "reject"]]}}}, > > "log": {"type": "boolean"}, > > "severity": {"type": {"key": {"type": "string", > > "enum": ["set", > > diff --git a/ovn-nb.xml b/ovn-nb.xml > > index feb38b5d3..5386c8fef 100644 > > --- a/ovn-nb.xml > > +++ b/ovn-nb.xml > > @@ -1791,7 +1791,9 @@ > > <p>The action to take when the ACL rule matches:</p> > > <ul> > > <li> > > - <code>allow</code>: Forward the packet. > > + <code>allow</code>: Forward the packet. It will also send the > > + packets through connection tracking when > > + <code>allow-related</code> rules exist on the logical switch. > > </li> > > Thanks for updating the documentation. It is really helpful for understanding the difference between "allow" and "allow-stateless". I think it could be even more clear if we just say something like: when there is no "allow-related" rules exist on the logical switch, "allow" is equivalent to "allow-stateless". Otherwise, it is equivalent to "allow-related". (And the order of "allow-stateless" and "allow" may be switched) I did something similar though not verbatim in the latest upload. Please take a look if it's better. > > > > > <li> > > @@ -1799,6 +1801,11 @@ > > (e.g. inbound replies to an outbound connection). > > </li> > > > > + <li> > > + <code>allow-stateless</code>: Always forward the packet in stateless > > + manner. May require defining additional rules for inbound replies. > > + </li> > > + > It would be better to clarify more on the "inbound replies" part. The commit message has more details, but it is better to make sure the users understand it by just reading this text. We may also emphasise that the packets will not go through conntrack table, even if there are other "allow-related" rules co-exist. You are right, I've expanded a bit about matching peer rules. Also added on contrack. > > Thanks, > Han > > > <li> > > <code>drop</code>: Silently drop the packet. > > </li> > > diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at > > index 32afb4fa8..f62526fd3 100644 > > --- a/tests/ovn-northd.at > > +++ b/tests/ovn-northd.at > > @@ -2592,6 +2592,315 @@ sed 's/reg8\[[0..15\]] == [[0-9]]*/reg8\[[0..15\]] == <cleared>/' | sort], [0], > > AT_CLEANUP > > ]) > > > > +OVN_FOR_EACH_NORTHD([ > > +AT_SETUP([ovn -- ACL allow-stateless omit conntrack - Logical_Switch]) > > +ovn_start > > + > > +ovn-nbctl ls-add ls > > +ovn-nbctl lsp-add ls lsp1 > > +ovn-nbctl lsp-set-addresses lsp1 00:00:00:00:00:01 > > +ovn-nbctl lsp-add ls lsp2 > > +ovn-nbctl lsp-set-addresses lsp2 00:00:00:00:00:02 > > + > > +for direction in from to; do > > + ovn-nbctl acl-add ls ${direction}-lport 3 "tcp" allow-related > > + ovn-nbctl acl-add ls ${direction}-lport 2 "udp" allow-related > > + ovn-nbctl acl-add ls ${direction}-lport 1 "ip" drop > > +done > > +ovn-nbctl --wait=sb sync > > + > > +flow_eth='eth.src == 00:00:00:00:00:01 && eth.dst == 00:00:00:00:00:02' > > +flow_ip='ip.ttl==64 && ip4.src == 42.42.42.1 && ip4.dst == 66.66.66.66' > > +flow_tcp='tcp && tcp.dst == 80' > > +flow_udp='udp && udp.dst == 80' > > + > > +lsp1_inport=$(fetch_column Port_Binding tunnel_key logical_port=lsp1) > > + > > +# TCP packets should go to conntrack. > > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}" > > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl > > +# tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0 > > +ct_next(ct_state=new|trk) { > > + ct_next(ct_state=new|trk) { > > + output("lsp2"); > > + }; > > +}; > > +]) > > + > > +# UDP packets should go to conntrack. > > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}" > > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl > > +# udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80 > > +ct_next(ct_state=new|trk) { > > + ct_next(ct_state=new|trk) { > > + output("lsp2"); > > + }; > > +}; > > +]) > > + > > +# Allow stateless for TCP. > > +for direction in from to; do > > + ovn-nbctl acl-add ls ${direction}-lport 1 tcp allow-stateless > > +done > > +ovn-nbctl --wait=sb sync > > + > > +# TCP packets should not go to conntrack anymore. > > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}" > > +AT_CHECK_UNQUOTED([ovn-trace --minimal ls "${flow}"], [0], [dnl > > +# tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0 > > +output("lsp2"); > > +]) > > + > > +# UDP packets still go to conntrack. > > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}" > > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl > > +# udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80 > > +ct_next(ct_state=new|trk) { > > + ct_next(ct_state=new|trk) { > > + output("lsp2"); > > + }; > > +}; > > +]) > > + > > +# Add a load balancer. > > +ovn-nbctl lb-add lb-tcp 66.66.66.66:80 42.42.42.2:8080 tcp > > +ovn-nbctl lb-add lb-udp 66.66.66.66:80 42.42.42.2:8080 udp > > +ovn-nbctl ls-lb-add ls lb-tcp > > +ovn-nbctl ls-lb-add ls lb-udp > > + > > +# Remove stateless for TCP. > > +ovn-nbctl acl-del ls > > +ovn-nbctl --wait=sb sync > > + > > +# TCP packets should go to conntrack. > > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}" > > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl > > +# tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0 > > +ct_next(ct_state=new|trk) { > > + ct_lb { > > + reg0[[6]] = 0; > > + *** chk_lb_hairpin_reply action not implemented; > > + reg0[[12]] = 0; > > + ct_next(ct_state=new|trk) { > > + output("lsp2"); > > + }; > > + }; > > +}; > > +]) > > + > > +# UDP packets should go to conntrack. > > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}" > > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl > > +# udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80 > > +ct_next(ct_state=new|trk) { > > + ct_lb { > > + reg0[[6]] = 0; > > + *** chk_lb_hairpin_reply action not implemented; > > + reg0[[12]] = 0; > > + ct_next(ct_state=new|trk) { > > + output("lsp2"); > > + }; > > + }; > > +}; > > +]) > > + > > +# Allow stateless for TCP. > > +for direction in from to; do > > + ovn-nbctl acl-add ls ${direction}-lport 1 tcp allow-stateless > > +done > > +ovn-nbctl --wait=sb sync > > + > > +# TCP packets should go to conntrack for load balancing. > > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}" > > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl > > +# tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0 > > +ct_next(ct_state=new|trk) { > > + ct_lb { > > + reg0[[6]] = 0; > > + *** chk_lb_hairpin_reply action not implemented; > > + reg0[[12]] = 0; > > + ct_next(ct_state=new|trk) { > > + output("lsp2"); > > + }; > > + }; > > +}; > > +]) > > + > > +# UDP packets still go to conntrack. > > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}" > > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl > > +# udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80 > > +ct_next(ct_state=new|trk) { > > + ct_lb { > > + reg0[[6]] = 0; > > + *** chk_lb_hairpin_reply action not implemented; > > + reg0[[12]] = 0; > > + ct_next(ct_state=new|trk) { > > + output("lsp2"); > > + }; > > + }; > > +}; > > +]) > > + > > +AT_CLEANUP > > +]) > > + > > +OVN_FOR_EACH_NORTHD([ > > +AT_SETUP([ovn -- ACL allow-stateless omit conntrack - Port_Group]) > > +ovn_start > > + > > +ovn-nbctl ls-add ls > > +ovn-nbctl lsp-add ls lsp1 > > +ovn-nbctl lsp-set-addresses lsp1 00:00:00:00:00:01 > > +ovn-nbctl lsp-add ls lsp2 > > +ovn-nbctl lsp-set-addresses lsp2 00:00:00:00:00:02 > > + > > +ovn-nbctl pg-add pg lsp1 lsp2 > > + > > +for direction in from to; do > > + ovn-nbctl acl-add pg ${direction}-lport 3 "tcp" allow-related > > + ovn-nbctl acl-add pg ${direction}-lport 2 "udp" allow-related > > + ovn-nbctl acl-add pg ${direction}-lport 1 "ip" drop > > +done > > +ovn-nbctl --wait=sb sync > > + > > +lsp1_inport=$(fetch_column Port_Binding tunnel_key logical_port=lsp1) > > +echo $lsp1_inport > > + > > +flow_eth='eth.src == 00:00:00:00:00:01 && eth.dst == 00:00:00:00:00:02' > > +flow_ip='ip.ttl==64 && ip4.src == 42.42.42.1 && ip4.dst == 66.66.66.66' > > +flow_tcp='tcp && tcp.dst == 80' > > +flow_udp='udp && udp.dst == 80' > > + > > +# TCP packets should go to conntrack. > > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}" > > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl > > +# tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0 > > +ct_next(ct_state=new|trk) { > > + ct_next(ct_state=new|trk) { > > + output("lsp2"); > > + }; > > +}; > > +]) > > + > > +# UDP packets should go to conntrack. > > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}" > > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl > > +# udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80 > > +ct_next(ct_state=new|trk) { > > + ct_next(ct_state=new|trk) { > > + output("lsp2"); > > + }; > > +}; > > +]) > > + > > +# Allow stateless for TCP. > > +for direction in from to; do > > + ovn-nbctl acl-add pg ${direction}-lport 1 tcp allow-stateless > > +done > > +ovn-nbctl --wait=sb sync > > + > > +# TCP packets should not go to conntrack anymore. > > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}" > > +AT_CHECK_UNQUOTED([ovn-trace --minimal ls "${flow}"], [0], [dnl > > +# tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0 > > +output("lsp2"); > > +]) > > + > > +# UDP packets still go to conntrack. > > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}" > > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl > > +# udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80 > > +ct_next(ct_state=new|trk) { > > + ct_next(ct_state=new|trk) { > > + output("lsp2"); > > + }; > > +}; > > +]) > > + > > +# Add a load balancer. > > +ovn-nbctl lb-add lb-tcp 66.66.66.66:80 42.42.42.2:8080 tcp > > +ovn-nbctl lb-add lb-udp 66.66.66.66:80 42.42.42.2:8080 udp > > +ovn-nbctl ls-lb-add ls lb-tcp > > +ovn-nbctl ls-lb-add ls lb-udp > > + > > +# Remove stateless for TCP. > > +ovn-nbctl acl-del pg > > +ovn-nbctl --wait=sb sync > > + > > +# TCP packets should go to conntrack. > > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}" > > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl > > +# tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0 > > +ct_next(ct_state=new|trk) { > > + ct_lb { > > + reg0[[6]] = 0; > > + *** chk_lb_hairpin_reply action not implemented; > > + reg0[[12]] = 0; > > + ct_next(ct_state=new|trk) { > > + output("lsp2"); > > + }; > > + }; > > +}; > > +]) > > + > > +# UDP packets should go to conntrack. > > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}" > > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl > > +# udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80 > > +ct_next(ct_state=new|trk) { > > + ct_lb { > > + reg0[[6]] = 0; > > + *** chk_lb_hairpin_reply action not implemented; > > + reg0[[12]] = 0; > > + ct_next(ct_state=new|trk) { > > + output("lsp2"); > > + }; > > + }; > > +}; > > +]) > > + > > +# Allow stateless for TCP. > > +for direction in from to; do > > + ovn-nbctl acl-add pg ${direction}-lport 1 tcp allow-stateless > > +done > > +ovn-nbctl --wait=sb sync > > + > > +# TCP packets should go to conntrack for load balancing. > > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}" > > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl > > +# tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0 > > +ct_next(ct_state=new|trk) { > > + ct_lb { > > + reg0[[6]] = 0; > > + *** chk_lb_hairpin_reply action not implemented; > > + reg0[[12]] = 0; > > + ct_next(ct_state=new|trk) { > > + output("lsp2"); > > + }; > > + }; > > +}; > > +]) > > + > > +# UDP packets still go to conntrack. > > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}" > > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl > > +# udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80 > > +ct_next(ct_state=new|trk) { > > + ct_lb { > > + reg0[[6]] = 0; > > + *** chk_lb_hairpin_reply action not implemented; > > + reg0[[12]] = 0; > > + ct_next(ct_state=new|trk) { > > + output("lsp2"); > > + }; > > + }; > > +}; > > +]) > > + > > +AT_CLEANUP > > +]) > > + > > OVN_FOR_EACH_NORTHD([ > > AT_SETUP([ovn -- check BFD config propagation to SBDB]) > > AT_KEYWORDS([northd-bfd]) > > diff --git a/utilities/ovn-nbctl.c b/utilities/ovn-nbctl.c > > index 042c21002..48fd0b7ee 100644 > > --- a/utilities/ovn-nbctl.c > > +++ b/utilities/ovn-nbctl.c > > @@ -2303,9 +2303,11 @@ nbctl_acl_add(struct ctl_context *ctx) > > > > /* Validate action. */ > > if (strcmp(action, "allow") && strcmp(action, "allow-related") > > - && strcmp(action, "drop") && strcmp(action, "reject")) { > > + && strcmp(action, "allow-stateless") && strcmp(action, "drop") > > + && strcmp(action, "reject")) { > > ctl_error(ctx, "%s: action must be one of \"allow\", " > > - "\"allow-related\", \"drop\", and \"reject\"", action); > > + "\"allow-related\", \"allow-stateless\", \"drop\", " > > + "and \"reject\"", action); > > return; > > } > > > > -- > > 2.31.1 > > > > _______________________________________________ > > dev mailing list > > dev@openvswitch.org > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
On Sun, May 9, 2021 at 6:28 PM Ihar Hrachyshka <ihrachys@redhat.com> wrote: > > Hi Han, > > On Fri, May 7, 2021 at 1:58 AM Han Zhou <hzhou@ovn.org> wrote: > > > > Hi Ihar, > > > > Thanks for the patch, and it's great to see the significant performance improvement! > > Please see my comments below. > > > > On Thu, May 6, 2021 at 7:35 AM Ihar Hrachyshka <ihrachys@redhat.com> wrote: > > > > > > For allow-stateless ACLs, bypass connection tracking by avoiding > > > setting ct hints for matching traffic. Avoid sending all traffic to ct > > > when a stateful ACL is present. > > > > > > === > > > > > > Reusing an existing 'allow' verb for stateless matching would have its > > > drawbacks, specifically, when it comes to backwards incompatibility of > > > the new behavior with existing environments. When using "allow" ACLs > > > in mixed allow/allow-related environment, we still commit "allow" > > > traffic to conntrack. This unnecessarily hits performance when mixed > > > ACL action types were used for the same datapath. This is why we > > > introduce a new action verb to describe stateless behavior. > > > > > > Another complexity to consider is the fact that with stateless > > > matching, one would not be able to rely on 'related' magic that > > > guarantees that reply traffic is passed through. Instead, the user > > > would have to accurately define matching rules both for request and > > > reply directions of a protocol session. Specifically, when allowing > > > ICMP for a specific peer host, one has to define 'allow-stateless' > > > rules that would match against ip.dst for request direction and ip.src > > > for reply direction. Other protocols and scenarios will require their > > > own fine grained matching approaches implemented by the user. > > > > > > === > > > > > > For performance measurements, qperf was used. Tests were executed on two > > > setups: > > > > > > 1) ovn-fake-multinode virtual environment; > > > 2) Supermicro SYS-5039MS-H8TRF 3-node RH-OSP 16.x cluster with 2 compute > > > nodes, NIC: "Intel Corporation Ethernet Controller XXV710 for 25GbE > > > SFP28 (rev 02)", using fake_nova_driver to avoid qemu overhead. > > > > > > 1) ovn-fake-multinode: > > > > > > Performance measured between two virtual nodes, two ports > > > that belong to different LSs connected via router. Using qperf, > > > performance was measured for UDP, TCP, SCTP protocols (using > > > <proto>_lat and <proto>_bw tests). The qperf version used: > > > 0.4.9-16.fc31.x86_64. Each test scenario was executed five times and > > > averages compared. > > > > > > Tests were executed with `allow-stateless` rules for the tested > > > protocol and `allow-related` for another protocol set for both ports, > > > both directions, e.g. for TCP scenario, the following ACLs were > > > defined: > > > > > > ovn-nbctl acl-add sw0 to-lport 100 tcp allow-stateless > > > ovn-nbctl acl-add sw0 from-lport 100 tcp allow-stateless > > > ovn-nbctl acl-add sw1 to-lport 100 tcp allow-stateless > > > ovn-nbctl acl-add sw1 from-lport 100 tcp allow-stateless > > > > > > ovn-nbctl acl-add sw0 to-lport 100 sctp allow-related > > > ovn-nbctl acl-add sw0 from-lport 100 sctp allow-related > > > ovn-nbctl acl-add sw1 to-lport 100 sctp allow-related > > > ovn-nbctl acl-add sw1 from-lport 100 sctp allow-related > > > > > > In this particular environment, improvement was seen in send_bw, > > > latency, and msg_rate measurements, where applicable, for all three > > > protocols under test. > > > > > > for UDP, send_bw: 293.6 MB/sec => 313.2 MB/sec (+6.68%) > > > latency: 16 us => 14.08 us (-12%) > > > msg_rate: 62.56 K/sec => 71.06 K/sec (+13.59%) > > > > > > for TCP, latency: 18.6 us => 14.88 us (-20%) > > > msg_rate: 53.8 K/sec => 67.28 K/sec (+25.06%) > > > > > > for SCTP, latency: 21.98 us => 19.42 us (-11.65%) > > > msg_rate: 45.58 K/sec => 51.54 K/sec (+13.08%) > > > > > > Interestingly, some performance improvement was also seen for the same > > > scenarios with no ACLs set at all, albeit significantly more > > > negligible. > > > > > > for UDP, send_bw: 320.0 MB/sec => 338.6 MB/sec (+5.81%) > > > latency: 13.74 us => 12.88 us (-6.68%) > > > msg_rate: 73.02 K/sec => 77.84 K/sec (+6.6%) > > > > > > for TCP, latency: 15.62 us => 14.26 us (-9.54%) > > > msg_rate: 64.02 K/sec => 70.26 K/sec (+9.75%) > > > > > > for SCTP, latency: 19.56 us => 18.16 us (-7.16%) > > > msg_rate: 51.16 K/sec => 55.12 K/sec (+7.74%) > > > > > > 2) Supermicro RH-OSP cluster: > > > > > > Two scenarios executed: > > > - connectivity between two ports on the same switch > > > - connectivity between two ports on different switches connected via > > > router > > > > > > For the same switch, improvements are as follows: > > > - TCP latency: -5.3%, UDP latency: -13.5%, SCTP latency: -10.8% > > > - TCP bw: +0.8%, UDP bw, +12.25%, SCTP bw: +21.29% > > > > > > For different switches, improvements are as follows: > > > - TCP latency: -6%, UDP: -11.2%, SCTP: -0.2% > > > - TCP bw: -0.7%, UDP bw: +2%, SCTP bw: +9.9% > > > > > > The effect is more noticeable in same switch scenario. > > > > This is interesting. Any idea why with different switches the result is so different? I'd expect similar improvement just like in same switch, because in this kind of test the flow should be cached in kernel datapath (regardless of same/different logical switches) and in both cases the cost of traversing conntrack is saved. > > > > Thanks for bringing this up. I've made some repeated measurements with > more test iterations to avoid flukes, also noticed that sometimes > qperf server loses connectivity to the client and reports incorrect > data, so avoided spurious runs in repeated tests. After doing all > that, I get numbers that are a lot closer for both scenarios, though > still I see some reduction in latency (f.e. 13.9% improvement for UDP > latency goes slightly down to 11.9%). One other thing I notice is that > TCP, for some reason, gains a lot less from the changes than UDP. My > knowledge is not enough to explain the difference though. :( > Thanks for retesting and I checked the new data in your v10 patch, which looks more consistent. Here is my guess to why TCP shows less gains. Since TCP itself requires states on both source and destination's stack, so the conntrack cost introduced by OVN contributes less significant part of the e2e cost comparing with UDP. So when this OVN conntrack is removed, TCP shows less gains than UDP. TCP still shows obvious improvement in latency, because there is absolutely a saving of conntrack cost from the e2e datapath, but it shows almost no improvement in bandwidth, probably because this conntrack wasn't the bottleneck between the source and destination, so when throughput is considered it is almost the same, and I think this is ok and nothing wrong about it. Does this sound reasonable? > > > > > > Comparable numbers can be captured with iperf. It may be useful to run > > > more tests in a more elaborate (bare metal) environment. > > > > > > === > > > > > > The patch takes inspiration from a now abandoned patch: > > > > > > "ovn-northd: Support mixing stateless/stateful ACLs with > > > Stateless_Filter." by Dumitru Ceara. > > > > > > The original patch assumed CMS doesn't require full flexibility of > > > matching rules for stateless matching (for example, to be used by > > > OpenShift). But other CMS interfaces may require the same > > > customizability for stateless as well as stateful matching, like in > > > OpenStack Neutron API. Which is why this patch reuses existing ACL > > > object type to describe stateless rules. > > > > > > Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com> > > > > > > --- > > > > > > v1: initial version. > > > v2: rebased after conflict. > > > v3: added ddlog implementation. > > > v3: apply stateless skip-hint-set rules to appropriate direction only. > > > v3: added more background as to implementation in commit message. > > > v3: test stateless scenarios with ddlog too. > > > v3: rebased after conflict. > > > v4: introduce a separate allow-stateless ACL match verb. > > > v5: rebased. > > > v6: updated docs for new allow-stateless approach. > > > v6: removed no longer valid comments. > > > v6: removed acl_is_stateless. > > > v7: bump north db schema version to 5.31.0 -> 5.32.0. > > > v8: fixed checkpatch failure on a too long line in dbschema. > > > v8: added perf data on baremetal lab testing. > > > v9: fixed ovsdb checksum. > > > --- > > > NEWS | 2 + > > > northd/ovn-northd.8.xml | 8 +- > > > northd/ovn-northd.c | 65 ++++++++- > > > northd/ovn_northd.dl | 31 ++++ > > > ovn-nb.ovsschema | 9 +- > > > ovn-nb.xml | 9 +- > > > tests/ovn-northd.at | 309 ++++++++++++++++++++++++++++++++++++++++ > > > utilities/ovn-nbctl.c | 6 +- > > > 8 files changed, 428 insertions(+), 11 deletions(-) > > > > > > diff --git a/NEWS b/NEWS > > > index 1ddde15f8..d72139e76 100644 > > > --- a/NEWS > > > +++ b/NEWS > > > @@ -11,6 +11,8 @@ Post-v21.03.0 > > > 'use_parallel_build' to enable it. It is disabled by default. > > > - Support vlan-passthru mode for tag=0 localnet ports. > > > - Support custom 802.11ad EthType for localnet ports. > > > + - Introduce a new "allow-stateless" ACL verb to always bypass connection > > > + tracking. The existing "allow" verb behavior is left intact. > > > > > > OVN v21.03.0 - 12 Mar 2021 > > > ------------------------- > > > diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml > > > index 54e88d3fa..08c4ea852 100644 > > > --- a/northd/ovn-northd.8.xml > > > +++ b/northd/ovn-northd.8.xml > > > @@ -419,7 +419,9 @@ > > > before eventually advancing to ingress table <code>ACLs</code>. If > > > special ports such as route ports or localnet ports can't use ct(), a > > > priority-110 flow is added to skip over stateful ACLs. IPv6 Neighbor > > > - Discovery and MLD traffic also skips stateful ACLs. > > > + Discovery and MLD traffic also skips stateful ACLs. For "allow-stateless" > > > + ACLs, a flow is added to bypass setting the hint for connection tracker > > > + processing. > > > > This approach clearly achieves the goal of bypassing conntrack and there is great dataplane performance gain, but it also doubles the ACL logical flows which in turn could significantly increase the control plane cost, especially in ovn-controller. In large scale environments where lots of ACLs referencing large port-group/address-sets are used, it is already a bottleneck of ovn-controller flow computing. It would be better if we could avoid doubling this cost. Instead of directly add the ACL match here, can we set a new hint (e.g. REGBIT_BYPASS_CT = 1), and in the later stage pre-stateful add a higher priority flow, if this hint is matched then directly "next" without going through CT? I hope this can achieve the same goal but not increasing control plane cost. What do you think? > > I believe this is already addressed somewhere else in the thread. > There should be no difference in the number of flows in general case, > though I see your point about a potential optimization where we would > treat allow-stateless as allow when no allow-related ACLs present. If > possible, I would prefer to make it a follow-up for this patch because > of close release. (I'll send the patch to implement the proposal > tomorrow.) > I am ok with a follow-up patch. (but don't worry about the release, because this patch was sent before the soft freeze date so it should have secured its position in this release :)) > > > > > </p> > > > > > > <p> > > > @@ -614,6 +616,10 @@ > > > for new connections and <code>reg0[1] = 1; next;</code> for existing > > > connections. > > > </li> > > > + <li> > > > + <code>allow-stateless</code> ACLs translate into logical > > > + flows with the <code>next;</code> action. > > > + </li> > > > <li> > > > <code>reject</code> ACLs translate into logical > > > flows with the > > > diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c > > > index 94fae5648..4032f1441 100644 > > > --- a/northd/ovn-northd.c > > > +++ b/northd/ovn-northd.c > > > @@ -4974,7 +4974,52 @@ skip_port_from_conntrack(struct ovn_datapath *od, struct ovn_port *op, > > > } > > > > > > static void > > > -build_pre_acls(struct ovn_datapath *od, struct hmap *lflows) > > > +build_stateless_filter(struct ovn_datapath *od, > > > + const struct nbrec_acl *acl, > > > + struct hmap *lflows) > > > +{ > > > + if (!strcmp(acl->direction, "from-lport")) { > > > + ovn_lflow_add_with_hint(lflows, od, S_SWITCH_IN_PRE_ACL, > > > + acl->priority + OVN_ACL_PRI_OFFSET, > > > + acl->match, > > > + "next;", > > > + &acl->header_); > > > + } else { > > > + ovn_lflow_add_with_hint(lflows, od, S_SWITCH_OUT_PRE_ACL, > > > + acl->priority + OVN_ACL_PRI_OFFSET, > > > + acl->match, > > > + "next;", > > > + &acl->header_); > > > + } > > > +} > > > + > > > +static void > > > +build_stateless_filters(struct ovn_datapath *od, struct hmap *port_groups, > > > + struct hmap *lflows) > > > +{ > > > + for (size_t i = 0; i < od->nbs->n_acls; i++) { > > > + const struct nbrec_acl *acl = od->nbs->acls[i]; > > > + if (!strcmp(acl->action, "allow-stateless")) { > > > + build_stateless_filter(od, acl, lflows); > > > + } > > > + } > > > + > > > + struct ovn_port_group *pg; > > > + HMAP_FOR_EACH (pg, key_node, port_groups) { > > > + if (ovn_port_group_ls_find(pg, &od->nbs->header_.uuid)) { > > > + for (size_t i = 0; i < pg->nb_pg->n_acls; i++) { > > > + const struct nbrec_acl *acl = pg->nb_pg->acls[i]; > > > + if (!strcmp(acl->action, "allow-stateless")) { > > > + build_stateless_filter(od, acl, lflows); > > > + } > > > + } > > > + } > > > + } > > > +} > > > + > > > +static void > > > +build_pre_acls(struct ovn_datapath *od, struct hmap *port_groups, > > > + struct hmap *lflows) > > > { > > > /* Ingress and Egress Pre-ACL Table (Priority 0): Packets are > > > * allowed by default. */ > > > @@ -5002,6 +5047,8 @@ build_pre_acls(struct ovn_datapath *od, struct hmap *lflows) > > > 110, lflows); > > > } > > > > > > > nit: there is a comment here several lines above this that needs to be updated: > > thanks, adjusted it somewhat > > > > > | /* If there are any stateful ACL rules in this datapath, we must > > | * send all IP packets through the conntrack action, which handles > > | * defragmentation, in order to match L4 headers. */ > > > > > + build_stateless_filters(od, port_groups, lflows); > > > + > > > /* Ingress and Egress Pre-ACL Table (Priority 110). > > > * > > > * Not to do conntrack on ND and ICMP destination > > > @@ -5369,7 +5416,8 @@ build_acl_log(struct ds *actions, const struct nbrec_acl *acl, > > > } else if (!strcmp(acl->action, "reject")) { > > > ds_put_cstr(actions, "verdict=reject, "); > > > } else if (!strcmp(acl->action, "allow") > > > - || !strcmp(acl->action, "allow-related")) { > > > + || !strcmp(acl->action, "allow-related") > > > + || !strcmp(acl->action, "allow-stateless")) { > > > ds_put_cstr(actions, "verdict=allow, "); > > > } > > > > > > @@ -5428,7 +5476,16 @@ consider_acl(struct hmap *lflows, struct ovn_datapath *od, > > > bool ingress = !strcmp(acl->direction, "from-lport") ? true :false; > > > enum ovn_stage stage = ingress ? S_SWITCH_IN_ACL : S_SWITCH_OUT_ACL; > > > > > > - if (!strcmp(acl->action, "allow") > > > + if (!strcmp(acl->action, "allow-stateless")) { > > > + struct ds actions = DS_EMPTY_INITIALIZER; > > > + build_acl_log(&actions, acl, meter_groups); > > > + ds_put_cstr(&actions, "next;"); > > > + ovn_lflow_add_with_hint(lflows, od, stage, > > > + acl->priority + OVN_ACL_PRI_OFFSET, > > > + acl->match, ds_cstr(&actions), > > > + &acl->header_); > > > + ds_destroy(&actions); > > > + } else if (!strcmp(acl->action, "allow") > > > || !strcmp(acl->action, "allow-related")) { > > > /* If there are any stateful flows, we must even commit "allow" > > > * actions. This is because, while the initiater's > > > @@ -6828,7 +6885,7 @@ build_lswitch_lflows_pre_acl_and_acl(struct ovn_datapath *od, > > > od->has_stateful_acl = ls_has_stateful_acl(od); > > > od->has_lb_vip = ls_has_lb_vip(od); > > > > > > - build_pre_acls(od, lflows); > > > + build_pre_acls(od, port_groups, lflows); > > > build_pre_lb(od, lflows, meter_groups, lbs); > > > build_pre_stateful(od, lflows); > > > build_acl_hints(od, lflows); > > > diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl > > > index 7953325aa..5ae52b3e8 100644 > > > --- a/northd/ovn_northd.dl > > > +++ b/northd/ovn_northd.dl > > > @@ -1824,6 +1824,27 @@ for (&Switch(.ls =ls)) { > > > .external_ids = map_empty()) > > > } > > > > > > +for (&SwitchACL(.sw = sw@&Switch{.ls = ls}, .acl = &acl, .has_fair_meter = fair_meter)) { > > > + if (sw.has_stateful_acl) { > > > + if (acl.action == "allow-stateless") { > > > + if (acl.direction == "from-lport") { > > > + Flow(.logical_datapath = ls._uuid, > > > + .stage = s_SWITCH_IN_PRE_ACL(), > > > + .priority = acl.priority + oVN_ACL_PRI_OFFSET(), > > > + .__match = acl.__match, > > > + .actions = "next;", > > > + .external_ids = stage_hint(acl._uuid)) > > > + } else { > > > + Flow(.logical_datapath = ls._uuid, > > > + .stage = s_SWITCH_OUT_PRE_ACL(), > > > + .priority = acl.priority + oVN_ACL_PRI_OFFSET(), > > > + .__match = acl.__match, > > > + .actions = "next;", > > > + .external_ids = stage_hint(acl._uuid)) > > > + } > > > + } > > > + } > > > +} > > > > > > /* If there are any stateful ACL rules in this datapath, we must > > > * send all IP packets through the conntrack action, which handles > > > @@ -2168,6 +2189,9 @@ function build_acl_log(acl: nb::ACL, fair_meter: bool): string = > > > "allow-related" -> { > > > strs.push("verdict=allow") > > > }, > > > + "allow-stateless" -> { > > > + strs.push("verdict=allow") > > > + }, > > > _ -> () > > > }; > > > match (acl.meter) { > > > @@ -2546,6 +2570,13 @@ for (&SwitchACL(.sw = sw@&Switch{.ls = ls}, .acl = &acl, .has_fair_meter = fair_ > > > .actions = "${acl_log}next;", > > > .external_ids = stage_hint) > > > } > > > + } else if (acl.action == "allow-stateless") { > > > + Flow(.logical_datapath = ls._uuid, > > > + .stage = stage, > > > + .priority = acl.priority + oVN_ACL_PRI_OFFSET(), > > > + .__match = acl.__match, > > > + .actions = "${acl_log}next;", > > > + .external_ids = stage_hint) > > > } else if (acl.action == "drop" or acl.action == "reject") { > > > /* The implementation of "drop" differs if stateful ACLs are in > > > * use for this datapath. In that case, the actions differ > > > diff --git a/ovn-nb.ovsschema b/ovn-nb.ovsschema > > > index 29019809c..faf619a1c 100644 > > > --- a/ovn-nb.ovsschema > > > +++ b/ovn-nb.ovsschema > > > @@ -1,7 +1,7 @@ > > > { > > > "name": "OVN_Northbound", > > > - "version": "5.31.0", > > > - "cksum": "2352750632 28701", > > > + "version": "5.32.0", > > > + "cksum": "204590300 28863", > > > "tables": { > > > "NB_Global": { > > > "columns": { > > > @@ -221,7 +221,10 @@ > > > "enum": ["set", ["from-lport", "to-lport"]]}}}, > > > "match": {"type": "string"}, > > > "action": {"type": {"key": {"type": "string", > > > - "enum": ["set", ["allow", "allow-related", "drop", "reject"]]}}}, > > > + "enum": ["set", > > > + ["allow", "allow-related", > > > + "allow-stateless", "drop", > > > + "reject"]]}}}, > > > "log": {"type": "boolean"}, > > > "severity": {"type": {"key": {"type": "string", > > > "enum": ["set", > > > diff --git a/ovn-nb.xml b/ovn-nb.xml > > > index feb38b5d3..5386c8fef 100644 > > > --- a/ovn-nb.xml > > > +++ b/ovn-nb.xml > > > @@ -1791,7 +1791,9 @@ > > > <p>The action to take when the ACL rule matches:</p> > > > <ul> > > > <li> > > > - <code>allow</code>: Forward the packet. > > > + <code>allow</code>: Forward the packet. It will also send the > > > + packets through connection tracking when > > > + <code>allow-related</code> rules exist on the logical switch. > > > </li> > > > > Thanks for updating the documentation. It is really helpful for understanding the difference between "allow" and "allow-stateless". I think it could be even more clear if we just say something like: when there is no "allow-related" rules exist on the logical switch, "allow" is equivalent to "allow-stateless". Otherwise, it is equivalent to "allow-related". (And the order of "allow-stateless" and "allow" may be switched) > > I did something similar though not verbatim in the latest upload. > Please take a look if it's better. > > > > > > > > > <li> > > > @@ -1799,6 +1801,11 @@ > > > (e.g. inbound replies to an outbound connection). > > > </li> > > > > > > + <li> > > > + <code>allow-stateless</code>: Always forward the packet in stateless > > > + manner. May require defining additional rules for inbound replies. > > > + </li> > > > + > > It would be better to clarify more on the "inbound replies" part. The commit message has more details, but it is better to make sure the users understand it by just reading this text. We may also emphasise that the packets will not go through conntrack table, even if there are other "allow-related" rules co-exist. > > You are right, I've expanded a bit about matching peer rules. Also > added on contrack. > > > > > Thanks, > > Han > > > > > <li> > > > <code>drop</code>: Silently drop the packet. > > > </li> > > > diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at > > > index 32afb4fa8..f62526fd3 100644 > > > --- a/tests/ovn-northd.at > > > +++ b/tests/ovn-northd.at > > > @@ -2592,6 +2592,315 @@ sed 's/reg8\[[0..15\]] == [[0-9]]*/reg8\[[0..15\]] == <cleared>/' | sort], [0], > > > AT_CLEANUP > > > ]) > > > > > > +OVN_FOR_EACH_NORTHD([ > > > +AT_SETUP([ovn -- ACL allow-stateless omit conntrack - Logical_Switch]) > > > +ovn_start > > > + > > > +ovn-nbctl ls-add ls > > > +ovn-nbctl lsp-add ls lsp1 > > > +ovn-nbctl lsp-set-addresses lsp1 00:00:00:00:00:01 > > > +ovn-nbctl lsp-add ls lsp2 > > > +ovn-nbctl lsp-set-addresses lsp2 00:00:00:00:00:02 > > > + > > > +for direction in from to; do > > > + ovn-nbctl acl-add ls ${direction}-lport 3 "tcp" allow-related > > > + ovn-nbctl acl-add ls ${direction}-lport 2 "udp" allow-related > > > + ovn-nbctl acl-add ls ${direction}-lport 1 "ip" drop > > > +done > > > +ovn-nbctl --wait=sb sync > > > + > > > +flow_eth='eth.src == 00:00:00:00:00:01 && eth.dst == 00:00:00:00:00:02' > > > +flow_ip='ip.ttl==64 && ip4.src == 42.42.42.1 && ip4.dst == 66.66.66.66' > > > +flow_tcp='tcp && tcp.dst == 80' > > > +flow_udp='udp && udp.dst == 80' > > > + > > > +lsp1_inport=$(fetch_column Port_Binding tunnel_key logical_port=lsp1) > > > + > > > +# TCP packets should go to conntrack. > > > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}" > > > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl > > > +# tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0 > > > +ct_next(ct_state=new|trk) { > > > + ct_next(ct_state=new|trk) { > > > + output("lsp2"); > > > + }; > > > +}; > > > +]) > > > + > > > +# UDP packets should go to conntrack. > > > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}" > > > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl > > > +# udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80 > > > +ct_next(ct_state=new|trk) { > > > + ct_next(ct_state=new|trk) { > > > + output("lsp2"); > > > + }; > > > +}; > > > +]) > > > + > > > +# Allow stateless for TCP. > > > +for direction in from to; do > > > + ovn-nbctl acl-add ls ${direction}-lport 1 tcp allow-stateless > > > +done > > > +ovn-nbctl --wait=sb sync > > > + > > > +# TCP packets should not go to conntrack anymore. > > > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}" > > > +AT_CHECK_UNQUOTED([ovn-trace --minimal ls "${flow}"], [0], [dnl > > > +# tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0 > > > +output("lsp2"); > > > +]) > > > + > > > +# UDP packets still go to conntrack. > > > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}" > > > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl > > > +# udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80 > > > +ct_next(ct_state=new|trk) { > > > + ct_next(ct_state=new|trk) { > > > + output("lsp2"); > > > + }; > > > +}; > > > +]) > > > + > > > +# Add a load balancer. > > > +ovn-nbctl lb-add lb-tcp 66.66.66.66:80 42.42.42.2:8080 tcp > > > +ovn-nbctl lb-add lb-udp 66.66.66.66:80 42.42.42.2:8080 udp > > > +ovn-nbctl ls-lb-add ls lb-tcp > > > +ovn-nbctl ls-lb-add ls lb-udp > > > + > > > +# Remove stateless for TCP. > > > +ovn-nbctl acl-del ls > > > +ovn-nbctl --wait=sb sync > > > + > > > +# TCP packets should go to conntrack. > > > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}" > > > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl > > > +# tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0 > > > +ct_next(ct_state=new|trk) { > > > + ct_lb { > > > + reg0[[6]] = 0; > > > + *** chk_lb_hairpin_reply action not implemented; > > > + reg0[[12]] = 0; > > > + ct_next(ct_state=new|trk) { > > > + output("lsp2"); > > > + }; > > > + }; > > > +}; > > > +]) > > > + > > > +# UDP packets should go to conntrack. > > > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}" > > > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl > > > +# udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80 > > > +ct_next(ct_state=new|trk) { > > > + ct_lb { > > > + reg0[[6]] = 0; > > > + *** chk_lb_hairpin_reply action not implemented; > > > + reg0[[12]] = 0; > > > + ct_next(ct_state=new|trk) { > > > + output("lsp2"); > > > + }; > > > + }; > > > +}; > > > +]) > > > + > > > +# Allow stateless for TCP. > > > +for direction in from to; do > > > + ovn-nbctl acl-add ls ${direction}-lport 1 tcp allow-stateless > > > +done > > > +ovn-nbctl --wait=sb sync > > > + > > > +# TCP packets should go to conntrack for load balancing. > > > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}" > > > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl > > > +# tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0 > > > +ct_next(ct_state=new|trk) { > > > + ct_lb { > > > + reg0[[6]] = 0; > > > + *** chk_lb_hairpin_reply action not implemented; > > > + reg0[[12]] = 0; > > > + ct_next(ct_state=new|trk) { > > > + output("lsp2"); > > > + }; > > > + }; > > > +}; > > > +]) > > > + > > > +# UDP packets still go to conntrack. > > > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}" > > > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl > > > +# udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80 > > > +ct_next(ct_state=new|trk) { > > > + ct_lb { > > > + reg0[[6]] = 0; > > > + *** chk_lb_hairpin_reply action not implemented; > > > + reg0[[12]] = 0; > > > + ct_next(ct_state=new|trk) { > > > + output("lsp2"); > > > + }; > > > + }; > > > +}; > > > +]) > > > + > > > +AT_CLEANUP > > > +]) > > > + > > > +OVN_FOR_EACH_NORTHD([ > > > +AT_SETUP([ovn -- ACL allow-stateless omit conntrack - Port_Group]) > > > +ovn_start > > > + > > > +ovn-nbctl ls-add ls > > > +ovn-nbctl lsp-add ls lsp1 > > > +ovn-nbctl lsp-set-addresses lsp1 00:00:00:00:00:01 > > > +ovn-nbctl lsp-add ls lsp2 > > > +ovn-nbctl lsp-set-addresses lsp2 00:00:00:00:00:02 > > > + > > > +ovn-nbctl pg-add pg lsp1 lsp2 > > > + > > > +for direction in from to; do > > > + ovn-nbctl acl-add pg ${direction}-lport 3 "tcp" allow-related > > > + ovn-nbctl acl-add pg ${direction}-lport 2 "udp" allow-related > > > + ovn-nbctl acl-add pg ${direction}-lport 1 "ip" drop > > > +done > > > +ovn-nbctl --wait=sb sync > > > + > > > +lsp1_inport=$(fetch_column Port_Binding tunnel_key logical_port=lsp1) > > > +echo $lsp1_inport > > > + > > > +flow_eth='eth.src == 00:00:00:00:00:01 && eth.dst == 00:00:00:00:00:02' > > > +flow_ip='ip.ttl==64 && ip4.src == 42.42.42.1 && ip4.dst == 66.66.66.66' > > > +flow_tcp='tcp && tcp.dst == 80' > > > +flow_udp='udp && udp.dst == 80' > > > + > > > +# TCP packets should go to conntrack. > > > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}" > > > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl > > > +# tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0 > > > +ct_next(ct_state=new|trk) { > > > + ct_next(ct_state=new|trk) { > > > + output("lsp2"); > > > + }; > > > +}; > > > +]) > > > + > > > +# UDP packets should go to conntrack. > > > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}" > > > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl > > > +# udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80 > > > +ct_next(ct_state=new|trk) { > > > + ct_next(ct_state=new|trk) { > > > + output("lsp2"); > > > + }; > > > +}; > > > +]) > > > + > > > +# Allow stateless for TCP. > > > +for direction in from to; do > > > + ovn-nbctl acl-add pg ${direction}-lport 1 tcp allow-stateless > > > +done > > > +ovn-nbctl --wait=sb sync > > > + > > > +# TCP packets should not go to conntrack anymore. > > > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}" > > > +AT_CHECK_UNQUOTED([ovn-trace --minimal ls "${flow}"], [0], [dnl > > > +# tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0 > > > +output("lsp2"); > > > +]) > > > + > > > +# UDP packets still go to conntrack. > > > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}" > > > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl > > > +# udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80 > > > +ct_next(ct_state=new|trk) { > > > + ct_next(ct_state=new|trk) { > > > + output("lsp2"); > > > + }; > > > +}; > > > +]) > > > + > > > +# Add a load balancer. > > > +ovn-nbctl lb-add lb-tcp 66.66.66.66:80 42.42.42.2:8080 tcp > > > +ovn-nbctl lb-add lb-udp 66.66.66.66:80 42.42.42.2:8080 udp > > > +ovn-nbctl ls-lb-add ls lb-tcp > > > +ovn-nbctl ls-lb-add ls lb-udp > > > + > > > +# Remove stateless for TCP. > > > +ovn-nbctl acl-del pg > > > +ovn-nbctl --wait=sb sync > > > + > > > +# TCP packets should go to conntrack. > > > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}" > > > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl > > > +# tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0 > > > +ct_next(ct_state=new|trk) { > > > + ct_lb { > > > + reg0[[6]] = 0; > > > + *** chk_lb_hairpin_reply action not implemented; > > > + reg0[[12]] = 0; > > > + ct_next(ct_state=new|trk) { > > > + output("lsp2"); > > > + }; > > > + }; > > > +}; > > > +]) > > > + > > > +# UDP packets should go to conntrack. > > > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}" > > > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl > > > +# udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80 > > > +ct_next(ct_state=new|trk) { > > > + ct_lb { > > > + reg0[[6]] = 0; > > > + *** chk_lb_hairpin_reply action not implemented; > > > + reg0[[12]] = 0; > > > + ct_next(ct_state=new|trk) { > > > + output("lsp2"); > > > + }; > > > + }; > > > +}; > > > +]) > > > + > > > +# Allow stateless for TCP. > > > +for direction in from to; do > > > + ovn-nbctl acl-add pg ${direction}-lport 1 tcp allow-stateless > > > +done > > > +ovn-nbctl --wait=sb sync > > > + > > > +# TCP packets should go to conntrack for load balancing. > > > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}" > > > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl > > > +# tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0 > > > +ct_next(ct_state=new|trk) { > > > + ct_lb { > > > + reg0[[6]] = 0; > > > + *** chk_lb_hairpin_reply action not implemented; > > > + reg0[[12]] = 0; > > > + ct_next(ct_state=new|trk) { > > > + output("lsp2"); > > > + }; > > > + }; > > > +}; > > > +]) > > > + > > > +# UDP packets still go to conntrack. > > > +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}" > > > +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl > > > +# udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80 > > > +ct_next(ct_state=new|trk) { > > > + ct_lb { > > > + reg0[[6]] = 0; > > > + *** chk_lb_hairpin_reply action not implemented; > > > + reg0[[12]] = 0; > > > + ct_next(ct_state=new|trk) { > > > + output("lsp2"); > > > + }; > > > + }; > > > +}; > > > +]) > > > + > > > +AT_CLEANUP > > > +]) > > > + > > > OVN_FOR_EACH_NORTHD([ > > > AT_SETUP([ovn -- check BFD config propagation to SBDB]) > > > AT_KEYWORDS([northd-bfd]) > > > diff --git a/utilities/ovn-nbctl.c b/utilities/ovn-nbctl.c > > > index 042c21002..48fd0b7ee 100644 > > > --- a/utilities/ovn-nbctl.c > > > +++ b/utilities/ovn-nbctl.c > > > @@ -2303,9 +2303,11 @@ nbctl_acl_add(struct ctl_context *ctx) > > > > > > /* Validate action. */ > > > if (strcmp(action, "allow") && strcmp(action, "allow-related") > > > - && strcmp(action, "drop") && strcmp(action, "reject")) { > > > + && strcmp(action, "allow-stateless") && strcmp(action, "drop") > > > + && strcmp(action, "reject")) { > > > ctl_error(ctx, "%s: action must be one of \"allow\", " > > > - "\"allow-related\", \"drop\", and \"reject\"", action); > > > + "\"allow-related\", \"allow-stateless\", \"drop\", " > > > + "and \"reject\"", action); > > > return; > > > } > > > > > > -- > > > 2.31.1 > > > > > > _______________________________________________ > > > dev mailing list > > > dev@openvswitch.org > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev >
On Fri, May 7, 2021 at 7:11 PM Han Zhou <hzhou@ovn.org> wrote: > > > > On Fri, May 7, 2021 at 12:54 AM Dumitru Ceara <dceara@redhat.com> wrote: > > > > On 5/7/21 7:35 AM, Han Zhou wrote: > > > Hi Ihar, > > > > > > Thanks for the patch, and it's great to see the significant performance > > > improvement! > > > Please see my comments below. > > > > > > On Thu, May 6, 2021 at 7:35 AM Ihar Hrachyshka <ihrachys@redhat.com> wrote: > > >> > > >> For allow-stateless ACLs, bypass connection tracking by avoiding > > >> setting ct hints for matching traffic. Avoid sending all traffic to ct > > >> when a stateful ACL is present. > > >> > > >> === > > >> > > >> Reusing an existing 'allow' verb for stateless matching would have its > > >> drawbacks, specifically, when it comes to backwards incompatibility of > > >> the new behavior with existing environments. When using "allow" ACLs > > >> in mixed allow/allow-related environment, we still commit "allow" > > >> traffic to conntrack. This unnecessarily hits performance when mixed > > >> ACL action types were used for the same datapath. This is why we > > >> introduce a new action verb to describe stateless behavior. > > >> > > >> Another complexity to consider is the fact that with stateless > > >> matching, one would not be able to rely on 'related' magic that > > >> guarantees that reply traffic is passed through. Instead, the user > > >> would have to accurately define matching rules both for request and > > >> reply directions of a protocol session. Specifically, when allowing > > >> ICMP for a specific peer host, one has to define 'allow-stateless' > > >> rules that would match against ip.dst for request direction and ip.src > > >> for reply direction. Other protocols and scenarios will require their > > >> own fine grained matching approaches implemented by the user. > > >> > > >> === > > >> > > >> For performance measurements, qperf was used. Tests were executed on two > > >> setups: > > >> > > >> 1) ovn-fake-multinode virtual environment; > > >> 2) Supermicro SYS-5039MS-H8TRF 3-node RH-OSP 16.x cluster with 2 compute > > >> nodes, NIC: "Intel Corporation Ethernet Controller XXV710 for 25GbE > > >> SFP28 (rev 02)", using fake_nova_driver to avoid qemu overhead. > > >> > > >> 1) ovn-fake-multinode: > > >> > > >> Performance measured between two virtual nodes, two ports > > >> that belong to different LSs connected via router. Using qperf, > > >> performance was measured for UDP, TCP, SCTP protocols (using > > >> <proto>_lat and <proto>_bw tests). The qperf version used: > > >> 0.4.9-16.fc31.x86_64. Each test scenario was executed five times and > > >> averages compared. > > >> > > >> Tests were executed with `allow-stateless` rules for the tested > > >> protocol and `allow-related` for another protocol set for both ports, > > >> both directions, e.g. for TCP scenario, the following ACLs were > > >> defined: > > >> > > >> ovn-nbctl acl-add sw0 to-lport 100 tcp allow-stateless > > >> ovn-nbctl acl-add sw0 from-lport 100 tcp allow-stateless > > >> ovn-nbctl acl-add sw1 to-lport 100 tcp allow-stateless > > >> ovn-nbctl acl-add sw1 from-lport 100 tcp allow-stateless > > >> > > >> ovn-nbctl acl-add sw0 to-lport 100 sctp allow-related > > >> ovn-nbctl acl-add sw0 from-lport 100 sctp allow-related > > >> ovn-nbctl acl-add sw1 to-lport 100 sctp allow-related > > >> ovn-nbctl acl-add sw1 from-lport 100 sctp allow-related > > >> > > >> In this particular environment, improvement was seen in send_bw, > > >> latency, and msg_rate measurements, where applicable, for all three > > >> protocols under test. > > >> > > >> for UDP, send_bw: 293.6 MB/sec => 313.2 MB/sec (+6.68%) > > >> latency: 16 us => 14.08 us (-12%) > > >> msg_rate: 62.56 K/sec => 71.06 K/sec (+13.59%) > > >> > > >> for TCP, latency: 18.6 us => 14.88 us (-20%) > > >> msg_rate: 53.8 K/sec => 67.28 K/sec (+25.06%) > > >> > > >> for SCTP, latency: 21.98 us => 19.42 us (-11.65%) > > >> msg_rate: 45.58 K/sec => 51.54 K/sec (+13.08%) > > >> > > >> Interestingly, some performance improvement was also seen for the same > > >> scenarios with no ACLs set at all, albeit significantly more > > >> negligible. > > >> > > >> for UDP, send_bw: 320.0 MB/sec => 338.6 MB/sec (+5.81%) > > >> latency: 13.74 us => 12.88 us (-6.68%) > > >> msg_rate: 73.02 K/sec => 77.84 K/sec (+6.6%) > > >> > > >> for TCP, latency: 15.62 us => 14.26 us (-9.54%) > > >> msg_rate: 64.02 K/sec => 70.26 K/sec (+9.75%) > > >> > > >> for SCTP, latency: 19.56 us => 18.16 us (-7.16%) > > >> msg_rate: 51.16 K/sec => 55.12 K/sec (+7.74%) > > >> > > >> 2) Supermicro RH-OSP cluster: > > >> > > >> Two scenarios executed: > > >> - connectivity between two ports on the same switch > > >> - connectivity between two ports on different switches connected via > > >> router > > >> > > >> For the same switch, improvements are as follows: > > >> - TCP latency: -5.3%, UDP latency: -13.5%, SCTP latency: -10.8% > > >> - TCP bw: +0.8%, UDP bw, +12.25%, SCTP bw: +21.29% > > >> > > >> For different switches, improvements are as follows: > > >> - TCP latency: -6%, UDP: -11.2%, SCTP: -0.2% > > >> - TCP bw: -0.7%, UDP bw: +2%, SCTP bw: +9.9% > > >> > > >> The effect is more noticeable in same switch scenario. > > > > > > This is interesting. Any idea why with different switches the result is so > > > different? I'd expect similar improvement just like in same switch, because > > > in this kind of test the flow should be cached in kernel datapath > > > (regardless of same/different logical switches) and in both cases the cost > > > of traversing conntrack is saved. > > > > > >> > > >> Comparable numbers can be captured with iperf. It may be useful to run > > >> more tests in a more elaborate (bare metal) environment. > > >> > > >> === > > >> > > >> The patch takes inspiration from a now abandoned patch: > > >> > > >> "ovn-northd: Support mixing stateless/stateful ACLs with > > >> Stateless_Filter." by Dumitru Ceara. > > >> > > >> The original patch assumed CMS doesn't require full flexibility of > > >> matching rules for stateless matching (for example, to be used by > > >> OpenShift). But other CMS interfaces may require the same > > >> customizability for stateless as well as stateful matching, like in > > >> OpenStack Neutron API. Which is why this patch reuses existing ACL > > >> object type to describe stateless rules. > > >> > > >> Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com> > > >> > > >> --- > > >> > > >> v1: initial version. > > >> v2: rebased after conflict. > > >> v3: added ddlog implementation. > > >> v3: apply stateless skip-hint-set rules to appropriate direction only. > > >> v3: added more background as to implementation in commit message. > > >> v3: test stateless scenarios with ddlog too. > > >> v3: rebased after conflict. > > >> v4: introduce a separate allow-stateless ACL match verb. > > >> v5: rebased. > > >> v6: updated docs for new allow-stateless approach. > > >> v6: removed no longer valid comments. > > >> v6: removed acl_is_stateless. > > >> v7: bump north db schema version to 5.31.0 -> 5.32.0. > > >> v8: fixed checkpatch failure on a too long line in dbschema. > > >> v8: added perf data on baremetal lab testing. > > >> v9: fixed ovsdb checksum. > > >> --- > > >> NEWS | 2 + > > >> northd/ovn-northd.8.xml | 8 +- > > >> northd/ovn-northd.c | 65 ++++++++- > > >> northd/ovn_northd.dl | 31 ++++ > > >> ovn-nb.ovsschema | 9 +- > > >> ovn-nb.xml | 9 +- > > >> tests/ovn-northd.at | 309 ++++++++++++++++++++++++++++++++++++++++ > > >> utilities/ovn-nbctl.c | 6 +- > > >> 8 files changed, 428 insertions(+), 11 deletions(-) > > >> > > >> diff --git a/NEWS b/NEWS > > >> index 1ddde15f8..d72139e76 100644 > > >> --- a/NEWS > > >> +++ b/NEWS > > >> @@ -11,6 +11,8 @@ Post-v21.03.0 > > >> 'use_parallel_build' to enable it. It is disabled by default. > > >> - Support vlan-passthru mode for tag=0 localnet ports. > > >> - Support custom 802.11ad EthType for localnet ports. > > >> + - Introduce a new "allow-stateless" ACL verb to always bypass > > > connection > > >> + tracking. The existing "allow" verb behavior is left intact. > > >> > > >> OVN v21.03.0 - 12 Mar 2021 > > >> ------------------------- > > >> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml > > >> index 54e88d3fa..08c4ea852 100644 > > >> --- a/northd/ovn-northd.8.xml > > >> +++ b/northd/ovn-northd.8.xml > > >> @@ -419,7 +419,9 @@ > > >> before eventually advancing to ingress table <code>ACLs</code>. If > > >> special ports such as route ports or localnet ports can't use > > > ct(), a > > >> priority-110 flow is added to skip over stateful ACLs. IPv6 > > > Neighbor > > >> - Discovery and MLD traffic also skips stateful ACLs. > > >> + Discovery and MLD traffic also skips stateful ACLs. For > > > "allow-stateless" > > >> + ACLs, a flow is added to bypass setting the hint for connection > > > tracker > > >> + processing. > > > > > > This approach clearly achieves the goal of bypassing conntrack and there is > > > great dataplane performance gain, but it also doubles the ACL logical flows > > > which in turn could significantly increase the control plane cost, > > > especially in ovn-controller. In large scale environments where lots of > > > ACLs referencing large port-group/address-sets are used, it is already a > > > bottleneck of ovn-controller flow computing. It would be better if we could > > > avoid doubling this cost. Instead of directly add the ACL match here, can > > > we set a new hint (e.g. REGBIT_BYPASS_CT = 1), and in the later stage > > > pre-stateful add a higher priority flow, if this hint is matched then > > > directly "next" without going through CT? I hope this can achieve the same > > > goal but not increasing control plane cost. What do you think? > > > > > > > Wouldn't this break ACL priority? E.g., with the following ACLs: > > > > ACL1: from-lport, prio=100, ip.dst=1.1.1.1, allow-stateless > > ACL2: from-lport, prio=50, ip, drop > > > > If we don't add the lflow for ACL1 in the ls_in_acl stage then traffic > > will be dropped by the lflow added for ACL2. > > > You are right. Actually I was thinking about matching the hint again in acl stage and just move next, but it would skip all the stateful ACLs that could have a higher priority. This would work only if the stateless ACLs can always be treated as higher priority than stateful ACLs, but it seems unreasonable. It seems all the ACL matches have to be evaluated at the same stage to ensure the priorities are followed. So, forget about this proposal. > > > On the other hand, IIUC, this feature is designed to be used in > > combination with "allow-related" ACLs defined on the same switch, > > replacing "allow" ACLs in that case. > > > > Before this change an "allow" ACL would've been treated as > > "allow-related" if there was at least one other "allow-related" ACL > > applied on the logical switch. Essentially generating 2 logical flows > > per "allow" ACL: > > > > lflow1: "REGBIT_ACL_HINT_ALLOW_NEW==1 && acl.match" then > > "REGBIT_CONNTRACK_COMMIT=1, next" > > lflow2: "REGBIT_ACL_HINT_ALLOW==1 && acl.match" then "next" > > > > If I'm not wrong, changing this ACL to "allow-stateless" would also > > generate 2 logical flows, so at least from a number of logical flows > > perspective, the result is the same. > > > You are right. But this also means that when there is no stateful rules, "allow-stateless" is worse than "allow" because "allow" would generate only a single lflow in that case, while "allow-stateless" always generates doubled lflows. So, I think at least we could do the simple improvement to make "allow-stateless" perform the same as "allow" when there are NO stateful ACLs. (Otherwise, my comment on the documentation regarding '"allow-stateless" is equivalent to "allow" when there are no stateful ACLs' would be wrong). We can move the logic of adding the stateless matching flows of pre-acl stage into a if-block (only when has_stateful is true). Would this work? For the record, I looked again into that, and I don't think double lflows concern doesn't apply because we already check for od->has_stateful_acl before introducing any flows that set DEFRAG bit, as well as the new flows that omit doing it. So I don't think there is any optimization as described applicable. Let me know if that's not correct. Ihar
On Mon, May 17, 2021 at 2:27 PM Ihar Hrachyshka <ihrachys@redhat.com> wrote: > > On Fri, May 7, 2021 at 7:11 PM Han Zhou <hzhou@ovn.org> wrote: > > > > > > > > On Fri, May 7, 2021 at 12:54 AM Dumitru Ceara <dceara@redhat.com> wrote: > > > > > > On 5/7/21 7:35 AM, Han Zhou wrote: > > > > Hi Ihar, > > > > > > > > Thanks for the patch, and it's great to see the significant performance > > > > improvement! > > > > Please see my comments below. > > > > > > > > On Thu, May 6, 2021 at 7:35 AM Ihar Hrachyshka <ihrachys@redhat.com> wrote: > > > >> > > > >> For allow-stateless ACLs, bypass connection tracking by avoiding > > > >> setting ct hints for matching traffic. Avoid sending all traffic to ct > > > >> when a stateful ACL is present. > > > >> > > > >> === > > > >> > > > >> Reusing an existing 'allow' verb for stateless matching would have its > > > >> drawbacks, specifically, when it comes to backwards incompatibility of > > > >> the new behavior with existing environments. When using "allow" ACLs > > > >> in mixed allow/allow-related environment, we still commit "allow" > > > >> traffic to conntrack. This unnecessarily hits performance when mixed > > > >> ACL action types were used for the same datapath. This is why we > > > >> introduce a new action verb to describe stateless behavior. > > > >> > > > >> Another complexity to consider is the fact that with stateless > > > >> matching, one would not be able to rely on 'related' magic that > > > >> guarantees that reply traffic is passed through. Instead, the user > > > >> would have to accurately define matching rules both for request and > > > >> reply directions of a protocol session. Specifically, when allowing > > > >> ICMP for a specific peer host, one has to define 'allow-stateless' > > > >> rules that would match against ip.dst for request direction and ip.src > > > >> for reply direction. Other protocols and scenarios will require their > > > >> own fine grained matching approaches implemented by the user. > > > >> > > > >> === > > > >> > > > >> For performance measurements, qperf was used. Tests were executed on two > > > >> setups: > > > >> > > > >> 1) ovn-fake-multinode virtual environment; > > > >> 2) Supermicro SYS-5039MS-H8TRF 3-node RH-OSP 16.x cluster with 2 compute > > > >> nodes, NIC: "Intel Corporation Ethernet Controller XXV710 for 25GbE > > > >> SFP28 (rev 02)", using fake_nova_driver to avoid qemu overhead. > > > >> > > > >> 1) ovn-fake-multinode: > > > >> > > > >> Performance measured between two virtual nodes, two ports > > > >> that belong to different LSs connected via router. Using qperf, > > > >> performance was measured for UDP, TCP, SCTP protocols (using > > > >> <proto>_lat and <proto>_bw tests). The qperf version used: > > > >> 0.4.9-16.fc31.x86_64. Each test scenario was executed five times and > > > >> averages compared. > > > >> > > > >> Tests were executed with `allow-stateless` rules for the tested > > > >> protocol and `allow-related` for another protocol set for both ports, > > > >> both directions, e.g. for TCP scenario, the following ACLs were > > > >> defined: > > > >> > > > >> ovn-nbctl acl-add sw0 to-lport 100 tcp allow-stateless > > > >> ovn-nbctl acl-add sw0 from-lport 100 tcp allow-stateless > > > >> ovn-nbctl acl-add sw1 to-lport 100 tcp allow-stateless > > > >> ovn-nbctl acl-add sw1 from-lport 100 tcp allow-stateless > > > >> > > > >> ovn-nbctl acl-add sw0 to-lport 100 sctp allow-related > > > >> ovn-nbctl acl-add sw0 from-lport 100 sctp allow-related > > > >> ovn-nbctl acl-add sw1 to-lport 100 sctp allow-related > > > >> ovn-nbctl acl-add sw1 from-lport 100 sctp allow-related > > > >> > > > >> In this particular environment, improvement was seen in send_bw, > > > >> latency, and msg_rate measurements, where applicable, for all three > > > >> protocols under test. > > > >> > > > >> for UDP, send_bw: 293.6 MB/sec => 313.2 MB/sec (+6.68%) > > > >> latency: 16 us => 14.08 us (-12%) > > > >> msg_rate: 62.56 K/sec => 71.06 K/sec (+13.59%) > > > >> > > > >> for TCP, latency: 18.6 us => 14.88 us (-20%) > > > >> msg_rate: 53.8 K/sec => 67.28 K/sec (+25.06%) > > > >> > > > >> for SCTP, latency: 21.98 us => 19.42 us (-11.65%) > > > >> msg_rate: 45.58 K/sec => 51.54 K/sec (+13.08%) > > > >> > > > >> Interestingly, some performance improvement was also seen for the same > > > >> scenarios with no ACLs set at all, albeit significantly more > > > >> negligible. > > > >> > > > >> for UDP, send_bw: 320.0 MB/sec => 338.6 MB/sec (+5.81%) > > > >> latency: 13.74 us => 12.88 us (-6.68%) > > > >> msg_rate: 73.02 K/sec => 77.84 K/sec (+6.6%) > > > >> > > > >> for TCP, latency: 15.62 us => 14.26 us (-9.54%) > > > >> msg_rate: 64.02 K/sec => 70.26 K/sec (+9.75%) > > > >> > > > >> for SCTP, latency: 19.56 us => 18.16 us (-7.16%) > > > >> msg_rate: 51.16 K/sec => 55.12 K/sec (+7.74%) > > > >> > > > >> 2) Supermicro RH-OSP cluster: > > > >> > > > >> Two scenarios executed: > > > >> - connectivity between two ports on the same switch > > > >> - connectivity between two ports on different switches connected via > > > >> router > > > >> > > > >> For the same switch, improvements are as follows: > > > >> - TCP latency: -5.3%, UDP latency: -13.5%, SCTP latency: -10.8% > > > >> - TCP bw: +0.8%, UDP bw, +12.25%, SCTP bw: +21.29% > > > >> > > > >> For different switches, improvements are as follows: > > > >> - TCP latency: -6%, UDP: -11.2%, SCTP: -0.2% > > > >> - TCP bw: -0.7%, UDP bw: +2%, SCTP bw: +9.9% > > > >> > > > >> The effect is more noticeable in same switch scenario. > > > > > > > > This is interesting. Any idea why with different switches the result is so > > > > different? I'd expect similar improvement just like in same switch, because > > > > in this kind of test the flow should be cached in kernel datapath > > > > (regardless of same/different logical switches) and in both cases the cost > > > > of traversing conntrack is saved. > > > > > > > >> > > > >> Comparable numbers can be captured with iperf. It may be useful to run > > > >> more tests in a more elaborate (bare metal) environment. > > > >> > > > >> === > > > >> > > > >> The patch takes inspiration from a now abandoned patch: > > > >> > > > >> "ovn-northd: Support mixing stateless/stateful ACLs with > > > >> Stateless_Filter." by Dumitru Ceara. > > > >> > > > >> The original patch assumed CMS doesn't require full flexibility of > > > >> matching rules for stateless matching (for example, to be used by > > > >> OpenShift). But other CMS interfaces may require the same > > > >> customizability for stateless as well as stateful matching, like in > > > >> OpenStack Neutron API. Which is why this patch reuses existing ACL > > > >> object type to describe stateless rules. > > > >> > > > >> Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com> > > > >> > > > >> --- > > > >> > > > >> v1: initial version. > > > >> v2: rebased after conflict. > > > >> v3: added ddlog implementation. > > > >> v3: apply stateless skip-hint-set rules to appropriate direction only. > > > >> v3: added more background as to implementation in commit message. > > > >> v3: test stateless scenarios with ddlog too. > > > >> v3: rebased after conflict. > > > >> v4: introduce a separate allow-stateless ACL match verb. > > > >> v5: rebased. > > > >> v6: updated docs for new allow-stateless approach. > > > >> v6: removed no longer valid comments. > > > >> v6: removed acl_is_stateless. > > > >> v7: bump north db schema version to 5.31.0 -> 5.32.0. > > > >> v8: fixed checkpatch failure on a too long line in dbschema. > > > >> v8: added perf data on baremetal lab testing. > > > >> v9: fixed ovsdb checksum. > > > >> --- > > > >> NEWS | 2 + > > > >> northd/ovn-northd.8.xml | 8 +- > > > >> northd/ovn-northd.c | 65 ++++++++- > > > >> northd/ovn_northd.dl | 31 ++++ > > > >> ovn-nb.ovsschema | 9 +- > > > >> ovn-nb.xml | 9 +- > > > >> tests/ovn-northd.at | 309 ++++++++++++++++++++++++++++++++++++++++ > > > >> utilities/ovn-nbctl.c | 6 +- > > > >> 8 files changed, 428 insertions(+), 11 deletions(-) > > > >> > > > >> diff --git a/NEWS b/NEWS > > > >> index 1ddde15f8..d72139e76 100644 > > > >> --- a/NEWS > > > >> +++ b/NEWS > > > >> @@ -11,6 +11,8 @@ Post-v21.03.0 > > > >> 'use_parallel_build' to enable it. It is disabled by default. > > > >> - Support vlan-passthru mode for tag=0 localnet ports. > > > >> - Support custom 802.11ad EthType for localnet ports. > > > >> + - Introduce a new "allow-stateless" ACL verb to always bypass > > > > connection > > > >> + tracking. The existing "allow" verb behavior is left intact. > > > >> > > > >> OVN v21.03.0 - 12 Mar 2021 > > > >> ------------------------- > > > >> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml > > > >> index 54e88d3fa..08c4ea852 100644 > > > >> --- a/northd/ovn-northd.8.xml > > > >> +++ b/northd/ovn-northd.8.xml > > > >> @@ -419,7 +419,9 @@ > > > >> before eventually advancing to ingress table <code>ACLs</code>. If > > > >> special ports such as route ports or localnet ports can't use > > > > ct(), a > > > >> priority-110 flow is added to skip over stateful ACLs. IPv6 > > > > Neighbor > > > >> - Discovery and MLD traffic also skips stateful ACLs. > > > >> + Discovery and MLD traffic also skips stateful ACLs. For > > > > "allow-stateless" > > > >> + ACLs, a flow is added to bypass setting the hint for connection > > > > tracker > > > >> + processing. > > > > > > > > This approach clearly achieves the goal of bypassing conntrack and there is > > > > great dataplane performance gain, but it also doubles the ACL logical flows > > > > which in turn could significantly increase the control plane cost, > > > > especially in ovn-controller. In large scale environments where lots of > > > > ACLs referencing large port-group/address-sets are used, it is already a > > > > bottleneck of ovn-controller flow computing. It would be better if we could > > > > avoid doubling this cost. Instead of directly add the ACL match here, can > > > > we set a new hint (e.g. REGBIT_BYPASS_CT = 1), and in the later stage > > > > pre-stateful add a higher priority flow, if this hint is matched then > > > > directly "next" without going through CT? I hope this can achieve the same > > > > goal but not increasing control plane cost. What do you think? > > > > > > > > > > Wouldn't this break ACL priority? E.g., with the following ACLs: > > > > > > ACL1: from-lport, prio=100, ip.dst=1.1.1.1, allow-stateless > > > ACL2: from-lport, prio=50, ip, drop > > > > > > If we don't add the lflow for ACL1 in the ls_in_acl stage then traffic > > > will be dropped by the lflow added for ACL2. > > > > > You are right. Actually I was thinking about matching the hint again in acl stage and just move next, but it would skip all the stateful ACLs that could have a higher priority. This would work only if the stateless ACLs can always be treated as higher priority than stateful ACLs, but it seems unreasonable. It seems all the ACL matches have to be evaluated at the same stage to ensure the priorities are followed. So, forget about this proposal. > > > > > On the other hand, IIUC, this feature is designed to be used in > > > combination with "allow-related" ACLs defined on the same switch, > > > replacing "allow" ACLs in that case. > > > > > > Before this change an "allow" ACL would've been treated as > > > "allow-related" if there was at least one other "allow-related" ACL > > > applied on the logical switch. Essentially generating 2 logical flows > > > per "allow" ACL: > > > > > > lflow1: "REGBIT_ACL_HINT_ALLOW_NEW==1 && acl.match" then > > > "REGBIT_CONNTRACK_COMMIT=1, next" > > > lflow2: "REGBIT_ACL_HINT_ALLOW==1 && acl.match" then "next" > > > > > > If I'm not wrong, changing this ACL to "allow-stateless" would also > > > generate 2 logical flows, so at least from a number of logical flows > > > perspective, the result is the same. > > > > > You are right. But this also means that when there is no stateful rules, "allow-stateless" is worse than "allow" because "allow" would generate only a single lflow in that case, while "allow-stateless" always generates doubled lflows. So, I think at least we could do the simple improvement to make "allow-stateless" perform the same as "allow" when there are NO stateful ACLs. (Otherwise, my comment on the documentation regarding '"allow-stateless" is equivalent to "allow" when there are no stateful ACLs' would be wrong). We can move the logic of adding the stateless matching flows of pre-acl stage into a if-block (only when has_stateful is true). Would this work? > > For the record, I looked again into that, and I don't think double > lflows concern doesn't apply because we already check for > od->has_stateful_acl before introducing any flows that set DEFRAG bit, > as well as the new flows that omit doing it. So I don't think there is > any optimization as described applicable. Let me know if that's not > correct. > > Ihar > Yes you are right. Sorry that I should have noticed that the caller of the function build_stateless_filters() is already in the if-block. Forget about my comment :)
diff --git a/NEWS b/NEWS index 1ddde15f8..d72139e76 100644 --- a/NEWS +++ b/NEWS @@ -11,6 +11,8 @@ Post-v21.03.0 'use_parallel_build' to enable it. It is disabled by default. - Support vlan-passthru mode for tag=0 localnet ports. - Support custom 802.11ad EthType for localnet ports. + - Introduce a new "allow-stateless" ACL verb to always bypass connection + tracking. The existing "allow" verb behavior is left intact. OVN v21.03.0 - 12 Mar 2021 ------------------------- diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml index 54e88d3fa..08c4ea852 100644 --- a/northd/ovn-northd.8.xml +++ b/northd/ovn-northd.8.xml @@ -419,7 +419,9 @@ before eventually advancing to ingress table <code>ACLs</code>. If special ports such as route ports or localnet ports can't use ct(), a priority-110 flow is added to skip over stateful ACLs. IPv6 Neighbor - Discovery and MLD traffic also skips stateful ACLs. + Discovery and MLD traffic also skips stateful ACLs. For "allow-stateless" + ACLs, a flow is added to bypass setting the hint for connection tracker + processing. </p> <p> @@ -614,6 +616,10 @@ for new connections and <code>reg0[1] = 1; next;</code> for existing connections. </li> + <li> + <code>allow-stateless</code> ACLs translate into logical + flows with the <code>next;</code> action. + </li> <li> <code>reject</code> ACLs translate into logical flows with the diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c index 94fae5648..4032f1441 100644 --- a/northd/ovn-northd.c +++ b/northd/ovn-northd.c @@ -4974,7 +4974,52 @@ skip_port_from_conntrack(struct ovn_datapath *od, struct ovn_port *op, } static void -build_pre_acls(struct ovn_datapath *od, struct hmap *lflows) +build_stateless_filter(struct ovn_datapath *od, + const struct nbrec_acl *acl, + struct hmap *lflows) +{ + if (!strcmp(acl->direction, "from-lport")) { + ovn_lflow_add_with_hint(lflows, od, S_SWITCH_IN_PRE_ACL, + acl->priority + OVN_ACL_PRI_OFFSET, + acl->match, + "next;", + &acl->header_); + } else { + ovn_lflow_add_with_hint(lflows, od, S_SWITCH_OUT_PRE_ACL, + acl->priority + OVN_ACL_PRI_OFFSET, + acl->match, + "next;", + &acl->header_); + } +} + +static void +build_stateless_filters(struct ovn_datapath *od, struct hmap *port_groups, + struct hmap *lflows) +{ + for (size_t i = 0; i < od->nbs->n_acls; i++) { + const struct nbrec_acl *acl = od->nbs->acls[i]; + if (!strcmp(acl->action, "allow-stateless")) { + build_stateless_filter(od, acl, lflows); + } + } + + struct ovn_port_group *pg; + HMAP_FOR_EACH (pg, key_node, port_groups) { + if (ovn_port_group_ls_find(pg, &od->nbs->header_.uuid)) { + for (size_t i = 0; i < pg->nb_pg->n_acls; i++) { + const struct nbrec_acl *acl = pg->nb_pg->acls[i]; + if (!strcmp(acl->action, "allow-stateless")) { + build_stateless_filter(od, acl, lflows); + } + } + } + } +} + +static void +build_pre_acls(struct ovn_datapath *od, struct hmap *port_groups, + struct hmap *lflows) { /* Ingress and Egress Pre-ACL Table (Priority 0): Packets are * allowed by default. */ @@ -5002,6 +5047,8 @@ build_pre_acls(struct ovn_datapath *od, struct hmap *lflows) 110, lflows); } + build_stateless_filters(od, port_groups, lflows); + /* Ingress and Egress Pre-ACL Table (Priority 110). * * Not to do conntrack on ND and ICMP destination @@ -5369,7 +5416,8 @@ build_acl_log(struct ds *actions, const struct nbrec_acl *acl, } else if (!strcmp(acl->action, "reject")) { ds_put_cstr(actions, "verdict=reject, "); } else if (!strcmp(acl->action, "allow") - || !strcmp(acl->action, "allow-related")) { + || !strcmp(acl->action, "allow-related") + || !strcmp(acl->action, "allow-stateless")) { ds_put_cstr(actions, "verdict=allow, "); } @@ -5428,7 +5476,16 @@ consider_acl(struct hmap *lflows, struct ovn_datapath *od, bool ingress = !strcmp(acl->direction, "from-lport") ? true :false; enum ovn_stage stage = ingress ? S_SWITCH_IN_ACL : S_SWITCH_OUT_ACL; - if (!strcmp(acl->action, "allow") + if (!strcmp(acl->action, "allow-stateless")) { + struct ds actions = DS_EMPTY_INITIALIZER; + build_acl_log(&actions, acl, meter_groups); + ds_put_cstr(&actions, "next;"); + ovn_lflow_add_with_hint(lflows, od, stage, + acl->priority + OVN_ACL_PRI_OFFSET, + acl->match, ds_cstr(&actions), + &acl->header_); + ds_destroy(&actions); + } else if (!strcmp(acl->action, "allow") || !strcmp(acl->action, "allow-related")) { /* If there are any stateful flows, we must even commit "allow" * actions. This is because, while the initiater's @@ -6828,7 +6885,7 @@ build_lswitch_lflows_pre_acl_and_acl(struct ovn_datapath *od, od->has_stateful_acl = ls_has_stateful_acl(od); od->has_lb_vip = ls_has_lb_vip(od); - build_pre_acls(od, lflows); + build_pre_acls(od, port_groups, lflows); build_pre_lb(od, lflows, meter_groups, lbs); build_pre_stateful(od, lflows); build_acl_hints(od, lflows); diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl index 7953325aa..5ae52b3e8 100644 --- a/northd/ovn_northd.dl +++ b/northd/ovn_northd.dl @@ -1824,6 +1824,27 @@ for (&Switch(.ls =ls)) { .external_ids = map_empty()) } +for (&SwitchACL(.sw = sw@&Switch{.ls = ls}, .acl = &acl, .has_fair_meter = fair_meter)) { + if (sw.has_stateful_acl) { + if (acl.action == "allow-stateless") { + if (acl.direction == "from-lport") { + Flow(.logical_datapath = ls._uuid, + .stage = s_SWITCH_IN_PRE_ACL(), + .priority = acl.priority + oVN_ACL_PRI_OFFSET(), + .__match = acl.__match, + .actions = "next;", + .external_ids = stage_hint(acl._uuid)) + } else { + Flow(.logical_datapath = ls._uuid, + .stage = s_SWITCH_OUT_PRE_ACL(), + .priority = acl.priority + oVN_ACL_PRI_OFFSET(), + .__match = acl.__match, + .actions = "next;", + .external_ids = stage_hint(acl._uuid)) + } + } + } +} /* If there are any stateful ACL rules in this datapath, we must * send all IP packets through the conntrack action, which handles @@ -2168,6 +2189,9 @@ function build_acl_log(acl: nb::ACL, fair_meter: bool): string = "allow-related" -> { strs.push("verdict=allow") }, + "allow-stateless" -> { + strs.push("verdict=allow") + }, _ -> () }; match (acl.meter) { @@ -2546,6 +2570,13 @@ for (&SwitchACL(.sw = sw@&Switch{.ls = ls}, .acl = &acl, .has_fair_meter = fair_ .actions = "${acl_log}next;", .external_ids = stage_hint) } + } else if (acl.action == "allow-stateless") { + Flow(.logical_datapath = ls._uuid, + .stage = stage, + .priority = acl.priority + oVN_ACL_PRI_OFFSET(), + .__match = acl.__match, + .actions = "${acl_log}next;", + .external_ids = stage_hint) } else if (acl.action == "drop" or acl.action == "reject") { /* The implementation of "drop" differs if stateful ACLs are in * use for this datapath. In that case, the actions differ diff --git a/ovn-nb.ovsschema b/ovn-nb.ovsschema index 29019809c..faf619a1c 100644 --- a/ovn-nb.ovsschema +++ b/ovn-nb.ovsschema @@ -1,7 +1,7 @@ { "name": "OVN_Northbound", - "version": "5.31.0", - "cksum": "2352750632 28701", + "version": "5.32.0", + "cksum": "204590300 28863", "tables": { "NB_Global": { "columns": { @@ -221,7 +221,10 @@ "enum": ["set", ["from-lport", "to-lport"]]}}}, "match": {"type": "string"}, "action": {"type": {"key": {"type": "string", - "enum": ["set", ["allow", "allow-related", "drop", "reject"]]}}}, + "enum": ["set", + ["allow", "allow-related", + "allow-stateless", "drop", + "reject"]]}}}, "log": {"type": "boolean"}, "severity": {"type": {"key": {"type": "string", "enum": ["set", diff --git a/ovn-nb.xml b/ovn-nb.xml index feb38b5d3..5386c8fef 100644 --- a/ovn-nb.xml +++ b/ovn-nb.xml @@ -1791,7 +1791,9 @@ <p>The action to take when the ACL rule matches:</p> <ul> <li> - <code>allow</code>: Forward the packet. + <code>allow</code>: Forward the packet. It will also send the + packets through connection tracking when + <code>allow-related</code> rules exist on the logical switch. </li> <li> @@ -1799,6 +1801,11 @@ (e.g. inbound replies to an outbound connection). </li> + <li> + <code>allow-stateless</code>: Always forward the packet in stateless + manner. May require defining additional rules for inbound replies. + </li> + <li> <code>drop</code>: Silently drop the packet. </li> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at index 32afb4fa8..f62526fd3 100644 --- a/tests/ovn-northd.at +++ b/tests/ovn-northd.at @@ -2592,6 +2592,315 @@ sed 's/reg8\[[0..15\]] == [[0-9]]*/reg8\[[0..15\]] == <cleared>/' | sort], [0], AT_CLEANUP ]) +OVN_FOR_EACH_NORTHD([ +AT_SETUP([ovn -- ACL allow-stateless omit conntrack - Logical_Switch]) +ovn_start + +ovn-nbctl ls-add ls +ovn-nbctl lsp-add ls lsp1 +ovn-nbctl lsp-set-addresses lsp1 00:00:00:00:00:01 +ovn-nbctl lsp-add ls lsp2 +ovn-nbctl lsp-set-addresses lsp2 00:00:00:00:00:02 + +for direction in from to; do + ovn-nbctl acl-add ls ${direction}-lport 3 "tcp" allow-related + ovn-nbctl acl-add ls ${direction}-lport 2 "udp" allow-related + ovn-nbctl acl-add ls ${direction}-lport 1 "ip" drop +done +ovn-nbctl --wait=sb sync + +flow_eth='eth.src == 00:00:00:00:00:01 && eth.dst == 00:00:00:00:00:02' +flow_ip='ip.ttl==64 && ip4.src == 42.42.42.1 && ip4.dst == 66.66.66.66' +flow_tcp='tcp && tcp.dst == 80' +flow_udp='udp && udp.dst == 80' + +lsp1_inport=$(fetch_column Port_Binding tunnel_key logical_port=lsp1) + +# TCP packets should go to conntrack. +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}" +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl +# tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0 +ct_next(ct_state=new|trk) { + ct_next(ct_state=new|trk) { + output("lsp2"); + }; +}; +]) + +# UDP packets should go to conntrack. +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}" +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl +# udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80 +ct_next(ct_state=new|trk) { + ct_next(ct_state=new|trk) { + output("lsp2"); + }; +}; +]) + +# Allow stateless for TCP. +for direction in from to; do + ovn-nbctl acl-add ls ${direction}-lport 1 tcp allow-stateless +done +ovn-nbctl --wait=sb sync + +# TCP packets should not go to conntrack anymore. +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}" +AT_CHECK_UNQUOTED([ovn-trace --minimal ls "${flow}"], [0], [dnl +# tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0 +output("lsp2"); +]) + +# UDP packets still go to conntrack. +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}" +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl +# udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80 +ct_next(ct_state=new|trk) { + ct_next(ct_state=new|trk) { + output("lsp2"); + }; +}; +]) + +# Add a load balancer. +ovn-nbctl lb-add lb-tcp 66.66.66.66:80 42.42.42.2:8080 tcp +ovn-nbctl lb-add lb-udp 66.66.66.66:80 42.42.42.2:8080 udp +ovn-nbctl ls-lb-add ls lb-tcp +ovn-nbctl ls-lb-add ls lb-udp + +# Remove stateless for TCP. +ovn-nbctl acl-del ls +ovn-nbctl --wait=sb sync + +# TCP packets should go to conntrack. +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}" +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl +# tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0 +ct_next(ct_state=new|trk) { + ct_lb { + reg0[[6]] = 0; + *** chk_lb_hairpin_reply action not implemented; + reg0[[12]] = 0; + ct_next(ct_state=new|trk) { + output("lsp2"); + }; + }; +}; +]) + +# UDP packets should go to conntrack. +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}" +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl +# udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80 +ct_next(ct_state=new|trk) { + ct_lb { + reg0[[6]] = 0; + *** chk_lb_hairpin_reply action not implemented; + reg0[[12]] = 0; + ct_next(ct_state=new|trk) { + output("lsp2"); + }; + }; +}; +]) + +# Allow stateless for TCP. +for direction in from to; do + ovn-nbctl acl-add ls ${direction}-lport 1 tcp allow-stateless +done +ovn-nbctl --wait=sb sync + +# TCP packets should go to conntrack for load balancing. +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}" +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl +# tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0 +ct_next(ct_state=new|trk) { + ct_lb { + reg0[[6]] = 0; + *** chk_lb_hairpin_reply action not implemented; + reg0[[12]] = 0; + ct_next(ct_state=new|trk) { + output("lsp2"); + }; + }; +}; +]) + +# UDP packets still go to conntrack. +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}" +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl +# udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80 +ct_next(ct_state=new|trk) { + ct_lb { + reg0[[6]] = 0; + *** chk_lb_hairpin_reply action not implemented; + reg0[[12]] = 0; + ct_next(ct_state=new|trk) { + output("lsp2"); + }; + }; +}; +]) + +AT_CLEANUP +]) + +OVN_FOR_EACH_NORTHD([ +AT_SETUP([ovn -- ACL allow-stateless omit conntrack - Port_Group]) +ovn_start + +ovn-nbctl ls-add ls +ovn-nbctl lsp-add ls lsp1 +ovn-nbctl lsp-set-addresses lsp1 00:00:00:00:00:01 +ovn-nbctl lsp-add ls lsp2 +ovn-nbctl lsp-set-addresses lsp2 00:00:00:00:00:02 + +ovn-nbctl pg-add pg lsp1 lsp2 + +for direction in from to; do + ovn-nbctl acl-add pg ${direction}-lport 3 "tcp" allow-related + ovn-nbctl acl-add pg ${direction}-lport 2 "udp" allow-related + ovn-nbctl acl-add pg ${direction}-lport 1 "ip" drop +done +ovn-nbctl --wait=sb sync + +lsp1_inport=$(fetch_column Port_Binding tunnel_key logical_port=lsp1) +echo $lsp1_inport + +flow_eth='eth.src == 00:00:00:00:00:01 && eth.dst == 00:00:00:00:00:02' +flow_ip='ip.ttl==64 && ip4.src == 42.42.42.1 && ip4.dst == 66.66.66.66' +flow_tcp='tcp && tcp.dst == 80' +flow_udp='udp && udp.dst == 80' + +# TCP packets should go to conntrack. +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}" +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl +# tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0 +ct_next(ct_state=new|trk) { + ct_next(ct_state=new|trk) { + output("lsp2"); + }; +}; +]) + +# UDP packets should go to conntrack. +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}" +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl +# udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80 +ct_next(ct_state=new|trk) { + ct_next(ct_state=new|trk) { + output("lsp2"); + }; +}; +]) + +# Allow stateless for TCP. +for direction in from to; do + ovn-nbctl acl-add pg ${direction}-lport 1 tcp allow-stateless +done +ovn-nbctl --wait=sb sync + +# TCP packets should not go to conntrack anymore. +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}" +AT_CHECK_UNQUOTED([ovn-trace --minimal ls "${flow}"], [0], [dnl +# tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0 +output("lsp2"); +]) + +# UDP packets still go to conntrack. +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}" +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl +# udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80 +ct_next(ct_state=new|trk) { + ct_next(ct_state=new|trk) { + output("lsp2"); + }; +}; +]) + +# Add a load balancer. +ovn-nbctl lb-add lb-tcp 66.66.66.66:80 42.42.42.2:8080 tcp +ovn-nbctl lb-add lb-udp 66.66.66.66:80 42.42.42.2:8080 udp +ovn-nbctl ls-lb-add ls lb-tcp +ovn-nbctl ls-lb-add ls lb-udp + +# Remove stateless for TCP. +ovn-nbctl acl-del pg +ovn-nbctl --wait=sb sync + +# TCP packets should go to conntrack. +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}" +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl +# tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0 +ct_next(ct_state=new|trk) { + ct_lb { + reg0[[6]] = 0; + *** chk_lb_hairpin_reply action not implemented; + reg0[[12]] = 0; + ct_next(ct_state=new|trk) { + output("lsp2"); + }; + }; +}; +]) + +# UDP packets should go to conntrack. +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}" +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl +# udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80 +ct_next(ct_state=new|trk) { + ct_lb { + reg0[[6]] = 0; + *** chk_lb_hairpin_reply action not implemented; + reg0[[12]] = 0; + ct_next(ct_state=new|trk) { + output("lsp2"); + }; + }; +}; +]) + +# Allow stateless for TCP. +for direction in from to; do + ovn-nbctl acl-add pg ${direction}-lport 1 tcp allow-stateless +done +ovn-nbctl --wait=sb sync + +# TCP packets should go to conntrack for load balancing. +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}" +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl +# tcp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0 +ct_next(ct_state=new|trk) { + ct_lb { + reg0[[6]] = 0; + *** chk_lb_hairpin_reply action not implemented; + reg0[[12]] = 0; + ct_next(ct_state=new|trk) { + output("lsp2"); + }; + }; +}; +]) + +# UDP packets still go to conntrack. +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_udp}" +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"], [0], [dnl +# udp,reg14=0x${lsp1_inport},vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80 +ct_next(ct_state=new|trk) { + ct_lb { + reg0[[6]] = 0; + *** chk_lb_hairpin_reply action not implemented; + reg0[[12]] = 0; + ct_next(ct_state=new|trk) { + output("lsp2"); + }; + }; +}; +]) + +AT_CLEANUP +]) + OVN_FOR_EACH_NORTHD([ AT_SETUP([ovn -- check BFD config propagation to SBDB]) AT_KEYWORDS([northd-bfd]) diff --git a/utilities/ovn-nbctl.c b/utilities/ovn-nbctl.c index 042c21002..48fd0b7ee 100644 --- a/utilities/ovn-nbctl.c +++ b/utilities/ovn-nbctl.c @@ -2303,9 +2303,11 @@ nbctl_acl_add(struct ctl_context *ctx) /* Validate action. */ if (strcmp(action, "allow") && strcmp(action, "allow-related") - && strcmp(action, "drop") && strcmp(action, "reject")) { + && strcmp(action, "allow-stateless") && strcmp(action, "drop") + && strcmp(action, "reject")) { ctl_error(ctx, "%s: action must be one of \"allow\", " - "\"allow-related\", \"drop\", and \"reject\"", action); + "\"allow-related\", \"allow-stateless\", \"drop\", " + "and \"reject\"", action); return; }
For allow-stateless ACLs, bypass connection tracking by avoiding setting ct hints for matching traffic. Avoid sending all traffic to ct when a stateful ACL is present. === Reusing an existing 'allow' verb for stateless matching would have its drawbacks, specifically, when it comes to backwards incompatibility of the new behavior with existing environments. When using "allow" ACLs in mixed allow/allow-related environment, we still commit "allow" traffic to conntrack. This unnecessarily hits performance when mixed ACL action types were used for the same datapath. This is why we introduce a new action verb to describe stateless behavior. Another complexity to consider is the fact that with stateless matching, one would not be able to rely on 'related' magic that guarantees that reply traffic is passed through. Instead, the user would have to accurately define matching rules both for request and reply directions of a protocol session. Specifically, when allowing ICMP for a specific peer host, one has to define 'allow-stateless' rules that would match against ip.dst for request direction and ip.src for reply direction. Other protocols and scenarios will require their own fine grained matching approaches implemented by the user. === For performance measurements, qperf was used. Tests were executed on two setups: 1) ovn-fake-multinode virtual environment; 2) Supermicro SYS-5039MS-H8TRF 3-node RH-OSP 16.x cluster with 2 compute nodes, NIC: "Intel Corporation Ethernet Controller XXV710 for 25GbE SFP28 (rev 02)", using fake_nova_driver to avoid qemu overhead. 1) ovn-fake-multinode: Performance measured between two virtual nodes, two ports that belong to different LSs connected via router. Using qperf, performance was measured for UDP, TCP, SCTP protocols (using <proto>_lat and <proto>_bw tests). The qperf version used: 0.4.9-16.fc31.x86_64. Each test scenario was executed five times and averages compared. Tests were executed with `allow-stateless` rules for the tested protocol and `allow-related` for another protocol set for both ports, both directions, e.g. for TCP scenario, the following ACLs were defined: ovn-nbctl acl-add sw0 to-lport 100 tcp allow-stateless ovn-nbctl acl-add sw0 from-lport 100 tcp allow-stateless ovn-nbctl acl-add sw1 to-lport 100 tcp allow-stateless ovn-nbctl acl-add sw1 from-lport 100 tcp allow-stateless ovn-nbctl acl-add sw0 to-lport 100 sctp allow-related ovn-nbctl acl-add sw0 from-lport 100 sctp allow-related ovn-nbctl acl-add sw1 to-lport 100 sctp allow-related ovn-nbctl acl-add sw1 from-lport 100 sctp allow-related In this particular environment, improvement was seen in send_bw, latency, and msg_rate measurements, where applicable, for all three protocols under test. for UDP, send_bw: 293.6 MB/sec => 313.2 MB/sec (+6.68%) latency: 16 us => 14.08 us (-12%) msg_rate: 62.56 K/sec => 71.06 K/sec (+13.59%) for TCP, latency: 18.6 us => 14.88 us (-20%) msg_rate: 53.8 K/sec => 67.28 K/sec (+25.06%) for SCTP, latency: 21.98 us => 19.42 us (-11.65%) msg_rate: 45.58 K/sec => 51.54 K/sec (+13.08%) Interestingly, some performance improvement was also seen for the same scenarios with no ACLs set at all, albeit significantly more negligible. for UDP, send_bw: 320.0 MB/sec => 338.6 MB/sec (+5.81%) latency: 13.74 us => 12.88 us (-6.68%) msg_rate: 73.02 K/sec => 77.84 K/sec (+6.6%) for TCP, latency: 15.62 us => 14.26 us (-9.54%) msg_rate: 64.02 K/sec => 70.26 K/sec (+9.75%) for SCTP, latency: 19.56 us => 18.16 us (-7.16%) msg_rate: 51.16 K/sec => 55.12 K/sec (+7.74%) 2) Supermicro RH-OSP cluster: Two scenarios executed: - connectivity between two ports on the same switch - connectivity between two ports on different switches connected via router For the same switch, improvements are as follows: - TCP latency: -5.3%, UDP latency: -13.5%, SCTP latency: -10.8% - TCP bw: +0.8%, UDP bw, +12.25%, SCTP bw: +21.29% For different switches, improvements are as follows: - TCP latency: -6%, UDP: -11.2%, SCTP: -0.2% - TCP bw: -0.7%, UDP bw: +2%, SCTP bw: +9.9% The effect is more noticeable in same switch scenario. Comparable numbers can be captured with iperf. It may be useful to run more tests in a more elaborate (bare metal) environment. === The patch takes inspiration from a now abandoned patch: "ovn-northd: Support mixing stateless/stateful ACLs with Stateless_Filter." by Dumitru Ceara. The original patch assumed CMS doesn't require full flexibility of matching rules for stateless matching (for example, to be used by OpenShift). But other CMS interfaces may require the same customizability for stateless as well as stateful matching, like in OpenStack Neutron API. Which is why this patch reuses existing ACL object type to describe stateless rules. Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com> --- v1: initial version. v2: rebased after conflict. v3: added ddlog implementation. v3: apply stateless skip-hint-set rules to appropriate direction only. v3: added more background as to implementation in commit message. v3: test stateless scenarios with ddlog too. v3: rebased after conflict. v4: introduce a separate allow-stateless ACL match verb. v5: rebased. v6: updated docs for new allow-stateless approach. v6: removed no longer valid comments. v6: removed acl_is_stateless. v7: bump north db schema version to 5.31.0 -> 5.32.0. v8: fixed checkpatch failure on a too long line in dbschema. v8: added perf data on baremetal lab testing. v9: fixed ovsdb checksum. --- NEWS | 2 + northd/ovn-northd.8.xml | 8 +- northd/ovn-northd.c | 65 ++++++++- northd/ovn_northd.dl | 31 ++++ ovn-nb.ovsschema | 9 +- ovn-nb.xml | 9 +- tests/ovn-northd.at | 309 ++++++++++++++++++++++++++++++++++++++++ utilities/ovn-nbctl.c | 6 +- 8 files changed, 428 insertions(+), 11 deletions(-)