[ovs-dev] ovn: SFC Patch V3

Message ID 65324DBA-5F28-4BD8-9DB9-770FC7BECE28@paloaltonetworks.com
State Not Applicable

Commit Message

John McDowall May 9, 2017, 11:33 p.m. UTC
Mickey,

Thanks for the review. I need some help understanding a couple of things:


1)       The proposed change: I could see the previous logic, where we inserted the flow back into the ingress pipeline just after the IN_CHAIN stage. The changes you suggest seem to imply that the action is still inserted after the IN_CHAIN stage, but in the egress (OUT) pipeline. I am missing something here – can you give me some more info?

+    for (int ii = 0; ii < MFF_N_LOG_REGS; ii++) {
+            ds_put_format(&actions, "reg%d = 0; ", ii);
+    }
+    ds_put_format(&actions, "next(pipeline=ingress, table=%d); };",
+                            ovn_stage_get_table(S_SWITCH_IN_CHAIN) + 1);
+    ovn_lflow_add(lflows, od, S_SWITCH_IN_CHAIN, ingress_inner_priority,
+                        lcc_match, ds_cstr(&actions));

Replace the line above by:

    ovn_lflow_add(lflows, od, S_SWITCH_OUT_SFC_LOOPBACK, 100,
                        lcc_match, ds_cstr(&actions));




2)       I can try and put some checks in for loop avoidance. Can you think of scenarios that would cause this? A badly configured port-pair could perhaps cause it (if the eth egress of the port-pair was configured as the ingress eth). Any other scenarios that come to mind?

Regards

John
From: Mickey Spiegel <mickeys.dev@gmail.com>

Date: Monday, April 24, 2017 at 6:39 PM
To: John McDowall <jmcdowall@paloaltonetworks.com>
Cc: "ovs-dev@openvswitch.org" <ovs-dev@openvswitch.org>
Subject: Re: [ovs-dev] ovn: SFC Patch V3


On Mon, Apr 24, 2017 at 12:56 PM, <jmcdowall@paloaltonetworks.com> wrote:
From: John McDowall <jmcdowall@paloaltonetworks.com>



Fixed changes from Mickey's last review.

Changes

1) Fixed re-circulation rules

Still a few modifications required. See comments inline. I just typed some stuff out, have not run, built, or tested anything.

2) Fixed match statement - match is only applied to beginning of chain in
   each direction.
3) Fixed array length of chain of VNFs. I have tested this up to three VNFs
   in a chain and it looks like it works in both directions.

Areas to review

1) The logic now supports hair-pinning the flow back to the original source to
   ensure that the MAC learning problem is addressed. I tested this using
   ovn-trace - any better testing that I should do?

Current todo list

1) I have standalone tests; need to add tests to the ovs/ovn framework.
2) Load-balancing support for port-pair-groups.
3) Publish more detailed examples.
4) Submit suggestions to change and shorten the CLI names.

Simple example using ovn-trace

#!/bin/sh
#
clear
ovn-nbctl ls-add swt1

ovn-nbctl lsp-add swt1 swt1-appc
ovn-nbctl lsp-add swt1 swt1-apps
ovn-nbctl lsp-add swt1 swt1-vnfp1
ovn-nbctl lsp-add swt1 swt1-vnfp2

ovn-nbctl lsp-set-addresses swt1-appc "00:00:00:00:00:01 192.168.33.1"
ovn-nbctl lsp-set-addresses swt1-apps "00:00:00:00:00:02 192.168.33.2"
ovn-nbctl lsp-set-addresses swt1-vnfp1 00:00:00:00:00:03
ovn-nbctl lsp-set-addresses swt1-vnfp2 00:00:00:00:00:04
#
# Configure Service chain
#
ovn-nbctl lsp-pair-add swt1 swt1-vnfp1 swt1-vnfp2 pp1
ovn-nbctl lsp-chain-add swt1 pc1
ovn-nbctl lsp-pair-group-add pc1 ppg1
ovn-nbctl lsp-pair-group-add-port-pair ppg1 pp1
ovn-nbctl lsp-chain-classifier-add swt1 pc1 swt1-appc "entry-lport" "bi-directional" pcc1
#
ovn-sbctl dump-flows
#
# Run trace command
printf "\n---------Flow 1 -------------\n\n"
ovn-trace --detailed  swt1 'inport == "swt1-appc" && eth.src == 00:00:00:00:00:01 && eth.dst == 00:00:00:00:00:02'
printf "\n---------Flow 2 -------------\n\n"
ovn-trace --detailed  swt1 'inport == "swt1-vnfp1" && eth.src == 00:00:00:00:00:01 && eth.dst == 00:00:00:00:00:02'
printf "\n---------Flow 3 -------------\n\n"
ovn-trace --detailed  swt1 'inport == "swt1-apps" && eth.dst == 00:00:00:00:00:01 && eth.src == 00:00:00:00:00:02'
printf "\n---------Flow 4 -------------\n\n"
ovn-trace --detailed  swt1 'inport == "swt1-vnfp2" && eth.dst == 00:00:00:00:00:01 && eth.src == 00:00:00:00:00:02'
#
# Cleanup
#
ovn-nbctl lsp-chain-classifier-del pcc1
ovn-nbctl lsp-pair-group-del ppg1
ovn-nbctl lsp-chain-del pc1
ovn-nbctl lsp-pair-del pp1
ovn-nbctl ls-del swt1

Reported at: https://mail.openvswitch.org/pipermail/ovs-discuss/2016-March/040381.html
Reported at: https://mail.openvswitch.org/pipermail/ovs-discuss/2016-May/041359.html

Signed-off-by: John McDowall <jmcdowall@paloaltonetworks.com>
Signed-off-by: Flavio Fernandes <flavio@flaviof.com>
Co-authored-by: Flavio Fernandes <flavio@flaviof.com>
---
 ovn/northd/ovn-northd.8.xml   |   69 ++-
 ovn/northd/ovn-northd.c       |  382 ++++++++++++-
 ovn/ovn-architecture.7.xml    |   91 ++++
 ovn/ovn-nb.ovsschema          |   87 ++-
 ovn/ovn-nb.xml                |  188 ++++++-
 ovn/utilities/ovn-nbctl.8.xml |  231 ++++++++
 ovn/utilities/ovn-nbctl.c     | 1208 +++++++++++++++++++++++++++++++++++++++++
 7 files changed, 2227 insertions(+), 29 deletions(-)

<snip>


You need a stage in the egress pipeline to do SFC egress loopback:

    PIPELINE_STAGE(SWITCH, OUT, SFC_LOOPBACK, 0, "ls_out_sfc_loopback")  \
    PIPELINE_STAGE(SWITCH, OUT, PRE_LB,       1, "ls_out_pre_lb")     \
    ...

     PIPELINE_STAGE(SWITCH, OUT, PRE_LB,       0, "ls_out_pre_lb")     \
@@ -160,7 +161,6 @@ enum ovn_stage {
 #define REGBIT_CONNTRACK_COMMIT "reg0[1]"
 #define REGBIT_CONNTRACK_NAT    "reg0[2]"
 #define REGBIT_DHCP_OPTS_RESULT "reg0[3]"
-

Whitespace change should be removed.

 /* Register definitions for switches and routers. */
 #define REGBIT_NAT_REDIRECT     "reg9[0]"
 /* Indicate that this packet has been recirculated using egress
@@ -3014,6 +3014,348 @@ build_acls(struct ovn_datapath *od, struct hmap *lflows)
     }
 }

+static int
+cmp_port_pair_groups(const void *ppg1_, const void *ppg2_)
+{
+    const struct nbrec_logical_port_pair_group *const *ppg1p = ppg1_;
+    const struct nbrec_logical_port_pair_group *const *ppg2p = ppg2_;
+    const struct nbrec_logical_port_pair_group *ppg1 = *ppg1p;
+    const struct nbrec_logical_port_pair_group *ppg2 = *ppg2p;
+
+
+    return ( (int)ppg1->sortkey - (int)ppg2->sortkey);
+}
+
+static void
+build_chain(struct ovn_datapath *od, struct hmap *lflows, struct hmap *ports)
+{
+   /*
+    * TODO Items
+    *     * IPV6 support
+    *     * Load balancing support
+    *     * Bidirectional parameter support
+    *     * Support modes of VNF (BitW, L2, L3)
+    *     * Remove port-security on VNF Ports (if set by CMS)
+    *     * Add some state to allow match that does not require 'inport'
+    *     * Support visiting the same VNF more than once
+    *     * Unit tests!
+    */
+    const uint16_t ingress_inner_priority = 150;
+    const uint16_t ingress_outer_priority = 100;
+    const uint16_t egress_inner_priority = 150;
+    const uint16_t egress_outer_priority = 100;
+
+    struct ovn_port **input_port_array = NULL;
+    struct ovn_port **output_port_array = NULL;
+
+    struct ovn_port *dst_port = NULL;
+    struct ovn_port *src_port = NULL;
+
+    struct nbrec_logical_port_chain *lpc;
+    struct nbrec_logical_port_pair_group *lppg;
+    struct nbrec_logical_port_pair *lpp;
+    struct nbrec_logical_port_chain_classifier *lcc;
+
+    char *lcc_match = NULL;
+    char *lcc_action = NULL;
+    struct ovn_port *traffic_port;
+    unsigned int chain_direction = 2;
+    unsigned int chain_path = 2;
+    char * chain_match = NULL;
+
+    /* Ingress table ls_in_chain: default to passing through to the next table
+     * (priority 0)
+     */
+    ovn_lflow_add(lflows, od, S_SWITCH_IN_CHAIN, 0, "1", "next;");

At some point also need:

    ovn_lflow_add(lflows, od, S_SWITCH_OUT_SFC_LOOPBACK, 0, "1", "next;");

+
+    /* No port chains defined; therefore no further work to do here. */
+    if (!od->nbs->port_chain_classifiers) {
+        return;
+    }
+    /* Iterate through all the port-chains defined for this datapath. */
+    for (size_t i = 0; i < od->nbs->n_port_chain_classifiers; i++) {
+        lcc = od->nbs->port_chain_classifiers[i];
+        /* Get the parameters from the classifier */
+        lpc = lcc->chain;
+        //traffic_port = lcc->port;
+        traffic_port =  ovn_port_find(ports,lcc->port->name);
+        if (traffic_port == NULL) {
+            static struct vlog_rate_limit rl =
+                        VLOG_RATE_LIMIT_INIT(1, 1);
+            VLOG_WARN_RL(&rl,
+                        "Traffic port %s does not exist\n",
+                              lcc->port->name);
+            break;
+        }
+        /*
+        * Check chain has one or more groups
+        */
+        if (lpc->n_port_pair_groups == 0) {
+            static struct vlog_rate_limit rl =
+                        VLOG_RATE_LIMIT_INIT(1, 1);
+            VLOG_WARN_RL(&rl,
+                "SFC Chain: %s used in classifier: %s does not "
+                 "have any port pair groups\n",
+                              lcc->chain->name, lcc->name);
+            break;
+
+        }
+        /* TODO Check port exists. */
+        struct eth_addr traffic_logical_port_ea;
+        ovs_be32 traffic_logical_port_ip;
+        ovs_scan(traffic_port->nbsp->addresses[0],
+                    ETH_ADDR_SCAN_FMT" "IP_SCAN_FMT,
+                    ETH_ADDR_SCAN_ARGS(traffic_logical_port_ea),
+                    IP_SCAN_ARGS(&traffic_logical_port_ip));
+        /* Set the port to use as source or destination. */
+        if (strcmp(lcc->direction,"entry-lport")==0) {
+            chain_path = 0;
+        } else {
+            chain_path = 1;
+        }
+        /* Set the direction of the port-chain. */
+        if (strcmp(lcc->path,"uni-directional") == 0) {
+            chain_direction = 0;
+        } else {
+            chain_direction = 1;
+        }
+        /* Set the match parameters. */
+        chain_match = lcc->match;
+        /*
+         * Allocate space for port-pairs + 1. The extra 1 represents the
+         * final hop to reach the desired destination.
+         * TODO: We are going to allocate enough space to hold all the hops:
+         *  1 x portGroups + 1. This needs
+         *  to be enhanced to: SUM(port pairs of each port group) + 1
+         */
+        input_port_array = xmalloc(sizeof *src_port *
+                                   lpc->n_port_pair_groups);
+        output_port_array = xmalloc(sizeof *dst_port *
+                                  (lpc->n_port_pair_groups));
+        /* Copy port groups from chain and sort them according to sortkey.*/
+        struct nbrec_logical_port_pair_group **port_pair_groups =
+                                 xmemdup(lpc->port_pair_groups,
+                          sizeof *port_pair_groups * lpc->n_port_pair_groups);
+        if (lpc->n_port_pair_groups > 1) {
+            qsort(port_pair_groups, lpc->n_port_pair_groups,
+              sizeof *port_pair_groups, cmp_port_pair_groups);
+        }
+        /* For each port-pair-group in a port chain pull out the port-pairs.*/
+        for (size_t j = 0; j < lpc->n_port_pair_groups; j++) {
+            lppg = port_pair_groups[j];
+            for (size_t k = 0; k < lppg->n_port_pairs; k++) {
+                 /* TODO: Need to add load balancing logic when LB becomes
+                 * available. Until LB is available just take the first
+                 * PP in the PPG. */
+                if (k > 0) {
+                    static struct vlog_rate_limit rl =
+                        VLOG_RATE_LIMIT_INIT(1, 1);
+                    VLOG_WARN_RL(&rl,
+                        "Currently lacking support for more than "
+                        "one port-pair %"PRIuSIZE"\n",
+                              lppg->n_port_pairs);
+                    break;
+                }
+                lpp = lppg->port_pairs[k];
+                input_port_array[j] = lpp->inport ? ovn_port_find(ports,
+                                       lpp->inport->name) : NULL;
+                output_port_array[j] = lpp->outport ? ovn_port_find(ports,
+                                        lpp->outport->name) : NULL;
+            }
+        /* At this point we need to determine the final hop port to add to
+         * the chain. This defines the complete path for packets through
+         * the chain. */
+        }

I find the change in indentation below confusing.

+    /*
+    * Insert the lowest priority rule; dest is src-logical-port. These are the
+    * entry points into the chain in either direction. The match statement
+    * is used to filter the entry port to provide higher granularity of
+    * filtering.
+    */
+    if (chain_path == 1) { /* Path starting from entry port */
+        if (strcmp(chain_match,"")!=0) {
+           lcc_match =  xasprintf(
+            "eth.src == "ETH_ADDR_FMT" && %s",
+             ETH_ADDR_ARGS(traffic_logical_port_ea), chain_match);
+        } else {
+           lcc_match =  xasprintf(
+            "eth.src == "ETH_ADDR_FMT,
+             ETH_ADDR_ARGS(traffic_logical_port_ea));
+        }
+           lcc_action = xasprintf("outport = %s; output;",
+                                input_port_array[0]->json_key);
+    } else {  /* Path going to the entry port */
+        if (strcmp(chain_match,"")!=0) {
+           lcc_match =  xasprintf(
+            "eth.dst == "ETH_ADDR_FMT" && %s",
+             ETH_ADDR_ARGS(traffic_logical_port_ea), chain_match);
+        } else {
+           lcc_match =  xasprintf(
+            "eth.dst == "ETH_ADDR_FMT,
+             ETH_ADDR_ARGS(traffic_logical_port_ea));
+        }
+           lcc_action = xasprintf("outport = %s; output;",
+                    output_port_array[lpc->n_port_pair_groups-1]->json_key);
+    }
+    ovn_lflow_add(lflows, od, S_SWITCH_IN_CHAIN, ingress_outer_priority,
+                       lcc_match, lcc_action);
+    free(lcc_match);
+    free(lcc_action);
+    /*
+    * For the last VNF, send the flow back to the original chassis and exit from
+    * there.
+    */
+    if (chain_path == 1) { /* Path starting from entry port */
+           lcc_match = xasprintf(
+                    "eth.src == "ETH_ADDR_FMT" && inport == %s",
+                     ETH_ADDR_ARGS(traffic_logical_port_ea),
+                     output_port_array[lpc->n_port_pair_groups-1]->json_key);
+    } else { /* Path starting from destination port. */
+            lcc_match = xasprintf(
+                    "eth.dst == "ETH_ADDR_FMT" && inport == %s",
+                     ETH_ADDR_ARGS(traffic_logical_port_ea),
+                     input_port_array[0]->json_key);
+    }

    lcc_action = xasprintf("outport = %s; output;", traffic_port->json_key);
    ovn_lflow_add(lflows, od, S_SWITCH_IN_CHAIN, ingress_inner_priority,
                        lcc_match, lcc_action);
    free(lcc_match);
    free(lcc_action);

Temporarily ignoring cases where there are multiple port_chain_classifiers with the same traffic_port. You should do something to avoid duplicate flows in this case.
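
One hedged way to handle that (a sketch, not necessarily how this patch should do it): keep a set of traffic ports that have already received a loopback flow, and skip emitting a second one. This assumes lib/sset.h and the ovn_port "key" field; the helper name and the caller-owned set are hypothetical.

    /* Hypothetical helper: emit the SFC egress loopback flow at most once
     * per traffic port, even when several classifiers share the port.
     * "seen_loopback_ports" would be owned by build_lswitch_flows(). */
    static void
    add_sfc_loopback_once(struct hmap *lflows, struct ovn_datapath *od,
                          const struct ovn_port *traffic_port,
                          struct sset *seen_loopback_ports,
                          const char *match, const char *actions)
    {
        if (sset_contains(seen_loopback_ports, traffic_port->key)) {
            return; /* This traffic port already has its loopback flow. */
        }
        sset_add(seen_loopback_ports, traffic_port->key);
        ovn_lflow_add(lflows, od, S_SWITCH_OUT_SFC_LOOPBACK, 100,
                      match, actions);
    }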

    /* SFC egress loopback on traffic_port. */

    lcc_match = xasprintf(
                    "eth.src == "ETH_ADDR_FMT" && outport == %s",
                    ETH_ADDR_ARGS(traffic_logical_port_ea),
                    traffic_port->json_key);

+             //lcc_action = xasprintf("next;");
+             //lcc_action = xasprintf("flags.loopback = 1; "
+             //    REGBIT_CHAIN_LOOPBACK" = 1;"
+             //   "next(pipeline=ingress, table=0);");
+    struct ds actions = DS_EMPTY_INITIALIZER;
+    ds_put_format(&actions, "clone { ct_clear; "
+                              "inport = outport; outport = \"\"; "
+                              "flags = 0; flags.loopback = 1; ");

Probably should not set flags.loopback, see comments
below.

+    for (int ii = 0; ii < MFF_N_LOG_REGS; ii++) {
+            ds_put_format(&actions, "reg%d = 0; ", ii);
+    }
+    ds_put_format(&actions, "next(pipeline=ingress, table=%d); };",
+                            ovn_stage_get_table(S_SWITCH_IN_CHAIN) + 1);
+    ovn_lflow_add(lflows, od, S_SWITCH_IN_CHAIN, ingress_inner_priority,
+                        lcc_match, ds_cstr(&actions));

Replace the line above by:

    ovn_lflow_add(lflows, od, S_SWITCH_OUT_SFC_LOOPBACK, 100,
                        lcc_match, ds_cstr(&actions));

+    free(lcc_match);
+        //free(lcc_action);
+    ds_destroy(&actions);
+

Some thought needs to be put behind loop avoidance, for
example when eth.dst == eth.src == traffic_logical_port_ea.
This may be an issue since we do not have any definitive
method to distinguish packets returning from chains, from
packets that were switched to traffic_port according to
normal L2 processing.

One option is to add higher priority ( > 100) flows. This is
messy, since there is already a fair amount of code to
write flows to send traffic to this port in
S_SWITCH_IN_L2_LKUP. This needs some thought to
determine the match conditions.
The action is simple, just "next;".
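
As a hedged illustration only (the match below assumes normally switched packets arrive with eth.dst equal to the port's own MAC, which is exactly the assumption that needs more thought):

    /* Sketch: let packets forwarded to traffic_port by normal L2 lookup
     * bypass the SFC loopback stage. Priority > 100 so it beats the
     * loopback flow. */
    lcc_match = xasprintf("eth.dst == "ETH_ADDR_FMT" && outport == %s",
                          ETH_ADDR_ARGS(traffic_logical_port_ea),
                          traffic_port->json_key);
    ovn_lflow_add(lflows, od, S_SWITCH_OUT_SFC_LOOPBACK, 150,
                  lcc_match, "next;");
    free(lcc_match);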

Perhaps more promising is not to set flags.loopback to 1.
This would cause any packets with
eth.src == traffic_logical_port_ea that are switched to
traffic_port to go back through traffic_port's ingress
pipeline, then be dropped if outport == traffic_port at the
end of traffic_port's ingress pipeline.
However, if anything after S_SWITCH_IN_CHAIN sets
flags.loopback, that might not be enough.

There seems to be another issue: if anything before
S_SWITCH_IN_CHAIN sets flags.loopback, that will not be
remembered when the packet comes back, so the packet
will not be processed properly.

Right now it looks like flags.loopback is only set for a
logical switch in tables 11, 12, 13 which are after
S_SWITCH_IN_CHAIN. So, at the moment, remembering
flags.loopback is not an issue. These flows generate
ARP and DHCP responses, in all cases changing
eth.src, so the packets will not loop indefinitely even
though these rules set flags.loopback. There are
already flows to avoid generating ARP responses for
requests on the port that owns the address, which is
the dangerous case for SFC loops.

In summary, it looks like leaving flags.loopback at 0
should work for now. The question is what would
happen if more cases are added in the future that
set flags.loopback in a logical switch.

Mickey

+    /*
+    * Loop over all VNFs and create flow rules for each
+    * Only get here when there is more than one VNF in the chain.
+    */
+    for (size_t j = 1; j < lpc->n_port_pair_groups; j++) {
+
+        /* Apply inner rules flows */
+        if (chain_path == 1) { /* Path starting from entry port */
+            lcc_match = xasprintf(
+                    "eth.src == "ETH_ADDR_FMT" && inport == %s",
+                                 ETH_ADDR_ARGS(traffic_logical_port_ea),
+                                 output_port_array[j-1]->json_key);
+            lcc_action = xasprintf("outport = %s; output;",
+                                input_port_array[j]->json_key);
+        } else { /* Path going to entry port. */
+            lcc_match = xasprintf(
+                    "eth.dst == "ETH_ADDR_FMT" && inport == %s",
+                    ETH_ADDR_ARGS(traffic_logical_port_ea),
+                    input_port_array[lpc->n_port_pair_groups-j]->json_key);
+             lcc_action = xasprintf("outport = %s; output;",
+                  output_port_array[lpc->n_port_pair_groups-(j+1)]->json_key);
+        }
+        ovn_lflow_add(lflows, od, S_SWITCH_IN_CHAIN, ingress_inner_priority,
+                        lcc_match, lcc_action);
+        free(lcc_match);
+        free(lcc_action);
+    }
+    /* bi-directional chain (Response)*/
+    if (chain_direction == 1) {
+        /*
+         * Insert the lowest priority rule; dest is src-logical-port
+         */
+        if (chain_path == 1) { /* Path from source port. */
+            if (strcmp(chain_match,"")!=0) {
+                 lcc_match =  xasprintf("eth.dst == "ETH_ADDR_FMT" && %s",
+                        ETH_ADDR_ARGS(traffic_logical_port_ea), chain_match);
+            } else { /* Path to source port */
+
+                 lcc_match =  xasprintf("eth.dst == "ETH_ADDR_FMT,
+                                    ETH_ADDR_ARGS(traffic_logical_port_ea));
+            }
+            lcc_action = xasprintf("outport = %s; output;",
+                    output_port_array[lpc->n_port_pair_groups-1]->json_key);
+        } else { /* Path from destination port. */
+           if (strcmp(chain_match,"")!=0) {
+                lcc_match =  xasprintf("eth.src == "ETH_ADDR_FMT" && %s",
+                        ETH_ADDR_ARGS(traffic_logical_port_ea), chain_match);
+            } else {
+                lcc_match =  xasprintf("eth.src == "ETH_ADDR_FMT,
+                        ETH_ADDR_ARGS(traffic_logical_port_ea));
+            }
+            lcc_action = xasprintf("outport = %s; output;",
+                    input_port_array[0]->json_key);
+        }
+        ovn_lflow_add(lflows, od, S_SWITCH_IN_CHAIN, egress_outer_priority,
+                       lcc_match, lcc_action);
+        free(lcc_match);
+        free(lcc_action);
+        /*
+        * End Entry Flow to classification chain entry point.
+        */
+        /* Apply last rule to exit from chain */
+        if (chain_path == 1) { /* Path from source port. */
+             lcc_match = xasprintf(
+                    "eth.dst == "ETH_ADDR_FMT" && inport == %s",
+                                 ETH_ADDR_ARGS(traffic_logical_port_ea),
+                                 input_port_array[0]->json_key);
+
+        } else { /* Path to source port. */
+             lcc_match = xasprintf(
+                "eth.src == "ETH_ADDR_FMT" && inport == %s",
+                    ETH_ADDR_ARGS(traffic_logical_port_ea),
+                    output_port_array[lpc->n_port_pair_groups-1]->json_key);
+        }
+        struct ds actions = DS_EMPTY_INITIALIZER;
+        ds_put_format(&actions,
+                              "clone { ct_clear; "
+                              "inport = outport; outport = \"\"; "
+                              "flags = 0; flags.loopback = 1; ");
+        for (int ii = 0; ii < MFF_N_LOG_REGS; ii++) {
+                    ds_put_format(&actions, "reg%d = 0; ", ii);
+        }
+        ds_put_format(&actions, "next(pipeline=ingress, table=%d); };",
+                              ovn_stage_get_table(S_SWITCH_IN_CHAIN) + 1);
+        ovn_lflow_add(lflows, od, S_SWITCH_IN_CHAIN,
+                        egress_inner_priority, lcc_match, ds_cstr(&actions));
+        ds_destroy(&actions);
+        free(lcc_match);
+
+        for (int j = 1; j< lpc->n_port_pair_groups; j++) {
+
+            /* Completed first catch all rule for this port-chain. */
+
+            /* Apply inner rules flows */
+            if (chain_path == 1) { /* Path from source port. */
+                lcc_match = xasprintf(
+                    "eth.dst == "ETH_ADDR_FMT" && inport == %s",
+                    ETH_ADDR_ARGS(traffic_logical_port_ea),
+                    input_port_array[lpc->n_port_pair_groups-j]->json_key);
+                 lcc_action = xasprintf("outport = %s; output;",
+                  output_port_array[lpc->n_port_pair_groups-(j+1)]->json_key);
+
+            } else { /* Path to source port. */
+                lcc_match = xasprintf(
+                    "eth.src == "ETH_ADDR_FMT" && inport == %s",
+                        ETH_ADDR_ARGS(traffic_logical_port_ea),
+                        output_port_array[j-1]->json_key);
+                 lcc_action = xasprintf("outport = %s; output;",
+                        input_port_array[j]->json_key);
+            }
+            ovn_lflow_add(lflows, od, S_SWITCH_IN_CHAIN,
+                          egress_inner_priority, lcc_match, lcc_action);
+            free(lcc_match);
+            free(lcc_action);
+        }
+    }
+    free(input_port_array);
+    free(output_port_array);
+    free(port_pair_groups);
+    }
+}
 static void
 build_qos(struct ovn_datapath *od, struct hmap *lflows) {
     ovn_lflow_add(lflows, od, S_SWITCH_IN_QOS_MARK, 0, "1", "next;");
@@ -3142,7 +3484,7 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
     struct ds actions = DS_EMPTY_INITIALIZER;

     /* Build pre-ACL and ACL tables for both ingress and egress.
-     * Ingress tables 3 through 9.  Egress tables 0 through 6. */
+     * Ingress tables 3 through 10.  Egress tables 0 through 6. */
     struct ovn_datapath *od;
     HMAP_FOR_EACH (od, key_node, datapaths) {
         if (!od->nbs) {
@@ -3153,6 +3495,7 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
         build_pre_lb(od, lflows);
         build_pre_stateful(od, lflows);
         build_acls(od, lflows);
+        build_chain(od, lflows, ports);
         build_qos(od, lflows);
         build_lb(od, lflows);
         build_stateful(od, lflows);
@@ -3225,9 +3568,9 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
         ovn_lflow_add(lflows, od, S_SWITCH_IN_PORT_SEC_IP, 0, "1", "next;");
     }

-    /* Ingress table 10: ARP/ND responder, skip requests coming from localnet
-     * and vtep ports. (priority 100); see ovn-northd.8.xml for the
-     * rationale. */
+    /* Ingress table 11: ARP/ND responder, skip requests coming from localnet
+     * ports. (priority 100). */
+
     HMAP_FOR_EACH (op, key_node, ports) {
         if (!op->nbsp) {
             continue;
@@ -3242,7 +3585,7 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
         }
     }

-    /* Ingress table 10: ARP/ND responder, reply for known IPs.
+    /* Ingress table 11: ARP/ND responder, reply for known IPs.
      * (priority 50). */
     HMAP_FOR_EACH (op, key_node, ports) {
         if (!op->nbsp) {
@@ -3335,7 +3678,7 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
         }
     }

-    /* Ingress table 10: ARP/ND responder, by default goto next.
+    /* Ingress table 11: ARP/ND responder, by default goto next.
      * (priority 0)*/
     HMAP_FOR_EACH (od, key_node, datapaths) {
         if (!od->nbs) {
@@ -3345,7 +3688,7 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
         ovn_lflow_add(lflows, od, S_SWITCH_IN_ARP_ND_RSP, 0, "1", "next;");
     }

-    /* Logical switch ingress table 11 and 12: DHCP options and response
+    /* Logical switch ingress table 12 and 13: DHCP options and response
          * priority 100 flows. */
     HMAP_FOR_EACH (op, key_node, ports) {
         if (!op->nbsp) {
@@ -3449,7 +3792,7 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
         }
     }

-    /* Ingress table 11 and 12: DHCP options and response, by default goto next.
+    /* Ingress table 12 and 13: DHCP options and response, by default goto next.
      * (priority 0). */

     HMAP_FOR_EACH (od, key_node, datapaths) {
@@ -3461,7 +3804,8 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
         ovn_lflow_add(lflows, od, S_SWITCH_IN_DHCP_RESPONSE, 0, "1", "next;");
     }

-    /* Ingress table 13: Destination lookup, broadcast and multicast handling
+
+    /* Ingress table 14: Destination lookup, broadcast and multicast handling
      * (priority 100). */
     HMAP_FOR_EACH (op, key_node, ports) {
         if (!op->nbsp) {
@@ -3481,7 +3825,7 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
                       "outport = \""MC_FLOOD"\"; output;");
     }

-    /* Ingress table 13: Destination lookup, unicast handling (priority 50), */
+    /* Ingress table 14: Destination lookup, unicast handling (priority 50), */
     HMAP_FOR_EACH (op, key_node, ports) {
         if (!op->nbsp) {
             continue;
@@ -3581,7 +3925,7 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
         }
     }

-    /* Ingress table 13: Destination lookup for unknown MACs (priority 0). */
+    /* Ingress table 14: Destination lookup for unknown MACs (priority 0). */
     HMAP_FOR_EACH (od, key_node, datapaths) {
         if (!od->nbs) {
             continue;

<snip>

Comments

Mickey Spiegel May 10, 2017, 10:49 p.m. UTC | #1
Three issues before diving in:


1. Placement of S_SWITCH_IN_CHAIN

For some reason I thought S_SWITCH_IN_CHAIN was after all the stateful
processing, but now I see that it is table 7. That breaks ACLs and other
stateful processing, since REGBIT_CONNTRACK_COMMIT is set in
S_SWITCH_IN_ACL and matched in S_SWITCH_IN_STATEFUL.

S_SWITCH_IN_CHAIN should instead be table 10. The comments below are
written assuming this change.


2. Ingress pipeline needs to be expanded to more than 16 tables

DNS went in since the v3 patch and used up the last of the 16 ingress
tables. If you rebase, you will see that there is no space in the ingress
pipeline for the addition of S_SWITCH_IN_CHAIN. Someone (not me) needs to
expand the ingress pipeline to more than 16 stages before you can proceed.


3. While writing this response, I paid a little more attention to the
"exit-lport" direction and noticed a couple of significant issues.

a. If a packet goes from VM A on port 1 to VM B on port 4, there is a
logical port chain classifier on port 1 in the "entry-lport" direction, and
there is a logical port chain classifier on port 4 in the "exit-lport"
direction, you will only go down one of the service chains. Since the
priorities are equal, I cannot even tell you which one of the service
chains. Logically I would think that the packet should go down both service
chains, first the port 1 "entry-lport" service chain and then the port 4
"exit-lport" service chain.

b. This is done in the ingress pipeline, not the egress pipeline, and is
based on matching eth.dst. This assumes that the forwarding decision will
be based on eth.dst, since you are immediately going down the service
chain, skipping the other ingress pipeline stages, and at the end you go
directly to the egress pipeline with outport based on eth.dst. That is
quite restrictive for a generic forwarding architecture like OVN. I would
think that the right thing to do would be to move the classifier to the
egress pipeline stage, but then I do not know how to avoid loops. When a
packet comes to the egress pipeline stage where the VM resides, there is no
way to tell whether the packet has already gone down the service chain or
not. I guess you could put a S_SWITCH_IN_EGRESS_CHAIN ingress pipeline
stage right after L2_LKUP instead, and match on outport in addition to
eth.dst, but it feels a bit unclean.
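
A rough sketch of that last idea, with S_SWITCH_IN_EGRESS_CHAIN as a hypothetical stage name (nothing here is in the patch; it only illustrates matching on outport in addition to eth.dst once L2_LKUP has chosen the outport):

    /* Hypothetical ingress stage right after L2_LKUP: classify into the
     * "exit-lport" chain based on the outport chosen by L2 lookup. */
    lcc_match = xasprintf("eth.dst == "ETH_ADDR_FMT" && outport == %s",
                          ETH_ADDR_ARGS(traffic_logical_port_ea),
                          traffic_port->json_key);
    lcc_action = xasprintf("outport = %s; output;",
                           input_port_array[0]->json_key);
    ovn_lflow_add(lflows, od, S_SWITCH_IN_EGRESS_CHAIN, 100,
                  lcc_match, lcc_action);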

On Tue, May 9, 2017 at 4:33 PM, John McDowall <
jmcdowall@paloaltonetworks.com> wrote:

> Mickey,
>
>
>
> Thanks for the review. I need some help understanding a couple of things:
>
>
>
> 1)       The proposed change, I could see the previous logic where we
> inserted the flow back in the ingress pipeline just after the IN_CHAIN
> stage. The changes you suggest seem to imply that the action is still
> insert after the _*IN*_CHAIN stage but in the egress (OUT) pipeline. I am
> missing something here – can you give me some more info?
>
Assume you have port 1 to a VM on HV1, port 2 as the input port to a VNF on
HV2, and port 3 as the output port from that same VNF on HV2. The service
chain is just that one VNF, with direction "entry-lport".

The packet progresses as follows:

HV1, ingress pipeline, inport 1
Tables 1 to 9
Table 10 (S_SWITCH_IN_CHAIN) hits a priority 100 flow setting outport = 2 and
then going to output immediately

Table 32 sends the packet through a geneve tunnel to HV2

HV2, egress pipeline, outport 2
Tables 48 to 65

After packet gets delivered to the VNF, it comes back

HV2, ingress pipeline, inport 3
Tables 1 to 9

Table 10 (S_SWITCH_IN_CHAIN)

With the code in v3, you are doing a clone with inport = outport (which is
not yet set!),
then progressing to ingress Table 11, still on HV2

Eventually you hit S_SWITCH_IN_L2_LKUP and go directly to the destination
from HV2. If the destination is in the outside world, this just broke
upstream L2 learning.

My suggestion is that Table 10 instead hits a priority 150 flow setting outport
= 1 and then going to output immediately.

Table 32 sends the packet through a geneve tunnel to HV1 based on outport
== 1, but it ends up in the egress pipeline

HV1, egress pipeline, outport 1

New Table 0 (S_SWITCH_OUT_SFC_LOOPBACK) hits the entry doing a clone with
inport = outport and jumping to the ingress pipeline Table 11

HV1, ingress pipeline, inport 1
Tables 11 to 15 (continuation from where things left off at the beginning,
when the packet was redirected down the service chain)
...

Since this code does not use NSH, all metadata accumulated in Tables 1 to 9
is lost while going around the service chain, before proceeding through
Tables 11 to 15. At the moment this works, but this becomes a restriction
on future features, if they are to work with SFC. The alternative is to go
back through all of Tables 1 through 15 instead, with
REGBIT_CHAIN_LOOPBACK as in the v2 patch to avoid hitting any entries
in Table 10 (S_SWITCH_IN_CHAIN). With this alternative, all packets go
through Tables 1 to 9 on HV1 twice. We have to make a judgement call which
alternative is least ugly, or move to NSH and stuff the metadata in the NSH
header.
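
For reference, a sketch of that v2-style alternative, reconstructed from the commented-out action still visible in the patch:

    /* v2-style alternative (sketch): re-enter the ingress pipeline at
     * table 0 and set REGBIT_CHAIN_LOOPBACK so the second pass skips
     * the S_SWITCH_IN_CHAIN entries. All of Tables 1 to 9 run twice. */
    ds_put_format(&actions, "clone { ct_clear; "
                  "inport = outport; outport = \"\"; "
                  "flags = 0; "REGBIT_CHAIN_LOOPBACK" = 1; "
                  "next(pipeline=ingress, table=0); };");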

>
>
> +    for (int ii = 0; ii < MFF_N_LOG_REGS; ii++) {
> +            ds_put_format(&actions, "reg%d = 0; ", ii);
> +    }
> +    ds_put_format(&actions, "next(pipeline=ingress, table=%d); };",
> +                            ovn_stage_get_table(S_SWITCH_IN_CHAIN) + 1);
> +    ovn_lflow_add(lflows, od, S_SWITCH_IN_CHAIN, ingress_inner_priority,
> +                        lcc_match, ds_cstr(&actions));
>
>
>
> Replace the line above by:
>
>
>
>     ovn_lflow_add(lflows, od, S_SWITCH_OUT_SFC_LOOPBACK, 100,
>                         lcc_match, ds_cstr(&actions));
>
>
>
>
>
>
>
> 2)       I can try and put some checks in for loop avoidance. Can you
> think of scenarios that would cause this, a badly configured port-pair
> could perhaps cause it (if the eth egress of the port-pair was configured
> as the ingress eth.) Any other scenarios that come to mind ?
>

I cannot think of a good reason why a packet would end up on the egress
pipeline on HV1 with outport == 1.
After thinking about it more, I think it is OK if flags.loopback is left as
0. If this case is ever triggered and the second time around through the
ingress pipeline still sets outport = 1, then the Table 34 loopback check
will detect that outport == inport and drop the packet.

Mickey


>
> Regards
>
>
>
> John
>
> *From: *Mickey Spiegel <mickeys.dev@gmail.com>
> *Date: *Monday, April 24, 2017 at 6:39 PM
> *To: *John McDowall <jmcdowall@paloaltonetworks.com>
> *Cc: *"ovs-dev@openvswitch.org" <ovs-dev@openvswitch.org>
> *Subject: *Re: [ovs-dev] ovn: SFC Patch V3
>
>
>
>
>
> On Mon, Apr 24, 2017 at 12:56 PM, <jmcdowall@paloaltonetworks.com> wrote:
>
> From: John McDowall <jmcdowall@paloaltonetworks.com>
>
>
> Fixed changes from Mickey's last review.
>
> Changes
>
> 1) Fixed re-circulation rules
>
>
>
> Still a few modifications required. See comments inline. I just typed some
> stuff out, have not run, built, or tested anything.
>
>
>
> 2) Fixed match statement - match is only applied to beginnning of chain in
>    each direction.
> 3) Fixed array length of chain of VNFs. I have tested thi sup to three VNFs
>    in a chain and it looks like it works in both directions.
>
> Areas to review
>
> 1) The logic now supports hair-pinnign the flow back to the original
> source to
>    ensure that the MAC learnign problem is addressed. I tested this using
>    ovn-trace - any better testing that I should do?
>
> Current todo list
>
> 1) I have standalone tests need to add tests to ovs/ovn framework.
> 2) Load-balancing support for port-pair-groups
> 3) Publish more detailed examples.
> 4) Submit suggestions to change and shorted the CLI names.
>
> Simple example using ovn-trace
>
> #!/bin/sh
> #
> clear
> ovn-nbctl ls-add swt1
>
> ovn-nbctl lsp-add swt1 swt1-appc
> ovn-nbctl lsp-add swt1 swt1-apps
> ovn-nbctl lsp-add swt1 swt1-vnfp1
> ovn-nbctl lsp-add swt1 swt1-vnfp2
>
> ovn-nbctl lsp-set-addresses swt1-appc "00:00:00:00:00:01 192.168.33.1"
> ovn-nbctl lsp-set-addresses swt1-apps "00:00:00:00:00:02 192.168.33.2"
> ovn-nbctl lsp-set-addresses swt1-vnfp1 00:00:00:00:00:03
> ovn-nbctl lsp-set-addresses swt1-vnfp2 00:00:00:00:00:04
> #
> # Configure Service chain
> #
> ovn-nbctl lsp-pair-add swt1 swt1-vnfp1 swt1-vnfp2 pp1
> ovn-nbctl lsp-chain-add swt1 pc1
> ovn-nbctl lsp-pair-group-add pc1 ppg1
> ovn-nbctl lsp-pair-group-add-port-pair ppg1 pp1
> ovn-nbctl lsp-chain-classifier-add swt1 pc1 swt1-appc "entry-lport"
> "bi-directional" pcc1
> #
> ovn-sbctl dump-flows
> #
> # Run trace command
> printf "\n---------Flow 1 -------------\n\n"
> ovn-trace --detailed  swt1 'inport == "swt1-appc" && eth.src ==
> 00:00:00:00:00:01 && eth.dst == 00:00:00:00:00:02'
> printf "\n---------Flow 2 -------------\n\n"
> ovn-trace --detailed  swt1 'inport == "swt1-vnfp1" && eth.src ==
> 00:00:00:00:00:01 && eth.dst == 00:00:00:00:00:02'
> printf "\n---------Flow 3 -------------\n\n"
> ovn-trace --detailed  swt1 'inport == "swt1-apps" && eth.dst ==
> 00:00:00:00:00:01 && eth.src == 00:00:00:00:00:02'
> printf "\n---------Flow 4 -------------\n\n"
> ovn-trace --detailed  swt1 'inport == "swt1-vnfp2" && eth.dst ==
> 00:00:00:00:00:01 && eth.src == 00:00:00:00:00:02'
> #
> # Cleanup
> #
> ovn-nbctl lsp-chain-classifier-del pcc1
> ovn-nbctl lsp-pair-group-del ppg1
> ovn-nbctl lsp-chain-del pc1
> ovn-nbctl lsp-pair-del pp1
> ovn-nbctl ls-del swt1
>
> Reported at: https://mail.openvswitch.org/pipermail/ovs-discuss/2016-
> March/040381.html
> <https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_pipermail_ovs-2Ddiscuss_2016-2DMarch_040381.html&d=DwMFaQ&c=V9IgWpI5PvzTw83UyHGVSoW3Uc1MFWe5J8PTfkrzVSo&r=vZ6VUDaavDpfOdPQrz1ED54jEjvAE36A8TVJroVlrOQ&m=IVGOl8me3C4dzw8GMHKoxCKC1PfFf10p5EcD8PIGmbI&s=TRwcbjVrcqErrFP_gC_ZK6axUelonie6aFWbgEx1tpI&e=>
> Reported at: https://mail.openvswitch.org/pipermail/ovs-discuss/2016-
> May/041359.html
> <https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_pipermail_ovs-2Ddiscuss_2016-2DMay_041359.html&d=DwMFaQ&c=V9IgWpI5PvzTw83UyHGVSoW3Uc1MFWe5J8PTfkrzVSo&r=vZ6VUDaavDpfOdPQrz1ED54jEjvAE36A8TVJroVlrOQ&m=IVGOl8me3C4dzw8GMHKoxCKC1PfFf10p5EcD8PIGmbI&s=HHqxmHiE32vv5l-uyeTAY2gy3MQvDclqu-A49TlM6pU&e=>
>
> Signed-off-by: John McDowall <jmcdowall@paloaltonetworks.com>
> Signed-off-by: Flavio Fernandes <flavio@flaviof.com>
> Co-authored-by: Flavio Fernandes <flavio@flaviof.com>
> ---
>  ovn/northd/ovn-northd.8.xml   |   69 ++-
>  ovn/northd/ovn-northd.c       |  382 ++++++++++++-
>  ovn/ovn-architecture.7.xml    |   91 ++++
>  ovn/ovn-nb.ovsschema          |   87 ++-
>  ovn/ovn-nb.xml                |  188 ++++++-
>  ovn/utilities/ovn-nbctl.8.xml |  231 ++++++++
>  ovn/utilities/ovn-nbctl.c     | 1208 ++++++++++++++++++++++++++++++
> +++++++++++
>  7 files changed, 2227 insertions(+), 29 deletions(-)
>
>
>
> <snip>
>
>
>
>
> diff --git ovn/northd/ovn-northd.c ovn/northd/ovn-northd.c
> index d0a5ba2..090f768 100644
> --- ovn/northd/ovn-northd.c
> +++ ovn/northd/ovn-northd.c
> @@ -106,13 +106,14 @@ enum ovn_stage {
>      PIPELINE_STAGE(SWITCH, IN,  PRE_LB,         4, "ls_in_pre_lb")
> \
>      PIPELINE_STAGE(SWITCH, IN,  PRE_STATEFUL,   5, "ls_in_pre_stateful")
> \
>      PIPELINE_STAGE(SWITCH, IN,  ACL,            6, "ls_in_acl")
>  \
> -    PIPELINE_STAGE(SWITCH, IN,  QOS_MARK,       7, "ls_in_qos_mark")
> \
> -    PIPELINE_STAGE(SWITCH, IN,  LB,             8, "ls_in_lb")
> \
> -    PIPELINE_STAGE(SWITCH, IN,  STATEFUL,       9, "ls_in_stateful")
> \
> -    PIPELINE_STAGE(SWITCH, IN,  ARP_ND_RSP,    10, "ls_in_arp_rsp")
>  \
> -    PIPELINE_STAGE(SWITCH, IN,  DHCP_OPTIONS,  11, "ls_in_dhcp_options")
> \
> -    PIPELINE_STAGE(SWITCH, IN,  DHCP_RESPONSE, 12, "ls_in_dhcp_response")
> \
> -    PIPELINE_STAGE(SWITCH, IN,  L2_LKUP,       13, "ls_in_l2_lkup")
>  \
> +    PIPELINE_STAGE(SWITCH, IN,  CHAIN,          7, "ls_in_chain")        \
> +    PIPELINE_STAGE(SWITCH, IN,  QOS_MARK,       8, "ls_in_qos_mark")    \
> +    PIPELINE_STAGE(SWITCH, IN,  LB,             9, "ls_in_lb")
> \
> +    PIPELINE_STAGE(SWITCH, IN,  STATEFUL,      10, "ls_in_stateful")
> \
> +    PIPELINE_STAGE(SWITCH, IN,  ARP_ND_RSP,    11, "ls_in_arp_rsp")
>  \
> +    PIPELINE_STAGE(SWITCH, IN,  DHCP_OPTIONS,  12, "ls_in_dhcp_options")
> \
> +    PIPELINE_STAGE(SWITCH, IN,  DHCP_RESPONSE, 13, "ls_in_dhcp_response")
> \
> +    PIPELINE_STAGE(SWITCH, IN,  L2_LKUP,       14, "ls_in_l2_lkup")
>  \
>                                                                        \
>      /* Logical switch egress stages. */                               \
>
>
>
> You need a stage in the egress pipeline to do SFC egress loopback:
>
>
>
>     PIPELINE_STAGE(SWITCH, OUT, SFC_LOOPBACK, 0, "ls_out_sfc_loopback")  \
>
>     PIPELINE_STAGE(SWITCH, OUT, PRE_LB,       1, "ls_out_pre_lb")     \
>
>     ...
>
>
>
>      PIPELINE_STAGE(SWITCH, OUT, PRE_LB,       0, "ls_out_pre_lb")     \
> @@ -160,7 +161,6 @@ enum ovn_stage {
>  #define REGBIT_CONNTRACK_COMMIT "reg0[1]"
>  #define REGBIT_CONNTRACK_NAT    "reg0[2]"
>  #define REGBIT_DHCP_OPTS_RESULT "reg0[3]"
> -
>
>
>
> Whitespace change should be removed.
>
>
>
>  /* Register definitions for switches and routers. */
>  #define REGBIT_NAT_REDIRECT     "reg9[0]"
>  /* Indicate that this packet has been recirculated using egress
> @@ -3014,6 +3014,348 @@ build_acls(struct ovn_datapath *od, struct hmap
> *lflows)
>      }
>  }
>
> +static int
> +cmp_port_pair_groups(const void *ppg1_, const void *ppg2_)
> +{
> +    const struct nbrec_logical_port_pair_group *const *ppg1p = ppg1_;
> +    const struct nbrec_logical_port_pair_group *const *ppg2p = ppg2_;
> +    const struct nbrec_logical_port_pair_group *ppg1 = *ppg1p;
> +    const struct nbrec_logical_port_pair_group *ppg2 = *ppg2p;
> +
> +
> +    return ( (int)ppg1->sortkey - (int)ppg2->sortkey);
> +}
> +
> +static void
> +build_chain(struct ovn_datapath *od, struct hmap *lflows, struct hmap
> *ports)
> +{
> +   /*
> +    * TODO Items
> +    *     * IPV6 support
> +    *     * Load balancing support
> +    *     * Bidirectional parameter support
> +    *     * Support modes of VNF (BitW, L2, L3)
> +    *     * Remove port-security on VNF Ports (if set by CMS)
> +    *     * Add some state to allow match that does not require 'inport'
> +    *     * Support visiting the same VNF more than once
> +    *     * Unit tests!
> +    */
> +    const uint16_t ingress_inner_priority = 150;
> +    const uint16_t ingress_outer_priority = 100;
> +    const uint16_t egress_inner_priority = 150;
> +    const uint16_t egress_outer_priority = 100;
> +
> +    struct ovn_port **input_port_array = NULL;
> +    struct ovn_port **output_port_array = NULL;
> +
> +    struct ovn_port *dst_port = NULL;
> +    struct ovn_port *src_port = NULL;
> +
> +    struct nbrec_logical_port_chain *lpc;
> +    struct nbrec_logical_port_pair_group *lppg;
> +    struct nbrec_logical_port_pair *lpp;
> +    struct nbrec_logical_port_chain_classifier *lcc;
> +
> +    char *lcc_match = NULL;
> +    char *lcc_action = NULL;
> +    struct ovn_port *traffic_port;
> +    unsigned int chain_direction = 2;
> +    unsigned int chain_path = 2;
> +    char * chain_match = NULL;
> +
> +    /* Ingress table ls_in_chain: default to passing through to the next
> table
> +     * (priority 0)
> +     */
> +    ovn_lflow_add(lflows, od, S_SWITCH_IN_CHAIN, 0, "1", "next;");
>
>
>
> At some point also need:
>
>
>
>     ovn_lflow_add(lflows, od, S_SWITCH_OUT_SFC_LOOPBACK, 0, "1", "next;");
>
>
>
> +
> +    /* No port chains defined therefore, no further work to do here. */
> +    if (!od->nbs->port_chain_classifiers) {
> +        return;
> +    }
> +    /* Iterate through all the port-chains defined for this datapath. */
> +    for (size_t i = 0; i < od->nbs->n_port_chain_classifiers; i++) {
> +        lcc = od->nbs->port_chain_classifiers[i];
> +        /* Get the parameters from the classifier */
> +        lpc = lcc->chain;
> +        //traffic_port = lcc->port;
> +        traffic_port =  ovn_port_find(ports,lcc->port->name);
> +        if (traffic_port == NULL) {
> +            static struct vlog_rate_limit rl =
> +                        VLOG_RATE_LIMIT_INIT(1, 1);
> +            VLOG_WARN_RL(&rl,
> +                        "Traffic port %s does not exist\n",
> +                              lcc->port->name);
> +            break;
> +        }
> +        /*
> +        * Check chain has one or more groups
> +        */
> +        if (lpc->n_port_pair_groups == 0) {
> +            static struct vlog_rate_limit rl =
> +                        VLOG_RATE_LIMIT_INIT(1, 1);
> +            VLOG_WARN_RL(&rl,
> +                "SFC Chain: %s used in classifier: %s does not not "
> +                 "have any port pair groups\n",
> +                              lcc->chain->name, lcc->name);
> +            break;
> +
> +        }
> +        /* TODO Check port exists. */
> +        struct eth_addr traffic_logical_port_ea;
> +        ovs_be32 traffic_logical_port_ip;
> +        ovs_scan(traffic_port->nbsp->addresses[0],
> +                    ETH_ADDR_SCAN_FMT" "IP_SCAN_FMT,
> +                    ETH_ADDR_SCAN_ARGS(traffic_logical_port_ea),
> +                    IP_SCAN_ARGS(&traffic_logical_port_ip));
> +        /* Set the port to use as source or destination. */
> +        if (strcmp(lcc->direction,"entry-lport")==0) {
> +            chain_path = 0;
> +        } else {
> +            chain_path = 1;
> +        }
> +        /* Set the direction of the port-chain. */
> +        if (strcmp(lcc->path,"uni-directional") == 0) {
> +            chain_direction = 0;
> +        } else {
> +            chain_direction = 1;
> +        }
> +        /* Set the match parameters. */
> +        chain_match = lcc->match;
> +        /*
> +         * Allocate space for port-pairs + 1. The Extra 1 represents the
> +         * final hop to reach desired destination.
> +         * TODO: We are going to allocate enough space to hold all the
> hops:
> +         *  1 x portGroups + 1. This needs
> +         *  to enhanced to: SUM(port pairs of each port group) + 1
> +         */
> +        input_port_array = xmalloc(sizeof *src_port *
> +                                   lpc->n_port_pair_groups);
> +        output_port_array = xmalloc(sizeof *dst_port *
> +                                  (lpc->n_port_pair_groups));
> +        /* Copy port groups from chain and sort them according to
> sortkey.*/
> +        struct nbrec_logical_port_pair_group **port_pair_groups =
> +                                 xmemdup(lpc->port_pair_groups,
> +                          sizeof *port_pair_groups *
> lpc->n_port_pair_groups);
> +        if (lpc->n_port_pair_groups > 1) {
> +            qsort(port_pair_groups, lpc->n_port_pair_groups,
> +              sizeof *port_pair_groups, cmp_port_pair_groups);
> +        }
> +        /* For each port-pair-group in a port chain pull out the
> port-pairs.*/
> +        for (size_t j = 0; j < lpc->n_port_pair_groups; j++) {
> +            lppg = port_pair_groups[j];
> +            for (size_t k = 0; k < lppg->n_port_pairs; k++) {
> +                 /* TODO: Need to add load balancing logic when LB becomes
> +                 * available. Until LB is available just take the first
> +                 * PP in the PPG. */
> +                if (k > 0) {
> +                    static struct vlog_rate_limit rl =
> +                        VLOG_RATE_LIMIT_INIT(1, 1);
> +                    VLOG_WARN_RL(&rl,
> +                        "Currently lacking support for more than \
> +                            one port-pair %"PRIuSIZE"\n",
> +                              lppg->n_port_pairs);
> +                    break;
> +                }
> +                lpp = lppg->port_pairs[k];
> +                input_port_array[j] = lpp->inport ? ovn_port_find(ports,
> +                                       lpp->inport->name) : NULL;
> +                output_port_array[j] = lpp->outport ? ovn_port_find(ports,
> +                                        lpp->outport->name) : NULL;
> +            }
> +        /* At this point we need to determine the final hop port to add to
> +         * the chain. This defines the complete path for packets through
> +         * the chain. */
> +        }
>
>
>
> I find the change in indentation below confusing.
>
>
>
> +    /*
> +    * Insert the lowest priorty rule dest is src-logical-port. These are
> the
> +    * entry points into the chain in either direction. The match statement
> +    * is used to filter the entry port to provide higher granularity of
> +    * filtering.
> +    */
> +    if (chain_path == 1) { /* Path starting from entry port */
> +        if (strcmp(chain_match,"")!=0) {
> +           lcc_match =  xasprintf(
> +            "eth.src == "ETH_ADDR_FMT" && %s",
> +             ETH_ADDR_ARGS(traffic_logical_port_ea), chain_match);
> +        } else {
> +           lcc_match =  xasprintf(
> +            "eth.src == "ETH_ADDR_FMT,
> +             ETH_ADDR_ARGS(traffic_logical_port_ea));
> +        }
> +           lcc_action = xasprintf("outport = %s; output;",
> +                                input_port_array[0]->json_key);
> +    } else {  /* Path going to the entry port */
> +        if (strcmp(chain_match,"")!=0) {
> +           lcc_match =  xasprintf(
> +            "eth.dst == "ETH_ADDR_FMT" && %s",
> +             ETH_ADDR_ARGS(traffic_logical_port_ea), chain_match);
> +        } else {
> +           lcc_match =  xasprintf(
> +            "eth.dst == "ETH_ADDR_FMT,
> +             ETH_ADDR_ARGS(traffic_logical_port_ea));
> +        }
> +           lcc_action = xasprintf("outport = %s; output;",
> +                    output_port_array[lpc->n_port_
> pair_groups-1]->json_key);
> +    }
> +    ovn_lflow_add(lflows, od, S_SWITCH_IN_CHAIN, ingress_outer_priority,
> +                       lcc_match, lcc_action);
> +    free(lcc_match);
> +    free(lcc_action);
> +    /*
> +    * For last VNF send the flow back to teh original chassis and exit
> from
> +    * there.
> +    */
> +    if (chain_path == 1) { /* Path starting from entry port */
> +           lcc_match = xasprintf(
> +                    "eth.src == "ETH_ADDR_FMT" && inport == %s",
> +                     ETH_ADDR_ARGS(traffic_logical_port_ea),
> +                     output_port_array[lpc->n_
> port_pair_groups-1]->json_key);
> +    } else { /* Path starting from destination port. */
> +            lcc_match = xasprintf(
> +                    "eth.dst == "ETH_ADDR_FMT" && inport == %s",
> +                     ETH_ADDR_ARGS(traffic_logical_port_ea),
> +                     input_port_array[0]->json_key);
> +    }
>
>
>
>     lcc_action = xasprintf("outport = %s; output;",
> traffic_port->json_key);
>
>     ovn_lflow_add(lflows, od, S_SWITCH_IN_CHAIN, ingress_inner_priority,
>                         lcc_match, lcc_action);
>     free(lcc_match);
>     free(lcc_action);
>
>
>
> Temporarily ignoring cases where there are multiple port_chain_classifiers
> with the same traffic_port. You should do something to avoid duplicate
> flows in this case.
>
>
>
>     /* SFC egress loopback on traffic_port. */
>
>
>
>     lcc_match = xasprintf(
>
>                     "eth.src == "ETH_ADDR_FMT" && outport == %s",
>                     ETH_ADDR_ARGS(traffic_logical_port_ea),
>
>                     traffic_port->json_key);
>
>
>
> +             //lcc_action = xasprintf("next;");
> +             //lcc_action = xasprintf("flags.loopback = 1; "
> +             //    REGBIT_CHAIN_LOOPBACK" = 1;"
> +             //   "next(pipeline=ingress, table=0);");
> +    struct ds actions = DS_EMPTY_INITIALIZER;
> +    ds_put_format(&actions, "clone { ct_clear; "
> +                              "inport = outport; outport = \"\"; "
> +                              "flags = 0; flags.loopback = 1; ");
>
>
>
> Probably should not set flags.loopback, see comments
>
> below.
>
>
>
> +    for (int ii = 0; ii < MFF_N_LOG_REGS; ii++) {
> +            ds_put_format(&actions, "reg%d = 0; ", ii);
> +    }
> +    ds_put_format(&actions, "next(pipeline=ingress, table=%d); };",
> +                            ovn_stage_get_table(S_SWITCH_IN_CHAIN) + 1);
> +    ovn_lflow_add(lflows, od, S_SWITCH_IN_CHAIN, ingress_inner_priority,
> +                        lcc_match, ds_cstr(&actions));
>
>
>
> Replace the line above by:
>
>
>
>     ovn_lflow_add(lflows, od, S_SWITCH_OUT_SFC_LOOPBACK, 100,
>                         lcc_match, ds_cstr(&actions));
>
>
>
> +    free(lcc_match);
> +        //free(lcc_action);
> +    ds_destroy(&actions);
> +
>
>
>
> Some thought needs to be put behind loop avoidance, for
>
> example when eth.dst == eth.src == traffic_logical_port_ea.
>
> This may be an issue since we do not have any definitive
>
> method to distinguish packets returning from chains, from
>
> packets that were switched to traffic_port according to
>
> normal L2 processing.
>
>
>
> One option is to add higher priority ( > 100) flows. This is
>
> messy, since there is already a fair amount of code to
>
> write flows to send traffic to this port in
>
> S_SWITCH_IN_L2_LKUP. This needs some thought to
>
> determine the match conditions.
>
> The action is simple, just "next;".
>
>
>
> Perhaps more promising is not to set flags.loopback to 1.
>
> This would cause any packets with
>
> eth.src == traffic_logical_port_ea that are switched to
>
> traffic_port to go back through traffic_port's ingress
>
> pipeline, then be dropped if outport == traffic_port at the
>
> end of traffic_port's ingress pipeline.
>
> However, if anything after S_SWITCH_IN_CHAIN sets
>
> flags.loopback, that might not be enough.
>
>
>
> There seems to be another issue, if anything before
>
> S_SWITCH_IN_CHAIN sets flags.loopback, that will not be
>
> remembered when the packet comes back, so the packet
>
> will not be processed properly.
>
>
>
> Right now it looks like flags.loopback is only set for a
>
> logical switch in tables 11, 12, 13 which are after
>
> S_SWITCH_IN_CHAIN. So, at the moment, remembering
>
> flags.loopback is not an issue. These flows generate
>
> ARP and DCHP responses, in all cases changing
>
> eth.src, so the packets will not loop indefinitely even
>
> though these rules set flags.loopback. There are
>
> already flows to avoid generating ARP responses for
>
> requests on the port that owns the address, which is
>
> the dangerous case for SFC loops.
>
>
>
> In summary, it looks like leaving flags.loopback at 0
>
> should work for now. The question is what would
>
> happen if more cases are added in the future that
>
> set flags.loopback in a logical switch.
>
>
>
> Mickey
>
>
>
> +    /*
> +    * Loop over all VNFs and create flow rules for each
> +    * Only get here when there is more than one VNF in the chain.
> +    */
> +    for (size_t j = 1; j < lpc->n_port_pair_groups; j++) {
> +
> +        /* Apply inner rules flows */
> +        if (chain_path == 1) { /* Path starting from entry port */
> +            lcc_match = xasprintf(
> +                    "eth.src == "ETH_ADDR_FMT" && inport == %s",
> +                                 ETH_ADDR_ARGS(traffic_logical_port_ea),
> +                                 output_port_array[j-1]->json_key);
> +            lcc_action = xasprintf("outport = %s; output;",
> +                                input_port_array[j]->json_key);
> +        } else { /* Path going to entry port. */
> +            lcc_match = xasprintf(
> +                    "eth.dst == "ETH_ADDR_FMT" && inport == %s",
> +                    ETH_ADDR_ARGS(traffic_logical_port_ea),
> +                    input_port_array[lpc->n_port_
> pair_groups-j]->json_key);
> +             lcc_action = xasprintf("outport = %s; output;",
> +                  output_port_array[lpc->n_port_
> pair_groups-(j+1)]->json_key);
> +        }
> +        ovn_lflow_add(lflows, od, S_SWITCH_IN_CHAIN,
> ingress_inner_priority,
> +                        lcc_match, lcc_action);
> +        free(lcc_match);
> +        free(lcc_action);
> +    }
> +    /* Bi-directional chain (response). */
> +    if (chain_direction == 1) {
> +        /*
> +         * Insert the lowest priority rule: dest is the src-logical-port.
> +         */
> +        if (chain_path == 1) { /* Path from source port. */
> +            if (strcmp(chain_match,"")!=0) {
> +                 lcc_match =  xasprintf("eth.dst == "ETH_ADDR_FMT" && %s",
> +                        ETH_ADDR_ARGS(traffic_logical_port_ea), chain_match);
> +            } else { /* No match filter specified. */
> +                 lcc_match =  xasprintf("eth.dst == "ETH_ADDR_FMT,
> +                                    ETH_ADDR_ARGS(traffic_logical_port_ea));
> +            }
> +            lcc_action = xasprintf("outport = %s; output;",
> +                    output_port_array[lpc->n_port_pair_groups-1]->json_key);
> +        } else { /* Path from destination port. */
> +           if (strcmp(chain_match,"")!=0) {
> +                lcc_match =  xasprintf("eth.src == "ETH_ADDR_FMT" && %s",
> +                        ETH_ADDR_ARGS(traffic_logical_port_ea), chain_match);
> +            } else {
> +                lcc_match =  xasprintf("eth.src == "ETH_ADDR_FMT,
> +                        ETH_ADDR_ARGS(traffic_logical_port_ea));
> +            }
> +            lcc_action = xasprintf("outport = %s; output;",
> +                    input_port_array[0]->json_key);
> +        }
> +        ovn_lflow_add(lflows, od, S_SWITCH_IN_CHAIN, egress_outer_priority,
> +                       lcc_match, lcc_action);
> +        free(lcc_match);
> +        free(lcc_action);
> +        /*
> +        * End entry flow to classification chain entry point.
> +        */
> +        /* Apply last rule to exit from chain. */
> +        if (chain_path == 1) { /* Path from source port. */
> +             lcc_match = xasprintf(
> +                    "eth.dst == "ETH_ADDR_FMT" && inport == %s",
> +                                 ETH_ADDR_ARGS(traffic_logical_port_ea),
> +                                 input_port_array[0]->json_key);
> +
> +        } else { /* Path to source port. */
> +             lcc_match = xasprintf(
> +                "eth.src == "ETH_ADDR_FMT" && inport == %s",
> +                    ETH_ADDR_ARGS(traffic_logical_port_ea),
> +                    output_port_array[lpc->n_port_pair_groups-1]->json_key);
> +        }
> +        struct ds actions = DS_EMPTY_INITIALIZER;
> +        ds_put_format(&actions,
> +                              "clone { ct_clear; "
> +                              "inport = outport; outport = \"\"; "
> +                              "flags = 0; flags.loopback = 1; ");
> +        for (int ii = 0; ii < MFF_N_LOG_REGS; ii++) {
> +                    ds_put_format(&actions, "reg%d = 0; ", ii);
> +        }
> +        ds_put_format(&actions, "next(pipeline=ingress, table=%d); };",
> +                              ovn_stage_get_table(S_SWITCH_IN_CHAIN) + 1);
> +        ovn_lflow_add(lflows, od, S_SWITCH_IN_CHAIN,
> +                        egress_inner_priority, lcc_match, ds_cstr(&actions));
> +        ds_destroy(&actions);
> +        free(lcc_match);
> +
> +        for (int j = 1; j < lpc->n_port_pair_groups; j++) {
> +
> +            /* Completed first catch-all rule for this port-chain. */
> +
> +            /* Apply inner rules flows. */
> +            if (chain_path == 1) { /* Path from source port. */
> +                lcc_match = xasprintf(
> +                    "eth.dst == "ETH_ADDR_FMT" && inport == %s",
> +                    ETH_ADDR_ARGS(traffic_logical_port_ea),
> +                    input_port_array[lpc->n_port_pair_groups-j]->json_key);
> +                 lcc_action = xasprintf("outport = %s; output;",
> +                  output_port_array[lpc->n_port_pair_groups-(j+1)]->json_key);
> +
> +            } else { /* Path to source port. */
> +                lcc_match = xasprintf(
> +                    "eth.src == "ETH_ADDR_FMT" && inport == %s",
> +                        ETH_ADDR_ARGS(traffic_logical_port_ea),
> +                        output_port_array[j-1]->json_key);
> +                 lcc_action = xasprintf("outport = %s; output;",
> +                        input_port_array[j]->json_key);
> +            }
> +            ovn_lflow_add(lflows, od, S_SWITCH_IN_CHAIN,
> +                          egress_inner_priority, lcc_match, lcc_action);
> +            free(lcc_match);
> +            free(lcc_action);
> +        }
> +    }
> +    free(input_port_array);
> +    free(output_port_array);
> +    free(port_pair_groups);
> +    }
> +}
>  static void
>  build_qos(struct ovn_datapath *od, struct hmap *lflows) {
>      ovn_lflow_add(lflows, od, S_SWITCH_IN_QOS_MARK, 0, "1", "next;");
> @@ -3142,7 +3484,7 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
>      struct ds actions = DS_EMPTY_INITIALIZER;
>
>      /* Build pre-ACL and ACL tables for both ingress and egress.
> -     * Ingress tables 3 through 9.  Egress tables 0 through 6. */
> +     * Ingress tables 3 through 10.  Egress tables 0 through 6. */
>      struct ovn_datapath *od;
>      HMAP_FOR_EACH (od, key_node, datapaths) {
>          if (!od->nbs) {
> @@ -3153,6 +3495,7 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
>          build_pre_lb(od, lflows);
>          build_pre_stateful(od, lflows);
>          build_acls(od, lflows);
> +        build_chain(od, lflows, ports);
>          build_qos(od, lflows);
>          build_lb(od, lflows);
>          build_stateful(od, lflows);
> @@ -3225,9 +3568,9 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
>          ovn_lflow_add(lflows, od, S_SWITCH_IN_PORT_SEC_IP, 0, "1", "next;");
>      }
>
> -    /* Ingress table 10: ARP/ND responder, skip requests coming from localnet
> -     * and vtep ports. (priority 100); see ovn-northd.8.xml for the
> -     * rationale. */
> +    /* Ingress table 11: ARP/ND responder, skip requests coming from localnet
> +     * ports. (priority 100). */
> +
>      HMAP_FOR_EACH (op, key_node, ports) {
>          if (!op->nbsp) {
>              continue;
> @@ -3242,7 +3585,7 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
>          }
>      }
>
> -    /* Ingress table 10: ARP/ND responder, reply for known IPs.
> +    /* Ingress table 11: ARP/ND responder, reply for known IPs.
>       * (priority 50). */
>      HMAP_FOR_EACH (op, key_node, ports) {
>          if (!op->nbsp) {
> @@ -3335,7 +3678,7 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
>          }
>      }
>
> -    /* Ingress table 10: ARP/ND responder, by default goto next.
> +    /* Ingress table 11: ARP/ND responder, by default goto next.
>       * (priority 0)*/
>      HMAP_FOR_EACH (od, key_node, datapaths) {
>          if (!od->nbs) {
> @@ -3345,7 +3688,7 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
>          ovn_lflow_add(lflows, od, S_SWITCH_IN_ARP_ND_RSP, 0, "1", "next;");
>      }
>
> -    /* Logical switch ingress table 11 and 12: DHCP options and response
> +    /* Logical switch ingress table 12 and 13: DHCP options and response
>           * priority 100 flows. */
>      HMAP_FOR_EACH (op, key_node, ports) {
>          if (!op->nbsp) {
> @@ -3449,7 +3792,7 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
>          }
>      }
>
> -    /* Ingress table 11 and 12: DHCP options and response, by default goto next.
> +    /* Ingress table 12 and 13: DHCP options and response, by default goto next.
>       * (priority 0). */
>
>      HMAP_FOR_EACH (od, key_node, datapaths) {
> @@ -3461,7 +3804,8 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
>          ovn_lflow_add(lflows, od, S_SWITCH_IN_DHCP_RESPONSE, 0, "1", "next;");
>      }
>
> -    /* Ingress table 13: Destination lookup, broadcast and multicast handling
> +
> +    /* Ingress table 14: Destination lookup, broadcast and multicast handling
>       * (priority 100). */
>      HMAP_FOR_EACH (op, key_node, ports) {
>          if (!op->nbsp) {
> @@ -3481,7 +3825,7 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
>                        "outport = \""MC_FLOOD"\"; output;");
>      }
>
> -    /* Ingress table 13: Destination lookup, unicast handling (priority 50), */
> +    /* Ingress table 14: Destination lookup, unicast handling (priority 50), */
>      HMAP_FOR_EACH (op, key_node, ports) {
>          if (!op->nbsp) {
>              continue;
> @@ -3581,7 +3925,7 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
>          }
>      }
>
> -    /* Ingress table 13: Destination lookup for unknown MACs (priority 0). */
> +    /* Ingress table 14: Destination lookup for unknown MACs (priority 0). */
>      HMAP_FOR_EACH (od, key_node, datapaths) {
>          if (!od->nbs) {
>              continue;
>
>
>
> <snip>
>
>
>
Mickey Spiegel June 8, 2017, 5:14 p.m. UTC | #2
A couple more issues that have not come up recently, or at all so far:

1. A port can have multiple mac addresses. Right now you are only using the
first mac address on a port (traffic_port->nbsp->addresses[0]). Rules need
to be installed for all of the mac addresses on the port; see the sketch
after this list.

2. The VNF ports are associated with the logical switch's datapath. Any
rules in non-SFC specific tables (e.g. ingress stages 1-9, egress stages
1-9) will be applied over and over again at each SFC hop. This affects
performance. Besides the overhead of additional redundant lookups, if there
are stateful ACL rules then conntrack recirc will occur. Possibly even
worse, if any of the VNFs change any of the fields in the ACL match
condition, then the packet could fail the ACL and be dropped. One option to
avoid all of this would be to insert a pipeline stage at the beginning of
the ingress pipeline and egress pipelines (we already have one for SFC in
the egress pipeline, which can be reused for this purpose as well),
skipping most if not all other pipeline stages on VNF ports. For
intermediate SFC hops in the ingress pipeline, I guess the rules could be
put in this first table rather than the current table 10. For the egress
pipeline, I guess just output directly?
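
For issue 1, a minimal sketch of the shape I have in mind (untested, and
assuming build_chain()'s existing locals; the loop and variable names are
mine, and extract_lsp_addresses() from ovn/lib/ovn-util.h may be a better
fit than raw ovs_scan()):

    /* Install the chain classifier flows once per MAC address on the
     * traffic port, instead of only for addresses[0]. */
    for (size_t a = 0; a < traffic_port->nbsp->n_addresses; a++) {
        struct eth_addr ea;
        ovs_be32 ip;
        /* Accept both "MAC IP" and bare "MAC" entries; skip keywords
         * such as "unknown" that do not parse as a MAC. */
        if (!ovs_scan(traffic_port->nbsp->addresses[a],
                      ETH_ADDR_SCAN_FMT" "IP_SCAN_FMT,
                      ETH_ADDR_SCAN_ARGS(ea), IP_SCAN_ARGS(&ip))
            && !ovs_scan(traffic_port->nbsp->addresses[a],
                         ETH_ADDR_SCAN_FMT, ETH_ADDR_SCAN_ARGS(ea))) {
            continue;
        }
        /* ... emit the same lcc_match/lcc_action flows as today, using
         * ea in place of traffic_logical_port_ea ... */
    }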

Mickey


On Wed, May 10, 2017 at 3:49 PM, Mickey Spiegel <mickeys.dev@gmail.com>
wrote:

> Three issues before diving in:
>
>
> 1. Placement of S_SWITCH_IN_CHAIN
>
> For some reason I thought S_SWITCH_IN_CHAIN was after all the stateful
> processing, but now I see that it is table 7. That breaks ACLs and other
> stateful processing, since REGBIT_CONNTRACK_COMMIT is set in
> S_SWITCH_IN_ACL and matched in S_SWITCH_IN_STATEFUL.
>
> S_SWITCH_IN_CHAIN should instead be table 10. The comments below are
> written assuming this change.
>
>
> 2. Ingress pipeline needs to be expanded to more than 16 tables
>
> DNS went in since the v3 patch and used up the last of the 16 ingress
> tables. If you rebase, you will see that there is no space in the ingress
> pipeline for the addition of S_SWITCH_IN_CHAIN. Someone (not me) needs to
> expand the ingress pipeline to more than 16 stages before you can proceed.
>
>
> 3. While writing this response, I paid a little more attention to the
> "exit-lport" direction and noticed a couple of significant issues.
>
> a. If a packet goes from VM A on port 1 to VM B on port 4, there is a
> logical port chain classifier on port 1 in the "entry-lport" direction, and
> there is a logical port chain classifier on port 4 in the "exit-lport"
> direction, you will only go down one of the service chains. Since the
> priorities are equal, I cannot even tell you which one of the service
> chains. Logically I would think that the packet should go down both service
> chains, first the port 1 "entry-lport" service chain and then the port 4
> "exit-lport" service chain.
>
> b. This is done in the ingress pipeline, not the egress pipeline, and is
> based on matching eth.dst. This assumes that the forwarding decision will
> be based on eth.dst, since you are immediately going down the service
> chain, skipping the other ingress pipeline stages, and at the end you go
> directly to the egress pipeline with outport based on eth.dst. That is
> quite restrictive for a generic forwarding architecture like OVN. I would
> think that the right thing to do would be to move the classifier to the
> egress pipeline stage, but then I do not know how to avoid loops. When a
> packet comes to the egress pipeline stage where the VM resides, there is no
> way to tell whether the packet has already gone down the service chain or
> not. I guess you could put a S_SWITCH_IN_EGRESS_CHAIN ingress pipeline
> stage right after L2_LKUP instead, and match on outport in addition to
> eth.dst, but it feels a bit unclean.
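>
> To make (b) concrete, the flow in such a hypothetical
> S_SWITCH_IN_EGRESS_CHAIN stage (sitting right after L2_LKUP, so outport
> has already been chosen) might be built with a match along these lines
> (a sketch only; the stage name and placement are not in the patch):
>
>     /* Classify on the chosen outport as well as eth.dst. */
>     lcc_match = xasprintf(
>         "eth.dst == "ETH_ADDR_FMT" && outport == %s",
>         ETH_ADDR_ARGS(traffic_logical_port_ea), traffic_port->json_key);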
>
> On Tue, May 9, 2017 at 4:33 PM, John McDowall <jmcdowall@paloaltonetworks.
> com> wrote:
>
>> Mickey,
>>
>>
>>
>> Thanks for the review. I need some help understanding a couple of things:
>>
>>
>>
>> 1)       The proposed change: I could see the previous logic, where we
>> inserted the flow back in the ingress pipeline just after the IN_CHAIN
>> stage. The changes you suggest seem to imply that the action is still
>> inserted after the IN_CHAIN stage but in the egress (OUT) pipeline. I
>> am missing something here – can you give me some more info?
>>
> Assume you have port 1 to a VM on HV1, port 2 as the input port to a VNF
> on HV2, and port 3 as the output port from that same VNF on HV2. The
> service chain is just that one VNF, with direction "entry-lport".
>
> The packet progresses as follows:
>
> HV1, ingress pipeline, inport 1
> Tables 1 to 9
> Table 10 (S_SWITCH_IN_CHAIN) hits priority 100 flow setting outport = 2
> and then going to output immediately
>
> Table 32 sends the packet through a geneve tunnel to HV2
>
> HV2, egress pipeline, outport 2
> Tables 48 to 65
>
> After packet gets delivered to the VNF, it comes back
>
> HV2, ingress pipeline, inport 3
> Tables 1 to 9
>
> Table 10 (S_SWITCH_IN_CHAIN)
>
> With the code in v3, you are doing a clone with inport = outport (which is
> not yet set!),
> then progressing to ingress Table 11, still on HV2
>
> Eventually you hit S_SWITCH_IN_L2_LKUP and go directly to the destination
> from HV2. If the destination is in the outside world, this just broke
> upstream L2 learning.
>
> My suggestion is instead Table 10 hits a priority 150 flow setting outport
> = 1 and then going to output immediately.
>
> Table 32 sends the packet through a geneve tunnel to HV1 based on outport
> == 1, but it ends up in the egress pipeline
>
> HV1, egress pipeline, outport 1
>
> New Table 0 (S_SWITCH_OUT_SFC_LOOPBACK) hits the entry doing a clone with
> inport = outport and jumping to the ingress pipeline Table 11
>
> HV1, ingress pipeline, inport 1
> Tables 11 to 15 (continuation from where things left off at the beginning,
> when the packet was redirected down the service chain)
> ...
>
> Since this code does not use NSH, all metadata accumulated in Tables 1 to
> 9 is lost while going around the service chain, before proceeding through
> Tables 11 to 15. At the moment this works, but this becomes a restriction
> on future features, if they are to work with SFC. The alternative is to
> back through all the Tables 1 through 15 instead, with
> REGBIT_CHAIN_LOOPBACK as in the v2 patch to avoid going hitting any entries
> in Table 10 (S_SWITCH_IN_CHAIN). With this alternative, all packets go
> through Tables 1 to 9 on HV1 twice. We have to make a judgement call which
> alternative is least ugly, or move to NSH and stuff the metadata in the NSH
> header.
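>
> Putting my suggestion together in one place, the HV1 loopback flow would
> be generated roughly as below (an untested sketch; note it deliberately
> leaves flags.loopback at 0, per the loop-avoidance discussion):
>
>     /* Egress table 0 (S_SWITCH_OUT_SFC_LOOPBACK): hairpin traffic
>      * returning from the chain back into the ingress pipeline of the
>      * original inport, resuming just after S_SWITCH_IN_CHAIN. */
>     lcc_match = xasprintf("eth.src == "ETH_ADDR_FMT" && outport == %s",
>                           ETH_ADDR_ARGS(traffic_logical_port_ea),
>                           traffic_port->json_key);
>     struct ds actions = DS_EMPTY_INITIALIZER;
>     ds_put_format(&actions, "clone { ct_clear; "
>                   "inport = outport; outport = \"\"; flags = 0; ");
>     for (int ii = 0; ii < MFF_N_LOG_REGS; ii++) {
>         ds_put_format(&actions, "reg%d = 0; ", ii);
>     }
>     ds_put_format(&actions, "next(pipeline=ingress, table=%d); };",
>                   ovn_stage_get_table(S_SWITCH_IN_CHAIN) + 1);
>     ovn_lflow_add(lflows, od, S_SWITCH_OUT_SFC_LOOPBACK, 100,
>                   lcc_match, ds_cstr(&actions));
>     ds_destroy(&actions);
>     free(lcc_match);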
>
>>
>>
>> +    for (int ii = 0; ii < MFF_N_LOG_REGS; ii++) {
>> +            ds_put_format(&actions, "reg%d = 0; ", ii);
>> +    }
>> +    ds_put_format(&actions, "next(pipeline=ingress, table=%d); };",
>> +                            ovn_stage_get_table(S_SWITCH_IN_CHAIN) + 1);
>> +    ovn_lflow_add(lflows, od, S_SWITCH_IN_CHAIN, ingress_inner_priority,
>> +                        lcc_match, ds_cstr(&actions));
>>
>>
>>
>> Replace the line above by:
>>
>>
>>
>>     ovn_lflow_add(lflows, od, S_SWITCH_OUT_SFC_LOOPBACK, 100,
>>                         lcc_match, ds_cstr(&actions));
>>
>>
>>
>>
>>
>>
>>
>> 2)       I can try and put some checks in for loop avoidance. Can you
>> think of scenarios that would cause this? A badly configured port-pair
>> could perhaps cause it (if the eth egress of the port-pair was configured
>> as the ingress eth). Any other scenarios that come to mind?
>>
>
> I cannot think of a good reason why a packet would end up on the egress
> pipeline on HV1 with outport == 1.
> After thinking about it more, I think it is OK if flags.loopback is left
> as 0. If this case is ever triggered and the second time around through the
> ingress pipeline still sets outport = 1, then the Table 34 loopback check
> will detect that outport == inport and drop the packet.
>
> Mickey
>
>
>>
>> Regards
>>
>>
>>
>> John
>>
>> From: Mickey Spiegel <mickeys.dev@gmail.com>
>> Date: Monday, April 24, 2017 at 6:39 PM
>> To: John McDowall <jmcdowall@paloaltonetworks.com>
>> Cc: "ovs-dev@openvswitch.org" <ovs-dev@openvswitch.org>
>> Subject: Re: [ovs-dev] ovn: SFC Patch V3
>>
>>
>>
>>
>>
>> On Mon, Apr 24, 2017 at 12:56 PM, <jmcdowall@paloaltonetworks.com> wrote:
>>
>> From: John McDowall <jmcdowall@paloaltonetworks.com>
>>
>>
>> Fixed changes from Mickey's last review.
>>
>> Changes
>>
>> 1) Fixed re-circulation rules
>>
>>
>>
>> Still a few modifications required. See comments inline. I just typed
>> some stuff out, have not run, built, or tested anything.
>>
>>
>>
>> 2) Fixed match statement - match is only applied to beginning of chain in
>>    each direction.
>> 3) Fixed array length of chain of VNFs. I have tested this up to three
>>    VNFs in a chain and it looks like it works in both directions.
>>
>> Areas to review
>>
>> 1) The logic now supports hair-pinning the flow back to the original
>>    source to ensure that the MAC learning problem is addressed. I tested
>>    this using ovn-trace - any better testing that I should do?
>>
>> Current todo list
>>
>> 1) I have standalone tests; need to add tests to the ovs/ovn framework.
>> 2) Load-balancing support for port-pair-groups.
>> 3) Publish more detailed examples.
>> 4) Submit suggestions to change and shorten the CLI names.
>>
>> Simple example using ovn-trace
>>
>> #!/bin/sh
>> #
>> clear
>> ovn-nbctl ls-add swt1
>>
>> ovn-nbctl lsp-add swt1 swt1-appc
>> ovn-nbctl lsp-add swt1 swt1-apps
>> ovn-nbctl lsp-add swt1 swt1-vnfp1
>> ovn-nbctl lsp-add swt1 swt1-vnfp2
>>
>> ovn-nbctl lsp-set-addresses swt1-appc "00:00:00:00:00:01 192.168.33.1"
>> ovn-nbctl lsp-set-addresses swt1-apps "00:00:00:00:00:02 192.168.33.2"
>> ovn-nbctl lsp-set-addresses swt1-vnfp1 00:00:00:00:00:03
>> ovn-nbctl lsp-set-addresses swt1-vnfp2 00:00:00:00:00:04
>> #
>> # Configure Service chain
>> #
>> ovn-nbctl lsp-pair-add swt1 swt1-vnfp1 swt1-vnfp2 pp1
>> ovn-nbctl lsp-chain-add swt1 pc1
>> ovn-nbctl lsp-pair-group-add pc1 ppg1
>> ovn-nbctl lsp-pair-group-add-port-pair ppg1 pp1
>> ovn-nbctl lsp-chain-classifier-add swt1 pc1 swt1-appc "entry-lport" "bi-directional" pcc1
>> #
>> ovn-sbctl dump-flows
>> #
>> # Run trace command
>> printf "\n---------Flow 1 -------------\n\n"
>> ovn-trace --detailed swt1 'inport == "swt1-appc" && eth.src == 00:00:00:00:00:01 && eth.dst == 00:00:00:00:00:02'
>> printf "\n---------Flow 2 -------------\n\n"
>> ovn-trace --detailed swt1 'inport == "swt1-vnfp1" && eth.src == 00:00:00:00:00:01 && eth.dst == 00:00:00:00:00:02'
>> printf "\n---------Flow 3 -------------\n\n"
>> ovn-trace --detailed swt1 'inport == "swt1-apps" && eth.dst == 00:00:00:00:00:01 && eth.src == 00:00:00:00:00:02'
>> printf "\n---------Flow 4 -------------\n\n"
>> ovn-trace --detailed swt1 'inport == "swt1-vnfp2" && eth.dst == 00:00:00:00:00:01 && eth.src == 00:00:00:00:00:02'
>> #
>> # Cleanup
>> #
>> ovn-nbctl lsp-chain-classifier-del pcc1
>> ovn-nbctl lsp-pair-group-del ppg1
>> ovn-nbctl lsp-chain-del pc1
>> ovn-nbctl lsp-pair-del pp1
>> ovn-nbctl ls-del swt1
>>
>> Reported at: https://mail.openvswitch.org/pipermail/ovs-discuss/2016-March/040381.html
>> Reported at: https://mail.openvswitch.org/pipermail/ovs-discuss/2016-May/041359.html
>>
>> Signed-off-by: John McDowall <jmcdowall@paloaltonetworks.com>
>> Signed-off-by: Flavio Fernandes <flavio@flaviof.com>
>> Co-authored-by: Flavio Fernandes <flavio@flaviof.com>
>> ---
>>  ovn/northd/ovn-northd.8.xml   |   69 ++-
>>  ovn/northd/ovn-northd.c       |  382 ++++++++++++-
>>  ovn/ovn-architecture.7.xml    |   91 ++++
>>  ovn/ovn-nb.ovsschema          |   87 ++-
>>  ovn/ovn-nb.xml                |  188 ++++++-
>>  ovn/utilities/ovn-nbctl.8.xml |  231 ++++++++
>>  ovn/utilities/ovn-nbctl.c     | 1208 +++++++++++++++++++++++++++++++++++++++
>>  7 files changed, 2227 insertions(+), 29 deletions(-)
>>
>>
>>
>> <snip>
>>
>>
>>
>>
>> diff --git ovn/northd/ovn-northd.c ovn/northd/ovn-northd.c
>> index d0a5ba2..090f768 100644
>> --- ovn/northd/ovn-northd.c
>> +++ ovn/northd/ovn-northd.c
>> @@ -106,13 +106,14 @@ enum ovn_stage {
>>      PIPELINE_STAGE(SWITCH, IN,  PRE_LB,         4, "ls_in_pre_lb")        \
>>      PIPELINE_STAGE(SWITCH, IN,  PRE_STATEFUL,   5, "ls_in_pre_stateful")  \
>>      PIPELINE_STAGE(SWITCH, IN,  ACL,            6, "ls_in_acl")           \
>> -    PIPELINE_STAGE(SWITCH, IN,  QOS_MARK,       7, "ls_in_qos_mark")      \
>> -    PIPELINE_STAGE(SWITCH, IN,  LB,             8, "ls_in_lb")            \
>> -    PIPELINE_STAGE(SWITCH, IN,  STATEFUL,       9, "ls_in_stateful")      \
>> -    PIPELINE_STAGE(SWITCH, IN,  ARP_ND_RSP,    10, "ls_in_arp_rsp")       \
>> -    PIPELINE_STAGE(SWITCH, IN,  DHCP_OPTIONS,  11, "ls_in_dhcp_options")  \
>> -    PIPELINE_STAGE(SWITCH, IN,  DHCP_RESPONSE, 12, "ls_in_dhcp_response") \
>> -    PIPELINE_STAGE(SWITCH, IN,  L2_LKUP,       13, "ls_in_l2_lkup")       \
>> +    PIPELINE_STAGE(SWITCH, IN,  CHAIN,          7, "ls_in_chain")         \
>> +    PIPELINE_STAGE(SWITCH, IN,  QOS_MARK,       8, "ls_in_qos_mark")      \
>> +    PIPELINE_STAGE(SWITCH, IN,  LB,             9, "ls_in_lb")            \
>> +    PIPELINE_STAGE(SWITCH, IN,  STATEFUL,      10, "ls_in_stateful")      \
>> +    PIPELINE_STAGE(SWITCH, IN,  ARP_ND_RSP,    11, "ls_in_arp_rsp")       \
>> +    PIPELINE_STAGE(SWITCH, IN,  DHCP_OPTIONS,  12, "ls_in_dhcp_options")  \
>> +    PIPELINE_STAGE(SWITCH, IN,  DHCP_RESPONSE, 13, "ls_in_dhcp_response") \
>> +    PIPELINE_STAGE(SWITCH, IN,  L2_LKUP,       14, "ls_in_l2_lkup")       \
>>                                                                        \
>>      /* Logical switch egress stages. */                               \
>>
>>
>>
>> You need a stage in the egress pipeline to do SFC egress loopback:
>>
>>
>>
>>     PIPELINE_STAGE(SWITCH, OUT, SFC_LOOPBACK, 0, "ls_out_sfc_loopback")  \
>>
>>     PIPELINE_STAGE(SWITCH, OUT, PRE_LB,       1, "ls_out_pre_lb")     \
>>
>>     ...
>>
>>
>>
>>      PIPELINE_STAGE(SWITCH, OUT, PRE_LB,       0, "ls_out_pre_lb")     \
>> @@ -160,7 +161,6 @@ enum ovn_stage {
>>  #define REGBIT_CONNTRACK_COMMIT "reg0[1]"
>>  #define REGBIT_CONNTRACK_NAT    "reg0[2]"
>>  #define REGBIT_DHCP_OPTS_RESULT "reg0[3]"
>> -
>>
>>
>>
>> Whitespace change should be removed.
>>
>>
>>
>>  /* Register definitions for switches and routers. */
>>  #define REGBIT_NAT_REDIRECT     "reg9[0]"
>>  /* Indicate that this packet has been recirculated using egress
>> @@ -3014,6 +3014,348 @@ build_acls(struct ovn_datapath *od, struct hmap *lflows)
>>      }
>>  }
>>
>> +static int
>> +cmp_port_pair_groups(const void *ppg1_, const void *ppg2_)
>> +{
>> +    const struct nbrec_logical_port_pair_group *const *ppg1p = ppg1_;
>> +    const struct nbrec_logical_port_pair_group *const *ppg2p = ppg2_;
>> +    const struct nbrec_logical_port_pair_group *ppg1 = *ppg1p;
>> +    const struct nbrec_logical_port_pair_group *ppg2 = *ppg2p;
>> +
>> +    /* Compare sortkeys without casting, to avoid integer overflow. */
>> +    return ppg1->sortkey < ppg2->sortkey ? -1
>> +           : ppg1->sortkey > ppg2->sortkey;
>> +}
>> +
>> +static void
>> +build_chain(struct ovn_datapath *od, struct hmap *lflows, struct hmap *ports)
>> +{
>> +   /*
>> +    * TODO Items
>> +    *     * IPV6 support
>> +    *     * Load balancing support
>> +    *     * Bidirectional parameter support
>> +    *     * Support modes of VNF (BitW, L2, L3)
>> +    *     * Remove port-security on VNF Ports (if set by CMS)
>> +    *     * Add some state to allow match that does not require 'inport'
>> +    *     * Support visiting the same VNF more than once
>> +    *     * Unit tests!
>> +    */
>> +    const uint16_t ingress_inner_priority = 150;
>> +    const uint16_t ingress_outer_priority = 100;
>> +    const uint16_t egress_inner_priority = 150;
>> +    const uint16_t egress_outer_priority = 100;
>> +
>> +    struct ovn_port **input_port_array = NULL;
>> +    struct ovn_port **output_port_array = NULL;
>> +
>> +    struct ovn_port *dst_port = NULL;
>> +    struct ovn_port *src_port = NULL;
>> +
>> +    struct nbrec_logical_port_chain *lpc;
>> +    struct nbrec_logical_port_pair_group *lppg;
>> +    struct nbrec_logical_port_pair *lpp;
>> +    struct nbrec_logical_port_chain_classifier *lcc;
>> +
>> +    char *lcc_match = NULL;
>> +    char *lcc_action = NULL;
>> +    struct ovn_port *traffic_port;
>> +    unsigned int chain_direction = 2;
>> +    unsigned int chain_path = 2;
>> +    char * chain_match = NULL;
>> +
>> +    /* Ingress table ls_in_chain: default to passing through to the
>> +     * next table (priority 0).
>> +     */
>> +    ovn_lflow_add(lflows, od, S_SWITCH_IN_CHAIN, 0, "1", "next;");
>>
>>
>>
>> At some point also need:
>>
>>
>>
>>     ovn_lflow_add(lflows, od, S_SWITCH_OUT_SFC_LOOPBACK, 0, "1", "next;");
>>
>>
>>
>> +
>> +    /* No port chains defined; therefore, no further work to do here. */
>> +    if (!od->nbs->port_chain_classifiers) {
>> +        return;
>> +    }
>> +    /* Iterate through all the port-chains defined for this datapath. */
>> +    for (size_t i = 0; i < od->nbs->n_port_chain_classifiers; i++) {
>> +        lcc = od->nbs->port_chain_classifiers[i];
>> +        /* Get the parameters from the classifier */
>> +        lpc = lcc->chain;
>> +        traffic_port = ovn_port_find(ports, lcc->port->name);
>> +        if (traffic_port == NULL) {
>> +            static struct vlog_rate_limit rl =
>> +                        VLOG_RATE_LIMIT_INIT(1, 1);
>> +            VLOG_WARN_RL(&rl,
>> +                        "Traffic port %s does not exist\n",
>> +                              lcc->port->name);
>> +            break;
>> +        }
>> +        /*
>> +        * Check that the chain has one or more port-pair groups.
>> +        */
>> +        if (lpc->n_port_pair_groups == 0) {
>> +            static struct vlog_rate_limit rl =
>> +                        VLOG_RATE_LIMIT_INIT(1, 1);
>> +            VLOG_WARN_RL(&rl,
>> +                "SFC Chain: %s used in classifier: %s does not not "
>> +                 "have any port pair groups\n",
>> +                              lcc->chain->name, lcc->name);
>> +            break;
>> +
>> +        }
>> +        /* TODO Check port exists. */
>> +        struct eth_addr traffic_logical_port_ea;
>> +        ovs_be32 traffic_logical_port_ip;
>> +        ovs_scan(traffic_port->nbsp->addresses[0],
>> +                    ETH_ADDR_SCAN_FMT" "IP_SCAN_FMT,
>> +                    ETH_ADDR_SCAN_ARGS(traffic_logical_port_ea),
>> +                    IP_SCAN_ARGS(&traffic_logical_port_ip));
>> +        /* Set the port to use as source or destination. */
>> +        if (strcmp(lcc->direction,"entry-lport")==0) {
>> +            chain_path = 0;
>> +        } else {
>> +            chain_path = 1;
>> +        }
>> +        /* Set the direction of the port-chain. */
>> +        if (strcmp(lcc->path,"uni-directional") == 0) {
>> +            chain_direction = 0;
>> +        } else {
>> +            chain_direction = 1;
>> +        }
>> +        /* Set the match parameters. */
>> +        chain_match = lcc->match;
>> +        /*
>> +         * Allocate space for port-pairs + 1. The extra 1 represents the
>> +         * final hop to reach the desired destination.
>> +         * TODO: We are going to allocate enough space to hold all the
>> +         * hops: 1 x portGroups + 1. This needs to be enhanced to:
>> +         * SUM(port pairs of each port group) + 1.
>> +         */
>> +        input_port_array = xmalloc(sizeof *src_port *
>> +                                   lpc->n_port_pair_groups);
>> +        output_port_array = xmalloc(sizeof *dst_port *
>> +                                  (lpc->n_port_pair_groups));
>> +        /* Copy port groups from chain and sort them according to sortkey. */
>> +        struct nbrec_logical_port_pair_group **port_pair_groups =
>> +                                 xmemdup(lpc->port_pair_groups,
>> +                          sizeof *port_pair_groups * lpc->n_port_pair_groups);
>> +        if (lpc->n_port_pair_groups > 1) {
>> +            qsort(port_pair_groups, lpc->n_port_pair_groups,
>> +              sizeof *port_pair_groups, cmp_port_pair_groups);
>> +        }
>> +        /* For each port-pair-group in a port chain pull out the port-pairs. */
>> +        for (size_t j = 0; j < lpc->n_port_pair_groups; j++) {
>> +            lppg = port_pair_groups[j];
>> +            for (size_t k = 0; k < lppg->n_port_pairs; k++) {
>> +                 /* TODO: Need to add load balancing logic when LB becomes
>> +                 * available. Until LB is available just take the first
>> +                 * PP in the PPG. */
>> +                if (k > 0) {
>> +                    static struct vlog_rate_limit rl =
>> +                        VLOG_RATE_LIMIT_INIT(1, 1);
>> +                    VLOG_WARN_RL(&rl,
>> +                        "Currently lacking support for more than \
>> +                            one port-pair %"PRIuSIZE"\n",
>> +                              lppg->n_port_pairs);
>> +                    break;
>> +                }
>> +                lpp = lppg->port_pairs[k];
>> +                input_port_array[j] = lpp->inport ? ovn_port_find(ports,
>> +                                       lpp->inport->name) : NULL;
>> +                output_port_array[j] = lpp->outport ? ovn_port_find(ports,
>> +                                        lpp->outport->name) : NULL;
>> +            }
>> +        /* At this point we need to determine the final hop port to add to
>> +         * the chain. This defines the complete path for packets through
>> +         * the chain. */
>> +        }
>>
>>
>>
>> I find the change in indentation below confusing.
>>
>>
>>
>> +    /*
>> +    * Insert the lowest priority rule: dest is the src-logical-port. These
>> +    * are the entry points into the chain in either direction. The match
>> +    * statement is used to filter the entry port to provide higher
>> +    * granularity of filtering.
>> +    */
>> +    if (chain_path == 1) { /* Path starting from entry port */
>> +        if (strcmp(chain_match,"")!=0) {
>> +           lcc_match =  xasprintf(
>> +            "eth.src == "ETH_ADDR_FMT" && %s",
>> +             ETH_ADDR_ARGS(traffic_logical_port_ea), chain_match);
>> +        } else {
>> +           lcc_match =  xasprintf(
>> +            "eth.src == "ETH_ADDR_FMT,
>> +             ETH_ADDR_ARGS(traffic_logical_port_ea));
>> +        }
>> +           lcc_action = xasprintf("outport = %s; output;",
>> +                                input_port_array[0]->json_key);
>> +    } else {  /* Path going to the entry port */
>> +        if (strcmp(chain_match,"")!=0) {
>> +           lcc_match =  xasprintf(
>> +            "eth.dst == "ETH_ADDR_FMT" && %s",
>> +             ETH_ADDR_ARGS(traffic_logical_port_ea), chain_match);
>> +        } else {
>> +           lcc_match =  xasprintf(
>> +            "eth.dst == "ETH_ADDR_FMT,
>> +             ETH_ADDR_ARGS(traffic_logical_port_ea));
>> +        }
>> +           lcc_action = xasprintf("outport = %s; output;",
>> +                    output_port_array[lpc->n_port_pair_groups-1]->json_key);
>> +    }
>> +    ovn_lflow_add(lflows, od, S_SWITCH_IN_CHAIN, ingress_outer_priority,
>> +                       lcc_match, lcc_action);
>> +    free(lcc_match);
>> +    free(lcc_action);
>> +    /*
>> +    * For the last VNF, send the flow back to the original chassis and
>> +    * exit from there.
>> +    */
>> +    if (chain_path == 1) { /* Path starting from entry port */
>> +           lcc_match = xasprintf(
>> +                    "eth.src == "ETH_ADDR_FMT" && inport == %s",
>> +                     ETH_ADDR_ARGS(traffic_logical_port_ea),
>> +                     output_port_array[lpc->n_port_pair_groups-1]->json_key);
>> +    } else { /* Path starting from destination port. */
>> +            lcc_match = xasprintf(
>> +                    "eth.dst == "ETH_ADDR_FMT" && inport == %s",
>> +                     ETH_ADDR_ARGS(traffic_logical_port_ea),
>> +                     input_port_array[0]->json_key);
>> +    }
>>
>>
>>
>>     lcc_action = xasprintf("outport = %s; output;", traffic_port->json_key);
>>
>>     ovn_lflow_add(lflows, od, S_SWITCH_IN_CHAIN, ingress_inner_priority,
>>                         lcc_match, lcc_action);
>>     free(lcc_match);
>>     free(lcc_action);
>>
>>
>>
>> Temporarily ignoring cases where there are multiple
>> port_chain_classifiers with the same traffic_port. You should do something
>> to avoid duplicate flows in this case.
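>>
>> One way to do that (a sketch only, using lib/sset.h; the set itself and
>> the use of traffic_port->key as its key are my assumptions):
>>
>>     /* Once per datapath: remember which traffic ports already have an
>>      * SFC egress loopback flow, so that multiple classifiers on the
>>      * same port do not emit duplicate flows. */
>>     struct sset loopback_ports = SSET_INITIALIZER(&loopback_ports);
>>     ...
>>     if (!sset_contains(&loopback_ports, traffic_port->key)) {
>>         sset_add(&loopback_ports, traffic_port->key);
>>         /* ... install the SFC egress loopback flow below ... */
>>     }
>>     ...
>>     sset_destroy(&loopback_ports);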
>>
>>
>>
>>     /* SFC egress loopback on traffic_port. */
>>
>>
>>
>>     lcc_match = xasprintf(
>>
>>                     "eth.src == "ETH_ADDR_FMT" && outport == %s",
>>                     ETH_ADDR_ARGS(traffic_logical_port_ea),
>>
>>                     traffic_port->json_key);
>>
>>
>>
>> +    struct ds actions = DS_EMPTY_INITIALIZER;
>> +    ds_put_format(&actions, "clone { ct_clear; "
>> +                              "inport = outport; outport = \"\"; "
>> +                              "flags = 0; flags.loopback = 1; ");
>>
>>
>>
>> Probably should not set flags.loopback, see comments
>>
>> below.
>>
>>
>>
>> +    for (int ii = 0; ii < MFF_N_LOG_REGS; ii++) {
>> +            ds_put_format(&actions, "reg%d = 0; ", ii);
>> +    }
>> +    ds_put_format(&actions, "next(pipeline=ingress, table=%d); };",
>> +                            ovn_stage_get_table(S_SWITCH_IN_CHAIN) + 1);
>> +    ovn_lflow_add(lflows, od, S_SWITCH_IN_CHAIN, ingress_inner_priority,
>> +                        lcc_match, ds_cstr(&actions));
>>
>>
>>
>> Replace the line above by:
>>
>>
>>
>>     ovn_lflow_add(lflows, od, S_SWITCH_OUT_SFC_LOOPBACK, 100,
>>                         lcc_match, ds_cstr(&actions));
>>
>>
>>
>> +    free(lcc_match);
>> +    ds_destroy(&actions);
>> +
>>
>>
>>
>> Some thought needs to be put behind loop avoidance, for example when
>> eth.dst == eth.src == traffic_logical_port_ea. This may be an issue
>> since we do not have any definitive method to distinguish packets
>> returning from chains from packets that were switched to traffic_port
>> according to normal L2 processing.
>>
>> One option is to add higher priority (> 100) flows. This is messy,
>> since there is already a fair amount of code to write flows to send
>> traffic to this port in S_SWITCH_IN_L2_LKUP. This needs some thought
>> to determine the match conditions. The action is simple, just "next;".
>>
>> Perhaps more promising is not to set flags.loopback to 1. This would
>> cause any packets with eth.src == traffic_logical_port_ea that are
>> switched to traffic_port to go back through traffic_port's ingress
>> pipeline, then be dropped if outport == traffic_port at the end of
>> traffic_port's ingress pipeline. However, if anything after
>> S_SWITCH_IN_CHAIN sets flags.loopback, that might not be enough.
>>
>> There seems to be another issue: if anything before S_SWITCH_IN_CHAIN
>> sets flags.loopback, that will not be remembered when the packet comes
>> back, so the packet will not be processed properly.
>>
>> Right now it looks like flags.loopback is only set for a logical
>> switch in tables 11, 12, and 13, which are after S_SWITCH_IN_CHAIN.
>> So, at the moment, remembering flags.loopback is not an issue. These
>> flows generate ARP and DHCP responses, in all cases changing eth.src,
>> so the packets will not loop indefinitely even though these rules set
>> flags.loopback. There are already flows to avoid generating ARP
>> responses for requests on the port that owns the address, which is the
>> dangerous case for SFC loops.
>>
>> In summary, it looks like leaving flags.loopback at 0 should work for
>> now. The question is what would happen if more cases are added in the
>> future that set flags.loopback in a logical switch.
>>
>> Mickey
>>
>>
>>
>> +    /*
>> +    * Loop over all VNFs and create flow rules for each
>> +    * Only get here when there is more than one VNF in the chain.
>> +    */
>> +    for (size_t j = 1; j < lpc->n_port_pair_groups; j++) {
>> +
>> +        /* Apply inner rules flows */
>> +        if (chain_path == 1) { /* Path starting from entry port */
>> +            lcc_match = xasprintf(
>> +                    "eth.src == "ETH_ADDR_FMT" && inport == %s",
>> +                                 ETH_ADDR_ARGS(traffic_logical_port_ea),
>> +                                 output_port_array[j-1]->json_key);
>> +            lcc_action = xasprintf("outport = %s; output;",
>> +                                input_port_array[j]->json_key);
>> +        } else { /* Path going to entry port. */
>> +            lcc_match = xasprintf(
>> +                    "eth.dst == "ETH_ADDR_FMT" && inport == %s",
>> +                    ETH_ADDR_ARGS(traffic_logical_port_ea),
>> +                    input_port_array[lpc->n_port_pair_groups-j]->json_key);
>> +             lcc_action = xasprintf("outport = %s; output;",
>> +                  output_port_array[lpc->n_port_pair_groups-(j+1)]->json_key);
>> +        }
>> +        ovn_lflow_add(lflows, od, S_SWITCH_IN_CHAIN, ingress_inner_priority,
>> +                        lcc_match, lcc_action);
>> +        free(lcc_match);
>> +        free(lcc_action);
>> +    }
>> +    /* Bi-directional chain (response). */
>> +    if (chain_direction == 1) {
>> +        /*
>> +         * Insert the lowest priority rule: dest is the src-logical-port.
>> +         */
>> +        if (chain_path == 1) { /* Path from source port. */
>> +            if (strcmp(chain_match,"")!=0) {
>> +                 lcc_match =  xasprintf("eth.dst == "ETH_ADDR_FMT" && %s",
>> +                        ETH_ADDR_ARGS(traffic_logical_port_ea), chain_match);
>> +            } else { /* No match filter specified. */
>> +                 lcc_match =  xasprintf("eth.dst == "ETH_ADDR_FMT,
>> +                                    ETH_ADDR_ARGS(traffic_logical_port_ea));
>> +            }
>> +            lcc_action = xasprintf("outport = %s; output;",
>> +                    output_port_array[lpc->n_port_pair_groups-1]->json_key);
>> +        } else { /* Path from destination port. */
>> +           if (strcmp(chain_match,"")!=0) {
>> +                lcc_match =  xasprintf("eth.src == "ETH_ADDR_FMT" && %s",
>> +                        ETH_ADDR_ARGS(traffic_logical_port_ea), chain_match);
>> +            } else {
>> +                lcc_match =  xasprintf("eth.src == "ETH_ADDR_FMT,
>> +                        ETH_ADDR_ARGS(traffic_logical_port_ea));
>> +            }
>> +            lcc_action = xasprintf("outport = %s; output;",
>> +                    input_port_array[0]->json_key);
>> +        }
>> +        ovn_lflow_add(lflows, od, S_SWITCH_IN_CHAIN, egress_outer_priority,
>> +                       lcc_match, lcc_action);
>> +        free(lcc_match);
>> +        free(lcc_action);
>> +        /*
>> +        * End entry flow to classification chain entry point.
>> +        */
>> +        /* Apply last rule to exit from chain. */
>> +        if (chain_path == 1) { /* Path from source port. */
>> +             lcc_match = xasprintf(
>> +                    "eth.dst == "ETH_ADDR_FMT" && inport == %s",
>> +                                 ETH_ADDR_ARGS(traffic_logical_port_ea),
>> +                                 input_port_array[0]->json_key);
>> +
>> +        } else { /* Path to source port. */
>> +             lcc_match = xasprintf(
>> +                "eth.src == "ETH_ADDR_FMT" && inport == %s",
>> +                    ETH_ADDR_ARGS(traffic_logical_port_ea),
>> +                    output_port_array[lpc->n_port_pair_groups-1]->json_key);
>> +        }
>> +        struct ds actions = DS_EMPTY_INITIALIZER;
>> +        ds_put_format(&actions,
>> +                              "clone { ct_clear; "
>> +                              "inport = outport; outport = \"\"; "
>> +                              "flags = 0; flags.loopback = 1; ");
>> +        for (int ii = 0; ii < MFF_N_LOG_REGS; ii++) {
>> +                    ds_put_format(&actions, "reg%d = 0; ", ii);
>> +        }
>> +        ds_put_format(&actions, "next(pipeline=ingress, table=%d); };",
>> +                              ovn_stage_get_table(S_SWITCH_IN_CHAIN) + 1);
>> +        ovn_lflow_add(lflows, od, S_SWITCH_IN_CHAIN,
>> +                        egress_inner_priority, lcc_match, ds_cstr(&actions));
>> +        ds_destroy(&actions);
>> +        free(lcc_match);
>> +
>> +        for (int j = 1; j < lpc->n_port_pair_groups; j++) {
>> +
>> +            /* Completed first catch-all rule for this port-chain. */
>> +
>> +            /* Apply inner rules flows. */
>> +            if (chain_path == 1) { /* Path from source port. */
>> +                lcc_match = xasprintf(
>> +                    "eth.dst == "ETH_ADDR_FMT" && inport == %s",
>> +                    ETH_ADDR_ARGS(traffic_logical_port_ea),
>> +                    input_port_array[lpc->n_port_pair_groups-j]->json_key);
>> +                 lcc_action = xasprintf("outport = %s; output;",
>> +                  output_port_array[lpc->n_port_pair_groups-(j+1)]->json_key);
>> +
>> +            } else { /* Path to source port. */
>> +                lcc_match = xasprintf(
>> +                    "eth.src == "ETH_ADDR_FMT" && inport == %s",
>> +                        ETH_ADDR_ARGS(traffic_logical_port_ea),
>> +                        output_port_array[j-1]->json_key);
>> +                 lcc_action = xasprintf("outport = %s; output;",
>> +                        input_port_array[j]->json_key);
>> +            }
>> +            ovn_lflow_add(lflows, od, S_SWITCH_IN_CHAIN,
>> +                          egress_inner_priority, lcc_match, lcc_action);
>> +            free(lcc_match);
>> +            free(lcc_action);
>> +        }
>> +    }
>> +    free(input_port_array);
>> +    free(output_port_array);
>> +    free(port_pair_groups);
>> +    }
>> +}
>>  static void
>>  build_qos(struct ovn_datapath *od, struct hmap *lflows) {
>>      ovn_lflow_add(lflows, od, S_SWITCH_IN_QOS_MARK, 0, "1", "next;");
>> @@ -3142,7 +3484,7 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
>>      struct ds actions = DS_EMPTY_INITIALIZER;
>>
>>      /* Build pre-ACL and ACL tables for both ingress and egress.
>> -     * Ingress tables 3 through 9.  Egress tables 0 through 6. */
>> +     * Ingress tables 3 through 10.  Egress tables 0 through 6. */
>>      struct ovn_datapath *od;
>>      HMAP_FOR_EACH (od, key_node, datapaths) {
>>          if (!od->nbs) {
>> @@ -3153,6 +3495,7 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
>>          build_pre_lb(od, lflows);
>>          build_pre_stateful(od, lflows);
>>          build_acls(od, lflows);
>> +        build_chain(od, lflows, ports);
>>          build_qos(od, lflows);
>>          build_lb(od, lflows);
>>          build_stateful(od, lflows);
>> @@ -3225,9 +3568,9 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
>>          ovn_lflow_add(lflows, od, S_SWITCH_IN_PORT_SEC_IP, 0, "1", "next;");
>>      }
>>
>> -    /* Ingress table 10: ARP/ND responder, skip requests coming from localnet
>> -     * and vtep ports. (priority 100); see ovn-northd.8.xml for the
>> -     * rationale. */
>> +    /* Ingress table 11: ARP/ND responder, skip requests coming from localnet
>> +     * ports. (priority 100). */
>> +
>>      HMAP_FOR_EACH (op, key_node, ports) {
>>          if (!op->nbsp) {
>>              continue;
>> @@ -3242,7 +3585,7 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
>>          }
>>      }
>>
>> -    /* Ingress table 10: ARP/ND responder, reply for known IPs.
>> +    /* Ingress table 11: ARP/ND responder, reply for known IPs.
>>       * (priority 50). */
>>      HMAP_FOR_EACH (op, key_node, ports) {
>>          if (!op->nbsp) {
>> @@ -3335,7 +3678,7 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
>>          }
>>      }
>>
>> -    /* Ingress table 10: ARP/ND responder, by default goto next.
>> +    /* Ingress table 11: ARP/ND responder, by default goto next.
>>       * (priority 0)*/
>>      HMAP_FOR_EACH (od, key_node, datapaths) {
>>          if (!od->nbs) {
>> @@ -3345,7 +3688,7 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
>>          ovn_lflow_add(lflows, od, S_SWITCH_IN_ARP_ND_RSP, 0, "1", "next;");
>>      }
>>
>> -    /* Logical switch ingress table 11 and 12: DHCP options and response
>> +    /* Logical switch ingress table 12 and 13: DHCP options and response
>>           * priority 100 flows. */
>>      HMAP_FOR_EACH (op, key_node, ports) {
>>          if (!op->nbsp) {
>> @@ -3449,7 +3792,7 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
>>          }
>>      }
>>
>> -    /* Ingress table 11 and 12: DHCP options and response, by default goto next.
>> +    /* Ingress table 12 and 13: DHCP options and response, by default goto next.
>>       * (priority 0). */
>>
>>      HMAP_FOR_EACH (od, key_node, datapaths) {
>> @@ -3461,7 +3804,8 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
>>          ovn_lflow_add(lflows, od, S_SWITCH_IN_DHCP_RESPONSE, 0, "1", "next;");
>>      }
>>
>> -    /* Ingress table 13: Destination lookup, broadcast and multicast handling
>> +
>> +    /* Ingress table 14: Destination lookup, broadcast and multicast handling
>>       * (priority 100). */
>>      HMAP_FOR_EACH (op, key_node, ports) {
>>          if (!op->nbsp) {
>> @@ -3481,7 +3825,7 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
>>                        "outport = \""MC_FLOOD"\"; output;");
>>      }
>>
>> -    /* Ingress table 13: Destination lookup, unicast handling (priority 50), */
>> +    /* Ingress table 14: Destination lookup, unicast handling (priority 50), */
>>      HMAP_FOR_EACH (op, key_node, ports) {
>>          if (!op->nbsp) {
>>              continue;
>> @@ -3581,7 +3925,7 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
>>          }
>>      }
>>
>> -    /* Ingress table 13: Destination lookup for unknown MACs (priority 0). */
>> +    /* Ingress table 14: Destination lookup for unknown MACs (priority 0). */
>>      HMAP_FOR_EACH (od, key_node, datapaths) {
>>          if (!od->nbs) {
>>              continue;
>>
>>
>>
>> <snip>
>>
>>
>>
>
>
diff mbox

Patch

diff --git ovn/northd/ovn-northd.c ovn/northd/ovn-northd.c
index d0a5ba2..090f768 100644
--- ovn/northd/ovn-northd.c
+++ ovn/northd/ovn-northd.c
@@ -106,13 +106,14 @@  enum ovn_stage {
     PIPELINE_STAGE(SWITCH, IN,  PRE_LB,         4, "ls_in_pre_lb")        \
     PIPELINE_STAGE(SWITCH, IN,  PRE_STATEFUL,   5, "ls_in_pre_stateful")  \
     PIPELINE_STAGE(SWITCH, IN,  ACL,            6, "ls_in_acl")           \
-    PIPELINE_STAGE(SWITCH, IN,  QOS_MARK,       7, "ls_in_qos_mark")      \
-    PIPELINE_STAGE(SWITCH, IN,  LB,             8, "ls_in_lb")            \
-    PIPELINE_STAGE(SWITCH, IN,  STATEFUL,       9, "ls_in_stateful")      \
-    PIPELINE_STAGE(SWITCH, IN,  ARP_ND_RSP,    10, "ls_in_arp_rsp")       \
-    PIPELINE_STAGE(SWITCH, IN,  DHCP_OPTIONS,  11, "ls_in_dhcp_options")  \
-    PIPELINE_STAGE(SWITCH, IN,  DHCP_RESPONSE, 12, "ls_in_dhcp_response") \
-    PIPELINE_STAGE(SWITCH, IN,  L2_LKUP,       13, "ls_in_l2_lkup")       \
+    PIPELINE_STAGE(SWITCH, IN,  CHAIN,          7, "ls_in_chain")         \
+    PIPELINE_STAGE(SWITCH, IN,  QOS_MARK,       8, "ls_in_qos_mark")      \
+    PIPELINE_STAGE(SWITCH, IN,  LB,             9, "ls_in_lb")            \
+    PIPELINE_STAGE(SWITCH, IN,  STATEFUL,      10, "ls_in_stateful")      \
+    PIPELINE_STAGE(SWITCH, IN,  ARP_ND_RSP,    11, "ls_in_arp_rsp")       \
+    PIPELINE_STAGE(SWITCH, IN,  DHCP_OPTIONS,  12, "ls_in_dhcp_options")  \
+    PIPELINE_STAGE(SWITCH, IN,  DHCP_RESPONSE, 13, "ls_in_dhcp_response") \
+    PIPELINE_STAGE(SWITCH, IN,  L2_LKUP,       14, "ls_in_l2_lkup")       \
                                                                       \
     /* Logical switch egress stages. */                               \