[ovs-dev,v4] Policy-based routing (PBR) in OVN.

Message ID 1554334195-126528-2-git-send-email-mary.manohar@nutanix.com
State Accepted

Commit Message

Mary Manohar April 3, 2019, 11:27 p.m.
PBR provides a mechanism to configure permit/deny and reroute policies on the
router. Permit/deny policies are similar to OVN ACLs, but exist on the
logical-router. Reroute policies are needed for service-insertion and
service-chaining. Currently, policies are stateless.

To achieve this, a new table is introduced in the ingress pipeline of the
logical router, between the ‘IP Routing’ and ‘ARP/ND resolution’ tables.
This way, PBR can override routing decisions and provide a different
next-hop.

This patch:
a. Changes the OVN NB schema to introduce a new table for the logical
router.
b. Adds commands to ovn-nbctl to add/delete/list routing policies.
c. Changes ovn-northd to process routing-policy configurations.

A new table 'Logical_Router_Policy' has been added in the northbound schema.
The table has the following columns:
      * priority: Rules with numerically higher priority take precedence over
        those with lower.
      * match: Uses the same expression language as the 'match' column of
        the 'Logical_Flow' table in the OVN Southbound database.
      * action: allow/drop/reroute.
      * nexthop: Nexthop IP address.

Each row in this table represents one routing policy for a logical router. The
'action' column for the highest priority matching row in this table determines a
packet's treatment. If no row matches, packets are allowed by default.
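
The selection rule above can be sketched in C. This is only an illustrative
model, not code from the patch: the names pbr_policy, pbr_matches and
pbr_lookup are hypothetical, and the match is reduced to an IPv4 source
prefix instead of the full OVN expression language. When two matching rows
share a priority, this sketch keeps the first one; the commit message does
not say which one wins in that case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum pbr_action { PBR_ALLOW, PBR_DROP, PBR_REROUTE };

/* Hypothetical simplified policy: matches on an IPv4 source prefix only. */
struct pbr_policy {
    int priority;           /* 0..32767; numerically higher wins. */
    uint32_t src_net;       /* Network address, host byte order. */
    int plen;               /* Prefix length, 0..32. */
    enum pbr_action action;
};

/* Returns nonzero if 'src' falls inside the policy's source prefix. */
static int
pbr_matches(const struct pbr_policy *p, uint32_t src)
{
    assert(p->plen >= 0 && p->plen <= 32);
    uint32_t mask = p->plen ? ~UINT32_C(0) << (32 - p->plen) : 0;
    return (src & mask) == (p->src_net & mask);
}

/* Returns the action of the highest-priority matching policy, or
 * PBR_ALLOW if no policy matches (the default described above). */
enum pbr_action
pbr_lookup(const struct pbr_policy *policies, size_t n, uint32_t src)
{
    const struct pbr_policy *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if (pbr_matches(&policies[i], src)
            && (!best || policies[i].priority > best->priority)) {
            best = &policies[i];
        }
    }
    return best ? best->action : PBR_ALLOW;
}
```

With a drop policy at priority 10 and an allow policy at priority 20 on the
same prefix, pbr_lookup() returns PBR_ALLOW for a matching source, which
mirrors the "override drop policy with allow" step in the tests.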

The new ovn-nbctl commands are as follows:
     1. Add a routing policy:
     lr-policy-add ROUTER PRIORITY MATCH ACTION [NEXTHOP]

        NEXTHOP is optional and needs to be provided only when 'action' is
        'reroute'. A policy is uniquely identified by priority and match.
        Multiple policies can have the same priority.

     2. Delete routing policies:
     lr-policy-del ROUTER [PRIORITY [MATCH]]

        PRIORITY and MATCH are optional. If both are specified, the policy
        with the given priority and match is deleted. If only PRIORITY is
        specified, all policies with that priority are deleted. If PRIORITY
        is not specified, all policies on the router are deleted.

     3. List the routing policies of a logical router:
     lr-policy-list ROUTER
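
The three deletion modes of lr-policy-del can be modelled as one filter over
an array. This is a sketch under assumptions, not the code in the patch:
pbr_entry and pbr_delete are hypothetical names, a negative priority stands
for "PRIORITY not given", and a NULL match stands for "MATCH not given".
Unlike the patch, which moves the last element into the freed slot, this
version preserves the order of the remaining entries.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one Logical_Router_Policy row. */
struct pbr_entry {
    int priority;
    const char *match;
};

/* Removes entries in place and returns the number kept.
 *   priority < 0          : delete everything.
 *   priority >= 0, !match : delete every entry with that priority.
 *   priority >= 0, match  : delete the entry with that priority and match. */
size_t
pbr_delete(struct pbr_entry *e, size_t n, int priority, const char *match)
{
    size_t kept = 0;

    assert(e || n == 0);
    for (size_t i = 0; i < n; i++) {
        int doomed = priority < 0
                     || (e[i].priority == priority
                         && (!match || !strcmp(e[i].match, match)));
        if (!doomed) {
            e[kept++] = e[i];
        }
    }
    return kept;
}
```

For example, pbr_delete(e, n, 101, NULL) corresponds to
"lr-policy-del ROUTER 101", and pbr_delete(e, n, -1, NULL) to
"lr-policy-del ROUTER".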

ovn-northd reads routing policies from the northbound database and installs
them as logical flows in the southbound database. A new table called 'POLICY'
is introduced in the logical router's ingress pipeline. Each routing policy
configured in the northbound database translates into a single logical flow
in the new table.

The columns from the Logical_Router_Policy table are used as follows:
The priority column is used as the priority of the logical flow. The match
column is used as the 'match' string of the logical flow. The action column
determines the action of the logical flow.

When the 'action' is 'reroute' and the nexthop IP address is that of a
connected router port or of a logical port, the logical flow is constructed
to route the packet to the nexthop IP address.
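
As an illustration of the reroute case, the action string of the resulting
logical flow can be sketched with plain snprintf(). The format string follows
the one used in build_routing_policy_flow() in this patch;
format_reroute_actions is a hypothetical helper name, and the sample values
in the usage note (addresses, MAC, a quoted port name as the outport key) are
assumptions taken from the test topology rather than real output.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of the patch: writes the reroute action
 * string into 'buf' and returns the snprintf() result. */
int
format_reroute_actions(char *buf, size_t size, const char *nexthop,
                       const char *lrp_addr, const char *eth_src,
                       const char *outport_key)
{
    assert(buf && nexthop && lrp_addr && eth_src && outport_key);

    /* As in the patch, a '.' in the next hop marks it as IPv4; IPv6 next
     * hops use the 128-bit xxreg registers instead. */
    const char *prefix = strchr(nexthop, '.') ? "" : "xx";

    return snprintf(buf, size,
                    "%sreg0 = %s; %sreg1 = %s; eth.src = %s; "
                    "outport = %s; flags.loopback = 1; next;",
                    prefix, nexthop, prefix, lrp_addr, eth_src, outport_key);
}
```

Called with next hop 20.20.1.2, router port address 20.20.1.1, MAC
00:00:00:01:02:f3 and outport key "ls3" (including the quotes), this yields
an action that stores the next hop in reg0 and the router port address in
reg1, then continues to the ARP/ND resolution table via next.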

Signed-off-by: Mary Manohar <mary.manohar@nutanix.com>
---
 ovn/northd/ovn-northd.c   | 113 +++++++++++++-
 ovn/ovn-nb.ovsschema      |  21 ++-
 ovn/ovn-nb.xml            |  64 ++++++++
 ovn/utilities/ovn-nbctl.c | 206 +++++++++++++++++++++++++
 tests/ovn-nbctl.at        |  57 +++++++
 tests/ovn.at              | 372 ++++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 825 insertions(+), 8 deletions(-)

Comments

0-day Robot April 4, 2019, 12:51 a.m. | #1
Bleep bloop.  Greetings Mary Manohar, I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: Line lacks whitespace around operator
#369 FILE: ovn/utilities/ovn-nbctl.c:649:
  lr-policy-add ROUTER PRIORITY MATCH ACTION [NEXTHOP]\n\

WARNING: Line lacks whitespace around operator
#371 FILE: ovn/utilities/ovn-nbctl.c:651:
  lr-policy-del ROUTER [PRIORITY [MATCH]]\n\

WARNING: Line lacks whitespace around operator
#373 FILE: ovn/utilities/ovn-nbctl.c:653:
  lr-policy-list ROUTER     print policies for ROUTER\n\

Lines checked: 1053, Warnings: 3, Errors: 0


Please check this out.  If you feel there has been an error, please email aconole@bytheb.org

Thanks,
0-day Robot

Ben Pfaff April 16, 2019, 5:57 p.m. | #2
On Wed, Apr 03, 2019 at 11:27:56PM +0000, Mary Manohar wrote:
> PBR provides a mechanism to configure permit/deny and reroute policies on the
> router. Permit/deny policies are similar to OVN ACLs, but exist on the
> logical-router. Reroute policies are needed for service-insertion and
> service-chaining. Currently, policies are stateless.

Thanks, I applied this to master.

I'd appreciate it if you'd submit a followup patch that adds a NEWS item
so that users know about the new feature.

Patch

diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index 05b8aad..96a783f 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -141,9 +141,10 @@  enum ovn_stage {
     PIPELINE_STAGE(ROUTER, IN,  ND_RA_OPTIONS,  5, "lr_in_nd_ra_options") \
     PIPELINE_STAGE(ROUTER, IN,  ND_RA_RESPONSE, 6, "lr_in_nd_ra_response") \
     PIPELINE_STAGE(ROUTER, IN,  IP_ROUTING,     7, "lr_in_ip_routing")   \
-    PIPELINE_STAGE(ROUTER, IN,  ARP_RESOLVE,    8, "lr_in_arp_resolve")  \
-    PIPELINE_STAGE(ROUTER, IN,  GW_REDIRECT,    9, "lr_in_gw_redirect")  \
-    PIPELINE_STAGE(ROUTER, IN,  ARP_REQUEST,    10, "lr_in_arp_request")  \
+    PIPELINE_STAGE(ROUTER, IN,  POLICY,         8, "lr_in_policy")       \
+    PIPELINE_STAGE(ROUTER, IN,  ARP_RESOLVE,    9, "lr_in_arp_resolve")  \
+    PIPELINE_STAGE(ROUTER, IN,  GW_REDIRECT,    10, "lr_in_gw_redirect")  \
+    PIPELINE_STAGE(ROUTER, IN,  ARP_REQUEST,    11, "lr_in_arp_request")  \
                                                                       \
     /* Logical router egress stages. */                               \
     PIPELINE_STAGE(ROUTER, OUT, UNDNAT,    0, "lr_out_undnat")        \
@@ -4678,6 +4679,80 @@  find_lrp_member_ip(const struct ovn_port *op, const char *ip_s)
     return NULL;
 }
 
+static struct ovn_port*
+get_outport_for_routing_policy_nexthop(struct ovn_datapath *od,
+                                       struct hmap *ports,
+                                       int priority, const char *nexthop)
+{
+    if (nexthop == NULL) {
+        return NULL;
+    }
+
+    /* Find the router port matching the next hop. */
+    for (int i = 0; i < od->nbr->n_ports; i++) {
+       struct nbrec_logical_router_port *lrp = od->nbr->ports[i];
+
+       struct ovn_port *out_port = ovn_port_find(ports, lrp->name);
+       if (out_port && find_lrp_member_ip(out_port, nexthop)) {
+           return out_port;
+       }
+    }
+
+    static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
+    VLOG_WARN_RL(&rl, "No path for routing policy priority %d; next hop %s",
+                 priority, nexthop);
+    return NULL;
+}
+
+static void
+build_routing_policy_flow(struct hmap *lflows, struct ovn_datapath *od,
+                          struct hmap *ports,
+                          const struct nbrec_logical_router_policy *rule)
+{
+    struct ds match = DS_EMPTY_INITIALIZER;
+    struct ds actions = DS_EMPTY_INITIALIZER;
+
+    if (!strcmp(rule->action, "reroute")) {
+        struct ovn_port *out_port = get_outport_for_routing_policy_nexthop(
+             od, ports, rule->priority, rule->nexthop);
+        if (!out_port) {
+            return;
+        }
+
+        const char *lrp_addr_s = find_lrp_member_ip(out_port, rule->nexthop);
+        if (!lrp_addr_s) {
+            static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
+            VLOG_WARN_RL(&rl, "lrp_addr not found for routing policy "
+                         "priority %"PRId64" nexthop %s",
+                         rule->priority, rule->nexthop);
+            return;
+        }
+        bool is_ipv4 = strchr(rule->nexthop, '.') ? true : false;
+        ds_put_format(&actions, "%sreg0 = %s; "
+                      "%sreg1 = %s; "
+                      "eth.src = %s; "
+                      "outport = %s; "
+                      "flags.loopback = 1; "
+                      "next;",
+                      is_ipv4 ? "" : "xx",
+                      rule->nexthop,
+                      is_ipv4 ? "" : "xx",
+                      lrp_addr_s,
+                      out_port->lrp_networks.ea_s,
+                      out_port->json_key);
+
+    } else if (!strcmp(rule->action, "drop")) {
+        ds_put_cstr(&actions, "drop;");
+    } else if (!strcmp(rule->action, "allow")) {
+        ds_put_cstr(&actions, "next;");
+    }
+    ds_put_format(&match, "%s", rule->match);
+    ovn_lflow_add(lflows, od, S_ROUTER_IN_POLICY, rule->priority,
+                  ds_cstr(&match), ds_cstr(&actions));
+    ds_destroy(&match);
+    ds_destroy(&actions);
+}
+
 static void
 add_route(struct hmap *lflows, const struct ovn_port *op,
           const char *lrp_addr_s, const char *network_s, int plen,
@@ -6394,9 +6469,35 @@  build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
         }
     }
 
+    /* Logical router ingress table 8: Policy.
+     *
+     * A packet that arrives at this table is an IP packet that should be
+     * permitted/denied/rerouted to the address in the rule's nexthop.
+     * This table sets outport to the correct out_port,
+     * eth.src to the output port's MAC address,
+     * and '[xx]reg0' to the next-hop IP address (leaving
+     * 'ip[46].dst', the packet’s final destination, unchanged), and
+     * advances to the next table for ARP/ND resolution. */
+    HMAP_FOR_EACH (od, key_node, datapaths) {
+        if (!od->nbr) {
+            continue;
+        }
+        /* This is a catch-all rule. It has the lowest priority (0),
+         * matches everything ("1"), and passes the packet through (next). */
+        ovn_lflow_add(lflows, od, S_ROUTER_IN_POLICY, 0, "1", "next;");
+
+        /* Convert routing policies to flows. */
+        for (int i = 0; i < od->nbr->n_policies; i++) {
+            const struct nbrec_logical_router_policy *rule
+                = od->nbr->policies[i];
+            build_routing_policy_flow(lflows, od, ports, rule);
+        }
+    }
+
+
     /* XXX destination unreachable */
 
-    /* Local router ingress table 8: ARP Resolution.
+    /* Local router ingress table 9: ARP Resolution.
      *
      * Any packet that reaches this table is an IP packet whose next-hop IP
      * address is in reg0. (ip4.dst is the final destination.) This table
@@ -6595,7 +6696,7 @@  build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
                       "get_nd(outport, xxreg0); next;");
     }
 
-    /* Logical router ingress table 9: Gateway redirect.
+    /* Logical router ingress table 10: Gateway redirect.
      *
      * For traffic with outport equal to the l3dgw_port
      * on a distributed router, this table redirects a subset
@@ -6635,7 +6736,7 @@  build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
         ovn_lflow_add(lflows, od, S_ROUTER_IN_GW_REDIRECT, 0, "1", "next;");
     }
 
-    /* Local router ingress table 10: ARP request.
+    /* Local router ingress table 11: ARP request.
      *
      * In the common case where the Ethernet destination has been resolved,
      * this table outputs the packet (priority 0).  Otherwise, it composes
diff --git a/ovn/ovn-nb.ovsschema b/ovn/ovn-nb.ovsschema
index 10a5964..c41f0d4 100644
--- a/ovn/ovn-nb.ovsschema
+++ b/ovn/ovn-nb.ovsschema
@@ -1,7 +1,7 @@ 
 {
     "name": "OVN_Northbound",
-    "version": "5.14.1",
-    "cksum": "3758097843 20509",
+    "version": "5.14.2",
+    "cksum": "1959190905 21386",
     "tables": {
         "NB_Global": {
             "columns": {
@@ -242,6 +242,12 @@ 
                                             "refType": "strong"},
                                    "min": 0,
                                    "max": "unlimited"}},
+                "policies": {
+                    "type": {"key": {"type": "uuid",
+                                     "refTable": "Logical_Router_Policy",
+                                     "refType": "strong"},
+                             "min": 0,
+                             "max": "unlimited"}},
                 "enabled": {"type": {"key": "boolean", "min": 0, "max": 1}},
                 "nat": {"type": {"key": {"type": "uuid",
                                          "refTable": "NAT",
@@ -303,6 +309,17 @@ 
                     "type": {"key": "string", "value": "string",
                              "min": 0, "max": "unlimited"}}},
             "isRoot": false},
+        "Logical_Router_Policy": {
+            "columns": {
+                "priority": {"type": {"key": {"type": "integer",
+                                              "minInteger": 0,
+                                              "maxInteger": 32767}}},
+                "match": {"type": "string"},
+                "action": {"type": {
+                    "key": {"type": "string",
+                            "enum": ["set", ["allow", "drop", "reroute"]]}}},
+                "nexthop": {"type": {"key": "string", "min": 0, "max": 1}}},
+            "isRoot": false},
         "NAT": {
             "columns": {
                 "external_ip": {"type": "string"},
diff --git a/ovn/ovn-nb.xml b/ovn/ovn-nb.xml
index 61a5711..4a8c94b 100644
--- a/ovn/ovn-nb.xml
+++ b/ovn/ovn-nb.xml
@@ -1241,6 +1241,10 @@ 
       Zero or more static routes for the router.
     </column>
 
+    <column name="policies">
+      Zero or more routing policies for the router.
+    </column>
+
     <column name="enabled">
       This column is used to administratively set router state.  If this column
       is empty or is set to <code>true</code>, the router is enabled.  If this
@@ -1841,6 +1845,66 @@ 
 
   </table>
 
+  <table name="Logical_Router_Policy" title="Logical router policies">
+    <p>
+      Each row in this table represents one routing policy for a logical router
+      that points to it through its <ref column="policies"/> column.  The <ref
+      column="action"/> column for the highest-<ref column="priority"/>
+      matching row in this table determines a packet's treatment.  If no row
+      matches, packets are allowed by default. (Default-deny treatment is
+      possible: add a rule with <ref column="priority"/> 0, <code>1</code> as
+      <ref column="match"/>, and <code>drop</code> as <ref column="action"/>.)
+    </p>
+
+    <column name="priority">
+      <p>
+        The routing policy's priority.  Rules with numerically higher priority
+        take precedence over those with lower. A rule is uniquely identified
+        by the priority and match string.
+      </p>
+    </column>
+
+    <column name="match">
+      <p>
+        The packets that the routing policy should match,
+        in the same expression language used for the
+        <ref column="match" table="Logical_Flow" db="OVN_Southbound"/>
+        column in the OVN Southbound database's
+        <ref table="Logical_Flow" db="OVN_Southbound"/> table.
+      </p>
+
+      <p>
+        By default all traffic is allowed.  When writing a more
+        restrictive policy, it is important to remember to allow flows
+        such as ARP and IPv6 neighbor discovery packets.
+      </p>
+    </column>
+
+    <column name="action">
+      <p>The action to take when the routing policy matches:</p>
+      <ul>
+        <li>
+          <code>allow</code>: Forward the packet.
+        </li>
+
+        <li>
+          <code>drop</code>: Silently drop the packet.
+        </li>
+
+        <li>
+          <code>reroute</code>: Reroute packet to <ref column="nexthop"/>.
+        </li>
+      </ul>
+    </column>
+
+    <column name="nexthop">
+      <p>
+        Next-hop IP address for this route, which should be the IP
+        address of a connected router port or the IP address of a logical port.
+      </p>
+    </column>
+  </table>
+
   <table name="NAT" title="NAT rules">
     <p>
       Each record represents a NAT rule.
diff --git a/ovn/utilities/ovn-nbctl.c b/ovn/utilities/ovn-nbctl.c
index 2727b41..cdbc299 100644
--- a/ovn/utilities/ovn-nbctl.c
+++ b/ovn/utilities/ovn-nbctl.c
@@ -645,6 +645,13 @@  Route commands:\n\
                             remove routes from ROUTER\n\
   lr-route-list ROUTER      print routes for ROUTER\n\
 \n\
+Policy commands:\n\
+  lr-policy-add ROUTER PRIORITY MATCH ACTION [NEXTHOP]\n\
+                            add a policy to router\n\
+  lr-policy-del ROUTER [PRIORITY [MATCH]]\n\
+                            remove policies from ROUTER\n\
+  lr-policy-list ROUTER     print policies for ROUTER\n\
+\n\
 NAT commands:\n\
   lr-nat-add ROUTER TYPE EXTERNAL_IP LOGICAL_IP [LOGICAL_PORT EXTERNAL_MAC]\n\
                             add a NAT to ROUTER\n\
@@ -3425,6 +3432,197 @@  normalize_prefix_str(const char *orig_prefix)
         return normalize_ipv6_prefix(ipv6, plen);
     }
 }
+
+static void
+nbctl_lr_policy_add(struct ctl_context *ctx)
+{
+    const struct nbrec_logical_router *lr;
+    int64_t priority = 0;
+    char *error = lr_by_name_or_uuid(ctx, ctx->argv[1], true, &lr);
+    if (error) {
+        ctx->error = error;
+        return;
+    }
+    error = parse_priority(ctx->argv[2], &priority);
+    if (error) {
+        ctx->error = error;
+        return;
+    }
+    const char *action = ctx->argv[4];
+    char *next_hop = NULL;
+
+    /* Validate action. */
+    if (strcmp(action, "allow") && strcmp(action, "drop")
+        && strcmp(action, "reroute")) {
+        ctl_error(ctx, "%s: action must be one of \"allow\", \"drop\", "
+                  "and \"reroute\"", action);
+    }
+    if (!strcmp(action, "reroute")) {
+        if (ctx->argc < 6) {
+            ctl_error(ctx, "Nexthop is required when action is reroute.");
+        }
+    }
+
+    /* Check if same routing policy already exists.
+     * A policy is uniquely identified by priority and match */
+    for (int i = 0; i < lr->n_policies; i++) {
+        const struct nbrec_logical_router_policy *policy = lr->policies[i];
+        if (policy->priority == priority &&
+            !strcmp(policy->match, ctx->argv[3])) {
+            ctl_error(ctx, "Same routing policy already existed on the "
+                      "logical router %s.", ctx->argv[1]);
+        }
+    }
+    if (ctx->argc == 6) {
+        next_hop = normalize_prefix_str(ctx->argv[5]);
+        if (!next_hop) {
+            ctl_error(ctx, "bad next hop argument: %s", ctx->argv[5]);
+        }
+    }
+
+    struct nbrec_logical_router_policy *policy;
+    policy = nbrec_logical_router_policy_insert(ctx->txn);
+    nbrec_logical_router_policy_set_priority(policy, priority);
+    nbrec_logical_router_policy_set_match(policy, ctx->argv[3]);
+    nbrec_logical_router_policy_set_action(policy, action);
+    if (ctx->argc == 6) {
+        nbrec_logical_router_policy_set_nexthop(policy, next_hop);
+    }
+    nbrec_logical_router_verify_policies(lr);
+    struct nbrec_logical_router_policy **new_policies
+        = xmalloc(sizeof *new_policies * (lr->n_policies + 1));
+    memcpy(new_policies, lr->policies,
+           sizeof *new_policies * lr->n_policies);
+    new_policies[lr->n_policies] = policy;
+    nbrec_logical_router_set_policies(lr, new_policies,
+                                      lr->n_policies + 1);
+    free(new_policies);
+    if (next_hop != NULL) {
+        free(next_hop);
+    }
+}
+
+static void
+nbctl_lr_policy_del(struct ctl_context *ctx)
+{
+    const struct nbrec_logical_router *lr;
+    int64_t priority = 0;
+    char *error = lr_by_name_or_uuid(ctx, ctx->argv[1], true, &lr);
+    if (error) {
+        ctx->error = error;
+        return;
+    }
+
+    if (ctx->argc == 2) {
+        /* If a priority is not specified, delete all policies. */
+        nbrec_logical_router_set_policies(lr, NULL, 0);
+        return;
+    }
+
+    error = parse_priority(ctx->argv[2], &priority);
+    if (error) {
+        ctx->error = error;
+        return;
+    }
+    /* If match is not specified, delete all routing policies with the
+     * specified priority. */
+    if (ctx->argc == 3) {
+        struct nbrec_logical_router_policy **new_policies
+            = xmemdup(lr->policies,
+                      sizeof *new_policies * lr->n_policies);
+        int n_policies = 0;
+        for (int i = 0; i < lr->n_policies; i++) {
+            if (priority != lr->policies[i]->priority) {
+                new_policies[n_policies++] = lr->policies[i];
+            }
+        }
+        nbrec_logical_router_verify_policies(lr);
+        nbrec_logical_router_set_policies(lr, new_policies, n_policies);
+        free(new_policies);
+        return;
+    }
+
+    /* Delete policy that has the same priority and match string */
+    for (int i = 0; i < lr->n_policies; i++) {
+        struct nbrec_logical_router_policy *routing_policy = lr->policies[i];
+        if (priority == routing_policy->priority &&
+            !strcmp(ctx->argv[3], routing_policy->match)) {
+            struct nbrec_logical_router_policy **new_policies
+                = xmemdup(lr->policies,
+                          sizeof *new_policies * lr->n_policies);
+            new_policies[i] = lr->policies[lr->n_policies - 1];
+            nbrec_logical_router_verify_policies(lr);
+            nbrec_logical_router_set_policies(lr, new_policies,
+                                              lr->n_policies - 1);
+            free(new_policies);
+            return;
+        }
+    }
+}
+
+struct routing_policy {
+    int priority;
+    char *match;
+    const struct nbrec_logical_router_policy *policy;
+};
+
+static int
+routing_policy_cmp(const void *policy1_, const void *policy2_)
+{
+    const struct routing_policy *policy1p = policy1_;
+    const struct routing_policy *policy2p = policy2_;
+    if (policy1p->priority != policy2p->priority) {
+        return policy1p->priority > policy2p->priority ? -1 : 1;
+    } else {
+        return strcmp(policy1p->match, policy2p->match);
+    }
+}
+
+static void
+print_routing_policy(const struct nbrec_logical_router_policy *policy,
+                     struct ds *s)
+{
+    if (policy->nexthop != NULL) {
+        char *next_hop = normalize_prefix_str(policy->nexthop);
+        ds_put_format(s, "%10"PRId64" %50s %15s %25s", policy->priority,
+                      policy->match, policy->action, next_hop);
+        free(next_hop);
+    } else {
+        ds_put_format(s, "%10"PRId64" %50s %15s", policy->priority,
+                      policy->match, policy->action);
+    }
+    ds_put_char(s, '\n');
+}
+
+static void
+nbctl_lr_policy_list(struct ctl_context *ctx)
+{
+    const struct nbrec_logical_router *lr;
+    struct routing_policy *policies;
+    size_t n_policies = 0;
+    char *error = lr_by_name_or_uuid(ctx, ctx->argv[1], true, &lr);
+    if (error) {
+        ctx->error = error;
+        return;
+    }
+    policies = xmalloc(sizeof *policies * lr->n_policies);
+    for (int i = 0; i < lr->n_policies; i++) {
+        const struct nbrec_logical_router_policy *policy
+            = lr->policies[i];
+        policies[n_policies].priority = policy->priority;
+        policies[n_policies].match = policy->match;
+        policies[n_policies].policy = policy;
+        n_policies++;
+    }
+    qsort(policies, n_policies, sizeof *policies, routing_policy_cmp);
+    if (n_policies) {
+        ds_put_cstr(&ctx->output, "Routing Policies\n");
+    }
+    for (int i = 0; i < n_policies; i++) {
+        print_routing_policy(policies[i].policy, &ctx->output);
+    }
+    free(policies);
+}
 
 static void
 nbctl_lr_route_add(struct ctl_context *ctx)
@@ -5174,6 +5372,14 @@  static const struct ctl_command_syntax nbctl_commands[] = {
     { "lr-route-list", 1, 1, "ROUTER", NULL, nbctl_lr_route_list, NULL,
       "", RO },
 
+    /* Policy commands */
+    { "lr-policy-add", 4, 5, "ROUTER PRIORITY MATCH ACTION [NEXTHOP]", NULL,
+        nbctl_lr_policy_add, NULL, "", RW },
+    { "lr-policy-del", 1, 3, "ROUTER [PRIORITY [MATCH]]", NULL,
+        nbctl_lr_policy_del, NULL, "", RW },
+    { "lr-policy-list", 1, 1, "ROUTER", NULL, nbctl_lr_policy_list, NULL,
+       "", RO },
+
     /* NAT commands. */
     { "lr-nat-add", 4, 6,
       "ROUTER TYPE EXTERNAL_IP LOGICAL_IP [LOGICAL_PORT EXTERNAL_MAC]", NULL,
diff --git a/tests/ovn-nbctl.at b/tests/ovn-nbctl.at
index f884fc7..18c5c1d 100644
--- a/tests/ovn-nbctl.at
+++ b/tests/ovn-nbctl.at
@@ -1353,6 +1353,63 @@  IPv6 Routes
 
 dnl ---------------------------------------------------------------------
 
+OVN_NBCTL_TEST([ovn_nbctl_policies], [policies], [
+AT_CHECK([ovn-nbctl lr-add lr0])
+
+dnl Add policies with allow and drop actions
+AT_CHECK([ovn-nbctl lr-policy-add lr0 100 "ip4.src == 1.1.1.0/24" drop])
+AT_CHECK([ovn-nbctl lr-policy-add lr0 100 "ip4.src == 1.1.2.0/24" allow])
+AT_CHECK([ovn-nbctl lr-policy-add lr0 101 "ip4.src == 2.1.1.0/24" allow])
+AT_CHECK([ovn-nbctl lr-policy-add lr0 101 "ip4.src == 2.1.2.0/24" drop])
+AT_CHECK([ovn-nbctl lr-policy-add lr0 101 "ip6.src == 2002::/64" drop])
+
+dnl Add duplicated policy
+AT_CHECK([ovn-nbctl lr-policy-add lr0 100 "ip4.src == 1.1.1.0/24" drop], [1], [],
+  [ovn-nbctl: Same routing policy already existed on the logical router lr0.
+])
+
+dnl Add policy with invalid action
+AT_CHECK([ovn-nbctl lr-policy-add lr0 103 "ip4.src == 1.1.1.0/24" deny], [1], [],
+  [ovn-nbctl: deny: action must be one of "allow", "drop", and "reroute"
+])
+
+dnl Delete by priority and match string
+AT_CHECK([ovn-nbctl lr-policy-del lr0 100 "ip4.src == 1.1.1.0/24"])
+AT_CHECK([ovn-nbctl lr-policy-list lr0], [0], [dnl
+Routing Policies
+       101                              ip4.src == 2.1.1.0/24           allow
+       101                              ip4.src == 2.1.2.0/24            drop
+       101                               ip6.src == 2002::/64            drop
+       100                              ip4.src == 1.1.2.0/24           allow
+])
+
+dnl Delete all policies for given priority
+AT_CHECK([ovn-nbctl lr-policy-del lr0 101])
+AT_CHECK([ovn-nbctl lr-policy-list lr0], [0], [dnl
+Routing Policies
+       100                              ip4.src == 1.1.2.0/24           allow
+])
+
+dnl Add policy with reroute action
+AT_CHECK([ovn-nbctl lr-policy-add lr0 102 "ip4.src == 3.1.2.0/24" reroute 3.3.3.3])
+
+dnl Add policy with invalid reroute ip
+AT_CHECK([ovn-nbctl lr-policy-add lr0 103 "ip4.src == 3.1.2.0/24" reroute 3.3.3.x], [1], [],
+  [ovn-nbctl: bad next hop argument: 3.3.3.x
+])
+
+dnl Add policy with reroute action
+AT_CHECK([ovn-nbctl lr-policy-add lr0 104 "ip6.src == 2001::/64" reroute 2002::5])
+
+dnl Add policy with invalid reroute ip
+AT_CHECK([ovn-nbctl lr-policy-add lr0 105 "ip6.src == 2001::/64" reroute 2002::x], [1], [],
+  [ovn-nbctl: bad next hop argument: 2002::x
+])
+
+])
+
+dnl ---------------------------------------------------------------------
+
 OVN_NBCTL_TEST([ovn_nbctl_lsp_types], [lsp types], [
 AT_CHECK([ovn-nbctl ls-add ls0])
 AT_CHECK([ovn-nbctl lsp-add ls0 lp0])
diff --git a/tests/ovn.at b/tests/ovn.at
index e7746cb..2294450 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -5375,6 +5375,7 @@  test_ipv4_icmp_request 2 000000010204 0000000102f2 $l2_ip $rtr_l2_ip 0000 8510 0
 test_ipv4_icmp_request 1 000000010203 0000000102f1 $l1_ip $rtr_l2_ip 0000 8510 02ff 8d10
 test_ipv4_icmp_request 2 000000010204 0000000102f2 $l2_ip $rtr_l1_ip 0000 8510 02ff 8d10
 
+
 echo "---------NB dump-----"
 ovn-nbctl show
 echo "---------------------"
@@ -5398,7 +5399,378 @@  for inport in 1 2; do
 done
 
 OVN_CLEANUP([hv1])
+AT_CLEANUP
+
+AT_SETUP([ovn -- policy-based routing: 1 HVs, 2 LSs, 1 lport/LS, 1 LR])
+AT_KEYWORDS([policy-based-routing])
+AT_SKIP_IF([test $HAVE_PYTHON = no])
+ovn_start
+
+# Logical network:
+# One LR - R1 has switches ls1 (192.168.1.0/24), ls2 (172.16.1.0/24),
+# and ls3 (20.20.1.0/24) connected to it.
+
+ovn-nbctl lr-add R1
+
+ovn-nbctl ls-add ls1
+ovn-nbctl ls-add ls2
+ovn-nbctl ls-add ls3
+
+# Connect ls1 to R1
+ovn-nbctl lrp-add R1 ls1 00:00:00:01:02:f1 192.168.1.1/24
+ovn-nbctl lsp-add ls1 rp-ls1 -- set Logical_Switch_Port rp-ls1 \
+    type=router options:router-port=ls1 addresses=\"00:00:00:01:02:f1\"
+
+# Connect ls2 to R1
+ovn-nbctl lrp-add R1 ls2 00:00:00:01:02:f2 172.16.1.1/24
+ovn-nbctl lsp-add ls2 rp-ls2 -- set Logical_Switch_Port rp-ls2 \
+    type=router options:router-port=ls2 addresses=\"00:00:00:01:02:f2\"
+
+# Connect ls3 to R1
+ovn-nbctl lrp-add R1 ls3 00:00:00:01:02:f3 20.20.1.1/24
+ovn-nbctl lsp-add ls3 rp-ls3 -- set Logical_Switch_Port rp-ls3 \
+    type=router options:router-port=ls3 addresses=\"00:00:00:01:02:f3\"
+
+# Create logical port ls1-lp1 in ls1
+ovn-nbctl lsp-add ls1 ls1-lp1 \
+-- lsp-set-addresses ls1-lp1 "00:00:00:01:02:03 192.168.1.2"
+
+# Create logical port ls2-lp1 in ls2
+ovn-nbctl lsp-add ls2 ls2-lp1 \
+-- lsp-set-addresses ls2-lp1 "00:00:00:01:02:04 172.16.1.2"
+
+# Create logical port ls3-lp1 in ls3
+ovn-nbctl lsp-add ls3 ls3-lp1 \
+-- lsp-set-addresses ls3-lp1 "00:00:00:01:02:05 20.20.1.2"
+
+# Create one hypervisor and create OVS ports corresponding to logical ports.
+net_add n1
+
+sim_add pbr-hv
+as pbr-hv
+ovs-vsctl add-br br-phys
+ovn_attach n1 br-phys 192.168.0.1
+
+ovs-vsctl -- add-port br-int vif1 -- \
+    set interface vif1 external-ids:iface-id=ls1-lp1 \
+    options:tx_pcap=pbr-hv/vif1-tx.pcap \
+    options:rxq_pcap=pbr-hv/vif1-rx.pcap \
+    ofport-request=1
+
+ovs-vsctl -- add-port br-int vif2 -- \
+    set interface vif2 external-ids:iface-id=ls2-lp1 \
+    options:tx_pcap=pbr-hv/vif2-tx.pcap \
+    options:rxq_pcap=pbr-hv/vif2-rx.pcap \
+    ofport-request=1
+
+ovs-vsctl -- add-port br-int vif3 -- \
+    set interface vif3 external-ids:iface-id=ls3-lp1 \
+    options:tx_pcap=pbr-hv/vif3-tx.pcap \
+    options:rxq_pcap=pbr-hv/vif3-rx.pcap \
+    ofport-request=1
+
+# Allow some time for ovn-northd and ovn-controller to catch up.
+# XXX This should be more systematic.
+sleep 1
+
+ls1_ro_mac=00:00:00:01:02:f1
+ls1_ro_ip=192.168.1.1
+
+ls2_ro_mac=00:00:00:01:02:f2
+ls2_ro_ip=172.16.1.1
+
+ls3_ro_mac=00:00:00:01:02:f3
+
+ls1_p1_mac=00:00:00:01:02:03
+ls1_p1_ip=192.168.1.2
+
+ls2_p1_mac=00:00:00:01:02:04
+ls2_p1_ip=172.16.1.2
+
+ls3_p1_mac=00:00:00:01:02:05
+
+# Create a drop policy
+ovn-nbctl lr-policy-add R1 10 "ip4.src==192.168.1.0/24 && ip4.dst==172.16.1.0/24" drop
+
+# Check logical flow
+AT_CHECK([ovn-sbctl dump-flows | grep lr_in_policy | grep "192.168.1.0" | wc -l], [0], [dnl
+1
+])
+
+# Send packet.
+packet="inport==\"ls1-lp1\" && eth.src==$ls1_p1_mac && eth.dst==$ls1_ro_mac &&
+       ip4 && ip.ttl==64 && ip4.src==$ls1_p1_ip && ip4.dst==$ls2_p1_ip &&
+       udp && udp.src==53 && udp.dst==4369"
+
+as pbr-hv ovs-appctl -t ovn-controller inject-pkt "$packet"
+
+# Check if packet hit the drop policy
+AT_CHECK([ovs-ofctl dump-flows br-int | \
+    grep "nw_src=192.168.1.0/24,nw_dst=172.16.1.0/24 actions=drop" | \
+    grep "priority=10" | \
+    grep "n_packets=1" | wc -l], [0], [dnl
+1
+])
+
+# Expected to drop the packet.
+$PYTHON "$top_srcdir/utilities/ovs-pcap.in" pbr-hv/vif2-tx.pcap > vif2.packets
+rcvd_packet=`cat vif2.packets`
+AT_FAIL_IF([test -n "$rcvd_packet"])
+
+# Override drop policy with allow
+ovn-nbctl lr-policy-add R1 20 "ip4.src==192.168.1.0/24 && ip4.dst==172.16.1.0/24" allow
+
+# Check logical flow
+AT_CHECK([ovn-sbctl dump-flows | grep lr_in_policy | grep "192.168.1.0" | wc -l], [0], [dnl
+2
+])
+
+# Send packet.
+packet="inport==\"ls1-lp1\" && eth.src==$ls1_p1_mac && eth.dst==$ls1_ro_mac &&
+       ip4 && ip.ttl==64 && ip4.src==$ls1_p1_ip && ip4.dst==$ls2_p1_ip &&
+       udp && udp.src==53 && udp.dst==4369"
+as pbr-hv ovs-appctl -t ovn-controller inject-pkt "$packet"
+
+# Check that the logical flow for the allow policy is installed
+AT_CHECK([ovn-sbctl dump-flows | grep lr_in_policy | \
+    grep "192.168.1.0" | \
+    grep "priority=20" | wc -l], [0], [dnl
+1
+])
+
+# Expected packet has TTL decreased by 1
+expected="eth.src==$ls2_ro_mac && eth.dst==$ls2_p1_mac &&
+       ip4 && ip.ttl==63 && ip4.src==$ls1_p1_ip && ip4.dst==$ls2_p1_ip &&
+       udp && udp.src==53 && udp.dst==4369"
+echo $expected | ovstest test-ovn expr-to-packets > expected
+
+OVN_CHECK_PACKETS([pbr-hv/vif2-tx.pcap], [expected])
+
+# Override allow policy with reroute
+ovn-nbctl lr-policy-add R1 30 "ip4.src==192.168.1.0/24 && ip4.dst==172.16.1.0/24" reroute 20.20.1.2
+
+# Check logical flow
+AT_CHECK([ovn-sbctl dump-flows | grep lr_in_policy | \
+    grep "192.168.1.0" | \
+    grep "priority=30" | wc -l], [0], [dnl
+1
+])
+
+# Send packet.
+packet="inport==\"ls1-lp1\" && eth.src==$ls1_p1_mac && eth.dst==$ls1_ro_mac &&
+       ip4 && ip.ttl==64 && ip4.src==$ls1_p1_ip && ip4.dst==$ls2_p1_ip &&
+       udp && udp.src==53 && udp.dst==4369"
+as pbr-hv ovs-appctl -t ovn-controller inject-pkt "$packet"
+
+echo "southbound flows"
+
+ovn-sbctl dump-flows | grep lr_in_policy
+echo "ovs flows"
+ovs-ofctl dump-flows br-int
+# Check if packet hit the reroute policy
+AT_CHECK([ovs-ofctl dump-flows br-int | \
+    grep "nw_src=192.168.1.0/24,nw_dst=172.16.1.0/24" | \
+    grep "priority=30" | \
+    grep "n_packets=1" | wc -l], [0], [dnl
+1
+])
+echo "packet hit reroute policy"
+
+# Expected packet has TTL decreased by 1
+expected="eth.src==$ls3_ro_mac && eth.dst==$ls3_p1_mac &&
+       ip4 && ip.ttl==63 && ip4.src==$ls1_p1_ip && ip4.dst==$ls2_p1_ip &&
+       udp && udp.src==53 && udp.dst==4369"
+echo $expected | ovstest test-ovn expr-to-packets > 3.expected
+
+OVN_CHECK_PACKETS([pbr-hv/vif3-tx.pcap], [3.expected])
+
+OVN_CLEANUP([pbr-hv])
+AT_CLEANUP
+
+AT_SETUP([ovn -- policy-based routing IPv6: 1 HVs, 3 LSs, 1 lport/LS, 1 LR])
+AT_KEYWORDS([policy-based-routing])
+AT_SKIP_IF([test $HAVE_PYTHON = no])
+ovn_start
+
+# Logical network:
+# One LR - R1 has switch ls1 (2001::/64) connected to it,
+# switch ls2 (2002::/64) connected to it,
+# and switch ls3 (2003::/64) connected to it.
+
+ovn-nbctl lr-add R1
+
+ovn-nbctl ls-add ls1
+ovn-nbctl ls-add ls2
+ovn-nbctl ls-add ls3
+
+# Connect ls1 to R1
+ovn-nbctl lrp-add R1 ls1 00:00:00:01:02:f1 2001::1/64
+ovn-nbctl lsp-add ls1 rp-ls1 -- set Logical_Switch_Port rp-ls1 \
+    type=router options:router-port=ls1 addresses=\"00:00:00:01:02:f1\"
+
+# Connect ls2 to R1
+ovn-nbctl lrp-add R1 ls2 00:00:00:01:02:f2 2002::1/64
+ovn-nbctl lsp-add ls2 rp-ls2 -- set Logical_Switch_Port rp-ls2 \
+    type=router options:router-port=ls2 addresses=\"00:00:00:01:02:f2\"
+
+# Connect ls3 to R1
+ovn-nbctl lrp-add R1 ls3 00:00:00:01:02:f3 2003::1/64
+ovn-nbctl lsp-add ls3 rp-ls3 -- set Logical_Switch_Port rp-ls3 \
+    type=router options:router-port=ls3 addresses=\"00:00:00:01:02:f3\"
+
+# Create logical port ls1-lp1 in ls1
+ovn-nbctl lsp-add ls1 ls1-lp1 \
+-- lsp-set-addresses ls1-lp1 "00:00:00:01:02:03 2001::2"
+
+# Create logical port ls2-lp1 in ls2
+ovn-nbctl lsp-add ls2 ls2-lp1 \
+-- lsp-set-addresses ls2-lp1 "00:00:00:01:02:04 2002::2"
+
+# Create logical port ls3-lp1 in ls3
+ovn-nbctl lsp-add ls3 ls3-lp1 \
+-- lsp-set-addresses ls3-lp1 "00:00:00:01:02:05 2003::2"
+
+# Create one hypervisor and create OVS ports corresponding to logical ports.
+net_add n1
+
+sim_add pbr-hv
+as pbr-hv
+ovs-vsctl add-br br-phys
+ovn_attach n1 br-phys 192.168.0.1
+
+ovs-vsctl -- add-port br-int vif1 -- \
+    set interface vif1 external-ids:iface-id=ls1-lp1 \
+    options:tx_pcap=pbr-hv/vif1-tx.pcap \
+    options:rxq_pcap=pbr-hv/vif1-rx.pcap \
+    ofport-request=1
+
+ovs-vsctl -- add-port br-int vif2 -- \
+    set interface vif2 external-ids:iface-id=ls2-lp1 \
+    options:tx_pcap=pbr-hv/vif2-tx.pcap \
+    options:rxq_pcap=pbr-hv/vif2-rx.pcap \
+    ofport-request=2
+
+ovs-vsctl -- add-port br-int vif3 -- \
+    set interface vif3 external-ids:iface-id=ls3-lp1 \
+    options:tx_pcap=pbr-hv/vif3-tx.pcap \
+    options:rxq_pcap=pbr-hv/vif3-rx.pcap \
+    ofport-request=3
+
+# Allow some time for ovn-northd and ovn-controller to catch up.
+# XXX This should be more systematic.
+sleep 1
+
+ls1_ro_mac=00:00:00:01:02:f1
+ls1_ro_ip=2001::1
+
+ls2_ro_mac=00:00:00:01:02:f2
+ls2_ro_ip=2002::1
+
+ls3_ro_mac=00:00:00:01:02:f3
+
+ls1_p1_mac=00:00:00:01:02:03
+ls1_p1_ip=2001::2
+
+ls2_p1_mac=00:00:00:01:02:04
+ls2_p1_ip=2002::2
+
+ls3_p1_mac=00:00:00:01:02:05
+
+# Create a drop policy
+ovn-nbctl lr-policy-add R1 10 "ip6.src==2001::/64 && ip6.dst==2002::/64" drop
+
+# Check logical flow
+AT_CHECK([ovn-sbctl dump-flows | grep lr_in_policy | grep "2001" | wc -l], [0], [dnl
+1
+])
+
+# Send packet.
+packet="inport==\"ls1-lp1\" && eth.src==$ls1_p1_mac && eth.dst==$ls1_ro_mac &&
+       ip6 && ip.ttl==64 && ip6.src==$ls1_p1_ip && ip6.dst==$ls2_p1_ip &&
+       udp && udp.src==53 && udp.dst==4369"
+
+as pbr-hv ovs-appctl -t ovn-controller inject-pkt "$packet"
+
+# Check if packet hit the drop policy
+AT_CHECK([ovs-ofctl dump-flows br-int | \
+    grep "ipv6_src=2001::/64,ipv6_dst=2002::/64 actions=drop" | \
+    grep "priority=10" | \
+    grep "n_packets=1" | wc -l], [0], [dnl
+1
+])
+
+# The packet is expected to be dropped, so vif2 must not receive anything.
+$PYTHON "$top_srcdir/utilities/ovs-pcap.in" pbr-hv/vif2-tx.pcap > vif2.packets
+rcvd_packet=`cat vif2.packets`
+AT_FAIL_IF([test -n "$rcvd_packet"])
+
+# Override drop policy with allow
+ovn-nbctl lr-policy-add R1 20 "ip6.src==2001::/64 && ip6.dst==2002::/64" allow
+
+# Check logical flow
+AT_CHECK([ovn-sbctl dump-flows | grep lr_in_policy | grep "2001" | wc -l], [0], [dnl
+2
+])
+
+# Send packet.
+packet="inport==\"ls1-lp1\" && eth.src==$ls1_p1_mac && eth.dst==$ls1_ro_mac &&
+       ip6 && ip.ttl==64 && ip6.src==$ls1_p1_ip && ip6.dst==$ls2_p1_ip &&
+       udp && udp.src==53 && udp.dst==4369"
+as pbr-hv ovs-appctl -t ovn-controller inject-pkt "$packet"
+
+# Check that the logical flow for the allow policy is installed
+AT_CHECK([ovn-sbctl dump-flows | grep lr_in_policy | \
+    grep "2001" | \
+    grep "priority=20" | wc -l], [0], [dnl
+1
+])
+
+# Expected packet has TTL decreased by 1
+expected="eth.src==$ls2_ro_mac && eth.dst==$ls2_p1_mac &&
+       ip6 && ip.ttl==63 && ip6.src==$ls1_p1_ip && ip6.dst==$ls2_p1_ip &&
+       udp && udp.src==53 && udp.dst==4369"
+echo $expected | ovstest test-ovn expr-to-packets > expected
+
+OVN_CHECK_PACKETS([pbr-hv/vif2-tx.pcap], [expected])
+
+# Override allow policy with reroute
+ovn-nbctl lr-policy-add R1 30 "ip6.src==2001::/64 && ip6.dst==2002::/64" reroute 2003::2
+
+# Check logical flow
+AT_CHECK([ovn-sbctl dump-flows | grep lr_in_policy | \
+    grep "2001" | \
+    grep "priority=30" | wc -l], [0], [dnl
+1
+])
+
+# Send packet.
+packet="inport==\"ls1-lp1\" && eth.src==$ls1_p1_mac && eth.dst==$ls1_ro_mac &&
+       ip6 && ip.ttl==64 && ip6.src==$ls1_p1_ip && ip6.dst==$ls2_p1_ip &&
+       udp && udp.src==53 && udp.dst==4369"
+as pbr-hv ovs-appctl -t ovn-controller inject-pkt "$packet"
+
+echo "southbound flows"
+
+ovn-sbctl dump-flows | grep lr_in_policy
+echo "ovs flows"
+ovs-ofctl dump-flows br-int
+# Check if packet hit the reroute policy
+AT_CHECK([ovs-ofctl dump-flows br-int | \
+    grep "ipv6_src=2001::/64,ipv6_dst=2002::/64" | \
+    grep "priority=30" | \
+    grep "n_packets=1" | wc -l], [0], [dnl
+1
+])
+echo "packet hit reroute policy"
+
+# Expected packet has TTL decreased by 1
+expected="eth.src==$ls3_ro_mac && eth.dst==$ls3_p1_mac &&
+       ip6 && ip.ttl==63 && ip6.src==$ls1_p1_ip && ip6.dst==$ls2_p1_ip &&
+       udp && udp.src==53 && udp.dst==4369"
+echo $expected | ovstest test-ovn expr-to-packets > 3.expected
+
+OVN_CHECK_PACKETS([pbr-hv/vif3-tx.pcap], [3.expected])
 
+OVN_CLEANUP([pbr-hv])
 AT_CLEANUP
 
 # 1 hypervisor, 1 port