[ovs-dev,RFC,v3,2/5] ovn: Introduce "chassisredirect" port binding

Message ID 1481812455-13149-3-git-send-email-mickeys.dev@gmail.com
State Superseded

Commit Message

Mickey Spiegel Dec. 15, 2016, 2:34 p.m. UTC
Currently OVN handles all logical router ports in a distributed manner,
creating instances on each chassis.  The logical router ingress and
egress pipelines are traversed locally on the source chassis.

In order to support advanced features such as one-to-many NAT (aka IP
masquerading), where multiple private IP addresses spread across
multiple chassis are mapped to one public IP address, it will be
necessary to handle some of the logical router processing on a specific
chassis in a centralized manner.

The goal of this patch is to develop abstractions that allow for a
subset of router gateway traffic to be handled in a centralized manner
(e.g. one-to-many NAT traffic), while allowing for other subsets of
router gateway traffic to be handled in a distributed manner (e.g.
floating IP traffic).

This patch introduces a new type of SB port_binding called
"chassisredirect".  A "chassisredirect" port represents a particular
instance, bound to a specific chassis, of an otherwise distributed
port.  The ovn-controller on that chassis populates the "chassis"
column for this record as an indication for other ovn-controllers of
its physical location.  Other ovn-controllers do not treat this port
as a local port.
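
The claim and release behaviour described above can be sketched as a
small decision function.  This is a simplified model, not the patch's
code: the enum and the function name cr_decide() are hypothetical, and
chassis are compared by name here, whereas binding.c compares record
pointers and calls sbrec_port_binding_set_chassis() directly.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type, used only for this illustration. */
enum cr_action { CR_NONE, CR_CLAIM, CR_RELEASE };

/* redirect_chassis: value of the "redirect-chassis" option, or NULL.
 * bound_chassis:    chassis currently in the "chassis" column, or NULL.
 * this_chassis:     name of the chassis this ovn-controller manages. */
static enum cr_action
cr_decide(const char *redirect_chassis, const char *bound_chassis,
          const char *this_chassis)
{
    bool wanted_here = redirect_chassis
                       && !strcmp(redirect_chassis, this_chassis);
    bool bound_here = bound_chassis
                      && !strcmp(bound_chassis, this_chassis);

    if (wanted_here && !bound_here) {
        return CR_CLAIM;    /* Populate "chassis" with this chassis. */
    }
    if (!wanted_here && bound_here) {
        return CR_RELEASE;  /* Clear "chassis"; the port moved away. */
    }
    return CR_NONE;         /* Nothing to change from this chassis. */
}
```

A chassis that is neither the configured "redirect-chassis" nor the
currently bound chassis gets CR_NONE, which corresponds to other
ovn-controllers not treating the port as local.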

A "chassisredirect" port should never be used as an "inport".  When an
ingress pipeline sets the "outport", it may set the value to a logical
port of type "chassisredirect".  This will cause the packet to be
directed to a specific chassis to carry out the egress logical router
pipeline, in the same way that a logical switch forwards egress traffic
to a VIF port residing on a specific chassis.  At the beginning of the
egress pipeline, the "outport" will be reset to the value of the
distributed port.
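
As a rough illustration of the "outport" reset at the start of the
egress pipeline, the sketch below models the lookup on plain structs.
The struct and function names are hypothetical stand-ins; the patch
itself does this in consider_port_binding() with OpenFlow actions and
uses the "distributed-port" option of the port binding.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for an SB Port_Binding row. */
struct binding {
    const char *logical_port;
    const char *type;              /* "patch", "chassisredirect", ... */
    const char *distributed_port;  /* Only set for "chassisredirect". */
    uint64_t datapath;             /* Datapath tunnel key. */
    uint32_t tunnel_key;           /* Port tunnel key. */
};

static const struct binding *
find_binding(const struct binding *tbl, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (!strcmp(tbl[i].logical_port, name)) {
            return &tbl[i];
        }
    }
    return NULL;
}

/* Stores the outport key to use for egress in '*outport' and returns
 * true, or returns false if the packet should be dropped. */
static bool
egress_outport(const struct binding *tbl, size_t n,
               const struct binding *b, uint32_t *outport)
{
    if (strcmp(b->type, "chassisredirect")) {
        *outport = b->tunnel_key;  /* Ordinary port: keep as is. */
        return true;
    }
    const struct binding *d = b->distributed_port
        ? find_binding(tbl, n, b->distributed_port) : NULL;
    if (!d || d->datapath != b->datapath) {
        return false;  /* Missing port or wrong datapath: drop. */
    }
    *outport = d->tunnel_key;      /* Reset to the distributed port. */
    return true;
}
```

The two failure cases mirror the warnings logged in physical.c: no
port binding for the distributed port, or a distributed port in a
different datapath.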

For outbound traffic to be handled in a centralized manner, the
"outport" should be set to the "chassisredirect" port representing
centralized gateway functionality in the otherwise distributed router.
For outbound traffic to be handled in a distributed manner, locally on
the source chassis, the "outport" should be set to the existing "patch"
port representing distributed gateway functionality.
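
For the centralized case, the new "Gateway Redirect" ingress table
documented in this patch uses a flow that matches the gateway port as
"outport" and rewrites it to the "chassisredirect" port.  The sketch
below only formats such a flow as strings.  The helper names, the
quoting of port names, and the explicit 15-byte check are assumptions
made for illustration; the patch derives the name with
chassis_redirect_name() and does not yet enforce the length limit.

```c
#include <stddef.h>
#include <stdio.h>

/* Limit noted in the patch for "internal" interface names. */
#define CR_NAME_MAX 15

/* Writes "cr-<port_name>" into 'buf'.  Returns 0 on success, or -1 if
 * the name was truncated or is longer than CR_NAME_MAX bytes. */
static int
cr_port_name(const char *port_name, char *buf, size_t size)
{
    int n = snprintf(buf, size, "cr-%s", port_name);
    return (n >= 0 && (size_t) n < size && n <= CR_NAME_MAX) ? 0 : -1;
}

/* Formats the match and actions of the redirect flow for gateway port
 * 'gw' and chassisredirect port 'cr'.  Returns 0 on success, or -1 if
 * either string did not fit. */
static int
gw_redirect_flow(const char *gw, const char *cr,
                 char *match, size_t match_size,
                 char *actions, size_t actions_size)
{
    int m = snprintf(match, match_size, "outport == \"%s\"", gw);
    int a = snprintf(actions, actions_size,
                     "outport = \"%s\"; next;", cr);
    return (m >= 0 && (size_t) m < match_size
            && a >= 0 && (size_t) a < actions_size) ? 0 : -1;
}
```

With a hypothetical gateway port named lrp-gw, this yields the match
outport == "lrp-gw" and the actions outport = "cr-lrp-gw"; next;.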

Inbound traffic will be directed to the appropriate chassis by
restricting source MAC address usage and ARP responses to that chassis,
or by running dynamic routing protocols.

Note that "chassisredirect" ports have no associated IP or MAC addresses.
Any pipeline stages that depend on port specific IP or MAC addresses
should be carried out in the context of the distributed port.

Although the abstraction represented by the "chassisredirect" port
binding is generalized, in this patch the "chassisredirect" port binding
is only created for NB logical router ports that specify the new
"redirect-chassis" option.  There is no explicit notion of a
"chassisredirect" port in the NB database.  The expectation is that,
when capabilities that take advantage of "chassisredirect" ports are
implemented (e.g. NAT), the addition of flows specifying a
"chassisredirect" port as the outport will also be triggered by the
presence of the "redirect-chassis" option.

Signed-off-by: Mickey Spiegel <mickeys.dev@gmail.com>
---
 ovn/controller/binding.c            | 143 ++++++++++++++++-
 ovn/controller/ovn-controller.8.xml |  15 ++
 ovn/controller/ovn-controller.c     |   3 +-
 ovn/controller/physical.c           |  68 ++++++--
 ovn/controller/physical.h           |   2 +
 ovn/northd/ovn-northd.8.xml         |  51 +++++-
 ovn/northd/ovn-northd.c             | 179 ++++++++++++++++++++--
 ovn/ovn-nb.ovsschema                |   9 +-
 ovn/ovn-nb.xml                      |  26 ++++
 ovn/ovn-sb.xml                      |  35 +++++
 tests/ovn.at                        | 298 ++++++++++++++++++++++++++++++++++++
 11 files changed, 801 insertions(+), 28 deletions(-)

Patch

diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c
index fb76032..b603a9d 100644
--- a/ovn/controller/binding.c
+++ b/ovn/controller/binding.c
@@ -68,7 +68,8 @@  static void
 get_local_iface_ids(const struct ovsrec_bridge *br_int,
                     struct shash *lport_to_iface,
                     struct sset *all_lports,
-                    struct sset *egress_ifaces)
+                    struct sset *egress_ifaces,
+                    struct shash *chassisredirect_ports)
 {
     int i;
 
@@ -81,10 +82,20 @@  get_local_iface_ids(const struct ovsrec_bridge *br_int,
             continue;
         }
 
+        const char *chassisredirect_port = smap_get(&port_rec->external_ids,
+                                                   "ovn-chassisredirect-port");
+
         for (j = 0; j < port_rec->n_interfaces; j++) {
             const struct ovsrec_interface *iface_rec;
 
             iface_rec = port_rec->interfaces[j];
+
+            if (chassisredirect_port) {
+                shash_add(chassisredirect_ports, chassisredirect_port,
+                          port_rec);
+                break;
+            }
+
             iface_id = smap_get(&iface_rec->external_ids, "iface-id");
 
             if (iface_id) {
@@ -275,13 +286,83 @@  setup_qos(const char *egress_iface, struct hmap *queue_map)
 }
 
 static void
+create_port_from_sb(struct controller_ctx *ctx,
+                    const struct ovsrec_bridge *br_int,
+                    const struct sbrec_port_binding *binding_rec,
+                    struct hmap *local_datapaths,
+                    const char *ovs_port_type,
+                    const char *iface_type)
+{
+    /* This function only handles ports with a single interface */
+    /* If local datapath does not exist return */
+    struct local_datapath *ld = get_local_datapath(local_datapaths,
+                                    binding_rec->datapath->tunnel_key);
+    if (!ld || !ctx->ovs_idl_txn || !binding_rec->logical_port) {
+        return;
+    }
+    struct ovsrec_interface *iface;
+    iface = ovsrec_interface_insert(ctx->ovs_idl_txn);
+    ovsrec_interface_set_name(iface, binding_rec->logical_port);
+    ovsrec_interface_set_type(iface, iface_type);
+
+    struct smap ids;
+    smap_clone(&ids, &binding_rec->options);
+    ovsrec_interface_set_external_ids(iface, &ids);
+    smap_destroy(&ids);
+
+    struct ovsrec_port *port;
+    port = ovsrec_port_insert(ctx->ovs_idl_txn);
+    ovsrec_port_set_name(port, binding_rec->logical_port);
+    ovsrec_port_set_interfaces(port, &iface, 1);
+    const struct smap port_ids = SMAP_CONST1(&port_ids,
+                                             ovs_port_type,
+                                             binding_rec->logical_port);
+    ovsrec_port_set_external_ids(port, &port_ids);
+
+    struct ovsrec_port **ports;
+    ports = xmalloc(sizeof *ports * (br_int->n_ports + 1));
+    memcpy(ports, br_int->ports, sizeof *ports * br_int->n_ports);
+    ports[br_int->n_ports] = port;
+    ovsrec_bridge_verify_ports(br_int);
+    ovsrec_bridge_set_ports(br_int, ports, br_int->n_ports + 1);
+
+    free(ports);
+}
+
+static void
+remove_ovs_port(const struct ovsrec_bridge *bridge,
+                const struct ovsrec_port *port)
+{
+    size_t i;
+    for (i = 0; i < bridge->n_ports; i++) {
+        if (bridge->ports[i] != port) {
+            continue;
+        }
+        struct ovsrec_port **new_ports;
+        new_ports = xmemdup(bridge->ports,
+                sizeof *new_ports * (bridge->n_ports - 1));
+        if (i != bridge->n_ports - 1) {
+            /* Removed port was not last */
+            new_ports[i] = bridge->ports[bridge->n_ports - 1];
+        }
+        ovsrec_bridge_verify_ports(bridge);
+        ovsrec_bridge_set_ports(bridge, new_ports, bridge->n_ports - 1);
+        free(new_ports);
+        ovsrec_port_delete(port);
+        return;
+    }
+}
+
+static void
 consider_local_datapath(struct controller_ctx *ctx,
+                        const struct ovsrec_bridge *br_int,
                         const struct sbrec_chassis *chassis_rec,
                         const struct sbrec_port_binding *binding_rec,
                         struct hmap *qos_map,
                         struct hmap *local_datapaths,
                         struct shash *lport_to_iface,
-                        struct sset *all_lports)
+                        struct sset *all_lports,
+                        struct shash *chassisredirect_ports)
 {
     const struct ovsrec_interface *iface_rec
         = shash_find_data(lport_to_iface, binding_rec->logical_port);
@@ -338,6 +419,47 @@  consider_local_datapath(struct controller_ctx *ctx,
                       binding_rec->logical_port);
             sbrec_port_binding_set_chassis(binding_rec, chassis_rec);
         }
+    } else if (chassis_rec && !strcmp(binding_rec->type,
+                                      "chassisredirect")) {
+        const char *chassis_options = smap_get(&binding_rec->options,
+                                               "redirect-chassis");
+        const struct ovsrec_port *port_rec
+            = shash_find_and_delete(chassisredirect_ports,
+                                    binding_rec->logical_port);
+        if (chassis_options && !strcmp(chassis_options, chassis_rec->name)) {
+            add_local_datapath(local_datapaths, binding_rec);
+            if (!port_rec) {
+                create_port_from_sb(ctx, br_int, binding_rec,
+                                    local_datapaths,
+                                    "ovn-chassisredirect-port",
+                                    "internal");
+            }
+            if (ctx->ovnsb_idl_txn
+                && binding_rec->chassis != chassis_rec) {
+                if (binding_rec->chassis) {
+                    VLOG_INFO("Changing chassis for redirect port %s "
+                              "from %s to %s.",
+                              binding_rec->logical_port,
+                              binding_rec->chassis->name,
+                              chassis_rec->name);
+                } else {
+                    VLOG_INFO("Claiming redirect port %s for this "
+                              "chassis.",
+                              binding_rec->logical_port);
+                }
+                sbrec_port_binding_set_chassis(binding_rec, chassis_rec);
+            }
+        } else {
+            if (ctx->ovnsb_idl_txn
+                && binding_rec->chassis == chassis_rec) {
+                VLOG_INFO("Releasing redirect port %s from this chassis.",
+                          binding_rec->logical_port);
+                sbrec_port_binding_set_chassis(binding_rec, NULL);
+            }
+            if (ctx->ovs_idl && port_rec) {
+                remove_ovs_port(br_int, port_rec);
+            }
+        }
     } else if (!strcmp(binding_rec->type, "l3gateway")) {
         const char *chassis = smap_get(&binding_rec->options,
                                        "l3gateway-chassis");
@@ -372,6 +494,8 @@  binding_run(struct controller_ctx *ctx, const struct ovsrec_bridge *br_int,
     struct shash lport_to_iface = SHASH_INITIALIZER(&lport_to_iface);
     struct sset egress_ifaces = SSET_INITIALIZER(&egress_ifaces);
     struct hmap qos_map;
+    struct shash chassisredirect_ports =
+                                  SHASH_INITIALIZER(&chassisredirect_ports);
 
     chassis_rec = get_chassis(ctx->ovnsb_idl, chassis_id);
     if (!chassis_rec) {
@@ -381,17 +505,17 @@  binding_run(struct controller_ctx *ctx, const struct ovsrec_bridge *br_int,
     hmap_init(&qos_map);
     if (br_int) {
         get_local_iface_ids(br_int, &lport_to_iface, all_lports,
-                            &egress_ifaces);
+                            &egress_ifaces, &chassisredirect_ports);
     }
 
     /* Run through each binding record to see if it is resident on this
      * chassis and update the binding accordingly.  This includes both
      * directly connected logical ports and children of those ports. */
     SBREC_PORT_BINDING_FOR_EACH(binding_rec, ctx->ovnsb_idl) {
-        consider_local_datapath(ctx, chassis_rec, binding_rec,
+        consider_local_datapath(ctx, br_int, chassis_rec, binding_rec,
                                 sset_is_empty(&egress_ifaces) ? NULL :
                                 &qos_map, local_datapaths, &lport_to_iface,
-                                all_lports);
+                                all_lports, &chassisredirect_ports);
 
     }
 
@@ -406,6 +530,15 @@  binding_run(struct controller_ctx *ctx, const struct ovsrec_bridge *br_int,
     shash_destroy(&lport_to_iface);
     sset_destroy(&egress_ifaces);
     hmap_destroy(&qos_map);
+
+    /* At this point, 'chassisredirect_ports' only contains OVS ports with
+     * no corresponding SB port binding.  Delete these ports from OVS. */
+    struct shash_node *node;
+    SHASH_FOR_EACH (node, &chassisredirect_ports) {
+        remove_ovs_port(br_int, node->data);
+    }
+
+    shash_destroy(&chassisredirect_ports);
 }
 
 /* Returns true if the database is all cleaned up, false if more work is
diff --git a/ovn/controller/ovn-controller.8.xml b/ovn/controller/ovn-controller.8.xml
index 9f4dad1..662f41c 100644
--- a/ovn/controller/ovn-controller.8.xml
+++ b/ovn/controller/ovn-controller.8.xml
@@ -283,6 +283,21 @@ 
           logical patch port that it implements.
         </p>
       </dd>
+
+      <dt>
+        <code>external-ids:ovn-chassisredirect-port</code> in the
+        <code>Port</code> table
+      </dt>
+
+      <dd>
+        <p>
+          The presence of this key identifies a port as one created by
+          <code>ovn-controller</code> to implement a
+          <code>chassisredirect</code> logical port.  Its value is the name
+          of the logical port with type=chassisredirect that the port
+          implements.
+        </p>
+      </dd>
     </dl>
 
     <h1>Runtime Management Commands</h1>
diff --git a/ovn/controller/ovn-controller.c b/ovn/controller/ovn-controller.c
index 9829207..03b7e79 100644
--- a/ovn/controller/ovn-controller.c
+++ b/ovn/controller/ovn-controller.c
@@ -541,7 +541,8 @@  main(int argc, char *argv[])
                           chassis_id, &flow_table);
 
                 physical_run(&ctx, mff_ovn_geneve,
-                             br_int, chassis_id, &ct_zones, &flow_table,
+                             br_int, chassis_id,
+                             &lports, &ct_zones, &flow_table,
                              &local_datapaths, &patched_datapaths);
 
                 ofctrl_put(&flow_table, &pending_ct_zones,
diff --git a/ovn/controller/physical.c b/ovn/controller/physical.c
index 48adb78..bc8ba4e 100644
--- a/ovn/controller/physical.c
+++ b/ovn/controller/physical.c
@@ -19,6 +19,7 @@ 
 #include "flow.h"
 #include "lflow.h"
 #include "lib/poll-loop.h"
+#include "lport.h"
 #include "ofctrl.h"
 #include "openvswitch/hmap.h"
 #include "openvswitch/match.h"
@@ -159,6 +160,7 @@  consider_port_binding(enum mf_field_id mff_ovn_geneve,
                       struct hmap *local_datapaths,
                       struct hmap *patched_datapaths,
                       const struct sbrec_port_binding *binding,
+                      const struct lport_index *lports,
                       struct ofpbuf *ofpacts_p,
                       struct hmap *flow_table)
 {
@@ -199,6 +201,9 @@  consider_port_binding(enum mf_field_id mff_ovn_geneve,
      *       The same logic handles logical patch ports, as well as
      *       localnet patch ports.
      *
+     *       If the port is a chassisredirect port on the chassis we are
+     *       managing, the OpenFlow port for the chassisredirect port.
+     *
      *       For a container nested inside a VM and accessible via a VLAN,
      *       'tag' is the VLAN ID; otherwise 'tag' is 0.
      *
@@ -359,18 +364,54 @@  consider_port_binding(enum mf_field_id mff_ovn_geneve,
         match_set_metadata(&match, htonll(dp_key));
         match_set_reg(&match, MFF_LOG_OUTPORT - MFF_REG0, port_key);
 
-        if (zone_id) {
-            put_load(zone_id, MFF_LOG_CT_ZONE, 0, 32, ofpacts_p);
-        }
-        if (zone_id_dnat) {
-            put_load(zone_id_dnat, MFF_LOG_DNAT_ZONE, 0, 32, ofpacts_p);
+        /* If this is a chassisredirect port, reset the outport to
+         * the distributed port */
+        bool chassisredirect_error = false;
+        if (!strcmp(binding->type, "chassisredirect")) {
+            const char *distributed_port = smap_get(&binding->options,
+                                                    "distributed-port");
+            const struct sbrec_port_binding *distributed_binding;
+            distributed_binding = lport_lookup_by_name(lports,
+                                                       distributed_port);
+            if (!distributed_binding) {
+                chassisredirect_error = true; /* Packet will be dropped. */
+                static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
+                VLOG_WARN_RL(&rl, "No port binding record for distributed "
+                         "port %s referred by chassisredirect port %s",
+                         distributed_port,
+                         binding->logical_port);
+            } else if (binding->datapath !=
+                       distributed_binding->datapath) {
+                static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
+                VLOG_WARN_RL(&rl,
+                             "chassisredirect port %s refers to "
+                             "distributed port %s in wrong datapath",
+                             binding->logical_port,
+                             distributed_port);
+                chassisredirect_error = true; /* Packet will be dropped. */
+            } else {
+                put_load(distributed_binding->tunnel_key,
+                         MFF_LOG_OUTPORT, 0, 32, ofpacts_p);
+                zone_id = simap_get(ct_zones, distributed_port);
+            }
         }
-        if (zone_id_snat) {
-            put_load(zone_id_snat, MFF_LOG_SNAT_ZONE, 0, 32, ofpacts_p);
+
+        if (!chassisredirect_error) {
+            /* Normal handling if not chassisredirect port or no error. */
+            if (zone_id) {
+                put_load(zone_id, MFF_LOG_CT_ZONE, 0, 32, ofpacts_p);
+            }
+            if (zone_id_dnat) {
+                put_load(zone_id_dnat, MFF_LOG_DNAT_ZONE, 0, 32, ofpacts_p);
+            }
+            if (zone_id_snat) {
+                put_load(zone_id_snat, MFF_LOG_SNAT_ZONE, 0, 32, ofpacts_p);
+            }
+
+            /* Resubmit to table 34. */
+            put_resubmit(OFTABLE_CHECK_LOOPBACK, ofpacts_p);
         }
 
-        /* Resubmit to table 34. */
-        put_resubmit(OFTABLE_CHECK_LOOPBACK, ofpacts_p);
         ofctrl_add_flow(flow_table, OFTABLE_LOCAL_OUTPUT, 100,
                         &match, ofpacts_p);
 
@@ -630,6 +671,7 @@  consider_mc_group(enum mf_field_id mff_ovn_geneve,
 void
 physical_run(struct controller_ctx *ctx, enum mf_field_id mff_ovn_geneve,
              const struct ovsrec_bridge *br_int, const char *this_chassis_id,
+             const struct lport_index *lports,
              const struct simap *ct_zones, struct hmap *flow_table,
              struct hmap *local_datapaths, struct hmap *patched_datapaths)
 {
@@ -661,6 +703,8 @@  physical_run(struct controller_ctx *ctx, enum mf_field_id mff_ovn_geneve,
                                         "ovn-l3gateway-port");
         const char *logpatch = smap_get(&port_rec->external_ids,
                                         "ovn-logical-patch-port");
+        const char *chassisredirect = smap_get(&port_rec->external_ids,
+                                               "ovn-chassisredirect-port");
 
         for (int j = 0; j < port_rec->n_interfaces; j++) {
             const struct ovsrec_interface *iface_rec = port_rec->interfaces[j];
@@ -693,6 +737,10 @@  physical_run(struct controller_ctx *ctx, enum mf_field_id mff_ovn_geneve,
                 /* Logical patch ports can be handled just like VIFs. */
                 simap_put(&new_localvif_to_ofport, logpatch, ofport);
                 break;
+            } else if (!strcmp(iface_rec->type, "internal") &&
+                       chassisredirect) {
+                simap_put(&new_localvif_to_ofport, chassisredirect, ofport);
+                break;
             } else if (chassis_id) {
                 enum chassis_tunnel_type tunnel_type;
                 if (!strcmp(iface_rec->type, "geneve")) {
@@ -782,7 +830,7 @@  physical_run(struct controller_ctx *ctx, enum mf_field_id mff_ovn_geneve,
     const struct sbrec_port_binding *binding;
     SBREC_PORT_BINDING_FOR_EACH (binding, ctx->ovnsb_idl) {
         consider_port_binding(mff_ovn_geneve, ct_zones, local_datapaths,
-                              patched_datapaths, binding, &ofpacts,
+                              patched_datapaths, binding, lports, &ofpacts,
                               flow_table);
     }
 
diff --git a/ovn/controller/physical.h b/ovn/controller/physical.h
index 86ce93c..8f66f62 100644
--- a/ovn/controller/physical.h
+++ b/ovn/controller/physical.h
@@ -32,6 +32,7 @@  struct hmap;
 struct ovsdb_idl;
 struct ovsrec_bridge;
 struct simap;
+struct lport_index;
 
 /* OVN Geneve option information.
  *
@@ -43,6 +44,7 @@  struct simap;
 void physical_register_ovs_idl(struct ovsdb_idl *);
 void physical_run(struct controller_ctx *, enum mf_field_id mff_ovn_geneve,
                   const struct ovsrec_bridge *br_int, const char *chassis_id,
+                  const struct lport_index *lports,
                   const struct simap *ct_zones, struct hmap *flow_table,
                   struct hmap *local_datapaths, struct hmap *patched_datapaths);
 
diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
index f3c1682..a57afac 100644
--- a/ovn/northd/ovn-northd.8.xml
+++ b/ovn/northd/ovn-northd.8.xml
@@ -942,6 +942,18 @@  next;
         </pre>
 
         <p>
+          For the gateway port <var>P</var> on a distributed router,
+          an additional flow is programmed on chassis other than the
+          <code>redirect-chassis</code>.  This priority-95 flow drops
+          packets that match <code>ip4.dst == <var>A</var>
+          &amp;&amp; inport == <var>P</var> &amp;&amp; icmp4.type == 8
+          &amp;&amp; icmp4.code == 0</code> (ICMP echo request).
+          This restricts ICMP echo replies on the gateway port to
+          only the distributed router instance on the
+          <code>redirect-chassis</code>.
+        </p>
+
+        <p>
           Flows for ICMPv6 echo requests use the following actions:
         </p>
 
@@ -980,6 +992,17 @@  outport = <var>P</var>;
 flags.loopback = 1;
 output;
         </pre>
+
+        <p>
+          For the gateway port on a distributed logical router (where
+          one of the logical router ports specifies a
+          <code>redirect-chassis</code>), the above flows are only
+          programmed on the gateway port instance on the
+          <code>redirect-chassis</code>.  This behavior avoids generation
+          of multiple ARP responses from different chassis, and allows
+          upstream MAC learning to point to the
+          <code>redirect-chassis</code>.
+        </p>
       </li>
 
       <li>
@@ -1485,7 +1508,33 @@  next;
       </li>
     </ul>
 
-    <h3>Ingress Table 7: ARP Request</h3>
+    <h3>Ingress Table 7: Gateway Redirect</h3>
+
+    <p>
+      For distributed logical routers where one of the logical router
+      ports specifies a <code>redirect-chassis</code>, this table redirects
+      certain packets to the gateway port instance on the
+      <code>redirect-chassis</code>.  This table has the following flows:
+    </p>
+
+    <ul>
+      <li>
+        A priority-50 logical flow with match
+        <code>outport == <var>GW</var></code> has actions
+        <code>outport = <var>CR</var>; next;</code>, where
+        <var>GW</var> is the logical router gateway port and
+        <var>CR</var> is the <code>chassisredirect</code> port
+        representing the instance of the logical router gateway
+        port on the <code>redirect-chassis</code>.
+      </li>
+
+      <li>
+        A priority-0 logical flow with match <code>1</code> has actions
+        <code>next;</code>.
+      </li>
+    </ul>
+
+    <h3>Ingress Table 8: ARP Request</h3>
 
     <p>
       In the common case where the Ethernet destination has been resolved, this
diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index f2dc353..05a1224 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -132,7 +132,8 @@  enum ovn_stage {
     PIPELINE_STAGE(ROUTER, IN,  DNAT,        4, "lr_in_dnat")         \
     PIPELINE_STAGE(ROUTER, IN,  IP_ROUTING,  5, "lr_in_ip_routing")   \
     PIPELINE_STAGE(ROUTER, IN,  ARP_RESOLVE, 6, "lr_in_arp_resolve")  \
-    PIPELINE_STAGE(ROUTER, IN,  ARP_REQUEST, 7, "lr_in_arp_request")  \
+    PIPELINE_STAGE(ROUTER, IN,  GW_REDIRECT, 7, "lr_in_gw_redirect")  \
+    PIPELINE_STAGE(ROUTER, IN,  ARP_REQUEST, 8, "lr_in_arp_request")  \
                                                                       \
     /* Logical router egress stages. */                               \
     PIPELINE_STAGE(ROUTER, OUT, SNAT,      0, "lr_out_snat")          \
@@ -382,6 +383,15 @@  struct ovn_datapath {
 
     /* IPAM data. */
     struct hmap ipam;
+
+    /* OVN northd only needs to know about the logical router gateway port for
+     * NAT on a distributed router.  This is populated only when there is a
+     * "redirect-chassis" specified for one of the ports on the logical router.
+     * Otherwise this will be NULL. */
+    struct ovn_port *l3gateway_port;
+    /* The "derived" OVN port representing the instance of l3gateway_port on
+     * the "redirect-chassis". */
+    struct ovn_port *l3redirect_port;
 };
 
 struct macam_node {
@@ -658,6 +668,9 @@  struct ovn_port {
 
     struct lport_addresses lrp_networks;
 
+    bool derived; /* Indicates whether this is an additional port
+                   * derived from nbsp or nbrp. */
+
     /* The port's peer:
      *
      *     - A switch port S of type "router" has a router port R as a peer,
@@ -687,6 +700,7 @@  ovn_port_create(struct hmap *ports, const char *key,
     op->sb = sb;
     op->nbsp = nbsp;
     op->nbrp = nbrp;
+    op->derived = false;
     hmap_insert(ports, &op->key_node, hash_string(op->key, 0));
     return op;
 }
@@ -737,6 +751,16 @@  ovn_port_allocate_key(struct ovn_datapath *od)
                           (1u << 15) - 1, &od->port_key_hint);
 }
 
+static char *
+chassis_redirect_name(const char *port_name)
+/* XXX Need to derive unique name of max length 15 bytes since chassisredirect
+ * ports use interface type "internal".  If name is greater than 15 bytes, no
+ * ofport is assigned, so chassisredirect port is never treated as local and
+ * the necessary flows are not created. */
+{
+    return xasprintf("cr-%s", port_name);
+}
+
 static bool
 ipam_is_duplicate_mac(struct eth_addr *ea, uint64_t mac64, bool warn)
 {
@@ -1298,6 +1322,52 @@  join_logical_ports(struct northd_context *ctx,
                 op->lrp_networks = lrp_networks;
                 op->od = od;
                 ipam_add_port_addresses(op->od, op);
+
+                const char *redirect_chassis = smap_get(&op->nbrp->options,
+                                                        "redirect-chassis");
+                if (redirect_chassis) {
+                    /* Additional "derived" ovn_port crp represents the
+                     * instance of op on the "redirect-chassis". */
+                    const char *gw_chassis = smap_get(&op->od->nbr->options,
+                                                   "chassis");
+                    if (gw_chassis) {
+                        static struct vlog_rate_limit rl
+                            = VLOG_RATE_LIMIT_INIT(1, 1);
+                        VLOG_WARN_RL(&rl, "Bad configuration: "
+                                     "redirect-chassis configured on port %s "
+                                     "on L3 gateway router", nbrp->name);
+                        continue;
+                    }
+                    char *redirect_name = chassis_redirect_name(nbrp->name);
+                    struct ovn_port *crp = ovn_port_find(ports, redirect_name);
+                    if (crp) {
+                        crp->derived = true;
+                        crp->nbrp = nbrp;
+                        ovs_list_remove(&crp->list);
+                        ovs_list_push_back(both, &crp->list);
+                    } else {
+                        crp = ovn_port_create(ports, redirect_name,
+                                              NULL, nbrp, NULL);
+                        crp->derived = true;
+                        ovs_list_push_back(nb_only, &crp->list);
+                    }
+                    crp->od = od;
+                    free(redirect_name);
+
+                    /* Set l3gateway_port and l3redirect_port in od, for later
+                     * use during flow creation. */
+                    if (od->l3gateway_port || od->l3redirect_port) {
+                        static struct vlog_rate_limit rl
+                            = VLOG_RATE_LIMIT_INIT(1, 1);
+                        VLOG_WARN_RL(&rl, "Bad configuration: multiple ports "
+                                     "with redirect-chassis on same logical "
+                                     "router %s", od->nbr->name);
+                        continue;
+                    } else {
+                        od->l3gateway_port = op;
+                        od->l3redirect_port = crp;
+                    }
+                }
             }
         }
     }
@@ -1306,7 +1376,7 @@  join_logical_ports(struct northd_context *ctx,
      * to their peers. */
     struct ovn_port *op;
     HMAP_FOR_EACH (op, key_node, ports) {
-        if (op->nbsp && !strcmp(op->nbsp->type, "router")) {
+        if (op->nbsp && !strcmp(op->nbsp->type, "router") && !op->derived) {
             const char *peer_name = smap_get(&op->nbsp->options, "router-port");
             if (!peer_name) {
                 continue;
@@ -1323,7 +1393,7 @@  join_logical_ports(struct northd_context *ctx,
                 op->od->router_ports,
                 sizeof *op->od->router_ports * (op->od->n_router_ports + 1));
             op->od->router_ports[op->od->n_router_ports++] = op;
-        } else if (op->nbrp && op->nbrp->peer) {
+        } else if (op->nbrp && op->nbrp->peer && !op->derived) {
             struct ovn_port *peer = ovn_port_find(ports, op->nbrp->peer);
             if (peer) {
                 if (peer->nbrp) {
@@ -1353,18 +1423,29 @@  ovn_port_update_sbrec(const struct ovn_port *op,
         /* If the router is for l3 gateway, it resides on a chassis
          * and its port type is "l3gateway". */
         const char *chassis = smap_get(&op->od->nbr->options, "chassis");
-        if (chassis) {
+        if (op->derived) {
+            sbrec_port_binding_set_type(op->sb, "chassisredirect");
+        } else if (chassis) {
             sbrec_port_binding_set_type(op->sb, "l3gateway");
         } else {
             sbrec_port_binding_set_type(op->sb, "patch");
         }
 
-        const char *peer = op->peer ? op->peer->key : "<error>";
         struct smap new;
         smap_init(&new);
-        smap_add(&new, "peer", peer);
-        if (chassis) {
-            smap_add(&new, "l3gateway-chassis", chassis);
+        if (op->derived) {
+            const char *redirect_chassis = smap_get(&op->nbrp->options,
+                                                    "redirect-chassis");
+            if (redirect_chassis) {
+                smap_add(&new, "redirect-chassis", redirect_chassis);
+            }
+            smap_add(&new, "distributed-port", op->nbrp->name);
+        } else {
+            const char *peer = op->peer ? op->peer->key : "<error>";
+            smap_add(&new, "peer", peer);
+            if (chassis) {
+                smap_add(&new, "l3gateway-chassis", chassis);
+            }
         }
         sbrec_port_binding_set_options(op->sb, &new);
         smap_destroy(&new);
@@ -3547,6 +3628,12 @@  build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
             continue;
         }
 
+        if (op->derived) {
+            /* No ingress packets should be received on a chassisredirect
+             * port. */
+            continue;
+        }
+
         ds_clear(&match);
         ds_put_format(&match, "(eth.mcast || eth.dst == %s) && inport == %s",
                       op->lrp_networks.ea_s, op->json_key);
@@ -3612,6 +3699,11 @@  build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
             continue;
         }
 
+        if (op->derived) {
+            /* No ingress packets are accepted on a chassisredirect
+             * port, so no need to program flows for that port. */
+            continue;
+        }
 
         if (op->lrp_networks.n_ipv4_addrs) {
             /* L3 admission control: drop packets that originate from an
@@ -3642,6 +3734,21 @@  build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
                 "next; ");
             ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_INPUT, 90,
                           ds_cstr(&match), ds_cstr(&actions));
+
+            /* Add flow dropping ICMP echo requests for router's IP
+             * address that are received on l3gateway_port, on all
+             * chassis except the "redirect-chassis".  Traffic with
+             * eth.src = l3gateway_port->lrp_networks.ea_s should only
+             * be sent from the "redirect-chassis", so that upstream
+             * MAC learning points to the "redirect-chassis". */
+            if (op->od->l3gateway_port && op->od->l3redirect_port) {
+                ds_put_format(&match, " && inport == %s"
+                                      " && !is_chassis_resident(%s)",
+                              op->od->l3gateway_port->json_key,
+                              op->od->l3redirect_port->json_key);
+                ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_INPUT, 95,
+                              ds_cstr(&match), "drop;");
+            }
         }
 
         /* ARP reply.  These flows reply to ARP requests for the router's own
@@ -3651,6 +3758,16 @@  build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
             ds_put_format(&match,
                           "inport == %s && arp.tpa == %s && arp.op == 1",
                           op->json_key, op->lrp_networks.ipv4_addrs[i].addr_s);
+            if (op->od->l3gateway_port && op == op->od->l3gateway_port
+                && op->od->l3redirect_port) {
+                /* Traffic with eth.src = l3gateway_port->lrp_networks.ea_s
+                 * should only be sent from the "redirect-chassis", so that
+                 * upstream MAC learning points to the "redirect-chassis".
+                 * Also need to avoid generation of multiple ARP responses
+                 * from different chassis. */
+                ds_put_format(&match, " && is_chassis_resident(%s)",
+                              op->od->l3redirect_port->json_key);
+            }
 
             ds_clear(&actions);
             ds_put_format(&actions,
@@ -3837,6 +3954,12 @@  build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
             continue;
         }
 
+        if (op->derived) {
+            /* No ingress packets are accepted on a chassisredirect
+             * port, so no need to program flows for that port. */
+            continue;
+        }
+
         if (op->lrp_networks.n_ipv6_addrs) {
             /* L3 admission control: drop packets that originate from an
              * IPv6 address owned by the router (priority 100). */
@@ -4391,7 +4514,37 @@  build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
                       "get_nd(outport, xxreg0); next;");
     }
 
-    /* Local router ingress table 7: ARP request.
+    /* Logical router ingress table 7: Gateway redirect.
+     *
+     * For traffic with outport equal to the l3gateway_port
+     * on a distributed router, this table redirects a subset
+     * of the traffic to the l3redirect_port which represents
+     * the central instance of the l3gateway_port.
+     */
+    HMAP_FOR_EACH (od, key_node, datapaths) {
+        if (!od->nbr) {
+            continue;
+        }
+        if (od->l3gateway_port && od->l3redirect_port) {
+            /* For traffic with outport == l3gateway_port, if the
+             * packet did not match any higher priority redirect
+             * rule, then the traffic is redirected to the central
+             * instance of the l3gateway_port. */
+            ds_clear(&match);
+            ds_put_format(&match, "outport == %s",
+                          od->l3gateway_port->json_key);
+            ds_clear(&actions);
+            ds_put_format(&actions, "outport = %s; next;",
+                          od->l3redirect_port->json_key);
+            ovn_lflow_add(lflows, od, S_ROUTER_IN_GW_REDIRECT, 50,
+                          ds_cstr(&match), ds_cstr(&actions));
+        }
+
+        /* Packets are allowed by default. */
+        ovn_lflow_add(lflows, od, S_ROUTER_IN_GW_REDIRECT, 0, "1", "next;");
+    }
+
+    /* Logical router ingress table 8: ARP request.
      *
      * In the common case where the Ethernet destination has been resolved,
      * this table outputs the packet (priority 0).  Otherwise, it composes
@@ -4427,6 +4580,14 @@  build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
             continue;
         }
 
+        if (op->derived) {
+            /* No egress packets should be processed in the context of
+             * a chassisredirect port.  The chassisredirect port should
+             * be replaced by the l3gateway port in the local output
+             * pipeline stage before egress processing. */
+            continue;
+        }
+
         ds_clear(&match);
         ds_put_format(&match, "outport == %s", op->json_key);
         ovn_lflow_add(lflows, op->od, S_ROUTER_OUT_DELIVERY, 100,
diff --git a/ovn/ovn-nb.ovsschema b/ovn/ovn-nb.ovsschema
index 65f2d7c..51f84ea 100644
--- a/ovn/ovn-nb.ovsschema
+++ b/ovn/ovn-nb.ovsschema
@@ -1,7 +1,7 @@ 
 {
     "name": "OVN_Northbound",
-    "version": "5.4.1",
-    "cksum": "3773248894 11490",
+    "version": "5.5.0",
+    "cksum": "376874946 11703",
     "tables": {
         "NB_Global": {
             "columns": {
@@ -182,6 +182,11 @@ 
         "Logical_Router_Port": {
             "columns": {
                 "name": {"type": "string"},
+                "options": {
+                    "type": {"key": "string",
+                             "value": "string",
+                             "min": 0,
+                             "max": "unlimited"}},
                 "networks": {"type": {"key": "string",
                                       "min": 1,
                                       "max": "unlimited"}},
diff --git a/ovn/ovn-nb.xml b/ovn/ovn-nb.xml
index 3e40881..6a9a19f 100644
--- a/ovn/ovn-nb.xml
+++ b/ovn/ovn-nb.xml
@@ -1060,6 +1060,32 @@ 
       port has all ingress and egress traffic dropped.
     </column>
 
+    <group title="Options">
+      <p>
+        Additional options for the logical router port.
+      </p>
+
+      <column name="options" key="redirect-chassis">
+        <p>
+          If set, this indicates a desire to create (in the southbound
+          database) an additional port that represents a particular instance,
+          bound to a specific chassis, of this otherwise distributed logical
+          router port.  This additional port can then be specified as an
+          <code>outport</code> in ingress pipeline flows.  This will cause
+          matching packets to be directed to a specific chassis to carry out
+          the egress pipeline.  At the beginning of the egress pipeline, the
+          <code>outport</code> will be reset to the value of the distributed
+          port.
+        </p>
+
+        <p>
+          This option specifies the name of the <code>chassis</code> to which
+          the additional southbound port binding of type
+          <code>chassisredirect</code> will be bound.
+        </p>
+      </column>
+    </group>
+
     <group title="Attachment">
       <p>
         A given router port serves one of two purposes:
diff --git a/ovn/ovn-sb.xml b/ovn/ovn-sb.xml
index 65191ed..a01102b 100644
--- a/ovn/ovn-sb.xml
+++ b/ovn/ovn-sb.xml
@@ -1771,6 +1771,21 @@  tcp.flags = RST;
             table="Port_Binding"/>:<code>vtep-logical-switch</code> must also
             be defined.
           </dd>
+
+          <dt><code>chassisredirect</code></dt>
+          <dd>
+            A logical port that represents a particular instance, bound
+            to a specific chassis, of an otherwise distributed parent
+            port (e.g. of type <code>patch</code>).  A
+            <code>chassisredirect</code> port should never be used as an
+            <code>inport</code>.  When an ingress pipeline sets the
+            <code>outport</code>, it may set the value to a logical port
+            of type <code>chassisredirect</code>.  This will cause the
+            packet to be directed to a specific chassis to carry out the
+            egress pipeline.  At the beginning of the egress pipeline,
+            the <code>outport</code> will be reset to the value of the
+            distributed port.
+          </dd>
         </dl>
       </column>
     </group>
@@ -1927,6 +1942,26 @@  tcp.flags = RST;
       </column>
     </group>
 
+    <group title="Chassis Redirect Options">
+      <p>
+        These options apply to logical ports with <ref column="type"/>
+        of <code>chassisredirect</code>.
+      </p>
+
+      <column name="options" key="distributed-port">
+        The name of the distributed port for which this
+        <code>chassisredirect</code> port represents a particular instance.
+      </column>
+
+      <column name="options" key="redirect-chassis">
+        The <code>chassis</code> that this <code>chassisredirect</code> port
+        is bound to.  This is taken from <ref table="Logical_Router_Port"
+        column="options" key="redirect-chassis" db="OVN_Northbound"/>
+        in the OVN_Northbound database's <ref table="Logical_Router_Port"
+        db="OVN_Northbound"/> table.
+      </column>
+    </group>
+
     <group title="Nested Containers">
       <p>
         These columns support containers nested within a VM.  Specifically,
diff --git a/tests/ovn.at b/tests/ovn.at
index e33efc3..d38d2c7 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -6037,3 +6037,301 @@  OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
 OVS_APP_EXIT_AND_WAIT([ovsdb-server])
 
 AT_CLEANUP
+
+AT_SETUP([ovn -- 1 LR with distributed router gateway port])
+AT_SKIP_IF([test $HAVE_PYTHON = no])
+ovn_start
+
+# Logical network:
+# One LR R1 that has switches foo (192.168.1.0/24) and
+# alice (172.16.1.0/24) connected to it.  The logical port
+# between R1 and alice has a "redirect-chassis" specified,
+# i.e. it is the distributed router gateway port.
+# Switch alice also has a localnet port defined.
+# An additional switch outside has a localnet port and the
+# same subnet as alice (172.16.1.0/24).
+
+# Physical network:
+# Three hypervisors hv[123].
+# hv1 hosts vif foo1.
+# hv2 is the "redirect-chassis" that hosts the distributed
+# router gateway port.
+# hv3 hosts vif outside1.
+# In order to show that connectivity works only through hv2,
+# an initial round of tests is run without any bridge-mapping
+# defined for the localnet on hv2.  These tests are expected
+# to fail.
+# Subsequent tests are run after defining the bridge-mapping
+# for the localnet on hv2. These tests are expected to succeed.
+
+# Create three hypervisors and create OVS ports corresponding
+# to logical ports.
+net_add n1
+
+sim_add hv1
+as hv1
+ovs-vsctl add-br br-phys
+ovn_attach n1 br-phys 192.168.0.1
+ovs-vsctl -- add-port br-int hv1-vif1 -- \
+    set interface hv1-vif1 external-ids:iface-id=foo1 \
+    options:tx_pcap=hv1/vif1-tx.pcap \
+    options:rxq_pcap=hv1/vif1-rx.pcap \
+    ofport-request=1
+# The following port can be deleted once extended localnet is implemented
+ovs-vsctl -- add-port br-int hv1-vif2 -- \
+    set interface hv1-vif2 external-ids:iface-id=alice1 \
+    options:tx_pcap=hv1/vif2-tx.pcap \
+    options:rxq_pcap=hv1/vif2-rx.pcap \
+    ofport-request=1
+
+sim_add hv2
+as hv2
+ovs-vsctl add-br br-phys
+ovn_attach n1 br-phys 192.168.0.2
+# The following port can be deleted once extended localnet is implemented
+ovs-vsctl -- add-port br-int hv2-vif1 -- \
+    set interface hv2-vif1 external-ids:iface-id=alice2 \
+    options:tx_pcap=hv2/vif1-tx.pcap \
+    options:rxq_pcap=hv2/vif1-rx.pcap \
+    ofport-request=1
+
+sim_add hv3
+as hv3
+ovs-vsctl add-br br-phys
+ovn_attach n1 br-phys 192.168.0.3
+ovs-vsctl -- add-port br-int hv3-vif1 -- \
+    set interface hv3-vif1 external-ids:iface-id=outside1 \
+    options:tx_pcap=hv3/vif1-tx.pcap \
+    options:rxq_pcap=hv3/vif1-rx.pcap \
+    ofport-request=1
+
+# Pre-populate the hypervisors' ARP tables so that we don't lose any
+# packets for ARP resolution (native tunneling doesn't queue packets
+# for ARP resolution).
+ovn_populate_arp
+
+ovn-nbctl create Logical_Router name=R1
+
+ovn-nbctl ls-add foo
+ovn-nbctl ls-add alice
+ovn-nbctl ls-add outside
+
+# Connect foo to R1
+ovn-nbctl lrp-add R1 foo 00:00:01:01:02:03 192.168.1.1/24
+ovn-nbctl lsp-add foo rp-foo -- set Logical_Switch_Port rp-foo \
+    type=router options:router-port=foo addresses=\"00:00:01:01:02:03\"
+
+# Connect alice to R1 as distributed router gateway port on hv2
+ovn-nbctl lrp-add R1 alice 00:00:02:01:02:03 172.16.1.1/24 \
+    -- set Logical_Router_Port alice options:redirect-chassis="hv2"
+ovn-nbctl lsp-add alice rp-alice -- set Logical_Switch_Port rp-alice \
+    type=router options:router-port=alice addresses=\"00:00:02:01:02:03\"
+
+# Create logical port foo1 in foo
+ovn-nbctl lsp-add foo foo1 \
+-- lsp-set-addresses foo1 "f0:00:00:01:02:03 192.168.1.2"
+
+# Create logical port outside1 in outside
+ovn-nbctl lsp-add outside outside1 \
+-- lsp-set-addresses outside1 "f0:00:00:01:02:04 172.16.1.3"
+
+# Create logical port alice1 in alice
+ovn-nbctl lsp-add alice alice1 \
+-- lsp-set-addresses alice1 "f0:00:00:01:02:05"
+
+# Create logical port alice2 in alice
+ovn-nbctl lsp-add alice alice2 \
+-- lsp-set-addresses alice2 "f0:00:00:01:02:06"
+
+# Create localnet port in alice
+ovn-nbctl lsp-add alice ln-alice
+ovn-nbctl lsp-set-addresses ln-alice unknown
+ovn-nbctl lsp-set-type ln-alice localnet
+ovn-nbctl lsp-set-options ln-alice network_name=phys
+
+# Create localnet port in outside
+ovn-nbctl lsp-add outside ln-outside
+ovn-nbctl lsp-set-addresses ln-outside unknown
+ovn-nbctl lsp-set-type ln-outside localnet
+ovn-nbctl lsp-set-options ln-outside network_name=phys
+
+# Create bridge-mappings on hv1 and hv3, leaving hv2 for later
+as hv1 ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys
+as hv3 ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys
+
+
+# Allow some time for ovn-northd and ovn-controller to catch up.
+# XXX This should be more systematic.
+sleep 2
+
+echo "---------NB dump-----"
+ovn-nbctl show
+echo "---------------------"
+ovn-nbctl list logical_router
+echo "---------------------"
+ovn-nbctl list logical_router_port
+echo "---------------------"
+
+echo "---------SB dump-----"
+ovn-sbctl list datapath_binding
+echo "---------------------"
+ovn-sbctl list port_binding
+echo "---------------------"
+ovn-sbctl dump-flows
+echo "---------------------"
+ovn-sbctl list chassis
+ovn-sbctl list encap
+echo "---------------------"
+
+echo "------ hv1 dump ----------"
+as hv1 ovs-ofctl show br-int
+as hv1 ovs-ofctl dump-flows br-int
+echo "------ hv2 dump ----------"
+as hv2 ovs-ofctl show br-int
+as hv2 ovs-ofctl dump-flows br-int
+echo "------ hv3 dump ----------"
+as hv3 ovs-ofctl show br-int
+as hv3 ovs-ofctl dump-flows br-int
+echo "--------------------------"
+
+# Check that redirect mapping is programmed only on hv2
+# XXX Should these checks be rewritten not to reference register numbers,
+#     in case they change in the future?  Perhaps just count lines like
+#     the arp reply flow checks below?
+AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=33 | grep reg15=0x3,metadata=0x1 | ofctl_strip])
+AT_CHECK([as hv2 ovs-ofctl dump-flows br-int table=33 | grep reg15=0x3,metadata=0x1 | ofctl_strip], [0],
+[ table=33, priority=100,reg15=0x3,metadata=0x1 actions=load:0x2->NXM_NX_REG15@<:@@:>@,resubmit(,34)
+])
+# Check that hv1 sends chassisredirect port traffic to hv2
+AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=32 | grep reg15=0x3,metadata=0x1 | ofctl_strip], [0],
+[ table=32, priority=100,reg15=0x3,metadata=0x1 actions=load:0x1->NXM_NX_TUN_ID@<:@0..23@:>@,set_field:0x3/0xffffffff->tun_metadata0,move:NXM_NX_REG14@<:@0..14@:>@->NXM_NX_TUN_METADATA0@<:@16..30@:>@,output:3
+])
+AT_CHECK([as hv2 ovs-ofctl dump-flows br-int table=32 | grep reg15=0x3,metadata=0x1 | ofctl_strip])
+# Check that arp reply is only programmed on hv2
+AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep arp | grep =0x2,metadata=0x1 | wc -l], [0], [0
+])
+AT_CHECK([as hv2 ovs-ofctl dump-flows br-int | grep arp | grep =0x2,metadata=0x1 | wc -l], [0], [1
+])
+
+
+ip_to_hex() {
+    printf "%02x%02x%02x%02x" "$@"
+}
+
+
+: > hv2-vif1.expected
+: > hv3-vif1.expected
+
+# test_arp HV INPORT SHA SPA TPA [REPLY_HA]
+#
+# Causes a packet to be received on vif INPORT of hypervisor HV.  The packet
+# is an ARP request with SHA, SPA, and TPA as specified.  If REPLY_HA is
+# provided, then it should be the hardware address of the target to expect to
+# receive in an ARP reply; otherwise no reply is expected.
+#
+# HV and INPORT are numbers, e.g. 3 and 1 for hv3-vif1.
+# SHA and REPLY_HA are each 12 hex digits.
+# SPA and TPA are each 8 hex digits.
+test_arp() {
+    local hv=$1 inport=$2 sha=$3 spa=$4 tpa=$5 reply_ha=$6
+    local request=ffffffffffff${sha}08060001080006040001${sha}${spa}ffffffffffff${tpa}
+    as hv$hv ovs-appctl netdev-dummy/receive hv${hv}-vif$inport $request
+
+    if test X$reply_ha != X; then
+        # Expect to receive the reply, if any.
+        local reply=${sha}${reply_ha}08060001080006040002${reply_ha}${tpa}${sha}${spa}
+        echo $reply >> hv${hv}-vif$inport.expected
+    fi
+}
+
+rtr_ip=$(ip_to_hex 172 16 1 1)
+foo_ip=$(ip_to_hex 192 168 1 2)
+outside_ip=$(ip_to_hex 172 16 1 3)
+
+echo $rtr_ip
+echo $foo_ip
+echo $outside_ip
+
+# ARP for router IP address from outside1, no response expected
+test_arp 3 1 f00000010204 $outside_ip $rtr_ip
+
+# Now check the packets actually received against the ones expected.
+OVN_CHECK_PACKETS([hv3/vif1-tx.pcap], [hv3-vif1.expected])
+
+# Send ip packet between foo1 and outside1
+src_mac="f00000010203"
+dst_mac="000001010203"
+src_ip=`ip_to_hex 192 168 1 2`
+dst_ip=`ip_to_hex 172 16 1 3`
+packet=${dst_mac}${src_mac}08004500001c0000000040110000${src_ip}${dst_ip}0035111100080000
+
+# Now check the packets actually received against the ones expected.
+OVN_CHECK_PACKETS([hv3/vif1-tx.pcap], [hv3-vif1.expected])
+
+# Now add bridge-mappings on hv2, which should make everything work
+as hv2 ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys
+
+# Allow some time for ovn-northd and ovn-controller to catch up.
+# XXX This should be more systematic.
+sleep 2
+
+# ARP for router IP address from outside1
+test_arp 3 1 f00000010204 $outside_ip $rtr_ip 000002010203
+
+# Now check the packets actually received against the ones expected.
+OVN_CHECK_PACKETS([hv3/vif1-tx.pcap], [hv3-vif1.expected])
+
+# Send ip packet between foo1 and outside1
+src_mac="f00000010203"
+dst_mac="000001010203"
+src_ip=`ip_to_hex 192 168 1 2`
+dst_ip=`ip_to_hex 172 16 1 3`
+packet=${dst_mac}${src_mac}08004500001c0000000040110000${src_ip}${dst_ip}0035111100080000
+
+# ARP request packet to expect at outside1
+src_mac="000002010203"
+src_ip=`ip_to_hex 172 16 1 1`
+arp_request=ffffffffffff${src_mac}08060001080006040001${src_mac}${src_ip}000000000000${dst_ip}
+
+as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
+
+echo $arp_request >> hv3-vif1.expected
+OVN_CHECK_PACKETS([hv3/vif1-tx.pcap], [hv3-vif1.expected])
+
+# Send ARP reply from outside1 back to the router
+reply_mac="f00000010204"
+arp_reply=${src_mac}${reply_mac}08060001080006040002${reply_mac}${dst_ip}${src_mac}${src_ip}
+
+as hv3 ovs-appctl netdev-dummy/receive hv3-vif1 $arp_reply
+
+# Allow some time for ovn-northd and ovn-controller to catch up.
+# XXX This should be more systematic.
+sleep 1
+
+# Packet to expect at outside1
+src_mac="000002010203"
+dst_mac="f00000010204"
+src_ip=`ip_to_hex 192 168 1 2`
+dst_ip=`ip_to_hex 172 16 1 3`
+expected=${dst_mac}${src_mac}08004500001c000000003f110100${src_ip}${dst_ip}0035111100080000
+
+# Resend packet from foo1 to outside1
+as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
+
+echo "------ hv1 dump ----------"
+as hv1 ovs-ofctl show br-int
+as hv1 ovs-ofctl dump-flows br-int
+echo "------ hv2 dump ----------"
+as hv2 ovs-ofctl show br-int
+as hv2 ovs-ofctl dump-flows br-int
+echo "------ hv3 dump ----------"
+as hv3 ovs-ofctl show br-int
+as hv3 ovs-ofctl dump-flows br-int
+echo "----------------------------"
+
+echo $expected >> hv3-vif1.expected
+OVN_CHECK_PACKETS([hv3/vif1-tx.pcap], [hv3-vif1.expected])
+
+OVN_CLEANUP([hv1],[hv2],[hv3])
+
+AT_CLEANUP
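
For readers following the test, the ARP request that test_arp injects can be reproduced outside the test harness. The sketch below is illustrative only and is not part of the patch: ip_to_hex is copied from the test, while build_arp_request is a hypothetical helper name that repeats the same concatenation test_arp uses (broadcast destination, sender MAC, ARP header for an Ethernet/IPv4 request, then SHA, SPA, a placeholder THA, and TPA).

```shell
#!/bin/sh
# Illustrative sketch; mirrors the hex-string packet layout used by the test.

# Convert four decimal octets into 8 hex digits (same helper as in the test).
ip_to_hex() {
    printf "%02x%02x%02x%02x" "$@"
}

# build_arp_request SHA SPA TPA
# Hypothetical helper, not in the patch: emits the ARP request frame as hex.
build_arp_request() {
    sha=$1
    spa=$2
    tpa=$3
    # eth.dst (broadcast) + eth.src + ethertype 0x0806 + htype/ptype/hlen/plen
    # + opcode 1 (request) + SHA + SPA + THA placeholder + TPA
    echo "ffffffffffff${sha}08060001080006040001${sha}${spa}ffffffffffff${tpa}"
}

rtr_ip=$(ip_to_hex 172 16 1 1)
outside_ip=$(ip_to_hex 172 16 1 3)
request=$(build_arp_request f00000010204 "$outside_ip" "$rtr_ip")

echo "$rtr_ip"       # ac100101
echo "$request"
echo "${#request}"   # 84 hex digits, i.e. a 42-byte frame
```

The 84 hex digits correspond to a 14-byte Ethernet header plus a 28-byte ARP payload, which is a quick sanity check when editing the packet strings by hand.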