[ovs-dev,v7,3/7] ovn: Introduce "chassisredirect" port binding

Message ID 1483732834-11570-4-git-send-email-mickeys.dev@gmail.com
State Superseded

Commit Message

Mickey Spiegel Jan. 6, 2017, 8 p.m. UTC
Currently OVN handles all logical router ports in a distributed manner,
creating instances on each chassis.  The logical router ingress and
egress pipelines are traversed locally on the source chassis.

In order to support advanced features such as one-to-many NAT (aka IP
masquerading), where multiple private IP addresses spread across
multiple chassis are mapped to one public IP address, it will be
necessary to handle some of the logical router processing on a specific
chassis in a centralized manner.

The goal of this patch is to develop abstractions that allow for a
subset of router gateway traffic to be handled in a centralized manner
(e.g. one-to-many NAT traffic), while allowing for other subsets of
router gateway traffic to be handled in a distributed manner (e.g.
floating IP traffic).

This patch introduces a new type of SB port_binding called
"chassisredirect".  A "chassisredirect" port represents a particular
instance, bound to a specific chassis, of an otherwise distributed
port.  The ovn-controller on that chassis populates the "chassis"
column for this record as an indication for other ovn-controllers of
its physical location.  Other ovn-controllers do not treat this port
as a local port.

A "chassisredirect" port should never be used as an "inport".  When an
ingress pipeline sets the "outport", it may set the value to a logical
port of type "chassisredirect".  This will cause the packet to be
directed to a specific chassis to carry out the egress logical router
pipeline, in the same way that a logical switch forwards egress traffic
to a VIF port residing on a specific chassis.  At the beginning of the
egress pipeline, the "outport" will be reset to the value of the
distributed port.

For outbound traffic to be handled in a centralized manner, the
"outport" should be set to the "chassisredirect" port representing
centralized gateway functionality in the otherwise distributed router.
For outbound traffic to be handled in a distributed manner, locally on
the source chassis, the "outport" should be set to the existing "patch"
port representing distributed gateway functionality.

Inbound traffic will be directed to the appropriate chassis by
restricting source MAC address usage and ARP responses to that chassis,
or by running dynamic routing protocols.

Note that "chassisredirect" ports have no associated IP or MAC addresses.
Any pipeline stages that depend on port-specific IP or MAC addresses
should be carried out in the context of the distributed port.

Although the abstraction represented by the "chassisredirect" port
binding is generalized, in this patch the "chassisredirect" port binding
is only created for NB logical router ports that specify the new
"redirect-chassis" option.  There is no explicit notion of a
"chassisredirect" port in the NB database.  The expectation is that when
capabilities are implemented that take advantage of "chassisredirect"
ports (e.g. NAT), the addition of flows specifying a "chassisredirect"
port as the outport will also be triggered by the presence of the
"redirect-chassis" option.  Such flows are added for NB logical router
ports that specify the "redirect-chassis" option.

Signed-off-by: Mickey Spiegel <mickeys.dev@gmail.com>
---
 ovn/controller/binding.c        |   8 +
 ovn/controller/ovn-controller.c |   4 +
 ovn/controller/physical.c       |  63 ++++++++
 ovn/northd/ovn-northd.8.xml     | 100 +++++++++++-
 ovn/northd/ovn-northd.c         | 203 ++++++++++++++++++++++--
 ovn/ovn-nb.ovsschema            |   9 +-
 ovn/ovn-nb.xml                  |  38 +++++
 ovn/ovn-sb.xml                  |  35 +++++
 ovn/utilities/ovn-trace.c       |  43 +++++-
 tests/ovn.at                    | 334 ++++++++++++++++++++++++++++++++++++++++
 10 files changed, 816 insertions(+), 21 deletions(-)
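
As an illustration of the last paragraph of the commit message, the
following is a minimal sketch (not code from the patch) of how SB port
bindings might be derived from NB logical router ports.  Only the
"redirect-chassis" option and the "patch" and "chassisredirect" type
names come from the commit message; the "cr-" name prefix, the
"distributed-port" key, and the dictionary layout are assumptions made
for this example.

```python
def sb_port_bindings(nb_router_ports):
    """Derive a list of SB port binding records from NB router ports.

    Every NB logical router port yields the usual distributed "patch"
    binding.  A port that sets "redirect-chassis" additionally yields a
    "chassisredirect" binding that points back at the distributed port.
    """
    bindings = []
    for lrp in nb_router_ports:
        bindings.append(
            {"logical_port": lrp["name"], "type": "patch", "options": {}}
        )
        chassis = lrp.get("options", {}).get("redirect-chassis")
        if chassis:
            bindings.append({
                # Hypothetical naming scheme for the extra port.
                "logical_port": "cr-" + lrp["name"],
                "type": "chassisredirect",
                "options": {
                    # Hypothetical key naming the distributed port.
                    "distributed-port": lrp["name"],
                    "redirect-chassis": chassis,
                },
            })
    return bindings


nb_ports = [
    {"name": "lrp-internal", "options": {}},
    {"name": "lrp-gateway", "options": {"redirect-chassis": "gw1"}},
]
for binding in sb_port_bindings(nb_ports):
    print(binding["logical_port"], binding["type"])
```

Note that nothing in the NB input names the "chassisredirect" port
itself, which matches the statement that the NB database has no
explicit notion of it.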

Comments

Ben Pfaff Jan. 6, 2017, 11:20 p.m. UTC | #1
chassisredirect ports seem incredibly similar to vif ports.  Is the only
difference that the output port is changed at the beginning of the
egress pipeline?  That's something that could be implemented in the
logical egress pipeline with 'outport = "...";'.  We do say that the
outport isn't supposed to be modified in an egress pipeline, but nothing
enforces that and if it's actually useful then we could just change the
documentation.
Mickey Spiegel Jan. 6, 2017, 11:47 p.m. UTC | #2
On Fri, Jan 6, 2017 at 3:20 PM, Ben Pfaff <blp@ovn.org> wrote:

> chassisredirect ports seem incredibly similar to vif ports.  Is the only
> difference that the output port is changed at the beginning of the
> egress pipeline?  That's something that could be implemented in the
> logical egress pipeline with 'outport = "...";'.  We do say that the
> outport isn't supposed to be modified in an egress pipeline, but nothing
> enforces that and if it's actually useful then we could just change the
> documentation.
>

I don't get the similarity to vif ports.

I need to create two different ports for each logical router port
specifying a "redirect-chassis". One represents the centralized
instance, for traffic that needs to be centralized. The other
represents the distributed instance, i.e. just take the local patch
port and go to/from the local logical router instance. I wanted the
egress pipeline processing to be the same regardless of whether
the packet arrived at the egress pipeline on the port representing
the centralized instance, or whether the packet arrived at the
egress pipeline on the port representing the distributed instance.

There is no pipeline processing of the chassisredirect port,
except as the outport in the ingress pipeline. Everything else
happens in tables 32 and 33.

Mickey
Mickey Spiegel Jan. 6, 2017, 11:52 p.m. UTC | #3
On Fri, Jan 6, 2017 at 3:47 PM, Mickey Spiegel <mickeys.dev@gmail.com>
wrote:

> I don't get the similarity to vif ports.
>
> I need to create two different ports for each logical router port
> specifying a "redirect-chassis". One represents the centralized
> instance, for traffic that needs to be centralized. The other
> represents the distributed instance, i.e. just take the local patch
> port and go to/from the local logical router instance. I wanted the
> egress pipeline processing to be the same regardless of whether
> the packet arrived at the egress pipeline on the port representing
> the centralized instance, or whether the packet arrived at the
> egress pipeline on the port representing the distributed instance.
>
> There is no pipeline processing of the chassisredirect port,
> except as the outport in the ingress pipeline. Everything else
> happens in tables 32 and 33.
>

Elaborating a little further. All I need is a way to sometimes tell
table 32 to go to a certain chassis, and other times to let table 32
send it to table 33 locally, on a packet by packet basis, with the
same outport by the time the packet gets to the egress pipeline.
Since table 32 works on a port basis, I created a port
representation.

Mickey
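
The per-packet choice described in the message above can be sketched
roughly as follows.  This is a simplified model, not OVN code: the Port
class, the "cr-" prefix, and both function names are hypothetical, and
the model collapses tables 32 and 33 into a single decision.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Port:
    type: str                               # "patch" or "chassisredirect"
    chassis: Optional[str] = None           # chassis the port is bound to
    distributed_port: Optional[str] = None  # real port behind a redirect


def ingress_outport(gateway_port: str, centralized: bool) -> str:
    """Ingress pipeline: pick the outport for one packet."""
    # The "cr-" prefix is an assumption made for this sketch.
    return "cr-" + gateway_port if centralized else gateway_port


def deliver(ports: Dict[str, Port], local_chassis: str,
            outport: str) -> Tuple[str, str]:
    """Return (chassis that runs egress, outport the egress pipeline sees)."""
    port = ports[outport]
    if port.type == "chassisredirect":
        if port.chassis is None or port.distributed_port is None:
            raise ValueError("unbound chassisredirect port is not modeled")
        # Table 32 analogue: send the packet to the chassis the port is
        # bound to.  Table 33 analogue on that chassis: replace the
        # outport with the distributed port before egress starts.
        return (port.chassis, port.distributed_port)
    # Distributed patch port: stay on the source chassis.
    return (local_chassis, outport)


ports = {
    "lrp0": Port("patch"),
    "cr-lrp0": Port("chassisredirect", chassis="gw1", distributed_port="lrp0"),
}
print(deliver(ports, "hv1", ingress_outport("lrp0", centralized=True)))
print(deliver(ports, "hv1", ingress_outport("lrp0", centralized=False)))
```

In both calls the egress pipeline sees the same outport; only the
chassis that runs it differs.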
Ben Pfaff Jan. 7, 2017, 12:11 a.m. UTC | #4
On Fri, Jan 06, 2017 at 03:47:03PM -0800, Mickey Spiegel wrote:
> I don't get the similarity to vif ports.
> 
> I need to create two different ports for each logical router port
> specifying a "redirect-chassis". One represents the centralized
> instance, for traffic that needs to be centralized. The other
> represents the distributed instance, i.e. just take the local patch
> port and go to/from the local logical router instance. I wanted the
> egress pipeline processing to be the same regardless of whether
> the packet arrived at the egress pipeline on the port representing
> the centralized instance, or whether the packet arrived at the
> egress pipeline on the port representing the distributed instance.
> 
> There is no pipeline processing of the chassisredirect port,
> except as the outport in the ingress pipeline. Everything else
> happens in tables 32 and 33.

OK, then I'm having trouble following the description.  For me, here are
the key paragraphs that led me to my conclusions:

    This patch introduces a new type of SB port_binding called
    "chassisredirect".  A "chassisredirect" port represents a particular
    instance, bound to a specific chassis, of an otherwise distributed
    port.  The ovn-controller on that chassis populates the "chassis"
    column for this record as an indication for other ovn-controllers of
    its physical location.  Other ovn-controllers do not treat this port
    as a local port.

    A "chassisredirect" port should never be used as an "inport".  When
    an ingress pipeline sets the "outport", it may set the value to a
    logical port of type "chassisredirect".  This will cause the packet
    to be directed to a specific chassis to carry out the egress logical
    router pipeline, in the same way that a logical switch forwards
    egress traffic to a VIF port residing on a specific chassis.  At the
    beginning of the egress pipeline, the "outport" will be reset to the
    value of the distributed port.

The first paragraph appears to say that a chassisredirect port is a port
on a particular chassis and that its chassis column says what chassis
it's on.  OK, that's the same as a vif port, right?

The second paragraph appears to me to say, first, that packets would
never originate from a chassisredirect port.  OK, fine, no problem.
Second, it directly makes an analogy to vif ports, and then says that
the outport changes.  No problem.

I guess that I must be missing important points, but that's why I
interpreted the text as I did.  Can you help me figure out why I'm not
following?

Thanks,

Ben.
Mickey Spiegel Jan. 7, 2017, 12:21 a.m. UTC | #5
On Fri, Jan 6, 2017 at 4:11 PM, Ben Pfaff <blp@ovn.org> wrote:

> OK, then I'm having trouble following the description.  For me, here's
> the key paragraphs that led me to my conclusions:
>
>     This patch introduces a new type of SB port_binding called
>     "chassisredirect".  A "chassisredirect" port represents a particular
>     instance, bound to a specific chassis, of an otherwise distributed
>     port.  The ovn-controller on that chassis populates the "chassis"
>     column for this record as an indication for other ovn-controllers of
>     its physical location.  Other ovn-controllers do not treat this port
>     as a local port.
>
>     A "chassisredirect" port should never be used as an "inport".  When
>     an ingress pipeline sets the "outport", it may set the value to a
>     logical port of type "chassisredirect".  This will cause the packet
>     to be directed to a specific chassis to carry out the egress logical
>     router pipeline, in the same way that a logical switch forwards
>     egress traffic to a VIF port residing on a specific chassis.  At the
>     beginning of the egress pipeline, the "outport" will be reset to the
>     value of the distributed port.
>
> The first paragraph appears to say that a chassisredirect port is a port
> on a particular chassis and that its chassis column says what chassis
> it's on.  OK, that's the same as a vif port, right?
>

Yes, the same as vif, l2gateway, or l3gateway in the sense that this
port is bound to a chassis. No differences there.

>
> The second paragraph appears to me to say, first, that packets would
> never originate from a chassisredirect port.  OK, fine, no problem.
> Second, it directly makes an analogy to vif ports, and then says that
> the outport changes.  No problem.
>

Two main differences from vif:
1. The outport changes. I want the ct_zone assignments in table 33
   and the loopback check in table 34 to be according to the new
   outport.

2. There is no pipeline processing of this port. This port has no
   addresses or other configuration. The purpose of the port is to
   tell table 32 to go to a particular chassis, and then tell table 33
   what the real outport should be.

I got to this notion because a port is the way to tell table 32 to
go to a particular chassis. The first thought was two regular patch
ports, but the idea of two patch ports with the same addresses
is confusing and dangerous. Changing back to the real patch
port right away in the egress pipeline avoids those problems.
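
As a rough illustration of the two differences above, here is a minimal C
sketch. It is a toy model, not ovn-controller code: the struct, the enums,
and both function names are assumptions made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified port binding; not the real SB schema. */
enum port_type { PORT_PATCH, PORT_CHASSISREDIRECT };

struct port_binding {
    const char *name;
    enum port_type type;
    const char *chassis;                    /* NULL: no chassis binding. */
    const struct port_binding *distributed; /* chassisredirect only. */
};

enum action { ACT_TUNNEL_TO_CHASSIS, ACT_CONTINUE_LOCALLY };

/* Table 32 model: an outport bound to some other chassis causes the
 * packet to be tunneled there; anything else continues locally. */
enum action
table32_remote_output(const struct port_binding *outport,
                      const char *local_chassis)
{
    assert(outport != NULL && local_chassis != NULL);
    if (outport->chassis && strcmp(outport->chassis, local_chassis)) {
        return ACT_TUNNEL_TO_CHASSIS;
    }
    return ACT_CONTINUE_LOCALLY;
}

/* Table 33 model: replace a chassisredirect outport with the distributed
 * port, so that later steps (ct_zone lookup, loopback check) see the
 * real outport. */
const struct port_binding *
table33_local_output(const struct port_binding *outport)
{
    assert(outport != NULL);
    if (outport->type == PORT_CHASSISREDIRECT && outport->distributed) {
        return outport->distributed;
    }
    return outport;
}
```

In this model the chassisredirect port carries no addresses at all; it only
answers "which chassis?" in table 32 and "which real port?" in table 33.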

Mickey


> I guess that I must be missing important points, but that's why I
> interpreted the text as I did.  Can you help me figure out why I'm not
> following?
>
> Thanks,
>
> Ben.
>
Mickey Spiegel Jan. 7, 2017, 4:31 a.m. UTC | #6
On Fri, Jan 6, 2017 at 4:21 PM, Mickey Spiegel <mickeys.dev@gmail.com>
wrote:

>
> On Fri, Jan 6, 2017 at 4:11 PM, Ben Pfaff <blp@ovn.org> wrote:
>
>> OK, then I'm having trouble following the description.  For me, here's
>> the key paragraphs that led me to my conclusions:
>>
>>     This patch introduces a new type of SB port_binding called
>>     "chassisredirect".  A "chassisredirect" port represents a particular
>>     instance, bound to a specific chassis, of an otherwise distributed
>>     port.  The ovn-controller on that chassis populates the "chassis"
>>     column for this record as an indication for other ovn-controllers of
>>     its physical location.  Other ovn-controllers do not treat this port
>>     as a local port.
>>
>>     A "chassisredirect" port should never be used as an "inport".  When
>>     an ingress pipeline sets the "outport", it may set the value to a
>>     logical port of type "chassisredirect".  This will cause the packet
>>     to be directed to a specific chassis to carry out the egress logical
>>     router pipeline, in the same way that a logical switch forwards
>>     egress traffic to a VIF port residing on a specific chassis.  At the
>>     beginning of the egress pipeline, the "outport" will be reset to the
>>     value of the distributed port.
>>
>> The first paragraph appears to say that a chassisredirect port is a port
>> on a particular chassis and that its chassis column says what chassis
>> it's on.  OK, that's the same as a vif port, right?
>>
>
> Yes, the same as vif, l2gateway, or l3gateway in the sense that this
> port is bound to a chassis. No differences there.
>
>>
>> The second paragraph appears to me to say, first, that packets would
>> never originate from a chassisredirect port.  OK, fine, no problem.
>> Second, it directly makes an analogy to vif ports, and then says that
>> the outport changes.  No problem.
>>
>
> Two main differences from vif:
> 1. The outport changes. I want the ct_zone assignments in table 33
>    and the loopback check in table 34 to be according to the new
>    outport.
>
> 2. There is no pipeline processing of this port. This port has no
>    addresses or other configuration. The purpose of the port is to
>    tell table 32 to go to a particular chassis, and then tell table 33
>    what the real outport should be.
>
> I got to this notion because a port is the way to tell table 32 to
> go to a particular chassis. The first thought was two regular patch
> ports, but the idea of two patch ports with the same addresses
> is confusing and dangerous. Changing back to the real patch
> port right away in the egress pipeline avoids those problems.
>
> Mickey
>

Let me go back to first principles. I need three sorts of chassis
specific behaviors for distributed NAT:
1. Install some flows only on the chassis where a certain logical
   port resides. That is is_chassis_resident which you already
   reviewed and acked. The nat flows patch at the end of the
   patch set uses this mechanism.
2. Install a different set of flows associated with the distributed
   gateway port only on the redirect-chassis. There are several
   such flows in this patch.
3. Direct some traffic with outport being the distributed gateway
   port to the instance of the distributed gateway port on the
   redirect-chassis. When this traffic hits table 32, it gets
   sent through the normal tunnel to the redirect-chassis.

I needed some handle that triggers 3. I decided to make that
handle be a port, which I called a "chassisredirect" port. That
also allows me to use is_chassis_resident(chassisredirect_port)
to solve 2.

It is possible to make that handle be something other than a
port, as long as table 32 is modified to act on that. In that case,
I will need another match "condition" (as I called it) based on
that handle, similar to is_chassis_resident but based on
whatever handle we decide on instead of port.
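
To show how a port-based handle covers behavior 2, here is a small C
sketch of a residency check. It only models the idea behind
is_chassis_resident; the struct and the function name
model_is_chassis_resident are hypothetical and are not the actual
implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a southbound Port_Binding row. */
struct binding {
    const char *logical_port;
    const char *chassis;        /* NULL if not bound to any chassis. */
};

/* Returns true if 'port' is bound to 'local_chassis'.  A flow guarded by
 * this condition would be installed only on that chassis. */
bool
model_is_chassis_resident(const struct binding *bindings, size_t n,
                          const char *port, const char *local_chassis)
{
    assert(port != NULL && local_chassis != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!strcmp(bindings[i].logical_port, port)) {
            return bindings[i].chassis
                   && !strcmp(bindings[i].chassis, local_chassis);
        }
    }
    return false;               /* Unknown port: not resident. */
}
```

With a handle other than a port, the lookup key would change but the shape
of the check would presumably stay the same.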

Mickey


Mickey Spiegel Jan. 9, 2017, 6:30 a.m. UTC | #7
On Fri, Jan 6, 2017 at 8:31 PM, Mickey Spiegel <mickeys.dev@gmail.com>
wrote:

> Let me go back to first principles. I need three sorts of chassis
> specific behaviors for distributed NAT:
> 1. Install some flows only on the chassis where a certain logical
>    port resides. That is is_chassis_resident which you already
>    reviewed and acked. The nat flows patch at the end of the
>    patch set uses this mechanism.
> 2. Install a different set of flows associated with the distributed
>    gateway port only on the redirect-chassis. There are several
>    such flows in this patch.
> 3. Direct some traffic with outport being the distributed gateway
>    port to the instance of the distributed gateway port on the
>    redirect-chassis. When this traffic hits table 32, it gets
>    sent through the normal tunnel to the redirect-chassis.
>
> I needed some handle that triggers 3. I decided to make that
> handle be a port, which I called a "chassisredirect" port. That
> also allows me to use is_chassis_resident(chassisredirect_port)
> to solve 2.
>
> It is possible to make that handle be something other than a
> port, as long as table 32 is modified to act on that. In that case,
> I will need another match "condition" (as I called it) based on
> that handle, similar to is_chassis_resident but based on
> whatever handle we decide on instead of port.
>

I realized earlier tonight that there is a straightforward
alternative, though it does have one potentially confusing
aspect.

For some reason, I had been assuming that a port_binding is
either exclusive to a chassis (in the previous implementation
with OVS patch ports, it had an ofport), or the port_binding
exists everywhere and does not have a chassis association
(is_remote in the previous implementation with OVS patch
ports).

If this is relaxed and we allow logical patch ports to be
associated with a chassis, then all I need is a new
MLF_FORCE_CHASSIS_REDIRECT flag rather than
a second port_binding with a new "chassisredirect" type.

The potentially confusing aspect is that even though the
mechanism for associating a logical patch port with a
chassis is identical to that for other port_binding types such
as "l3gateway", the association of a chassis with a logical
patch port has a different meaning than the association of a
chassis with a VIF, a type "l3gateway" port_binding, or a
type "l2gateway" port_binding.  For the latter, the association
is exclusive, i.e. the port only exists on that chassis.  For
logical patch ports, whether there is an association with a
chassis or not, the logical patch port exists everywhere
(subject to the constraints of conditional monitoring).
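
The difference in meaning can be sketched in C as follows. This is only a
toy model under stated assumptions: the enum, the struct, and the function
port_exists_on are hypothetical names, and conditional monitoring is
ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum binding_type { BT_VIF, BT_L2GATEWAY, BT_L3GATEWAY, BT_PATCH };

/* Hypothetical, simplified port binding. */
struct port_binding {
    enum binding_type type;
    const char *chassis;        /* NULL if no chassis association. */
};

/* Returns true if the port is instantiated on 'local_chassis'. */
bool
port_exists_on(const struct port_binding *pb, const char *local_chassis)
{
    assert(pb != NULL && local_chassis != NULL);
    if (pb->type == BT_PATCH) {
        /* Logical patch ports exist everywhere, whether or not they
         * are associated with a chassis. */
        return true;
    }
    /* VIF, l2gateway, l3gateway: the association is exclusive. */
    return pb->chassis && !strcmp(pb->chassis, local_chassis);
}
```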

The chassis association would only be used for a new
table 32 flow similar to other flows sending packets to
remote hypervisors for other port_binding types, but with
a different match condition:
    match_set_metadata(&match, htonll(dp_key));
    match_set_reg(&match, MFF_LOG_OUTPORT - MFF_REG0, port_key);
    match_set_reg_masked(&match, MFF_LOG_FLAGS - MFF_REG0,
                         1, MLF_FORCE_CHASSIS_REDIRECT);

Depending on whether the
MLF_FORCE_CHASSIS_REDIRECT flag is set, the
packet would either be sent to the remote hypervisor,
or it would fall through to the table 32 priority 0 fallback
flow and be processed locally.

The chassis association could also be used for
evaluation of is_chassis_resident("l3dgw_port") functions
in flow matches.

If you agree that this approach is more promising than
type "chassisredirect" ports, I can code this up tomorrow.

Mickey



Mickey Spiegel Jan. 13, 2017, 1:12 a.m. UTC | #8
On Sun, Jan 8, 2017 at 10:30 PM, Mickey Spiegel <mickeys.dev@gmail.com>
wrote:

> I realized earlier tonight that there is a straightforward
> alternative, though it does have one potentially confusing
> aspect.
>
> For some reason, I had been assuming that a port_binding is
> either exclusive to a chassis (in the previous implementation
> with OVS patch ports, it had an ofport), or the port_binding
> exists everywhere and does not have a chassis association
> (is_remote in the previous implementation with OVS patch
> ports).
>
> If this is relaxed and we allow logical patch ports to be
> associated with a chassis, then all I need is a new
> MLF_FORCE_CHASSIS_REDIRECT flag rather than
> a second port_binding with a new "chassisredirect" type.
>
> The potentially confusing aspect is that even though the
> mechanism for associating a logical patch port with a
> chassis is identical to that for other port_binding types such
> as "l3gateway", the association of a chassis with a logical
> patch port has a different meaning than the association of a
> chassis with a VIF, a type "l3gateway" port_binding, or a
> type "l2gateway" port_binding.  For the latter, the association
> is exclusive, i.e. the port only exists on that chassis.  For
> logical patch ports, whether there is an association with a
> chassis or not, the logical patch port exists everywhere
> (subject to the constraints of conditional monitoring).
>
> The chassis association would only be used for a new
> table 32 flow similar to other flows sending packets to
> remote hypervisors for other port_binding types, but with
> a different match condition:
>     match_set_metadata(&match, htonll(dp_key));
>     match_set_reg(&match, MFF_LOG_OUTPORT - MFF_REG0, port_key);
>     match_set_reg_masked(&match, MFF_LOG_FLAGS - MFF_REG0,
>                          1, MLF_FORCE_CHASSIS_REDIRECT);
>
> Depending on whether the
> MLF_FORCE_CHASSIS_REDIRECT flag is set, the
> packet would either be sent to the remote hypervisor,
> or it would fall through to the table 32 priority 0 fallback
> flow and be processed locally.
>
> The chassis association could also be used for
> evaluation of is_chassis_resident("l3dgw_port") functions
> in flow matches.
>
> If you agree that this approach is more promising than
> type "chassisredirect" ports, I can code this up tomorrow.
>

I am having trouble making this approach work with the
ARP request table. With the approach of replacing the
logical outport, the ARP request goes to the controller
with the new outport of type "chassisredirect". When the
packet is reinjected, it does eventually end up at the
redirect chassis.

With the approach of using a flag, the packet is not
hitting the table 32 entry matching the flag. I am not sure
what happens to the packet after it goes up to the
controller, and I am not sure how to debug it further or
what to change to make it work.

Mickey
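
For reference, the match that the proposed flag-based table 32 flow would perform can be modeled in a few lines of stand-alone C. This is only a sketch under stated assumptions: the struct names, the function name, and the bit position chosen for SKETCH_MLF_FORCE_CHASSIS_REDIRECT are hypothetical and are not taken from the OVS/OVN tree. The sketch tests the flag bit against itself, which agrees with the quoted value/mask pair (1, MLF_FORCE_CHASSIS_REDIRECT) only if the flag occupies bit 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit position; the real flag value is defined by the patch. */
#define SKETCH_MLF_FORCE_CHASSIS_REDIRECT (UINT32_C(1) << 0)

/* Logical metadata carried with a packet (illustrative model only). */
struct sketch_pkt_md {
    uint64_t dp_key;    /* logical datapath key (metadata field) */
    uint32_t outport;   /* logical output port key */
    uint32_t flags;     /* logical flags register */
};

/* The fields the proposed table 32 redirect flow would match on. */
struct sketch_redirect_flow {
    uint64_t dp_key;
    uint32_t port_key;
};

/* Returns true when the packet should be tunneled to the remote chassis.
 * Returns false when it should fall through to the priority 0 flow and
 * be processed locally. */
bool
sketch_table32_redirects(const struct sketch_redirect_flow *flow,
                         const struct sketch_pkt_md *md)
{
    return md->dp_key == flow->dp_key
           && md->outport == flow->port_key
           && (md->flags & SKETCH_MLF_FORCE_CHASSIS_REDIRECT) != 0;
}
```

A packet whose datapath and outport match but whose flag is clear returns false, which corresponds to falling through to the table 32 priority 0 flow and being processed locally.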


> Mickey
>
>
>
>> Mickey
>>
>>
>>>
>>>> I guess that I must be missing important points, but that's why I
>>>> interpreted the text as I did.  Can you help me figure out why I'm not
>>>> following?
>>>>
>>>> Thanks,
>>>>
>>>> Ben.
>>>>
>>>
>>>
>>
>
Mickey Spiegel Jan. 13, 2017, 10:19 p.m. UTC | #9
On Thu, Jan 12, 2017 at 5:12 PM, Mickey Spiegel <mickeys.dev@gmail.com>
wrote:

> I am having trouble making this approach work with the
> ARP request table. With the approach of replacing the
> logical outport, the ARP request goes to the controller
> with the new outport of type "chassisredirect". When the
> packet is reinjected, it does eventually end up at the
> redirect chassis.
>
> With the approach of using a flag, the packet is not
> hitting the table 32 entry matching the flag. I am not sure
> what happens to the packet after it goes up to the
> controller, and I am not sure how to debug it further or
> what to change to make it work.
>

I found the bug. It was affecting all packets, not just ARP, and
was a simple fix. I am still checking all scenarios, but I think
I have the approach with the flag instead of a new port type
working. I can move forward with either approach, a flag or
a new port type as originally proposed.

Mickey
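
For comparison, the originally proposed port-type behavior (table 32 tunnels the packet to the chassis named in the "chassisredirect" binding, and table 33 on that chassis resets the outport to the distributed port) can be sketched as below. All type and function names are hypothetical, and the sketch assumes the "chassisredirect" binding already has its chassis column populated.

```c
#include <stddef.h>
#include <string.h>

enum sketch_port_type { SKETCH_PORT_PATCH, SKETCH_PORT_CHASSISREDIRECT };

/* Illustrative stand-in for a southbound port binding record. */
struct sketch_port_binding {
    unsigned int key;              /* logical port key */
    enum sketch_port_type type;
    const char *chassis;           /* bound chassis; must be set for
                                    * chassisredirect ports in this sketch */
    unsigned int distributed_key;  /* key of the real distributed port */
};

enum sketch_verdict { SKETCH_TUNNEL_TO_REMOTE, SKETCH_DELIVER_LOCALLY };

struct sketch_result {
    enum sketch_verdict verdict;
    unsigned int outport;          /* outport seen by later tables */
};

/* Models the decision made across tables 32 and 33 for one packet. */
struct sketch_result
sketch_egress_dispatch(const struct sketch_port_binding *out,
                       const char *local_chassis)
{
    struct sketch_result r = { SKETCH_DELIVER_LOCALLY, out->key };

    if (out->type == SKETCH_PORT_CHASSISREDIRECT) {
        if (strcmp(out->chassis, local_chassis) != 0) {
            /* Table 32: the port lives elsewhere, so tunnel the packet. */
            r.verdict = SKETCH_TUNNEL_TO_REMOTE;
        } else {
            /* Table 33: on the redirect chassis, switch back to the
             * distributed port before ct_zone and loopback handling. */
            r.outport = out->distributed_key;
        }
    }
    return r;
}
```

With this model, an ordinary patch port is always handled locally with its own key, while a "chassisredirect" port is either tunneled unchanged or rewritten to the distributed port, depending on which chassis evaluates it.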


Ben Pfaff Jan. 14, 2017, 12:21 a.m. UTC | #10
On Fri, Jan 13, 2017 at 02:19:21PM -0800, Mickey Spiegel wrote:
> I found the bug. It was affecting all packets, not just ARP, and
> was a simple fix. I am still checking all scenarios, but I think
> I have the approach with the flag instead of a new port type
> working. I can move forward with either approach, a flag or
> a new port type as originally proposed.

Do you mind posting the version with the flag?  We'll do one or the
other.
Mickey Spiegel Jan. 14, 2017, 12:22 a.m. UTC | #11
On Fri, Jan 13, 2017 at 4:21 PM, Ben Pfaff <blp@ovn.org> wrote:

> On Fri, Jan 13, 2017 at 02:19:21PM -0800, Mickey Spiegel wrote:
> > On Thu, Jan 12, 2017 at 5:12 PM, Mickey Spiegel <mickeys.dev@gmail.com>
> > wrote:
> >
> > >
> > > On Sun, Jan 8, 2017 at 10:30 PM, Mickey Spiegel <mickeys.dev@gmail.com
> >
> > > wrote:
> > >
> > >>
> > >> On Fri, Jan 6, 2017 at 8:31 PM, Mickey Spiegel <mickeys.dev@gmail.com
> >
> > >> wrote:
> > >>
> > >>>
> > >>> On Fri, Jan 6, 2017 at 4:21 PM, Mickey Spiegel <
> mickeys.dev@gmail.com>
> > >>> wrote:
> > >>>
> > >>>>
> > >>>> On Fri, Jan 6, 2017 at 4:11 PM, Ben Pfaff <blp@ovn.org> wrote:
> > >>>>
> > >>>>> On Fri, Jan 06, 2017 at 03:47:03PM -0800, Mickey Spiegel wrote:
> > >>>>> > On Fri, Jan 6, 2017 at 3:20 PM, Ben Pfaff <blp@ovn.org> wrote:
> > >>>>> >
> > >>>>> > > On Fri, Jan 06, 2017 at 12:00:30PM -0800, Mickey Spiegel wrote:
> > >>>>> > > > Currently OVN handles all logical router ports in a
> distributed
> > >>>>> manner,
> > >>>>> > > > creating instances on each chassis.  The logical router
> ingress
> > >>>>> and
> > >>>>> > > > egress pipelines are traversed locally on the source chassis.
> > >>>>> > > >
> > >>>>> > > > In order to support advanced features such as one-to-many NAT
> > >>>>> (aka IP
> > >>>>> > > > masquerading), where multiple private IP addresses spread
> across
> > >>>>> > > > multiple chassis are mapped to one public IP address, it
> will be
> > >>>>> > > > necessary to handle some of the logical router processing on
> a
> > >>>>> specific
> > >>>>> > > > chassis in a centralized manner.
> > >>>>> > > >
> > >>>>> > > > The goal of this patch is to develop abstractions that allow
> for
> > >>>>> a
> > >>>>> > > > subset of router gateway traffic to be handled in a
> centralized
> > >>>>> manner
> > >>>>> > > > (e.g. one-to-many NAT traffic), while allowing for other
> subsets
> > >>>>> of
> > >>>>> > > > router gateway traffic to be handled in a distributed manner
> > >>>>> (e.g.
> > >>>>> > > > floating IP traffic).
> > >>>>> > > >
> > >>>>> > > > This patch introduces a new type of SB port_binding called
> > >>>>> > > > "chassisredirect".  A "chassisredirect" port represents a
> > >>>>> particular
> > >>>>> > > > instance, bound to a specific chassis, of an otherwise
> > >>>>> distributed
> > >>>>> > > > port.  The ovn-controller on that chassis populates the
> "chassis"
> > >>>>> > > > column for this record as an indication for other
> > >>>>> ovn-controllers of
> > >>>>> > > > its physical location.  Other ovn-controllers do not treat
> this
> > >>>>> port
> > >>>>> > > > as a local port.
> > >>>>> > > >
> > >>>>> > > > A "chassisredirect" port should never be used as an "inport".
> > >>>>> When an
> > >>>>> > > > ingress pipeline sets the "outport", it may set the value to
> a
> > >>>>> logical
> > >>>>> > > > port of type "chassisredirect".  This will cause the packet
> to be
> > >>>>> > > > directed to a specific chassis to carry out the egress
> logical
> > >>>>> router
> > >>>>> > > > pipeline, in the same way that a logical switch forwards
> egress
> > >>>>> traffic
> > >>>>> > > > to a VIF port residing on a specific chassis.  At the
> beginning
> > >>>>> of the
> > >>>>> > > > egress pipeline, the "outport" will be reset to the value of
> the
> > >>>>> > > > distributed port.
> > >>>>> > > >
> > >>>>> > > > For outbound traffic to be handled in a centralized manner,
> the
> > >>>>> > > > "outport" should be set to the "chassisredirect" port
> > >>>>> representing
> > >>>>> > > > centralized gateway functionality in the otherwise
> distributed
> > >>>>> router.
> > >>>>> > > > For outbound traffic to be handled in a distributed manner,
> > >>>>> locally on
> > >>>>> > > > the source chassis, the "outport" should be set to the
> existing
> > >>>>> "patch"
> > >>>>> > > > port representing distributed gateway functionality.
> > >>>>> > > >
> > >>>>> > > > Inbound traffic will be directed to the appropriate chassis
> by
> > >>>>> > > > restricting source MAC address usage and ARP responses to
> that
> > >>>>> chassis,
> > >>>>> > > > or by running dynamic routing protocols.
> > >>>>> > > >
> > >>>>> > > > Note that "chassisredirect" ports have no associated IP or
> MAC
> > >>>>> addresses.
> > >>>>> > > > Any pipeline stages that depend on port specific IP or MAC
> > >>>>> addresses
> > >>>>> > > > should be carried out in the context of the distributed port.
> > >>>>> > > >
> > >>>>> > > > Although the abstraction represented by the "chassisredirect"
> > >>>>> port
> > >>>>> > > > binding is generalized, in this patch the "chassisredirect"
> port
> > >>>>> binding
> > >>>>> > > > is only created for NB logical router ports that specify the
> new
> > >>>>> > > > "redirect-chassis" option.  There is no explicit notion of a
> > >>>>> > > > "chassisredirect" port in the NB database.  The expectation
> is
> > >>>>> when
> > >>>>> > > > capabilities are implemented that take advantage of
> > >>>>> "chassisredirect"
> > >>>>> > > > ports (e.g. NAT), the addition of flows specifying a
> > >>>>> "chassisredirect"
> > >>>>> > > > port as the outport will also be triggered by the presence
> of the
> > >>>>> > > > "redirect-chassis" option.  Such flows are added for NB
> logical
> > >>>>> router
> > >>>>> > > > ports that specify the "redirect-chassis" option.
> > >>>>> > > >
> > >>>>> > > > Signed-off-by: Mickey Spiegel <mickeys.dev@gmail.com>
> > >>>>> > >
> > >>>>> > > chassisredirect ports seem incredibly similar to vif ports.  Is
> > >>>>> the only
> > >>>>> > > difference that the output port is changed at the beginning of
> the
> > >>>>> > > egress pipeline?  That's something that could be implemented
> in the
> > >>>>> > > logical egress pipeline with 'outport = "...";'.  We do say
> that
> > >>>>> the
> > >>>>> > > outport isn't supposed to be modified in an egress pipeline,
> but
> > >>>>> nothing
> > >>>>> > > enforces that and if it's actually useful then we could just
> > >>>>> change the
> > >>>>> > > documentation.
> > >>>>> > >
> > >>>>> >
> > >>>>> > I don't get the similarity to vif ports.
> > >>>>> >
> > >>>>> > I need to create two different ports for each logical router port
> > >>>>> > specifying a "redirect-chassis". One represents the centralized
> > >>>>> > instance, for traffic that needs to be centralized. The other
> > >>>>> > represents the distributed instance, i.e. just take the local
> patch
> > >>>>> > port and go to/from the local logical router instance. I wanted
> the
> > >>>>> > egress pipeline processing to be the same regardless of whether
> > >>>>> > the packet arrived at the egress pipeline on the port
> representing
> > >>>>> > the centralized instance, or whether the packet arrived at the
> > >>>>> > egress pipeline on the port representing the distributed
> instance.
> > >>>>> >
> > >>>>> > There is no pipeline processing of the chassisredirect port,
> > >>>>> > except as the outport in the ingress pipeline. Everything else
> > >>>>> > happens in tables 32 and 33.
> > >>>>>
> > >>>>> OK, then I'm having trouble following the description.  For me,
> here's
> > >>>>> the key paragraphs that led me to my conclusions:
> > >>>>>
> > >>>>>     This patch introduces a new type of SB port_binding called
> > >>>>>     "chassisredirect".  A "chassisredirect" port represents a
> > >>>>> particular
> > >>>>>     instance, bound to a specific chassis, of an otherwise
> distributed
> > >>>>>     port.  The ovn-controller on that chassis populates the
> "chassis"
> > >>>>>     column for this record as an indication for other
> ovn-controllers
> > >>>>> of
> > >>>>>     its physical location.  Other ovn-controllers do not treat this
> > >>>>> port
> > >>>>>     as a local port.
> > >>>>>
> > >>>>>     A "chassisredirect" port should never be used as an "inport".
> When
> > >>>>>     an ingress pipeline sets the "outport", it may set the value
> to a
> > >>>>>     logical port of type "chassisredirect".  This will cause the
> packet
> > >>>>>     to be directed to a specific chassis to carry out the egress
> > >>>>> logical
> > >>>>>     router pipeline, in the same way that a logical switch forwards
> > >>>>>     egress traffic to a VIF port residing on a specific chassis.
> At
> > >>>>> the
> > >>>>>     beginning of the egress pipeline, the "outport" will be reset
> to
> > >>>>> the
> > >>>>>     value of the distributed port.
> > >>>>>
> > >>>>> The first paragraph appears to say that a chassisredirect port is a
> > >>>>> port
> > >>>>> on a particular chassis and that its chassis column says what
> chassis
> > >>>>> it's on.  OK, that's the same as a vif port, right?
> > >>>>>
> > >>>>
> > >>>> Yes, the same as vif, l2gateway, or l3gateway in the sense that this
> > >>>> port is bound to a chassis. No differences there.
> > >>>>
> > >>>>>
> > >>>>> The second paragraph appears to me to say, first, that packets
> would
> > >>>>> never originate from a chassisredirect port.  OK, fine, no problem.
> > >>>>> Second, it directly makes an analogy to vif ports, and then says
> that
> > >>>>> the outport changes.  No problem.
> > >>>>>
> > >>>>
> > >>>> Two main differences from vif:
> > >>>> 1. The outport changes. I want the ct_zone assignments in table 33
> > >>>>    and the loopback check in table 34 to be according to the new
> > >>>>    outport.
> > >>>>
> > >>>> 2. There is no pipeline processing of this port. This port has no
> > >>>>    addresses or other configuration. The purpose of the port is to
> > >>>>    tell table 32 to go to a particular chassis, and then tell table
> 33
> > >>>>    what the real outport should be.
> > >>>>
> > >>>> I got to this notion because a port is the way to tell table 32 to
> > >>>> go to a particular chassis. The first thought was two regular patch
> > >>>> ports, but the idea of two patch ports with the same addresses
> > >>>> is confusing and dangerous. By changing back to the real patch
> > >>>> port right away in the egress pipeline, it avoids those problems.
> > >>>>
> > >>>> Mickey
> > >>>>
> > >>>
> > >>> Let me go back to first principles. I need three sorts of chassis
> > >>> specific behaviors for distributed NAT:
> > >>> 1. Install some flows only on the chassis where a certain logical
> > >>>    port resides. That is is_chassis_resident which you already
> > >>>    reviewed and acked. The nat flows patch at the end of the
> > >>>    patch set uses this mechanism.
> > >>> 2. Install a different set of flows associated with the distributed
> > >>>    gateway port only on the redirect-chassis. There are several
> > >>>    such flows in this patch.
> > >>> 3. Direct some traffic with outport being the distributed gateway
> > >>>    port to the instance of the distributed gateway port on the
> > >>>    redirect-chassis. When this traffic hits table 32, it gets
> > >>>    sent through the normal tunnel to the redirect-chassis.
> > >>>
> > >>> I needed some handle that triggers 3. I decided to make that
> > >>> handle be a port, which I called a "chassisredirect" port. That
> > >>> also allows me to use is_chassis_resident(chassisredirect_port)
> > >>> to solve 2.
> > >>>
> > >>> It is possible to make that handle be something other than a
> > >>> port, as long as table 32 is modified to act on that. In that case,
> > >>> I will need another match "condition" (as I called it) based on
> > >>> that handle, similar to is_chassis_resident but based on
> > >>> whatever handle we decide on instead of port.
> > >>>
> > >>
> > >> I realized earlier tonight that there is a straightforward
> > >> alternative, though it does have one potentially confusing
> > >> aspect.
> > >>
> > >> For some reason, I had been assuming that a port_binding is
> > >> either exclusive to a chassis (in the previous implementation
> > >> with OVS patch ports, it had an ofport), or the port_binding
> > >> exists everywhere and does not have a chassis association
> > >> (is_remote in the previous implementation with OVS patch
> > >> ports).
> > >>
> > >> If this is relaxed and we allow logical patch ports to be
> > >> associated with a chassis, then all I need is a new
> > >> MLF_FORCE_CHASSIS_REDIRECT flag rather than
> > >> a second port_binding with a new "chassisredirect" type.
> > >>
> > >> The potentially confusing aspect is that even though the
> > >> mechanism for associating a logical patch port with a
> > >> chassis is identical to that for other port_binding types such
> > >> as "l3gateway", the association of a chassis with a logical
> > >> patch port has a different meaning than the association of a
> > >> chassis with a VIF, a type "l3gateway" port_binding, or a
> > >> type "l2gateway" port_binding.  For the latter, the association
> > >> is exclusive, i.e. the port only exists on that chassis.  For
> > >> logical patch ports, whether there is an association with a
> > >> chassis or not, the logical patch port exists everywhere
> > >> (subject to the constraints of conditional monitoring).
> > >>
> > >> The chassis association would only be used for a new
> > >> table 32 flow similar to other flows sending packets to
> > >> remote hypervisors for other port_binding types, but with
> > >> a different match condition:
> > >>     match_set_metadata(&match, htonll(dp_key));
> > >>     match_set_reg(&match, MFF_LOG_OUTPORT - MFF_REG0, port_key);
> > >>     match_set_reg_masked(&match, MFF_LOG_FLAGS - MFF_REG0,
> > >>                          1, MLF_FORCE_CHASSIS_REDIRECT);
> > >>
> > >> Depending on whether the
> > >> MLF_FORCE_CHASSIS_REDIRECT flag is set, the
> > >> packet would either be sent to the remote hypervisor,
> > >> or it would fall through to the table 32 priority 0 fallback
> > >> flow and be processed locally.
> > >>
> > >> The chassis association could also be used for
> > >> evaluation of is_chassis_resident("l3dgw_port") functions
> > >> in flow matches.
> > >>
> > >> If you agree that this approach is more promising than
> > >> type "chassisredirect" ports, I can code this up tomorrow.
> > >>
> > >
> > > I am having trouble making this approach work with the
> > > ARP request table. With the approach of replacing the
> > > logical outport, the ARP request goes to the controller
> > > with the new outport of type "chassisredirect". When the
> > > packet is reinjected, it does eventually end up at the
> > > redirect chassis.
> > >
> > > With the approach of using a flag, the packet is not
> > > hitting the table 32 entry matching the flag. I am not sure
> > > what happens to the packet after it goes up to the
> > > controller, and I am not sure how to debug it further or
> > > what to change to make it work.
> > >
> >
> > I found the bug. It was affecting all packets, not just arp, and
> > was a simple fix. I am still checking all scenarios, but I think
> > I have the approach with the flag instead of a new port type
> > working. I can move forward with either approach, a flag or
> > a new port type as originally proposed.
>
> Do you mind posting the version with the flag?  We'll do one or the
> other.
>

Should I post just that, or the patch set?
I have not integrated it with later patches yet.

Mickey
Ben Pfaff Jan. 14, 2017, 12:59 a.m. UTC | #12
On Fri, Jan 13, 2017 at 04:22:01PM -0800, Mickey Spiegel wrote:
> On Fri, Jan 13, 2017 at 4:21 PM, Ben Pfaff <blp@ovn.org> wrote:
> 
> > On Fri, Jan 13, 2017 at 02:19:21PM -0800, Mickey Spiegel wrote:
> > > On Thu, Jan 12, 2017 at 5:12 PM, Mickey Spiegel <mickeys.dev@gmail.com>
> > > wrote:
> > >
> > > >
> > > > On Sun, Jan 8, 2017 at 10:30 PM, Mickey Spiegel <mickeys.dev@gmail.com
> > >
> > > > wrote:
> > > >
> > > >>
> > > >> On Fri, Jan 6, 2017 at 8:31 PM, Mickey Spiegel <mickeys.dev@gmail.com
> > >
> > > >> wrote:
> > > >>
> > > >>>
> > > >>> On Fri, Jan 6, 2017 at 4:21 PM, Mickey Spiegel <
> > mickeys.dev@gmail.com>
> > > >>> wrote:
> > > >>>
> > > >>>>
> > > >>>> On Fri, Jan 6, 2017 at 4:11 PM, Ben Pfaff <blp@ovn.org> wrote:
> > > >>>>
> > > >>>>> On Fri, Jan 06, 2017 at 03:47:03PM -0800, Mickey Spiegel wrote:
> > > >>>>> > On Fri, Jan 6, 2017 at 3:20 PM, Ben Pfaff <blp@ovn.org> wrote:
> > > >>>>> >
> > > >>>>> > > On Fri, Jan 06, 2017 at 12:00:30PM -0800, Mickey Spiegel wrote:
> > > >>>>> > > > Currently OVN handles all logical router ports in a
> > distributed
> > > >>>>> manner,
> > > >>>>> > > > creating instances on each chassis.  The logical router
> > ingress
> > > >>>>> and
> > > >>>>> > > > egress pipelines are traversed locally on the source chassis.
> > > >>>>> > > >
> > > >>>>> > > > In order to support advanced features such as one-to-many NAT
> > > >>>>> (aka IP
> > > >>>>> > > > masquerading), where multiple private IP addresses spread
> > across
> > > >>>>> > > > multiple chassis are mapped to one public IP address, it
> > will be
> > > >>>>> > > > necessary to handle some of the logical router processing on
> > a
> > > >>>>> specific
> > > >>>>> > > > chassis in a centralized manner.
> > > >>>>> > > >
> > > >>>>> > > > The goal of this patch is to develop abstractions that allow
> > for
> > > >>>>> a
> > > >>>>> > > > subset of router gateway traffic to be handled in a
> > centralized
> > > >>>>> manner
> > > >>>>> > > > (e.g. one-to-many NAT traffic), while allowing for other
> > subsets
> > > >>>>> of
> > > >>>>> > > > router gateway traffic to be handled in a distributed manner
> > > >>>>> (e.g.
> > > >>>>> > > > floating IP traffic).
> > > >>>>> > > >
> > > >>>>> > > > This patch introduces a new type of SB port_binding called
> > > >>>>> > > > "chassisredirect".  A "chassisredirect" port represents a
> > > >>>>> particular
> > > >>>>> > > > instance, bound to a specific chassis, of an otherwise
> > > >>>>> distributed
> > > >>>>> > > > port.  The ovn-controller on that chassis populates the
> > "chassis"
> > > >>>>> > > > column for this record as an indication for other
> > > >>>>> ovn-controllers of
> > > >>>>> > > > its physical location.  Other ovn-controllers do not treat
> > this
> > > >>>>> port
> > > >>>>> > > > as a local port.
> > > >>>>> > > >
> > > >>>>> > > > A "chassisredirect" port should never be used as an "inport".
> > > >>>>> When an
> > > >>>>> > > > ingress pipeline sets the "outport", it may set the value to
> > a
> > > >>>>> logical
> > > >>>>> > > > port of type "chassisredirect".  This will cause the packet
> > to be
> > > >>>>> > > > directed to a specific chassis to carry out the egress
> > logical
> > > >>>>> router
> > > >>>>> > > > pipeline, in the same way that a logical switch forwards
> > egress
> > > >>>>> traffic
> > > >>>>> > > > to a VIF port residing on a specific chassis.  At the
> > beginning
> > > >>>>> of the
> > > >>>>> > > > egress pipeline, the "outport" will be reset to the value of
> > the
> > > >>>>> > > > distributed port.
> > > >>>>> > > >
> > > >>>>> > > > For outbound traffic to be handled in a centralized manner,
> > the
> > > >>>>> > > > "outport" should be set to the "chassisredirect" port
> > > >>>>> representing
> > > >>>>> > > > centralized gateway functionality in the otherwise
> > distributed
> > > >>>>> router.
> > > >>>>> > > > For outbound traffic to be handled in a distributed manner,
> > > >>>>> locally on
> > > >>>>> > > > the source chassis, the "outport" should be set to the
> > existing
> > > >>>>> "patch"
> > > >>>>> > > > port representing distributed gateway functionality.
> > > >>>>> > > >
> > > >>>>> > > > Inbound traffic will be directed to the appropriate chassis
> > by
> > > >>>>> > > > restricting source MAC address usage and ARP responses to
> > that
> > > >>>>> chassis,
> > > >>>>> > > > or by running dynamic routing protocols.
> > > >>>>> > > >
> > > >>>>> > > > Note that "chassisredirect" ports have no associated IP or
> > MAC
> > > >>>>> addresses.
> > > >>>>> > > > Any pipeline stages that depend on port specific IP or MAC
> > > >>>>> addresses
> > > >>>>> > > > should be carried out in the context of the distributed port.
> > > >>>>> > > >
> > > >>>>> > > > Although the abstraction represented by the "chassisredirect"
> > > >>>>> port
> > > >>>>> > > > binding is generalized, in this patch the "chassisredirect"
> > port
> > > >>>>> binding
> > > >>>>> > > > is only created for NB logical router ports that specify the
> > new
> > > >>>>> > > > "redirect-chassis" option.  There is no explicit notion of a
> > > >>>>> > > > "chassisredirect" port in the NB database.  The expectation
> > is
> > > >>>>> when
> > > >>>>> > > > capabilities are implemented that take advantage of
> > > >>>>> "chassisredirect"
> > > >>>>> > > > ports (e.g. NAT), the addition of flows specifying a
> > > >>>>> "chassisredirect"
> > > >>>>> > > > port as the outport will also be triggered by the presence
> > of the
> > > >>>>> > > > "redirect-chassis" option.  Such flows are added for NB
> > logical
> > > >>>>> router
> > > >>>>> > > > ports that specify the "redirect-chassis" option.
> > > >>>>> > > >
> > > >>>>> > > > Signed-off-by: Mickey Spiegel <mickeys.dev@gmail.com>
> > > >>>>> > >
> > > >>>>> > > chassisredirect ports seem incredibly similar to vif ports.  Is
> > > >>>>> the only
> > > >>>>> > > difference that the output port is changed at the beginning of
> > the
> > > >>>>> > > egress pipeline?  That's something that could be implemented
> > in the
> > > >>>>> > > logical egress pipeline with 'outport = "...";'.  We do say
> > that
> > > >>>>> the
> > > >>>>> > > outport isn't supposed to be modified in an egress pipeline,
> > but
> > > >>>>> nothing
> > > >>>>> > > enforces that and if it's actually useful then we could just
> > > >>>>> change the
> > > >>>>> > > documentation.
> > > >>>>> > >
> > > >>>>> >
> > > >>>>> > I don't get the similarity to vif ports.
> > > >>>>> >
> > > >>>>> > I need to create two different ports for each logical router port
> > > >>>>> > specifying a "redirect-chassis". One represents the centralized
> > > >>>>> > instance, for traffic that needs to be centralized. The other
> > > >>>>> > represents the distributed instance, i.e. just take the local
> > patch
> > > >>>>> > port and go to/from the local logical router instance. I wanted
> > the
> > > >>>>> > egress pipeline processing to be the same regardless of whether
> > > >>>>> > the packet arrived at the egress pipeline on the port
> > representing
> > > >>>>> > the centralized instance, or whether the packet arrived at the
> > > >>>>> > egress pipeline on the port representing the distributed
> > instance.
> > > >>>>> >
> > > >>>>> > There is no pipeline processing of the chassisredirect port,
> > > >>>>> > except as the outport in the ingress pipeline. Everything else
> > > >>>>> > happens in tables 32 and 33.
> > > >>>>>
> > > >>>>> OK, then I'm having trouble following the description.  For me,
> > here's
> > > >>>>> the key paragraphs that led me to my conclusions:
> > > >>>>>
> > > >>>>>     This patch introduces a new type of SB port_binding called
> > > >>>>>     "chassisredirect".  A "chassisredirect" port represents a
> > > >>>>> particular
> > > >>>>>     instance, bound to a specific chassis, of an otherwise
> > distributed
> > > >>>>>     port.  The ovn-controller on that chassis populates the
> > "chassis"
> > > >>>>>     column for this record as an indication for other
> > ovn-controllers
> > > >>>>> of
> > > >>>>>     its physical location.  Other ovn-controllers do not treat this
> > > >>>>> port
> > > >>>>>     as a local port.
> > > >>>>>
> > > >>>>>     A "chassisredirect" port should never be used as an "inport".
> > When
> > > >>>>>     an ingress pipeline sets the "outport", it may set the value
> > to a
> > > >>>>>     logical port of type "chassisredirect".  This will cause the
> > packet
> > > >>>>>     to be directed to a specific chassis to carry out the egress
> > > >>>>> logical
> > > >>>>>     router pipeline, in the same way that a logical switch forwards
> > > >>>>>     egress traffic to a VIF port residing on a specific chassis.
> > At
> > > >>>>> the
> > > >>>>>     beginning of the egress pipeline, the "outport" will be reset
> > to
> > > >>>>> the
> > > >>>>>     value of the distributed port.
> > > >>>>>
> > > >>>>> The first paragraph appears to say that a chassisredirect port is a
> > > >>>>> port
> > > >>>>> on a particular chassis and that its chassis column says what
> > chassis
> > > >>>>> it's on.  OK, that's the same as a vif port, right?
> > > >>>>>
> > > >>>>
> > > >>>> Yes, the same as vif, l2gateway, or l3gateway in the sense that this
> > > >>>> port is bound to a chassis. No differences there.
> > > >>>>
> > > >>>>>
> > > >>>>> The second paragraph appears to me to say, first, that packets
> > would
> > > >>>>> never originate from a chassisredirect port.  OK, fine, no problem.
> > > >>>>> Second, it directly makes an analogy to vif ports, and then says
> > that
> > > >>>>> the outport changes.  No problem.
> > > >>>>>
> > > >>>>
> > > >>>> Two main differences from vif:
> > > >>>> 1. The outport changes. I want the ct_zone assignments in table 33
> > > >>>>    and the loopback check in table 34 to be according to the new
> > > >>>>    outport.
> > > >>>>
> > > >>>> 2. There is no pipeline processing of this port. This port has no
> > > >>>>    addresses or other configuration. The purpose of the port is to
> > > >>>>    tell table 32 to go to a particular chassis, and then tell table
> > 33
> > > >>>>    what the real outport should be.
> > > >>>>
> > > >>>> I got to this notion because a port is the way to tell table 32 to
> > > >>>> go to a particular chassis. The first thought was two regular patch
> > > >>>> ports, but the idea of two patch ports with the same addresses
> > > >>>> is confusing and dangerous. By changing back to the real patch
> > > >>>> port right away in the egress pipeline, it avoids those problems.
> > > >>>>
> > > >>>> Mickey
> > > >>>>
> > > >>>
> > > >>> Let me go back to first principles. I need three sorts of chassis
> > > >>> specific behaviors for distributed NAT:
> > > >>> 1. Install some flows only on the chassis where a certain logical
> > > >>>    port resides. That is is_chassis_resident which you already
> > > >>>    reviewed and acked. The nat flows patch at the end of the
> > > >>>    patch set uses this mechanism.
> > > >>> 2. Install a different set of flows associated with the distributed
> > > >>>    gateway port only on the redirect-chassis. There are several
> > > >>>    such flows in this patch.
> > > >>> 3. Direct some traffic with outport being the distributed gateway
> > > >>>    port to the instance of the distributed gateway port on the
> > > >>>    redirect-chassis. When this traffic hits table 32, it gets
> > > >>>    sent through the normal tunnel to the redirect-chassis.
> > > >>>
> > > >>> I needed some handle that triggers 3. I decided to make that
> > > >>> handle be a port, which I called a "chassisredirect" port. That
> > > >>> also allows me to use is_chassis_resident(chassisredirect_port)
> > > >>> to solve 2.
> > > >>>
> > > >>> It is possible to make that handle be something other than a
> > > >>> port, as long as table 32 is modified to act on that. In that case,
> > > >>> I will need another match "condition" (as I called it) based on
> > > >>> that handle, similar to is_chassis_resident but based on
> > > >>> whatever handle we decide on instead of port.
> > > >>>
> > > >>
> > > >> I realized earlier tonight that there is a straightforward
> > > >> alternative, though it does have one potentially confusing
> > > >> aspect.
> > > >>
> > > >> For some reason, I had been assuming that a port_binding is
> > > >> either exclusive to a chassis (in the previous implementation
> > > >> with OVS patch ports, it had an ofport), or the port_binding
> > > >> exists everywhere and does not have a chassis association
> > > >> (is_remote in the previous implementation with OVS patch
> > > >> ports).
> > > >>
> > > >> If this is relaxed and we allow logical patch ports to be
> > > >> associated with a chassis, then all I need is a new
> > > >> MLF_FORCE_CHASSIS_REDIRECT flag rather than
> > > >> a second port_binding with a new "chassisredirect" type.
> > > >>
> > > >> The potentially confusing aspect is that even though the
> > > >> mechanism for associating a logical patch port with a
> > > >> chassis is identical to that for other port_binding types such
> > > >> as "l3gateway", the association of a chassis with a logical
> > > >> patch port has a different meaning than the association of a
> > > >> chassis with a VIF, a type "l3gateway" port_binding, or a
> > > >> type "l2gateway" port_binding.  For the latter, the association
> > > >> is exclusive, i.e. the port only exists on that chassis.  For
> > > >> logical patch ports, whether there is an association with a
> > > >> chassis or not, the logical patch port exists everywhere
> > > >> (subject to the constraints of conditional monitoring).
> > > >>
> > > >> The chassis association would only be used for a new
> > > >> table 32 flow similar to other flows sending packets to
> > > >> remote hypervisors for other port_binding types, but with
> > > >> a different match condition:
> > > >>     match_set_metadata(&match, htonll(dp_key));
> > > >>     match_set_reg(&match, MFF_LOG_OUTPORT - MFF_REG0, port_key);
> > > >>     match_set_reg_masked(&match, MFF_LOG_FLAGS - MFF_REG0,
> > > >>                          1, MLF_FORCE_CHASSIS_REDIRECT);
> > > >>
> > > >> Depending on whether the
> > > >> MLF_FORCE_CHASSIS_REDIRECT flag is set, the
> > > >> packet would either be sent to the remote hypervisor,
> > > >> or it would fall through to the table 32 priority 0 fallback
> > > >> flow and be processed locally.
> > > >>
> > > >> The chassis association could also be used for
> > > >> evaluation of is_chassis_resident("l3dgw_port") functions
> > > >> in flow matches.
> > > >>
> > > >> If you agree that this approach is more promising than
> > > >> type "chassisredirect" ports, I can code this up tomorrow.
> > > >>
> > > >
> > > > I am having trouble making this approach work with the
> > > > ARP request table. With the approach of replacing the
> > > > logical outport, the ARP request goes to the controller
> > > > with the new outport of type "chassisredirect". When the
> > > > packet is reinjected, it does eventually end up at the
> > > > redirect chassis.
> > > >
> > > > With the approach of using a flag, the packet is not
> > > > hitting the table 32 entry matching the flag. I am not sure
> > > > what happens to the packet after it goes up to the
> > > > controller, and I am not sure how to debug it further or
> > > > what to change to make it work.
> > > >
> > >
> > > I found the bug. It was affecting all packets, not just arp, and
> > > was a simple fix. I am still checking all scenarios, but I think
> > > I have the approach with the flag instead of a new port type
> > > working. I can move forward with either approach, a flag or
> > > a new port type as originally proposed.
> >
> > Do you mind posting the version with the flag?  We'll do one or the
> > other.
> >
> 
> Should I post just that, or the patch set?
> I have not integrated it with later patches yet.

Can you post the revised patch set up to this patch?  I guess that's
just 2 patches, since patch 1 is now applied.
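
The flag-based alternative discussed in this thread comes down to a
three-part match in table 32: logical datapath, logical outport, and the
MLF_FORCE_CHASSIS_REDIRECT bit.  The following is a minimal standalone
sketch of that decision, not code from the patch.  Every name prefixed
with sketch_ or SKETCH_ is hypothetical, the bit position chosen for the
flag is an assumption, and the sketch assumes the flow is only installed
on chassis other than the redirect chassis.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit position; the real flag value would be defined in
 * OVN's own headers. */
#define SKETCH_MLF_FORCE_CHASSIS_REDIRECT (1u << 0)

enum sketch_verdict {
    SKETCH_TUNNEL_TO_REDIRECT_CHASSIS, /* Hit the new table 32 flow. */
    SKETCH_PROCESS_LOCALLY             /* Fell through to priority 0. */
};

/* The packet fields that the proposed flow matches on. */
struct sketch_pkt {
    uint64_t dp_key;   /* Logical datapath (metadata). */
    uint32_t outport;  /* Logical output port key. */
    uint32_t flags;    /* Logical flags register. */
};

/* The values the proposed flow is built from. */
struct sketch_flow {
    uint64_t dp_key;
    uint32_t port_key;
};

/* Models the proposed table 32 lookup: the packet is tunneled to the
 * redirect chassis only if the datapath and outport match and the
 * force-redirect flag is set; otherwise it is handled locally. */
static enum sketch_verdict
sketch_table32(const struct sketch_flow *flow, const struct sketch_pkt *pkt)
{
    bool hit = pkt->dp_key == flow->dp_key
               && pkt->outport == flow->port_key
               && (pkt->flags & SKETCH_MLF_FORCE_CHASSIS_REDIRECT) != 0;

    return hit ? SKETCH_TUNNEL_TO_REDIRECT_CHASSIS : SKETCH_PROCESS_LOCALLY;
}
```

With this model, two packets that share the same datapath and outport
can take different paths: the one with the flag set is tunneled, while
the one without it is processed locally, which is the behavior the flag
approach relies on to avoid a second port_binding.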

Patch

diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c
index 2f24e9d..25592c2 100644
--- a/ovn/controller/binding.c
+++ b/ovn/controller/binding.c
@@ -355,6 +355,14 @@  consider_local_datapath(struct controller_ctx *ctx,
             add_local_datapath(ldatapaths, lports, binding_rec->datapath,
                                false, local_datapaths);
         }
+    } else if (!strcmp(binding_rec->type, "chassisredirect")) {
+        const char *chassis_id = smap_get(&binding_rec->options,
+                                          "redirect-chassis");
+        our_chassis = chassis_id && !strcmp(chassis_id, chassis_rec->name);
+        if (our_chassis) {
+            add_local_datapath(ldatapaths, lports, binding_rec->datapath,
+                               false, local_datapaths);
+        }
     } else if (!strcmp(binding_rec->type, "l3gateway")) {
         const char *chassis_id = smap_get(&binding_rec->options,
                                           "l3gateway-chassis");
diff --git a/ovn/controller/ovn-controller.c b/ovn/controller/ovn-controller.c
index a28e5f6..7cef3f8 100644
--- a/ovn/controller/ovn-controller.c
+++ b/ovn/controller/ovn-controller.c
@@ -163,6 +163,10 @@  update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
         sbrec_port_binding_add_clause_options(&pb, OVSDB_F_INCLUDES, &l2);
         const struct smap l3 = SMAP_CONST1(&l3, "l3gateway-chassis", id);
         sbrec_port_binding_add_clause_options(&pb, OVSDB_F_INCLUDES, &l3);
+        const struct smap redirect = SMAP_CONST1(&redirect,
+                                                 "redirect-chassis", id);
+        sbrec_port_binding_add_clause_options(&pb, OVSDB_F_INCLUDES,
+                                              &redirect);
     }
     if (local_ifaces) {
         const char *name;
diff --git a/ovn/controller/physical.c b/ovn/controller/physical.c
index 9cc7eb6..07b7cd5 100644
--- a/ovn/controller/physical.c
+++ b/ovn/controller/physical.c
@@ -352,6 +352,69 @@  consider_port_binding(enum mf_field_id mff_ovn_geneve,
         return;
     }
 
+    if (!strcmp(binding->type, "chassisredirect")
+        && binding->chassis == chassis) {
+
+        /* Table 33, priority 100.
+         * =======================
+         *
+         * Implements output to local hypervisor.  Each flow matches a
+         * logical output port on the local hypervisor, and resubmits to
+         * table 34.  For ports of type "chassisredirect", the logical
+         * output port is changed from the "chassisredirect" port to the
+         * underlying distributed port. */
+
+        match_init_catchall(&match);
+        ofpbuf_clear(ofpacts_p);
+        match_set_metadata(&match, htonll(dp_key));
+        match_set_reg(&match, MFF_LOG_OUTPORT - MFF_REG0, port_key);
+
+        const char *distributed_port = smap_get(&binding->options,
+                                                "distributed-port");
+        const struct sbrec_port_binding *distributed_binding
+            = lport_lookup_by_name(lports, distributed_port);
+
+        if (!distributed_binding) {
+            /* Packet will be dropped. */
+            static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
+            VLOG_WARN_RL(&rl, "No port binding record for distributed "
+                         "port %s referred by chassisredirect port %s",
+                         distributed_port,
+                         binding->logical_port);
+        } else if (binding->datapath !=
+                   distributed_binding->datapath) {
+            /* Packet will be dropped. */
+            static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
+            VLOG_WARN_RL(&rl,
+                         "chassisredirect port %s refers to "
+                         "distributed port %s in wrong datapath",
+                         binding->logical_port,
+                         distributed_port);
+        } else {
+            put_load(distributed_binding->tunnel_key,
+                     MFF_LOG_OUTPORT, 0, 32, ofpacts_p);
+
+            struct zone_ids zone_ids = get_zone_ids(distributed_binding,
+                                                    ct_zones);
+            if (zone_ids.ct) {
+                put_load(zone_ids.ct, MFF_LOG_CT_ZONE, 0, 32, ofpacts_p);
+            }
+            if (zone_ids.dnat) {
+                put_load(zone_ids.dnat, MFF_LOG_DNAT_ZONE, 0, 32, ofpacts_p);
+            }
+            if (zone_ids.snat) {
+                put_load(zone_ids.snat, MFF_LOG_SNAT_ZONE, 0, 32, ofpacts_p);
+            }
+
+            /* Resubmit to table 34. */
+            put_resubmit(OFTABLE_CHECK_LOOPBACK, ofpacts_p);
+        }
+
+        ofctrl_add_flow(flow_table, OFTABLE_LOCAL_OUTPUT, 100, 0,
+                        &match, ofpacts_p);
+        return;
+    }
+
     /* Find the OpenFlow port for the logical port, as 'ofport'.  This is
      * one of:
      *
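
Note on the physical.c hunk above: the two checks that precede the outport rewrite can be modelled outside ovn-controller. This is a minimal sketch, not code from the patch. `struct port_binding_sketch` and `redirect_outport_key()` are hypothetical names, the struct keeps only the fields the checks need, and datapaths are compared by an integer key rather than by row pointer as the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a southbound Port_Binding row. */
struct port_binding_sketch {
    const char *logical_port;
    uint32_t datapath_key;      /* Identifies the logical datapath. */
    uint32_t tunnel_key;        /* Logical port key within the datapath. */
};

/* Decides what a table 33 flow for the chassisredirect port 'cr' would do.
 *
 * If 'distributed' exists and is in the same datapath as 'cr', stores the
 * distributed port's tunnel key in '*keyp' and returns true: the logical
 * outport would be rewritten to that key before resubmitting.
 *
 * Otherwise returns false and leaves '*keyp' untouched: the flow would be
 * installed without actions, so the packet would be dropped. */
static bool
redirect_outport_key(const struct port_binding_sketch *cr,
                     const struct port_binding_sketch *distributed,
                     uint32_t *keyp)
{
    assert(cr != NULL && keyp != NULL);

    if (!distributed) {
        /* No port binding record for the distributed port. */
        return false;
    }
    if (cr->datapath_key != distributed->datapath_key) {
        /* Distributed port is in the wrong datapath. */
        return false;
    }
    *keyp = distributed->tunnel_key;
    return true;
}
```

Given a chassisredirect binding and a distributed binding in the same datapath, the function reports the distributed port's tunnel key. The other two cases correspond to the branches commented "Packet will be dropped" in the hunk.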
diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
index f3c1682..49e4291 100644
--- a/ovn/northd/ovn-northd.8.xml
+++ b/ovn/northd/ovn-northd.8.xml
@@ -740,9 +740,21 @@  output;
       </li>
 
       <li>
-        One priority-50 flow that matches each known Ethernet address against
-        <code>eth.dst</code> and outputs the packet to the single associated
-        output port.
+        <p>
+          One priority-50 flow that matches each known Ethernet address against
+          <code>eth.dst</code> and outputs the packet to the single associated
+          output port.
+        </p>
+
+        <p>
+          For the Ethernet address on a logical switch port of type
+          <code>router</code>, when that logical switch port's
+          <ref column="addresses" table="Logical_Switch_Port"
+          db="OVN_Northbound"/> column is set to <code>router</code> and
+          the connected logical router port specifies a
+          <code>redirect-chassis</code>, the flow is only programmed on the
+          <code>redirect-chassis</code>.
+        </p>
       </li>
 
       <li>
@@ -862,10 +874,21 @@  output;
       </li>
 
       <li>
-        For each enabled router port <var>P</var> with Ethernet address
-        <var>E</var>, a priority-50 flow that matches <code>inport ==
-        <var>P</var> &amp;&amp; (eth.mcast || eth.dst ==
-        <var>E</var></code>), with action <code>next;</code>.
+        <p>
+          For each enabled router port <var>P</var> with Ethernet address
+          <var>E</var>, a priority-50 flow that matches <code>inport ==
+          <var>P</var> &amp;&amp; (eth.mcast || eth.dst ==
+          <var>E</var>)</code>, with action <code>next;</code>.
+        </p>
+
+        <p>
+          For the gateway port on a distributed logical router (where
+          one of the logical router ports specifies a
+          <code>redirect-chassis</code>), the above flow matching
+          <code>eth.dst == <var>E</var></code> is only programmed on
+          the gateway port instance on the
+          <code>redirect-chassis</code>.
+        </p>
       </li>
     </ul>
 
@@ -980,6 +1003,17 @@  outport = <var>P</var>;
 flags.loopback = 1;
 output;
         </pre>
+
+        <p>
+          For the gateway port on a distributed logical router (where
+          one of the logical router ports specifies a
+          <code>redirect-chassis</code>), the above flows are only
+          programmed on the gateway port instance on the
+          <code>redirect-chassis</code>.  This behavior avoids generation
+          of multiple ARP responses from different chassis, and allows
+          upstream MAC learning to point to the
+          <code>redirect-chassis</code>.
+        </p>
       </li>
 
       <li>
@@ -1040,6 +1074,17 @@  nd_na {
     output;
 };
         </pre>
+
+        <p>
+          For the gateway port on a distributed logical router (where
+          one of the logical router ports specifies a
+          <code>redirect-chassis</code>), the above flows replying to
+          IPv6 Neighbor Solicitations are only programmed on the
+          gateway port instance on the <code>redirect-chassis</code>.
+          This behavior avoids generation of multiple replies from
+          different chassis, and allows upstream MAC learning to point
+          to the <code>redirect-chassis</code>.
+        </p>
       </li>
 
       <li>
@@ -1485,7 +1530,46 @@  next;
       </li>
     </ul>
 
-    <h3>Ingress Table 7: ARP Request</h3>
+    <h3>Ingress Table 7: Gateway Redirect</h3>
+
+    <p>
+      For distributed logical routers where one of the logical router
+      ports specifies a <code>redirect-chassis</code>, this table redirects
+      certain packets to the distributed gateway port instance on the
+      <code>redirect-chassis</code>.  This table has the following flows:
+    </p>
+
+    <ul>
+      <li>
+        A priority-150 logical flow with match
+        <code>outport == <var>GW</var> &amp;&amp;
+        eth.dst == 00:00:00:00:00:00</code> has actions
+        <code>outport = <var>CR</var>; next;</code>, where
+        <var>GW</var> is the logical router distributed gateway
+        port and <var>CR</var> is the <code>chassisredirect</code>
+        port representing the instance of the logical router
+        distributed gateway port on the
+        <code>redirect-chassis</code>.
+      </li>
+
+      <li>
+        A priority-50 logical flow with match
+        <code>outport == <var>GW</var></code> has actions
+        <code>outport = <var>CR</var>; next;</code>, where
+        <var>GW</var> is the logical router distributed gateway
+        port and <var>CR</var> is the <code>chassisredirect</code>
+        port representing the instance of the logical router
+        distributed gateway port on the
+        <code>redirect-chassis</code>.
+      </li>
+
+      <li>
+        A priority-0 logical flow with match <code>1</code> has actions
+        <code>next;</code>.
+      </li>
+    </ul>
+
+    <h3>Ingress Table 8: ARP Request</h3>
 
     <p>
       In the common case where the Ethernet destination has been resolved, this
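
Note on the Gateway Redirect table documented above: its three flows can be summarised by which priority a given packet would hit. This is a minimal sketch under stated assumptions. `gw_redirect_priority()` is a hypothetical helper, port names are compared as plain strings, and only the flows added by this patch are modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the priority of the Gateway Redirect flow that a packet with
 * logical outport 'outport' and Ethernet destination 'eth_dst' would match.
 *
 * 'gw_port' is the name of the distributed gateway port, or NULL if the
 * router has no port with "redirect-chassis" set.
 *
 *   150: outport is the gateway port and eth.dst is still unresolved
 *        (all zeros).
 *    50: outport is the gateway port and eth.dst is resolved.
 *     0: anything else; the packet simply moves on to the next table. */
static int
gw_redirect_priority(const char *outport, const char *gw_port,
                     const uint8_t eth_dst[6])
{
    static const uint8_t zero[6] = { 0 };

    assert(outport != NULL && eth_dst != NULL);

    if (gw_port && !strcmp(outport, gw_port)) {
        return memcmp(eth_dst, zero, sizeof zero) ? 50 : 150;
    }
    return 0;
}
```

In this patch the priority-150 and priority-50 flows carry the same actions (`outport = CR; next;`), so the sketch only shows which of the two matches.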
diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index 5ad544d..87c80d1 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -132,7 +132,8 @@  enum ovn_stage {
     PIPELINE_STAGE(ROUTER, IN,  DNAT,        4, "lr_in_dnat")         \
     PIPELINE_STAGE(ROUTER, IN,  IP_ROUTING,  5, "lr_in_ip_routing")   \
     PIPELINE_STAGE(ROUTER, IN,  ARP_RESOLVE, 6, "lr_in_arp_resolve")  \
-    PIPELINE_STAGE(ROUTER, IN,  ARP_REQUEST, 7, "lr_in_arp_request")  \
+    PIPELINE_STAGE(ROUTER, IN,  GW_REDIRECT, 7, "lr_in_gw_redirect")  \
+    PIPELINE_STAGE(ROUTER, IN,  ARP_REQUEST, 8, "lr_in_arp_request")  \
                                                                       \
     /* Logical router egress stages. */                               \
     PIPELINE_STAGE(ROUTER, OUT, SNAT,      0, "lr_out_snat")          \
@@ -382,6 +383,15 @@  struct ovn_datapath {
 
     /* IPAM data. */
     struct hmap ipam;
+
+    /* OVN northd only needs to know about the logical router gateway port for
+     * NAT on a distributed router.  This "distributed gateway port" is
+     * populated only when there is a "redirect-chassis" specified for one of
+     * the ports on the logical router.  Otherwise this will be NULL. */
+    struct ovn_port *l3dgw_port;
+    /* The "derived" OVN port representing the instance of l3dgw_port on
+     * the "redirect-chassis". */
+    struct ovn_port *l3redirect_port;
 };
 
 struct macam_node {
@@ -658,6 +668,9 @@  struct ovn_port {
 
     struct lport_addresses lrp_networks;
 
+    bool derived; /* Indicates whether this is an additional port
+                   * derived from nbsp or nbrp. */
+
     /* The port's peer:
      *
      *     - A switch port S of type "router" has a router port R as a peer,
@@ -687,6 +700,7 @@  ovn_port_create(struct hmap *ports, const char *key,
     op->sb = sb;
     op->nbsp = nbsp;
     op->nbrp = nbrp;
+    op->derived = false;
     hmap_insert(ports, &op->key_node, hash_string(op->key, 0));
     return op;
 }
@@ -737,6 +751,12 @@  ovn_port_allocate_key(struct ovn_datapath *od)
                           (1u << 15) - 1, &od->port_key_hint);
 }
 
+static char *
+chassis_redirect_name(const char *port_name)
+{
+    return xasprintf("cr-%s", port_name);
+}
+
 static bool
 ipam_is_duplicate_mac(struct eth_addr *ea, uint64_t mac64, bool warn)
 {
@@ -1299,6 +1319,52 @@  join_logical_ports(struct northd_context *ctx,
                 op->lrp_networks = lrp_networks;
                 op->od = od;
                 ipam_add_port_addresses(op->od, op);
+
+                const char *redirect_chassis = smap_get(&op->nbrp->options,
+                                                        "redirect-chassis");
+                if (redirect_chassis) {
+                    /* Additional "derived" ovn_port crp represents the
+                     * instance of op on the "redirect-chassis". */
+                    const char *gw_chassis = smap_get(&op->od->nbr->options,
+                                                   "chassis");
+                    if (gw_chassis) {
+                        static struct vlog_rate_limit rl
+                            = VLOG_RATE_LIMIT_INIT(1, 1);
+                        VLOG_WARN_RL(&rl, "Bad configuration: "
+                                     "redirect-chassis configured on port %s "
+                                     "on L3 gateway router", nbrp->name);
+                        continue;
+                    }
+                    char *redirect_name = chassis_redirect_name(nbrp->name);
+                    struct ovn_port *crp = ovn_port_find(ports, redirect_name);
+                    if (crp) {
+                        crp->derived = true;
+                        crp->nbrp = nbrp;
+                        ovs_list_remove(&crp->list);
+                        ovs_list_push_back(both, &crp->list);
+                    } else {
+                        crp = ovn_port_create(ports, redirect_name,
+                                              NULL, nbrp, NULL);
+                        crp->derived = true;
+                        ovs_list_push_back(nb_only, &crp->list);
+                    }
+                    crp->od = od;
+                    free(redirect_name);
+
+                    /* Set l3dgw_port and l3redirect_port in od, for later
+                     * use during flow creation. */
+                    if (od->l3dgw_port || od->l3redirect_port) {
+                        static struct vlog_rate_limit rl
+                            = VLOG_RATE_LIMIT_INIT(1, 1);
+                        VLOG_WARN_RL(&rl, "Bad configuration: multiple ports "
+                                     "with redirect-chassis on same logical "
+                                     "router %s", od->nbr->name);
+                        continue;
+                    } else {
+                        od->l3dgw_port = op;
+                        od->l3redirect_port = crp;
+                    }
+                }
             }
         }
     }
@@ -1307,7 +1373,7 @@  join_logical_ports(struct northd_context *ctx,
      * to their peers. */
     struct ovn_port *op;
     HMAP_FOR_EACH (op, key_node, ports) {
-        if (op->nbsp && !strcmp(op->nbsp->type, "router")) {
+        if (op->nbsp && !strcmp(op->nbsp->type, "router") && !op->derived) {
             const char *peer_name = smap_get(&op->nbsp->options, "router-port");
             if (!peer_name) {
                 continue;
@@ -1336,7 +1402,7 @@  join_logical_ports(struct northd_context *ctx,
                     break;
                 }
             }
-        } else if (op->nbrp && op->nbrp->peer) {
+        } else if (op->nbrp && op->nbrp->peer && !op->derived) {
             struct ovn_port *peer = ovn_port_find(ports, op->nbrp->peer);
             if (peer) {
                 if (peer->nbrp) {
@@ -1366,18 +1432,29 @@  ovn_port_update_sbrec(const struct ovn_port *op,
         /* If the router is for l3 gateway, it resides on a chassis
          * and its port type is "l3gateway". */
         const char *chassis = smap_get(&op->od->nbr->options, "chassis");
-        if (chassis) {
+        if (op->derived) {
+            sbrec_port_binding_set_type(op->sb, "chassisredirect");
+        } else if (chassis) {
             sbrec_port_binding_set_type(op->sb, "l3gateway");
         } else {
             sbrec_port_binding_set_type(op->sb, "patch");
         }
 
-        const char *peer = op->peer ? op->peer->key : "<error>";
         struct smap new;
         smap_init(&new);
-        smap_add(&new, "peer", peer);
-        if (chassis) {
-            smap_add(&new, "l3gateway-chassis", chassis);
+        if (op->derived) {
+            const char *redirect_chassis = smap_get(&op->nbrp->options,
+                                                    "redirect-chassis");
+            if (redirect_chassis) {
+                smap_add(&new, "redirect-chassis", redirect_chassis);
+            }
+            smap_add(&new, "distributed-port", op->nbrp->name);
+        } else {
+            const char *peer = op->peer ? op->peer->key : "<error>";
+            smap_add(&new, "peer", peer);
+            if (chassis) {
+                smap_add(&new, "l3gateway-chassis", chassis);
+            }
         }
         sbrec_port_binding_set_options(op->sb, &new);
         smap_destroy(&new);
@@ -3145,6 +3222,15 @@  build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
                 ds_clear(&match);
                 ds_put_format(&match, "eth.dst == "ETH_ADDR_FMT,
                               ETH_ADDR_ARGS(mac));
+                if (op->peer->od->l3dgw_port
+                    && op->peer == op->peer->od->l3dgw_port
+                    && op->peer->od->l3redirect_port) {
+                    /* The destination lookup flow for the router's
+                     * distributed gateway port MAC address should only be
+                     * programmed on the "redirect-chassis". */
+                    ds_put_format(&match, " && is_chassis_resident(%s)",
+                                  op->peer->od->l3redirect_port->json_key);
+                }
 
                 ds_clear(&actions);
                 ds_put_format(&actions, "outport = %s; output;", op->json_key);
@@ -3583,9 +3669,27 @@  build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
             continue;
         }
 
+        if (op->derived) {
+            /* No ingress packets should be received on a chassisredirect
+             * port. */
+            continue;
+        }
+
+        ds_clear(&match);
+        ds_put_format(&match, "eth.mcast && inport == %s", op->json_key);
+        ovn_lflow_add(lflows, op->od, S_ROUTER_IN_ADMISSION, 50,
+                      ds_cstr(&match), "next;");
+
         ds_clear(&match);
-        ds_put_format(&match, "(eth.mcast || eth.dst == %s) && inport == %s",
+        ds_put_format(&match, "eth.dst == %s && inport == %s",
                       op->lrp_networks.ea_s, op->json_key);
+        if (op->od->l3dgw_port && op == op->od->l3dgw_port
+            && op->od->l3redirect_port) {
+            /* Traffic with eth.dst = l3dgw_port->lrp_networks.ea_s
+             * should only be received on the "redirect-chassis". */
+            ds_put_format(&match, " && is_chassis_resident(%s)",
+                          op->od->l3redirect_port->json_key);
+        }
         ovn_lflow_add(lflows, op->od, S_ROUTER_IN_ADMISSION, 50,
                       ds_cstr(&match), "next;");
     }
@@ -3648,6 +3752,11 @@  build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
             continue;
         }
 
+        if (op->derived) {
+            /* No ingress packets are accepted on a chassisredirect
+             * port, so no need to program flows for that port. */
+            continue;
+        }
 
         if (op->lrp_networks.n_ipv4_addrs) {
             /* L3 admission control: drop packets that originate from an
@@ -3687,6 +3796,16 @@  build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
             ds_put_format(&match,
                           "inport == %s && arp.tpa == %s && arp.op == 1",
                           op->json_key, op->lrp_networks.ipv4_addrs[i].addr_s);
+            if (op->od->l3dgw_port && op == op->od->l3dgw_port
+                && op->od->l3redirect_port) {
+                /* Traffic with eth.src = l3dgw_port->lrp_networks.ea_s
+                 * should only be sent from the "redirect-chassis", so that
+                 * upstream MAC learning points to the "redirect-chassis".
+                 * Also need to avoid generation of multiple ARP responses
+                 * from different chassis. */
+                ds_put_format(&match, " && is_chassis_resident(%s)",
+                              op->od->l3redirect_port->json_key);
+            }
 
             ds_clear(&actions);
             ds_put_format(&actions,
@@ -3873,6 +3992,12 @@  build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
             continue;
         }
 
+        if (op->derived) {
+            /* No ingress packets are accepted on a chassisredirect
+             * port, so no need to program flows for that port. */
+            continue;
+        }
+
         if (op->lrp_networks.n_ipv6_addrs) {
             /* L3 admission control: drop packets that originate from an
              * IPv6 address owned by the router (priority 100). */
@@ -3918,6 +4043,16 @@  build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
                     op->lrp_networks.ipv6_addrs[i].addr_s,
                     op->lrp_networks.ipv6_addrs[i].sn_addr_s,
                     op->lrp_networks.ipv6_addrs[i].addr_s);
+            if (op->od->l3dgw_port && op == op->od->l3dgw_port
+                && op->od->l3redirect_port) {
+                /* Traffic with eth.src = l3dgw_port->lrp_networks.ea_s
+                 * should only be sent from the "redirect-chassis", so that
+                 * upstream MAC learning points to the "redirect-chassis".
+                 * Also need to avoid generation of multiple ND replies
+                 * from different chassis. */
+                ds_put_format(&match, " && is_chassis_resident(%s)",
+                              op->od->l3redirect_port->json_key);
+            }
 
             ds_clear(&actions);
             ds_put_format(&actions,
@@ -4427,7 +4562,47 @@  build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
                       "get_nd(outport, xxreg0); next;");
     }
 
-    /* Local router ingress table 7: ARP request.
+    /* Logical router ingress table 7: Gateway redirect.
+     *
+     * For traffic with outport equal to the l3dgw_port
+     * on a distributed router, this table redirects a subset
+     * of the traffic to the l3redirect_port which represents
+     * the central instance of the l3dgw_port.
+     */
+    HMAP_FOR_EACH (od, key_node, datapaths) {
+        if (!od->nbr) {
+            continue;
+        }
+        if (od->l3dgw_port && od->l3redirect_port) {
+            /* For traffic with outport == l3dgw_port, if the
+             * packet did not match any higher priority redirect
+             * rule, then the traffic is redirected to the central
+             * instance of the l3dgw_port. */
+            ds_clear(&match);
+            ds_put_format(&match, "outport == %s",
+                          od->l3dgw_port->json_key);
+            ds_clear(&actions);
+            ds_put_format(&actions, "outport = %s; next;",
+                          od->l3redirect_port->json_key);
+            ovn_lflow_add(lflows, od, S_ROUTER_IN_GW_REDIRECT, 50,
+                          ds_cstr(&match), ds_cstr(&actions));
+
+            /* If the Ethernet destination has not been resolved,
+             * redirect to the central instance of the l3dgw_port.
+             * Such traffic will be replaced by an ARP request or ND
+             * Neighbor Solicitation in the ARP request ingress
+             * table, before being redirected to the central instance.
+             */
+            ds_put_format(&match, " && eth.dst == 00:00:00:00:00:00");
+            ovn_lflow_add(lflows, od, S_ROUTER_IN_GW_REDIRECT, 150,
+                          ds_cstr(&match), ds_cstr(&actions));
+        }
+
+        /* Packets are allowed by default. */
+        ovn_lflow_add(lflows, od, S_ROUTER_IN_GW_REDIRECT, 0, "1", "next;");
+    }
+
+    /* Logical router ingress table 8: ARP request.
      *
      * In the common case where the Ethernet destination has been resolved,
      * this table outputs the packet (priority 0).  Otherwise, it composes
@@ -4463,6 +4638,14 @@  build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
             continue;
         }
 
+        if (op->derived) {
+            /* No egress packets should be processed in the context of
+             * a chassisredirect port.  The chassisredirect port should
+             * be replaced by the l3dgw port in the local output
+             * pipeline stage before egress processing. */
+            continue;
+        }
+
         ds_clear(&match);
         ds_put_format(&match, "outport == %s", op->json_key);
         ovn_lflow_add(lflows, op->od, S_ROUTER_OUT_DELIVERY, 100,
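
Note on the ovn-northd.c changes above: the derived port's name is the distributed port's name with a "cr-" prefix. Below is a standalone sketch of `chassis_redirect_name()` that uses only the C standard library in place of OVS's `xasprintf()`. The `_sketch` suffix marks it as illustrative. Unlike `xasprintf()`, it returns NULL on allocation failure rather than aborting.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns a malloc()'d string "cr-<port_name>", or NULL if memory could
 * not be allocated.  The caller is responsible for freeing the result. */
static char *
chassis_redirect_name_sketch(const char *port_name)
{
    assert(port_name != NULL);

    /* Prefix, port name, and terminating null byte. */
    size_t size = strlen("cr-") + strlen(port_name) + 1;
    char *name = malloc(size);
    if (name) {
        snprintf(name, size, "cr-%s", port_name);
    }
    return name;
}
```

For a hypothetical router port named "lrp-gw", the result is "cr-lrp-gw". In the patch, `join_logical_ports()` uses this name as the lookup key for the chassisredirect port.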
diff --git a/ovn/ovn-nb.ovsschema b/ovn/ovn-nb.ovsschema
index 39c7f99..1c8319f 100644
--- a/ovn/ovn-nb.ovsschema
+++ b/ovn/ovn-nb.ovsschema
@@ -1,7 +1,7 @@ 
 {
     "name": "OVN_Northbound",
-    "version": "5.4.1",
-    "cksum": "3485560318 13777",
+    "version": "5.5.0",
+    "cksum": "379266191 13990",
     "tables": {
         "NB_Global": {
             "columns": {
@@ -191,6 +191,11 @@ 
         "Logical_Router_Port": {
             "columns": {
                 "name": {"type": "string"},
+                "options": {
+                    "type": {"key": "string",
+                             "value": "string",
+                             "min": 0,
+                             "max": "unlimited"}},
                 "networks": {"type": {"key": "string",
                                       "min": 1,
                                       "max": "unlimited"}},
diff --git a/ovn/ovn-nb.xml b/ovn/ovn-nb.xml
index e52b29e..2ace82e 100644
--- a/ovn/ovn-nb.xml
+++ b/ovn/ovn-nb.xml
@@ -1093,6 +1093,44 @@ 
       port has all ingress and egress traffic dropped.
     </column>
 
+    <group title="Options">
+      <p>
+        Additional options for the logical router port.
+      </p>
+
+      <column name="options" key="redirect-chassis">
+        <p>
+          If set, this indicates that this logical router port represents
+          a distributed gateway port.  In addition to the southbound
+          database port representing this distributed gateway port, another
+          port will be created in the southbound database that represents a
+          particular instance, bound to a specific chassis, of this
+          otherwise distributed logical router port.  This additional port
+          can then be specified as an <code>outport</code> in some of the
+          ingress pipeline flows.  This will cause matching packets to be
+          directed to a specific chassis to carry out the egress pipeline,
+          allowing a subset of logical router functionality to be
+          implemented in a centralized manner.  At the beginning of the
+          egress pipeline, the <code>outport</code> will be reset to the
+          value of the distributed port.
+        </p>
+
+        <p>
+          This option specifies the name of the <code>chassis</code> to which
+          the additional southbound port binding of type
+          <code>chassisredirect</code> will be bound.
+        </p>
+
+        <p>
+          When this option is specified, the peer logical switch port's
+          <ref column="addresses" table="Logical_Switch_Port"/> should be
+          set to <code>router</code>, so that the corresponding logical
+          switch destination lookup flow is only programmed on the
+          <code>redirect-chassis</code>.
+        </p>
+      </column>
+    </group>
+
     <group title="Attachment">
       <p>
         A given router port serves one of two purposes:
diff --git a/ovn/ovn-sb.xml b/ovn/ovn-sb.xml
index 2f35079..43c6f11 100644
--- a/ovn/ovn-sb.xml
+++ b/ovn/ovn-sb.xml
@@ -1776,6 +1776,21 @@  tcp.flags = RST;
             table="Port_Binding"/>:<code>vtep-logical-switch</code> must also
             be defined.
           </dd>
+
+          <dt><code>chassisredirect</code></dt>
+          <dd>
+            A logical port that represents a particular instance, bound
+            to a specific chassis, of an otherwise distributed parent
+            port (e.g. of type <code>patch</code>).  A
+            <code>chassisredirect</code> port should never be used as an
+            <code>inport</code>.  When an ingress pipeline sets the
+            <code>outport</code>, it may set the value to a logical port
+            of type <code>chassisredirect</code>.  This will cause the
+            packet to be directed to a specific chassis to carry out the
+            egress pipeline.  At the beginning of the egress pipeline,
+            the <code>outport</code> will be reset to the value of the
+            distributed port.
+          </dd>
         </dl>
       </column>
     </group>
@@ -1932,6 +1947,26 @@  tcp.flags = RST;
       </column>
     </group>
 
+    <group title="Chassis Redirect Options">
+      <p>
+        These options apply to logical ports with <ref column="type"/>
+        of <code>chassisredirect</code>.
+      </p>
+
+      <column name="options" key="distributed-port">
+        The name of the distributed port for which this
+        <code>chassisredirect</code> port represents a particular instance.
+      </column>
+
+      <column name="options" key="redirect-chassis">
+        The <code>chassis</code> that this <code>chassisredirect</code> port
+        is bound to.  This is taken from <ref table="Logical_Router_Port"
+        column="options" key="redirect-chassis" db="OVN_Northbound"/>
+        in the OVN_Northbound database's <ref table="Logical_Router_Port"
+        db="OVN_Northbound"/> table.
+      </column>
+    </group>
+
     <group title="Nested Containers">
       <p>
         These columns support containers nested within a VM.  Specifically,
diff --git a/ovn/utilities/ovn-trace.c b/ovn/utilities/ovn-trace.c
index 7106bed..b19d4d6 100644
--- a/ovn/utilities/ovn-trace.c
+++ b/ovn/utilities/ovn-trace.c
@@ -307,6 +307,7 @@  struct ovntrace_port {
     char *type;
     uint16_t tunnel_key;
     struct ovntrace_port *peer; /* Patch ports only. */
+    struct ovntrace_port *distributed_port; /* chassisredirect ports only. */
 };
 
 struct ovntrace_mcgroup {
@@ -516,6 +517,24 @@  read_ports(void)
             }
         }
     }
+
+    SBREC_PORT_BINDING_FOR_EACH (sbpb, ovnsb_idl) {
+        if (!strcmp(sbpb->type, "chassisredirect")) {
+            struct ovntrace_port *port
+                = shash_find_data(&ports, sbpb->logical_port);
+            if (port) {
+                const char *distributed_name = smap_get(&sbpb->options,
+                                                       "distributed-port");
+                if (distributed_name) {
+                    struct ovntrace_port *distributed_port
+                        = shash_find_data(&ports, distributed_name);
+                    if (distributed_port && distributed_port->dp == port->dp) {
+                        port->distributed_port = distributed_port;
+                    }
+                }
+            }
+        }
+    }
 }
 
 static int
@@ -1173,7 +1192,29 @@  execute_output(const struct ovntrace_datapath *dp, struct flow *uflow,
                                      "/* omitting output because inport == outport && !flags.loopback */");
             }
         }
-    } else if (port->tunnel_key != in_key || allow_loopback) {
+        return;
+    }
+
+    if (port && !strcmp(port->type, "chassisredirect")) {
+        if (port->distributed_port) {
+            ovntrace_node_append(super, OVNTRACE_NODE_OUTPUT,
+                                 "/* Replacing type \"%s\" outport \"%s\""
+                                 " with distributed port \"%s\". */",
+                                 port->type, port->name,
+                                 port->distributed_port->name);
+            port = port->distributed_port;
+            out_name = port->name;
+            egress_uflow.regs[MFF_LOG_OUTPORT - MFF_REG0] = port->tunnel_key;
+        } else {
+            ovntrace_node_append(super, OVNTRACE_NODE_ERROR,
+                                 "*** output to type \"%s\" port \"%s\""
+                                 " with no or invalid distributed port",
+                                 port->type, out_name);
+            return;
+        }
+    }
+
+    if (port->tunnel_key != in_key || allow_loopback) {
         struct ovntrace_node *node = ovntrace_node_append(
             super, OVNTRACE_NODE_PIPELINE,
             "egress(dp=\"%s\", inport=\"%s\", outport=\"%s\")",
diff --git a/tests/ovn.at b/tests/ovn.at
index 25ee54e..5e5d5c2 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -6148,3 +6148,337 @@  OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
 OVS_APP_EXIT_AND_WAIT([ovsdb-server])
 
 AT_CLEANUP
+
+AT_SETUP([ovn -- 1 LR with distributed router gateway port])
+AT_SKIP_IF([test $HAVE_PYTHON = no])
+ovn_start
+
+# Logical network:
+# One LR R1 that has switches foo (192.168.1.0/24) and
+# alice (172.16.1.0/24) connected to it.  The logical port
+# between R1 and alice has a "redirect-chassis" specified,
+# i.e. it is the distributed router gateway port.
+# Switch alice also has a localnet port defined.
+# An additional switch outside has a localnet port and the
+# same subnet as alice (172.16.1.0/24).
+
+# Physical network:
+# Three hypervisors hv[123].
+# hv1 hosts vif foo1.
+# hv2 is the "redirect-chassis" that hosts the distributed
+# router gateway port.
+# hv3 hosts vif outside1.
+# In order to show that connectivity works only through hv2,
+# an initial round of tests is run without any bridge-mapping
+# defined for the localnet on hv2.  These tests are expected
+# to fail.
+# Subsequent tests are run after defining the bridge-mapping
+# for the localnet on hv2. These tests are expected to succeed.
+
+# Create three hypervisors and create OVS ports corresponding
+# to logical ports.
+net_add n1
+
+sim_add hv1
+as hv1
+ovs-vsctl add-br br-phys
+ovn_attach n1 br-phys 192.168.0.1
+ovs-vsctl -- add-port br-int hv1-vif1 -- \
+    set interface hv1-vif1 external-ids:iface-id=foo1 \
+    options:tx_pcap=hv1/vif1-tx.pcap \
+    options:rxq_pcap=hv1/vif1-rx.pcap \
+    ofport-request=1
+
+sim_add hv2
+as hv2
+ovs-vsctl add-br br-phys
+ovn_attach n1 br-phys 192.168.0.2
+
+sim_add hv3
+as hv3
+ovs-vsctl add-br br-phys
+ovn_attach n1 br-phys 192.168.0.3
+ovs-vsctl -- add-port br-int hv3-vif1 -- \
+    set interface hv3-vif1 external-ids:iface-id=outside1 \
+    options:tx_pcap=hv3/vif1-tx.pcap \
+    options:rxq_pcap=hv3/vif1-rx.pcap \
+    ofport-request=1
+
+# Pre-populate the hypervisors' ARP tables so that we don't lose any
+# packets for ARP resolution (native tunneling doesn't queue packets
+# for ARP resolution).
+ovn_populate_arp
+
+ovn-nbctl create Logical_Router name=R1
+
+ovn-nbctl ls-add foo
+ovn-nbctl ls-add alice
+ovn-nbctl ls-add outside
+
+# Connect foo to R1
+ovn-nbctl lrp-add R1 foo 00:00:01:01:02:03 192.168.1.1/24
+ovn-nbctl lsp-add foo rp-foo -- set Logical_Switch_Port rp-foo \
+    type=router options:router-port=foo \
+    -- lsp-set-addresses rp-foo router
+
+# Connect alice to R1 as distributed router gateway port on hv2
+ovn-nbctl lrp-add R1 alice 00:00:02:01:02:03 172.16.1.1/24 \
+    -- set Logical_Router_Port alice options:redirect-chassis="hv2"
+ovn-nbctl lsp-add alice rp-alice -- set Logical_Switch_Port rp-alice \
+    type=router options:router-port=alice \
+    -- lsp-set-addresses rp-alice router
+
+# Create logical port foo1 in foo
+ovn-nbctl lsp-add foo foo1 \
+-- lsp-set-addresses foo1 "f0:00:00:01:02:03 192.168.1.2"
+
+# Create logical port outside1 in outside
+ovn-nbctl lsp-add outside outside1 \
+-- lsp-set-addresses outside1 "f0:00:00:01:02:04 172.16.1.3"
+
+# Create localnet port in alice
+ovn-nbctl lsp-add alice ln-alice
+ovn-nbctl lsp-set-addresses ln-alice unknown
+ovn-nbctl lsp-set-type ln-alice localnet
+ovn-nbctl lsp-set-options ln-alice network_name=phys
+
+# Create localnet port in outside
+ovn-nbctl lsp-add outside ln-outside
+ovn-nbctl lsp-set-addresses ln-outside unknown
+ovn-nbctl lsp-set-type ln-outside localnet
+ovn-nbctl lsp-set-options ln-outside network_name=phys
+
+# Create bridge-mappings on hv1 and hv3, leaving hv2 for later
+as hv1 ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys
+as hv3 ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys
+
+
+# Allow some time for ovn-northd and ovn-controller to catch up.
+# XXX This should be more systematic.
+sleep 2
+
+echo "---------NB dump-----"
+ovn-nbctl show
+echo "---------------------"
+ovn-nbctl list logical_router
+echo "---------------------"
+ovn-nbctl list logical_router_port
+echo "---------------------"
+
+echo "---------SB dump-----"
+ovn-sbctl list datapath_binding
+echo "---------------------"
+ovn-sbctl list port_binding
+echo "---------------------"
+ovn-sbctl dump-flows
+echo "---------------------"
+ovn-sbctl list chassis
+ovn-sbctl list encap
+echo "---------------------"
+
+echo "------ hv1 dump ----------"
+as hv1 ovs-ofctl show br-int
+as hv1 ovs-ofctl dump-flows br-int
+echo "------ hv2 dump ----------"
+as hv2 ovs-ofctl show br-int
+as hv2 ovs-ofctl dump-flows br-int
+echo "------ hv3 dump ----------"
+as hv3 ovs-ofctl show br-int
+as hv3 ovs-ofctl dump-flows br-int
+echo "--------------------------"
+
+# Check that redirect mapping is programmed only on hv2
+AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=33 | grep =0x3,metadata=0x1 | wc -l], [0], [0
+])
+AT_CHECK([as hv2 ovs-ofctl dump-flows br-int table=33 | grep =0x3,metadata=0x1 | grep load:0x2- | wc -l], [0], [1
+])
+# Check that hv1 sends chassisredirect port traffic to hv2
+AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=32 | grep =0x3,metadata=0x1 | grep output | wc -l], [0], [1
+])
+AT_CHECK([as hv2 ovs-ofctl dump-flows br-int table=32 | grep =0x3,metadata=0x1 | wc -l], [0], [0
+])
+# Check that the ARP reply on the distributed gateway port is only programmed on hv2
+AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep arp | grep =0x2,metadata=0x1 | wc -l], [0], [0
+])
+AT_CHECK([as hv2 ovs-ofctl dump-flows br-int | grep arp | grep =0x2,metadata=0x1 | wc -l], [0], [1
+])
+
+
+ip_to_hex() {
+    printf "%02x%02x%02x%02x" "$@"
+}
+
+
+: > hv2-vif1.expected
+: > hv3-vif1.expected
+
+# test_arp HV INPORT SHA SPA TPA [REPLY_HA]
+#
+# Causes a packet to be received on INPORT of hypervisor HV.  The packet is
+# an ARP request with SHA, SPA, and TPA as specified.  If REPLY_HA is
+# provided, it is the target hardware address expected in the ARP reply;
+# otherwise no reply is expected.
+#
+# HV and INPORT are numbers, e.g. HV=3 and INPORT=1 select hv3-vif1.
+# SHA and REPLY_HA are each 12 hex digits.
+# SPA and TPA are each 8 hex digits.
+test_arp() {
+    local hv=$1 inport=$2 sha=$3 spa=$4 tpa=$5 reply_ha=$6
+    local request=ffffffffffff${sha}08060001080006040001${sha}${spa}ffffffffffff${tpa}
+    as hv$hv ovs-appctl netdev-dummy/receive hv${hv}-vif$inport $request
+
+    if test X$reply_ha != X; then
+        # Expect to receive the reply, if any.
+        local reply=${sha}${reply_ha}08060001080006040002${reply_ha}${tpa}${sha}${spa}
+        echo $reply >> hv${hv}-vif$inport.expected
+    fi
+}
+
+rtr_ip=$(ip_to_hex 172 16 1 1)
+foo_ip=$(ip_to_hex 192 168 1 2)
+outside_ip=$(ip_to_hex 172 16 1 3)
+
+echo $rtr_ip
+echo $foo_ip
+echo $outside_ip
+
+# ARP for router IP address from outside1, no response expected
+test_arp 3 1 f00000010204 $outside_ip $rtr_ip
+
+# Now check the packets actually received against the ones expected.
+OVN_CHECK_PACKETS([hv3/vif1-tx.pcap], [hv3-vif1.expected])
+
+# Send IP packet between foo1 and outside1
+src_mac="f00000010203"
+dst_mac="000001010203"
+src_ip=`ip_to_hex 192 168 1 2`
+dst_ip=`ip_to_hex 172 16 1 3`
+packet=${dst_mac}${src_mac}08004500001c0000000040110000${src_ip}${dst_ip}0035111100080000
+
+# Now check the packets actually received against the ones expected.
+OVN_CHECK_PACKETS([hv3/vif1-tx.pcap], [hv3-vif1.expected])
+
+# Now add bridge-mappings on hv2, which should make everything work
+as hv2 ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys
+
+# Allow some time for ovn-northd and ovn-controller to catch up.
+# XXX This should be more systematic.
+sleep 2
+
+# ARP for router IP address from outside1
+test_arp 3 1 f00000010204 $outside_ip $rtr_ip 000002010203
+
+# Now check the packets actually received against the ones expected.
+OVN_CHECK_PACKETS([hv3/vif1-tx.pcap], [hv3-vif1.expected])
+
+# Send IP packet between foo1 and outside1
+src_mac="f00000010203"
+dst_mac="000001010203"
+src_ip=`ip_to_hex 192 168 1 2`
+dst_ip=`ip_to_hex 172 16 1 3`
+packet=${dst_mac}${src_mac}08004500001c0000000040110000${src_ip}${dst_ip}0035111100080000
+
+# ARP request packet to expect at outside1
+src_mac="000002010203"
+src_ip=`ip_to_hex 172 16 1 1`
+arp_request=ffffffffffff${src_mac}08060001080006040001${src_mac}${src_ip}000000000000${dst_ip}
+
+as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
+
+echo $arp_request >> hv3-vif1.expected
+OVN_CHECK_PACKETS([hv3/vif1-tx.pcap], [hv3-vif1.expected])
+
+# Send ARP reply from outside1 back to the router
+reply_mac="f00000010204"
+arp_reply=${src_mac}${reply_mac}08060001080006040002${reply_mac}${dst_ip}${src_mac}${src_ip}
+
+as hv3 ovs-appctl netdev-dummy/receive hv3-vif1 $arp_reply
+
+# Allow some time for ovn-northd and ovn-controller to catch up.
+# XXX This should be more systematic.
+sleep 1
+
+# Packet to expect at outside1
+src_mac="000002010203"
+dst_mac="f00000010204"
+src_ip=`ip_to_hex 192 168 1 2`
+dst_ip=`ip_to_hex 172 16 1 3`
+expected=${dst_mac}${src_mac}08004500001c000000003f110100${src_ip}${dst_ip}0035111100080000
+
+# Resend packet from foo1 to outside1
+as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
+
+echo "------ hv1 dump ----------"
+as hv1 ovs-ofctl show br-int
+as hv1 ovs-ofctl dump-flows br-int
+echo "------ hv2 dump ----------"
+as hv2 ovs-ofctl show br-int
+as hv2 ovs-ofctl dump-flows br-int
+echo "------ hv3 dump ----------"
+as hv3 ovs-ofctl show br-int
+as hv3 ovs-ofctl dump-flows br-int
+echo "----------------------------"
+
+echo $expected >> hv3-vif1.expected
+OVN_CHECK_PACKETS([hv3/vif1-tx.pcap], [hv3-vif1.expected])
+
+# Check ovn-trace over "chassisredirect" port
+AT_CAPTURE_FILE([trace])
+ovn_trace () {
+    ovn-trace --all "$@" | tee trace | sed '1,/Minimal trace/d'
+}
+
+echo 'ip.ttl--;' > expout
+echo 'eth.src = 00:00:02:01:02:03;' >> expout
+echo 'eth.dst = f0:00:00:01:02:04;' >> expout
+echo 'output("ln-alice");' >> expout
+AT_CHECK_UNQUOTED([ovn_trace foo 'inport == "foo1" && eth.src == f0:00:00:01:02:03 && eth.dst == 00:00:01:01:02:03 && ip4.src == 192.168.1.2 && ip4.dst == 172.16.1.3 && ip.ttl == 0xff'], [0], [expout])
+
+# Create logical port alice1 in alice on hv1
+as hv1 ovs-vsctl -- add-port br-int hv1-vif2 -- \
+    set interface hv1-vif2 external-ids:iface-id=alice1 \
+    options:tx_pcap=hv1/vif2-tx.pcap \
+    options:rxq_pcap=hv1/vif2-rx.pcap \
+    ofport-request=2
+
+ovn-nbctl lsp-add alice alice1 \
+-- lsp-set-addresses alice1 "f0:00:00:01:02:05 172.16.1.4"
+
+# Create logical port foo2 in foo on hv2
+as hv2 ovs-vsctl -- add-port br-int hv2-vif1 -- \
+    set interface hv2-vif1 external-ids:iface-id=foo2 \
+    options:tx_pcap=hv2/vif1-tx.pcap \
+    options:rxq_pcap=hv2/vif1-rx.pcap \
+    ofport-request=1
+
+ovn-nbctl lsp-add foo foo2 \
+-- lsp-set-addresses foo2 "f0:00:00:01:02:06 192.168.1.3"
+
+# Allow some time for ovn-northd and ovn-controller to catch up.
+# XXX This should be more systematic.
+sleep 1
+
+: > hv1-vif2.expected
+
+# Send IP packet between alice1 and foo2
+src_mac="f00000010205"
+dst_mac="000002010203"
+src_ip=`ip_to_hex 172 16 1 4`
+dst_ip=`ip_to_hex 192 168 1 3`
+packet=${dst_mac}${src_mac}08004500001c0000000040110000${src_ip}${dst_ip}0035111100080000
+
+as hv1 ovs-appctl netdev-dummy/receive hv1-vif2 $packet
+
+# Packet to expect at foo2
+src_mac="000001010203"
+dst_mac="f00000010206"
+src_ip=`ip_to_hex 172 16 1 4`
+dst_ip=`ip_to_hex 192 168 1 3`
+expected=${dst_mac}${src_mac}08004500001c000000003f110100${src_ip}${dst_ip}0035111100080000
+
+echo $expected >> hv2-vif1.expected
+OVN_CHECK_PACKETS([hv2/vif1-tx.pcap], [hv2-vif1.expected])
+
+OVN_CLEANUP([hv1],[hv2],[hv3])
+
+AT_CLEANUP
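
Note on the expected packets in the test above: the packet injected at foo1
carries TTL 0x40 with an IP header checksum of 0x0000, while the packet
expected at outside1 carries TTL 0x3f with checksum 0x0100.  That pair is
consistent with an incremental checksum update in the style of RFC 1624
(HC' = ~(~HC + ~m + m')) being applied when the logical router decrements
the TTL.  The shell sketch below only reproduces that arithmetic.  The
csum_update helper is hypothetical, written for illustration; it is not part
of this patch or of the OVS test suite.

```shell
#!/bin/sh
# Hypothetical helper (assumption, not from the patch):
# csum_update OLD_CSUM OLD_WORD NEW_WORD
# Applies the RFC 1624 incremental update HC' = ~(~HC + ~m + m') to a
# 16-bit one's-complement checksum when one 16-bit header word changes.
csum_update() {
    hc=$1
    m=$2
    m_new=$3
    sum=$(( (~$hc & 0xffff) + (~$m & 0xffff) + ($m_new & 0xffff) ))
    # Fold the carries back into the low 16 bits (one's-complement add).
    sum=$(( (sum & 0xffff) + (sum >> 16) ))
    sum=$(( (sum & 0xffff) + (sum >> 16) ))
    printf "%04x" $(( ~sum & 0xffff ))
}

# The TTL/protocol word goes from 0x4011 (TTL 0x40, UDP) to 0x3f11 after
# one routing hop; the test packet starts with checksum 0x0000.
printf '3f11%s\n' "$(csum_update 0x0000 0x4011 0x3f11)"
```

Run as written, this prints 3f110100, which matches the TTL, protocol, and
checksum bytes in the "expected" strings built in the test.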