diff mbox series

[ovs-dev] ovn: Send GARP for the router ports connected to localnet switches

Message ID 20190501160311.29298-1-nusiddiq@redhat.com
State Changes Requested
Headers show
Series [ovs-dev] ovn: Send GARP for the router ports connected to localnet switches | expand

Commit Message

Numan Siddique May 1, 2019, 4:03 p.m. UTC
From: Numan Siddique <nusiddiq@redhat.com>

With the commit [1], the routing for the provider logical switches
connected to a router is centralized on the master gateway chassis
(if the option - reside-on-redirect-chassis) is set. When the
failover happens and a standby gateway chassis becomes master,
it should send GARPs for the router port macs. Without this, the
physical switch doesn't learn the new location of the router port macs
immediately and this could result in traffic disruption.

This patch addresses this issue so that the ovn-controller which claims the
distributed gatweway router port sends out the GARPs.

[1] - 85706c34d53d ("ovn: Avoid tunneling for VLAN packets redirected to a gateway chassis")

Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
---
 ovn/northd/ovn-northd.c | 21 +++++++++++++++
 tests/ovn.at            | 58 +++++++++++++++++++++++++++++++++++++++--
 2 files changed, 77 insertions(+), 2 deletions(-)

Comments

Han Zhou June 6, 2019, 11:47 p.m. UTC | #1
On Wed, May 1, 2019 at 9:04 AM <nusiddiq@redhat.com> wrote:
>
> From: Numan Siddique <nusiddiq@redhat.com>
>
> With the commit [1], the routing for the provider logical switches
> connected to a router is centralized on the master gateway chassis
> (if the option - reside-on-redirect-chassis) is set. When the
> failover happens and a standby gateway chassis becomes master,
> it should send GARPs for the router port macs. Without this, the
> physical switch doesn't learn the new location of the router port macs
> immediately and this could result in traffic disruption.
>
> This patch addresses this issue so that the ovn-controller which claims
the
> distributed gatweway router port sends out the GARPs.
>
> [1] - 85706c34d53d ("ovn: Avoid tunneling for VLAN packets redirected to
a gateway chassis")
>
> Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
> ---
>  ovn/northd/ovn-northd.c | 21 +++++++++++++++
>  tests/ovn.at            | 58 +++++++++++++++++++++++++++++++++++++++--
>  2 files changed, 77 insertions(+), 2 deletions(-)
>
Hi Numan,

Thanks for the fix. I have 2 comments:

1. The title is general which seems to address the problem for all "router
ports connected to localnet switches". However, the commit message body and
the code seems only handling the "reside-on-redirect-chassis" scenario,
without taking care of the more common scenario - the gateway port GARP.

2. The fix seems all related to "nat_addresses" handling, but this problem
is not directly related to "NAT". After reading more context of the code, I
realized that it is the right place to fix, but it is really not
straightforward to understand. Of course this is not a problem introduced
by current patch. It would be better if we had named the option
"garp_addresses" instead of "nat_addresses", but I think it may be hard to
rename at this stage because it will introduce compatibility problem. So
probably we can add some more comments just to make the context more clear
for readers.

Thanks,
Han
Numan Siddique June 10, 2019, 5:27 p.m. UTC | #2
On Fri, Jun 7, 2019 at 5:18 AM Han Zhou <zhouhan@gmail.com> wrote:

>
>
> On Wed, May 1, 2019 at 9:04 AM <nusiddiq@redhat.com> wrote:
> >
> > From: Numan Siddique <nusiddiq@redhat.com>
> >
> > With the commit [1], the routing for the provider logical switches
> > connected to a router is centralized on the master gateway chassis
> > (if the option - reside-on-redirect-chassis) is set. When the
> > failover happens and a standby gateway chassis becomes master,
> > it should send GARPs for the router port macs. Without this, the
> > physical switch doesn't learn the new location of the router port macs
> > immediately and this could result in traffic disruption.
> >
> > This patch addresses this issue so that the ovn-controller which claims
> the
> > distributed gatweway router port sends out the GARPs.
> >
> > [1] - 85706c34d53d ("ovn: Avoid tunneling for VLAN packets redirected to
> a gateway chassis")
> >
> > Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
> > ---
> >  ovn/northd/ovn-northd.c | 21 +++++++++++++++
> >  tests/ovn.at            | 58 +++++++++++++++++++++++++++++++++++++++--
> >  2 files changed, 77 insertions(+), 2 deletions(-)
> >
> Hi Numan,
>
> Thanks for the fix. I have 2 comments:
>
> 1. The title is general which seems to address the problem for all "router
> ports connected to localnet switches". However, the commit message body and
> the code seems only handling the "reside-on-redirect-chassis" scenario,
> without taking care of the more common scenario - the gateway port GARP.
>

Agree. I think Ankur (CC'ed) is planning to handle this scenario - i.e
sending the GAR for the gateway ports.



>
> 2. The fix seems all related to "nat_addresses" handling, but this problem
> is not directly related to "NAT". After reading more context of the code, I
> realized that it is the right place to fix, but it is really not
> straightforward to understand. Of course this is not a problem introduced
> by current patch. It would be better if we had named the option
> "garp_addresses" instead of "nat_addresses", but I think it may be hard to
> rename at this stage because it will introduce compatibility problem. So
> probably we can add some more comments just to make the context more clear
> for readers.
>

How about adding a new column "garp_addresses" ? This column can be used
both for the gw port GARP and for the "reside-on-redirect-chassis" ?

If this makes sense, I will work on it.
@Ankur - I will submit a patch with a new column (which will send GARPs for
both the gw router ports and other router ports with the option -
reside-on-redirect-chassis ) and and then you can enhance it to send the
GARPs in periodic interval instead of initial bursts  (which the present
code does for NAT addresses ) ? Does this sound good ?

Thanks
Numan



> Thanks,
> Han
>
Ankur Sharma June 10, 2019, 10:53 p.m. UTC | #3
Hi Numan,

Sure, we can wait for your “garp_addresses”  based changes to do both “gw ports and  “resident-on-redirect-chassis”.
I can provide periodic advertisement related changes after that.

Regards,
Ankur

From: Numan Siddique <nusiddiq@redhat.com>
Sent: Monday, June 10, 2019 10:27 AM
To: Han Zhou <zhouhan@gmail.com>
Cc: ovs dev <dev@openvswitch.org>; Ankur Sharma <ankur.sharma@nutanix.com>
Subject: Re: [ovs-dev] [PATCH] ovn: Send GARP for the router ports connected to localnet switches



On Fri, Jun 7, 2019 at 5:18 AM Han Zhou <zhouhan@gmail.com<mailto:zhouhan@gmail.com>> wrote:


On Wed, May 1, 2019 at 9:04 AM <nusiddiq@redhat.com<mailto:nusiddiq@redhat.com>> wrote:
>
> From: Numan Siddique <nusiddiq@redhat.com<mailto:nusiddiq@redhat.com>>
>
> With the commit [1], the routing for the provider logical switches
> connected to a router is centralized on the master gateway chassis
> (if the option - reside-on-redirect-chassis) is set. When the
> failover happens and a standby gateway chassis becomes master,
> it should send GARPs for the router port macs. Without this, the
> physical switch doesn't learn the new location of the router port macs
> immediately and this could result in traffic disruption.
>
> This patch addresses this issue so that the ovn-controller which claims the
> distributed gatweway router port sends out the GARPs.
>
> [1] - 85706c34d53d ("ovn: Avoid tunneling for VLAN packets redirected to a gateway chassis")
>
> Signed-off-by: Numan Siddique <nusiddiq@redhat.com<mailto:nusiddiq@redhat.com>>
> ---
>  ovn/northd/ovn-northd.c | 21 +++++++++++++++
>  tests/ovn.at [ovn.at]<https://urldefense.proofpoint.com/v2/url?u=http-3A__ovn.at&d=DwMFaQ&c=s883GpUCOChKOHiocYtGcg&r=mZwX9gFQgeJHzTg-68aCJgsODyUEVsHGFOfL90J6MJY&m=9l_WOe1v0FLuY89KsBfxqhIU01WNzT5ZUjWrfxPfaCk&s=-UjbquhF3wtMvv-nrpNleIelQPt0yoB63gdfs13KZ1M&e=>            | 58 +++++++++++++++++++++++++++++++++++++++--
>  2 files changed, 77 insertions(+), 2 deletions(-)
>
Hi Numan,

Thanks for the fix. I have 2 comments:
1. The title is general which seems to address the problem for all "router ports connected to localnet switches". However, the commit message body and the code seems only handling the "reside-on-redirect-chassis" scenario, without taking care of the more common scenario - the gateway port GARP.

Agree. I think Ankur (CC'ed) is planning to handle this scenario - i.e sending the GAR for the gateway ports.



2. The fix seems all related to "nat_addresses" handling, but this problem is not directly related to "NAT". After reading more context of the code, I realized that it is the right place to fix, but it is really not straightforward to understand. Of course this is not a problem introduced by current patch. It would be better if we had named the option "garp_addresses" instead of "nat_addresses", but I think it may be hard to rename at this stage because it will introduce compatibility problem. So probably we can add some more comments just to make the context more clear for readers.

How about adding a new column "garp_addresses" ? This column can be used both for the gw port GARP and for the "reside-on-redirect-chassis" ?

If this makes sense, I will work on it.
@Ankur - I will submit a patch with a new column (which will send GARPs for both the gw router ports and other router ports with the option - reside-on-redirect-chassis ) and and then you can enhance it to send the GARPs in periodic interval instead of initial bursts  (which the present code does for NAT addresses ) ? Does this sound good ?


Thanks
Numan



Thanks,
Han
Han Zhou June 11, 2019, 4:06 p.m. UTC | #4
On Mon, Jun 10, 2019 at 10:27 AM Numan Siddique <nusiddiq@redhat.com> wrote:

>
>
> On Fri, Jun 7, 2019 at 5:18 AM Han Zhou <zhouhan@gmail.com> wrote:
>
>>
>>
>> On Wed, May 1, 2019 at 9:04 AM <nusiddiq@redhat.com> wrote:
>> >
>> > From: Numan Siddique <nusiddiq@redhat.com>
>> >
>> > With the commit [1], the routing for the provider logical switches
>> > connected to a router is centralized on the master gateway chassis
>> > (if the option - reside-on-redirect-chassis) is set. When the
>> > failover happens and a standby gateway chassis becomes master,
>> > it should send GARPs for the router port macs. Without this, the
>> > physical switch doesn't learn the new location of the router port macs
>> > immediately and this could result in traffic disruption.
>> >
>> > This patch addresses this issue so that the ovn-controller which claims
>> the
>> > distributed gatweway router port sends out the GARPs.
>> >
>> > [1] - 85706c34d53d ("ovn: Avoid tunneling for VLAN packets redirected
>> to a gateway chassis")
>> >
>> > Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
>> > ---
>> >  ovn/northd/ovn-northd.c | 21 +++++++++++++++
>> >  tests/ovn.at            | 58 +++++++++++++++++++++++++++++++++++++++--
>> >  2 files changed, 77 insertions(+), 2 deletions(-)
>> >
>> Hi Numan,
>>
>> Thanks for the fix. I have 2 comments:
>>
>> 1. The title is general which seems to address the problem for all
>> "router ports connected to localnet switches". However, the commit message
>> body and the code seems only handling the "reside-on-redirect-chassis"
>> scenario, without taking care of the more common scenario - the gateway
>> port GARP.
>>
>
> Agree. I think Ankur (CC'ed) is planning to handle this scenario - i.e
> sending the GAR for the gateway ports.
>
>
>
>>
>> 2. The fix seems all related to "nat_addresses" handling, but this
>> problem is not directly related to "NAT". After reading more context of the
>> code, I realized that it is the right place to fix, but it is really not
>> straightforward to understand. Of course this is not a problem introduced
>> by current patch. It would be better if we had named the option
>> "garp_addresses" instead of "nat_addresses", but I think it may be hard to
>> rename at this stage because it will introduce compatibility problem. So
>> probably we can add some more comments just to make the context more clear
>> for readers.
>>
>
> How about adding a new column "garp_addresses" ? This column can be used
> both for the gw port GARP and for the "reside-on-redirect-chassis" ?
>

Hi Numan, I am not sure if adding a new column is better than clarifying
with some documentation. If we add a new column, we should deprecate the
old "nat_addresses" column (and kept there only for ovsdb upgrading). I
don't think there is a need to distinguish in the schema if we are sending
GARP for NAT or for GW port.

>
> If this makes sense, I will work on it.
> @Ankur - I will submit a patch with a new column (which will send GARPs
> for both the gw router ports and other router ports with the option -
> reside-on-redirect-chassis ) and and then you can enhance it to send the
> GARPs in periodic interval instead of initial bursts  (which the present
> code does for NAT addresses ) ? Does this sound good ?
>
> Thanks
> Numan
>
>
>
>> Thanks,
>> Han
>>
>
Numan Siddique June 12, 2019, 7:29 a.m. UTC | #5
On Tue, Jun 11, 2019 at 9:37 PM Han Zhou <zhouhan@gmail.com> wrote:

>
>
> On Mon, Jun 10, 2019 at 10:27 AM Numan Siddique <nusiddiq@redhat.com>
> wrote:
>
>>
>>
>> On Fri, Jun 7, 2019 at 5:18 AM Han Zhou <zhouhan@gmail.com> wrote:
>>
>>>
>>>
>>> On Wed, May 1, 2019 at 9:04 AM <nusiddiq@redhat.com> wrote:
>>> >
>>> > From: Numan Siddique <nusiddiq@redhat.com>
>>> >
>>> > With the commit [1], the routing for the provider logical switches
>>> > connected to a router is centralized on the master gateway chassis
>>> > (if the option - reside-on-redirect-chassis) is set. When the
>>> > failover happens and a standby gateway chassis becomes master,
>>> > it should send GARPs for the router port macs. Without this, the
>>> > physical switch doesn't learn the new location of the router port macs
>>> > immediately and this could result in traffic disruption.
>>> >
>>> > This patch addresses this issue so that the ovn-controller which
>>> claims the
>>> > distributed gatweway router port sends out the GARPs.
>>> >
>>> > [1] - 85706c34d53d ("ovn: Avoid tunneling for VLAN packets redirected
>>> to a gateway chassis")
>>> >
>>> > Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
>>> > ---
>>> >  ovn/northd/ovn-northd.c | 21 +++++++++++++++
>>> >  tests/ovn.at            | 58
>>> +++++++++++++++++++++++++++++++++++++++--
>>> >  2 files changed, 77 insertions(+), 2 deletions(-)
>>> >
>>> Hi Numan,
>>>
>>> Thanks for the fix. I have 2 comments:
>>>
>>> 1. The title is general which seems to address the problem for all
>>> "router ports connected to localnet switches". However, the commit message
>>> body and the code seems only handling the "reside-on-redirect-chassis"
>>> scenario, without taking care of the more common scenario - the gateway
>>> port GARP.
>>>
>>
>> Agree. I think Ankur (CC'ed) is planning to handle this scenario - i.e
>> sending the GAR for the gateway ports.
>>
>>
>>
>>>
>>> 2. The fix seems all related to "nat_addresses" handling, but this
>>> problem is not directly related to "NAT". After reading more context of the
>>> code, I realized that it is the right place to fix, but it is really not
>>> straightforward to understand. Of course this is not a problem introduced
>>> by current patch. It would be better if we had named the option
>>> "garp_addresses" instead of "nat_addresses", but I think it may be hard to
>>> rename at this stage because it will introduce compatibility problem. So
>>> probably we can add some more comments just to make the context more clear
>>> for readers.
>>>
>>
>> How about adding a new column "garp_addresses" ? This column can be used
>> both for the gw port GARP and for the "reside-on-redirect-chassis" ?
>>
>
> Hi Numan, I am not sure if adding a new column is better than clarifying
> with some documentation. If we add a new column, we should deprecate the
> old "nat_addresses" column (and kept there only for ovsdb upgrading). I
> don't think there is a need to distinguish in the schema if we are sending
> GARP for NAT or for GW port.
>
>>
>>
That's what even I thought. Ankur has some other requirements for sending
the GARPs for the gateway router port IPs. He wants the GARPs to be sent
periodically. I think where as now, the GARPs are sent as a burst whenever
a chassis claims a
gw router port.

Do you think it's better to change the GARP interval as per this patch -
https://patchwork.ozlabs.org/patch/1107466/
which Ankur has proposed for all the addresses - router addresses and NAT
addresses ? Please grep for "add_garp" in
the above patch for more details.

Thanks
Numan




> If this makes sense, I will work on it.
>> @Ankur - I will submit a patch with a new column (which will send GARPs
>> for both the gw router ports and other router ports with the option -
>> reside-on-redirect-chassis ) and and then you can enhance it to send the
>> GARPs in periodic interval instead of initial bursts  (which the present
>> code does for NAT addresses ) ? Does this sound good ?
>>
>> Thanks
>> Numan
>>
>>
>>
>>> Thanks,
>>> Han
>>>
>>
Ankur Sharma June 12, 2019, 10:20 p.m. UTC | #6
Hi,

Just adding a minor clarification inline.

Regards,
Ankur

From: Numan Siddique <nusiddiq@redhat.com>
Sent: Wednesday, June 12, 2019 12:29 AM
To: Han Zhou <zhouhan@gmail.com>
Cc: ovs dev <dev@openvswitch.org>; Ankur Sharma <ankur.sharma@nutanix.com>
Subject: Re: [ovs-dev] [PATCH] ovn: Send GARP for the router ports connected to localnet switches



On Tue, Jun 11, 2019 at 9:37 PM Han Zhou <zhouhan@gmail.com<mailto:zhouhan@gmail.com>> wrote:


On Mon, Jun 10, 2019 at 10:27 AM Numan Siddique <nusiddiq@redhat.com<mailto:nusiddiq@redhat.com>> wrote:


On Fri, Jun 7, 2019 at 5:18 AM Han Zhou <zhouhan@gmail.com<mailto:zhouhan@gmail.com>> wrote:


On Wed, May 1, 2019 at 9:04 AM <nusiddiq@redhat.com<mailto:nusiddiq@redhat.com>> wrote:
>
> From: Numan Siddique <nusiddiq@redhat.com<mailto:nusiddiq@redhat.com>>
>
> With the commit [1], the routing for the provider logical switches
> connected to a router is centralized on the master gateway chassis
> (if the option - reside-on-redirect-chassis) is set. When the
> failover happens and a standby gateway chassis becomes master,
> it should send GARPs for the router port macs. Without this, the
> physical switch doesn't learn the new location of the router port macs
> immediately and this could result in traffic disruption.
>
> This patch addresses this issue so that the ovn-controller which claims the
> distributed gatweway router port sends out the GARPs.
>
> [1] - 85706c34d53d ("ovn: Avoid tunneling for VLAN packets redirected to a gateway chassis")
>
> Signed-off-by: Numan Siddique <nusiddiq@redhat.com<mailto:nusiddiq@redhat.com>>
> ---
>  ovn/northd/ovn-northd.c | 21 +++++++++++++++
>  tests/ovn.at [ovn.at]<https://urldefense.proofpoint.com/v2/url?u=http-3A__ovn.at&d=DwMFaQ&c=s883GpUCOChKOHiocYtGcg&r=mZwX9gFQgeJHzTg-68aCJgsODyUEVsHGFOfL90J6MJY&m=NiZlnfKLpFZTne6nLvW2LY-mMfT1ROA_XO3vDUxswGA&s=UbZ9hOu17Ha6b2AEfkP9hmScErxLKfSpkmLB33spkiw&e=>            | 58 +++++++++++++++++++++++++++++++++++++++--
>  2 files changed, 77 insertions(+), 2 deletions(-)
>
Hi Numan,

Thanks for the fix. I have 2 comments:
1. The title is general which seems to address the problem for all "router ports connected to localnet switches". However, the commit message body and the code seems only handling the "reside-on-redirect-chassis" scenario, without taking care of the more common scenario - the gateway port GARP.

Agree. I think Ankur (CC'ed) is planning to handle this scenario - i.e sending the GAR for the gateway ports.



2. The fix seems all related to "nat_addresses" handling, but this problem is not directly related to "NAT". After reading more context of the code, I realized that it is the right place to fix, but it is really not straightforward to understand. Of course this is not a problem introduced by current patch. It would be better if we had named the option "garp_addresses" instead of "nat_addresses", but I think it may be hard to rename at this stage because it will introduce compatibility problem. So probably we can add some more comments just to make the context more clear for readers.

How about adding a new column "garp_addresses" ? This column can be used both for the gw port GARP and for the "reside-on-redirect-chassis" ?

Hi Numan, I am not sure if adding a new column is better than clarifying with some documentation. If we add a new column, we should deprecate the old "nat_addresses" column (and kept there only for ovsdb upgrading). I don't think there is a need to distinguish in the schema if we are sending GARP for NAT or for GW port.


That's what even I thought. Ankur has some other requirements for sending the GARPs for the gateway router port IPs. He wants the GARPs to be sent periodically. I think where as now, the GARPs are sent as a burst whenever a chassis claims a
gw router port.

Do you think it's better to change the GARP interval as per this patch - https://patchwork.ozlabs.org/patch/1107466/ [patchwork.ozlabs.org]<https://urldefense.proofpoint.com/v2/url?u=https-3A__patchwork.ozlabs.org_patch_1107466_&d=DwMFaQ&c=s883GpUCOChKOHiocYtGcg&r=mZwX9gFQgeJHzTg-68aCJgsODyUEVsHGFOfL90J6MJY&m=NiZlnfKLpFZTne6nLvW2LY-mMfT1ROA_XO3vDUxswGA&s=cGB3-m1D9uO-iDu65mItknPBQwEMNe7noJBtRKNaoBY&e=>
which Ankur has proposed for all the addresses - router addresses and NAT addresses ? Please grep for "add_garp" in
the above patch for more details.

[ANKUR]:
a. We should still be sending burst and by burst I mean sending multiple GARP packets with increasing inter packet interval.
    i.e existing behavior is correct.

b. In addition to above I wanted to add 2 more changes.

c. Send the GARP for a gateway chassis attached router port, even if NAT is not configured on it.
    For vlan backed networks, NAT is not mandatory to do North South communication.

d. For c. above do a. periodically. This is more being safe solution. Reason being that, in the absence of encapsulation,
    redirection to gateway chassis will happen over L2, using the attached router port mac as destination mac.
    Hence, we have to ensure that this mac is always in learnt state in the Physical switch, else redirection would end
    up causing flooding.

Thanks
Numan



If this makes sense, I will work on it.
@Ankur - I will submit a patch with a new column (which will send GARPs for both the gw router ports and other router ports with the option - reside-on-redirect-chassis ) and and then you can enhance it to send the GARPs in periodic interval instead of initial bursts  (which the present code does for NAT addresses ) ? Does this sound good ?

Thanks
Numan



Thanks,
Han
Numan Siddique June 14, 2019, 12:44 p.m. UTC | #7
On Thu, Jun 13, 2019 at 3:51 AM Ankur Sharma <ankur.sharma@nutanix.com>
wrote:

> Hi,
>
> Just adding a minor clarification inline.
>
> Regards,
> Ankur
>
>
>
> *From:* Numan Siddique <nusiddiq@redhat.com>
> *Sent:* Wednesday, June 12, 2019 12:29 AM
> *To:* Han Zhou <zhouhan@gmail.com>
> *Cc:* ovs dev <dev@openvswitch.org>; Ankur Sharma <
> ankur.sharma@nutanix.com>
> *Subject:* Re: [ovs-dev] [PATCH] ovn: Send GARP for the router ports
> connected to localnet switches
>
>
>
>
>
>
>
> On Tue, Jun 11, 2019 at 9:37 PM Han Zhou <zhouhan@gmail.com> wrote:
>
>
>
>
>
> On Mon, Jun 10, 2019 at 10:27 AM Numan Siddique <nusiddiq@redhat.com>
> wrote:
>
>
>
>
>
> On Fri, Jun 7, 2019 at 5:18 AM Han Zhou <zhouhan@gmail.com> wrote:
>
>
>
> On Wed, May 1, 2019 at 9:04 AM <nusiddiq@redhat.com> wrote:
> >
> > From: Numan Siddique <nusiddiq@redhat.com>
> >
> > With the commit [1], the routing for the provider logical switches
> > connected to a router is centralized on the master gateway chassis
> > (if the option - reside-on-redirect-chassis) is set. When the
> > failover happens and a standby gateway chassis becomes master,
> > it should send GARPs for the router port macs. Without this, the
> > physical switch doesn't learn the new location of the router port macs
> > immediately and this could result in traffic disruption.
> >
> > This patch addresses this issue so that the ovn-controller which claims
> the
> > distributed gatweway router port sends out the GARPs.
> >
> > [1] - 85706c34d53d ("ovn: Avoid tunneling for VLAN packets redirected to
> a gateway chassis")
> >
> > Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
> > ---
> >  ovn/northd/ovn-northd.c | 21 +++++++++++++++
> >  tests/ovn.at [ovn.at]
> <https://urldefense.proofpoint.com/v2/url?u=http-3A__ovn.at&d=DwMFaQ&c=s883GpUCOChKOHiocYtGcg&r=mZwX9gFQgeJHzTg-68aCJgsODyUEVsHGFOfL90J6MJY&m=NiZlnfKLpFZTne6nLvW2LY-mMfT1ROA_XO3vDUxswGA&s=UbZ9hOu17Ha6b2AEfkP9hmScErxLKfSpkmLB33spkiw&e=>
>            | 58 +++++++++++++++++++++++++++++++++++++++--
> >  2 files changed, 77 insertions(+), 2 deletions(-)
> >
> Hi Numan,
>
> Thanks for the fix. I have 2 comments:
>
> 1. The title is general which seems to address the problem for all "router
> ports connected to localnet switches". However, the commit message body and
> the code seems only handling the "reside-on-redirect-chassis" scenario,
> without taking care of the more common scenario - the gateway port GARP.
>
>
>
> Agree. I think Ankur (CC'ed) is planning to handle this scenario - i.e
> sending the GAR for the gateway ports.
>
>
>
>
>
>
>
> 2. The fix seems all related to "nat_addresses" handling, but this problem
> is not directly related to "NAT". After reading more context of the code, I
> realized that it is the right place to fix, but it is really not
> straightforward to understand. Of course this is not a problem introduced
> by current patch. It would be better if we had named the option
> "garp_addresses" instead of "nat_addresses", but I think it may be hard to
> rename at this stage because it will introduce compatibility problem. So
> probably we can add some more comments just to make the context more clear
> for readers.
>
>
>
> How about adding a new column "garp_addresses" ? This column can be used
> both for the gw port GARP and for the "reside-on-redirect-chassis" ?
>
>
>
> Hi Numan, I am not sure if adding a new column is better than clarifying
> with some documentation. If we add a new column, we should deprecate the
> old "nat_addresses" column (and kept there only for ovsdb upgrading). I
> don't think there is a need to distinguish in the schema if we are sending
> GARP for NAT or for GW port.
>
>
>
>
>
> That's what even I thought. Ankur has some other requirements for sending
> the GARPs for the gateway router port IPs. He wants the GARPs to be sent
> periodically. I think where as now, the GARPs are sent as a burst whenever
> a chassis claims a
>
> gw router port.
>
>
>
> Do you think it's better to change the GARP interval as per this patch - https://patchwork.ozlabs.org/patch/1107466/
> [patchwork.ozlabs.org]
> <https://urldefense.proofpoint.com/v2/url?u=https-3A__patchwork.ozlabs.org_patch_1107466_&d=DwMFaQ&c=s883GpUCOChKOHiocYtGcg&r=mZwX9gFQgeJHzTg-68aCJgsODyUEVsHGFOfL90J6MJY&m=NiZlnfKLpFZTne6nLvW2LY-mMfT1ROA_XO3vDUxswGA&s=cGB3-m1D9uO-iDu65mItknPBQwEMNe7noJBtRKNaoBY&e=>
>
> which Ankur has proposed for all the addresses - router addresses and NAT
> addresses ? Please grep for "add_garp" in
>
> the above patch for more details.
>
>
I have submitted the v2 here -
https://patchwork.ozlabs.org/project/openvswitch/list/?series=113880.
Request to please take  a look



> [ANKUR]:
> a. We should still be sending burst and by burst I mean sending multiple
> GARP packets with increasing inter packet interval.
>     i.e existing behavior is correct.
>
> b. In addition to above I wanted to add 2 more changes.
>
>
> c. Send the GARP for a gateway chassis attached router port, even if NAT
> is not configured on it.
>     For vlan backed networks, NAT is not mandatory to do North South
> communication.
>

The patch 3 of the series i submitted will take care of it.


>
> d. For c. above do a. periodically. This is more being safe solution.
> Reason being that, in the absence of encapsulation,
>
>     redirection to gateway chassis will happen over L2, using the attached
> router port mac as destination mac.
>     Hence, we have to ensure that this mac is always in learnt state in
> the Physical switch, else redirection would end
>
>     up causing flooding.
>
>
>

It should be fine to do (d) not just for (c) but for all the NAT addresses
in the Port_Binding.nat_addresses column right ? Thoughts ?
I mean there is no harm I suppose to send the GARPs periodically in bursts
for all the GARP packets which ovn-controllers
generate.
@Ankur - I would suggest that you modify the existing garp code in
ovn-controller to enhance to do (d).

Thanks
Numan



> Thanks
>
> Numan
>
>
>
>
>
>
>
> If this makes sense, I will work on it.
>
> @Ankur - I will submit a patch with a new column (which will send GARPs
> for both the gw router ports and other router ports with the option -
> reside-on-redirect-chassis ) and and then you can enhance it to send the
> GARPs in periodic interval instead of initial bursts  (which the present
> code does for NAT addresses ) ? Does this sound good ?
>
>
>
> Thanks
>
> Numan
>
>
>
>
>
>
>
> Thanks,
>
> Han
>
>
diff mbox series

Patch

diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index de0c06d4b..4fb6c3a98 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -2525,6 +2525,27 @@  ovn_port_update_sbrec(struct northd_context *ctx,
                                                          &nat_addresses, 1);
                     destroy_lport_addresses(&laddrs);
                 }
+            } else if (op->peer && op->peer->nbrp && op->peer->od->l3dgw_port
+                       && op->peer->od->l3redirect_port
+                       && smap_get_bool(&op->peer->nbrp->options,
+                                        "reside-on-redirect-chassis", false)) {
+                /* reside-on-gateway-chassis is set and the
+                 * the logical router datapath has distributed router port. Add
+                 * the router mac and IPv4 addresses so that GARP is sent for
+                 * these IPs by the ovn-controller which handles the
+                 * centralized routing. */
+                struct ds nat_addr = DS_EMPTY_INITIALIZER;
+                ds_put_format(&nat_addr, "%s", op->peer->lrp_networks.ea_s);
+                for (size_t i = 0; i < op->peer->lrp_networks.n_ipv4_addrs;
+                     i++) {
+                    ds_put_format(&nat_addr, " %s",
+                                  op->peer->lrp_networks.ipv4_addrs[i].addr_s);
+                }
+                ds_put_format(&nat_addr, " is_chassis_resident(%s)",
+                              op->peer->od->l3redirect_port->json_key);
+                const char *s = ds_cstr(&nat_addr);
+                sbrec_port_binding_set_nat_addresses(op->sb, &s, 1);
+                ds_destroy(&nat_addr);
             } else {
                 sbrec_port_binding_set_nat_addresses(op->sb, NULL, 0);
             }
diff --git a/tests/ovn.at b/tests/ovn.at
index 592f491fd..d0fe1f561 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -9395,9 +9395,10 @@  ovn_start
 # # (i.e 8.8.8.8 as destination ip).
 
 # Physical network:
-# # Three hypervisors hv[123].
+# # Four hypervisors hv[1234].
 # # hv1 hosts vif foo1.
 # # hv2 is the "redirect-chassis" that hosts the distributed router gateway port.
+# # Later to test GARPs for the router port - foo, hv2 and hv4 are added to the ha_chassis_group
 # # hv3 hosts nexthop port vif outside1.
 # # All other tests connect hypervisors to network n1 through br-phys for tunneling.
 # # But in this test, hv1 won't connect to n1(and no br-phys in hv1), and
@@ -9422,6 +9423,8 @@  ovs-vsctl \
 start_daemon ovn-controller
 ovs-vsctl -- add-port br-int hv1-vif1 -- \
     set interface hv1-vif1 external-ids:iface-id=foo1 \
+    options:tx_pcap=hv1/vif1-tx.pcap \
+    options:rxq_pcap=hv1/vif1-rx.pcap \
     ofport-request=1
 
 sim_add hv2
@@ -9441,6 +9444,12 @@  ovs-vsctl -- add-port br-int hv3-vif1 -- \
     ofport-request=1
 ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings="phys:br-phys"
 
+sim_add hv4
+as hv4
+ovs-vsctl add-br br-phys
+ovn_attach n1 br-phys 192.168.0.4
+ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings="public:br-ex,phys:br-phys"
+
 # Create network n2 for vlan connectivity between hv1 and hv2
 net_add n2
 
@@ -9452,6 +9461,10 @@  as hv2
 ovs-vsctl add-br br-ex
 net_attach n2 br-ex
 
+as hv4
+ovs-vsctl add-br br-ex
+net_attach n2 br-ex
+
 OVN_POPULATE_ARP
 
 ovn-nbctl create Logical_Router name=R1
@@ -9634,7 +9647,48 @@  $PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv3/vif1-tx.pcap | grep ${foo1_ip}${
 cat hv3-vif1.expected > expout
 AT_CHECK([sort hv3-vif1], [0], [expout])
 
-OVN_CLEANUP([hv1],[hv2],[hv3])
+# Test the GARP for the router port ip - 192.168.1.1
+ovn-nbctl --wait=sb ha-chassis-group-add hagrp1
+
+as hv1 reset_pcap_file hv1-vif1 hv1/vif1
+as hv2 reset_pcap_file br-ex_n2 hv2/br-ex_n2
+as hv4 reset_pcap_file br-ex_n2 hv4/br-ex_n2
+
+ovn-nbctl --wait=sb ha-chassis-group-add-chassis hagrp1 hv2 30
+ovn-nbctl --wait=sb ha-chassis-group-add-chassis hagrp1 hv4 20
+
+hagrp1_uuid=`ovn-nbctl --bare --columns _uuid find ha_chassis_group name=hagrp1`
+ovn-nbctl remove logical_router_port alice options redirect-chassis
+ovn-nbctl --wait=sb set logical_router_port alice ha_chassis_group=$hagrp1_uuid
+
+# When hv2 claims the gw router port cr-alice, it should send out
+# GARP for 192.168.1.1 and it should be received by foo1 on hv1.
+
+# foo1 (on hv1) should receive GARP without VLAN tag
+exp_garp_on_foo1="ffffffffffff00000101020308060001080006040001000001010203c0a80101000000000000c0a80101"
+echo $exp_garp_on_foo1 > foo1.expout
+
+# ovn-controller on hv2 should send garp with VLAN tag
+sent_garp="ffffffffffff0000010102038100000208060001080006040001000001010203c0a80101000000000000c0a80101"
+echo $sent_garp > br-ex_n2.expout
+
+OVN_CHECK_PACKETS([hv1/vif1-tx.pcap], [foo1.expout])
+OVN_CHECK_PACKETS([hv2/br-ex_n2-tx.pcap], [br-ex_n2.expout])
+$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv4/br-ex_n2-tx.pcap > empty
+AT_CHECK([cat empty], [0], [])
+
+# Make hv4 master
+as hv1 reset_pcap_file hv1-vif1 hv1/vif1
+as hv2 reset_pcap_file br-ex_n2 hv2/br-ex_n2
+as hv4 reset_pcap_file br-ex_n2 hv4/br-ex_n2
+ovn-nbctl --wait=sb ha-chassis-group-add-chassis hagrp1 hv4 40
+
+OVN_CHECK_PACKETS([hv1/vif1-tx.pcap], [foo1.expout])
+OVN_CHECK_PACKETS([hv4/br-ex_n2-tx.pcap], [br-ex_n2.expout])
+$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv2/br-ex_n2-tx.pcap > empty
+AT_CHECK([cat empty], [0], [])
+
+OVN_CLEANUP([hv1],[hv2],[hv3], [hv4])
 AT_CLEANUP
 
 AT_SETUP([ovn -- IPv6 ND Router Solicitation responder])