diff mbox series

[ovs-dev] Remove lport from related_lports if there is no binding

Message ID 1685511349-156407-1-git-send-email-priyankar.jain@nutanix.com
State Changes Requested
Delegated to: Dumitru Ceara
Headers show
Series [ovs-dev] Remove lport from related_lports if there is no binding | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_ovn-kubernetes success github build: passed
ovsrobot/github-robot-_Build_and_Test fail github build: failed

Commit Message

Priyankar Jain May 31, 2023, 5:35 a.m. UTC
Currently during port migration, two chassis (source and destination)
can try to claim the same logical switch port simultaneously for a
short-period of time until the tap is deleted on source hypervisor.
ovn-controllers on these 2 hosts constantly receives port-binding
updates about other chassis claiming the port and as a result it tries
to claim the port again (because its chassis has a tap interface
referencing the LSP). This flapping ends once CMS cleans up tap
interface from the source chassis.

Now following steps occur during a single iteration inc-proc-eng during
flapping:

1. PB update received on OVN controller about other chassis owning the
   port.
2. ovn-controller tries to claim the port.
3. It installs the OVS flows for the port and updates the runtime_data
   to include this port in locally relevant ports.
4. If some change to runtime data happens as part of 3, port-groups
   containing the affected ports are recomputed. It uses related_lports
   runtime data to compute the port-groups.

Finally, ovn-controller sends a port-binding update to SB changing the
chassis to itself.
At a later point of time, SB sends the notification to ovn-controller
about (4) being completed.

Once CMS deletes the tap interface, ovn-controller receives the
notification and updates the runtime data accordingly.

Issue: ovs-flows are (sometimes)not cleaned up upon port migration.

If the notification of OVS interface deletion is received before SB
acks the PortBinding update, then ovn-controller does not cleanup
related_lports leading to incorrect port-groups computation.

i.e if the order of events is as follows:

1. PB update received on OVN controller about other chassis owning the
   port.
2. ovn-controller claims the port, installs OVS flows and sends the
   PortBinding update to SB.
3. OVS interface deletion notification received by ovn-controller.
4. SB ack received for step-2 PB update.

This commit fixes this issue by removing the logical_port from related
port even in case there is no binding available locally.

Signed-off-by: Priyankar Jain <priyankar.jain@nutanix.com>
---
 controller/binding.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Mark Michelson May 31, 2023, 6:59 p.m. UTC | #1
Hi Priyankar,

The description makes the issue crystal clear, and you appear to be 
solving the race condition that can happen between the OVS interface 
table and the southbound port_binding table.

Acked-by: Mark Michelson <mmichels@redhat.com>

Just to let you know, the flapping problem you mention can be avoided 
altogether by using options:requested-chassis on the northbound logical 
switch port. When you migrate the port to a new chassis, place the new 
chassis's name or hostname as this option, and ovn-controller will only 
claim the logical switch port on that chassis. The old chassis will not 
try to claim the port even if the tap is still present.

I wouldn't be surprised if there were other ways to trigger this race 
condition as well. I suspect the port-flapping scenario is most likely 
to trigger it, though.

On 5/31/23 01:35, Priyankar Jain wrote:
> Currently during port migration, two chassis (source and destination)
> can try to claim the same logical switch port simultaneously for a
> short-period of time until the tap is deleted on source hypervisor.
> ovn-controllers on these 2 hosts constantly receives port-binding
> updates about other chassis claiming the port and as a result it tries
> to claim the port again (because its chassis has a tap interface
> referencing the LSP). This flapping ends once CMS cleans up tap
> interface from the source chassis.
> 
> Now following steps occur during a single iteration inc-proc-eng during
> flapping:
> 
> 1. PB update received on OVN controller about other chassis owning the
>     port.
> 2. ovn-controller tries to claim the port.
> 3. It installs the OVS flows for the port and updates the runtime_data
>     to include this port in locally relevant ports.
> 4. If some change to runtime data happens as part of 3, port-groups
>     containing the affected ports are recomputed. It uses related_lports
>     runtime data to compute the port-groups.
> 
> Finally, ovn-controller sends a port-binding update to SB changing the
> chassis to itself.
> At a later point of time, SB sends the notification to ovn-controller
> about (4) being completed.
> 
> Once CMS deletes the tap interface, ovn-controller receives the
> notification and updates the runtime data accordingly.
> 
> Issue: ovs-flows are (sometimes)not cleaned up upon port migration.
> 
> If the notification of OVS interface deletion is received before SB
> acks the PortBinding update, then ovn-controller does not cleanup
> related_lports leading to incorrect port-groups computation.
> 
> i.e if the order of events is as follows:
> 
> 1. PB update received on OVN controller about other chassis owning the
>     port.
> 2. ovn-controller claims the port, installs OVS flows and sends the
>     PortBinding update to SB.
> 3. OVS interface deletion notification received by ovn-controller.
> 4. SB ack received for step-2 PB update.
> 
> This commit fixes this issue by removing the logical_port from related
> port even in case there is no binding available locally.
> 
> Signed-off-by: Priyankar Jain <priyankar.jain@nutanix.com>
> ---
>   controller/binding.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/controller/binding.c b/controller/binding.c
> index 9b0647b70..9889be5c7 100644
> --- a/controller/binding.c
> +++ b/controller/binding.c
> @@ -1568,6 +1568,7 @@ consider_vif_lport_(const struct sbrec_port_binding *pb,
>               || is_additional_chassis(pb, b_ctx_in->chassis_rec)) {
>           /* Release the lport if there is no lbinding. */
>           if (!lbinding_set || !can_bind) {
> +            remove_related_lport(pb, b_ctx_out);
>               return release_lport(pb, b_ctx_in->chassis_rec,
>                                    !b_ctx_in->ovnsb_idl_txn,
>                                    b_ctx_out->tracked_dp_bindings,
Priyankar Jain June 1, 2023, 6:19 p.m. UTC | #2
Hi Mark,

Thanks for the review.

On 01/06/23 12:29 am, Mark Michelson wrote:
> Hi Priyankar,
> 
> The description makes the issue crystal clear, and you appear to be 
> solving the race condition that can happen between the OVS interface 
> table and the southbound port_binding table.
> 
> Acked-by: Mark Michelson <mmichels@redhat.com>
> 
> Just to let you know, the flapping problem you mention can be avoided 
> altogether by using options:requested-chassis on the northbound logical 
> switch port. When you migrate the port to a new chassis, place the new 
> chassis's name or hostname as this option, and ovn-controller will only 
> claim the logical switch port on that chassis. The old chassis will not 
> try to claim the port even if the tap is still present.
>

Thanks for the suggestion. I'll definitely try out this.
Appreciate all the help!

Regards,
Priyankar

> I wouldn't be surprised if there were other ways to trigger this race 
> condition as well. I suspect the port-flapping scenario is most likely 
> to trigger it, though.
> 
> On 5/31/23 01:35, Priyankar Jain wrote:
>> Currently during port migration, two chassis (source and destination)
>> can try to claim the same logical switch port simultaneously for a
>> short-period of time until the tap is deleted on source hypervisor.
>> ovn-controllers on these 2 hosts constantly receives port-binding
>> updates about other chassis claiming the port and as a result it tries
>> to claim the port again (because its chassis has a tap interface
>> referencing the LSP). This flapping ends once CMS cleans up tap
>> interface from the source chassis.
>>
>> Now following steps occur during a single iteration inc-proc-eng during
>> flapping:
>>
>> 1. PB update received on OVN controller about other chassis owning the
>>     port.
>> 2. ovn-controller tries to claim the port.
>> 3. It installs the OVS flows for the port and updates the runtime_data
>>     to include this port in locally relevant ports.
>> 4. If some change to runtime data happens as part of 3, port-groups
>>     containing the affected ports are recomputed. It uses related_lports
>>     runtime data to compute the port-groups.
>>
>> Finally, ovn-controller sends a port-binding update to SB changing the
>> chassis to itself.
>> At a later point of time, SB sends the notification to ovn-controller
>> about (4) being completed.
>>
>> Once CMS deletes the tap interface, ovn-controller receives the
>> notification and updates the runtime data accordingly.
>>
>> Issue: ovs-flows are (sometimes)not cleaned up upon port migration.
>>
>> If the notification of OVS interface deletion is received before SB
>> acks the PortBinding update, then ovn-controller does not cleanup
>> related_lports leading to incorrect port-groups computation.
>>
>> i.e if the order of events is as follows:
>>
>> 1. PB update received on OVN controller about other chassis owning the
>>     port.
>> 2. ovn-controller claims the port, installs OVS flows and sends the
>>     PortBinding update to SB.
>> 3. OVS interface deletion notification received by ovn-controller.
>> 4. SB ack received for step-2 PB update.
>>
>> This commit fixes this issue by removing the logical_port from related
>> port even in case there is no binding available locally.
>>
>> Signed-off-by: Priyankar Jain <priyankar.jain@nutanix.com>
>> ---
>>   controller/binding.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/controller/binding.c b/controller/binding.c
>> index 9b0647b70..9889be5c7 100644
>> --- a/controller/binding.c
>> +++ b/controller/binding.c
>> @@ -1568,6 +1568,7 @@ consider_vif_lport_(const struct 
>> sbrec_port_binding *pb,
>>               || is_additional_chassis(pb, b_ctx_in->chassis_rec)) {
>>           /* Release the lport if there is no lbinding. */
>>           if (!lbinding_set || !can_bind) {
>> +            remove_related_lport(pb, b_ctx_out);
>>               return release_lport(pb, b_ctx_in->chassis_rec,
>>                                    !b_ctx_in->ovnsb_idl_txn,
>>                                    b_ctx_out->tracked_dp_bindings,
>
Xavier Simonart June 6, 2023, 12:10 p.m. UTC | #3
Hi Priyankar, Mark

Thanks for the patch. I agree with Mark - the description is really great !
Based on your description, I tried creating a unit-test reproducing the
issue, and checking that your patch fixes it.
I came up with [0]. It reproduces some issues (flows not deleted) on
origin/main, and the issues fixed using your patch. So, it looks good.

However, if, in [0] I remove the "sleep 2" (see below), then it seems that
there are still some issues.
It might not be exactly the same issue you saw, but is very similar - the
same flow does not get properly deleted.
I think that the (new) issue is the following:
When a port is claimed by two different chassis (as part of the migration),
ovn-controllers try to avoid a "war" between themselves, and postpone port
claiming if the port got claimed very recently.
This works fine. But, if, while a port claim is postponed, the interface is
deleted, it seems that some flows are not properly removed.
Checking that the port is postponed in is_binding_lport_this_chassis might
be enough, but this requires additional check.
(if we want to add this unit test to the patch, then we probably need to
move some of the functions to ovn-macros to avoid duplication, as I steal
them from the system tests)

Thanks
Xavier

[0]
OVN_FOR_EACH_NORTHD([
AT_SETUP([XXXX)
ovn_start

ovn-nbctl ls-add ls0
ovn-nbctl lsp-add ls0 lsp0
ovn-nbctl lsp-add ls0 lsp1

net_add n1
for i in 1 2; do
    sim_add hv$i
    as hv$i
    ovs-vsctl add-br br-phys
    ovn_attach n1 br-phys 192.168.0.$i
    ovn-appctl vlog/set dbg
done

sleep_sb() {
  echo SB going to sleep
  AT_CHECK([kill -STOP $(cat ovn-sb/ovsdb-server.pid)])
}
wake_up_sb() {
  echo SB waking up
  AT_CHECK([kill -CONT $(cat ovn-sb/ovsdb-server.pid)])
}
sleep_controller() {
  hv=$1
  echo Controller $hv going to sleep
  as $hv ovn-appctl debug/pause
  OVS_WAIT_UNTIL([test x$(as $hv ovn-appctl -t ovn-controller debug/status)
= "xpaused"])
}

wake_up_controller() {
  hv=$1
  echo Controller $hv going to wake up
  as $hv ovn-appctl debug/resume
  OVS_WAIT_UNTIL([test x$(as $hv ovn-appctl -t ovn-controller debug/status)
= "xrunning"])
}
get_flows()
{
    hv=$1
    out=oflows${2}
    as $hv ovs-ofctl dump-flows br-int |
        sed -e 's/cookie=[[0-9.]]*s, //g' |
        sed -e 's/duration=[[0-9.]]*s, //g' |
        sed -e 's/idle_age=[[0-9]]*, //g' |
        sed -e 's/n_packets=[[0-9]]*, //g' |
        sed -e 's/n_bytes=[[0-9]]*, //g' | sort -k2 > $out
    AT_CAPTURE_FILE([$out])
}

check ovn-nbctl --wait=hv sync
hv1_uuid=$(ovn-sbctl --bare --columns _uuid list chassis hv1)
hv2_uuid=$(ovn-sbctl --bare --columns _uuid list chassis hv2)

as hv1 ovs-vsctl -- add-port br-int vif1 -- set Interface vif1
external-ids:iface-id=lsp1
as hv1 ovs-vsctl -- add-port br-int vif -- set Interface vif
external-ids:iface-id=lsp0
check ovn-nbctl pg-add pg1 lsp0 lsp1
wait_for_ports_up
check ovn-nbctl --wait=hv acl-add ls0 to-lport 1000 'eth.type == 0x1234 &&
outport == @pg1' drop

# Delete vif => store flows w/ only vif1, and no vif
as hv1 ovs-vsctl -- del-port br-int vif
check ovn-nbctl --wait=hv sync
get_flows hv1 1

AS_BOX([sleeping hv1, binding hv1 and hv2])
sleep_controller hv1
as hv2 ovs-vsctl -- add-port br-int vif -- set Interface vif
external-ids:iface-id=lsp0
as hv1 ovs-vsctl -- add-port br-int vif -- set Interface vif
external-ids:iface-id=lsp0

OVS_WAIT_UNTIL([
    chassis=$(ovn-sbctl --bare --columns chassis list port_binding lsp0)
    test "$chassis" = $hv2_uuid
])

AS_BOX([sleeping sb, waking up hv1, Sleeping hv2])
sleep_sb
wake_up_controller hv1
sleep_controller hv2
*sleep 2*
AS_BOX([Unbinding hv1, waking up sb, waking up hv2])

as hv1 ovs-vsctl -- del-port br-int vif
wake_up_sb
wake_up_controller hv2

AS_BOX([Unbinding hv2])
as hv2 ovs-vsctl -- del-port br-int vif
check ovn-nbctl --wait=hv sync

get_flows hv1 2

check diff oflows1 oflows2
OVN_CLEANUP([hv1],[hv2])

AT_CLEANUP
])

~




On Thu, Jun 1, 2023 at 8:21 PM Priyankar Jain <priyankar.jain@nutanix.com>
wrote:

> Hi Mark,
>
> Thanks for the review.
>
> On 01/06/23 12:29 am, Mark Michelson wrote:
> > Hi Priyankar,
> >
> > The description makes the issue crystal clear, and you appear to be
> > solving the race condition that can happen between the OVS interface
> > table and the southbound port_binding table.
> >
> > Acked-by: Mark Michelson <mmichels@redhat.com>
> >
> > Just to let you know, the flapping problem you mention can be avoided
> > altogether by using options:requested-chassis on the northbound logical
> > switch port. When you migrate the port to a new chassis, place the new
> > chassis's name or hostname as this option, and ovn-controller will only
> > claim the logical switch port on that chassis. The old chassis will not
> > try to claim the port even if the tap is still present.
> >
>
> Thanks for the suggestion. I'll definitely try out this.
> Appreciate all the help!
>
> Regards,
> Priyankar
>
> > I wouldn't be surprised if there were other ways to trigger this race
> > condition as well. I suspect the port-flapping scenario is most likely
> > to trigger it, though.
> >
> > On 5/31/23 01:35, Priyankar Jain wrote:
> >> Currently during port migration, two chassis (source and destination)
> >> can try to claim the same logical switch port simultaneously for a
> >> short-period of time until the tap is deleted on source hypervisor.
> >> ovn-controllers on these 2 hosts constantly receives port-binding
> >> updates about other chassis claiming the port and as a result it tries
> >> to claim the port again (because its chassis has a tap interface
> >> referencing the LSP). This flapping ends once CMS cleans up tap
> >> interface from the source chassis.
> >>
> >> Now following steps occur during a single iteration inc-proc-eng during
> >> flapping:
> >>
> >> 1. PB update received on OVN controller about other chassis owning the
> >>     port.
> >> 2. ovn-controller tries to claim the port.
> >> 3. It installs the OVS flows for the port and updates the runtime_data
> >>     to include this port in locally relevant ports.
> >> 4. If some change to runtime data happens as part of 3, port-groups
> >>     containing the affected ports are recomputed. It uses related_lports
> >>     runtime data to compute the port-groups.
> >>
> >> Finally, ovn-controller sends a port-binding update to SB changing the
> >> chassis to itself.
> >> At a later point of time, SB sends the notification to ovn-controller
> >> about (4) being completed.
> >>
> >> Once CMS deletes the tap interface, ovn-controller receives the
> >> notification and updates the runtime data accordingly.
> >>
> >> Issue: ovs-flows are (sometimes)not cleaned up upon port migration.
> >>
> >> If the notification of OVS interface deletion is received before SB
> >> acks the PortBinding update, then ovn-controller does not cleanup
> >> related_lports leading to incorrect port-groups computation.
> >>
> >> i.e if the order of events is as follows:
> >>
> >> 1. PB update received on OVN controller about other chassis owning the
> >>     port.
> >> 2. ovn-controller claims the port, installs OVS flows and sends the
> >>     PortBinding update to SB.
> >> 3. OVS interface deletion notification received by ovn-controller.
> >> 4. SB ack received for step-2 PB update.
> >>
> >> This commit fixes this issue by removing the logical_port from related
> >> port even in case there is no binding available locally.
> >>
> >> Signed-off-by: Priyankar Jain <priyankar.jain@nutanix.com>
> >> ---
> >>   controller/binding.c | 1 +
> >>   1 file changed, 1 insertion(+)
> >>
> >> diff --git a/controller/binding.c b/controller/binding.c
> >> index 9b0647b70..9889be5c7 100644
> >> --- a/controller/binding.c
> >> +++ b/controller/binding.c
> >> @@ -1568,6 +1568,7 @@ consider_vif_lport_(const struct
> >> sbrec_port_binding *pb,
> >>               || is_additional_chassis(pb, b_ctx_in->chassis_rec)) {
> >>           /* Release the lport if there is no lbinding. */
> >>           if (!lbinding_set || !can_bind) {
> >> +            remove_related_lport(pb, b_ctx_out);
> >>               return release_lport(pb, b_ctx_in->chassis_rec,
> >>                                    !b_ctx_in->ovnsb_idl_txn,
> >>                                    b_ctx_out->tracked_dp_bindings,
> >
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
Dumitru Ceara July 12, 2023, 10:42 a.m. UTC | #4
On 6/6/23 14:10, Xavier Simonart wrote:
> Hi Priyankar, Mark
> 
> Thanks for the patch. I agree with Mark - the description is really great !
> Based on your description, I tried creating a unit-test reproducing the
> issue, and checking that your patch fixes it.
> I came up with [0]. It reproduces some issues (flows not deleted) on
> origin/main, and the issues fixed using your patch. So, it looks good.
> 
> However, if, in [0] I remove the "sleep 2" (see below), then it seems that
> there are still some issues.
> It might not be exactly the same issue you saw, but is very similar - the
> same flow does not get properly deleted.
> I think that the (new) issue is the following:
> When a port is claimed by two different chassis (as part of the migration),
> ovn-controllers try to avoid a "war" between themselves, and postpone port
> claiming if the port got claimed very recently.
> This works fine. But, if, while a port claim is postponed, the interface is
> deleted, it seems that some flows are not properly removed.
> Checking that the port is postponed in is_binding_lport_this_chassis might
> be enough, but this requires additional check.
> (if we want to add this unit test to the patch, then we probably need to
> move some of the functions to ovn-macros to avoid duplication, as I steal
> them from the system tests)

Hi Priyankar,

Thanks for the fix!

I guess we have two options:

(1) go ahead with your patch but also include Xavier's test.
(2) figure out a more generic solution which covers the second problem
reported by Xavier too and respin this patch as a v2 that fixes all cases.

I'd prefer we go for (2) mainly because we know there's a problem but
also because the "sleep 2" in the test case makes me uneasy.

Do you have some input on this?

Regards,
Dumitru

> 
> Thanks
> Xavier
> 
> [0]
> OVN_FOR_EACH_NORTHD([
> AT_SETUP([XXXX)
> ovn_start
> 
> ovn-nbctl ls-add ls0
> ovn-nbctl lsp-add ls0 lsp0
> ovn-nbctl lsp-add ls0 lsp1
> 
> net_add n1
> for i in 1 2; do
>     sim_add hv$i
>     as hv$i
>     ovs-vsctl add-br br-phys
>     ovn_attach n1 br-phys 192.168.0.$i
>     ovn-appctl vlog/set dbg
> done
> 
> sleep_sb() {
>   echo SB going to sleep
>   AT_CHECK([kill -STOP $(cat ovn-sb/ovsdb-server.pid)])
> }
> wake_up_sb() {
>   echo SB waking up
>   AT_CHECK([kill -CONT $(cat ovn-sb/ovsdb-server.pid)])
> }
> sleep_controller() {
>   hv=$1
>   echo Controller $hv going to sleep
>   as $hv ovn-appctl debug/pause
>   OVS_WAIT_UNTIL([test x$(as $hv ovn-appctl -t ovn-controller debug/status)
> = "xpaused"])
> }
> 
> wake_up_controller() {
>   hv=$1
>   echo Controller $hv going to wake up
>   as $hv ovn-appctl debug/resume
>   OVS_WAIT_UNTIL([test x$(as $hv ovn-appctl -t ovn-controller debug/status)
> = "xrunning"])
> }
> get_flows()
> {
>     hv=$1
>     out=oflows${2}
>     as $hv ovs-ofctl dump-flows br-int |
>         sed -e 's/cookie=[[0-9.]]*s, //g' |
>         sed -e 's/duration=[[0-9.]]*s, //g' |
>         sed -e 's/idle_age=[[0-9]]*, //g' |
>         sed -e 's/n_packets=[[0-9]]*, //g' |
>         sed -e 's/n_bytes=[[0-9]]*, //g' | sort -k2 > $out
>     AT_CAPTURE_FILE([$out])
> }
> 
> check ovn-nbctl --wait=hv sync
> hv1_uuid=$(ovn-sbctl --bare --columns _uuid list chassis hv1)
> hv2_uuid=$(ovn-sbctl --bare --columns _uuid list chassis hv2)
> 
> as hv1 ovs-vsctl -- add-port br-int vif1 -- set Interface vif1
> external-ids:iface-id=lsp1
> as hv1 ovs-vsctl -- add-port br-int vif -- set Interface vif
> external-ids:iface-id=lsp0
> check ovn-nbctl pg-add pg1 lsp0 lsp1
> wait_for_ports_up
> check ovn-nbctl --wait=hv acl-add ls0 to-lport 1000 'eth.type == 0x1234 &&
> outport == @pg1' drop
> 
> # Delete vif => store flows w/ only vif1, and no vif
> as hv1 ovs-vsctl -- del-port br-int vif
> check ovn-nbctl --wait=hv sync
> get_flows hv1 1
> 
> AS_BOX([sleeping hv1, binding hv1 and hv2])
> sleep_controller hv1
> as hv2 ovs-vsctl -- add-port br-int vif -- set Interface vif
> external-ids:iface-id=lsp0
> as hv1 ovs-vsctl -- add-port br-int vif -- set Interface vif
> external-ids:iface-id=lsp0
> 
> OVS_WAIT_UNTIL([
>     chassis=$(ovn-sbctl --bare --columns chassis list port_binding lsp0)
>     test "$chassis" = $hv2_uuid
> ])
> 
> AS_BOX([sleeping sb, waking up hv1, Sleeping hv2])
> sleep_sb
> wake_up_controller hv1
> sleep_controller hv2
> *sleep 2*
> AS_BOX([Unbinding hv1, waking up sb, waking up hv2])
> 
> as hv1 ovs-vsctl -- del-port br-int vif
> wake_up_sb
> wake_up_controller hv2
> 
> AS_BOX([Unbinding hv2])
> as hv2 ovs-vsctl -- del-port br-int vif
> check ovn-nbctl --wait=hv sync
> 
> get_flows hv1 2
> 
> check diff oflows1 oflows2
> OVN_CLEANUP([hv1],[hv2])
> 
> AT_CLEANUP
> ])
> 
> ~
> 
> 
> 
> 
> On Thu, Jun 1, 2023 at 8:21 PM Priyankar Jain <priyankar.jain@nutanix.com>
> wrote:
> 
>> Hi Mark,
>>
>> Thanks for the review.
>>
>> On 01/06/23 12:29 am, Mark Michelson wrote:
>>> Hi Priyankar,
>>>
>>> The description makes the issue crystal clear, and you appear to be
>>> solving the race condition that can happen between the OVS interface
>>> table and the southbound port_binding table.
>>>
>>> Acked-by: Mark Michelson <mmichels@redhat.com>
>>>
>>> Just to let you know, the flapping problem you mention can be avoided
>>> altogether by using options:requested-chassis on the northbound logical
>>> switch port. When you migrate the port to a new chassis, place the new
>>> chassis's name or hostname as this option, and ovn-controller will only
>>> claim the logical switch port on that chassis. The old chassis will not
>>> try to claim the port even if the tap is still present.
>>>
>>
>> Thanks for the suggestion. I'll definitely try out this.
>> Appreciate all the help!
>>
>> Regards,
>> Priyankar
>>
>>> I wouldn't be surprised if there were other ways to trigger this race
>>> condition as well. I suspect the port-flapping scenario is most likely
>>> to trigger it, though.
>>>
>>> On 5/31/23 01:35, Priyankar Jain wrote:
>>>> Currently during port migration, two chassis (source and destination)
>>>> can try to claim the same logical switch port simultaneously for a
>>>> short-period of time until the tap is deleted on source hypervisor.
>>>> ovn-controllers on these 2 hosts constantly receives port-binding
>>>> updates about other chassis claiming the port and as a result it tries
>>>> to claim the port again (because its chassis has a tap interface
>>>> referencing the LSP). This flapping ends once CMS cleans up tap
>>>> interface from the source chassis.
>>>>
>>>> Now following steps occur during a single iteration inc-proc-eng during
>>>> flapping:
>>>>
>>>> 1. PB update received on OVN controller about other chassis owning the
>>>>     port.
>>>> 2. ovn-controller tries to claim the port.
>>>> 3. It installs the OVS flows for the port and updates the runtime_data
>>>>     to include this port in locally relevant ports.
>>>> 4. If some change to runtime data happens as part of 3, port-groups
>>>>     containing the affected ports are recomputed. It uses related_lports
>>>>     runtime data to compute the port-groups.
>>>>
>>>> Finally, ovn-controller sends a port-binding update to SB changing the
>>>> chassis to itself.
>>>> At a later point of time, SB sends the notification to ovn-controller
>>>> about (4) being completed.
>>>>
>>>> Once CMS deletes the tap interface, ovn-controller receives the
>>>> notification and updates the runtime data accordingly.
>>>>
>>>> Issue: ovs-flows are (sometimes)not cleaned up upon port migration.
>>>>
>>>> If the notification of OVS interface deletion is received before SB
>>>> acks the PortBinding update, then ovn-controller does not cleanup
>>>> related_lports leading to incorrect port-groups computation.
>>>>
>>>> i.e if the order of events is as follows:
>>>>
>>>> 1. PB update received on OVN controller about other chassis owning the
>>>>     port.
>>>> 2. ovn-controller claims the port, installs OVS flows and sends the
>>>>     PortBinding update to SB.
>>>> 3. OVS interface deletion notification received by ovn-controller.
>>>> 4. SB ack received for step-2 PB update.
>>>>
>>>> This commit fixes this issue by removing the logical_port from related
>>>> port even in case there is no binding available locally.
>>>>
>>>> Signed-off-by: Priyankar Jain <priyankar.jain@nutanix.com>
>>>> ---
>>>>   controller/binding.c | 1 +
>>>>   1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/controller/binding.c b/controller/binding.c
>>>> index 9b0647b70..9889be5c7 100644
>>>> --- a/controller/binding.c
>>>> +++ b/controller/binding.c
>>>> @@ -1568,6 +1568,7 @@ consider_vif_lport_(const struct
>>>> sbrec_port_binding *pb,
>>>>               || is_additional_chassis(pb, b_ctx_in->chassis_rec)) {
>>>>           /* Release the lport if there is no lbinding. */
>>>>           if (!lbinding_set || !can_bind) {
>>>> +            remove_related_lport(pb, b_ctx_out);
>>>>               return release_lport(pb, b_ctx_in->chassis_rec,
>>>>                                    !b_ctx_in->ovnsb_idl_txn,
>>>>                                    b_ctx_out->tracked_dp_bindings,
>>>
>> _______________________________________________
>> dev mailing list
>> dev@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Priyankar Jain July 13, 2023, 6:48 a.m. UTC | #5
Hi Dumitru,

On 12/07/23 4:12 pm, Dumitru Ceara wrote:
> On 6/6/23 14:10, Xavier Simonart wrote:
>> Hi Priyankar, Mark
>>
>> Thanks for the patch. I agree with Mark - the description is really great !
>> Based on your description, I tried creating a unit-test reproducing the
>> issue, and checking that your patch fixes it.
>> I came up with [0]. It reproduces some issues (flows not deleted) on
>> origin/main, and the issues fixed using your patch. So, it looks good.
>>
>> However, if, in [0] I remove the "sleep 2" (see below), then it seems that
>> there are still some issues.
>> It might not be exactly the same issue you saw, but is very similar - the
>> same flow does not get properly deleted.
>> I think that the (new) issue is the following:
>> When a port is claimed by two different chassis (as part of the migration),
>> ovn-controllers try to avoid a "war" between themselves, and postpone port
>> claiming if the port got claimed very recently.
>> This works fine. But, if, while a port claim is postponed, the interface is
>> deleted, it seems that some flows are not properly removed.
>> Checking that the port is postponed in is_binding_lport_this_chassis might
>> be enough, but this requires additional check.
>> (if we want to add this unit test to the patch, then we probably need to
>> move some of the functions to ovn-macros to avoid duplication, as I steal
>> them from the system tests)
> 
> Hi Priyankar,
> 
> Thanks for the fix!
> 
> I guess we have two options:
> 
> (1) go ahead with your patch but also include Xavier's test.
> (2) figure out a more generic solution which covers the second problem
> reported by Xavier too and respin this patch as a v2 that fixes all cases.
> 
> I'd prefer we go for (2) mainly because we know there's a problem but
> also because the "sleep 2" in the test case makes me uneasy.
> 
> Do you have some input on this?

I concur. We should go ahead with (2). Lately I was not able to spend 
time on the patch. I'll try to spend some cycles working on the generic 
fix for the issues forementioned.

P.S: Thanks a lot Xavier for providing an automated testcase. Appreciate 
your support.

Thanks,
Priyankar

> Regards,
> Dumitru
> 
>>
>> Thanks
>> Xavier
>>
>> [0]
>> OVN_FOR_EACH_NORTHD([
>> AT_SETUP([XXXX)
>> ovn_start
>>
>> ovn-nbctl ls-add ls0
>> ovn-nbctl lsp-add ls0 lsp0
>> ovn-nbctl lsp-add ls0 lsp1
>>
>> net_add n1
>> for i in 1 2; do
>>      sim_add hv$i
>>      as hv$i
>>      ovs-vsctl add-br br-phys
>>      ovn_attach n1 br-phys 192.168.0.$i
>>      ovn-appctl vlog/set dbg
>> done
>>
>> sleep_sb() {
>>    echo SB going to sleep
>>    AT_CHECK([kill -STOP $(cat ovn-sb/ovsdb-server.pid)])
>> }
>> wake_up_sb() {
>>    echo SB waking up
>>    AT_CHECK([kill -CONT $(cat ovn-sb/ovsdb-server.pid)])
>> }
>> sleep_controller() {
>>    hv=$1
>>    echo Controller $hv going to sleep
>>    as $hv ovn-appctl debug/pause
>>    OVS_WAIT_UNTIL([test x$(as $hv ovn-appctl -t ovn-controller debug/status)
>> = "xpaused"])
>> }
>>
>> wake_up_controller() {
>>    hv=$1
>>    echo Controller $hv going to wake up
>>    as $hv ovn-appctl debug/resume
>>    OVS_WAIT_UNTIL([test x$(as $hv ovn-appctl -t ovn-controller debug/status)
>> = "xrunning"])
>> }
>> get_flows()
>> {
>>      hv=$1
>>      out=oflows${2}
>>      as $hv ovs-ofctl dump-flows br-int |
>>          sed -e 's/cookie=[[0-9.]]*s, //g' |
>>          sed -e 's/duration=[[0-9.]]*s, //g' |
>>          sed -e 's/idle_age=[[0-9]]*, //g' |
>>          sed -e 's/n_packets=[[0-9]]*, //g' |
>>          sed -e 's/n_bytes=[[0-9]]*, //g' | sort -k2 > $out
>>      AT_CAPTURE_FILE([$out])
>> }
>>
>> check ovn-nbctl --wait=hv sync
>> hv1_uuid=$(ovn-sbctl --bare --columns _uuid list chassis hv1)
>> hv2_uuid=$(ovn-sbctl --bare --columns _uuid list chassis hv2)
>>
>> as hv1 ovs-vsctl -- add-port br-int vif1 -- set Interface vif1
>> external-ids:iface-id=lsp1
>> as hv1 ovs-vsctl -- add-port br-int vif -- set Interface vif
>> external-ids:iface-id=lsp0
>> check ovn-nbctl pg-add pg1 lsp0 lsp1
>> wait_for_ports_up
>> check ovn-nbctl --wait=hv acl-add ls0 to-lport 1000 'eth.type == 0x1234 &&
>> outport == @pg1' drop
>>
>> # Delete vif => store flows w/ only vif1, and no vif
>> as hv1 ovs-vsctl -- del-port br-int vif
>> check ovn-nbctl --wait=hv sync
>> get_flows hv1 1
>>
>> AS_BOX([sleeping hv1, binding hv1 and hv2])
>> sleep_controller hv1
>> as hv2 ovs-vsctl -- add-port br-int vif -- set Interface vif
>> external-ids:iface-id=lsp0
>> as hv1 ovs-vsctl -- add-port br-int vif -- set Interface vif
>> external-ids:iface-id=lsp0
>>
>> OVS_WAIT_UNTIL([
>>      chassis=$(ovn-sbctl --bare --columns chassis list port_binding lsp0)
>>      test "$chassis" = $hv2_uuid
>> ])
>>
>> AS_BOX([sleeping sb, waking up hv1, Sleeping hv2])
>> sleep_sb
>> wake_up_controller hv1
>> sleep_controller hv2
>> *sleep 2*
>> AS_BOX([Unbinding hv1, waking up sb, waking up hv2])
>>
>> as hv1 ovs-vsctl -- del-port br-int vif
>> wake_up_sb
>> wake_up_controller hv2
>>
>> AS_BOX([Unbinding hv2])
>> as hv2 ovs-vsctl -- del-port br-int vif
>> check ovn-nbctl --wait=hv sync
>>
>> get_flows hv1 2
>>
>> check diff oflows1 oflows2
>> OVN_CLEANUP([hv1],[hv2])
>>
>> AT_CLEANUP
>> ])
>>
>> ~
>>
>>
>>
>>
>> On Thu, Jun 1, 2023 at 8:21 PM Priyankar Jain <priyankar.jain@nutanix.com>
>> wrote:
>>
>>> Hi Mark,
>>>
>>> Thanks for the review.
>>>
>>> On 01/06/23 12:29 am, Mark Michelson wrote:
>>>> Hi Priyankar,
>>>>
>>>> The description makes the issue crystal clear, and you appear to be
>>>> solving the race condition that can happen between the OVS interface
>>>> table and the southbound port_binding table.
>>>>
>>>> Acked-by: Mark Michelson <mmichels@redhat.com>
>>>>
>>>> Just to let you know, the flapping problem you mention can be avoided
>>>> altogether by using options:requested-chassis on the northbound logical
>>>> switch port. When you migrate the port to a new chassis, place the new
>>>> chassis's name or hostname as this option, and ovn-controller will only
>>>> claim the logical switch port on that chassis. The old chassis will not
>>>> try to claim the port even if the tap is still present.
>>>>
>>>
>>> Thanks for the suggestion. I'll definitely try out this.
>>> Appreciate all the help!
>>>
>>> Regards,
>>> Priyankar
>>>
>>>> I wouldn't be surprised if there were other ways to trigger this race
>>>> condition as well. I suspect the port-flapping scenario is most likely
>>>> to trigger it, though.
>>>>
>>>> On 5/31/23 01:35, Priyankar Jain wrote:
>>>>> Currently during port migration, two chassis (source and destination)
>>>>> can try to claim the same logical switch port simultaneously for a
>>>>> short-period of time until the tap is deleted on source hypervisor.
>>>>> ovn-controllers on these 2 hosts constantly receives port-binding
>>>>> updates about other chassis claiming the port and as a result it tries
>>>>> to claim the port again (because its chassis has a tap interface
>>>>> referencing the LSP). This flapping ends once CMS cleans up tap
>>>>> interface from the source chassis.
>>>>>
>>>>> Now following steps occur during a single iteration inc-proc-eng during
>>>>> flapping:
>>>>>
>>>>> 1. PB update received on OVN controller about other chassis owning the
>>>>>      port.
>>>>> 2. ovn-controller tries to claim the port.
>>>>> 3. It installs the OVS flows for the port and updates the runtime_data
>>>>>      to include this port in locally relevant ports.
>>>>> 4. If some change to runtime data happens as part of 3, port-groups
>>>>>      containing the affected ports are recomputed. It uses related_lports
>>>>>      runtime data to compute the port-groups.
>>>>>
>>>>> Finally, ovn-controller sends a port-binding update to SB changing the
>>>>> chassis to itself.
>>>>> At a later point of time, SB sends the notification to ovn-controller
>>>>> about (4) being completed.
>>>>>
>>>>> Once CMS deletes the tap interface, ovn-controller receives the
>>>>> notification and updates the runtime data accordingly.
>>>>>
>>>>> Issue: ovs-flows are (sometimes)not cleaned up upon port migration.
>>>>>
>>>>> If the notification of OVS interface deletion is received before SB
>>>>> acks the PortBinding update, then ovn-controller does not cleanup
>>>>> related_lports leading to incorrect port-groups computation.
>>>>>
>>>>> i.e if the order of events is as follows:
>>>>>
>>>>> 1. PB update received on OVN controller about other chassis owning the
>>>>>      port.
>>>>> 2. ovn-controller claims the port, installs OVS flows and sends the
>>>>>      PortBinding update to SB.
>>>>> 3. OVS interface deletion notification received by ovn-controller.
>>>>> 4. SB ack received for step-2 PB update.
>>>>>
>>>>> This commit fixes this issue by removing the logical_port from related
>>>>> port even in case there is no binding available locally.
>>>>>
>>>>> Signed-off-by: Priyankar Jain <priyankar.jain@nutanix.com>
>>>>> ---
>>>>>    controller/binding.c | 1 +
>>>>>    1 file changed, 1 insertion(+)
>>>>>
>>>>> diff --git a/controller/binding.c b/controller/binding.c
>>>>> index 9b0647b70..9889be5c7 100644
>>>>> --- a/controller/binding.c
>>>>> +++ b/controller/binding.c
>>>>> @@ -1568,6 +1568,7 @@ consider_vif_lport_(const struct
>>>>> sbrec_port_binding *pb,
>>>>>                || is_additional_chassis(pb, b_ctx_in->chassis_rec)) {
>>>>>            /* Release the lport if there is no lbinding. */
>>>>>            if (!lbinding_set || !can_bind) {
>>>>> +            remove_related_lport(pb, b_ctx_out);
>>>>>                return release_lport(pb, b_ctx_in->chassis_rec,
>>>>>                                     !b_ctx_in->ovnsb_idl_txn,
>>>>>                                     b_ctx_out->tracked_dp_bindings,
>>>>
>>> _______________________________________________
>>> dev mailing list
>>> dev@openvswitch.org
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_mailman_listinfo_ovs-2Ddev&d=DwIDaQ&c=s883GpUCOChKOHiocYtGcg&r=BWfApgwqQ_BTXd_yEq_kwWLM-KTqkdPvolDMCJmUagg&m=5RE3xB3ecoAnUsl4KL4Rrl10TBJg061rQvodIeQHoE8XevtCE7KUyGPW6HESfx9q&s=rGwggMziqjM6RzqSmxtynVVCRbZysklqZChhzQHYPHw&e=
>>>
>> _______________________________________________
>> dev mailing list
>> dev@openvswitch.org
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_mailman_listinfo_ovs-2Ddev&d=DwIDaQ&c=s883GpUCOChKOHiocYtGcg&r=BWfApgwqQ_BTXd_yEq_kwWLM-KTqkdPvolDMCJmUagg&m=5RE3xB3ecoAnUsl4KL4Rrl10TBJg061rQvodIeQHoE8XevtCE7KUyGPW6HESfx9q&s=rGwggMziqjM6RzqSmxtynVVCRbZysklqZChhzQHYPHw&e=
>
Dumitru Ceara July 13, 2023, 8:58 a.m. UTC | #6
On 7/13/23 08:48, Priyankar Jain wrote:
> Hi Dumitru,
> 
> On 12/07/23 4:12 pm, Dumitru Ceara wrote:
>> On 6/6/23 14:10, Xavier Simonart wrote:
>>> Hi Priyankar, Mark
>>>
>>> Thanks for the patch. I agree with Mark - the description is really
>>> great !
>>> Based on your description, I tried creating a unit-test reproducing the
>>> issue, and checking that your patch fixes it.
>>> I came up with [0]. It reproduces some issues (flows not deleted) on
>>> origin/main, and the issues fixed using your patch. So, it looks good.
>>>
>>> However, if, in [0] I remove the "sleep 2" (see below), then it seems
>>> that
>>> there are still some issues.
>>> It might not be exactly the same issue you saw, but is very similar -
>>> the
>>> same flow does not get properly deleted.
>>> I think that the (new) issue is the following:
>>> When a port is claimed by two different chassis (as part of the
>>> migration),
>>> ovn-controllers try to avoid a "war" between themselves, and postpone
>>> port
>>> claiming if the port got claimed very recently.
>>> This works fine. But, if, while a port claim is postponed, the
>>> interface is
>>> deleted, it seems that some flows are not properly removed.
>>> Checking that the port is postponed in is_binding_lport_this_chassis
>>> might
>>> be enough, but this requires additional check.
>>> (if we want to add this unit test to the patch, then we probably need to
>>> move some of the functions to ovn-macros to avoid duplication, as I
>>> steal
>>> them from the system tests)
>>
>> Hi Priyankar,
>>
>> Thanks for the fix!
>>
>> I guess we have two options:
>>
>> (1) go ahead with your patch but also include Xavier's test.
>> (2) figure out a more generic solution which covers the second problem
>> reported by Xavier too and respin this patch as a v2 that fixes all
>> cases.
>>
>> I'd prefer we go for (2) mainly because we know there's a problem but
>> also because the "sleep 2" in the test case makes me uneasy.
>>
>> Do you have some input on this?
> 
> I concur. We should go ahead with (2). Lately I was not able to spend
> time on the patch. I'll try to spend some cycles working on the generic
> fix for the issues forementioned.

Thanks, I set the state of this patch to "changes requested" in
patchwork and I'm looking forward to v2.

Regards,
Dumitru
diff mbox series

Patch

diff --git a/controller/binding.c b/controller/binding.c
index 9b0647b70..9889be5c7 100644
--- a/controller/binding.c
+++ b/controller/binding.c
@@ -1568,6 +1568,7 @@  consider_vif_lport_(const struct sbrec_port_binding *pb,
             || is_additional_chassis(pb, b_ctx_in->chassis_rec)) {
         /* Release the lport if there is no lbinding. */
         if (!lbinding_set || !can_bind) {
+            remove_related_lport(pb, b_ctx_out);
             return release_lport(pb, b_ctx_in->chassis_rec,
                                  !b_ctx_in->ovnsb_idl_txn,
                                  b_ctx_out->tracked_dp_bindings,