diff mbox series

[ovs-dev] bond: Avoid deadlock while updating post recirculation rules.

Message ID 20220811145459.1751516-1-i.maximets@ovn.org
State Superseded
Headers show
Series [ovs-dev] bond: Avoid deadlock while updating post recirculation rules. | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/intel-ovs-compilation success test: success

Commit Message

Ilya Maximets Aug. 11, 2022, 2:54 p.m. UTC
If the PACKET_OUT from controller ends up with sending packet to
a bond interface, the main thread will take locks in the following
order:
  handle_openflow
  --> take ofproto_mutex
  handle_packet_out
  packet_xlate
  output_normal
  bond_update_post_recirc_rules
  --> take rwlock in bond.c

If at the same time revalidator thread is processing other packet
with the output to the same bond:
  xlate_actions
  output_normal
  bond_update_post_recirc_rules
  --> take rwlock in bond.c
  update_recirc_rules
  ofproto_dpif_add_internal_flow
  ofproto_flow_mod
  --> take ofproto_mutex

So, it is possible for these 2 threads to lock each other by
taking one lock and waiting for another thread to release the
second lock.

It might also be possible for the main thread to lock itself
up by trying to acquire ofproto_mutex for the second time, if
it will actually proceed with update_recirc_rules() after
taking the bond rwlock.

The problem appears to be that bond_update_post_recirc_rules()
is called during the flow translation even if side effects are
prohibited, which is the case for openflow PACKET_OUT handling.

Skipping actual flow updates during the flow translation if
side effects are disabled to avoid the deadlock.

Since flow are not installed now when actions translated for
very first packet, installing initial flows in bond_reconfigure().
This will cover the case of allocating a new recirc_id.

Also checking if we need to update flows in bond_run() to cover
link state changes.

Reported-at: https://github.com/openvswitch/ovs-issues/issues/259
Fixes: adcf00ba35a0 ("ofproto/bond: Implement bond megaflow using recirculation")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
---

I'm not sure how to create a unit test for the lock race with
revalidator, but it might be possible to trigger the self-lockup
issue on the main thread.  I'll try to work on the test while
waiting for review.

 ofproto/bond.c               | 30 ++++++++++++++++++++++++++++++
 ofproto/bond.h               |  3 +++
 ofproto/ofproto-dpif-xlate.c | 15 ++++++++++++---
 3 files changed, 45 insertions(+), 3 deletions(-)

Comments

Mike Pattrick Aug. 22, 2022, 3:21 p.m. UTC | #1
On Thu, Aug 11, 2022 at 10:55 AM Ilya Maximets <i.maximets@ovn.org> wrote:
>
> If the PACKET_OUT from controller ends up with sending packet to
> a bond interface, the main thread will take locks in the following
> order:
>   handle_openflow
>   --> take ofproto_mutex
>   handle_packet_out
>   packet_xlate
>   output_normal
>   bond_update_post_recirc_rules
>   --> take rwlock in bond.c
>
> If at the same time revalidator thread is processing other packet
> with the output to the same bond:
>   xlate_actions
>   output_normal
>   bond_update_post_recirc_rules
>   --> take rwlock in bond.c
>   update_recirc_rules
>   ofproto_dpif_add_internal_flow
>   ofproto_flow_mod
>   --> take ofproto_mutex
>
> So, it is possible for these 2 threads to lock each other by
> taking one lock and waiting for another thread to release the
> second lock.
>
> It might also be possible for the main thread to lock itself
> up by trying to acquire ofproto_mutex for the second time, if
> it will actually proceed with update_recirc_rules() after
> taking the bond rwlock.
>
> The problem appears to be that bond_update_post_recirc_rules()
> is called during the flow translation even if side effects are
> prohibited, which is the case for openflow PACKET_OUT handling.
>
> Skipping actual flow updates during the flow translation if
> side effects are disabled to avoid the deadlock.
>
> Since flow are not installed now when actions translated for
> very first packet, installing initial flows in bond_reconfigure().
> This will cover the case of allocating a new recirc_id.
>
> Also checking if we need to update flows in bond_run() to cover
> link state changes.
>
> Reported-at: https://github.com/openvswitch/ovs-issues/issues/259
> Fixes: adcf00ba35a0 ("ofproto/bond: Implement bond megaflow using recirculation")
> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
> ---
>
> I'm not sure how to create a unit test for the lock race with
> revalidator, but it might be possible to trigger the self-lockup
> issue on the main thread.  I'll try to work on the test while
> waiting for review.
>

Hello Ilya,

I've tested this and confirmed that it solves the problem for me.

Acked-by: Mike Pattrick <mkp@redhat.com>

I created a reproducer that doesn't require dpdk or real interfaces:

ip netns add sidea
ip netns add sideb
ovs-vsctl add-br br-sidea
ovs-vsctl add-br br-sideb
ip link add portab0 type veth peer name portbb0
ip link add portab1 type veth peer name portbb1
ip link set dev portab0 up
ip link set dev portab1 up
ip link set dev portbb0 up
ip link set dev portbb1 up
ovs-vsctl add-bond br-sidea bonda portab0 portab1 -- set port bonda
lacp=active -- set port bonda bond_mode=balance-tcp
ovs-vsctl add-bond br-sideb bondb portbb0 portbb1 -- set port bondb
lacp=active -- set port bondb bond_mode=balance-tcp
ip link add porta2 type veth peer name ovs-porta2
ip link add portb2 type veth peer name ovs-portb2
ip link set dev ovs-porta2 up
ip link set dev ovs-portb2 up
ip link set porta2 netns sidea
ip link set portb2 netns sideb
ip netns exec sidea ip link set dev porta2 up
ip netns exec sideb ip link set dev portb2 up
ip netns exec sidea ip addr add 172.31.1.1/24 dev porta2
ip netns exec sideb ip addr add 172.31.1.2/24 dev portb2
ovs-vsctl add-port br-sidea ovs-porta2
ovs-vsctl add-port br-sideb ovs-portb2
ovs-ofctl add-flow br-sidea "actions=normal"
ovs-ofctl add-flow br-sideb "actions=normal"
while [[ 1 ]]; do
    ovs-ofctl packet-out br-sidea
"packet=ffffffffffff000000000000080045000054fbc340004001e4a3ac1f0101ac1f01020800e9af9e540001ce8e036300000000da35050000000000101112131415161718191a1b1c1d1e1f202122231415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f3031323334353637,
actions=resubmit(,0)" 2>/dev/null >/dev/null
    ovs-ofctl packet-out br-sideb
"packet=ffffffffffff000000000000080045000054de00000040014267ac1f0102ac1f01010000efecd19a00019c90036300000000d5b00a0000000000101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f3031323334353637,
actions=resubmit(,0)"
done &
while [[ 1 ]]; do
    ovs-vsctl --may-exist br-sidea
    ovs-vsctl --if-exists del-port bonda
    ovs-vsctl add-bond br-sidea bonda portab0 portab1 -- set port
bonda lacp=active -- set port bonda bond_mode=balance-tcp
    ovs-vsctl add-port br-sidea ovs-porta2
    ovs-ofctl add-flow br-sidea "actions=normal"
done &
ip netns exec sidea ping -f 172.31.1.2


On my machine it triggers a deadlock fairly consistently in under 3
seconds. Do you think it's worth turning something like this into a
test? Or would this be too slow and resource intensive?


Cheers,
M
Ilya Maximets Sept. 13, 2022, 7:11 p.m. UTC | #2
On 8/22/22 17:21, Mike Pattrick wrote:
> On Thu, Aug 11, 2022 at 10:55 AM Ilya Maximets <i.maximets@ovn.org> wrote:
>>
>> If the PACKET_OUT from controller ends up with sending packet to
>> a bond interface, the main thread will take locks in the following
>> order:
>>   handle_openflow
>>   --> take ofproto_mutex
>>   handle_packet_out
>>   packet_xlate
>>   output_normal
>>   bond_update_post_recirc_rules
>>   --> take rwlock in bond.c
>>
>> If at the same time revalidator thread is processing other packet
>> with the output to the same bond:
>>   xlate_actions
>>   output_normal
>>   bond_update_post_recirc_rules
>>   --> take rwlock in bond.c
>>   update_recirc_rules
>>   ofproto_dpif_add_internal_flow
>>   ofproto_flow_mod
>>   --> take ofproto_mutex
>>
>> So, it is possible for these 2 threads to lock each other by
>> taking one lock and waiting for another thread to release the
>> second lock.
>>
>> It might also be possible for the main thread to lock itself
>> up by trying to acquire ofproto_mutex for the second time, if
>> it will actually proceed with update_recirc_rules() after
>> taking the bond rwlock.
>>
>> The problem appears to be that bond_update_post_recirc_rules()
>> is called during the flow translation even if side effects are
>> prohibited, which is the case for openflow PACKET_OUT handling.
>>
>> Skipping actual flow updates during the flow translation if
>> side effects are disabled to avoid the deadlock.
>>
>> Since flow are not installed now when actions translated for
>> very first packet, installing initial flows in bond_reconfigure().
>> This will cover the case of allocating a new recirc_id.
>>
>> Also checking if we need to update flows in bond_run() to cover
>> link state changes.
>>
>> Reported-at: https://github.com/openvswitch/ovs-issues/issues/259
>> Fixes: adcf00ba35a0 ("ofproto/bond: Implement bond megaflow using recirculation")
>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
>> ---
>>
>> I'm not sure how to create a unit test for the lock race with
>> revalidator, but it might be possible to trigger the self-lockup
>> issue on the main thread.  I'll try to work on the test while
>> waiting for review.
>>
> 
> Hello Ilya,
> 
> I've tested this and confirmed that it solves the problem for me.
> 
> Acked-by: Mike Pattrick <mkp@redhat.com>
> 
> I created a reproducer that doesn't require dpdk or real interfaces:
> 
> ip netns add sidea
> ip netns add sideb
> ovs-vsctl add-br br-sidea
> ovs-vsctl add-br br-sideb
> ip link add portab0 type veth peer name portbb0
> ip link add portab1 type veth peer name portbb1
> ip link set dev portab0 up
> ip link set dev portab1 up
> ip link set dev portbb0 up
> ip link set dev portbb1 up
> ovs-vsctl add-bond br-sidea bonda portab0 portab1 -- set port bonda
> lacp=active -- set port bonda bond_mode=balance-tcp
> ovs-vsctl add-bond br-sideb bondb portbb0 portbb1 -- set port bondb
> lacp=active -- set port bondb bond_mode=balance-tcp
> ip link add porta2 type veth peer name ovs-porta2
> ip link add portb2 type veth peer name ovs-portb2
> ip link set dev ovs-porta2 up
> ip link set dev ovs-portb2 up
> ip link set porta2 netns sidea
> ip link set portb2 netns sideb
> ip netns exec sidea ip link set dev porta2 up
> ip netns exec sideb ip link set dev portb2 up
> ip netns exec sidea ip addr add 172.31.1.1/24 dev porta2
> ip netns exec sideb ip addr add 172.31.1.2/24 dev portb2
> ovs-vsctl add-port br-sidea ovs-porta2
> ovs-vsctl add-port br-sideb ovs-portb2
> ovs-ofctl add-flow br-sidea "actions=normal"
> ovs-ofctl add-flow br-sideb "actions=normal"
> while [[ 1 ]]; do
>     ovs-ofctl packet-out br-sidea
> "packet=ffffffffffff000000000000080045000054fbc340004001e4a3ac1f0101ac1f01020800e9af9e540001ce8e036300000000da35050000000000101112131415161718191a1b1c1d1e1f202122231415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f3031323334353637,
> actions=resubmit(,0)" 2>/dev/null >/dev/null
>     ovs-ofctl packet-out br-sideb
> "packet=ffffffffffff000000000000080045000054de00000040014267ac1f0102ac1f01010000efecd19a00019c90036300000000d5b00a0000000000101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f3031323334353637,
> actions=resubmit(,0)"
> done &
> while [[ 1 ]]; do
>     ovs-vsctl --may-exist br-sidea
>     ovs-vsctl --if-exists del-port bonda
>     ovs-vsctl add-bond br-sidea bonda portab0 portab1 -- set port
> bonda lacp=active -- set port bonda bond_mode=balance-tcp
>     ovs-vsctl add-port br-sidea ovs-porta2
>     ovs-ofctl add-flow br-sidea "actions=normal"
> done &
> ip netns exec sidea ping -f 172.31.1.2
> 
> 
> On my machine it triggers a deadlock fairly consistently in under 3
> seconds. Do you think it's worth turning something like this into a
> test? Or would this be too slow and resource intensive?

Thanks, Mike.  It looks hard to reproduce this way in CI, and
also a bit unreliable.  I manged to create a test that is
triggering double locking.  I sent v2 with that test included.
Please, take a look.

Best regards, Ilya Maximets.
diff mbox series

Patch

diff --git a/ofproto/bond.c b/ofproto/bond.c
index f06cf20c9..a3cc795bf 100644
--- a/ofproto/bond.c
+++ b/ofproto/bond.c
@@ -187,10 +187,15 @@  static struct bond_member *choose_output_member(const struct bond *,
                                                 uint16_t vlan)
     OVS_REQ_RDLOCK(rwlock);
 static void update_recirc_rules__(struct bond *);
+static bool bond_may_recirc(const struct bond *bond);
+static void bond_update_post_recirc_rules__(struct bond *bond,
+                                            const bool force)
+    OVS_REQ_WRLOCK(rwlock);
 static bool bond_is_falling_back_to_ab(const struct bond *);
 static void bond_add_lb_output_buckets(const struct bond *);
 static void bond_del_lb_output_buckets(const struct bond *);
 
+
 /* Attempts to parse 's' as the name of a bond balancing mode.  If successful,
  * stores the mode in '*balance' and returns true.  Otherwise returns false
  * without modifying '*balance'. */
@@ -515,6 +520,12 @@  bond_reconfigure(struct bond *bond, const struct bond_settings *s)
         bond_entry_reset(bond);
     }
 
+    if (bond->ofproto->backer->rt_support.odp.recirc
+        && bond_may_recirc(bond)) {
+        /* Update rules to reflect possible recirc_id changes. */
+        update_recirc_rules(bond);
+    }
+
     ovs_rwlock_unlock(&rwlock);
     return revalidate;
 }
@@ -728,6 +739,12 @@  bond_run(struct bond *bond, enum lacp_status lacp_status)
         bond_choose_active_member(bond);
     }
 
+    if (bond->ofproto->backer->rt_support.odp.recirc
+        && bond_may_recirc(bond)) {
+        /* Update rules to reflect possible link state changes. */
+        bond_update_post_recirc_rules__(bond, false);
+    }
+
     revalidate = bond->bond_revalidate;
     bond->bond_revalidate = false;
     ovs_rwlock_unlock(&rwlock);
@@ -1091,6 +1108,19 @@  bond_update_post_recirc_rules(struct bond *bond, uint32_t *recirc_id,
     }
 }
 
+void
+bond_get_recirc_id_and_hash_basis(struct bond *bond, uint32_t *recirc_id,
+                                  uint32_t *hash_basis)
+{
+    ovs_rwlock_rdlock(&rwlock);
+    if (bond_may_recirc(bond)) {
+        *recirc_id = bond->recirc_id;
+        *hash_basis = bond->basis;
+    } else {
+        *recirc_id = *hash_basis = 0;
+    }
+    ovs_rwlock_unlock(&rwlock);
+}
 
 /* Rebalancing. */
 
diff --git a/ofproto/bond.h b/ofproto/bond.h
index 2eb0c95a7..9c2b994c6 100644
--- a/ofproto/bond.h
+++ b/ofproto/bond.h
@@ -131,6 +131,9 @@  void bond_rebalance(struct bond *);
 void bond_update_post_recirc_rules(struct bond *, uint32_t *recirc_id,
                                    uint32_t *hash_basis);
 
+void bond_get_recirc_id_and_hash_basis(struct bond *bond, uint32_t *recirc_id,
+                                       uint32_t *hash_basis);
+
 bool bond_use_lb_output_action(const struct bond *bond);
 
 #endif /* bond.h */
diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index fda802e83..60ce7bb5b 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -2466,9 +2466,18 @@  output_normal(struct xlate_ctx *ctx, const struct xbundle *out_xbundle,
             /* In case recirculation is not actually in use, 'xr.recirc_id'
              * will be set to '0', since a valid 'recirc_id' can
              * not be zero.  */
-            bond_update_post_recirc_rules(out_xbundle->bond,
-                                          &xr.recirc_id,
-                                          &xr.hash_basis);
+            if (ctx->xin->allow_side_effects) {
+                bond_update_post_recirc_rules(out_xbundle->bond,
+                                              &xr.recirc_id,
+                                              &xr.hash_basis);
+            } else {
+                /* If side effects are not allowed, only getting the bond
+                 * configuration.  Rule updates will be handled by the
+                 * main thread later. */
+                bond_get_recirc_id_and_hash_basis(out_xbundle->bond,
+                                                  &xr.recirc_id,
+                                                  &xr.hash_basis);
+            }
             if (xr.recirc_id) {
                 /* Use recirculation instead of output. */
                 use_recirc = true;