diff mbox series

[ovs-dev] dpif-netdev: Make datapath port mutex recursive.

Message ID 20200115172919.30551-1-i.maximets@ovn.org
State Accepted
Headers show
Series [ovs-dev] dpif-netdev: Make datapath port mutex recursive. | expand

Commit Message

Ilya Maximets Jan. 15, 2020, 5:29 p.m. UTC
Upcoming HW offloading will request flow statistics from the dpdk
offloading module.  This operation requires holding datapath port
mutex.  However, there is a possible scenarion in which flow deletion
happens during datapath reconfiguration process and the mutex already
acquired:

  0  raise () from /lib64/libc.so.6
  1  abort () from /lib64/libc.so.6
  2  ovs_abort_valist ()
  3  ovs_abort ()
  4  ovs_mutex_lock_at ()
  5  dpif_netdev_get_flow_offload_status ()
  6  get_dpif_flow_status ()
  7  flow_del_on_pmd ()
  8  dpif_netdev_flow_del ()
  9  dpif_netdev_operate ()
  10 dpif_operate ()
  11 push_dp_ops ()
  12 push_ukey_ops ()
  13 dp_purge_cb ()
  14 dp_netdev_del_pmd ()
  15 reconfigure_pmd_threads ()
  16 reconfigure_datapath ()
  17 do_del_port ()
  18 dpif_netdev_port_del ()
  19 dpif_port_del ()
  20 port_del ()
  21 ofproto_port_del ()
  22 bridge_delete_or_reconfigure_ports ()
  23 bridge_reconfigure ()
  24 bridge_run ()
  25 main ()

This happens while removing the last port of a particular PMD thread.
Reconfiguration process decides that we need to remove current PMD
thread and calls datapath purge callback in order to clean up resources
assigned to it.  This turns into flow removal and flow_del() tries to
request statistics.

Turning the dp->mutex into recursive version as a quick fix for this
issue.  Better solutions might be to avoid statistics request somehow,
or fully disassociate offloaded flows from the datapath flows.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
---

Ideas for better solution are welcome.

 lib/dpif-netdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Stokes, Ian Jan. 16, 2020, 12:13 p.m. UTC | #1
On 1/15/2020 5:29 PM, Ilya Maximets wrote:
> Upcoming HW offloading will request flow statistics from the dpdk
> offloading module.  This operation requires holding datapath port
> mutex.  However, there is a possible scenarion in which flow deletion
> happens during datapath reconfiguration process and the mutex already
> acquired:
> 
>    0  raise () from /lib64/libc.so.6
>    1  abort () from /lib64/libc.so.6
>    2  ovs_abort_valist ()
>    3  ovs_abort ()
>    4  ovs_mutex_lock_at ()
>    5  dpif_netdev_get_flow_offload_status ()
>    6  get_dpif_flow_status ()
>    7  flow_del_on_pmd ()
>    8  dpif_netdev_flow_del ()
>    9  dpif_netdev_operate ()
>    10 dpif_operate ()
>    11 push_dp_ops ()
>    12 push_ukey_ops ()
>    13 dp_purge_cb ()
>    14 dp_netdev_del_pmd ()
>    15 reconfigure_pmd_threads ()
>    16 reconfigure_datapath ()
>    17 do_del_port ()
>    18 dpif_netdev_port_del ()
>    19 dpif_port_del ()
>    20 port_del ()
>    21 ofproto_port_del ()
>    22 bridge_delete_or_reconfigure_ports ()
>    23 bridge_reconfigure ()
>    24 bridge_run ()
>    25 main ()
> 
> This happens while removing the last port of a particular PMD thread.
> Reconfiguration process decides that we need to remove current PMD
> thread and calls datapath purge callback in order to clean up resources
> assigned to it.  This turns into flow removal and flow_del() tries to
> request statistics.
> 
> Turning the dp->mutex into recursive version as a quick fix for this
> issue.  Better solutions might be to avoid statistics request somehow,
> or fully disassociate offloaded flows from the datapath flows.
> 
> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

Thanks for the patch Ilya, makes snse to me as a workaround for the 
moment. Tested in my own deployment and didnt see any issues.

Acked.

Ian
> ---
> 
> Ideas for better solution are welcome.
> 
>   lib/dpif-netdev.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 079bd1bdf..d4b1ebbff 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -1547,7 +1547,7 @@ create_dp_netdev(const char *name, const struct dpif_class *class,
>       ovs_refcount_init(&dp->ref_cnt);
>       atomic_flag_clear(&dp->destroyed);
>   
> -    ovs_mutex_init(&dp->port_mutex);
> +    ovs_mutex_init_recursive(&dp->port_mutex);
>       hmap_init(&dp->ports);
>       dp->port_seq = seq_create();
>       fat_rwlock_init(&dp->upcall_rwlock);
>
Ilya Maximets Jan. 16, 2020, 1:32 p.m. UTC | #2
On 16.01.2020 13:13, Stokes, Ian wrote:
> 
> 
> On 1/15/2020 5:29 PM, Ilya Maximets wrote:
>> Upcoming HW offloading will request flow statistics from the dpdk
>> offloading module.  This operation requires holding datapath port
>> mutex.  However, there is a possible scenarion in which flow deletion
>> happens during datapath reconfiguration process and the mutex already
>> acquired:
>>
>>    0  raise () from /lib64/libc.so.6
>>    1  abort () from /lib64/libc.so.6
>>    2  ovs_abort_valist ()
>>    3  ovs_abort ()
>>    4  ovs_mutex_lock_at ()
>>    5  dpif_netdev_get_flow_offload_status ()
>>    6  get_dpif_flow_status ()
>>    7  flow_del_on_pmd ()
>>    8  dpif_netdev_flow_del ()
>>    9  dpif_netdev_operate ()
>>    10 dpif_operate ()
>>    11 push_dp_ops ()
>>    12 push_ukey_ops ()
>>    13 dp_purge_cb ()
>>    14 dp_netdev_del_pmd ()
>>    15 reconfigure_pmd_threads ()
>>    16 reconfigure_datapath ()
>>    17 do_del_port ()
>>    18 dpif_netdev_port_del ()
>>    19 dpif_port_del ()
>>    20 port_del ()
>>    21 ofproto_port_del ()
>>    22 bridge_delete_or_reconfigure_ports ()
>>    23 bridge_reconfigure ()
>>    24 bridge_run ()
>>    25 main ()
>>
>> This happens while removing the last port of a particular PMD thread.
>> Reconfiguration process decides that we need to remove current PMD
>> thread and calls datapath purge callback in order to clean up resources
>> assigned to it.  This turns into flow removal and flow_del() tries to
>> request statistics.
>>
>> Turning the dp->mutex into recursive version as a quick fix for this
>> issue.  Better solutions might be to avoid statistics request somehow,
>> or fully disassociate offloaded flows from the datapath flows.
>>
>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
> 
> Thanks for the patch Ilya, makes snse to me as a workaround for the moment. Tested in my own deployment and didnt see any issues.
> 
> Acked.


Thanks.  I don't really like this solution.  Recursive mutexes usually
indicates architectural issues (which we definitely have) and should be
avoided as possible.  Hope, we'll find a way to revert this change in
the future.  For now, applied to master.

Best regards, Ilya Maximets.
diff mbox series

Patch

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 079bd1bdf..d4b1ebbff 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -1547,7 +1547,7 @@  create_dp_netdev(const char *name, const struct dpif_class *class,
     ovs_refcount_init(&dp->ref_cnt);
     atomic_flag_clear(&dp->destroyed);
 
-    ovs_mutex_init(&dp->port_mutex);
+    ovs_mutex_init_recursive(&dp->port_mutex);
     hmap_init(&dp->ports);
     dp->port_seq = seq_create();
     fat_rwlock_init(&dp->upcall_rwlock);