[ovs-dev] dpif-netdev: Enter quiescent state after each offloading operation.

Message ID 20200221145401.393980-1-i.maximets@ovn.org
State Accepted
Delegated to: Ilya Maximets
Series [ovs-dev] dpif-netdev: Enter quiescent state after each offloading operation.

Commit Message

Ilya Maximets Feb. 21, 2020, 2:54 p.m. UTC
If the offloading queue is big and filled continuously, the offloading
thread may have no chance to quiesce, blocking RCU callbacks and
other threads waiting for synchronization.

Fix that by entering a momentary quiescent state after each operation,
since we're not holding any RCU-protected memory here.

Fixes: 02bb2824e51d ("dpif-netdev: do hw flow offload in a thread")
Reported-by: Eli Britstein <elibr@mellanox.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2020-February/049768.html
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
---
 lib/dpif-netdev.c | 1 +
 1 file changed, 1 insertion(+)
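
For context, the offloading thread processes queued items in a loop and, before
this change, only entered a quiescent state while blocked on an empty queue.
The sketch below shows roughly where the new call lands; it is loosely
reconstructed from the loop introduced by commit 02bb2824e51d and is not the
verbatim dpif-netdev code.

    /* Rough sketch of dp_netdev_flow_offload_main(); simplified, not verbatim. */
    static void *
    dp_netdev_flow_offload_main(void *data OVS_UNUSED)
    {
        struct dp_flow_offload_item *offload;

        for (;;) {
            ovs_mutex_lock(&dp_flow_offload.mutex);
            if (ovs_list_is_empty(&dp_flow_offload.list)) {
                /* Before this patch, this was the only place where the
                 * thread quiesced: while sleeping on an empty queue. */
                ovsrcu_quiesce_start();
                ovs_mutex_cond_wait(&dp_flow_offload.cond,
                                    &dp_flow_offload.mutex);
                ovsrcu_quiesce_end();
            }
            offload = CONTAINER_OF(ovs_list_pop_front(&dp_flow_offload.list),
                                   struct dp_flow_offload_item, node);
            ovs_mutex_unlock(&dp_flow_offload.mutex);

            /* ... perform the flow add/mod/del offload operation ... */

            dp_netdev_free_flow_offload(offload);
            ovsrcu_quiesce();   /* New: momentary quiescent state after every
                                 * processed item, so RCU grace periods are
                                 * not stalled by a continuously full queue. */
        }

        return NULL;
    }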

Comments

Eli Britstein Feb. 23, 2020, 2:32 p.m. UTC | #1
On 2/21/2020 4:54 PM, Ilya Maximets wrote:
> If the offloading queue is big and filled continuously, offloading
> thread may have no chance to quiesce blocking rcu callbacks and
> other threads waiting for synchronization.
>
> Fix that by entering momentary quiescent state after each operation
> since we're not holding any rcu-protected memory here.
>
> Fixes: 02bb2824e51d ("dpif-netdev: do hw flow offload in a thread")
> Reported-by: Eli Britstein <elibr@mellanox.com>
> Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2020-February/049768.html
> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
> ---
>   lib/dpif-netdev.c | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index d393aab5e..a798db45d 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -2512,6 +2512,7 @@ dp_netdev_flow_offload_main(void *data OVS_UNUSED)
>           VLOG_DBG("%s to %s netdev flow\n",
>                    ret == 0 ? "succeed" : "failed", op);
>           dp_netdev_free_flow_offload(offload);
> +        ovsrcu_quiesce();

This seems to solve the issue of responsiveness, but I have encountered 
a crash using it while there are a lot of flow deletions.

#0  0x0000000001e462dc in get_unaligned_u32 (p=0x7f57f7813500) at lib/unaligned.h:86
#1  0x0000000001e46388 in hash_bytes (p_=0x7f57f7813500, n=16, basis=0) at lib/hash.c:38
#2  0x0000000001f7f836 in ufid_to_rte_flow_data_find (ufid=0x7f57f7813500) at lib/netdev-offload-dpdk.c:134
#3  0x0000000001f88693 in netdev_offload_dpdk_flow_del (netdev=0x42260da0, ufid=0x7f57f7813500, stats=0x0) at lib/netdev-offload-dpdk.c:3361
#4  0x0000000001e6a00d in netdev_flow_del (netdev=0x42260da0, ufid=0x7f57f7813500, stats=0x0) at lib/netdev-offload.c:296

In other trials, I see the ovs-appctl stuck again, but then I cannot 
attach gdb (or use pstack). It hangs. In this scenario, I see this in dmesg:

[  404.450694] ovs-vswitchd[8574]: segfault at 7f8bd4e87280 ip 0000000001e463c3 sp 00007f8cfaf33028 error 4 in ovs-vswitchd[400000+200c000]

Yanqin Wei <Yanqin.Wei@arm.com> suggested adding an ovsrcu_try_quiesce call 
(in the same place). It seems more stable (I haven't seen a crash like the 
one above), but it has the same stuck symptom as above.

>       }
>   
>       return NULL;
Ilya Maximets Feb. 24, 2020, 11:06 a.m. UTC | #2
On 2/23/20 3:32 PM, Eli Britstein wrote:
> 
> On 2/21/2020 4:54 PM, Ilya Maximets wrote:
>> If the offloading queue is big and filled continuously, offloading
>> thread may have no chance to quiesce blocking rcu callbacks and
>> other threads waiting for synchronization.
>>
>> Fix that by entering momentary quiescent state after each operation
>> since we're not holding any rcu-protected memory here.
>>
>> Fixes: 02bb2824e51d ("dpif-netdev: do hw flow offload in a thread")
>> Reported-by: Eli Britstein <elibr@mellanox.com>
>> Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2020-February/049768.html
>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
>> ---
>>   lib/dpif-netdev.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index d393aab5e..a798db45d 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -2512,6 +2512,7 @@ dp_netdev_flow_offload_main(void *data OVS_UNUSED)
>>           VLOG_DBG("%s to %s netdev flow\n",
>>                    ret == 0 ? "succeed" : "failed", op);
>>           dp_netdev_free_flow_offload(offload);
>> +        ovsrcu_quiesce();
> 
> This seems to solve the issue of responsiveness, but I have encountered a crash using it, while there is a lot of flow deletions.
> 
> #0  0x0000000001e462dc in get_unaligned_u32 (p=0x7f57f7813500) at lib/unaligned.h:86
> #1  0x0000000001e46388 in hash_bytes (p_=0x7f57f7813500, n=16, basis=0) at lib/hash.c:38
> #2  0x0000000001f7f836 in ufid_to_rte_flow_data_find (ufid=0x7f57f7813500) at lib/netdev-offload-dpdk.c:134
> #3  0x0000000001f88693 in netdev_offload_dpdk_flow_del (netdev=0x42260da0, ufid=0x7f57f7813500, stats=0x0) at lib/netdev-offload-dpdk.c:3361
> #4  0x0000000001e6a00d in netdev_flow_del (netdev=0x42260da0, ufid=0x7f57f7813500, stats=0x0) at lib/netdev-offload.c:296

This might mean 2 things:
1. Incorrect usage of RCU protected data inside netdev-offload-dpdk.
2. We're not allowed to quiesce if we have any item in offloading queue,
   i.e. bad RCU usage scheme in dpif-netdev.

Both cases are bad and I'm not sure if we can easily fix case #2.
I'll take a look and try to find a root cause.
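
As a hedged illustration of the hazard behind both cases (hypothetical code,
not the actual OVS sources): a thread must not keep a pointer obtained from an
RCU-protected structure across its own quiescent state, because once every
thread has quiesced, a writer's postponed free may run.

    /* Hypothetical illustration of the RCU hazard; not actual OVS code.
     * Assume a writer publishes entries with ovsrcu_set() and frees the old
     * ones with ovsrcu_postpone(). */
    struct flow_entry {
        ovs_u128 ufid;
        /* ... */
    };

    static OVSRCU_TYPE(struct flow_entry *) entryp;

    static void
    buggy_worker(void)
    {
        struct flow_entry *e = ovsrcu_get(struct flow_entry *, &entryp);

        ovsrcu_quiesce();   /* Declares "this thread holds no RCU-protected
                             * pointers", so a postponed free of the entry
                             * may execute right here...                    */

        /* ...making this a potential use-after-free, similar in shape to
         * the hash_bytes() crash in the backtrace above. */
        uint32_t hash = hash_bytes(&e->ufid, sizeof e->ufid, 0);
        (void) hash;
    }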

> 
> In other trials, I see the ovs-appctl stuck again, but then I cannot attach gdb (or use pstack). It hangs. In this scenario, I see this in dmesg:
> 
> [  404.450694] ovs-vswitchd[8574]: segfault at 7f8bd4e87280 ip 0000000001e463c3 sp 00007f8cfaf33028 error 4 in ovs-vswitchd[400000+200c000]
> 
> Yanqin Wei <Yanqin.Wei@arm.com> suggested to add ovsrcu_try_quiesce call (in the same place). It seems more stable (haven't see such crash as above), but it has the same stuck symptom as above.

ovsrcu_try_quiesce() is the same as ovsrcu_quiesce() except that it fails
if the mutex cannot be taken right now.  There is no real difference.  Your
crashes are less frequent just because the thread enters the quiescent state
less frequently.
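
As a rough sketch of that difference (based on one reading of lib/ovs-rcu.c;
exact return values and internals may differ), with a hypothetical
process_one_item() standing in for the real work:

    /* Behavioral contrast, not the real implementation:
     *   ovsrcu_quiesce()     - always enters a momentary quiescent state;
     *                          may block briefly on an internal mutex.
     *   ovsrcu_try_quiesce() - enters it only if that mutex is free right
     *                          now; otherwise returns nonzero and the
     *                          thread stays non-quiescent. */
    static void process_one_item(void);     /* hypothetical unit of work */

    static void
    worker_loop(void)
    {
        for (;;) {
            process_one_item();
            if (ovsrcu_try_quiesce()) {
                /* Mutex was busy: no quiescent state this iteration, so
                 * RCU grace periods (and postponed frees) are delayed a
                 * bit longer than with an unconditional ovsrcu_quiesce(). */
            }
        }
    }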

Best regards, Ilya Maximets.
Eli Britstein Feb. 26, 2020, 12:05 p.m. UTC | #3
On 2/24/2020 1:06 PM, Ilya Maximets wrote:
> On 2/23/20 3:32 PM, Eli Britstein wrote:
>> On 2/21/2020 4:54 PM, Ilya Maximets wrote:
>>> If the offloading queue is big and filled continuously, offloading
>>> thread may have no chance to quiesce blocking rcu callbacks and
>>> other threads waiting for synchronization.
>>>
>>> Fix that by entering momentary quiescent state after each operation
>>> since we're not holding any rcu-protected memory here.
>>>
>>> Fixes: 02bb2824e51d ("dpif-netdev: do hw flow offload in a thread")
>>> Reported-by: Eli Britstein <elibr@mellanox.com>
>>> Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2020-February/049768.html
>>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
>>> ---
>>>    lib/dpif-netdev.c | 1 +
>>>    1 file changed, 1 insertion(+)
>>>
>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>> index d393aab5e..a798db45d 100644
>>> --- a/lib/dpif-netdev.c
>>> +++ b/lib/dpif-netdev.c
>>> @@ -2512,6 +2512,7 @@ dp_netdev_flow_offload_main(void *data OVS_UNUSED)
>>>            VLOG_DBG("%s to %s netdev flow\n",
>>>                     ret == 0 ? "succeed" : "failed", op);
>>>            dp_netdev_free_flow_offload(offload);
>>> +        ovsrcu_quiesce();
>> This seems to solve the issue of responsiveness, but I have encountered a crash using it, while there is a lot of flow deletions.
>>
>> #0  0x0000000001e462dc in get_unaligned_u32 (p=0x7f57f7813500) at lib/unaligned.h:86
>> #1  0x0000000001e46388 in hash_bytes (p_=0x7f57f7813500, n=16, basis=0) at lib/hash.c:38
>> #2  0x0000000001f7f836 in ufid_to_rte_flow_data_find (ufid=0x7f57f7813500) at lib/netdev-offload-dpdk.c:134
>> #3  0x0000000001f88693 in netdev_offload_dpdk_flow_del (netdev=0x42260da0, ufid=0x7f57f7813500, stats=0x0) at lib/netdev-offload-dpdk.c:3361
>> #4  0x0000000001e6a00d in netdev_flow_del (netdev=0x42260da0, ufid=0x7f57f7813500, stats=0x0) at lib/netdev-offload.c:296
> This might mean 2 things:
> 1. Incorrect usage of RCU protected data inside netdev-offload-dpdk.
> 2. We're not allowed to quiesce if we have any item in offloading queue,
>     i.e. bad RCU usage scheme in dpif-netdev.
>
> Both cases are bad and I'm not sure if we can easily fix case #2.
> I'll take a look and try to find a root cause.

It was my bad. Sorry. This is another issue that exists only in my branch 
(CT offloads), so it's not related. I was able to fix it.

However, I wanted to reproduce with the master branch, but then I cannot 
create a lot of flows, as I get "upcall: datapath flow limit reached".

I tried setting a large number using "ovs-appctl upcall/set-flow-limit 
400000", but it didn't help.

>
>> In other trials, I see the ovs-appctl stuck again, but then I cannot attach gdb (or use pstack). It hangs. In this scenario, I see this in dmesg:
>>
>> [  404.450694] ovs-vswitchd[8574]: segfault at 7f8bd4e87280 ip 0000000001e463c3 sp 00007f8cfaf33028 error 4 in ovs-vswitchd[400000+200c000]
>>
>> Yanqin Wei <Yanqin.Wei@arm.com> suggested to add ovsrcu_try_quiesce call (in the same place). It seems more stable (haven't see such crash as above), but it has the same stuck symptom as above.
> ovsrcu_try_quiesce() is the same as ovsrcu_quiesce() except that is fails
> if mutex can not be taken right now.  There is no real difference.  Your
> crashes are less frequent just because thread enters quiescent state less
> frequently.
If so, I think "ovsrcu_try_quiesce" is more suitable here. Don't you think?
>
> Best regards, Ilya Maximets.
Ilya Maximets Feb. 26, 2020, 12:31 p.m. UTC | #4
On 2/26/20 1:05 PM, Eli Britstein wrote:
> 
> On 2/24/2020 1:06 PM, Ilya Maximets wrote:
>> On 2/23/20 3:32 PM, Eli Britstein wrote:
>>> On 2/21/2020 4:54 PM, Ilya Maximets wrote:
>>>> If the offloading queue is big and filled continuously, offloading
>>>> thread may have no chance to quiesce blocking rcu callbacks and
>>>> other threads waiting for synchronization.
>>>>
>>>> Fix that by entering momentary quiescent state after each operation
>>>> since we're not holding any rcu-protected memory here.
>>>>
>>>> Fixes: 02bb2824e51d ("dpif-netdev: do hw flow offload in a thread")
>>>> Reported-by: Eli Britstein <elibr@mellanox.com>
>>>> Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2020-February/049768.html
>>>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
>>>> ---
>>>>    lib/dpif-netdev.c | 1 +
>>>>    1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>>> index d393aab5e..a798db45d 100644
>>>> --- a/lib/dpif-netdev.c
>>>> +++ b/lib/dpif-netdev.c
>>>> @@ -2512,6 +2512,7 @@ dp_netdev_flow_offload_main(void *data OVS_UNUSED)
>>>>            VLOG_DBG("%s to %s netdev flow\n",
>>>>                     ret == 0 ? "succeed" : "failed", op);
>>>>            dp_netdev_free_flow_offload(offload);
>>>> +        ovsrcu_quiesce();
>>> This seems to solve the issue of responsiveness, but I have encountered a crash using it, while there is a lot of flow deletions.
>>>
>>> #0  0x0000000001e462dc in get_unaligned_u32 (p=0x7f57f7813500) at lib/unaligned.h:86
>>> #1  0x0000000001e46388 in hash_bytes (p_=0x7f57f7813500, n=16, basis=0) at lib/hash.c:38
>>> #2  0x0000000001f7f836 in ufid_to_rte_flow_data_find (ufid=0x7f57f7813500) at lib/netdev-offload-dpdk.c:134
>>> #3  0x0000000001f88693 in netdev_offload_dpdk_flow_del (netdev=0x42260da0, ufid=0x7f57f7813500, stats=0x0) at lib/netdev-offload-dpdk.c:3361
>>> #4  0x0000000001e6a00d in netdev_flow_del (netdev=0x42260da0, ufid=0x7f57f7813500, stats=0x0) at lib/netdev-offload.c:296
>> This might mean 2 things:
>> 1. Incorrect usage of RCU protected data inside netdev-offload-dpdk.
>> 2. We're not allowed to quiesce if we have any item in offloading queue,
>>     i.e. bad RCU usage scheme in dpif-netdev.
>>
>> Both cases are bad and I'm not sure if we can easily fix case #2.
>> I'll take a look and try to find a root cause.
> 
> It was my bad. Sorry. This is another issue, that exists in my branch only (CT offloads), so it's not related. I was able to fix it.

OK.  Good to know.

> 
> However, I wanted to reproduce with master branch, but then I cannot create a lot of flows as I get: "upcall: datapath flow limit reached".
> 
> I tried setting large number using "ovs-appctl upcall/set-flow-limit 400000", but it didn't help.

The number of flows in dpif-netdev is artificially limited by the MAX_FLOWS (65536)
value.  There was a recent discussion about this limit and I think we could
just remove it.  I have a small patch for this and will send it soon.
For now, you can just raise the hardcoded value of MAX_FLOWS in dpif-netdev.c.

> 
>>
>>> In other trials, I see the ovs-appctl stuck again, but then I cannot attach gdb (or use pstack). It hangs. In this scenario, I see this in dmesg:
>>>
>>> [  404.450694] ovs-vswitchd[8574]: segfault at 7f8bd4e87280 ip 0000000001e463c3 sp 00007f8cfaf33028 error 4 in ovs-vswitchd[400000+200c000]
>>>
>>> Yanqin Wei <Yanqin.Wei@arm.com> suggested to add ovsrcu_try_quiesce call (in the same place). It seems more stable (haven't see such crash as above), but it has the same stuck symptom as above.
>> ovsrcu_try_quiesce() is the same as ovsrcu_quiesce() except that is fails
>> if mutex can not be taken right now.  There is no real difference.  Your
>> crashes are less frequent just because thread enters quiescent state less
>> frequently.
> If so, I think "ovsrcu_try_quiesce" is more suitable here. Don't you think?

I don't think that taking one more mutex could do any harm in the
offloading thread.  If we want to care about minor sleeping
spikes, we should start by replacing the dp_flow_offload.mutex
and the sleeping on the condition variable.
Eli Britstein Feb. 27, 2020, 9 a.m. UTC | #5
On 2/26/2020 2:31 PM, Ilya Maximets wrote:
> On 2/26/20 1:05 PM, Eli Britstein wrote:
>> On 2/24/2020 1:06 PM, Ilya Maximets wrote:
>>> On 2/23/20 3:32 PM, Eli Britstein wrote:
>>>> On 2/21/2020 4:54 PM, Ilya Maximets wrote:
>>>>> If the offloading queue is big and filled continuously, offloading
>>>>> thread may have no chance to quiesce blocking rcu callbacks and
>>>>> other threads waiting for synchronization.
>>>>>
>>>>> Fix that by entering momentary quiescent state after each operation
>>>>> since we're not holding any rcu-protected memory here.
>>>>>
>>>>> Fixes: 02bb2824e51d ("dpif-netdev: do hw flow offload in a thread")
>>>>> Reported-by: Eli Britstein <elibr@mellanox.com>
>>>>> Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2020-February/049768.html
>>>>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
>>>>> ---
>>>>>     lib/dpif-netdev.c | 1 +
>>>>>     1 file changed, 1 insertion(+)
>>>>>
>>>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>>>> index d393aab5e..a798db45d 100644
>>>>> --- a/lib/dpif-netdev.c
>>>>> +++ b/lib/dpif-netdev.c
>>>>> @@ -2512,6 +2512,7 @@ dp_netdev_flow_offload_main(void *data OVS_UNUSED)
>>>>>             VLOG_DBG("%s to %s netdev flow\n",
>>>>>                      ret == 0 ? "succeed" : "failed", op);
>>>>>             dp_netdev_free_flow_offload(offload);
>>>>> +        ovsrcu_quiesce();
>>>> This seems to solve the issue of responsiveness, but I have encountered a crash using it, while there is a lot of flow deletions.
>>>>
>>>> #0  0x0000000001e462dc in get_unaligned_u32 (p=0x7f57f7813500) at lib/unaligned.h:86
>>>> #1  0x0000000001e46388 in hash_bytes (p_=0x7f57f7813500, n=16, basis=0) at lib/hash.c:38
>>>> #2  0x0000000001f7f836 in ufid_to_rte_flow_data_find (ufid=0x7f57f7813500) at lib/netdev-offload-dpdk.c:134
>>>> #3  0x0000000001f88693 in netdev_offload_dpdk_flow_del (netdev=0x42260da0, ufid=0x7f57f7813500, stats=0x0) at lib/netdev-offload-dpdk.c:3361
>>>> #4  0x0000000001e6a00d in netdev_flow_del (netdev=0x42260da0, ufid=0x7f57f7813500, stats=0x0) at lib/netdev-offload.c:296
>>> This might mean 2 things:
>>> 1. Incorrect usage of RCU protected data inside netdev-offload-dpdk.
>>> 2. We're not allowed to quiesce if we have any item in offloading queue,
>>>      i.e. bad RCU usage scheme in dpif-netdev.
>>>
>>> Both cases are bad and I'm not sure if we can easily fix case #2.
>>> I'll take a look and try to find a root cause.
>> It was my bad. Sorry. This is another issue, that exists in my branch only (CT offloads), so it's not related. I was able to fix it.
> OK.  Good to know.
>
>> However, I wanted to reproduce with master branch, but then I cannot create a lot of flows as I get: "upcall: datapath flow limit reached".
>>
>> I tried setting large number using "ovs-appctl upcall/set-flow-limit 400000", but it didn't help.
> Number of flows in dpif-netdev is artificially limited by MAX_FLOWS (65536)
> value.  There was a recent discussion about this limit and I think we could
> just remove it.  I have a small patch for this, will send soon.
> For now, you can just raise the hardcoded value of MAX_FLOWS in dpif-netdev.c.

The issue is not in dpif-netdev.c. It's in 
ofproto/ofproto-dpif-upcall.c. If I'm not wrong, when it happens, there 
is no attempt to install the flow, so MAX_FLOWS in dpif-netdev.c is not 
the blocker anyway.

BTW, my setup is the PF and VF rep in a bridge. I run testpmd with 
macswap on the VF, and send 10k different src MAC packets (the OpenFlow 
configuration is NORMAL). So, I expect a total of 20k flows, which is 
still less than the 64k MAX_FLOWS.

>
>>>> In other trials, I see the ovs-appctl stuck again, but then I cannot attach gdb (or use pstack). It hangs. In this scenario, I see this in dmesg:
>>>>
>>>> [  404.450694] ovs-vswitchd[8574]: segfault at 7f8bd4e87280 ip 0000000001e463c3 sp 00007f8cfaf33028 error 4 in ovs-vswitchd[400000+200c000]
>>>>
>>>> Yanqin Wei <Yanqin.Wei@arm.com> suggested to add ovsrcu_try_quiesce call (in the same place). It seems more stable (haven't see such crash as above), but it has the same stuck symptom as above.
>>> ovsrcu_try_quiesce() is the same as ovsrcu_quiesce() except that is fails
>>> if mutex can not be taken right now.  There is no real difference.  Your
>>> crashes are less frequent just because thread enters quiescent state less
>>> frequently.
>> If so, I think "ovsrcu_try_quiesce" is more suitable here. Don't you think?
> I don't think that taking of one more mutex could make harm in
> offloading thread.  If we'll want to care about minor sleeping
> spikes, we should start from replacing the dp_flow_offload.mutex
> and sleeping on the conditional variable.

I agree it's not critical. However, if it's not mandatory, I think it's 
better to avoid it.

In addition, I'm thinking of making it multi-threaded (a thread per port) to 
boost insertion/deletion rates.
Ilya Maximets Feb. 28, 2020, 9:03 a.m. UTC | #6
On 2/27/20 10:00 AM, Eli Britstein wrote:
> 
> On 2/26/2020 2:31 PM, Ilya Maximets wrote:
>> On 2/26/20 1:05 PM, Eli Britstein wrote:
>>> On 2/24/2020 1:06 PM, Ilya Maximets wrote:
>>>> On 2/23/20 3:32 PM, Eli Britstein wrote:
>>>>> On 2/21/2020 4:54 PM, Ilya Maximets wrote:
>>>>>> If the offloading queue is big and filled continuously, offloading
>>>>>> thread may have no chance to quiesce blocking rcu callbacks and
>>>>>> other threads waiting for synchronization.
>>>>>>
>>>>>> Fix that by entering momentary quiescent state after each operation
>>>>>> since we're not holding any rcu-protected memory here.
>>>>>>
>>>>>> Fixes: 02bb2824e51d ("dpif-netdev: do hw flow offload in a thread")
>>>>>> Reported-by: Eli Britstein <elibr@mellanox.com>
>>>>>> Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2020-February/049768.html
>>>>>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
>>>>>> ---
>>>>>>     lib/dpif-netdev.c | 1 +
>>>>>>     1 file changed, 1 insertion(+)
>>>>>>
>>>>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>>>>> index d393aab5e..a798db45d 100644
>>>>>> --- a/lib/dpif-netdev.c
>>>>>> +++ b/lib/dpif-netdev.c
>>>>>> @@ -2512,6 +2512,7 @@ dp_netdev_flow_offload_main(void *data OVS_UNUSED)
>>>>>>             VLOG_DBG("%s to %s netdev flow\n",
>>>>>>                      ret == 0 ? "succeed" : "failed", op);
>>>>>>             dp_netdev_free_flow_offload(offload);
>>>>>> +        ovsrcu_quiesce();
>>>>> This seems to solve the issue of responsiveness, but I have encountered a crash using it, while there is a lot of flow deletions.
>>>>>
>>>>> #0  0x0000000001e462dc in get_unaligned_u32 (p=0x7f57f7813500) at lib/unaligned.h:86
>>>>> #1  0x0000000001e46388 in hash_bytes (p_=0x7f57f7813500, n=16, basis=0) at lib/hash.c:38
>>>>> #2  0x0000000001f7f836 in ufid_to_rte_flow_data_find (ufid=0x7f57f7813500) at lib/netdev-offload-dpdk.c:134
>>>>> #3  0x0000000001f88693 in netdev_offload_dpdk_flow_del (netdev=0x42260da0, ufid=0x7f57f7813500, stats=0x0) at lib/netdev-offload-dpdk.c:3361
>>>>> #4  0x0000000001e6a00d in netdev_flow_del (netdev=0x42260da0, ufid=0x7f57f7813500, stats=0x0) at lib/netdev-offload.c:296
>>>> This might mean 2 things:
>>>> 1. Incorrect usage of RCU protected data inside netdev-offload-dpdk.
>>>> 2. We're not allowed to quiesce if we have any item in offloading queue,
>>>>      i.e. bad RCU usage scheme in dpif-netdev.
>>>>
>>>> Both cases are bad and I'm not sure if we can easily fix case #2.
>>>> I'll take a look and try to find a root cause.
>>> It was my bad. Sorry. This is another issue, that exists in my branch only (CT offloads), so it's not related. I was able to fix it.
>> OK.  Good to know.
>>
>>> However, I wanted to reproduce with master branch, but then I cannot create a lot of flows as I get: "upcall: datapath flow limit reached".
>>>
>>> I tried setting large number using "ovs-appctl upcall/set-flow-limit 400000", but it didn't help.
>> Number of flows in dpif-netdev is artificially limited by MAX_FLOWS (65536)
>> value.  There was a recent discussion about this limit and I think we could
>> just remove it.  I have a small patch for this, will send soon.
>> For now, you can just raise the hardcoded value of MAX_FLOWS in dpif-netdev.c.
> 
> The issue is not in dpif-netdev.c. It's in ofproto/ofproto-dpif-upcall.c. If I'm not wrong, when it happens, there is not attempt to install the flow, so anyway MAX_FLOWS in dpif-netdev.c is not the blocker.

There are a couple of things that control the maximum number of datapath flows.
The first is the other_config:flow-limit that defaults to 200000.  The second is
the datapath-internal MAX_FLOWS define.  And also, "ovs-appctl upcall/set-flow-limit"
should be able to modify this value.  Sounds strange.

> 
> BTW, my setup is the PF and VF rep in a bridge. I run testpmd with macswap on the VF, and send 10k different src MAC packets (OpenFlow configuration is NORMAL). So, I expect total 20k flows, also less than 64k of MAX_FLOWS.

Are you sure that the packets are not corrupted?  Could you dump the datapath flows
and verify that these flows are what you expected?

> 
>>
>>>>> In other trials, I see the ovs-appctl stuck again, but then I cannot attach gdb (or use pstack). It hangs. In this scenario, I see this in dmesg:
>>>>>
>>>>> [  404.450694] ovs-vswitchd[8574]: segfault at 7f8bd4e87280 ip 0000000001e463c3 sp 00007f8cfaf33028 error 4 in ovs-vswitchd[400000+200c000]
>>>>>
>>>>> Yanqin Wei <Yanqin.Wei@arm.com> suggested to add ovsrcu_try_quiesce call (in the same place). It seems more stable (haven't see such crash as above), but it has the same stuck symptom as above.
>>>> ovsrcu_try_quiesce() is the same as ovsrcu_quiesce() except that is fails
>>>> if mutex can not be taken right now.  There is no real difference.  Your
>>>> crashes are less frequent just because thread enters quiescent state less
>>>> frequently.
>>> If so, I think "ovsrcu_try_quiesce" is more suitable here. Don't you think?
>> I don't think that taking of one more mutex could make harm in
>> offloading thread.  If we'll want to care about minor sleeping
>> spikes, we should start from replacing the dp_flow_offload.mutex
>> and sleeping on the conditional variable.
> 
> Agree it's not critical. However, if it's not mandatory, I think it's better to avoid it.

Since it's not critical, I think it's better not to block RCU callbacks
and other threads that wait for RCU synchronization.

> 
> In addition I think of making it multi-threaded (thread per port) to boost insertion/deletion rates.
> 

This will not give any performance gain unless you've found a good way to
avoid taking the global dp->port_mutex.  Also, "per port" seems too much.
Eli Britstein March 3, 2020, 2:44 p.m. UTC | #7
On 2/28/2020 11:03 AM, Ilya Maximets wrote:
> On 2/27/20 10:00 AM, Eli Britstein wrote:
>> On 2/26/2020 2:31 PM, Ilya Maximets wrote:
>>> On 2/26/20 1:05 PM, Eli Britstein wrote:
>>>> On 2/24/2020 1:06 PM, Ilya Maximets wrote:
>>>>> On 2/23/20 3:32 PM, Eli Britstein wrote:
>>>>>> On 2/21/2020 4:54 PM, Ilya Maximets wrote:
>>>>>>> If the offloading queue is big and filled continuously, offloading
>>>>>>> thread may have no chance to quiesce blocking rcu callbacks and
>>>>>>> other threads waiting for synchronization.
>>>>>>>
>>>>>>> Fix that by entering momentary quiescent state after each operation
>>>>>>> since we're not holding any rcu-protected memory here.
>>>>>>>
>>>>>>> Fixes: 02bb2824e51d ("dpif-netdev: do hw flow offload in a thread")
>>>>>>> Reported-by: Eli Britstein <elibr@mellanox.com>
>>>>>>> Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2020-February/049768.html
>>>>>>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
>>>>>>> ---
>>>>>>>      lib/dpif-netdev.c | 1 +
>>>>>>>      1 file changed, 1 insertion(+)
>>>>>>>
>>>>>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>>>>>> index d393aab5e..a798db45d 100644
>>>>>>> --- a/lib/dpif-netdev.c
>>>>>>> +++ b/lib/dpif-netdev.c
>>>>>>> @@ -2512,6 +2512,7 @@ dp_netdev_flow_offload_main(void *data OVS_UNUSED)
>>>>>>>              VLOG_DBG("%s to %s netdev flow\n",
>>>>>>>                       ret == 0 ? "succeed" : "failed", op);
>>>>>>>              dp_netdev_free_flow_offload(offload);
>>>>>>> +        ovsrcu_quiesce();
>>>>>> This seems to solve the issue of responsiveness, but I have encountered a crash using it, while there is a lot of flow deletions.
>>>>>>
>>>>>> #0  0x0000000001e462dc in get_unaligned_u32 (p=0x7f57f7813500) at lib/unaligned.h:86
>>>>>> #1  0x0000000001e46388 in hash_bytes (p_=0x7f57f7813500, n=16, basis=0) at lib/hash.c:38
>>>>>> #2  0x0000000001f7f836 in ufid_to_rte_flow_data_find (ufid=0x7f57f7813500) at lib/netdev-offload-dpdk.c:134
>>>>>> #3  0x0000000001f88693 in netdev_offload_dpdk_flow_del (netdev=0x42260da0, ufid=0x7f57f7813500, stats=0x0) at lib/netdev-offload-dpdk.c:3361
>>>>>> #4  0x0000000001e6a00d in netdev_flow_del (netdev=0x42260da0, ufid=0x7f57f7813500, stats=0x0) at lib/netdev-offload.c:296
>>>>> This might mean 2 things:
>>>>> 1. Incorrect usage of RCU protected data inside netdev-offload-dpdk.
>>>>> 2. We're not allowed to quiesce if we have any item in offloading queue,
>>>>>       i.e. bad RCU usage scheme in dpif-netdev.
>>>>>
>>>>> Both cases are bad and I'm not sure if we can easily fix case #2.
>>>>> I'll take a look and try to find a root cause.
>>>> It was my bad. Sorry. This is another issue, that exists in my branch only (CT offloads), so it's not related. I was able to fix it.
>>> OK.  Good to know.
>>>
>>>> However, I wanted to reproduce with master branch, but then I cannot create a lot of flows as I get: "upcall: datapath flow limit reached".
>>>>
>>>> I tried setting large number using "ovs-appctl upcall/set-flow-limit 400000", but it didn't help.
>>> Number of flows in dpif-netdev is artificially limited by MAX_FLOWS (65536)
>>> value.  There was a recent discussion about this limit and I think we could
>>> just remove it.  I have a small patch for this, will send soon.
>>> For now, you can just raise the hardcoded value of MAX_FLOWS in dpif-netdev.c.
>> The issue is not in dpif-netdev.c. It's in ofproto/ofproto-dpif-upcall.c. If I'm not wrong, when it happens, there is not attempt to install the flow, so anyway MAX_FLOWS in dpif-netdev.c is not the blocker.
> There are couple of things that controls the maximum number of datapath flows.
> First is the other_config:flow-limit that defaults to 200000.  Second is the
> datapath internal MAX_FLOWS define.  And also, "ovs-appctl upcall/set-flow-limit"
> should be able to modify this value.  Sounds strange.
>
>> BTW, my setup is the PF and VF rep in a bridge. I run testpmd with macswap on the VF, and send 10k different src MAC packets (OpenFlow configuration is NORMAL). So, I expect total 20k flows, also less than 64k of MAX_FLOWS.
> Are you sure that packets are not corrupted?  Could you dump datapath flows
> and verify that these flows are what you've expected?

This is an example for 4 different MACs 
(e4:10:00:00:00:01-e4:10:00:00:00:04). As expected:

$ ovs-appctl dpctl/dump-flows -m
flow-dump from pmd on cpu core: 8
ufid:65aae19f-06a9-4887-8d2d-60b04187f7cd, 
skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(vm1),packet_type(ns=0,id=0),eth(src=e4:11:00:00:00:01,dst=e4:10:00:00:00:01),eth_type(0x0800),ipv4(src=1.1.1.1/0.0.0.0,dst=1.1.1.2/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=1025/0,dst=1025/0), 
packets:11316766, bytes:1244844260, used:0.013s, offloaded:yes, dp:dpdk, 
actions:pf, dp-extra-info:miniflow_bits(5,1)
ufid:a0e4b9bc-8973-41d1-b2cf-29f2a14935d8, 
skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(pf),packet_type(ns=0,id=0),eth(src=e4:10:00:00:00:03,dst=e4:11:00:00:00:01),eth_type(0x0800),ipv4(src=1.1.1.2/0.0.0.0,dst=1.1.1.1/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=1025/0,dst=1025/0), 
packets:109476803, bytes:12042448330, used:0.013s, offloaded:yes, 
dp:dpdk, actions:vm1, dp-extra-info:miniflow_bits(5,1)
ufid:bbb7fe49-422b-434c-bd70-32d6a90c44c4, 
skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(vm1),packet_type(ns=0,id=0),eth(src=e4:11:00:00:00:01,dst=e4:10:00:00:00:03),eth_type(0x0800),ipv4(src=1.1.1.1/0.0.0.0,dst=1.1.1.2/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=1025/0,dst=1025/0), 
packets:11147160, bytes:1226187600, used:0.013s, offloaded:yes, dp:dpdk, 
actions:pf, dp-extra-info:miniflow_bits(5,1)
ufid:bebe7de1-b2d6-48d6-87aa-3fab025a793d, 
skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(pf),packet_type(ns=0,id=0),eth(src=e4:10:00:00:00:02,dst=e4:11:00:00:00:01),eth_type(0x0800),ipv4(src=1.1.1.2/0.0.0.0,dst=1.1.1.1/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=1025/0,dst=1025/0), 
packets:110643486, bytes:12170783460, used:0.013s, offloaded:yes, 
dp:dpdk, actions:vm1, dp-extra-info:miniflow_bits(5,1)
ufid:24e09efe-f02c-41d5-a7c1-3d0aea28cbe0, 
skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(vm1),packet_type(ns=0,id=0),eth(src=e4:11:00:00:00:01,dst=e4:10:00:00:00:02),eth_type(0x0800),ipv4(src=1.1.1.1/0.0.0.0,dst=1.1.1.2/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=1025/0,dst=1025/0), 
packets:11304574, bytes:1243503140, used:0.013s, offloaded:yes, dp:dpdk, 
actions:pf, dp-extra-info:miniflow_bits(5,1)
ufid:4fba2205-37d8-42a1-8899-57bdbbc1c539, 
skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(pf),packet_type(ns=0,id=0),eth(src=e4:10:00:00:00:01,dst=e4:11:00:00:00:01),eth_type(0x0800),ipv4(src=1.1.1.2/0.0.0.0,dst=1.1.1.1/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=1025/0,dst=1025/0), 
packets:110644871, bytes:12170935810, used:0.013s, offloaded:yes, 
dp:dpdk, actions:vm1, dp-extra-info:miniflow_bits(5,1)
ufid:99848c40-a8cd-4732-b520-30a27010964b, 
skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(pf),packet_type(ns=0,id=0),eth(src=e4:10:00:00:00:04,dst=e4:11:00:00:00:01),eth_type(0x0800),ipv4(src=1.1.1.2/0.0.0.0,dst=1.1.1.1/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=1025/0,dst=1025/0), 
packets:110644198, bytes:12170861780, used:0.013s, offloaded:yes, 
dp:dpdk, actions:vm1, dp-extra-info:miniflow_bits(5,1)
ufid:293e5fa1-cce6-41d2-9b3c-db01105f3cc2, 
skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(vm1),packet_type(ns=0,id=0),eth(src=e4:11:00:00:00:01,dst=e4:10:00:00:00:04),eth_type(0x0800),ipv4(src=1.1.1.1/0.0.0.0,dst=1.1.1.2/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=1025/0,dst=1025/0), 
packets:11327727, bytes:1246049970, used:0.013s, offloaded:yes, dp:dpdk, 
actions:pf, dp-extra-info:miniflow_bits(5,1)

>
>>>>>> In other trials, I see the ovs-appctl stuck again, but then I cannot attach gdb (or use pstack). It hangs. In this scenario, I see this in dmesg:
>>>>>>
>>>>>> [  404.450694] ovs-vswitchd[8574]: segfault at 7f8bd4e87280 ip 0000000001e463c3 sp 00007f8cfaf33028 error 4 in ovs-vswitchd[400000+200c000]
>>>>>>
>>>>>> Yanqin Wei <Yanqin.Wei@arm.com> suggested to add ovsrcu_try_quiesce call (in the same place). It seems more stable (haven't see such crash as above), but it has the same stuck symptom as above.
>>>>> ovsrcu_try_quiesce() is the same as ovsrcu_quiesce() except that is fails
>>>>> if mutex can not be taken right now.  There is no real difference.  Your
>>>>> crashes are less frequent just because thread enters quiescent state less
>>>>> frequently.
>>>> If so, I think "ovsrcu_try_quiesce" is more suitable here. Don't you think?
>>> I don't think that taking of one more mutex could make harm in
>>> offloading thread.  If we'll want to care about minor sleeping
>>> spikes, we should start from replacing the dp_flow_offload.mutex
>>> and sleeping on the conditional variable.
>> Agree it's not critical. However, if it's not mandatory, I think it's better to avoid it.
> Since it's not critical, I think it's better to not block RCU callbacks
> and other threads that waits for RCU synchronization.

I am not clear on this. I thought ovsrcu_try_quiesce was more lightweight 
than ovsrcu_quiesce. Now I see in the code there is indeed a lock there 
(seq_try_lock()).

So, what's the logic behind your previous comment:

"Your crashes are less frequent just because thread enters quiescent state
less frequently."

Is it because of that lock in ovsrcu_try_quiesce that the thread enters the 
quiescent state less frequently? So actually ovsrcu_try_quiesce is heavier 
than ovsrcu_quiesce?

>
>> In addition I think of making it multi-threaded (thread per port) to boost insertion/deletion rates.
>>
> This will not give any performance unless you've found a good way how to
> avoid taking of global dp->port_mutex.  Also, "per port" seems too much.
Of course. We need to find a way not to have a global lock. Regarding 
"per-port", maybe not a thread per port, but a per-port request queue. The 
number of threads could maybe be configurable, each handling several queues.
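
A minimal sketch of that idea, with hypothetical names and without the
locking details worked out: requests are hashed by port into one of N queues,
and a configurable pool of worker threads each drains a fixed subset of them.

    /* Purely illustrative sketch of the per-port queue idea; all names here
     * are hypothetical and locking details are not worked out. */
    #define N_OFFLOAD_QUEUES 64          /* assumption: fixed queue count */

    struct offload_queue {
        struct ovs_mutex mutex;          /* Protects 'list' only. */
        struct ovs_list list;            /* Queued offload requests. */
    };

    static struct offload_queue offload_queues[N_OFFLOAD_QUEUES];

    /* All requests of one port land in one queue, so per-port ordering is
     * preserved while different ports no longer share a single lock. */
    static struct offload_queue *
    offload_queue_for_port(uint32_t port_no)
    {
        return &offload_queues[port_no % N_OFFLOAD_QUEUES];
    }

    /* A configurable pool of worker threads; worker 'tid' out of 'n_threads'
     * drains queues tid, tid + n_threads, tid + 2 * n_threads, ..., calling
     * ovsrcu_quiesce() between processed items as in this patch. */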
Eli Britstein March 4, 2020, 3:51 p.m. UTC | #8
On 3/3/2020 4:44 PM, Eli Britstein wrote:
>
> On 2/28/2020 11:03 AM, Ilya Maximets wrote:
>> On 2/27/20 10:00 AM, Eli Britstein wrote:
>>> On 2/26/2020 2:31 PM, Ilya Maximets wrote:
>>>> On 2/26/20 1:05 PM, Eli Britstein wrote:
>>>>> On 2/24/2020 1:06 PM, Ilya Maximets wrote:
>>>>>> On 2/23/20 3:32 PM, Eli Britstein wrote:
>>>>>>> On 2/21/2020 4:54 PM, Ilya Maximets wrote:
>>>>>>>> If the offloading queue is big and filled continuously, offloading
>>>>>>>> thread may have no chance to quiesce blocking rcu callbacks and
>>>>>>>> other threads waiting for synchronization.
>>>>>>>>
>>>>>>>> Fix that by entering momentary quiescent state after each 
>>>>>>>> operation
>>>>>>>> since we're not holding any rcu-protected memory here.
>>>>>>>>
>>>>>>>> Fixes: 02bb2824e51d ("dpif-netdev: do hw flow offload in a 
>>>>>>>> thread")
>>>>>>>> Reported-by: Eli Britstein <elibr@mellanox.com>
>>>>>>>> Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2020-February/049768.html
>>>>>>>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
>>>>>>>> ---
>>>>>>>>      lib/dpif-netdev.c | 1 +
>>>>>>>>      1 file changed, 1 insertion(+)
>>>>>>>>
>>>>>>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>>>>>>> index d393aab5e..a798db45d 100644
>>>>>>>> --- a/lib/dpif-netdev.c
>>>>>>>> +++ b/lib/dpif-netdev.c
>>>>>>>> @@ -2512,6 +2512,7 @@ dp_netdev_flow_offload_main(void *data 
>>>>>>>> OVS_UNUSED)
>>>>>>>>              VLOG_DBG("%s to %s netdev flow\n",
>>>>>>>>                       ret == 0 ? "succeed" : "failed", op);
>>>>>>>>              dp_netdev_free_flow_offload(offload);
>>>>>>>> +        ovsrcu_quiesce();
Acked-by: Eli Britstein <elibr@mellanox.com>
>>>>>>> This seems to solve the issue of responsiveness, but I have 
>>>>>>> encountered a crash using it, while there is a lot of flow 
>>>>>>> deletions.
>>>>>>>
>>>>>>> #0  0x0000000001e462dc in get_unaligned_u32 (p=0x7f57f7813500) 
>>>>>>> at lib/unaligned.h:86
>>>>>>> #1  0x0000000001e46388 in hash_bytes (p_=0x7f57f7813500, n=16, 
>>>>>>> basis=0) at lib/hash.c:38
>>>>>>> #2  0x0000000001f7f836 in ufid_to_rte_flow_data_find 
>>>>>>> (ufid=0x7f57f7813500) at lib/netdev-offload-dpdk.c:134
>>>>>>> #3  0x0000000001f88693 in netdev_offload_dpdk_flow_del 
>>>>>>> (netdev=0x42260da0, ufid=0x7f57f7813500, stats=0x0) at 
>>>>>>> lib/netdev-offload-dpdk.c:3361
>>>>>>> #4  0x0000000001e6a00d in netdev_flow_del (netdev=0x42260da0, 
>>>>>>> ufid=0x7f57f7813500, stats=0x0) at lib/netdev-offload.c:296
>>>>>> This might mean 2 things:
>>>>>> 1. Incorrect usage of RCU protected data inside netdev-offload-dpdk.
>>>>>> 2. We're not allowed to quiesce if we have any item in offloading 
>>>>>> queue,
>>>>>>       i.e. bad RCU usage scheme in dpif-netdev.
>>>>>>
>>>>>> Both cases are bad and I'm not sure if we can easily fix case #2.
>>>>>> I'll take a look and try to find a root cause.
>>>>> It was my bad. Sorry. This is another issue, that exists in my 
>>>>> branch only (CT offloads), so it's not related. I was able to fix it.
>>>> OK.  Good to know.
>>>>
>>>>> However, I wanted to reproduce with master branch, but then I 
>>>>> cannot create a lot of flows as I get: "upcall: datapath flow 
>>>>> limit reached".
>>>>>
>>>>> I tried setting large number using "ovs-appctl 
>>>>> upcall/set-flow-limit 400000", but it didn't help.
>>>> Number of flows in dpif-netdev is artificially limited by MAX_FLOWS 
>>>> (65536)
>>>> value.  There was a recent discussion about this limit and I think 
>>>> we could
>>>> just remove it.  I have a small patch for this, will send soon.
>>>> For now, you can just raise the hardcoded value of MAX_FLOWS in 
>>>> dpif-netdev.c.
>>> The issue is not in dpif-netdev.c. It's in 
>>> ofproto/ofproto-dpif-upcall.c. If I'm not wrong, when it happens, 
>>> there is not attempt to install the flow, so anyway MAX_FLOWS in 
>>> dpif-netdev.c is not the blocker.
>> There are couple of things that controls the maximum number of 
>> datapath flows.
>> First is the other_config:flow-limit that defaults to 200000. Second 
>> is the
>> datapath internal MAX_FLOWS define.  And also, "ovs-appctl 
>> upcall/set-flow-limit"
>> should be able to modify this value.  Sounds strange.
>>
>>> BTW, my setup is the PF and VF rep in a bridge. I run testpmd with 
>>> macswap on the VF, and send 10k different src MAC packets (OpenFlow 
>>> configuration is NORMAL). So, I expect total 20k flows, also less 
>>> than 64k of MAX_FLOWS.
>> Are you sure that packets are not corrupted?  Could you dump datapath 
>> flows
>> and verify that these flows are what you've expected?
>
> This is for example for 4 different MACs 
> (e4:10:00:00:00:01-e4:10:00:00:00:04). As expected:
>
> $ ovs-appctl dpctl/dump-flows -m
> flow-dump from pmd on cpu core: 8
> ufid:65aae19f-06a9-4887-8d2d-60b04187f7cd, 
> skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(vm1),packet_type(ns=0,id=0),eth(src=e4:11:00:00:00:01,dst=e4:10:00:00:00:01),eth_type(0x0800),ipv4(src=1.1.1.1/0.0.0.0,dst=1.1.1.2/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=1025/0,dst=1025/0), 
> packets:11316766, bytes:1244844260, used:0.013s, offloaded:yes, 
> dp:dpdk, actions:pf, dp-extra-info:miniflow_bits(5,1)
> ufid:a0e4b9bc-8973-41d1-b2cf-29f2a14935d8, 
> skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(pf),packet_type(ns=0,id=0),eth(src=e4:10:00:00:00:03,dst=e4:11:00:00:00:01),eth_type(0x0800),ipv4(src=1.1.1.2/0.0.0.0,dst=1.1.1.1/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=1025/0,dst=1025/0), 
> packets:109476803, bytes:12042448330, used:0.013s, offloaded:yes, 
> dp:dpdk, actions:vm1, dp-extra-info:miniflow_bits(5,1)
> ufid:bbb7fe49-422b-434c-bd70-32d6a90c44c4, 
> skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(vm1),packet_type(ns=0,id=0),eth(src=e4:11:00:00:00:01,dst=e4:10:00:00:00:03),eth_type(0x0800),ipv4(src=1.1.1.1/0.0.0.0,dst=1.1.1.2/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=1025/0,dst=1025/0), 
> packets:11147160, bytes:1226187600, used:0.013s, offloaded:yes, 
> dp:dpdk, actions:pf, dp-extra-info:miniflow_bits(5,1)
> ufid:bebe7de1-b2d6-48d6-87aa-3fab025a793d, 
> skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(pf),packet_type(ns=0,id=0),eth(src=e4:10:00:00:00:02,dst=e4:11:00:00:00:01),eth_type(0x0800),ipv4(src=1.1.1.2/0.0.0.0,dst=1.1.1.1/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=1025/0,dst=1025/0), 
> packets:110643486, bytes:12170783460, used:0.013s, offloaded:yes, 
> dp:dpdk, actions:vm1, dp-extra-info:miniflow_bits(5,1)
> ufid:24e09efe-f02c-41d5-a7c1-3d0aea28cbe0, 
> skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(vm1),packet_type(ns=0,id=0),eth(src=e4:11:00:00:00:01,dst=e4:10:00:00:00:02),eth_type(0x0800),ipv4(src=1.1.1.1/0.0.0.0,dst=1.1.1.2/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=1025/0,dst=1025/0), 
> packets:11304574, bytes:1243503140, used:0.013s, offloaded:yes, 
> dp:dpdk, actions:pf, dp-extra-info:miniflow_bits(5,1)
> ufid:4fba2205-37d8-42a1-8899-57bdbbc1c539, 
> skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(pf),packet_type(ns=0,id=0),eth(src=e4:10:00:00:00:01,dst=e4:11:00:00:00:01),eth_type(0x0800),ipv4(src=1.1.1.2/0.0.0.0,dst=1.1.1.1/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=1025/0,dst=1025/0), 
> packets:110644871, bytes:12170935810, used:0.013s, offloaded:yes, 
> dp:dpdk, actions:vm1, dp-extra-info:miniflow_bits(5,1)
> ufid:99848c40-a8cd-4732-b520-30a27010964b, 
> skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(pf),packet_type(ns=0,id=0),eth(src=e4:10:00:00:00:04,dst=e4:11:00:00:00:01),eth_type(0x0800),ipv4(src=1.1.1.2/0.0.0.0,dst=1.1.1.1/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=1025/0,dst=1025/0), 
> packets:110644198, bytes:12170861780, used:0.013s, offloaded:yes, 
> dp:dpdk, actions:vm1, dp-extra-info:miniflow_bits(5,1)
> ufid:293e5fa1-cce6-41d2-9b3c-db01105f3cc2, 
> skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(vm1),packet_type(ns=0,id=0),eth(src=e4:11:00:00:00:01,dst=e4:10:00:00:00:04),eth_type(0x0800),ipv4(src=1.1.1.1/0.0.0.0,dst=1.1.1.2/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=1025/0,dst=1025/0), 
> packets:11327727, bytes:1246049970, used:0.013s, offloaded:yes, 
> dp:dpdk, actions:pf, dp-extra-info:miniflow_bits(5,1)

This issue is resolved by enlarging the MAC learning table size:

ovs-vsctl set bridge br-int other_config:mac-table-size=200000

>
>>
>>>>>>> In other trials, I see the ovs-appctl stuck again, but then I 
>>>>>>> cannot attach gdb (or use pstack). It hangs. In this scenario, I 
>>>>>>> see this in dmesg:
>>>>>>>
>>>>>>> [  404.450694] ovs-vswitchd[8574]: segfault at 7f8bd4e87280 ip 
>>>>>>> 0000000001e463c3 sp 00007f8cfaf33028 error 4 in 
>>>>>>> ovs-vswitchd[400000+200c000]
>>>>>>>
>>>>>>> Yanqin Wei <Yanqin.Wei@arm.com> suggested to add 
>>>>>>> ovsrcu_try_quiesce call (in the same place). It seems more 
>>>>>>> stable (haven't see such crash as above), but it has the same 
>>>>>>> stuck symptom as above.
>>>>>> ovsrcu_try_quiesce() is the same as ovsrcu_quiesce() except that 
>>>>>> is fails
>>>>>> if mutex can not be taken right now.  There is no real 
>>>>>> difference.  Your
>>>>>> crashes are less frequent just because thread enters quiescent 
>>>>>> state less
>>>>>> frequently.
>>>>> If so, I think "ovsrcu_try_quiesce" is more suitable here. Don't 
>>>>> you think?
>>>> I don't think that taking of one more mutex could make harm in
>>>> offloading thread.  If we'll want to care about minor sleeping
>>>> spikes, we should start from replacing the dp_flow_offload.mutex
>>>> and sleeping on the conditional variable.
>>> Agree it's not critical. However, if it's not mandatory, I think 
>>> it's better to avoid it.
>> Since it's not critical, I think it's better to not block RCU callbacks
>> and other threads that waits for RCU synchronization.
>
> I am not clear. I thought ovsrcu_try_quiesce is more lightweight than 
> ovsrcu_quiesce. Now I see in the code there is indeed a lock there 
> (seq_try_lock()).
>
> So, what's the logic behind your previous comment:
>
> Your
> crashes are less frequent just because thread enters quiescent state less
> frequently.
>
> Is it because of that lock in ovsrcu_try_quiesce the thread enters 
> quiescent less frequently? So actually ovsrcu_try_quiesce is heavier 
> than ovsrcu_quiesce.
>
>>
>>> In addition I think of making it multi-threaded (thread per port) to 
>>> boost insertion/deletion rates.
>>>
>> This will not give any performance unless you've found a good way how to
>> avoid taking of global dp->port_mutex.  Also, "per port" seems too much.
> Of course. Need to find a way not to have a global lock. Regarding 
> "per-port", maybe not a thread per-port, but per-port request queue. 
> The number of threads can maybe configurable, each handling several 
> queues.
Ilya Maximets March 16, 2020, 11:41 a.m. UTC | #9
On 3/4/20 4:51 PM, Eli Britstein wrote:
> 
> On 3/3/2020 4:44 PM, Eli Britstein wrote:
>>
>> On 2/28/2020 11:03 AM, Ilya Maximets wrote:
>>> On 2/27/20 10:00 AM, Eli Britstein wrote:
>>>> On 2/26/2020 2:31 PM, Ilya Maximets wrote:
>>>>> On 2/26/20 1:05 PM, Eli Britstein wrote:
>>>>>> On 2/24/2020 1:06 PM, Ilya Maximets wrote:
>>>>>>> On 2/23/20 3:32 PM, Eli Britstein wrote:
>>>>>>>> On 2/21/2020 4:54 PM, Ilya Maximets wrote:
>>>>>>>>> If the offloading queue is big and filled continuously, offloading
>>>>>>>>> thread may have no chance to quiesce blocking rcu callbacks and
>>>>>>>>> other threads waiting for synchronization.
>>>>>>>>>
>>>>>>>>> Fix that by entering momentary quiescent state after each operation
>>>>>>>>> since we're not holding any rcu-protected memory here.
>>>>>>>>>
>>>>>>>>> Fixes: 02bb2824e51d ("dpif-netdev: do hw flow offload in a thread")
>>>>>>>>> Reported-by: Eli Britstein <elibr@mellanox.com>
>>>>>>>>> Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2020-February/049768.html
>>>>>>>>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
>>>>>>>>> ---
>>>>>>>>>      lib/dpif-netdev.c | 1 +
>>>>>>>>>      1 file changed, 1 insertion(+)
>>>>>>>>>
>>>>>>>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>>>>>>>> index d393aab5e..a798db45d 100644
>>>>>>>>> --- a/lib/dpif-netdev.c
>>>>>>>>> +++ b/lib/dpif-netdev.c
>>>>>>>>> @@ -2512,6 +2512,7 @@ dp_netdev_flow_offload_main(void *data OVS_UNUSED)
>>>>>>>>>              VLOG_DBG("%s to %s netdev flow\n",
>>>>>>>>>                       ret == 0 ? "succeed" : "failed", op);
>>>>>>>>>              dp_netdev_free_flow_offload(offload);
>>>>>>>>> +        ovsrcu_quiesce();
> Acked-by: Eli Britstein <elibr@mellanox.com>


Applied to master and backported down to branch-2.10.


>>>>>>>> This seems to solve the issue of responsiveness, but I have encountered a crash using it, while there is a lot of flow deletions.
>>>>>>>>
>>>>>>>> #0  0x0000000001e462dc in get_unaligned_u32 (p=0x7f57f7813500) at lib/unaligned.h:86
>>>>>>>> #1  0x0000000001e46388 in hash_bytes (p_=0x7f57f7813500, n=16, basis=0) at lib/hash.c:38
>>>>>>>> #2  0x0000000001f7f836 in ufid_to_rte_flow_data_find (ufid=0x7f57f7813500) at lib/netdev-offload-dpdk.c:134
>>>>>>>> #3  0x0000000001f88693 in netdev_offload_dpdk_flow_del (netdev=0x42260da0, ufid=0x7f57f7813500, stats=0x0) at lib/netdev-offload-dpdk.c:3361
>>>>>>>> #4  0x0000000001e6a00d in netdev_flow_del (netdev=0x42260da0, ufid=0x7f57f7813500, stats=0x0) at lib/netdev-offload.c:296
>>>>>>> This might mean 2 things:
>>>>>>> 1. Incorrect usage of RCU protected data inside netdev-offload-dpdk.
>>>>>>> 2. We're not allowed to quiesce if we have any item in offloading queue,
>>>>>>>       i.e. bad RCU usage scheme in dpif-netdev.
>>>>>>>
>>>>>>> Both cases are bad and I'm not sure if we can easily fix case #2.
>>>>>>> I'll take a look and try to find a root cause.
>>>>>> It was my bad. Sorry. This is another issue, that exists in my branch only (CT offloads), so it's not related. I was able to fix it.
>>>>> OK.  Good to know.
>>>>>
>>>>>> However, I wanted to reproduce with master branch, but then I cannot create a lot of flows as I get: "upcall: datapath flow limit reached".
>>>>>>
>>>>>> I tried setting large number using "ovs-appctl upcall/set-flow-limit 400000", but it didn't help.
>>>>> Number of flows in dpif-netdev is artificially limited by MAX_FLOWS (65536)
>>>>> value.  There was a recent discussion about this limit and I think we could
>>>>> just remove it.  I have a small patch for this, will send soon.
>>>>> For now, you can just raise the hardcoded value of MAX_FLOWS in dpif-netdev.c.
>>>> The issue is not in dpif-netdev.c. It's in ofproto/ofproto-dpif-upcall.c. If I'm not wrong, when it happens, there is not attempt to install the flow, so anyway MAX_FLOWS in dpif-netdev.c is not the blocker.
>>> There are couple of things that controls the maximum number of datapath flows.
>>> First is the other_config:flow-limit that defaults to 200000. Second is the
>>> datapath internal MAX_FLOWS define.  And also, "ovs-appctl upcall/set-flow-limit"
>>> should be able to modify this value.  Sounds strange.
>>>
>>>> BTW, my setup is the PF and VF rep in a bridge. I run testpmd with macswap on the VF, and send 10k different src MAC packets (OpenFlow configuration is NORMAL). So, I expect total 20k flows, also less than 64k of MAX_FLOWS.
>>> Are you sure that packets are not corrupted?  Could you dump datapath flows
>>> and verify that these flows are what you've expected?
>>
>> This is for example for 4 different MACs (e4:10:00:00:00:01-e4:10:00:00:00:04). As expected:
>>
>> $ ovs-appctl dpctl/dump-flows -m
>> flow-dump from pmd on cpu core: 8
>> ufid:65aae19f-06a9-4887-8d2d-60b04187f7cd, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(vm1),packet_type(ns=0,id=0),eth(src=e4:11:00:00:00:01,dst=e4:10:00:00:00:01),eth_type(0x0800),ipv4(src=1.1.1.1/0.0.0.0,dst=1.1.1.2/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=1025/0,dst=1025/0), packets:11316766, bytes:1244844260, used:0.013s, offloaded:yes, dp:dpdk, actions:pf, dp-extra-info:miniflow_bits(5,1)
>> ufid:a0e4b9bc-8973-41d1-b2cf-29f2a14935d8, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(pf),packet_type(ns=0,id=0),eth(src=e4:10:00:00:00:03,dst=e4:11:00:00:00:01),eth_type(0x0800),ipv4(src=1.1.1.2/0.0.0.0,dst=1.1.1.1/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=1025/0,dst=1025/0), packets:109476803, bytes:12042448330, used:0.013s, offloaded:yes, dp:dpdk, actions:vm1, dp-extra-info:miniflow_bits(5,1)
>> ufid:bbb7fe49-422b-434c-bd70-32d6a90c44c4, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(vm1),packet_type(ns=0,id=0),eth(src=e4:11:00:00:00:01,dst=e4:10:00:00:00:03),eth_type(0x0800),ipv4(src=1.1.1.1/0.0.0.0,dst=1.1.1.2/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=1025/0,dst=1025/0), packets:11147160, bytes:1226187600, used:0.013s, offloaded:yes, dp:dpdk, actions:pf, dp-extra-info:miniflow_bits(5,1)
>> ufid:bebe7de1-b2d6-48d6-87aa-3fab025a793d, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(pf),packet_type(ns=0,id=0),eth(src=e4:10:00:00:00:02,dst=e4:11:00:00:00:01),eth_type(0x0800),ipv4(src=1.1.1.2/0.0.0.0,dst=1.1.1.1/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=1025/0,dst=1025/0), packets:110643486, bytes:12170783460, used:0.013s, offloaded:yes, dp:dpdk, actions:vm1, dp-extra-info:miniflow_bits(5,1)
>> ufid:24e09efe-f02c-41d5-a7c1-3d0aea28cbe0, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(vm1),packet_type(ns=0,id=0),eth(src=e4:11:00:00:00:01,dst=e4:10:00:00:00:02),eth_type(0x0800),ipv4(src=1.1.1.1/0.0.0.0,dst=1.1.1.2/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=1025/0,dst=1025/0), packets:11304574, bytes:1243503140, used:0.013s, offloaded:yes, dp:dpdk, actions:pf, dp-extra-info:miniflow_bits(5,1)
>> ufid:4fba2205-37d8-42a1-8899-57bdbbc1c539, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(pf),packet_type(ns=0,id=0),eth(src=e4:10:00:00:00:01,dst=e4:11:00:00:00:01),eth_type(0x0800),ipv4(src=1.1.1.2/0.0.0.0,dst=1.1.1.1/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=1025/0,dst=1025/0), packets:110644871, bytes:12170935810, used:0.013s, offloaded:yes, dp:dpdk, actions:vm1, dp-extra-info:miniflow_bits(5,1)
>> ufid:99848c40-a8cd-4732-b520-30a27010964b, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(pf),packet_type(ns=0,id=0),eth(src=e4:10:00:00:00:04,dst=e4:11:00:00:00:01),eth_type(0x0800),ipv4(src=1.1.1.2/0.0.0.0,dst=1.1.1.1/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=1025/0,dst=1025/0), packets:110644198, bytes:12170861780, used:0.013s, offloaded:yes, dp:dpdk, actions:vm1, dp-extra-info:miniflow_bits(5,1)
>> ufid:293e5fa1-cce6-41d2-9b3c-db01105f3cc2, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(vm1),packet_type(ns=0,id=0),eth(src=e4:11:00:00:00:01,dst=e4:10:00:00:00:04),eth_type(0x0800),ipv4(src=1.1.1.1/0.0.0.0,dst=1.1.1.2/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=1025/0,dst=1025/0), packets:11327727, bytes:1246049970, used:0.013s, offloaded:yes, dp:dpdk, actions:pf, dp-extra-info:miniflow_bits(5,1)
> 
> This issue is resolved by enlarging the MAC learning number:
> 
> ovs-vsctl set bridge br-int other_config:mac-table-size=200000

Hmm. OK.

> 
>>
>>>
>>>>>>>> In other trials, I see the ovs-appctl stuck again, but then I cannot attach gdb (or use pstack). It hangs. In this scenario, I see this in dmesg:
>>>>>>>>
>>>>>>>> [  404.450694] ovs-vswitchd[8574]: segfault at 7f8bd4e87280 ip 0000000001e463c3 sp 00007f8cfaf33028 error 4 in ovs-vswitchd[400000+200c000]
>>>>>>>>
>>>>>>>> Yanqin Wei <Yanqin.Wei@arm.com> suggested to add ovsrcu_try_quiesce call (in the same place). It seems more stable (haven't see such crash as above), but it has the same stuck symptom as above.
>>>>>>> ovsrcu_try_quiesce() is the same as ovsrcu_quiesce() except that is fails
>>>>>>> if mutex can not be taken right now.  There is no real difference.  Your
>>>>>>> crashes are less frequent just because thread enters quiescent state less
>>>>>>> frequently.
>>>>>> If so, I think "ovsrcu_try_quiesce" is more suitable here. Don't you think?
>>>>> I don't think that taking of one more mutex could make harm in
>>>>> offloading thread.  If we'll want to care about minor sleeping
>>>>> spikes, we should start from replacing the dp_flow_offload.mutex
>>>>> and sleeping on the conditional variable.
>>>> Agree it's not critical. However, if it's not mandatory, I think it's better to avoid it.
>>> Since it's not critical, I think it's better to not block RCU callbacks
>>> and other threads that waits for RCU synchronization.
>>
>> I am not clear. I thought ovsrcu_try_quiesce is more lightweight than ovsrcu_quiesce. Now I see in the code there is indeed a lock there (seq_try_lock()).
>>
>> So, what's the logic behind your previous comment:
>>
>> Your
>> crashes are less frequent just because thread enters quiescent state less
>> frequently.
>>
>> Is it because of that lock in ovsrcu_try_quiesce the thread enters quiescent less frequently? So actually ovsrcu_try_quiesce is heavier than ovsrcu_quiesce.

They're very similar, except that the 'try' variant fails if the lock is
already held, while the regular function goes to sleep waiting for the kernel
to wake it up when the mutex is free to be locked.  There should be not much
difference besides that.  So, ovsrcu_try_quiesce doesn't enter the quiescent
state if the seq_lock is busy, i.e. the thread enters the quiescent state
less frequently.


>>
>>>
>>>> In addition I think of making it multi-threaded (thread per port) to boost insertion/deletion rates.
>>>>
>>> This will not give any performance unless you've found a good way how to
>>> avoid taking of global dp->port_mutex.  Also, "per port" seems too much.
>> Of course. Need to find a way not to have a global lock. Regarding "per-port", maybe not a thread per-port, but per-port request queue. The number of threads can maybe configurable, each handling several queues.

Patch

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index d393aab5e..a798db45d 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -2512,6 +2512,7 @@  dp_netdev_flow_offload_main(void *data OVS_UNUSED)
         VLOG_DBG("%s to %s netdev flow\n",
                  ret == 0 ? "succeed" : "failed", op);
         dp_netdev_free_flow_offload(offload);
+        ovsrcu_quiesce();
     }
 
     return NULL;