| Message ID | cover.1612968145.git.grive@u256.net |
|---|---|
| Series | dpif-netdev: Parallel offload processing |
On Wed, Feb 10, 2021 at 4:07 PM Gaetan Rivet <grive@u256.net> wrote:
>
> This patch series aims to improve the performance of the management
> of hw-offloads in dpif-netdev. In the current version, some setups
> will experience high memory usage and poor latency between a flow
> decision and its execution regarding hardware offloading.
>
> This series starts by measuring key metrics regarding both issues.
> Those patches are introduced first to compare the current status
> with each improvement introduced.
> Offloads enqueued and inserted, as well as the latency
> from queue insertion to hardware insertion, are measured. A new
> command 'ovs-appctl dpctl/offload-stats-show' is introduced
> to show the current measure.
>
> In my current performance test setup I am measuring an
> average latency hovering between 1~2 seconds.
> After the optimizations, it is reduced to 500~900 ms.
> Finally, when using multiple threads and with proper driver
> support[1], it is measured in the order of 1 ms.
>
> A few modules are introduced:
>
> * An ID pool with reduced capabilities, simplifying its
>   operations and allowing better performance in both
>   single- and multi-thread setups.
>
> * A lockless queue between PMDs / revalidators and
>   offload thread(s). As the number of PMDs increases,
>   contention can be high on the shared queue.
>   This queue is designed to serve as a message queue
>   between threads.
>
> * A bounded lockless MPMC ring and some helpers for
>   calculating moving averages.
>
> * A moving average module for Cumulative and Exponential
>   moving averages.
>
> The netdev-offload-dpdk module is made thread-safe.
> Internal maps are made per-netdev instead, and locks are
> taken for shorter critical sections within the module.

Hey Gaetan,

This looks like an interesting series, I'll be looking at it too in
the coming weeks.
Just a first and easy update: I noticed an assert is triggered when
stopping OVS (at least).

2021-03-11T20:32:01.928Z|00350|util|EMER|lib/netdev-offload.c:479:
assertion thread_is_hw_offload || thread_is_rcu failed in
netdev_offload_thread_init()

I did not check yet where the issue is, but I noticed it too halfway
through the series, and in that case it was not on stop.
You can probably catch it easily.

I have hw-offloads enabled, 2 pf ports, 2 representor ports in a
single bridge, 2 pmds, no additional configuration.
I usually have bidirectional traffic running through OVS while I restart.
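As an aside, the two averages named in the cover letter are standard
formulas. A minimal sketch in C, with illustrative struct and function
names rather than the series' actual API:

```c
/* Minimal sketch of cumulative (CMA) and exponential (EMA) moving
 * averages, of the kind used to track offload latency.  Names are
 * illustrative, not the series' actual API. */
#include <stdint.h>

struct cma {
    double mean;     /* Current cumulative average. */
    uint64_t count;  /* Number of samples folded in. */
};

static inline void
cma_update(struct cma *c, double sample)
{
    /* CMA(n+1) = CMA(n) + (sample - CMA(n)) / (n + 1). */
    c->count++;
    c->mean += (sample - c->mean) / c->count;
}

struct ema {
    double mean;   /* Current exponential average. */
    double alpha;  /* Smoothing factor in (0, 1]; higher reacts faster. */
};

static inline void
ema_update(struct ema *e, double sample)
{
    /* EMA = alpha * sample + (1 - alpha) * EMA. */
    e->mean += e->alpha * (sample - e->mean);
}
```

The CMA weighs every sample equally, while the EMA weighs recent
samples more heavily, which suits a latency measure that changes over
time.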
On Mon, Mar 15, 2021, at 09:38, David Marchand wrote:
> On Wed, Feb 10, 2021 at 4:07 PM Gaetan Rivet <grive@u256.net> wrote:
> >
> > This patch series aims to improve the performance of the management
> > of hw-offloads in dpif-netdev. In the current version, some setups
> > will experience high memory usage and poor latency between a flow
> > decision and its execution regarding hardware offloading.
> >
> > This series starts by measuring key metrics regarding both issues.
> > Those patches are introduced first to compare the current status
> > with each improvement introduced.
> > Offloads enqueued and inserted, as well as the latency
> > from queue insertion to hardware insertion, are measured. A new
> > command 'ovs-appctl dpctl/offload-stats-show' is introduced
> > to show the current measure.
> >
> > In my current performance test setup I am measuring an
> > average latency hovering between 1~2 seconds.
> > After the optimizations, it is reduced to 500~900 ms.
> > Finally, when using multiple threads and with proper driver
> > support[1], it is measured in the order of 1 ms.
> >
> > A few modules are introduced:
> >
> > * An ID pool with reduced capabilities, simplifying its
> >   operations and allowing better performance in both
> >   single- and multi-thread setups.
> >
> > * A lockless queue between PMDs / revalidators and
> >   offload thread(s). As the number of PMDs increases,
> >   contention can be high on the shared queue.
> >   This queue is designed to serve as a message queue
> >   between threads.
> >
> > * A bounded lockless MPMC ring and some helpers for
> >   calculating moving averages.
> >
> > * A moving average module for Cumulative and Exponential
> >   moving averages.
> >
> > The netdev-offload-dpdk module is made thread-safe.
> > Internal maps are made per-netdev instead, and locks are
> > taken for shorter critical sections within the module.
>
> Hey Gaetan,
>
> This looks like an interesting series, I'll be looking at it too in
> the coming weeks.
> Just a first and easy update: I noticed an assert is triggered when
> stopping OVS (at least).
>
> 2021-03-11T20:32:01.928Z|00350|util|EMER|lib/netdev-offload.c:479:
> assertion thread_is_hw_offload || thread_is_rcu failed in
> netdev_offload_thread_init()
>
> I did not check yet where the issue is, but I noticed it too halfway
> through the series, and in that case it was not on stop.
> You can probably catch it easily.
>
> I have hw-offloads enabled, 2 pf ports, 2 representor ports in a
> single bridge, 2 pmds, no additional configuration.
> I usually have bidirectional traffic running through OVS while I restart.
>
>
> --
> David Marchand

Hey David,

Thanks for the report!

This abort is added in patch 12: netdev-offload: Add multi-thread API.
It relies on the offload thread being named 'hw_offload'.
This name is changed by patch 22: dpif-netdev: Use one or more offload
threads.

In the RFC series, a separate patch did a fix on the thread name [1].
I was relying on this fix happening first. I only did compilation
checks in-between after squashing this patch.

I can either re-introduce the fix patch separately, or rewrite the
check in patch 12, then update it in patch 22.
In my opinion having the fix separate was better, but I can go with
either solution.

Do you have a preference?

(I did not have a setup this morning to do runtime checks, I will run
them later. Reading the code, this is my current understanding and it
makes sense for now.)

[1]: https://mail.openvswitch.org/pipermail/ovs-dev/2020-December/378337.html
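For context on the failing check itself: the assert requires that the
calling thread's name identifies it as either an offload thread or an
RCU thread. A minimal sketch of what such a name-prefix check could
look like, assuming OVS's get_subprogram_name() helper; the exact code
in patch 12 may differ:

```c
#include <stdbool.h>
#include <string.h>

#include "util.h"  /* get_subprogram_name(), ovs_assert(). */

/* Illustrative sketch: offload-thread-local state may only be
 * initialized from a hw-offload thread or an RCU thread, identified
 * here by their thread-name prefixes.  Renaming the offload thread
 * (as a later patch does) silently breaks this kind of check. */
static void
offload_thread_init_check(void)
{
    const char *name = get_subprogram_name();  /* e.g. "hw_offload5". */
    bool thread_is_hw_offload = !strncmp(name, "hw_offload",
                                         strlen("hw_offload"));
    bool thread_is_rcu = !strncmp(name, "urcu", strlen("urcu"));

    ovs_assert(thread_is_hw_offload || thread_is_rcu);
}
```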
On Mon, Mar 15, 2021 at 1:29 PM Gaëtan Rivet <grive@u256.net> wrote:
> > Just a first and easy update: I noticed an assert is triggered when
> > stopping OVS (at least).
> > 2021-03-11T20:32:01.928Z|00350|util|EMER|lib/netdev-offload.c:479:
> > assertion thread_is_hw_offload || thread_is_rcu failed in
> > netdev_offload_thread_init()
> >
> > I did not check yet where the issue is, but I noticed it too halfway
> > through the series, and in that case it was not on stop.
> > You can probably catch it easily.
> >
> > I have hw-offloads enabled, 2 pf ports, 2 representor ports in a
> > single bridge, 2 pmds, no additional configuration.
> > I usually have bidirectional traffic running through OVS while I restart.
>
> This abort is added in patch 12: netdev-offload: Add multi-thread API.
> It relies on the offload thread being named 'hw_offload'.
> This name is changed by patch 22: dpif-netdev: Use one or more offload
> threads.
>
> In the RFC series, a separate patch did a fix on the thread name [1].
> I was relying on this fix happening first. I only did compilation
> checks in-between after squashing this patch.
>
> I can either re-introduce the fix patch separately, or rewrite the
> check in patch 12, then update it in patch 22.
> In my opinion having the fix separate was better, but I can go with
> either solution.
>
> Do you have a preference?
>
> (I did not have a setup this morning to do runtime checks, I will run
> them later. Reading the code, this is my current understanding and it
> makes sense for now.)

I would have a separate fix.

That's one thing.
But, either I missed something or there is another issue.
I _also_ got the assert with the whole series applied, while
restarting (stop; sleep 10; start):

2021-03-16T15:40:30.970Z|00368|dpif_netdev|WARN|There's no available
(non-isolated) pmd thread on numa node 0. Queue 0 on port 'vhost0'
will be assigned to the pmd on core 31 (numa node 1). Expect reduced
performance.
2021-03-16T15:40:30.970Z|00369|dpdk|INFO|VHOST_CONFIG: free connfd =
114 for device '/var/lib/vhost_sockets/vhost6'
2021-03-16T15:40:30.970Z|00370|netdev_dpdk|INFO|vHost Device
'/var/lib/vhost_sockets/vhost6' not found
2021-03-16T15:40:30.989Z|00371|util|EMER|lib/netdev-offload.c:479:
assertion thread_is_hw_offload || thread_is_rcu failed in
netdev_offload_thread_init()

2021-03-16T15:40:41.013Z|00001|vlog|INFO|opened log file
/var/log/openvswitch/ovs-vswitchd.log
2021-03-16T15:40:41.028Z|00002|ovs_numa|INFO|Discovered 28 CPU cores
on NUMA node 0
2021-03-16T15:40:41.028Z|00003|ovs_numa|INFO|Discovered 28 CPU cores
on NUMA node 1
2021-03-16T15:40:41.028Z|00004|ovs_numa|INFO|Discovered 2 NUMA nodes
and 56 CPU cores

And looking at OVS threads, they look fine:
70948 cpu_list=0-1,28-29 ctxt_switches=492,19 urcu3
70950 cpu_list=0-1,28-29 ctxt_switches=4453,12 hw_offload5
On Tue, Mar 16, 2021 at 4:45 PM David Marchand
<david.marchand@redhat.com> wrote:
> But, either I missed something or there is another issue.
> I _also_ got the assert with the whole series applied, while
> restarting (stop; sleep 10; start):
>
> 2021-03-16T15:40:30.970Z|00368|dpif_netdev|WARN|There's no available
> (non-isolated) pmd thread on numa node 0. Queue 0 on port 'vhost0'
> will be assigned to the pmd on core 31 (numa node 1). Expect reduced
> performance.
> 2021-03-16T15:40:30.970Z|00369|dpdk|INFO|VHOST_CONFIG: free connfd =
> 114 for device '/var/lib/vhost_sockets/vhost6'
> 2021-03-16T15:40:30.970Z|00370|netdev_dpdk|INFO|vHost Device
> '/var/lib/vhost_sockets/vhost6' not found
> 2021-03-16T15:40:30.989Z|00371|util|EMER|lib/netdev-offload.c:479:
> assertion thread_is_hw_offload || thread_is_rcu failed in
> netdev_offload_thread_init()
>
> 2021-03-16T15:40:41.013Z|00001|vlog|INFO|opened log file
> /var/log/openvswitch/ovs-vswitchd.log
> 2021-03-16T15:40:41.028Z|00002|ovs_numa|INFO|Discovered 28 CPU cores
> on NUMA node 0
> 2021-03-16T15:40:41.028Z|00003|ovs_numa|INFO|Discovered 28 CPU cores
> on NUMA node 1
> 2021-03-16T15:40:41.028Z|00004|ovs_numa|INFO|Discovered 2 NUMA nodes
> and 56 CPU cores
>
> And looking at OVS threads, they look fine:
> 70948 cpu_list=0-1,28-29 ctxt_switches=492,19 urcu3
> 70950 cpu_list=0-1,28-29 ctxt_switches=4453,12 hw_offload5

Some more context (played with backtrace() + adding some more info
manually in ovs_assert()):

2021-03-16T17:13:34.278Z|00372|util|ERR|15: [ovs-vswitchd(_start+0x2e) [0x4e8e5e]]
2021-03-16T17:13:34.278Z|00373|util|ERR|14: [/lib64/libc.so.6(__libc_start_main+0xf3) [0x7fdfc1674803]]
2021-03-16T17:13:34.278Z|00374|util|ERR|13: [ovs-vswitchd(main+0x389) [0x4e7cd9]]
2021-03-16T17:13:34.278Z|00375|util|ERR|12: [ovs-vswitchd(bridge_exit+0x159) [0x59d6669]]
2021-03-16T17:13:34.278Z|00376|util|ERR|11: [ovs-vswitchd() [0x59d1f0e]]
2021-03-16T17:13:34.278Z|00377|util|ERR|10: [ovs-vswitchd(ofproto_destroy+0x110) [0x59e7af0]]
2021-03-16T17:13:34.278Z|00378|util|ERR|9: [ovs-vswitchd() [0x59df91b]]
2021-03-16T17:13:34.278Z|00379|util|ERR|8: [ovs-vswitchd() [0x59f34d2]]
2021-03-16T17:13:34.278Z|00380|util|ERR|7: [ovs-vswitchd(dpif_port_del+0x1f) [0x5a461df]]
2021-03-16T17:13:34.279Z|00381|util|ERR|6: [ovs-vswitchd() [0x5a3c207]]
2021-03-16T17:13:34.279Z|00382|util|ERR|5: [ovs-vswitchd() [0x5a3bcd5]] <-- do_del_port
2021-03-16T17:13:34.279Z|00383|util|ERR|4: [ovs-vswitchd() [0x5b27fc8]] <-- netdev_offload_dpdk_flow_flush
2021-03-16T17:13:34.279Z|00384|util|ERR|3: [ovs-vswitchd() [0x5b27f85]] <-- netdev_offload_dpdk_flow_destroy
2021-03-16T17:13:34.279Z|00385|util|ERR|2: [ovs-vswitchd(netdev_offload_thread_init+0x84) [0x5a684d4]]
2021-03-16T17:13:34.279Z|00386|util|ERR|1: [ovs-vswitchd(ovs_assert_failure+0x25) [0x5af3985]]
2021-03-16T17:13:34.279Z|00387|util|EMER|lib/netdev-offload.c:479: assertion thread_is_hw_offload || thread_is_rcu failed in netdev_offload_thread_init()
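The backtrace() trick David mentions is generic glibc functionality
from <execinfo.h>, independent of OVS. A minimal sketch of dumping the
call stack the way the log above shows it:

```c
#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_FRAMES 32

/* Print the current call stack, outermost frame first, mirroring the
 * frame numbering in the log above.  Symbols only resolve for
 * exported functions when linked with -rdynamic; static functions
 * show up as bare addresses, hence the manual "<--" annotations. */
static void
dump_backtrace(void)
{
    void *frames[MAX_FRAMES];
    int n = backtrace(frames, MAX_FRAMES);
    char **symbols = backtrace_symbols(frames, n);

    if (symbols) {
        for (int i = 0; i < n; i++) {
            fprintf(stderr, "%d: [%s]\n", n - i, symbols[i]);
        }
        free(symbols);
    }
}
```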
On Tue, Mar 16, 2021, at 17:23, David Marchand wrote:
> On Tue, Mar 16, 2021 at 4:45 PM David Marchand
> <david.marchand@redhat.com> wrote:
> > But, either I missed something or there is another issue.
> > I _also_ got the assert with the whole series applied, while
> > restarting (stop; sleep 10; start):
> >
> > 2021-03-16T15:40:30.970Z|00368|dpif_netdev|WARN|There's no available
> > (non-isolated) pmd thread on numa node 0. Queue 0 on port 'vhost0'
> > will be assigned to the pmd on core 31 (numa node 1). Expect reduced
> > performance.
> > 2021-03-16T15:40:30.970Z|00369|dpdk|INFO|VHOST_CONFIG: free connfd =
> > 114 for device '/var/lib/vhost_sockets/vhost6'
> > 2021-03-16T15:40:30.970Z|00370|netdev_dpdk|INFO|vHost Device
> > '/var/lib/vhost_sockets/vhost6' not found
> > 2021-03-16T15:40:30.989Z|00371|util|EMER|lib/netdev-offload.c:479:
> > assertion thread_is_hw_offload || thread_is_rcu failed in
> > netdev_offload_thread_init()
> >
> > 2021-03-16T15:40:41.013Z|00001|vlog|INFO|opened log file
> > /var/log/openvswitch/ovs-vswitchd.log
> > 2021-03-16T15:40:41.028Z|00002|ovs_numa|INFO|Discovered 28 CPU cores
> > on NUMA node 0
> > 2021-03-16T15:40:41.028Z|00003|ovs_numa|INFO|Discovered 28 CPU cores
> > on NUMA node 1
> > 2021-03-16T15:40:41.028Z|00004|ovs_numa|INFO|Discovered 2 NUMA nodes
> > and 56 CPU cores
> >
> > And looking at OVS threads, they look fine:
> > 70948 cpu_list=0-1,28-29 ctxt_switches=492,19 urcu3
> > 70950 cpu_list=0-1,28-29 ctxt_switches=4453,12 hw_offload5
>
> Some more context (played with backtrace() + adding some more info
> manually in ovs_assert()):
>
> 2021-03-16T17:13:34.278Z|00372|util|ERR|15: [ovs-vswitchd(_start+0x2e)
> [0x4e8e5e]]
> 2021-03-16T17:13:34.278Z|00373|util|ERR|14:
> [/lib64/libc.so.6(__libc_start_main+0xf3) [0x7fdfc1674803]]
> 2021-03-16T17:13:34.278Z|00374|util|ERR|13: [ovs-vswitchd(main+0x389)
> [0x4e7cd9]]
> 2021-03-16T17:13:34.278Z|00375|util|ERR|12:
> [ovs-vswitchd(bridge_exit+0x159) [0x59d6669]]
> 2021-03-16T17:13:34.278Z|00376|util|ERR|11: [ovs-vswitchd() [0x59d1f0e]]
> 2021-03-16T17:13:34.278Z|00377|util|ERR|10:
> [ovs-vswitchd(ofproto_destroy+0x110) [0x59e7af0]]
> 2021-03-16T17:13:34.278Z|00378|util|ERR|9: [ovs-vswitchd() [0x59df91b]]
> 2021-03-16T17:13:34.278Z|00379|util|ERR|8: [ovs-vswitchd() [0x59f34d2]]
> 2021-03-16T17:13:34.278Z|00380|util|ERR|7:
> [ovs-vswitchd(dpif_port_del+0x1f) [0x5a461df]]
> 2021-03-16T17:13:34.279Z|00381|util|ERR|6: [ovs-vswitchd() [0x5a3c207]]
> 2021-03-16T17:13:34.279Z|00382|util|ERR|5: [ovs-vswitchd()
> [0x5a3bcd5]] <-- do_del_port
> 2021-03-16T17:13:34.279Z|00383|util|ERR|4: [ovs-vswitchd()
> [0x5b27fc8]] <-- netdev_offload_dpdk_flow_flush
> 2021-03-16T17:13:34.279Z|00384|util|ERR|3: [ovs-vswitchd()
> [0x5b27f85]] <-- netdev_offload_dpdk_flow_destroy
> 2021-03-16T17:13:34.279Z|00385|util|ERR|2:
> [ovs-vswitchd(netdev_offload_thread_init+0x84) [0x5a684d4]]
> 2021-03-16T17:13:34.279Z|00386|util|ERR|1:
> [ovs-vswitchd(ovs_assert_failure+0x25) [0x5af3985]]
> 2021-03-16T17:13:34.279Z|00387|util|EMER|lib/netdev-offload.c:479:
> assertion thread_is_hw_offload || thread_is_rcu failed in
> netdev_offload_thread_init()
>
>
> --
> David Marchand
>
>

Hey, thanks for taking the time to give more info!

I'm in a pickle now though.

To rewind a little: flush support was implemented after this series
was written.
I added support for it as well, doing a proper parallel dispatch. It
needs, however, some ugly sync between the thread doing the flush and
the offload threads.
The ovs_barrier used for this has a UAF (which does not affect RCU or
revalidators, and only shows up in very specific contexts such as this
new one).

It was starting to become a series on its own, on top of an already
large one, so I decided to keep it for later.
It seems I will need to do it in one fell swoop instead.

Sorry about this, I should have picked it up before. Well, at least
the crash is pretty obvious :) .

Thanks again though, I'll send a proper v2 ASAP, but it will require a
few additional patches.

Regards,
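To make the barrier hazard concrete: the problem arises when the
thread that releases a barrier returns and frees it while other
participants are still reading the barrier's memory on their way out.
An illustrative sketch of a naive spin barrier with this flaw (not the
actual ovs_barrier code):

```c
#include <stdatomic.h>

struct naive_barrier {
    atomic_uint count;  /* Threads that have not arrived yet. */
    atomic_uint seq;    /* Generation counter, bumped on release. */
    unsigned int size;  /* Total number of participants. */
};

void
naive_barrier_wait(struct naive_barrier *b)
{
    unsigned int seq = atomic_load(&b->seq);

    if (atomic_fetch_sub(&b->count, 1) == 1) {
        /* Last thread to arrive: reset and release everyone.  If this
         * thread's caller then immediately frees 'b', the waiters
         * below may still be loading b->seq: a use-after-free. */
        atomic_store(&b->count, b->size);
        atomic_store(&b->seq, seq + 1);
    } else {
        while (atomic_load(&b->seq) == seq) {
            /* Spin until released. */
        }
    }
}
```

This is presumably the class of window the flush synchronization hits,
which is why fixing it grew into a series of its own.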
On Tue, Mar 16, 2021 at 6:49 PM Gaëtan Rivet <grive@u256.net> wrote:
> Hey, thanks for taking the time to give more info!
>
> I'm in a pickle now though.
>
> To rewind a little: flush support was implemented after this series
> was written.
> I added support for it as well, doing a proper parallel dispatch. It
> needs, however, some ugly sync between the thread doing the flush and
> the offload threads. The ovs_barrier used for this has a UAF (which
> does not affect RCU or revalidators, and only shows up in very
> specific contexts such as this new one).
>
> It was starting to become a series on its own, on top of an already
> large one, so I decided to keep it for later.
> It seems I will need to do it in one fell swoop instead.
>
> Sorry about this, I should have picked it up before. Well, at least
> the crash is pretty obvious :) .

Err, yes.
Did you consider adding more unit tests?
Not sure it could catch this situation though.

> Thanks again though, I'll send a proper v2 ASAP, but it will require a
> few additional patches.

I'll continue looking at the v1 anyway.

One issue that I noticed.
With traffic running non-stop and restarting OVS, one way of my flows
(pf -> vf rep) remains partially offloaded while the other way (vf rep
-> pf) is fully offloaded.
I can easily "fix" this by stopping traffic, letting flows expire and
restarting traffic; then all flows are fully offloaded afterwards.
Flushing the fdb also works.

I did not see this before the series, but I need to double-check.
On 3/17/21 2:38 PM, David Marchand wrote:
> On Tue, Mar 16, 2021 at 6:49 PM Gaëtan Rivet <grive@u256.net> wrote:
>> Hey, thanks for taking the time to give more info!
>>
>> I'm in a pickle now though.
>>
>> To rewind a little: flush support was implemented after this series
>> was written.
>> I added support for it as well, doing a proper parallel dispatch. It
>> needs, however, some ugly sync between the thread doing the flush and
>> the offload threads. The ovs_barrier used for this has a UAF (which
>> does not affect RCU or revalidators, and only shows up in very
>> specific contexts such as this new one).
>>
>> It was starting to become a series on its own, on top of an already
>> large one, so I decided to keep it for later.
>> It seems I will need to do it in one fell swoop instead.
>>
>> Sorry about this, I should have picked it up before. Well, at least
>> the crash is pretty obvious :) .
>
> Err, yes.
> Did you consider adding more unit tests?
> Not sure it could catch this situation though.
>
>
>> Thanks again though, I'll send a proper v2 ASAP, but it will require a
>> few additional patches.
>
> I'll continue looking at the v1 anyway.
>
> One issue that I noticed.
> With traffic running non-stop and restarting OVS, one way of my flows
> (pf -> vf rep) remains partially offloaded while the other way (vf rep
> -> pf) is fully offloaded.
> I can easily "fix" this by stopping traffic, letting flows expire and
> restarting traffic; then all flows are fully offloaded afterwards.
> Flushing the fdb also works.
>
> I did not see this before the series, but I need to double-check.
>
>

FYI, on my testbed (CX-6 Dx), I can reproduce it without this series
applied.
On Wed, Mar 17, 2021, at 18:48, Maxime Coquelin wrote:
>
> On 3/17/21 2:38 PM, David Marchand wrote:
> > On Tue, Mar 16, 2021 at 6:49 PM Gaëtan Rivet <grive@u256.net> wrote:
> >> Hey, thanks for taking the time to give more info!
> >>
> >> I'm in a pickle now though.
> >>
> >> To rewind a little: flush support was implemented after this series
> >> was written.
> >> I added support for it as well, doing a proper parallel dispatch. It
> >> needs, however, some ugly sync between the thread doing the flush and
> >> the offload threads. The ovs_barrier used for this has a UAF (which
> >> does not affect RCU or revalidators, and only shows up in very
> >> specific contexts such as this new one).
> >>
> >> It was starting to become a series on its own, on top of an already
> >> large one, so I decided to keep it for later.
> >> It seems I will need to do it in one fell swoop instead.
> >>
> >> Sorry about this, I should have picked it up before. Well, at least
> >> the crash is pretty obvious :) .
> >
> > Err, yes.
> > Did you consider adding more unit tests?
> > Not sure it could catch this situation though.
> >
> >
> >> Thanks again though, I'll send a proper v2 ASAP, but it will require a
> >> few additional patches.
> >
> > I'll continue looking at the v1 anyway.
> >
> > One issue that I noticed.
> > With traffic running non-stop and restarting OVS, one way of my flows
> > (pf -> vf rep) remains partially offloaded while the other way (vf rep
> > -> pf) is fully offloaded.
> > I can easily "fix" this by stopping traffic, letting flows expire and
> > restarting traffic; then all flows are fully offloaded afterwards.
> > Flushing the fdb also works.
> >
> > I did not see this before the series, but I need to double-check.
> >
> >
>
> FYI, on my testbed (CX-6 Dx), I can reproduce it without this series
> applied.
>

The v2 has been submitted:
https://mail.openvswitch.org/pipermail/ovs-dev/2021-April/382036.html