| Message ID | cover.1612968145.git.grive@u256.net |
|---|---|
| Series | dpif-netdev: Parallel offload processing |
On Wed, Feb 10, 2021 at 4:07 PM Gaetan Rivet <grive@u256.net> wrote:
>
> This patch series aims to improve the performance of the management
> of hw-offloads in dpif-netdev. In the current version, some setups
> will experience high memory usage and poor latency between a flow
> decision and its execution regarding hardware offloading.
>
> This series starts by measuring key metrics regarding both issues.
> Those patches are introduced first to compare the current status
> with each improvement introduced.
> Offloads enqueued and inserted, as well as the latency
> from queue insertion to hardware insertion, are measured. A new
> command 'ovs-appctl dpctl/offload-stats-show' is introduced
> to show the current measure.
>
> In my current performance test setup I am measuring an
> average latency hovering between 1~2 seconds.
> After the optimizations, it is reduced to 500~900 ms.
> Finally, when using multiple threads and with proper driver
> support[1], it is measured in the order of 1 ms.
>
> A few modules are introduced:
>
> * An ID pool with reduced capabilities, simplifying its
>   operations and allowing better performance in both
>   single- and multi-thread setups.
>
> * A lockless queue between PMDs / revalidators and
>   offload thread(s). As the number of PMDs increases,
>   contention can be high on the shared queue.
>   This queue is designed to serve as a message queue
>   between threads.
>
> * A bounded lockless MPMC ring and some helpers for
>   calculating moving averages.
>
> * A moving average module for Cumulative and Exponential
>   moving averages.
>
> The netdev-offload-dpdk module is made thread-safe.
> Internal maps are made per-netdev instead, and locks are
> taken for shorter critical sections within the module.

Hey Gaetan,

This looks like an interesting series, I'll be looking at it too in
the coming weeks.
Just a first and easy update: I noticed an assert is triggered when
stopping OVS (at least).

2021-03-11T20:32:01.928Z|00350|util|EMER|lib/netdev-offload.c:479:
assertion thread_is_hw_offload || thread_is_rcu failed in
netdev_offload_thread_init()

I did not check yet where the issue is, but I noticed it too halfway
through the series, and in that case it was not on stop.
You can probably catch it easily.

I have hw-offloads enabled, 2 pf ports, 2 representor ports in a
single bridge, 2 pmds, no additional configuration.
I usually have bidirectional traffic running through OVS while I restart.
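As an aside, the two averages named in the cover letter are standard
formulas. A minimal sketch in C, with illustrative struct and function
names rather than the series' actual API:

```c
/* Minimal sketch of cumulative (CMA) and exponential (EMA) moving
 * averages, of the kind used to track offload latency.  Names are
 * illustrative, not the series' actual API. */
#include <stdint.h>

struct cma {
    double mean;     /* Current cumulative average. */
    uint64_t count;  /* Number of samples folded in. */
};

static inline void
cma_update(struct cma *c, double sample)
{
    /* CMA(n+1) = CMA(n) + (sample - CMA(n)) / (n + 1). */
    c->count++;
    c->mean += (sample - c->mean) / c->count;
}

struct ema {
    double mean;   /* Current exponential average. */
    double alpha;  /* Smoothing factor in (0, 1]; higher reacts faster. */
};

static inline void
ema_update(struct ema *e, double sample)
{
    /* EMA = alpha * sample + (1 - alpha) * EMA. */
    e->mean += e->alpha * (sample - e->mean);
}
```

The CMA weighs every sample equally, while the EMA weighs recent
samples more heavily, which suits a latency measure that changes over
time.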
On Mon, Mar 15, 2021, at 09:38, David Marchand wrote:
> On Wed, Feb 10, 2021 at 4:07 PM Gaetan Rivet <grive@u256.net> wrote:
> >
> > This patch series aims to improve the performance of the management
> > of hw-offloads in dpif-netdev. In the current version, some setups
> > will experience high memory usage and poor latency between a flow
> > decision and its execution regarding hardware offloading.
> >
> > This series starts by measuring key metrics regarding both issues.
> > Those patches are introduced first to compare the current status
> > with each improvement introduced.
> > Offloads enqueued and inserted, as well as the latency
> > from queue insertion to hardware insertion, are measured. A new
> > command 'ovs-appctl dpctl/offload-stats-show' is introduced
> > to show the current measure.
> >
> > In my current performance test setup I am measuring an
> > average latency hovering between 1~2 seconds.
> > After the optimizations, it is reduced to 500~900 ms.
> > Finally, when using multiple threads and with proper driver
> > support[1], it is measured in the order of 1 ms.
> >
> > A few modules are introduced:
> >
> > * An ID pool with reduced capabilities, simplifying its
> >   operations and allowing better performance in both
> >   single- and multi-thread setups.
> >
> > * A lockless queue between PMDs / revalidators and
> >   offload thread(s). As the number of PMDs increases,
> >   contention can be high on the shared queue.
> >   This queue is designed to serve as a message queue
> >   between threads.
> >
> > * A bounded lockless MPMC ring and some helpers for
> >   calculating moving averages.
> >
> > * A moving average module for Cumulative and Exponential
> >   moving averages.
> >
> > The netdev-offload-dpdk module is made thread-safe.
> > Internal maps are made per-netdev instead, and locks are
> > taken for shorter critical sections within the module.
>
> Hey Gaetan,
>
> This looks like an interesting series, I'll be looking at it too in
> the coming weeks.
> Just a first and easy update: I noticed an assert is triggered when
> stopping OVS (at least).
>
> 2021-03-11T20:32:01.928Z|00350|util|EMER|lib/netdev-offload.c:479:
> assertion thread_is_hw_offload || thread_is_rcu failed in
> netdev_offload_thread_init()
>
> I did not check yet where the issue is, but I noticed it too halfway
> through the series, and in that case it was not on stop.
> You can probably catch it easily.
>
> I have hw-offloads enabled, 2 pf ports, 2 representor ports in a
> single bridge, 2 pmds, no additional configuration.
> I usually have bidirectional traffic running through OVS while I restart.
>
>
> --
> David Marchand

Hey David,

Thanks for the report!

This abort is added in patch 12: netdev-offload: Add multi-thread API.
It relies on the offload thread being named 'hw_offload'.
This name is changed by patch 22: dpif-netdev: Use one or more offload
threads.

In the RFC series, a separate patch did a fix on the thread name [1].
I was relying on this fix happening first. I only did compilation
checks in-between after squashing this patch.

I can either re-introduce the fix patch separately, or rewrite the
check in patch 12, then update it in patch 22.
In my opinion having the fix separate was better, but I can go with
either solution.

Do you have a preference?

(I did not have a setup this morning to do runtime checks, I will run
them later. Reading the code, this is my current understanding and it
makes sense for now.)

[1]: https://mail.openvswitch.org/pipermail/ovs-dev/2020-December/378337.html
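For context on the failing check itself: the assert requires that the
calling thread's name identifies it as either an offload thread or an
RCU thread. A minimal sketch of what such a name-prefix check could
look like, assuming OVS's get_subprogram_name() helper; the exact code
in patch 12 may differ:

```c
#include <stdbool.h>
#include <string.h>

#include "util.h"  /* get_subprogram_name(), ovs_assert(). */

/* Illustrative sketch: offload-thread-local state may only be
 * initialized from a hw-offload thread or an RCU thread, identified
 * here by their thread-name prefixes.  Renaming the offload thread
 * (as a later patch does) silently breaks this kind of check. */
static void
offload_thread_init_check(void)
{
    const char *name = get_subprogram_name();  /* e.g. "hw_offload5". */
    bool thread_is_hw_offload = !strncmp(name, "hw_offload",
                                         strlen("hw_offload"));
    bool thread_is_rcu = !strncmp(name, "urcu", strlen("urcu"));

    ovs_assert(thread_is_hw_offload || thread_is_rcu);
}
```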
On Mon, Mar 15, 2021 at 1:29 PM Gaëtan Rivet <grive@u256.net> wrote:
> > Just a first and easy update: I noticed an assert is triggered when
> > stopping OVS (at least).
> > 2021-03-11T20:32:01.928Z|00350|util|EMER|lib/netdev-offload.c:479:
> > assertion thread_is_hw_offload || thread_is_rcu failed in
> > netdev_offload_thread_init()
> >
> > I did not check yet where the issue is, but I noticed it too halfway
> > through the series, and in that case it was not on stop.
> > You can probably catch it easily.
> >
> > I have hw-offloads enabled, 2 pf ports, 2 representor ports in a
> > single bridge, 2 pmds, no additional configuration.
> > I usually have bidirectional traffic running through OVS while I restart.
>
> This abort is added in patch 12: netdev-offload: Add multi-thread API.
> It relies on the offload thread being named 'hw_offload'.
> This name is changed by patch 22: dpif-netdev: Use one or more offload
> threads.
>
> In the RFC series, a separate patch did a fix on the thread name [1].
> I was relying on this fix happening first. I only did compilation
> checks in-between after squashing this patch.
>
> I can either re-introduce the fix patch separately, or rewrite the
> check in patch 12, then update it in patch 22.
> In my opinion having the fix separate was better, but I can go with
> either solution.
>
> Do you have a preference?
>
> (I did not have a setup this morning to do runtime checks, I will run
> them later. Reading the code, this is my current understanding and it
> makes sense for now.)

I would have a separate fix.

That's one thing.
But, either I missed something or there is another issue.
I _also_ got the assert with the whole series applied, while
restarting (stop; sleep 10; start):

2021-03-16T15:40:30.970Z|00368|dpif_netdev|WARN|There's no available
(non-isolated) pmd thread on numa node 0. Queue 0 on port 'vhost0'
will be assigned to the pmd on core 31 (numa node 1). Expect reduced
performance.
2021-03-16T15:40:30.970Z|00369|dpdk|INFO|VHOST_CONFIG: free connfd =
114 for device '/var/lib/vhost_sockets/vhost6'
2021-03-16T15:40:30.970Z|00370|netdev_dpdk|INFO|vHost Device
'/var/lib/vhost_sockets/vhost6' not found
2021-03-16T15:40:30.989Z|00371|util|EMER|lib/netdev-offload.c:479:
assertion thread_is_hw_offload || thread_is_rcu failed in
netdev_offload_thread_init()

2021-03-16T15:40:41.013Z|00001|vlog|INFO|opened log file
/var/log/openvswitch/ovs-vswitchd.log
2021-03-16T15:40:41.028Z|00002|ovs_numa|INFO|Discovered 28 CPU cores
on NUMA node 0
2021-03-16T15:40:41.028Z|00003|ovs_numa|INFO|Discovered 28 CPU cores
on NUMA node 1
2021-03-16T15:40:41.028Z|00004|ovs_numa|INFO|Discovered 2 NUMA nodes
and 56 CPU cores

And looking at OVS threads, they look fine:
70948 cpu_list=0-1,28-29 ctxt_switches=492,19 urcu3
70950 cpu_list=0-1,28-29 ctxt_switches=4453,12 hw_offload5
On Tue, Mar 16, 2021 at 4:45 PM David Marchand
<david.marchand@redhat.com> wrote:
> But, either I missed something or there is another issue.
> I _also_ got the assert with the whole series applied, while
> restarting (stop; sleep 10; start):
>
> 2021-03-16T15:40:30.970Z|00368|dpif_netdev|WARN|There's no available
> (non-isolated) pmd thread on numa node 0. Queue 0 on port 'vhost0'
> will be assigned to the pmd on core 31 (numa node 1). Expect reduced
> performance.
> 2021-03-16T15:40:30.970Z|00369|dpdk|INFO|VHOST_CONFIG: free connfd =
> 114 for device '/var/lib/vhost_sockets/vhost6'
> 2021-03-16T15:40:30.970Z|00370|netdev_dpdk|INFO|vHost Device
> '/var/lib/vhost_sockets/vhost6' not found
> 2021-03-16T15:40:30.989Z|00371|util|EMER|lib/netdev-offload.c:479:
> assertion thread_is_hw_offload || thread_is_rcu failed in
> netdev_offload_thread_init()
>
> 2021-03-16T15:40:41.013Z|00001|vlog|INFO|opened log file
> /var/log/openvswitch/ovs-vswitchd.log
> 2021-03-16T15:40:41.028Z|00002|ovs_numa|INFO|Discovered 28 CPU cores
> on NUMA node 0
> 2021-03-16T15:40:41.028Z|00003|ovs_numa|INFO|Discovered 28 CPU cores
> on NUMA node 1
> 2021-03-16T15:40:41.028Z|00004|ovs_numa|INFO|Discovered 2 NUMA nodes
> and 56 CPU cores
>
> And looking at OVS threads, they look fine:
> 70948 cpu_list=0-1,28-29 ctxt_switches=492,19 urcu3
> 70950 cpu_list=0-1,28-29 ctxt_switches=4453,12 hw_offload5

Some more context (played with backtrace() + adding some more info
manually in ovs_assert()):

2021-03-16T17:13:34.278Z|00372|util|ERR|15: [ovs-vswitchd(_start+0x2e) [0x4e8e5e]]
2021-03-16T17:13:34.278Z|00373|util|ERR|14: [/lib64/libc.so.6(__libc_start_main+0xf3) [0x7fdfc1674803]]
2021-03-16T17:13:34.278Z|00374|util|ERR|13: [ovs-vswitchd(main+0x389) [0x4e7cd9]]
2021-03-16T17:13:34.278Z|00375|util|ERR|12: [ovs-vswitchd(bridge_exit+0x159) [0x59d6669]]
2021-03-16T17:13:34.278Z|00376|util|ERR|11: [ovs-vswitchd() [0x59d1f0e]]
2021-03-16T17:13:34.278Z|00377|util|ERR|10: [ovs-vswitchd(ofproto_destroy+0x110) [0x59e7af0]]
2021-03-16T17:13:34.278Z|00378|util|ERR|9: [ovs-vswitchd() [0x59df91b]]
2021-03-16T17:13:34.278Z|00379|util|ERR|8: [ovs-vswitchd() [0x59f34d2]]
2021-03-16T17:13:34.278Z|00380|util|ERR|7: [ovs-vswitchd(dpif_port_del+0x1f) [0x5a461df]]
2021-03-16T17:13:34.279Z|00381|util|ERR|6: [ovs-vswitchd() [0x5a3c207]]
2021-03-16T17:13:34.279Z|00382|util|ERR|5: [ovs-vswitchd() [0x5a3bcd5]] <-- do_del_port
2021-03-16T17:13:34.279Z|00383|util|ERR|4: [ovs-vswitchd() [0x5b27fc8]] <-- netdev_offload_dpdk_flow_flush
2021-03-16T17:13:34.279Z|00384|util|ERR|3: [ovs-vswitchd() [0x5b27f85]] <-- netdev_offload_dpdk_flow_destroy
2021-03-16T17:13:34.279Z|00385|util|ERR|2: [ovs-vswitchd(netdev_offload_thread_init+0x84) [0x5a684d4]]
2021-03-16T17:13:34.279Z|00386|util|ERR|1: [ovs-vswitchd(ovs_assert_failure+0x25) [0x5af3985]]
2021-03-16T17:13:34.279Z|00387|util|EMER|lib/netdev-offload.c:479: assertion thread_is_hw_offload || thread_is_rcu failed in netdev_offload_thread_init()
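The backtrace() trick David mentions is generic glibc functionality
from <execinfo.h>, independent of OVS. A minimal sketch of dumping the
call stack the way the log above shows it:

```c
#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_FRAMES 32

/* Print the current call stack, outermost frame first, mirroring the
 * frame numbering in the log above.  Symbols only resolve for
 * exported functions when linked with -rdynamic; static functions
 * show up as bare addresses, hence the manual "<--" annotations. */
static void
dump_backtrace(void)
{
    void *frames[MAX_FRAMES];
    int n = backtrace(frames, MAX_FRAMES);
    char **symbols = backtrace_symbols(frames, n);

    if (symbols) {
        for (int i = 0; i < n; i++) {
            fprintf(stderr, "%d: [%s]\n", n - i, symbols[i]);
        }
        free(symbols);
    }
}
```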
On Tue, Mar 16, 2021, at 17:23, David Marchand wrote:
> On Tue, Mar 16, 2021 at 4:45 PM David Marchand
> <david.marchand@redhat.com> wrote:
> > But, either I missed something or there is another issue.
> > I _also_ got the assert with the whole series applied, while
> > restarting (stop; sleep 10; start):
> >
> > 2021-03-16T15:40:30.970Z|00368|dpif_netdev|WARN|There's no available
> > (non-isolated) pmd thread on numa node 0. Queue 0 on port 'vhost0'
> > will be assigned to the pmd on core 31 (numa node 1). Expect reduced
> > performance.
> > 2021-03-16T15:40:30.970Z|00369|dpdk|INFO|VHOST_CONFIG: free connfd =
> > 114 for device '/var/lib/vhost_sockets/vhost6'
> > 2021-03-16T15:40:30.970Z|00370|netdev_dpdk|INFO|vHost Device
> > '/var/lib/vhost_sockets/vhost6' not found
> > 2021-03-16T15:40:30.989Z|00371|util|EMER|lib/netdev-offload.c:479:
> > assertion thread_is_hw_offload || thread_is_rcu failed in
> > netdev_offload_thread_init()
> >
> > 2021-03-16T15:40:41.013Z|00001|vlog|INFO|opened log file
> > /var/log/openvswitch/ovs-vswitchd.log
> > 2021-03-16T15:40:41.028Z|00002|ovs_numa|INFO|Discovered 28 CPU cores
> > on NUMA node 0
> > 2021-03-16T15:40:41.028Z|00003|ovs_numa|INFO|Discovered 28 CPU cores
> > on NUMA node 1
> > 2021-03-16T15:40:41.028Z|00004|ovs_numa|INFO|Discovered 2 NUMA nodes
> > and 56 CPU cores
> >
> > And looking at OVS threads, they look fine:
> > 70948 cpu_list=0-1,28-29 ctxt_switches=492,19 urcu3
> > 70950 cpu_list=0-1,28-29 ctxt_switches=4453,12 hw_offload5
>
> Some more context (played with backtrace() + adding some more info
> manually in ovs_assert()):
>
> 2021-03-16T17:13:34.278Z|00372|util|ERR|15: [ovs-vswitchd(_start+0x2e)
> [0x4e8e5e]]
> 2021-03-16T17:13:34.278Z|00373|util|ERR|14:
> [/lib64/libc.so.6(__libc_start_main+0xf3) [0x7fdfc1674803]]
> 2021-03-16T17:13:34.278Z|00374|util|ERR|13: [ovs-vswitchd(main+0x389)
> [0x4e7cd9]]
> 2021-03-16T17:13:34.278Z|00375|util|ERR|12:
> [ovs-vswitchd(bridge_exit+0x159) [0x59d6669]]
> 2021-03-16T17:13:34.278Z|00376|util|ERR|11: [ovs-vswitchd() [0x59d1f0e]]
> 2021-03-16T17:13:34.278Z|00377|util|ERR|10:
> [ovs-vswitchd(ofproto_destroy+0x110) [0x59e7af0]]
> 2021-03-16T17:13:34.278Z|00378|util|ERR|9: [ovs-vswitchd() [0x59df91b]]
> 2021-03-16T17:13:34.278Z|00379|util|ERR|8: [ovs-vswitchd() [0x59f34d2]]
> 2021-03-16T17:13:34.278Z|00380|util|ERR|7:
> [ovs-vswitchd(dpif_port_del+0x1f) [0x5a461df]]
> 2021-03-16T17:13:34.279Z|00381|util|ERR|6: [ovs-vswitchd() [0x5a3c207]]
> 2021-03-16T17:13:34.279Z|00382|util|ERR|5: [ovs-vswitchd()
> [0x5a3bcd5]] <-- do_del_port
> 2021-03-16T17:13:34.279Z|00383|util|ERR|4: [ovs-vswitchd()
> [0x5b27fc8]] <-- netdev_offload_dpdk_flow_flush
> 2021-03-16T17:13:34.279Z|00384|util|ERR|3: [ovs-vswitchd()
> [0x5b27f85]] <-- netdev_offload_dpdk_flow_destroy
> 2021-03-16T17:13:34.279Z|00385|util|ERR|2:
> [ovs-vswitchd(netdev_offload_thread_init+0x84) [0x5a684d4]]
> 2021-03-16T17:13:34.279Z|00386|util|ERR|1:
> [ovs-vswitchd(ovs_assert_failure+0x25) [0x5af3985]]
> 2021-03-16T17:13:34.279Z|00387|util|EMER|lib/netdev-offload.c:479:
> assertion thread_is_hw_offload || thread_is_rcu failed in
> netdev_offload_thread_init()
>
>
> --
> David Marchand
>
>

Hey, thanks for taking the time to give more info!

I'm in a pickle now though.

To rewind a little: flush support was implemented after this series
was written.
I added support for it as well, doing a proper parallel dispatch. It
needs, however, some ugly sync between the thread doing the flush and
the offload threads.
The ovs_barrier used for this has a UAF (which does not affect RCU or
revalidators, and only shows up in very specific contexts such as this
new one).

It was starting to become a series on its own, on top of an already
large one, so I decided to keep it for later.
It seems I will need to do it in one fell swoop instead.

Sorry about this, I should have picked it up before. Well, at least
the crash is pretty obvious :) .

Thanks again though, I'll send a proper v2 ASAP, but it will require a
few additional patches.

Regards,
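To make the barrier hazard concrete: the problem arises when the
thread that releases a barrier returns and frees it while other
participants are still reading the barrier's memory on their way out.
An illustrative sketch of a naive spin barrier with this flaw (not the
actual ovs_barrier code):

```c
#include <stdatomic.h>

struct naive_barrier {
    atomic_uint count;  /* Threads that have not arrived yet. */
    atomic_uint seq;    /* Generation counter, bumped on release. */
    unsigned int size;  /* Total number of participants. */
};

void
naive_barrier_wait(struct naive_barrier *b)
{
    unsigned int seq = atomic_load(&b->seq);

    if (atomic_fetch_sub(&b->count, 1) == 1) {
        /* Last thread to arrive: reset and release everyone.  If this
         * thread's caller then immediately frees 'b', the waiters
         * below may still be loading b->seq: a use-after-free. */
        atomic_store(&b->count, b->size);
        atomic_store(&b->seq, seq + 1);
    } else {
        while (atomic_load(&b->seq) == seq) {
            /* Spin until released. */
        }
    }
}
```

This is presumably the class of window the flush synchronization hits,
which is why fixing it grew into a series of its own.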
On Tue, Mar 16, 2021 at 6:49 PM Gaëtan Rivet <grive@u256.net> wrote:
> Hey, thanks for taking the time to give more info!
>
> I'm in a pickle now though.
>
> To rewind a little: flush support was implemented after this series
> was written.
> I added support for it as well, doing a proper parallel dispatch. It
> needs, however, some ugly sync between the thread doing the flush and
> the offload threads. The ovs_barrier used for this has a UAF (which
> does not affect RCU or revalidators, and only shows up in very
> specific contexts such as this new one).
>
> It was starting to become a series on its own, on top of an already
> large one, so I decided to keep it for later.
> It seems I will need to do it in one fell swoop instead.
>
> Sorry about this, I should have picked it up before. Well, at least
> the crash is pretty obvious :) .

Err, yes.
Did you consider adding more unit tests?
Not sure it could catch this situation though.

> Thanks again though, I'll send a proper v2 ASAP, but it will require a
> few additional patches.

I'll continue looking at the v1 anyway.

One issue that I noticed.
With traffic running non-stop and restarting OVS, one way of my flows
(pf -> vf rep) remains partially offloaded while the other way (vf rep
-> pf) is fully offloaded.
I can easily "fix" this by stopping traffic, letting flows expire and
restarting traffic; then all flows are fully offloaded afterwards.
Flushing the fdb also works.

I did not see this before the series, but I need to double-check.
On 3/17/21 2:38 PM, David Marchand wrote:
> On Tue, Mar 16, 2021 at 6:49 PM Gaëtan Rivet <grive@u256.net> wrote:
>> Hey, thanks for taking the time to give more info!
>>
>> I'm in a pickle now though.
>>
>> To rewind a little: flush support was implemented after this series
>> was written.
>> I added support for it as well, doing a proper parallel dispatch. It
>> needs, however, some ugly sync between the thread doing the flush and
>> the offload threads. The ovs_barrier used for this has a UAF (which
>> does not affect RCU or revalidators, and only shows up in very
>> specific contexts such as this new one).
>>
>> It was starting to become a series on its own, on top of an already
>> large one, so I decided to keep it for later.
>> It seems I will need to do it in one fell swoop instead.
>>
>> Sorry about this, I should have picked it up before. Well, at least
>> the crash is pretty obvious :) .
>
> Err, yes.
> Did you consider adding more unit tests?
> Not sure it could catch this situation though.
>
>
>> Thanks again though, I'll send a proper v2 ASAP, but it will require a
>> few additional patches.
>
> I'll continue looking at the v1 anyway.
>
> One issue that I noticed.
> With traffic running non-stop and restarting OVS, one way of my flows
> (pf -> vf rep) remains partially offloaded while the other way (vf rep
> -> pf) is fully offloaded.
> I can easily "fix" this by stopping traffic, letting flows expire and
> restarting traffic; then all flows are fully offloaded afterwards.
> Flushing the fdb also works.
>
> I did not see this before the series, but I need to double-check.
>
>

FYI, on my testbed (CX-6 Dx), I can reproduce it without this series
applied.
On Wed, Mar 17, 2021, at 18:48, Maxime Coquelin wrote:
>
> On 3/17/21 2:38 PM, David Marchand wrote:
> > On Tue, Mar 16, 2021 at 6:49 PM Gaëtan Rivet <grive@u256.net> wrote:
> >> Hey, thanks for taking the time to give more info!
> >>
> >> I'm in a pickle now though.
> >>
> >> To rewind a little: flush support was implemented after this series
> >> was written.
> >> I added support for it as well, doing a proper parallel dispatch. It
> >> needs, however, some ugly sync between the thread doing the flush and
> >> the offload threads. The ovs_barrier used for this has a UAF (which
> >> does not affect RCU or revalidators, and only shows up in very
> >> specific contexts such as this new one).
> >>
> >> It was starting to become a series on its own, on top of an already
> >> large one, so I decided to keep it for later.
> >> It seems I will need to do it in one fell swoop instead.
> >>
> >> Sorry about this, I should have picked it up before. Well, at least
> >> the crash is pretty obvious :) .
> >
> > Err, yes.
> > Did you consider adding more unit tests?
> > Not sure it could catch this situation though.
> >
> >
> >> Thanks again though, I'll send a proper v2 ASAP, but it will require a
> >> few additional patches.
> >
> > I'll continue looking at the v1 anyway.
> >
> > One issue that I noticed.
> > With traffic running non-stop and restarting OVS, one way of my flows
> > (pf -> vf rep) remains partially offloaded while the other way (vf rep
> > -> pf) is fully offloaded.
> > I can easily "fix" this by stopping traffic, letting flows expire and
> > restarting traffic; then all flows are fully offloaded afterwards.
> > Flushing the fdb also works.
> >
> > I did not see this before the series, but I need to double-check.
> >
> >
>
> FYI, on my testbed (CX-6 Dx), I can reproduce it without this series
> applied.
>

The v2 has been submitted:
https://mail.openvswitch.org/pipermail/ovs-dev/2021-April/382036.html