
[ovs-dev,RFC,0/1] Userspace deferral of work

Message ID 20210414154427.56310-1-cian.ferriter@intel.com


Cian Ferriter April 14, 2021, 3:44 p.m. UTC
This patch adds infrastructure to the userspace datapath to defer work. At a
high level, each PMD thread places work items into its own per-thread work
ring to be done later. The work ring is a FIFO queue of pointers to work items.
Each work item has a "work_func()" function pointer, which abstracts away what
work is actually being done. More details about the infrastructure can be
found in the patch and its commit message.
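A minimal sketch of the structures described above, assuming a ring that is
owned by a single PMD thread (one producer, one consumer). All names here
(struct work_item, struct work_ring, work_ring_enqueue(), log_work() and so
on) are hypothetical and are not taken from the patch, which uses DPDK's
rte_ring for this RFC. It shows a FIFO of pointers to work items, where each
item carries a work_func() pointer so the ring never needs to know what the
work is.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical work item: work_func() hides what the deferred work is. */
struct work_item {
    void (*work_func)(struct work_item *item);
    void *arg;                  /* Work-specific context, e.g. a DMA job. */
};

/* Hypothetical per-thread FIFO of pointers to work items.  Only the owning
 * PMD thread enqueues and dequeues, so no locks or atomics are needed. */
#define WORK_RING_SIZE 64       /* Must be a power of two. */

struct work_ring {
    struct work_item *slots[WORK_RING_SIZE];
    uint32_t head;              /* Free-running index of the oldest item. */
    uint32_t tail;              /* Free-running index of the next free slot. */
};

static void
work_ring_init(struct work_ring *ring)
{
    /* Index masking below relies on a power-of-two ring size. */
    assert((WORK_RING_SIZE & (WORK_RING_SIZE - 1)) == 0);
    ring->head = 0;
    ring->tail = 0;
}

static size_t
work_ring_count(const struct work_ring *ring)
{
    /* Unsigned subtraction stays correct when the indices wrap. */
    return ring->tail - ring->head;
}

static bool
work_ring_enqueue(struct work_ring *ring, struct work_item *item)
{
    if (work_ring_count(ring) == WORK_RING_SIZE) {
        return false;           /* Full: the caller must handle this. */
    }
    ring->slots[ring->tail++ & (WORK_RING_SIZE - 1)] = item;
    return true;
}

static struct work_item *
work_ring_dequeue(struct work_ring *ring)
{
    if (work_ring_count(ring) == 0) {
        return NULL;            /* Empty. */
    }
    return ring->slots[ring->head++ & (WORK_RING_SIZE - 1)];
}

/* Example work_func(): records the item's id so FIFO order is visible. */
static int done_log[WORK_RING_SIZE];
static size_t done_count;

static void
log_work(struct work_item *item)
{
    done_log[done_count++] = *(const int *) item->arg;
}
```

Because only the owning PMD thread touches its ring, plain free-running
indices are enough here; a ring shared between threads would need atomic
operations or a lock.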

The ability to defer work is necessary for asynchronous use-cases. The
use-case this patch targets is DMA offload of TX to VHOST ports. In this
use-case, packets are passed to a copy engine rather than being copied in
software. Once the copy has completed, the packets have to be freed and the
VHOST port statistics have to be updated in software. This completion work
needs to be deferred.

There are a number of requirements for an effective defer infrastructure. Each
is listed below, along with how it is accomplished:

1. Allow the thread which kicked off the DMA transfer to keep doing useful
work, rather than waiting or polling for the transfer to complete.
This is accomplished by deferring the completion work for the DMA transfer
instead of waiting for the transfer to complete before moving on to process
more packets. The completion work is added to the work ring to be done later,
and more useful work can be done in the meantime.

2. Allow some time to pass between kicking off a DMA transfer for a VHOST port
and checking for completion of the DMA transfer.
This is accomplished by doing deferred work after processing all RXQs assigned
to a PMD thread.

3. Upon checking for completion of the DMA transfer, allow re-deferral of work
in the case where the DMA transfer has not completed.
This is accomplished by adding checks in the "do_work()" function to defer the
work again when the DMA transfer has not completed. Re-deferring work in this
way also helps with requirements 1 and 2.
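The three requirements can be sketched as a single PMD iteration: poll the
RXQs first, then make one pass over the work ring, re-deferring any item whose
DMA transfer has not finished. Everything here is hypothetical and is not the
patch's code: the DMA engine is simulated by a countdown, RXQ polling is a
placeholder counter, and the names (struct pmd_thread, defer_work(),
do_deferred_work(), vhost_dma_completion_work()) are invented for
illustration. Processing only the items queued at the start of each pass is
also an assumption of this sketch, chosen so that re-deferred work waits for
the next iteration instead of being polled in a tight loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-flight DMA copy to a vhost port; polls_left simulates
 * how many more completion checks fail before the copy is done. */
struct dma_job {
    int polls_left;
    bool completion_done;       /* Packets freed and stats updated. */
};

struct work_item {
    /* Returns true when finished, false to ask for re-deferral. */
    bool (*work_func)(struct work_item *item);
    struct dma_job *job;
};

#define RING_SIZE 32            /* Must be a power of two. */

struct pmd_thread {
    struct work_item *ring[RING_SIZE];
    unsigned int head, tail;    /* Free-running FIFO indices. */
    unsigned int rxq_bursts;    /* Counts simulated RXQ processing. */
};

static bool
defer_work(struct pmd_thread *pmd, struct work_item *item)
{
    if (pmd->tail - pmd->head == RING_SIZE) {
        return false;           /* Ring full. */
    }
    pmd->ring[pmd->tail++ & (RING_SIZE - 1)] = item;
    return true;
}

/* Simulated completion check; a real datapath would query the DMA engine. */
static bool
dma_check_completed(struct dma_job *job)
{
    if (job->polls_left > 0) {
        job->polls_left--;
        return false;
    }
    return true;
}

/* Completion work: only runs once the copy engine has finished. */
static bool
vhost_dma_completion_work(struct work_item *item)
{
    if (!dma_check_completed(item->job)) {
        return false;           /* Not done yet (requirement 3). */
    }
    item->job->completion_done = true;
    return true;
}

/* One FIFO pass over the items present when the pass starts. */
static void
do_deferred_work(struct pmd_thread *pmd)
{
    unsigned int n = pmd->tail - pmd->head;

    for (unsigned int i = 0; i < n; i++) {
        struct work_item *item = pmd->ring[pmd->head++ & (RING_SIZE - 1)];

        if (!item->work_func(item)) {
            /* Re-defer at the tail; a slot was just freed, so it fits. */
            bool ok = defer_work(pmd, item);
            assert(ok);
            (void) ok;
        }
    }
}

static void
pmd_iteration(struct pmd_thread *pmd)
{
    /* Placeholder for polling every RXQ of this PMD thread; vhost TX in
     * here would kick off DMA and call defer_work() (requirement 1). */
    pmd->rxq_bursts++;

    /* Deferred work runs after all RXQs, giving DMA time (requirement 2). */
    do_deferred_work(pmd);
}
```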

A ring buffer is used to queue the pointers to work items because its FIFO
property means the DMA transfers which have been in progress the longest are
checked first, and are therefore the most likely to have completed.

For this RFC, DPDK's rte_ring is used as the ring buffer implementation, as
this was the quickest way to get working code. A better solution will need to
be found, since rte_ring should not be used in generic OVS datapath code. This
TODO is noted in the code.

Cian Ferriter (1):
  dpif-netdev: Add a per thread work ring

 lib/dpif-netdev-perf.c |  13 ++++-
 lib/dpif-netdev-perf.h |   7 +++
 lib/dpif-netdev.c      | 125 ++++++++++++++++++++++++++++++++++++++++-
 lib/netdev-dpdk.c      |  22 +++++---
 lib/netdev-provider.h  |  15 ++++-
 lib/netdev.c           |   3 +-
 6 files changed, 172 insertions(+), 13 deletions(-)