
[ovs-dev,v2,2/9] doc: Add "PMD" topic document

Message ID 20180416143026.24561-3-stephen@that.guru
State Changes Requested
Delegated to: Ian Stokes
Series Split up the DPDK how-to

Commit Message

Stephen Finucane April 16, 2018, 2:30 p.m. UTC
This continues the breakup of the huge DPDK "howto" into smaller
components. There are a couple of related changes included, such as
using "Rx queue" instead of "rxq" and noting how Tx queues cannot be
configured.

Signed-off-by: Stephen Finucane <stephen@that.guru>
---
v2:
- Add cross-references from 'pmd' doc to 'vhost-user' and 'phy' docs
- Add 'versionchanged' warning about automatic assignment of Rx queues
- Add a 'todo' to describe Tx queue behavior
---
 Documentation/howto/dpdk.rst             |  86 -----------------
 Documentation/topics/dpdk/index.rst      |   1 +
 Documentation/topics/dpdk/phy.rst        |  12 +++
 Documentation/topics/dpdk/pmd.rst        | 156 +++++++++++++++++++++++++++++++
 Documentation/topics/dpdk/vhost-user.rst |  17 ++--
 5 files changed, 177 insertions(+), 95 deletions(-)
 create mode 100644 Documentation/topics/dpdk/pmd.rst

Comments

Stokes, Ian April 18, 2018, 3:31 p.m. UTC | #1
> This continues the breakup of the huge DPDK "howto" into smaller
> components. There are a couple of related changes included, such as using
> "Rx queue" instead of "rxq" and noting how Tx queues cannot be configured.
> 
> Signed-off-by: Stephen Finucane <stephen@that.guru>
> ---
> v2:
> - Add cross-references from 'pmd' doc to 'vhost-user' and 'phy' docs
> - Add 'versionchanged' warning about automatic assignment of Rx queues
> - Add a 'todo' to describe Tx queue behavior
> ---
> 
> [snip]
> 
> diff --git a/Documentation/topics/dpdk/pmd.rst b/Documentation/topics/dpdk/pmd.rst
> new file mode 100644
> index 000000000..1be25ade0
> --- /dev/null
> +++ b/Documentation/topics/dpdk/pmd.rst

This will cause a build failure, as pmd.rst is not listed in Documentation/automake.mk.
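
For reference, the fix is just adding the new file to the rst source list in
Documentation/automake.mk, roughly as below (the variable name and the
neighbouring entries are from memory, so they may not match the tree exactly):

    DOC_SOURCE = \
            ...
            Documentation/topics/dpdk/phy.rst \
            Documentation/topics/dpdk/pmd.rst \
            Documentation/topics/dpdk/ring.rst \
            ...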

> [snip]
> +.. versionchanged:: 2.8.0
> +
> +   Automatic assignment of Rx queues to PMDs and the two related commands,
> +   ``pmd-rxq-show`` and ``pmd-rxq-rebalance``, were added in OVS 2.8.0. Prior
> +   to this, behavior was round-robin and processing cycles were not taken into
> +   consideration. Tracking for stats was not available.

In 2.9 the output was changed to include % usage; this wasn't present in 2.8. It could be worth mentioning.
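
Something along these lines could be added next to the existing 2.8.0 note (the
wording below is only a suggestion, not text from the patch):

    .. versionchanged:: 2.9.0

       The output of ``pmd-rxq-show`` was extended to include the percentage of
       PMD core cycles consumed by each Rx queue; this was not shown in 2.8.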

Ian 

Stephen Finucane April 19, 2018, 12:41 p.m. UTC | #2
On Wed, 2018-04-18 at 15:31 +0000, Stokes, Ian wrote:
> > This continues the breakup of the huge DPDK "howto" into smaller
> > components. There are a couple of related changes included, such as using
> > "Rx queue" instead of "rxq" and noting how Tx queues cannot be configured.
> > 
> > [snip]
> > 
> > diff --git a/Documentation/topics/dpdk/pmd.rst b/Documentation/topics/dpdk/pmd.rst
> > new file mode 100644
> > index 000000000..1be25ade0
> > --- /dev/null
> > +++ b/Documentation/topics/dpdk/pmd.rst
> 
> This will cause a build failure, as pmd.rst is not listed in Documentation/automake.mk.

Done.

> > [snip]
> > +.. versionchanged:: 2.8.0
> > +
> > +   Automatic assignment of Rx queues to PMDs and the two related commands,
> > +   ``pmd-rxq-show`` and ``pmd-rxq-rebalance``, were added in OVS 2.8.0. Prior
> > +   to this, behavior was round-robin and processing cycles were not taken into
> > +   consideration. Tracking for stats was not available.
> 
> In 2.9 the output was changed to include % usage; this wasn't present in 2.8. It could be worth mentioning.

I assume you're referring to ``pmd-rxq-show``? If not, feel free to
correct what I've done in v3 at merge time :)

Stephen

> Ian 

Patch

diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
index 79b626c76..388728363 100644
--- a/Documentation/howto/dpdk.rst
+++ b/Documentation/howto/dpdk.rst
@@ -81,92 +81,6 @@  To stop ovs-vswitchd & delete bridge, run::
     $ ovs-appctl -t ovsdb-server exit
     $ ovs-vsctl del-br br0
 
-PMD Thread Statistics
----------------------
-
-To show current stats::
-
-    $ ovs-appctl dpif-netdev/pmd-stats-show
-
-To clear previous stats::
-
-    $ ovs-appctl dpif-netdev/pmd-stats-clear
-
-Port/RXQ Assigment to PMD Threads
----------------------------------
-
-To show port/rxq assignment::
-
-    $ ovs-appctl dpif-netdev/pmd-rxq-show
-
-To change default rxq assignment to pmd threads, rxqs may be manually pinned to
-desired cores using::
-
-    $ ovs-vsctl set Interface <iface> \
-        other_config:pmd-rxq-affinity=<rxq-affinity-list>
-
-where:
-
-- ``<rxq-affinity-list>`` is a CSV list of ``<queue-id>:<core-id>`` values
-
-For example::
-
-    $ ovs-vsctl set interface dpdk-p0 options:n_rxq=4 \
-        other_config:pmd-rxq-affinity="0:3,1:7,3:8"
-
-This will ensure:
-
-- Queue #0 pinned to core 3
-- Queue #1 pinned to core 7
-- Queue #2 not pinned
-- Queue #3 pinned to core 8
-
-After that PMD threads on cores where RX queues was pinned will become
-``isolated``. This means that this thread will poll only pinned RX queues.
-
-.. warning::
-  If there are no ``non-isolated`` PMD threads, ``non-pinned`` RX queues will
-  not be polled. Also, if provided ``core_id`` is not available (ex. this
-  ``core_id`` not in ``pmd-cpu-mask``), RX queue will not be polled by any PMD
-  thread.
-
-If pmd-rxq-affinity is not set for rxqs, they will be assigned to pmds (cores)
-automatically. The processing cycles that have been stored for each rxq
-will be used where known to assign rxqs to pmd based on a round robin of the
-sorted rxqs.
-
-For example, in the case where here there are 5 rxqs and 3 cores (e.g. 3,7,8)
-available, and the measured usage of core cycles per rxq over the last
-interval is seen to be:
-
-- Queue #0: 30%
-- Queue #1: 80%
-- Queue #3: 60%
-- Queue #4: 70%
-- Queue #5: 10%
-
-The rxqs will be assigned to cores 3,7,8 in the following order:
-
-Core 3: Q1 (80%) |
-Core 7: Q4 (70%) | Q5 (10%)
-core 8: Q3 (60%) | Q0 (30%)
-
-To see the current measured usage history of pmd core cycles for each rxq::
-
-    $ ovs-appctl dpif-netdev/pmd-rxq-show
-
-.. note::
-
-  A history of one minute is recorded and shown for each rxq to allow for
-  traffic pattern spikes. An rxq's pmd core cycles usage changes due to traffic
-  pattern or reconfig changes will take one minute before they are fully
-  reflected in the stats.
-
-Rxq to pmds assignment takes place whenever there are configuration changes
-or can be triggered by using::
-
-    $ ovs-appctl dpif-netdev/pmd-rxq-rebalance
-
 QoS
 ---
 
diff --git a/Documentation/topics/dpdk/index.rst b/Documentation/topics/dpdk/index.rst
index 5f836a6e9..dfde88377 100644
--- a/Documentation/topics/dpdk/index.rst
+++ b/Documentation/topics/dpdk/index.rst
@@ -31,3 +31,4 @@  The DPDK Datapath
    phy
    vhost-user
    ring
+   pmd
diff --git a/Documentation/topics/dpdk/phy.rst b/Documentation/topics/dpdk/phy.rst
index a3f8b475c..ad191dad0 100644
--- a/Documentation/topics/dpdk/phy.rst
+++ b/Documentation/topics/dpdk/phy.rst
@@ -113,3 +113,15 @@  tool::
 For more information, refer to the `DPDK documentation <dpdk-drivers>`__.
 
 .. _dpdk-drivers: http://dpdk.org/doc/guides/linux_gsg/linux_drivers.html
+
+.. _dpdk-phy-multiqueue:
+
+Multiqueue
+----------
+
+Poll Mode Driver (PMD) threads are the threads that do the heavy lifting for
+the DPDK datapath. Correct configuration of PMD threads and the Rx queues they
+utilize is a requirement in order to deliver the high-performance possible with
+DPDK acceleration. It is possible to configure multiple Rx queues for ``dpdk``
+ports, thus ensuring this is not a bottleneck for performance. For information
+on configuring PMD threads, refer to :doc:`pmd`.
diff --git a/Documentation/topics/dpdk/pmd.rst b/Documentation/topics/dpdk/pmd.rst
new file mode 100644
index 000000000..1be25ade0
--- /dev/null
+++ b/Documentation/topics/dpdk/pmd.rst
@@ -0,0 +1,156 @@ 
+..
+      Licensed under the Apache License, Version 2.0 (the "License"); you may
+      not use this file except in compliance with the License. You may obtain
+      a copy of the License at
+
+          http://www.apache.org/licenses/LICENSE-2.0
+
+      Unless required by applicable law or agreed to in writing, software
+      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+      License for the specific language governing permissions and limitations
+      under the License.
+
+      Convention for heading levels in Open vSwitch documentation:
+
+      =======  Heading 0 (reserved for the title in a document)
+      -------  Heading 1
+      ~~~~~~~  Heading 2
+      +++++++  Heading 3
+      '''''''  Heading 4
+
+      Avoid deeper levels because they do not render well.
+
+===========
+PMD Threads
+===========
+
+Poll Mode Driver (PMD) threads are the threads that do the heavy lifting for
+the DPDK datapath and perform tasks such as continuous polling of input ports
+for packets, classifying packets once received, and executing actions on the
+packets once they are classified.
+
+PMD threads utilize Receive (Rx) and Transmit (Tx) queues, commonly known as
+*rxq*\s and *txq*\s. While Tx queue configuration happens automatically, Rx
+queues can be configured by the user. This can happen in one of two ways:
+
+- For physical interfaces, configuration is done using the
+  :program:`ovs-appctl` utility.
+
+- For virtual interfaces, configuration is done using the :program:`ovs-appctl`
+  utility, but this configuration must be reflected in the guest configuration
+  (e.g. QEMU command line arguments).
+
+The :program:`ovs-appctl` utility also provides a number of commands for
+querying PMD threads and their respective queues. This, and all of the above,
+is discussed here.
+
+.. todo::
+
+   Add an overview of Tx queues including numbers created, how they relate to
+   PMD threads, etc.
+
+PMD Thread Statistics
+---------------------
+
+To show current stats::
+
+    $ ovs-appctl dpif-netdev/pmd-stats-show
+
+To clear previous stats::
+
+    $ ovs-appctl dpif-netdev/pmd-stats-clear
+
+Port/Rx Queue Assignment to PMD Threads
+---------------------------------------
+
+.. todo::
+
+   This needs a more detailed overview of *why* this should be done, along with
+   the impact on things like NUMA affinity.
+
+Correct configuration of PMD threads and the Rx queues they utilize is a
+requirement in order to achieve maximum performance. This is particularly true
+for enabling things like multiqueue for :ref:`physical <dpdk-phy-multiqueue>`
+and :ref:`vhost-user <dpdk-vhost-user>` interfaces.
+
+To show port/Rx queue assignment::
+
+    $ ovs-appctl dpif-netdev/pmd-rxq-show
+
+Rx queues may be manually pinned to cores. This will change the default Rx
+queue assignment to PMD threads::
+
+    $ ovs-vsctl set Interface <iface> \
+        other_config:pmd-rxq-affinity=<rxq-affinity-list>
+
+where:
+
+- ``<rxq-affinity-list>`` is a CSV list of ``<queue-id>:<core-id>`` values
+
+For example::
+
+    $ ovs-vsctl set interface dpdk-p0 options:n_rxq=4 \
+        other_config:pmd-rxq-affinity="0:3,1:7,3:8"
+
+This will ensure there are *4* Rx queues and that these queues are configured
+like so:
+
+- Queue #0 pinned to core 3
+- Queue #1 pinned to core 7
+- Queue #2 not pinned
+- Queue #3 pinned to core 8
+
+PMD threads on cores where Rx queues are *pinned* will become *isolated*. This
+means that this thread will only poll the *pinned* Rx queues.
+
+.. warning::
+
+   If there are no *non-isolated* PMD threads, *non-pinned* RX queues will not
+   be polled. Also, if the provided ``<core-id>`` is not available (e.g. the
+   ``<core-id>`` is not in ``pmd-cpu-mask``), the RX queue will not be polled
+   by any PMD thread.
+
+If ``pmd-rxq-affinity`` is not set for Rx queues, they will be assigned to PMDs
+(cores) automatically. Where known, the processing cycles that have been stored
+for each Rx queue will be used to assign Rx queues to PMDs based on a round
+robin of the sorted Rx queues. For example, consider a case where
+there are five Rx queues and three cores - 3, 7, and 8 - available and the
+measured usage of core cycles per Rx queue over the last interval is seen to
+be:
+
+- Queue #0: 30%
+- Queue #1: 80%
+- Queue #3: 60%
+- Queue #4: 70%
+- Queue #5: 10%
+
+The Rx queues will be assigned to the cores in the following order::
+
+    Core 3: Q1 (80%) |
+    Core 7: Q4 (70%) | Q5 (10%)
+    Core 8: Q3 (60%) | Q0 (30%)
+
+To see the current measured usage history of PMD core cycles for each Rx
+queue::
+
+    $ ovs-appctl dpif-netdev/pmd-rxq-show
+
+.. note::
+
+   A history of one minute is recorded and shown for each Rx queue to allow for
+   traffic pattern spikes. Any changes in the Rx queue's PMD core cycles usage,
+   due to traffic pattern or reconfig changes, will take one minute to be fully
+   reflected in the stats.
+
+Rx queue to PMD assignment takes place whenever there are configuration changes
+or can be triggered by using::
+
+    $ ovs-appctl dpif-netdev/pmd-rxq-rebalance
+
+.. versionchanged:: 2.8.0
+
+   Automatic assignment of Rx queues to PMDs and the two related commands,
+   ``pmd-rxq-show`` and ``pmd-rxq-rebalance``, were added in OVS 2.8.0. Prior
+   to this, behavior was round-robin and processing cycles were not taken into
+   consideration. Tracking for stats was not available.
diff --git a/Documentation/topics/dpdk/vhost-user.rst b/Documentation/topics/dpdk/vhost-user.rst
index ca8a3289f..6f794f296 100644
--- a/Documentation/topics/dpdk/vhost-user.rst
+++ b/Documentation/topics/dpdk/vhost-user.rst
@@ -130,11 +130,10 @@  an additional set of parameters::
     -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce
     -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2
 
-In addition,       QEMU must allocate the VM's memory on hugetlbfs. vhost-user
-ports access a virtio-net device's virtual rings and packet buffers mapping the
-VM's physical memory on hugetlbfs. To enable vhost-user ports to map the VM's
-memory into their process address space, pass the following parameters to
-QEMU::
+In addition, QEMU must allocate the VM's memory on hugetlbfs. vhost-user ports
+access a virtio-net device's virtual rings and packet buffers mapping the VM's
+physical memory on hugetlbfs. To enable vhost-user ports to map the VM's memory
+into their process address space, pass the following parameters to QEMU::
 
     -object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,share=on
     -numa node,memdev=mem -mem-prealloc
@@ -154,18 +153,18 @@  where:
   The number of vectors, which is ``$q`` * 2 + 2
 
 The vhost-user interface will be automatically reconfigured with required
-number of rx and tx queues after connection of virtio device.  Manual
+number of Rx and Tx queues after connection of virtio device.  Manual
 configuration of ``n_rxq`` is not supported because OVS will work properly only
 if ``n_rxq`` will match number of queues configured in QEMU.
 
-A least 2 PMDs should be configured for the vswitch when using multiqueue.
+At least two PMDs should be configured for the vswitch when using multiqueue.
 Using a single PMD will cause traffic to be enqueued to the same vhost queue
 rather than being distributed among different vhost queues for a vhost-user
 interface.
 
 If traffic destined for a VM configured with multiqueue arrives to the vswitch
-via a physical DPDK port, then the number of rxqs should also be set to at
-least 2 for that physical DPDK port. This is required to increase the
+via a physical DPDK port, then the number of Rx queues should also be set to at
+least two for that physical DPDK port. This is required to increase the
 probability that a different PMD will handle the multiqueue transmission to the
 guest using a different vhost queue.
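
A quick way to check locally that the new topic renders and is reachable from
the DPDK index is to build the docs directly with Sphinx (this bypasses the
automake integration; the paths assume the usual OVS layout with
Documentation/conf.py):

    $ sphinx-build -b html Documentation/ Documentation/_build/html
    $ xdg-open Documentation/_build/html/topics/dpdk/pmd.html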