
[ovs-dev,2/8] doc: Add "PMD" topic document

Message ID 20180212181306.6674-3-stephen@that.guru
State Changes Requested
Delegated to: Ian Stokes
Series Split up the DPDK howto

Commit Message

Stephen Finucane Feb. 12, 2018, 6:13 p.m. UTC
This continues the breakup of the huge DPDK "howto" into smaller
components. There are a couple of related changes included, such as
using "Rx queue" instead of "rxq" and noting how Tx queues cannot be
configured.

We enable the TODO directive, so we can actually start calling out some
TODOs.

Signed-off-by: Stephen Finucane <stephen@that.guru>
---
 Documentation/conf.py                    |   2 +-
 Documentation/howto/dpdk.rst             |  86 -------------------
 Documentation/topics/dpdk/index.rst      |   1 +
 Documentation/topics/dpdk/phy.rst        |  10 +++
 Documentation/topics/dpdk/pmd.rst        | 139 +++++++++++++++++++++++++++++++
 Documentation/topics/dpdk/vhost-user.rst |  17 ++--
 6 files changed, 159 insertions(+), 96 deletions(-)
 create mode 100644 Documentation/topics/dpdk/pmd.rst

Comments

Stokes, Ian April 9, 2018, 3:16 p.m. UTC | #1
> This continues the breakup of the huge DPDK "howto" into smaller
> components. There are a couple of related changes included, such as using
> "Rx queue" instead of "rxq" and noting how Tx queues cannot be configured.
> 
> We enable the TODO directive, so we can actually start calling out some
> TODOs.
> 
> Signed-off-by: Stephen Finucane <stephen@that.guru>
> ---
>  Documentation/conf.py                    |   2 +-
>  Documentation/howto/dpdk.rst             |  86 -------------------
>  Documentation/topics/dpdk/index.rst      |   1 +
>  Documentation/topics/dpdk/phy.rst        |  10 +++
>  Documentation/topics/dpdk/pmd.rst        | 139
> +++++++++++++++++++++++++++++++
>  Documentation/topics/dpdk/vhost-user.rst |  17 ++--
>  6 files changed, 159 insertions(+), 96 deletions(-)  create mode 100644
> Documentation/topics/dpdk/pmd.rst
> 
> diff --git a/Documentation/conf.py b/Documentation/conf.py index
> 6ab144c5d..babda21de 100644
> --- a/Documentation/conf.py
> +++ b/Documentation/conf.py
> @@ -32,7 +32,7 @@ needs_sphinx = '1.1'
>  # Add any Sphinx extension module names here, as strings. They can be  #
> extensions coming with Sphinx (named 'sphinx.ext.*') or your custom  #
> ones.
> -extensions = []
> +extensions = ['sphinx.ext.todo']
> 
>  # Add any paths that contain templates here, relative to this directory.
>  templates_path = ['_templates']
> diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
> index d717d2ebe..c2324118d 100644
> --- a/Documentation/howto/dpdk.rst
> +++ b/Documentation/howto/dpdk.rst
> @@ -81,92 +81,6 @@ To stop ovs-vswitchd & delete bridge, run::
>      $ ovs-appctl -t ovsdb-server exit
>      $ ovs-vsctl del-br br0
> 
> -PMD Thread Statistics
> ----------------------
> -
> -To show current stats::
> -
> -    $ ovs-appctl dpif-netdev/pmd-stats-show
> -
> -To clear previous stats::
> -
> -    $ ovs-appctl dpif-netdev/pmd-stats-clear
> -
> -Port/RXQ Assigment to PMD Threads
> ----------------------------------
> -
> -To show port/rxq assignment::
> -
> -    $ ovs-appctl dpif-netdev/pmd-rxq-show
> -
> -To change default rxq assignment to pmd threads, rxqs may be manually
> pinned to -desired cores using::
> -
> -    $ ovs-vsctl set Interface <iface> \
> -        other_config:pmd-rxq-affinity=<rxq-affinity-list>
> -
> -where:
> -
> -- ``<rxq-affinity-list>`` is a CSV list of ``<queue-id>:<core-id>``
> values
> -
> -For example::
> -
> -    $ ovs-vsctl set interface dpdk-p0 options:n_rxq=4 \
> -        other_config:pmd-rxq-affinity="0:3,1:7,3:8"
> -
> -This will ensure:
> -
> -- Queue #0 pinned to core 3
> -- Queue #1 pinned to core 7
> -- Queue #2 not pinned
> -- Queue #3 pinned to core 8
> -
> -After that PMD threads on cores where RX queues was pinned will become -
> ``isolated``. This means that this thread will poll only pinned RX queues.
> -
> -.. warning::
> -  If there are no ``non-isolated`` PMD threads, ``non-pinned`` RX queues
> will
> -  not be polled. Also, if provided ``core_id`` is not available (ex. this
> -  ``core_id`` not in ``pmd-cpu-mask``), RX queue will not be polled by
> any PMD
> -  thread.
> -
> -If pmd-rxq-affinity is not set for rxqs, they will be assigned to pmds
> (cores) -automatically. The processing cycles that have been stored for
> each rxq -will be used where known to assign rxqs to pmd based on a round
> robin of the -sorted rxqs.
> -
> -For example, in the case where here there are 5 rxqs and 3 cores (e.g.
> 3,7,8) -available, and the measured usage of core cycles per rxq over the
> last -interval is seen to be:
> -
> -- Queue #0: 30%
> -- Queue #1: 80%
> -- Queue #3: 60%
> -- Queue #4: 70%
> -- Queue #5: 10%
> -
> -The rxqs will be assigned to cores 3,7,8 in the following order:
> -
> -Core 3: Q1 (80%) |
> -Core 7: Q4 (70%) | Q5 (10%)
> -core 8: Q3 (60%) | Q0 (30%)
> -
> -To see the current measured usage history of pmd core cycles for each
> rxq::
> -
> -    $ ovs-appctl dpif-netdev/pmd-rxq-show
> -
> -.. note::
> -
> -  A history of one minute is recorded and shown for each rxq to allow for
> -  traffic pattern spikes. An rxq's pmd core cycles usage changes due to
> traffic
> -  pattern or reconfig changes will take one minute before they are fully
> -  reflected in the stats.
> -
> -Rxq to pmds assignment takes place whenever there are configuration
> changes -or can be triggered by using::
> -
> -    $ ovs-appctl dpif-netdev/pmd-rxq-rebalance
> -
>  QoS
>  ---
> 
> diff --git a/Documentation/topics/dpdk/index.rst
> b/Documentation/topics/dpdk/index.rst
> index 5f836a6e9..dfde88377 100644
> --- a/Documentation/topics/dpdk/index.rst
> +++ b/Documentation/topics/dpdk/index.rst
> @@ -31,3 +31,4 @@ The DPDK Datapath
>     phy
>     vhost-user
>     ring
> +   pmd
> diff --git a/Documentation/topics/dpdk/phy.rst
> b/Documentation/topics/dpdk/phy.rst
> index 1c18e4e3d..222fa3e9f 100644
> --- a/Documentation/topics/dpdk/phy.rst
> +++ b/Documentation/topics/dpdk/phy.rst
> @@ -109,3 +109,13 @@ tool::
>  For more information, refer to the `DPDK documentation <dpdk-drivers>`__.
> 
>  .. _dpdk-drivers: http://dpdk.org/doc/guides/linux_gsg/linux_drivers.html
> +
> +Multiqueue
> +----------
> +
> +Poll Mode Driver (PMD) threads are the threads that do the heavy
> +lifting for the DPDK datapath. Correct configuration of PMD threads and
> +the Rx queues they utilize is a requirement in order to deliver the
> +high-performance possible with the DPDK datapath. It is possible to
> +configure multiple Rx queues for ``dpdk`` ports, thus ensuring this is
> +not a bottleneck for performance. For information on configuring PMD
> threads, refer to :doc:`pmd`.
> diff --git a/Documentation/topics/dpdk/pmd.rst
> b/Documentation/topics/dpdk/pmd.rst
> new file mode 100644
> index 000000000..e15e8cc3b
> --- /dev/null
> +++ b/Documentation/topics/dpdk/pmd.rst
> @@ -0,0 +1,139 @@
> +..
> +      Licensed under the Apache License, Version 2.0 (the "License"); you
> may
> +      not use this file except in compliance with the License. You may
> obtain
> +      a copy of the License at
> +
> +          http://www.apache.org/licenses/LICENSE-2.0
> +
> +      Unless required by applicable law or agreed to in writing, software
> +      distributed under the License is distributed on an "AS IS" BASIS,
> WITHOUT
> +      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> See the
> +      License for the specific language governing permissions and
> limitations
> +      under the License.
> +
> +      Convention for heading levels in Open vSwitch documentation:
> +
> +      =======  Heading 0 (reserved for the title in a document)
> +      -------  Heading 1
> +      ~~~~~~~  Heading 2
> +      +++++++  Heading 3
> +      '''''''  Heading 4
> +
> +      Avoid deeper levels because they do not render well.
> +
> +===========
> +PMD Threads
> +===========
> +
> +Poll Mode Driver (PMD) threads are the threads that do the heavy
> +lifting for the DPDK datapath and perform tasks such as continuous
> +polling of input ports for packets, classifying packets once received,
> +and executing actions on the packets once they are classified.
> +
> +PMD threads utilize Receive (Rx) and Transmit (Tx) queues, commonly
> +known as *rxq*\s and *txq*\s. While Tx queue configuration happens
> +automatically, Rx queues can be configured by the user. This can happen
> in one of two ways:

Just on above, could be a to-do but it's a good opportunity to add a note on the "automatic" behavior of tx queues, number created and how it relates to the number of PMDs etc. Could be a separate section in the PMD doc.

> +
> +- For physical interfaces, configuration is done using the
> +  :program:`ovs-appctl` utility.
> +
> +- For virtual interfaces, configuration is done using the
> +:program:`ovs-appctl`
> +  utility, but this configuration must be reflected in the guest
> +configuration
> +  (e.g. QEMU command line arguments).
> +
> +The :program:`ovs-appctl` utility also provides a number of commands
> +for querying PMD threads and their respective queues. This, and all of
> +the above, is discussed here.
> +
> +PMD Thread Statistics
> +---------------------
> +
> +To show current stats::
> +
> +    $ ovs-appctl dpif-netdev/pmd-stats-show
> +
> +To clear previous stats::
> +
> +    $ ovs-appctl dpif-netdev/pmd-stats-clear
> +
> +Port/Rx Queue Assigment to PMD Threads
> +--------------------------------------
> +
> +.. todo::
> +
> +   This needs a more detailed overview of *why* this should be done,
> along with
> +   the impact on things like NUMA affinity.
> +
> +To show port/RX queue assignment::
> +
> +    $ ovs-appctl dpif-netdev/pmd-rxq-show
> +
> +Rx queues may be manually pinned to cores. This will change the default
> +Rx queue assignment to PMD threads::
> +
> +    $ ovs-vsctl set Interface <iface> \
> +        other_config:pmd-rxq-affinity=<rxq-affinity-list>
> +
> +where:
> +
> +- ``<rxq-affinity-list>`` is a CSV list of ``<queue-id>:<core-id>``
> +values
> +
> +For example::
> +
> +    $ ovs-vsctl set interface dpdk-p0 options:n_rxq=4 \
> +        other_config:pmd-rxq-affinity="0:3,1:7,3:8"
> +
> +This will ensure there are *4* Rx queues and that these queues are
> +configured like so:
> +
> +- Queue #0 pinned to core 3
> +- Queue #1 pinned to core 7
> +- Queue #2 not pinned
> +- Queue #3 pinned to core 8
> +
> +PMD threads on cores where Rx queues are *pinned* will become
> +*isolated*. This means that this thread will only poll the *pinned* Rx
> queues.
> +
> +.. warning::
> +
> +  If there are no *non-isolated* PMD threads, *non-pinned* RX queues
> + will not  be polled. Also, if the provided ``<core-id>`` is not
> + available (e.g. the  ``<core-id>`` is not in ``pmd-cpu-mask``), the RX
> + queue will not be polled by  any PMD thread.
> +
> +If ``pmd-rxq-affinity`` is not set for Rx queues, they will be assigned
> +to PMDs
> +(cores) automatically. Where known, the processing cycles that have
> +been stored for each Rx queue will be used to assign Rx queue to PMDs
> +based on a round robin of the sorted Rx queues. For example, take the
> +following example, where there are five Rx queues and three cores - 3,
> +7, and 8 - available and the measured usage of core cycles per Rx queue
> +over the last interval is seen to
> +be:
> +
> +- Queue #0: 30%
> +- Queue #1: 80%
> +- Queue #3: 60%
> +- Queue #4: 70%
> +- Queue #5: 10%
> +
> +The Rx queues will be assigned to the cores in the following order:
> +
> +Core 3: Q1 (80%) |
> +Core 7: Q4 (70%) | Q5 (10%)
> +core 8: Q3 (60%) | Q0 (30%)
> +

This functionality was introduced in OVS 2.8.
Do we need to warn the user with a versionchanged:: 2.8.0 and that it's unavailable prior to this?
The behavior in that case was round robin without taking processing cycles into consideration.
There would also be no history tracking for the stats and no pmd rebalance command.

> +To see the current measured usage history of PMD core cycles for each
> +Rx
> +queue::
> +
> +    $ ovs-appctl dpif-netdev/pmd-rxq-show
> +
> +.. note::
> +
> +  A history of one minute is recorded and shown for each Rx queue to
> + allow for  traffic pattern spikes. Any changes in the Rx queue's PMD
> + core cycles usage,  due to traffic pattern or reconfig changes, will
> + take one minute to be fully  reflected in the stats.
> +
> +Rx queue to PMD assignment takes place whenever there are configuration
> +changes or can be triggered by using::
> +
> +    $ ovs-appctl dpif-netdev/pmd-rxq-rebalance

We should probably flag to users considerations for PMD and multi queue specific to phy and vhost ports.

Perhaps a link to the specific documents below along with the heads up:

Documentation/topics/dpdk/vhost-user.rst
Documentation/topics/dpdk/phy.rst

Ian

> diff --git a/Documentation/topics/dpdk/vhost-user.rst
> b/Documentation/topics/dpdk/vhost-user.rst
> index 95517a676..d84d99246 100644
> --- a/Documentation/topics/dpdk/vhost-user.rst
> +++ b/Documentation/topics/dpdk/vhost-user.rst
> @@ -127,11 +127,10 @@ an additional set of parameters::
>      -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce
>      -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2
> 
> -In addition,       QEMU must allocate the VM's memory on hugetlbfs.
> vhost-user
> -ports access a virtio-net device's virtual rings and packet buffers
> mapping the -VM's physical memory on hugetlbfs. To enable vhost-user ports
> to map the VM's -memory into their process address space, pass the
> following parameters to
> -QEMU::
> +In addition, QEMU must allocate the VM's memory on hugetlbfs.
> +vhost-user ports access a virtio-net device's virtual rings and packet
> +buffers mapping the VM's physical memory on hugetlbfs. To enable
> +vhost-user ports to map the VM's memory into their process address space,
> pass the following parameters to QEMU::
> 
>      -object memory-backend-file,id=mem,size=4096M,mem-
> path=/dev/hugepages,share=on
>      -numa node,memdev=mem -mem-prealloc @@ -151,18 +150,18 @@ where:
>    The number of vectors, which is ``$q`` * 2 + 2
> 
>  The vhost-user interface will be automatically reconfigured with required
> -number of rx and tx queues after connection of virtio device.  Manual
> +number of Rx and Tx queues after connection of virtio device.  Manual
>  configuration of ``n_rxq`` is not supported because OVS will work
> properly only  if ``n_rxq`` will match number of queues configured in
> QEMU.
> 
> -A least 2 PMDs should be configured for the vswitch when using
> multiqueue.
> +A least two PMDs should be configured for the vswitch when using
> multiqueue.
>  Using a single PMD will cause traffic to be enqueued to the same vhost
> queue  rather than being distributed among different vhost queues for a
> vhost-user  interface.
> 
>  If traffic destined for a VM configured with multiqueue arrives to the
> vswitch -via a physical DPDK port, then the number of rxqs should also be
> set to at -least 2 for that physical DPDK port. This is required to
> increase the
> +via a physical DPDK port, then the number of Rx queues should also be
> +set to at least two for that physical DPDK port. This is required to
> +increase the
>  probability that a different PMD will handle the multiqueue transmission
> to the  guest using a different vhost queue.
> 
> --
> 2.14.3
> 
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Stephen Finucane April 16, 2018, 1:18 p.m. UTC | #2
On Mon, 2018-04-09 at 15:16 +0000, Stokes, Ian wrote:
> > This continues the breakup of the huge DPDK "howto" into smaller
> > components. There are a couple of related changes included, such as using
> > "Rx queue" instead of "rxq" and noting how Tx queues cannot be configured.
> > 
> > We enable the TODO directive, so we can actually start calling out some
> > TODOs.
> > 
> > Signed-off-by: Stephen Finucane <stephen@that.guru>
> > ---
> >  Documentation/conf.py                    |   2 +-
> >  Documentation/howto/dpdk.rst             |  86 -------------------
> >  Documentation/topics/dpdk/index.rst      |   1 +
> >  Documentation/topics/dpdk/phy.rst        |  10 +++
> >  Documentation/topics/dpdk/pmd.rst        | 139
> > +++++++++++++++++++++++++++++++
> >  Documentation/topics/dpdk/vhost-user.rst |  17 ++--
> >  6 files changed, 159 insertions(+), 96 deletions(-)  create mode 100644
> > Documentation/topics/dpdk/pmd.rst
> > 
> > diff --git a/Documentation/conf.py b/Documentation/conf.py index
> > 6ab144c5d..babda21de 100644
> > --- a/Documentation/conf.py
> > +++ b/Documentation/conf.py
> > @@ -32,7 +32,7 @@ needs_sphinx = '1.1'
> >  # Add any Sphinx extension module names here, as strings. They can be  #
> > extensions coming with Sphinx (named 'sphinx.ext.*') or your custom  #
> > ones.
> > -extensions = []
> > +extensions = ['sphinx.ext.todo']
> > 
> >  # Add any paths that contain templates here, relative to this directory.
> >  templates_path = ['_templates']
> > diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
> > index d717d2ebe..c2324118d 100644
> > --- a/Documentation/howto/dpdk.rst
> > +++ b/Documentation/howto/dpdk.rst
> > @@ -81,92 +81,6 @@ To stop ovs-vswitchd & delete bridge, run::
> >      $ ovs-appctl -t ovsdb-server exit
> >      $ ovs-vsctl del-br br0
> > 
> > -PMD Thread Statistics
> > ----------------------
> > -
> > -To show current stats::
> > -
> > -    $ ovs-appctl dpif-netdev/pmd-stats-show
> > -
> > -To clear previous stats::
> > -
> > -    $ ovs-appctl dpif-netdev/pmd-stats-clear
> > -
> > -Port/RXQ Assigment to PMD Threads
> > ----------------------------------
> > -
> > -To show port/rxq assignment::
> > -
> > -    $ ovs-appctl dpif-netdev/pmd-rxq-show
> > -
> > -To change default rxq assignment to pmd threads, rxqs may be manually
> > pinned to -desired cores using::
> > -
> > -    $ ovs-vsctl set Interface <iface> \
> > -        other_config:pmd-rxq-affinity=<rxq-affinity-list>
> > -
> > -where:
> > -
> > -- ``<rxq-affinity-list>`` is a CSV list of ``<queue-id>:<core-id>``
> > values
> > -
> > -For example::
> > -
> > -    $ ovs-vsctl set interface dpdk-p0 options:n_rxq=4 \
> > -        other_config:pmd-rxq-affinity="0:3,1:7,3:8"
> > -
> > -This will ensure:
> > -
> > -- Queue #0 pinned to core 3
> > -- Queue #1 pinned to core 7
> > -- Queue #2 not pinned
> > -- Queue #3 pinned to core 8
> > -
> > -After that PMD threads on cores where RX queues was pinned will become -
> > ``isolated``. This means that this thread will poll only pinned RX queues.
> > -
> > -.. warning::
> > -  If there are no ``non-isolated`` PMD threads, ``non-pinned`` RX queues
> > will
> > -  not be polled. Also, if provided ``core_id`` is not available (ex. this
> > -  ``core_id`` not in ``pmd-cpu-mask``), RX queue will not be polled by
> > any PMD
> > -  thread.
> > -
> > -If pmd-rxq-affinity is not set for rxqs, they will be assigned to pmds
> > (cores) -automatically. The processing cycles that have been stored for
> > each rxq -will be used where known to assign rxqs to pmd based on a round
> > robin of the -sorted rxqs.
> > -
> > -For example, in the case where here there are 5 rxqs and 3 cores (e.g.
> > 3,7,8) -available, and the measured usage of core cycles per rxq over the
> > last -interval is seen to be:
> > -
> > -- Queue #0: 30%
> > -- Queue #1: 80%
> > -- Queue #3: 60%
> > -- Queue #4: 70%
> > -- Queue #5: 10%
> > -
> > -The rxqs will be assigned to cores 3,7,8 in the following order:
> > -
> > -Core 3: Q1 (80%) |
> > -Core 7: Q4 (70%) | Q5 (10%)
> > -core 8: Q3 (60%) | Q0 (30%)
> > -
> > -To see the current measured usage history of pmd core cycles for each
> > rxq::
> > -
> > -    $ ovs-appctl dpif-netdev/pmd-rxq-show
> > -
> > -.. note::
> > -
> > -  A history of one minute is recorded and shown for each rxq to allow for
> > -  traffic pattern spikes. An rxq's pmd core cycles usage changes due to
> > traffic
> > -  pattern or reconfig changes will take one minute before they are fully
> > -  reflected in the stats.
> > -
> > -Rxq to pmds assignment takes place whenever there are configuration
> > changes -or can be triggered by using::
> > -
> > -    $ ovs-appctl dpif-netdev/pmd-rxq-rebalance
> > -
> >  QoS
> >  ---
> > 
> > diff --git a/Documentation/topics/dpdk/index.rst
> > b/Documentation/topics/dpdk/index.rst
> > index 5f836a6e9..dfde88377 100644
> > --- a/Documentation/topics/dpdk/index.rst
> > +++ b/Documentation/topics/dpdk/index.rst
> > @@ -31,3 +31,4 @@ The DPDK Datapath
> >     phy
> >     vhost-user
> >     ring
> > +   pmd
> > diff --git a/Documentation/topics/dpdk/phy.rst
> > b/Documentation/topics/dpdk/phy.rst
> > index 1c18e4e3d..222fa3e9f 100644
> > --- a/Documentation/topics/dpdk/phy.rst
> > +++ b/Documentation/topics/dpdk/phy.rst
> > @@ -109,3 +109,13 @@ tool::
> >  For more information, refer to the `DPDK documentation <dpdk-drivers>`__.
> > 
> >  .. _dpdk-drivers: http://dpdk.org/doc/guides/linux_gsg/linux_drivers.html
> > +
> > +Multiqueue
> > +----------
> > +
> > +Poll Mode Driver (PMD) threads are the threads that do the heavy
> > +lifting for the DPDK datapath. Correct configuration of PMD threads and
> > +the Rx queues they utilize is a requirement in order to deliver the
> > +high-performance possible with the DPDK datapath. It is possible to
> > +configure multiple Rx queues for ``dpdk`` ports, thus ensuring this is
> > +not a bottleneck for performance. For information on configuring PMD
> > threads, refer to :doc:`pmd`.
> > diff --git a/Documentation/topics/dpdk/pmd.rst
> > b/Documentation/topics/dpdk/pmd.rst
> > new file mode 100644
> > index 000000000..e15e8cc3b
> > --- /dev/null
> > +++ b/Documentation/topics/dpdk/pmd.rst
> > @@ -0,0 +1,139 @@
> > +..
> > +      Licensed under the Apache License, Version 2.0 (the "License"); you
> > may
> > +      not use this file except in compliance with the License. You may
> > obtain
> > +      a copy of the License at
> > +
> > +          http://www.apache.org/licenses/LICENSE-2.0
> > +
> > +      Unless required by applicable law or agreed to in writing, software
> > +      distributed under the License is distributed on an "AS IS" BASIS,
> > WITHOUT
> > +      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> > See the
> > +      License for the specific language governing permissions and
> > limitations
> > +      under the License.
> > +
> > +      Convention for heading levels in Open vSwitch documentation:
> > +
> > +      =======  Heading 0 (reserved for the title in a document)
> > +      -------  Heading 1
> > +      ~~~~~~~  Heading 2
> > +      +++++++  Heading 3
> > +      '''''''  Heading 4
> > +
> > +      Avoid deeper levels because they do not render well.
> > +
> > +===========
> > +PMD Threads
> > +===========
> > +
> > +Poll Mode Driver (PMD) threads are the threads that do the heavy
> > +lifting for the DPDK datapath and perform tasks such as continuous
> > +polling of input ports for packets, classifying packets once received,
> > +and executing actions on the packets once they are classified.
> > +
> > +PMD threads utilize Receive (Rx) and Transmit (Tx) queues, commonly
> > +known as *rxq*\s and *txq*\s. While Tx queue configuration happens
> > +automatically, Rx queues can be configured by the user. This can happen
> > in one of two ways:
> 
> Just on above, could be a to-do but it's a good opportunity to add a
> note on the "automatic" behavior of tx queues, number created and how
> it relates to the number of PMDs etc. Could be a separate section in
> the PMD doc.

Yeah, if it's OK with you I'll add this as a TODO and then work with
you to write this additional section.

> > +
> > +- For physical interfaces, configuration is done using the
> > +  :program:`ovs-appctl` utility.
> > +
> > +- For virtual interfaces, configuration is done using the
> > +:program:`ovs-appctl`
> > +  utility, but this configuration must be reflected in the guest
> > +configuration
> > +  (e.g. QEMU command line arguments).
> > +
> > +The :program:`ovs-appctl` utility also provides a number of commands
> > +for querying PMD threads and their respective queues. This, and all of
> > +the above, is discussed here.
> > +
> > +PMD Thread Statistics
> > +---------------------
> > +
> > +To show current stats::
> > +
> > +    $ ovs-appctl dpif-netdev/pmd-stats-show
> > +
> > +To clear previous stats::
> > +
> > +    $ ovs-appctl dpif-netdev/pmd-stats-clear
> > +
> > +Port/Rx Queue Assigment to PMD Threads
> > +--------------------------------------
> > +
> > +.. todo::
> > +
> > +   This needs a more detailed overview of *why* this should be done,
> > along with
> > +   the impact on things like NUMA affinity.
> > +
> > +To show port/RX queue assignment::
> > +
> > +    $ ovs-appctl dpif-netdev/pmd-rxq-show
> > +
> > +Rx queues may be manually pinned to cores. This will change the default
> > +Rx queue assignment to PMD threads::
> > +
> > +    $ ovs-vsctl set Interface <iface> \
> > +        other_config:pmd-rxq-affinity=<rxq-affinity-list>
> > +
> > +where:
> > +
> > +- ``<rxq-affinity-list>`` is a CSV list of ``<queue-id>:<core-id>``
> > +values
> > +
> > +For example::
> > +
> > +    $ ovs-vsctl set interface dpdk-p0 options:n_rxq=4 \
> > +        other_config:pmd-rxq-affinity="0:3,1:7,3:8"
> > +
> > +This will ensure there are *4* Rx queues and that these queues are
> > +configured like so:
> > +
> > +- Queue #0 pinned to core 3
> > +- Queue #1 pinned to core 7
> > +- Queue #2 not pinned
> > +- Queue #3 pinned to core 8
> > +
> > +PMD threads on cores where Rx queues are *pinned* will become
> > +*isolated*. This means that this thread will only poll the *pinned* Rx
> > queues.
> > +
> > +.. warning::
> > +
> > +  If there are no *non-isolated* PMD threads, *non-pinned* RX queues
> > + will not  be polled. Also, if the provided ``<core-id>`` is not
> > + available (e.g. the  ``<core-id>`` is not in ``pmd-cpu-mask``), the RX
> > + queue will not be polled by  any PMD thread.
> > +
> > +If ``pmd-rxq-affinity`` is not set for Rx queues, they will be assigned
> > +to PMDs
> > +(cores) automatically. Where known, the processing cycles that have
> > +been stored for each Rx queue will be used to assign Rx queue to PMDs
> > +based on a round robin of the sorted Rx queues. For example, take the
> > +following example, where there are five Rx queues and three cores - 3,
> > +7, and 8 - available and the measured usage of core cycles per Rx queue
> > +over the last interval is seen to
> > +be:
> > +
> > +- Queue #0: 30%
> > +- Queue #1: 80%
> > +- Queue #3: 60%
> > +- Queue #4: 70%
> > +- Queue #5: 10%
> > +
> > +The Rx queues will be assigned to the cores in the following order:
> > +
> > +Core 3: Q1 (80%) |
> > +Core 7: Q4 (70%) | Q5 (10%)
> > +core 8: Q3 (60%) | Q0 (30%)
> > +
> 
> This functionality was introduced in OVS 2.8. Do we need to warn the
> user with a versionchanged:: 2.8.0 and that it's unavailable prior to
> this? The behavior in that case was round robin without taking
> processing cycles into consideration. There would also be no history
> tracking for the stats and no pmd rebalance command.

Yes, I'll add this.
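
As a first pass, probably something like the following (sketch only, wording
to be double-checked against the actual 2.8 behaviour)::

    .. versionchanged:: 2.8.0

       Prior to OVS 2.8, Rx queues were assigned to PMD threads by simple
       round robin, without taking processing cycles into account. There was
       also no usage history for the stats and no
       ``dpif-netdev/pmd-rxq-rebalance`` command.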

> > +To see the current measured usage history of PMD core cycles for each
> > +Rx
> > +queue::
> > +
> > +    $ ovs-appctl dpif-netdev/pmd-rxq-show
> > +
> > +.. note::
> > +
> > +  A history of one minute is recorded and shown for each Rx queue to
> > + allow for  traffic pattern spikes. Any changes in the Rx queue's PMD
> > + core cycles usage,  due to traffic pattern or reconfig changes, will
> > + take one minute to be fully  reflected in the stats.
> > +
> > +Rx queue to PMD assignment takes place whenever there are configuration
> > +changes or can be triggered by using::
> > +
> > +    $ ovs-appctl dpif-netdev/pmd-rxq-rebalance
> 
> We should probably flag to users considerations for PMD and multi queue specific to phy and vhost ports.
> 
> Perhaps a link to the specific documents below along with the heads up:
> 
> Documentation/topics/dpdk/vhost-user.rst
> Documentation/topics/dpdk/phy.rst

Yup, good call. Done.
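
For the record, a minimal sketch of what such a heads-up could look like,
reusing the ``:doc:`` role already used elsewhere in the series::

    .. note::

       Rx queue and PMD thread configuration interacts with the port type.
       See :doc:`phy` and :doc:`vhost-user` for considerations specific to
       physical and vhost-user ports.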

Stephen

> Ian
> 
> > diff --git a/Documentation/topics/dpdk/vhost-user.rst
> > b/Documentation/topics/dpdk/vhost-user.rst
> > index 95517a676..d84d99246 100644
> > --- a/Documentation/topics/dpdk/vhost-user.rst
> > +++ b/Documentation/topics/dpdk/vhost-user.rst
> > @@ -127,11 +127,10 @@ an additional set of parameters::
> >      -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce
> >      -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2
> > 
> > -In addition,       QEMU must allocate the VM's memory on hugetlbfs.
> > vhost-user
> > -ports access a virtio-net device's virtual rings and packet buffers
> > mapping the -VM's physical memory on hugetlbfs. To enable vhost-user ports
> > to map the VM's -memory into their process address space, pass the
> > following parameters to
> > -QEMU::
> > +In addition, QEMU must allocate the VM's memory on hugetlbfs.
> > +vhost-user ports access a virtio-net device's virtual rings and packet
> > +buffers mapping the VM's physical memory on hugetlbfs. To enable
> > +vhost-user ports to map the VM's memory into their process address space,
> > pass the following parameters to QEMU::
> > 
> >      -object memory-backend-file,id=mem,size=4096M,mem-
> > path=/dev/hugepages,share=on
> >      -numa node,memdev=mem -mem-prealloc @@ -151,18 +150,18 @@ where:
> >    The number of vectors, which is ``$q`` * 2 + 2
> > 
> >  The vhost-user interface will be automatically reconfigured with required
> > -number of rx and tx queues after connection of virtio device.  Manual
> > +number of Rx and Tx queues after connection of virtio device.  Manual
> >  configuration of ``n_rxq`` is not supported because OVS will work
> > properly only  if ``n_rxq`` will match number of queues configured in
> > QEMU.
> > 
> > -A least 2 PMDs should be configured for the vswitch when using
> > multiqueue.
> > +A least two PMDs should be configured for the vswitch when using
> > multiqueue.
> >  Using a single PMD will cause traffic to be enqueued to the same vhost
> > queue  rather than being distributed among different vhost queues for a
> > vhost-user  interface.
> > 
> >  If traffic destined for a VM configured with multiqueue arrives to the
> > vswitch -via a physical DPDK port, then the number of rxqs should also be
> > set to at -least 2 for that physical DPDK port. This is required to
> > increase the
> > +via a physical DPDK port, then the number of Rx queues should also be
> > +set to at least two for that physical DPDK port. This is required to
> > +increase the
> >  probability that a different PMD will handle the multiqueue transmission
> > to the  guest using a different vhost queue.
> > 
> > --
> > 2.14.3
> > 
> > _______________________________________________
> > dev mailing list
> > dev@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Stokes, Ian April 17, 2018, 11:15 a.m. UTC | #3
> On Mon, 2018-04-09 at 15:16 +0000, Stokes, Ian wrote:
> > > This continues the breakup of the huge DPDK "howto" into smaller
> > > components. There are a couple of related changes included, such as
> > > using "Rx queue" instead of "rxq" and noting how Tx queues cannot be
> configured.
> > >
> > > We enable the TODO directive, so we can actually start calling out
> > > some TODOs.
> > >
> > > Signed-off-by: Stephen Finucane <stephen@that.guru>
> > > ---
> > >  Documentation/conf.py                    |   2 +-
> > >  Documentation/howto/dpdk.rst             |  86 -------------------
> > >  Documentation/topics/dpdk/index.rst      |   1 +
> > >  Documentation/topics/dpdk/phy.rst        |  10 +++
> > >  Documentation/topics/dpdk/pmd.rst        | 139
> > > +++++++++++++++++++++++++++++++
> > >  Documentation/topics/dpdk/vhost-user.rst |  17 ++--
> > >  6 files changed, 159 insertions(+), 96 deletions(-)  create mode
> > > 100644 Documentation/topics/dpdk/pmd.rst
> > >
> > > diff --git a/Documentation/conf.py b/Documentation/conf.py index
> > > 6ab144c5d..babda21de 100644
> > > --- a/Documentation/conf.py
> > > +++ b/Documentation/conf.py
> > > @@ -32,7 +32,7 @@ needs_sphinx = '1.1'
> > >  # Add any Sphinx extension module names here, as strings. They can
> > > be  # extensions coming with Sphinx (named 'sphinx.ext.*') or your
> > > custom  # ones.
> > > -extensions = []
> > > +extensions = ['sphinx.ext.todo']
> > >
> > >  # Add any paths that contain templates here, relative to this
> directory.
> > >  templates_path = ['_templates']
> > > diff --git a/Documentation/howto/dpdk.rst
> > > b/Documentation/howto/dpdk.rst index d717d2ebe..c2324118d 100644
> > > --- a/Documentation/howto/dpdk.rst
> > > +++ b/Documentation/howto/dpdk.rst
> > > @@ -81,92 +81,6 @@ To stop ovs-vswitchd & delete bridge, run::
> > >      $ ovs-appctl -t ovsdb-server exit
> > >      $ ovs-vsctl del-br br0
> > >
> > > -PMD Thread Statistics
> > > ----------------------
> > > -
> > > -To show current stats::
> > > -
> > > -    $ ovs-appctl dpif-netdev/pmd-stats-show
> > > -
> > > -To clear previous stats::
> > > -
> > > -    $ ovs-appctl dpif-netdev/pmd-stats-clear
> > > -
> > > -Port/RXQ Assigment to PMD Threads
> > > ----------------------------------
> > > -
> > > -To show port/rxq assignment::
> > > -
> > > -    $ ovs-appctl dpif-netdev/pmd-rxq-show
> > > -
> > > -To change default rxq assignment to pmd threads, rxqs may be
> > > manually pinned to -desired cores using::
> > > -
> > > -    $ ovs-vsctl set Interface <iface> \
> > > -        other_config:pmd-rxq-affinity=<rxq-affinity-list>
> > > -
> > > -where:
> > > -
> > > -- ``<rxq-affinity-list>`` is a CSV list of ``<queue-id>:<core-id>``
> > > values
> > > -
> > > -For example::
> > > -
> > > -    $ ovs-vsctl set interface dpdk-p0 options:n_rxq=4 \
> > > -        other_config:pmd-rxq-affinity="0:3,1:7,3:8"
> > > -
> > > -This will ensure:
> > > -
> > > -- Queue #0 pinned to core 3
> > > -- Queue #1 pinned to core 7
> > > -- Queue #2 not pinned
> > > -- Queue #3 pinned to core 8
> > > -
> > > -After that PMD threads on cores where RX queues was pinned will
> > > become - ``isolated``. This means that this thread will poll only
> pinned RX queues.
> > > -
> > > -.. warning::
> > > -  If there are no ``non-isolated`` PMD threads, ``non-pinned`` RX
> > > queues will
> > > -  not be polled. Also, if provided ``core_id`` is not available
> > > (ex. this
> > > -  ``core_id`` not in ``pmd-cpu-mask``), RX queue will not be polled
> > > by any PMD
> > > -  thread.
> > > -
> > > -If pmd-rxq-affinity is not set for rxqs, they will be assigned to
> > > pmds
> > > (cores) -automatically. The processing cycles that have been stored
> > > for each rxq -will be used where known to assign rxqs to pmd based
> > > on a round robin of the -sorted rxqs.
> > > -
> > > -For example, in the case where here there are 5 rxqs and 3 cores
> (e.g.
> > > 3,7,8) -available, and the measured usage of core cycles per rxq
> > > over the last -interval is seen to be:
> > > -
> > > -- Queue #0: 30%
> > > -- Queue #1: 80%
> > > -- Queue #3: 60%
> > > -- Queue #4: 70%
> > > -- Queue #5: 10%
> > > -
> > > -The rxqs will be assigned to cores 3,7,8 in the following order:
> > > -
> > > -Core 3: Q1 (80%) |
> > > -Core 7: Q4 (70%) | Q5 (10%)
> > > -core 8: Q3 (60%) | Q0 (30%)
> > > -
> > > -To see the current measured usage history of pmd core cycles for
> > > each
> > > rxq::
> > > -
> > > -    $ ovs-appctl dpif-netdev/pmd-rxq-show
> > > -
> > > -.. note::
> > > -
> > > -  A history of one minute is recorded and shown for each rxq to
> > > allow for
> > > -  traffic pattern spikes. An rxq's pmd core cycles usage changes
> > > due to traffic
> > > -  pattern or reconfig changes will take one minute before they are
> > > fully
> > > -  reflected in the stats.
> > > -
> > > -Rxq to pmds assignment takes place whenever there are configuration
> > > changes -or can be triggered by using::
> > > -
> > > -    $ ovs-appctl dpif-netdev/pmd-rxq-rebalance
> > > -
> > >  QoS
> > >  ---
> > >
> > > diff --git a/Documentation/topics/dpdk/index.rst
> > > b/Documentation/topics/dpdk/index.rst
> > > index 5f836a6e9..dfde88377 100644
> > > --- a/Documentation/topics/dpdk/index.rst
> > > +++ b/Documentation/topics/dpdk/index.rst
> > > @@ -31,3 +31,4 @@ The DPDK Datapath
> > >     phy
> > >     vhost-user
> > >     ring
> > > +   pmd
> > > diff --git a/Documentation/topics/dpdk/phy.rst
> > > b/Documentation/topics/dpdk/phy.rst
> > > index 1c18e4e3d..222fa3e9f 100644
> > > --- a/Documentation/topics/dpdk/phy.rst
> > > +++ b/Documentation/topics/dpdk/phy.rst
> > > @@ -109,3 +109,13 @@ tool::
> > >  For more information, refer to the `DPDK documentation <dpdk-
> drivers>`__.
> > >
> > >  .. _dpdk-drivers:
> > > http://dpdk.org/doc/guides/linux_gsg/linux_drivers.html
> > > +
> > > +Multiqueue
> > > +----------
> > > +
> > > +Poll Mode Driver (PMD) threads are the threads that do the heavy
> > > +lifting for the DPDK datapath. Correct configuration of PMD threads
> > > +and the Rx queues they utilize is a requirement in order to deliver
> > > +the high-performance possible with the DPDK datapath. It is
> > > +possible to configure multiple Rx queues for ``dpdk`` ports, thus
> > > +ensuring this is not a bottleneck for performance. For information
> > > +on configuring PMD
> > > threads, refer to :doc:`pmd`.
> > > diff --git a/Documentation/topics/dpdk/pmd.rst
> > > b/Documentation/topics/dpdk/pmd.rst
> > > new file mode 100644
> > > index 000000000..e15e8cc3b
> > > --- /dev/null
> > > +++ b/Documentation/topics/dpdk/pmd.rst
> > > @@ -0,0 +1,139 @@
> > > +..
> > > +      Licensed under the Apache License, Version 2.0 (the
> > > +"License"); you
> > > may
> > > +      not use this file except in compliance with the License. You
> > > + may
> > > obtain
> > > +      a copy of the License at
> > > +
> > > +          http://www.apache.org/licenses/LICENSE-2.0
> > > +
> > > +      Unless required by applicable law or agreed to in writing,
> software
> > > +      distributed under the License is distributed on an "AS IS"
> > > + BASIS,
> > > WITHOUT
> > > +      WARRANTIES OR CONDITIONS OF ANY KIND, either express or
> implied.
> > > See the
> > > +      License for the specific language governing permissions and
> > > limitations
> > > +      under the License.
> > > +
> > > +      Convention for heading levels in Open vSwitch documentation:
> > > +
> > > +      =======  Heading 0 (reserved for the title in a document)
> > > +      -------  Heading 1
> > > +      ~~~~~~~  Heading 2
> > > +      +++++++  Heading 3
> > > +      '''''''  Heading 4
> > > +
> > > +      Avoid deeper levels because they do not render well.
> > > +
> > > +===========
> > > +PMD Threads
> > > +===========
> > > +
> > > +Poll Mode Driver (PMD) threads are the threads that do the heavy
> > > +lifting for the DPDK datapath and perform tasks such as continuous
> > > +polling of input ports for packets, classifying packets once
> > > +received, and executing actions on the packets once they are
> classified.
> > > +
> > > +PMD threads utilize Receive (Rx) and Transmit (Tx) queues, commonly
> > > +known as *rxq*\s and *txq*\s. While Tx queue configuration happens
> > > +automatically, Rx queues can be configured by the user. This can
> > > +happen
> > > in one of two ways:
> >
> > Just on above, could be a to-do but it's a good opportunity to add a
> > note on the "automatic" behavior of tx queues, number created and how
> > it relates to the number of PMDs etc. Could be a separate section in
> > the PMD doc.
> 
> Yeah, if it's OK with you I'll add this as a TODO and then work with you
> to write this additional section.

Sure, just want to keep track of these and can fix them later.

> 
> > > +
> > > +- For physical interfaces, configuration is done using the
> > > +  :program:`ovs-appctl` utility.
> > > +
> > > +- For virtual interfaces, configuration is done using the
> > > +:program:`ovs-appctl`
> > > +  utility, but this configuration must be reflected in the guest
> > > +configuration
> > > +  (e.g. QEMU command line arguments).
> > > +
> > > +The :program:`ovs-appctl` utility also provides a number of
> > > +commands for querying PMD threads and their respective queues.
> > > +This, and all of the above, is discussed here.
> > > +
> > > +PMD Thread Statistics
> > > +---------------------
> > > +
> > > +To show current stats::
> > > +
> > > +    $ ovs-appctl dpif-netdev/pmd-stats-show
> > > +
> > > +To clear previous stats::
> > > +
> > > +    $ ovs-appctl dpif-netdev/pmd-stats-clear
> > > +
> > > +Port/Rx Queue Assigment to PMD Threads
> > > +--------------------------------------
> > > +
> > > +.. todo::
> > > +
> > > +   This needs a more detailed overview of *why* this should be
> > > + done,
> > > along with
> > > +   the impact on things like NUMA affinity.
> > > +
> > > +To show port/RX queue assignment::
> > > +
> > > +    $ ovs-appctl dpif-netdev/pmd-rxq-show
> > > +
> > > +Rx queues may be manually pinned to cores. This will change the
> > > +default Rx queue assignment to PMD threads::
> > > +
> > > +    $ ovs-vsctl set Interface <iface> \
> > > +        other_config:pmd-rxq-affinity=<rxq-affinity-list>
> > > +
> > > +where:
> > > +
> > > +- ``<rxq-affinity-list>`` is a CSV list of ``<queue-id>:<core-id>``
> > > +values
> > > +
> > > +For example::
> > > +
> > > +    $ ovs-vsctl set interface dpdk-p0 options:n_rxq=4 \
> > > +        other_config:pmd-rxq-affinity="0:3,1:7,3:8"
> > > +
> > > +This will ensure there are *4* Rx queues and that these queues are
> > > +configured like so:
> > > +
> > > +- Queue #0 pinned to core 3
> > > +- Queue #1 pinned to core 7
> > > +- Queue #2 not pinned
> > > +- Queue #3 pinned to core 8
> > > +
> > > +PMD threads on cores where Rx queues are *pinned* will become
> > > +*isolated*. This means that this thread will only poll the *pinned*
> > > +Rx
> > > queues.
> > > +
> > > +.. warning::
> > > +
> > > +  If there are no *non-isolated* PMD threads, *non-pinned* RX
> > > + queues will not  be polled. Also, if the provided ``<core-id>`` is
> > > + not available (e.g. the  ``<core-id>`` is not in
> > > + ``pmd-cpu-mask``), the RX queue will not be polled by  any PMD
> thread.
> > > +
> > > +If ``pmd-rxq-affinity`` is not set for Rx queues, they will be
> > > +assigned to PMDs
> > > +(cores) automatically. Where known, the processing cycles that have
> > > +been stored for each Rx queue will be used to assign Rx queue to
> > > +PMDs based on a round robin of the sorted Rx queues. For example,
> > > +take the following example, where there are five Rx queues and
> > > +three cores - 3, 7, and 8 - available and the measured usage of
> > > +core cycles per Rx queue over the last interval is seen to
> > > +be:
> > > +
> > > +- Queue #0: 30%
> > > +- Queue #1: 80%
> > > +- Queue #3: 60%
> > > +- Queue #4: 70%
> > > +- Queue #5: 10%
> > > +
> > > +The Rx queues will be assigned to the cores in the following order:
> > > +
> > > +Core 3: Q1 (80%) |
> > > +Core 7: Q4 (70%) | Q5 (10%)
> > > +core 8: Q3 (60%) | Q0 (30%)
> > > +
> >
> > This functionality was introduced in OVS 2.8. Do we need to warn the
> > user with a versionchanged:: 2.8.0 and that it's unavailable prior to
> > this? The behavior in that case was round robin without taking
> > processing cycles into consideration. There would also be no history
> > tracking for the stats and no pmd rebalance command.
> 
> Yes, I'll add this.

Just a follow up here, the pinning behavior changed in OVS 2.8 and usage was reported in processing cycles. The percentage usage output was introduced in OVS 2.9, not sure if that’s big enough to warrant a specific version change warning on 2.9.
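
If we do end up calling out 2.9 as well, it could be as small as a second note
alongside the 2.8 one, something like (sketch only)::

    .. versionchanged:: 2.9.0

       Usage is now reported as a percentage of core cycles rather than as raw
       processing cycles.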

> 
> > > +To see the current measured usage history of PMD core cycles for
> > > +each Rx
> > > +queue::
> > > +
> > > +    $ ovs-appctl dpif-netdev/pmd-rxq-show
> > > +
> > > +.. note::
> > > +
> > > +  A history of one minute is recorded and shown for each Rx queue
> > > + to allow for  traffic pattern spikes. Any changes in the Rx
> > > + queue's PMD core cycles usage,  due to traffic pattern or reconfig
> > > + changes, will take one minute to be fully  reflected in the stats.
> > > +
> > > +Rx queue to PMD assignment takes place whenever there are
> > > +configuration changes or can be triggered by using::
> > > +
> > > +    $ ovs-appctl dpif-netdev/pmd-rxq-rebalance
> >
> > We should probably flag to users considerations for PMD and multi queue
> specific to phy and vhost ports.
> >
> > Perhaps a link to the specific documents below along with the heads up:
> >
> > Documentation/topics/dpdk/vhost-user.rst
> > Documentation/topics/dpdk/phy.rst
> 
> Yup, good call. Done.
> 
> Stephen
> 
> > Ian
> >
> > > diff --git a/Documentation/topics/dpdk/vhost-user.rst
> > > b/Documentation/topics/dpdk/vhost-user.rst
> > > index 95517a676..d84d99246 100644
> > > --- a/Documentation/topics/dpdk/vhost-user.rst
> > > +++ b/Documentation/topics/dpdk/vhost-user.rst
> > > @@ -127,11 +127,10 @@ an additional set of parameters::
> > >      -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce
> > >      -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2
> > >
> > > -In addition,       QEMU must allocate the VM's memory on hugetlbfs.
> > > vhost-user
> > > -ports access a virtio-net device's virtual rings and packet buffers
> > > mapping the -VM's physical memory on hugetlbfs. To enable vhost-user
> > > ports to map the VM's -memory into their process address space, pass
> > > the following parameters to
> > > -QEMU::
> > > +In addition, QEMU must allocate the VM's memory on hugetlbfs.
> > > +vhost-user ports access a virtio-net device's virtual rings and
> > > +packet buffers mapping the VM's physical memory on hugetlbfs. To
> > > +enable vhost-user ports to map the VM's memory into their process
> > > +address space,
> > > pass the following parameters to QEMU::
> > >
> > >      -object memory-backend-file,id=mem,size=4096M,mem-
> > > path=/dev/hugepages,share=on
> > >      -numa node,memdev=mem -mem-prealloc @@ -151,18 +150,18 @@ where:
> > >    The number of vectors, which is ``$q`` * 2 + 2
> > >
> > >  The vhost-user interface will be automatically reconfigured with
> > > required -number of rx and tx queues after connection of virtio
> > > device.  Manual
> > > +number of Rx and Tx queues after connection of virtio device.
> > > +Manual
> > >  configuration of ``n_rxq`` is not supported because OVS will work
> > > properly only  if ``n_rxq`` will match number of queues configured
> > > in QEMU.
> > >
> > > -A least 2 PMDs should be configured for the vswitch when using
> > > multiqueue.
> > > +A least two PMDs should be configured for the vswitch when using
> > > multiqueue.
> > >  Using a single PMD will cause traffic to be enqueued to the same
> > > vhost queue  rather than being distributed among different vhost
> > > queues for a vhost-user  interface.
> > >
> > >  If traffic destined for a VM configured with multiqueue arrives to
> > > the vswitch -via a physical DPDK port, then the number of rxqs
> > > should also be set to at -least 2 for that physical DPDK port. This
> > > is required to increase the
> > > +via a physical DPDK port, then the number of Rx queues should also
> > > +be set to at least two for that physical DPDK port. This is
> > > +required to increase the
> > >  probability that a different PMD will handle the multiqueue
> > > transmission to the  guest using a different vhost queue.
> > >
> > > --
> > > 2.14.3
> > >
> > > _______________________________________________
> > > dev mailing list
> > > dev@openvswitch.org
> > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Patch

diff --git a/Documentation/conf.py b/Documentation/conf.py
index 6ab144c5d..babda21de 100644
--- a/Documentation/conf.py
+++ b/Documentation/conf.py
@@ -32,7 +32,7 @@  needs_sphinx = '1.1'
 # Add any Sphinx extension module names here, as strings. They can be
 # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
 # ones.
-extensions = []
+extensions = ['sphinx.ext.todo']
 
 # Add any paths that contain templates here, relative to this directory.
 templates_path = ['_templates']
diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
index d717d2ebe..c2324118d 100644
--- a/Documentation/howto/dpdk.rst
+++ b/Documentation/howto/dpdk.rst
@@ -81,92 +81,6 @@  To stop ovs-vswitchd & delete bridge, run::
     $ ovs-appctl -t ovsdb-server exit
     $ ovs-vsctl del-br br0
 
-PMD Thread Statistics
----------------------
-
-To show current stats::
-
-    $ ovs-appctl dpif-netdev/pmd-stats-show
-
-To clear previous stats::
-
-    $ ovs-appctl dpif-netdev/pmd-stats-clear
-
-Port/RXQ Assigment to PMD Threads
----------------------------------
-
-To show port/rxq assignment::
-
-    $ ovs-appctl dpif-netdev/pmd-rxq-show
-
-To change default rxq assignment to pmd threads, rxqs may be manually pinned to
-desired cores using::
-
-    $ ovs-vsctl set Interface <iface> \
-        other_config:pmd-rxq-affinity=<rxq-affinity-list>
-
-where:
-
-- ``<rxq-affinity-list>`` is a CSV list of ``<queue-id>:<core-id>`` values
-
-For example::
-
-    $ ovs-vsctl set interface dpdk-p0 options:n_rxq=4 \
-        other_config:pmd-rxq-affinity="0:3,1:7,3:8"
-
-This will ensure:
-
-- Queue #0 pinned to core 3
-- Queue #1 pinned to core 7
-- Queue #2 not pinned
-- Queue #3 pinned to core 8
-
-After that PMD threads on cores where RX queues was pinned will become
-``isolated``. This means that this thread will poll only pinned RX queues.
-
-.. warning::
-  If there are no ``non-isolated`` PMD threads, ``non-pinned`` RX queues will
-  not be polled. Also, if provided ``core_id`` is not available (ex. this
-  ``core_id`` not in ``pmd-cpu-mask``), RX queue will not be polled by any PMD
-  thread.
-
-If pmd-rxq-affinity is not set for rxqs, they will be assigned to pmds (cores)
-automatically. The processing cycles that have been stored for each rxq
-will be used where known to assign rxqs to pmd based on a round robin of the
-sorted rxqs.
-
-For example, in the case where here there are 5 rxqs and 3 cores (e.g. 3,7,8)
-available, and the measured usage of core cycles per rxq over the last
-interval is seen to be:
-
-- Queue #0: 30%
-- Queue #1: 80%
-- Queue #3: 60%
-- Queue #4: 70%
-- Queue #5: 10%
-
-The rxqs will be assigned to cores 3,7,8 in the following order:
-
-Core 3: Q1 (80%) |
-Core 7: Q4 (70%) | Q5 (10%)
-core 8: Q3 (60%) | Q0 (30%)
-
-To see the current measured usage history of pmd core cycles for each rxq::
-
-    $ ovs-appctl dpif-netdev/pmd-rxq-show
-
-.. note::
-
-  A history of one minute is recorded and shown for each rxq to allow for
-  traffic pattern spikes. An rxq's pmd core cycles usage changes due to traffic
-  pattern or reconfig changes will take one minute before they are fully
-  reflected in the stats.
-
-Rxq to pmds assignment takes place whenever there are configuration changes
-or can be triggered by using::
-
-    $ ovs-appctl dpif-netdev/pmd-rxq-rebalance
-
 QoS
 ---
 
diff --git a/Documentation/topics/dpdk/index.rst b/Documentation/topics/dpdk/index.rst
index 5f836a6e9..dfde88377 100644
--- a/Documentation/topics/dpdk/index.rst
+++ b/Documentation/topics/dpdk/index.rst
@@ -31,3 +31,4 @@  The DPDK Datapath
    phy
    vhost-user
    ring
+   pmd
diff --git a/Documentation/topics/dpdk/phy.rst b/Documentation/topics/dpdk/phy.rst
index 1c18e4e3d..222fa3e9f 100644
--- a/Documentation/topics/dpdk/phy.rst
+++ b/Documentation/topics/dpdk/phy.rst
@@ -109,3 +109,13 @@  tool::
 For more information, refer to the `DPDK documentation <dpdk-drivers>`__.
 
 .. _dpdk-drivers: http://dpdk.org/doc/guides/linux_gsg/linux_drivers.html
+
+Multiqueue
+----------
+
+Poll Mode Driver (PMD) threads are the threads that do the heavy lifting for
+the DPDK datapath. Correct configuration of PMD threads and the Rx queues they
+utilize is a requirement in order to deliver the high performance possible with
+the DPDK datapath. It is possible to configure multiple Rx queues for ``dpdk``
+ports, thus ensuring this is not a bottleneck for performance. For information
+on configuring PMD threads, refer to :doc:`pmd`.
diff --git a/Documentation/topics/dpdk/pmd.rst b/Documentation/topics/dpdk/pmd.rst
new file mode 100644
index 000000000..e15e8cc3b
--- /dev/null
+++ b/Documentation/topics/dpdk/pmd.rst
@@ -0,0 +1,139 @@ 
+..
+      Licensed under the Apache License, Version 2.0 (the "License"); you may
+      not use this file except in compliance with the License. You may obtain
+      a copy of the License at
+
+          http://www.apache.org/licenses/LICENSE-2.0
+
+      Unless required by applicable law or agreed to in writing, software
+      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+      License for the specific language governing permissions and limitations
+      under the License.
+
+      Convention for heading levels in Open vSwitch documentation:
+
+      =======  Heading 0 (reserved for the title in a document)
+      -------  Heading 1
+      ~~~~~~~  Heading 2
+      +++++++  Heading 3
+      '''''''  Heading 4
+
+      Avoid deeper levels because they do not render well.
+
+===========
+PMD Threads
+===========
+
+Poll Mode Driver (PMD) threads are the threads that do the heavy lifting for
+the DPDK datapath and perform tasks such as continuous polling of input ports
+for packets, classifying packets once received, and executing actions on the
+packets once they are classified.
+
+PMD threads utilize Receive (Rx) and Transmit (Tx) queues, commonly known as
+*rxq*\s and *txq*\s. While Tx queue configuration happens automatically, Rx
+queues can be configured by the user. This can happen in one of two ways:
+
+- For physical interfaces, configuration is done using the
+  :program:`ovs-vsctl` utility.
+
+- For virtual interfaces, configuration is done using the :program:`ovs-vsctl`
+  utility, but this configuration must be reflected in the guest configuration
+  (e.g. QEMU command line arguments).
+
+The :program:`ovs-appctl` utility also provides a number of commands for
+querying PMD threads and their respective queues. This, and all of the above,
+is discussed here.
+
+PMD Thread Statistics
+---------------------
+
+To show current stats::
+
+    $ ovs-appctl dpif-netdev/pmd-stats-show
+
+To clear previous stats::
+
+    $ ovs-appctl dpif-netdev/pmd-stats-clear
+
+Port/Rx Queue Assignment to PMD Threads
+---------------------------------------
+
+.. todo::
+
+   This needs a more detailed overview of *why* this should be done, along with
+   the impact on things like NUMA affinity.
+
+To show port/Rx queue assignment::
+
+    $ ovs-appctl dpif-netdev/pmd-rxq-show
+
+Rx queues may be manually pinned to cores. This will change the default Rx
+queue assignment to PMD threads::
+
+    $ ovs-vsctl set Interface <iface> \
+        other_config:pmd-rxq-affinity=<rxq-affinity-list>
+
+where:
+
+- ``<rxq-affinity-list>`` is a CSV list of ``<queue-id>:<core-id>`` values
+
+For example::
+
+    $ ovs-vsctl set interface dpdk-p0 options:n_rxq=4 \
+        other_config:pmd-rxq-affinity="0:3,1:7,3:8"
+
+This will ensure there are *4* Rx queues and that these queues are configured
+like so:
+
+- Queue #0 pinned to core 3
+- Queue #1 pinned to core 7
+- Queue #2 not pinned
+- Queue #3 pinned to core 8
+
+PMD threads on cores where Rx queues are *pinned* will become *isolated*. This
+means that these threads will only poll the *pinned* Rx queues.
+
+.. warning::
+
+  If there are no *non-isolated* PMD threads, *non-pinned* Rx queues will not
+  be polled. Also, if the provided ``<core-id>`` is not available (e.g. the
+  ``<core-id>`` is not in ``pmd-cpu-mask``), the Rx queue will not be polled by
+  any PMD thread.
+
+If ``pmd-rxq-affinity`` is not set for Rx queues, they will be assigned to PMDs
+(cores) automatically. Where known, the processing cycles that have been stored
+for each Rx queue will be used to assign Rx queues to PMDs based on a round
+robin of the sorted Rx queues. For example, take a case where there are five
+Rx queues and three cores - 3, 7, and 8 - available, and the measured usage of
+core cycles per Rx queue over the last interval is seen to
+be:
+
+- Queue #0: 30%
+- Queue #1: 80%
+- Queue #3: 60%
+- Queue #4: 70%
+- Queue #5: 10%
+
+The Rx queues will be assigned to the cores in the following order:
+
+Core 3: Q1 (80%) |
+Core 7: Q4 (70%) | Q5 (10%)
+Core 8: Q3 (60%) | Q0 (30%)
+
+To see the current measured usage history of PMD core cycles for each Rx
+queue::
+
+    $ ovs-appctl dpif-netdev/pmd-rxq-show
+
+.. note::
+
+  A history of one minute is recorded and shown for each Rx queue to allow for
+  traffic pattern spikes. Any changes in the Rx queue's PMD core cycles usage,
+  due to traffic pattern or reconfig changes, will take one minute to be fully
+  reflected in the stats.
+
+Rx queue to PMD assignment takes place whenever there are configuration changes
+or can be triggered by using::
+
+    $ ovs-appctl dpif-netdev/pmd-rxq-rebalance
diff --git a/Documentation/topics/dpdk/vhost-user.rst b/Documentation/topics/dpdk/vhost-user.rst
index 95517a676..d84d99246 100644
--- a/Documentation/topics/dpdk/vhost-user.rst
+++ b/Documentation/topics/dpdk/vhost-user.rst
@@ -127,11 +127,10 @@  an additional set of parameters::
     -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce
     -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2
 
-In addition,       QEMU must allocate the VM's memory on hugetlbfs. vhost-user
-ports access a virtio-net device's virtual rings and packet buffers mapping the
-VM's physical memory on hugetlbfs. To enable vhost-user ports to map the VM's
-memory into their process address space, pass the following parameters to
-QEMU::
+In addition, QEMU must allocate the VM's memory on hugetlbfs. vhost-user ports
+access a virtio-net device's virtual rings and packet buffers mapping the VM's
+physical memory on hugetlbfs. To enable vhost-user ports to map the VM's memory
+into their process address space, pass the following parameters to QEMU::
 
     -object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,share=on
     -numa node,memdev=mem -mem-prealloc
@@ -151,18 +150,18 @@  where:
   The number of vectors, which is ``$q`` * 2 + 2
 
 The vhost-user interface will be automatically reconfigured with required
-number of rx and tx queues after connection of virtio device.  Manual
+number of Rx and Tx queues after connection of virtio device.  Manual
 configuration of ``n_rxq`` is not supported because OVS will work properly only
 if ``n_rxq`` will match number of queues configured in QEMU.
 
-A least 2 PMDs should be configured for the vswitch when using multiqueue.
+At least two PMDs should be configured for the vswitch when using multiqueue.
 Using a single PMD will cause traffic to be enqueued to the same vhost queue
 rather than being distributed among different vhost queues for a vhost-user
 interface.
 
 If traffic destined for a VM configured with multiqueue arrives to the vswitch
-via a physical DPDK port, then the number of rxqs should also be set to at
-least 2 for that physical DPDK port. This is required to increase the
+via a physical DPDK port, then the number of Rx queues should also be set to at
+least two for that physical DPDK port. This is required to increase the
 probability that a different PMD will handle the multiqueue transmission to the
 guest using a different vhost queue.