
[v2,net-next] documentation: Document issues with VMs and XPS and drivers enabling it on their own

Message ID 20160825225543.BFA562900836@tardy
State Changes Requested, archived
Delegated to: David Miller

Commit Message

Rick Jones Aug. 25, 2016, 10:55 p.m. UTC
From: Rick Jones <rick.jones2@hpe.com>

Since XPS was first introduced two things have happened.  Some drivers
have started enabling XPS on their own initiative, and it has been
found that when a VM is sending data through a host interface with XPS
enabled, that traffic can end-up seriously out of order.

Signed-off-by: Rick Jones <rick.jones2@hpe.com>
Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com>
---

Comments

Tom Herbert Aug. 25, 2016, 11:43 p.m. UTC | #1
On Thu, Aug 25, 2016 at 3:55 PM, Rick Jones <rick.jones2@hpe.com> wrote:
> From: Rick Jones <rick.jones2@hpe.com>
>
> Since XPS was first introduced two things have happened.  Some drivers
> have started enabling XPS on their own initiative, and it has been
> found that when a VM is sending data through a host interface with XPS
> enabled, that traffic can end-up seriously out of order.
>
> Signed-off-by: Rick Jones <rick.jones2@hpe.com>
> Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com>
> ---
>
> diff --git a/Documentation/networking/scaling.txt b/Documentation/networking/scaling.txt
> index 59f4db2..50cc888 100644
> --- a/Documentation/networking/scaling.txt
> +++ b/Documentation/networking/scaling.txt
> @@ -400,15 +400,31 @@ transport layer is responsible for setting ooo_okay appropriately. TCP,
>  for instance, sets the flag when all data for a connection has been
>  acknowledged.
>
> +When the traffic source is a VM running on the host, there is no
> +socket structure known to the host.  In this case, unless the VM is
> +itself CPU-pinned, the traffic being sent from it can end-up queued to
> +multiple transmit queues and end-up being transmitted out of order.
> +
> +In some cases this can result in a considerable loss of performance.
> +
> +In such situations, XPS should not be enabled at runtime, or
> +explicitly disabled if the NIC driver(s) in question enable it on
> +their own.  Otherwise, if possible, the VMs should be CPU pinned.
> +

This seems like it will only confuse users even more. You've clearly
identified an issue, let's figure out how to fix it.

Tom

>  ==== XPS Configuration
>
> -XPS is only available if the kconfig symbol CONFIG_XPS is enabled (on by
> -default for SMP). The functionality remains disabled until explicitly
> -configured. To enable XPS, the bitmap of CPUs that may use a transmit
> -queue is configured using the sysfs file entry:
> +XPS is available only if the kconfig symbol CONFIG_XPS is enabled
> +prior to building the kernel.  It is enabled by default for SMP kernel
> +configurations.  In many cases the functionality remains disabled at
> +runtime until explicitly configured by the system administrator. To
> +enable XPS, the bitmap of CPUs that may use a transmit queue is
> +configured using the sysfs file entry:
>
>  /sys/class/net/<dev>/queues/tx-<n>/xps_cpus
>
> +However, some NIC drivers will configure XPS at runtime for the
> +interfaces they drive, via a call to netif_set_xps_queue.
> +
>  == Suggested Configuration
>
>  For a network device with a single transmission queue, XPS configuration

David Miller Aug. 27, 2016, 4:35 a.m. UTC | #2
From: Tom Herbert <tom@herbertland.com>
Date: Thu, 25 Aug 2016 16:43:35 -0700

> This seems like it will only confuse users even more. You've clearly
> identified an issue, let's figure out how to fix it.

I kinda feel the same way about this situation.

Tom Herbert Aug. 27, 2016, 7:41 p.m. UTC | #3
On Fri, Aug 26, 2016 at 9:35 PM, David Miller <davem@davemloft.net> wrote:
> From: Tom Herbert <tom@herbertland.com>
> Date: Thu, 25 Aug 2016 16:43:35 -0700
>
>> This seems like it will only confuse users even more. You've clearly
>> identified an issue, let's figure out how to fix it.
>
> I kinda feel the same way about this situation.

I'm working on XFS (as the transmit analogue to RFS). We'll track
flows enough so that we should know when it's safe to move them.

Tom

Rick Jones Aug. 29, 2016, 4:26 p.m. UTC | #4
On 08/27/2016 12:41 PM, Tom Herbert wrote:
> On Fri, Aug 26, 2016 at 9:35 PM, David Miller <davem@davemloft.net> wrote:
>> From: Tom Herbert <tom@herbertland.com>
>> Date: Thu, 25 Aug 2016 16:43:35 -0700
>>
>>> This seems like it will only confuse users even more. You've clearly
>>> identified an issue, let's figure out how to fix it.
>>
>> I kinda feel the same way about this situation.
>
> I'm working on XFS (as the transmit analogue to RFS). We'll track
> flows enough so that we should know when it's safe to move them.

Is the XFS you are working on going to subsume XPS or will the two 
continue to exist in parallel a la RPS and RFS?

rick jones

Tom Herbert Aug. 29, 2016, 5:35 p.m. UTC | #5
On Mon, Aug 29, 2016 at 9:26 AM, Rick Jones <rick.jones2@hpe.com> wrote:
> On 08/27/2016 12:41 PM, Tom Herbert wrote:
>>
>> On Fri, Aug 26, 2016 at 9:35 PM, David Miller <davem@davemloft.net> wrote:
>>>
>>> From: Tom Herbert <tom@herbertland.com>
>>> Date: Thu, 25 Aug 2016 16:43:35 -0700
>>>
>>>> This seems like it will only confuse users even more. You've clearly
>>>> identified an issue, let's figure out how to fix it.
>>>
>>>
>>> I kinda feel the same way about this situation.
>>
>>
>> I'm working on XFS (as the transmit analogue to RFS). We'll track
>> flows enough so that we should know when it's safe to move them.
>
>
> Is the XFS you are working on going to subsume XPS or will the two continue
> to exist in parallel a la RPS and RFS?
>
XPS selects the queue, XFS prevents changing the queues when OOO could
occur. I am thinking that XFS is only applicable when we don't have a
socket tracking OOO. Idea is to have a flow table indexed by packet
hash that gives the current TX queue for match flows. The TX queue
comes from doing XPS. We only change the queue used by flows if that
won't result in OOO as determined by tracking the queue similar to how
we do RFS.

Tom

> rick jones
>
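
The flow-table idea described in comment #5 can be sketched roughly as
follows. This is only an illustration of the description in the thread,
not code from any posted patch: the class names, the table size and the
per-queue "enqueued"/"completed" counters are all assumptions made for
the example. The queue suggested by XPS is passed in as a plain integer.

```python
from dataclasses import dataclass


@dataclass
class TxQueue:
    enqueued: int = 0   # packets ever placed on this queue (hypothetical tail counter)
    completed: int = 0  # packets the device has finished sending (hypothetical head counter)


@dataclass
class FlowEntry:
    queue: int     # TX queue the flow is currently using
    last_seq: int  # that queue's enqueued counter right after the flow's last packet


class FlowTable:
    """Hypothetical flow table indexed by packet hash, as sketched in the thread."""

    def __init__(self, num_queues, table_size=256):
        self.queues = [TxQueue() for _ in range(num_queues)]
        self.table = [None] * table_size

    def select_queue(self, pkt_hash, xps_queue):
        """Return the TX queue for a packet, given the queue XPS would pick."""
        slot = pkt_hash % len(self.table)
        entry = self.table[slot]
        if entry is None:
            queue = xps_queue
        else:
            old = self.queues[entry.queue]
            # Moving is safe only once everything the flow queued earlier
            # has left the old queue; otherwise packets could be reordered.
            drained = old.completed >= entry.last_seq
            queue = xps_queue if drained else entry.queue
        q = self.queues[queue]
        q.enqueued += 1
        self.table[slot] = FlowEntry(queue, q.enqueued)
        return queue

    def complete(self, queue, count=1):
        """Record that the device finished sending `count` packets from a queue."""
        q = self.queues[queue]
        q.completed = min(q.enqueued, q.completed + count)
```

In this sketch a flow whose sender migrates to another CPU keeps using
its old queue while packets are still in flight there, and follows the
XPS choice only after the old queue has drained past the flow's last
packet.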

Patch

diff --git a/Documentation/networking/scaling.txt b/Documentation/networking/scaling.txt
index 59f4db2..50cc888 100644
--- a/Documentation/networking/scaling.txt
+++ b/Documentation/networking/scaling.txt
@@ -400,15 +400,31 @@  transport layer is responsible for setting ooo_okay appropriately. TCP,
 for instance, sets the flag when all data for a connection has been
 acknowledged.
 
+When the traffic source is a VM running on the host, there is no
+socket structure known to the host.  In this case, unless the VM is
+itself CPU-pinned, the traffic being sent from it can end-up queued to
+multiple transmit queues and end-up being transmitted out of order.
+
+In some cases this can result in a considerable loss of performance.
+
+In such situations, XPS should not be enabled at runtime, or
+explicitly disabled if the NIC driver(s) in question enable it on
+their own.  Otherwise, if possible, the VMs should be CPU pinned.
+
 ==== XPS Configuration
 
-XPS is only available if the kconfig symbol CONFIG_XPS is enabled (on by
-default for SMP). The functionality remains disabled until explicitly
-configured. To enable XPS, the bitmap of CPUs that may use a transmit
-queue is configured using the sysfs file entry:
+XPS is available only if the kconfig symbol CONFIG_XPS is enabled
+prior to building the kernel.  It is enabled by default for SMP kernel
+configurations.  In many cases the functionality remains disabled at
+runtime until explicitly configured by the system administrator. To
+enable XPS, the bitmap of CPUs that may use a transmit queue is
+configured using the sysfs file entry:
 
 /sys/class/net/<dev>/queues/tx-<n>/xps_cpus
 
+However, some NIC drivers will configure XPS at runtime for the
+interfaces they drive, via a call to netif_set_xps_queue.
+
 == Suggested Configuration
 
 For a network device with a single transmission queue, XPS configuration
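
For readers who want to inspect or change the xps_cpus entry named in
the patch, a minimal helper might look like the sketch below. It assumes
the entry accepts a hexadecimal CPU bitmap written as comma-separated
32-bit groups, and that writing a mask of 0 clears the mapping for that
queue; check both against the running kernel before relying on them.
The function names and the "eth0" device are hypothetical.

```python
def xps_cpus_path(dev, txq):
    """Build the sysfs path for one transmit queue's XPS CPU bitmap."""
    return f"/sys/class/net/{dev}/queues/tx-{txq}/xps_cpus"


def cpu_mask(cpus):
    """Format a collection of CPU numbers as a hex bitmap in 32-bit groups."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    chunks = []
    while mask:
        chunks.append(mask & 0xFFFFFFFF)
        mask >>= 32
    if not chunks:
        return "0"  # empty mask: no CPU may use this queue via XPS
    head, *rest = reversed(chunks)  # most significant group first
    return ",".join([format(head, "x")] + [format(c, "08x") for c in rest])


def set_xps_cpus(dev, txq, cpus):
    """Write the bitmap to sysfs; needs root and an existing device/queue."""
    with open(xps_cpus_path(dev, txq), "w") as f:
        f.write(cpu_mask(cpus))


if __name__ == "__main__":
    # Show what would be written for CPUs 0-3 on queue 0 of a hypothetical eth0.
    print(xps_cpus_path("eth0", 0), cpu_mask(range(4)))
```

Calling set_xps_cpus("eth0", 0, []) would write 0 to the entry, which is
one way to attempt the explicit disable the patch text recommends when a
driver has configured XPS on its own.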