
hv: fix msi affinity when device requests all possible CPU's

Message ID: 20170628232204.15227-1-sthemmin@microsoft.com
State: Not Applicable

Commit Message

Stephen Hemminger June 28, 2017, 11:22 p.m. UTC
When an Intel 10G device (ixgbevf) is passed through to a Hyper-V guest
with SR-IOV, the driver requests affinity with all possible CPUs (0-239),
even though those CPUs are not online (and never will be). Because of this
the device is unable to correctly get its MSI interrupts set up.

This was caused by the change in 4.12 that converted this affinity
into all possible CPUs (0-31), but the host then reports
an error since this is larger than the number of online CPUs.

Previously (up to 4.12-rc1) this worked because only online CPUs
would be put in the mask passed to the host.

This patch applies only to 4.12.
The driver in linux-next needs a different fix because of the changes
to the PCI host protocol version.

Fixes: 433fcf6b7b31 ("PCI: hv: Specify CPU_AFFINITY_ALL for MSI affinity when >= 32 CPUs")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
---
 drivers/pci/host/pci-hyperv.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
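
For readers skimming the archive, the shape of the fix is simple: restrict the requested affinity to online CPUs before deciding between the CPU_AFFINITY_ALL shortcut and an explicit VP bitmap. Below is a simplified, annotated sketch of the post-patch logic, using the same helpers and fields that appear in the hunk further down; it is not a standalone, buildable excerpt, and the comments are editorial rather than taken from the driver.

/* Affinity handling in hv_compose_msi_msg() after this patch
 * (4.12 driver); surrounding packet setup omitted.
 */
affinity = irq_data_get_affinity_mask(data);

/* Drop CPUs that are merely possible (e.g. 0-239 in the guest) and
 * keep only those that are actually online before looking at the
 * mask's weight.
 */
cpumask_and(affinity, affinity, cpu_online_mask);

if (cpumask_weight(affinity) >= 32) {
	/* 32 or more online CPUs requested: fall back to
	 * CPU_AFFINITY_ALL instead of enumerating them.
	 */
	int_pkt->int_desc.cpu_mask = CPU_AFFINITY_ALL;
} else {
	/* Small online set: translate each Linux CPU number to its
	 * Hyper-V virtual-processor number and set that bit.
	 */
	for_each_cpu(cpu, affinity) {
		int_pkt->int_desc.cpu_mask |=
			(1ULL << vmbus_cpu_number_to_vp_number(cpu));
	}
}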

Comments

Jork Loeser June 29, 2017, 10:08 p.m. UTC | #1
> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Wednesday, June 28, 2017 4:22 PM
> To: KY Srinivasan <kys@microsoft.com>; bhelgaas@google.com
> Cc: linux-pci@vger.kernel.org; devel@linuxdriverproject.org; Stephen
> Hemminger <sthemmin@microsoft.com>
> Subject: [PATCH] hv: fix msi affinity when device requests all possible CPU's
> 
> When an Intel 10G device (ixgbevf) is passed through to a Hyper-V guest with
> SR-IOV, the driver requests affinity with all possible CPUs (0-239), even though
> those CPUs are not online (and never will be). Because of this the device is
> unable to correctly get its MSI interrupts set up.
> 
> This was caused by the change in 4.12 that converted this affinity into all
> possible CPUs (0-31), but the host then reports an error since this is larger
> than the number of online CPUs.
> 
> Previously (up to 4.12-rc1) this worked because only online CPUs would be put
> in the mask passed to the host.
> 
> This patch applies only to 4.12.
> The driver in linux-next needs a different fix because of the changes to the
> PCI host protocol version.

The vPCI patch in linux-next has the issue fixed already.

Regards,
Jork
Stephen Hemminger June 29, 2017, 11:57 p.m. UTC | #2
Patch still needed for 4.12

-----Original Message-----
From: Jork Loeser 
Sent: Thursday, June 29, 2017 3:08 PM
To: stephen@networkplumber.org; KY Srinivasan <kys@microsoft.com>; bhelgaas@google.com
Cc: linux-pci@vger.kernel.org; devel@linuxdriverproject.org; Stephen Hemminger <sthemmin@microsoft.com>
Subject: RE: [PATCH] hv: fix msi affinity when device requests all possible CPU's

> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Wednesday, June 28, 2017 4:22 PM
> To: KY Srinivasan <kys@microsoft.com>; bhelgaas@google.com
> Cc: linux-pci@vger.kernel.org; devel@linuxdriverproject.org; Stephen
> Hemminger <sthemmin@microsoft.com>
> Subject: [PATCH] hv: fix msi affinity when device requests all possible CPU's
> 
> When an Intel 10G device (ixgbevf) is passed through to a Hyper-V guest with
> SR-IOV, the driver requests affinity with all possible CPUs (0-239), even though
> those CPUs are not online (and never will be). Because of this the device is
> unable to correctly get its MSI interrupts set up.
> 
> This was caused by the change in 4.12 that converted this affinity into all
> possible CPUs (0-31), but the host then reports an error since this is larger
> than the number of online CPUs.
> 
> Previously (up to 4.12-rc1) this worked because only online CPUs would be put
> in the mask passed to the host.
> 
> This patch applies only to 4.12.
> The driver in linux-next needs a different fix because of the changes to the
> PCI host protocol version.

The vPCI patch in linux-next has the issue fixed already.

Regards,
Jork
Bjorn Helgaas July 2, 2017, 9:38 p.m. UTC | #3
On Wed, Jun 28, 2017 at 04:22:04PM -0700, Stephen Hemminger wrote:
> When an Intel 10G device (ixgbevf) is passed through to a Hyper-V guest
> with SR-IOV, the driver requests affinity with all possible CPUs (0-239),
> even though those CPUs are not online (and never will be). Because of this
> the device is unable to correctly get its MSI interrupts set up.
> 
> This was caused by the change in 4.12 that converted this affinity
> into all possible CPUs (0-31), but the host then reports
> an error since this is larger than the number of online CPUs.
> 
> Previously (up to 4.12-rc1) this worked because only online CPUs
> would be put in the mask passed to the host.
> 
> This patch applies only to 4.12.
> The driver in linux-next needs a different fix because of the changes
> to the PCI host protocol version.

If Linus decides to postpone v4.12 a week, I can ask him to pull this.  But
I suspect he will release v4.12 today.  In that case, I don't know what to
do with this other than maybe send it to Greg for a -stable release.

> Fixes: 433fcf6b7b31 ("PCI: hv: Specify CPU_AFFINITY_ALL for MSI affinity when >= 32 CPUs")
> Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
> ---
>  drivers/pci/host/pci-hyperv.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/host/pci-hyperv.c b/drivers/pci/host/pci-hyperv.c
> index 84936383e269..3cadfcca3ae9 100644
> --- a/drivers/pci/host/pci-hyperv.c
> +++ b/drivers/pci/host/pci-hyperv.c
> @@ -900,10 +900,12 @@ static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
>  	 * processors because Hyper-V only supports 64 in a guest.
>  	 */
>  	affinity = irq_data_get_affinity_mask(data);
> +	cpumask_and(affinity, affinity, cpu_online_mask);
> +
>  	if (cpumask_weight(affinity) >= 32) {
>  		int_pkt->int_desc.cpu_mask = CPU_AFFINITY_ALL;
>  	} else {
> -		for_each_cpu_and(cpu, affinity, cpu_online_mask) {
> +		for_each_cpu(cpu, affinity) {
>  			int_pkt->int_desc.cpu_mask |=
>  				(1ULL << vmbus_cpu_number_to_vp_number(cpu));
>  		}
> -- 
> 2.11.0
>
Stephen Hemminger July 4, 2017, 9:59 p.m. UTC | #4
On Sun, 2 Jul 2017 16:38:19 -0500
Bjorn Helgaas <helgaas@kernel.org> wrote:

> On Wed, Jun 28, 2017 at 04:22:04PM -0700, Stephen Hemminger wrote:
> > When an Intel 10G device (ixgbevf) is passed through to a Hyper-V guest
> > with SR-IOV, the driver requests affinity with all possible CPUs (0-239),
> > even though those CPUs are not online (and never will be). Because of this
> > the device is unable to correctly get its MSI interrupts set up.
> > 
> > This was caused by the change in 4.12 that converted this affinity
> > into all possible CPUs (0-31), but the host then reports
> > an error since this is larger than the number of online CPUs.
> > 
> > Previously (up to 4.12-rc1) this worked because only online CPUs
> > would be put in the mask passed to the host.
> > 
> > This patch applies only to 4.12.
> > The driver in linux-next needs a different fix because of the changes
> > to the PCI host protocol version.
> 
> If Linus decides to postpone v4.12 a week, I can ask him to pull this.  But
> I suspect he will release v4.12 today.  In that case, I don't know what to
> do with this other than maybe send it to Greg for a -stable release.

Looks like this will have to be queued for 4.12 stable.
Bjorn Helgaas July 5, 2017, 7:49 p.m. UTC | #5
On Tue, Jul 04, 2017 at 02:59:42PM -0700, Stephen Hemminger wrote:
> On Sun, 2 Jul 2017 16:38:19 -0500
> Bjorn Helgaas <helgaas@kernel.org> wrote:
> 
> > On Wed, Jun 28, 2017 at 04:22:04PM -0700, Stephen Hemminger wrote:
> > > When an Intel 10G device (ixgbevf) is passed through to a Hyper-V guest
> > > with SR-IOV, the driver requests affinity with all possible CPUs (0-239),
> > > even though those CPUs are not online (and never will be). Because of this
> > > the device is unable to correctly get its MSI interrupts set up.
> > > 
> > > This was caused by the change in 4.12 that converted this affinity
> > > into all possible CPUs (0-31), but the host then reports
> > > an error since this is larger than the number of online CPUs.
> > > 
> > > Previously (up to 4.12-rc1) this worked because only online CPUs
> > > would be put in the mask passed to the host.
> > > 
> > > This patch applies only to 4.12.
> > > The driver in linux-next needs a different fix because of the changes
> > > to the PCI host protocol version.
> > 
> > If Linus decides to postpone v4.12 a week, I can ask him to pull this.  But
> > I suspect he will release v4.12 today.  In that case, I don't know what to
> > do with this other than maybe send it to Greg for a -stable release.
> 
> Looks like this will have to be queued for 4.12 stable.

I assume you'll take care of this, right?  It sounds like there's nothing
to do for upstream because it needs a different fix.

Bjorn
Stephen Hemminger July 5, 2017, 8:07 p.m. UTC | #6
On Wed, 5 Jul 2017 14:49:33 -0500
Bjorn Helgaas <helgaas@kernel.org> wrote:

> On Tue, Jul 04, 2017 at 02:59:42PM -0700, Stephen Hemminger wrote:
> > On Sun, 2 Jul 2017 16:38:19 -0500
> > Bjorn Helgaas <helgaas@kernel.org> wrote:
> >   
> > > On Wed, Jun 28, 2017 at 04:22:04PM -0700, Stephen Hemminger wrote:  
> > > > When an Intel 10G device (ixgbevf) is passed through to a Hyper-V guest
> > > > with SR-IOV, the driver requests affinity with all possible CPUs (0-239),
> > > > even though those CPUs are not online (and never will be). Because of this
> > > > the device is unable to correctly get its MSI interrupts set up.
> > > > 
> > > > This was caused by the change in 4.12 that converted this affinity
> > > > into all possible CPUs (0-31), but the host then reports
> > > > an error since this is larger than the number of online CPUs.
> > > > 
> > > > Previously (up to 4.12-rc1) this worked because only online CPUs
> > > > would be put in the mask passed to the host.
> > > > 
> > > > This patch applies only to 4.12.
> > > > The driver in linux-next needs a different fix because of the changes
> > > > to the PCI host protocol version.
> > > 
> > > If Linus decides to postpone v4.12 a week, I can ask him to pull this.  But
> > > I suspect he will release v4.12 today.  In that case, I don't know what to
> > > do with this other than maybe send it to Greg for a -stable release.  
> > 
> > Looks like this will have to be queued for 4.12 stable.  
> 
> I assume you'll take care of this, right?  It sounds like there's nothing
> to do for upstream because it needs a different fix.
> 
> Bjorn

Already fixed in linux-next. The code for PCI protocol version 1.2
is different and never had the bug.

Patch

diff --git a/drivers/pci/host/pci-hyperv.c b/drivers/pci/host/pci-hyperv.c
index 84936383e269..3cadfcca3ae9 100644
--- a/drivers/pci/host/pci-hyperv.c
+++ b/drivers/pci/host/pci-hyperv.c
@@ -900,10 +900,12 @@  static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
 	 * processors because Hyper-V only supports 64 in a guest.
 	 */
 	affinity = irq_data_get_affinity_mask(data);
+	cpumask_and(affinity, affinity, cpu_online_mask);
+
 	if (cpumask_weight(affinity) >= 32) {
 		int_pkt->int_desc.cpu_mask = CPU_AFFINITY_ALL;
 	} else {
-		for_each_cpu_and(cpu, affinity, cpu_online_mask) {
+		for_each_cpu(cpu, affinity) {
 			int_pkt->int_desc.cpu_mask |=
 				(1ULL << vmbus_cpu_number_to_vp_number(cpu));
 		}
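
As a concrete, hypothetical illustration of the behavioural change: take a guest that reports 240 possible CPUs, has CPUs 0-7 online, and maps Linux CPU n to Hyper-V VP n (the identity mapping and the CPU counts are assumptions for the example, not from the patch; CPU_AFFINITY_ALL below is a stand-in value). Before the patch the weight test ran on the raw affinity mask, saw 240 and took the >= 32 branch; after the patch the mask is first reduced to the 8 online CPUs and an explicit VP bitmap of 0xff is built instead. A small userspace sketch of that arithmetic, in plain C rather than kernel code:

/* Userspace illustration only -- mimics the branch selection and the
 * 64-bit VP mask built by hv_compose_msi_msg() after the patch.
 * CPU counts and the identity CPU->VP mapping are hypothetical.
 */
#include <stdio.h>
#include <stdint.h>

#define CPU_AFFINITY_ALL ((uint64_t)-1)	/* stand-in value for the example */

int main(void)
{
	int possible = 240;		/* weight of the raw affinity mask   */
	int nr_online = 8;		/* weight after AND with online mask */
	uint64_t cpu_mask = 0;

	/* Pre-patch: the weight test ran on the raw (possible) mask. */
	printf("pre-patch  weight %3d -> %s\n", possible,
	       possible >= 32 ? "CPU_AFFINITY_ALL" : "explicit mask");

	/* Post-patch: weight is taken after restricting to online CPUs. */
	if (nr_online >= 32) {
		cpu_mask = CPU_AFFINITY_ALL;
	} else {
		for (int cpu = 0; cpu < nr_online; cpu++)
			cpu_mask |= 1ULL << cpu;	/* VP == CPU in this example */
	}
	printf("post-patch weight %3d -> cpu_mask 0x%llx\n", nr_online,
	       (unsigned long long)cpu_mask);
	return 0;
}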