
PCI/MSI: preference to returning -ENOSPC from pci_alloc_irq_vectors_affinity

Message ID 20190103013106.26452-1-ming.lei@redhat.com
State Accepted
Delegated to: Bjorn Helgaas
Series PCI/MSI: preference to returning -ENOSPC from pci_alloc_irq_vectors_affinity

Commit Message

Ming Lei Jan. 3, 2019, 1:31 a.m. UTC
The API of pci_alloc_irq_vectors_affinity() says it returns -ENOSPC
if fewer than @min_vecs interrupt vectors are available for @dev.

However, if a device supports MSI-X but not MSI and a caller requests
@min_vecs that can't be satisfied by MSI-X, we previously returned
-EINVAL (from the failed attempt to enable MSI), not -ENOSPC.

Callers of pci_alloc_irq_vectors_affinity(), such as NVMe, may reduce the
number of vectors and retry the allocation when -ENOSPC is returned, so we
need to honor the documented interface and return -ENOSPC in preference to
-EINVAL.

In particular, the min/max vecs mechanism does not work correctly when the
irq_affinity nr_sets feature is used, because rebalancing the set counts is
driver specific. To get around that, drivers using nr_sets have to set min
and max to the same value and handle the "reduce and try again" loop
themselves.

Cc: Jens Axboe <axboe@fb.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: linux-nvme@lists.infradead.org
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/pci/msi.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)
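For illustration only (an editor's sketch, not part of the posted patch): the "reduce and try again" pattern referred to above, written as a hypothetical caller. It assumes the v5.0-era struct irq_affinity layout (nr_sets plus a sets array, reworked in later kernels), and the helper name and even two-set split are made up; a real driver such as NVMe rebalances its sets according to its own queue policy.

#include <linux/kernel.h>
#include <linux/pci.h>
#include <linux/interrupt.h>

/*
 * Hypothetical caller: a driver using irq_affinity nr_sets must pass
 * min_vecs == max_vecs and run the "reduce and try again" loop itself,
 * which only works if the PCI core reports a vector shortage as
 * -ENOSPC rather than -EINVAL.
 */
static int example_setup_irqs(struct pci_dev *pdev, int nr_io_irqs)
{
	int irq_sets[2];			/* e.g. one set for reads, one for writes */
	struct irq_affinity affd = {
		.pre_vectors	= 1,		/* admin/control vector, not spread */
		.nr_sets	= ARRAY_SIZE(irq_sets),
		.sets		= irq_sets,	/* v5.0-era layout; assumption, see above */
	};
	int ret;

	do {
		/* The split policy is driver specific; an even split is only an example. */
		irq_sets[0] = (nr_io_irqs > 1) ? nr_io_irqs / 2 : nr_io_irqs;
		irq_sets[1] = nr_io_irqs - irq_sets[0];

		/* min == max: the PCI core must not rebalance the sets on its own. */
		ret = pci_alloc_irq_vectors_affinity(pdev,
				nr_io_irqs + 1, nr_io_irqs + 1,
				PCI_IRQ_ALL_TYPES | PCI_IRQ_AFFINITY, &affd);
		if (ret == -ENOSPC)
			nr_io_irqs--;		/* shrink the request and retry */
	} while (ret == -ENOSPC && nr_io_irqs > 0);

	return ret;
}

Without this patch, a loop like the one above would see -EINVAL instead of -ENOSPC on an MSI-X-only device that cannot supply the requested count, and would give up immediately instead of retrying with a smaller request.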

Comments

Bjorn Helgaas Jan. 14, 2019, 11:23 p.m. UTC | #1
On Thu, Jan 03, 2019 at 09:31:06AM +0800, Ming Lei wrote:
> The API of pci_alloc_irq_vectors_affinity() says it returns -ENOSPC
> if fewer than @min_vecs interrupt vectors are available for @dev.
> 
> However, if a device supports MSI-X but not MSI and a caller requests
> @min_vecs that can't be satisfied by MSI-X, we previously returned
> -EINVAL (from the failed attempt to enable MSI), not -ENOSPC.
> 
> Callers of pci_alloc_irq_vectors_affinity(), such as NVMe, may reduce the
> number of vectors and retry the allocation when -ENOSPC is returned, so we
> need to honor the documented interface and return -ENOSPC in preference to
> -EINVAL.
> 
> In particular, the min/max vecs mechanism does not work correctly when the
> irq_affinity nr_sets feature is used, because rebalancing the set counts is
> driver specific. To get around that, drivers using nr_sets have to set min
> and max to the same value and handle the "reduce and try again" loop
> themselves.
> 
> Cc: Jens Axboe <axboe@fb.com>
> Cc: Keith Busch <keith.busch@intel.com>
> Cc: linux-nvme@lists.infradead.org
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>

Applied to pci/msi for v5.1, thanks!

If this is something that should be in v5.0, let me know and include the
justification, e.g., something we already merged for v5.0 or regression
info, etc, and a Fixes: line, and I'll move it to for-linus.

> ---
>  drivers/pci/msi.c | 22 +++++++++++++---------
>  1 file changed, 13 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index 7a1c8a09efa5..4c0b47867258 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -1168,7 +1168,8 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
>  				   const struct irq_affinity *affd)
>  {
>  	static const struct irq_affinity msi_default_affd;
> -	int vecs = -ENOSPC;
> +	int msix_vecs = -ENOSPC;
> +	int msi_vecs = -ENOSPC;
>  
>  	if (flags & PCI_IRQ_AFFINITY) {
>  		if (!affd)
> @@ -1179,16 +1180,17 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
>  	}
>  
>  	if (flags & PCI_IRQ_MSIX) {
> -		vecs = __pci_enable_msix_range(dev, NULL, min_vecs, max_vecs,
> -				affd);
> -		if (vecs > 0)
> -			return vecs;
> +		msix_vecs = __pci_enable_msix_range(dev, NULL, min_vecs,
> +						    max_vecs, affd);
> +		if (msix_vecs > 0)
> +			return msix_vecs;
>  	}
>  
>  	if (flags & PCI_IRQ_MSI) {
> -		vecs = __pci_enable_msi_range(dev, min_vecs, max_vecs, affd);
> -		if (vecs > 0)
> -			return vecs;
> +		msi_vecs = __pci_enable_msi_range(dev, min_vecs, max_vecs,
> +						  affd);
> +		if (msi_vecs > 0)
> +			return msi_vecs;
>  	}
>  
>  	/* use legacy irq if allowed */
> @@ -1199,7 +1201,9 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
>  		}
>  	}
>  
> -	return vecs;
> +	if (msix_vecs == -ENOSPC)
> +		return -ENOSPC;
> +	return msi_vecs;
>  }
>  EXPORT_SYMBOL(pci_alloc_irq_vectors_affinity);
>  
> -- 
> 2.9.5
> 
> 
Christoph Hellwig Jan. 15, 2019, 1:11 p.m. UTC | #2
On Mon, Jan 14, 2019 at 05:23:39PM -0600, Bjorn Helgaas wrote:
> Applied to pci/msi for v5.1, thanks!
> 
> If this is something that should be in v5.0, let me know and include the
> justification, e.g., something we already merged for v5.0 or regression
> info, etc, and a Fixes: line, and I'll move it to for-linus.

I'd be tempted to queue this up for 5.0.  Ming, what is your position?
Jens Axboe Jan. 15, 2019, 4:22 p.m. UTC | #3
On 1/15/19 6:11 AM, Christoph Hellwig wrote:
> On Mon, Jan 14, 2019 at 05:23:39PM -0600, Bjorn Helgaas wrote:
>> Applied to pci/msi for v5.1, thanks!
>>
>> If this is something that should be in v5.0, let me know and include the
>> justification, e.g., something we already merged for v5.0 or regression
>> info, etc, and a Fixes: line, and I'll move it to for-linus.
> 
> I'd be tempted to queue this up for 5.0.  Ming, what is your position?

I think we should - the API was introduced in this series, I think there's
little (to no) reason NOT to fix it for 5.0.
Bjorn Helgaas Jan. 15, 2019, 7:31 p.m. UTC | #4
On Tue, Jan 15, 2019 at 09:22:45AM -0700, Jens Axboe wrote:
> On 1/15/19 6:11 AM, Christoph Hellwig wrote:
> > On Mon, Jan 14, 2019 at 05:23:39PM -0600, Bjorn Helgaas wrote:
> >> Applied to pci/msi for v5.1, thanks!
> >>
> >> If this is something that should be in v5.0, let me know and include the
> >> justification, e.g., something we already merged for v5.0 or regression
> >> info, etc, and a Fixes: line, and I'll move it to for-linus.
> > 
> > I'd be tempted to queue this up for 5.0.  Ming, what is your position?
> 
> I think we should - the API was introduced in this series, I think there's
> little (to no) reason NOT to fix it for 5.0.

I'm guessing the justification goes something like this (I haven't
done all the research, so I'll leave it to Ming to fill in the details):

  pci_alloc_irq_vectors_affinity() was added in v4.x by XXXX ("...").
  It had this return value defect then, but its min_vecs/max_vecs
  parameters removed the need for callers to iteratively reduce the
  number of IRQs requested and retry the allocation, so they didn't
  need to distinguish -ENOSPC from -EINVAL.

  In v5.0, XXX ("...") added IRQ sets to the interface, which
  reintroduced the need to check for -ENOSPC and possibly reduce the
  number of IRQs requested and retry the allocation.
Ming Lei Jan. 15, 2019, 10:46 p.m. UTC | #5
Hi Bjorn,

I think Christoph and Jens are correct; we should get this patch into
5.0 because the issue is exposed by 3b6592f70ad7b4c2 ("nvme: utilize
two queue maps, one for reads and one for writes"), which was merged
in 5.0-rc.

For example, before 3b6592f70ad7b4c2, one nvme controller may be
allocated 64 irq vectors; but after that commit, only 1 irq vector
is assigned to this controller.

On Tue, Jan 15, 2019 at 01:31:35PM -0600, Bjorn Helgaas wrote:
> On Tue, Jan 15, 2019 at 09:22:45AM -0700, Jens Axboe wrote:
> > On 1/15/19 6:11 AM, Christoph Hellwig wrote:
> > > On Mon, Jan 14, 2019 at 05:23:39PM -0600, Bjorn Helgaas wrote:
> > >> Applied to pci/msi for v5.1, thanks!
> > >>
> > >> If this is something that should be in v5.0, let me know and include the
> > >> justification, e.g., something we already merged for v5.0 or regression
> > >> info, etc, and a Fixes: line, and I'll move it to for-linus.
> > > 
> > > I'd be tempted to queue this up for 5.0.  Ming, what is your position?
> > 
> > I think we should - the API was introduced in this series, I think there's
> > little (to no) reason NOT to fix it for 5.0.
> 
> I'm guessing the justification goes something like this (I haven't
> done all the research, so I'll leave it to Ming to fill in the details):
> 
>   pci_alloc_irq_vectors_affinity() was added in v4.x by XXXX ("...").

dca51e7892fa3b ("nvme: switch to use pci_alloc_irq_vectors")

>   It had this return value defect then, but its min_vecs/max_vecs
>   parameters removed the need for callers to iteratively reduce the
>   number of IRQs requested and retry the allocation, so they didn't
>   need to distinguish -ENOSPC from -EINVAL.
> 
>   In v5.0, XXX ("...") added IRQ sets to the interface, which

3b6592f70ad7b4c2 ("nvme: utilize two queue maps, one for reads and one for writes")

>   reintroduced the need to check for -ENOSPC and possibly reduce the
>   number of IRQs requested and retry the allocation.

Thanks,
Ming
Bjorn Helgaas Jan. 15, 2019, 11:49 p.m. UTC | #6
On Wed, Jan 16, 2019 at 06:46:32AM +0800, Ming Lei wrote:
> Hi Bjorn,
> 
> I think Christoph and Jens are correct; we should get this patch into
> 5.0 because the issue is exposed by 3b6592f70ad7b4c2 ("nvme: utilize
> two queue maps, one for reads and one for writes"), which was merged
> in 5.0-rc.
> 
> For example, before 3b6592f70ad7b4c2, one nvme controller may be
> allocated 64 irq vectors; but after that commit, only 1 irq vector
> is assigned to this controller.
> 
> On Tue, Jan 15, 2019 at 01:31:35PM -0600, Bjorn Helgaas wrote:
> > On Tue, Jan 15, 2019 at 09:22:45AM -0700, Jens Axboe wrote:
> > > On 1/15/19 6:11 AM, Christoph Hellwig wrote:
> > > > On Mon, Jan 14, 2019 at 05:23:39PM -0600, Bjorn Helgaas wrote:
> > > >> Applied to pci/msi for v5.1, thanks!
> > > >>
> > > >> If this is something that should be in v5.0, let me know and include the
> > > >> justification, e.g., something we already merged for v5.0 or regression
> > > >> info, etc, and a Fixes: line, and I'll move it to for-linus.
> > > > 
> > > > I'd be tempted to queue this up for 5.0.  Ming, what is your position?
> > > 
> > > I think we should - the API was introduced in this series, I think there's
> > > little (to no) reason NOT to fix it for 5.0.
> > 
> > I'm guessing the justification goes something like this (I haven't
> > done all the research, so I'll leave it to Ming to fill in the details):
> > 
> >   pci_alloc_irq_vectors_affinity() was added in v4.x by XXXX ("...").
> 
> dca51e7892fa3b ("nvme: switch to use pci_alloc_irq_vectors")
> 
> >   It had this return value defect then, but its min_vecs/max_vecs
> >   parameters removed the need for callers to iteratively reduce the
> >   number of IRQs requested and retry the allocation, so they didn't
> >   need to distinguish -ENOSPC from -EINVAL.
> > 
> >   In v5.0, XXX ("...") added IRQ sets to the interface, which
> 
> 3b6592f70ad7b4c2 ("nvme: utilize two queue maps, one for reads and one for writes")
> 
> >   reintroduced the need to check for -ENOSPC and possibly reduce the
> >   number of IRQs requested and retry the allocation.

We're fixing a PCI core defect, so we should mention the relevant PCI
core commits, not the nvme-specific ones.  I looked them up for you
and moved this to for-linus for v5.0.


commit 77f88abd4a6f73a1a68dbdc0e3f21575fd508fc3
Author: Ming Lei <ming.lei@redhat.com>
Date:   Tue Jan 15 17:31:29 2019 -0600

    PCI/MSI: Return -ENOSPC from pci_alloc_irq_vectors_affinity()
    
    The API of pci_alloc_irq_vectors_affinity() says it returns -ENOSPC if
    fewer than @min_vecs interrupt vectors are available for @dev.
    
    However, if a device supports MSI-X but not MSI and a caller requests
    @min_vecs that can't be satisfied by MSI-X, we previously returned -EINVAL
    (from the failed attempt to enable MSI), not -ENOSPC.
    
    When -ENOSPC is returned, callers may reduce the number of IRQs they request
    and try again.  Most callers can use the @min_vecs and @max_vecs
    parameters to avoid this retry loop, but that doesn't work when using IRQ
    affinity "nr_sets" because rebalancing the sets is driver-specific.
    
    This return value bug has been present since pci_alloc_irq_vectors() was
    added in v4.10 by aff171641d18 ("PCI: Provide sensible IRQ vector
    alloc/free routines"), but it wasn't an issue because @min_vecs/@max_vecs
    removed the need for callers to iteratively reduce the number of IRQs
    requested and retry the allocation, so they didn't need to distinguish
    -ENOSPC from -EINVAL.
    
    In v5.0, 6da4b3ab9a6e ("genirq/affinity: Add support for allocating
    interrupt sets") added IRQ sets to the interface, which reintroduced the
    need to check for -ENOSPC and possibly reduce the number of IRQs requested
    and retry the allocation.
    
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
    [bhelgaas: changelog]
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    Cc: Jens Axboe <axboe@fb.com>
    Cc: Keith Busch <keith.busch@intel.com>
    Cc: Christoph Hellwig <hch@lst.de>

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 7a1c8a09efa5..4c0b47867258 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -1168,7 +1168,8 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
 				   const struct irq_affinity *affd)
 {
 	static const struct irq_affinity msi_default_affd;
-	int vecs = -ENOSPC;
+	int msix_vecs = -ENOSPC;
+	int msi_vecs = -ENOSPC;
 
 	if (flags & PCI_IRQ_AFFINITY) {
 		if (!affd)
@@ -1179,16 +1180,17 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
 	}
 
 	if (flags & PCI_IRQ_MSIX) {
-		vecs = __pci_enable_msix_range(dev, NULL, min_vecs, max_vecs,
-				affd);
-		if (vecs > 0)
-			return vecs;
+		msix_vecs = __pci_enable_msix_range(dev, NULL, min_vecs,
+						    max_vecs, affd);
+		if (msix_vecs > 0)
+			return msix_vecs;
 	}
 
 	if (flags & PCI_IRQ_MSI) {
-		vecs = __pci_enable_msi_range(dev, min_vecs, max_vecs, affd);
-		if (vecs > 0)
-			return vecs;
+		msi_vecs = __pci_enable_msi_range(dev, min_vecs, max_vecs,
+						  affd);
+		if (msi_vecs > 0)
+			return msi_vecs;
 	}
 
 	/* use legacy irq if allowed */
@@ -1199,7 +1201,9 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
 		}
 	}
 
-	return vecs;
+	if (msix_vecs == -ENOSPC)
+		return -ENOSPC;
+	return msi_vecs;
 }
 EXPORT_SYMBOL(pci_alloc_irq_vectors_affinity);

Patch

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 7a1c8a09efa5..4c0b47867258 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -1168,7 +1168,8 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
 				   const struct irq_affinity *affd)
 {
 	static const struct irq_affinity msi_default_affd;
-	int vecs = -ENOSPC;
+	int msix_vecs = -ENOSPC;
+	int msi_vecs = -ENOSPC;
 
 	if (flags & PCI_IRQ_AFFINITY) {
 		if (!affd)
@@ -1179,16 +1180,17 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
 	}
 
 	if (flags & PCI_IRQ_MSIX) {
-		vecs = __pci_enable_msix_range(dev, NULL, min_vecs, max_vecs,
-				affd);
-		if (vecs > 0)
-			return vecs;
+		msix_vecs = __pci_enable_msix_range(dev, NULL, min_vecs,
+						    max_vecs, affd);
+		if (msix_vecs > 0)
+			return msix_vecs;
 	}
 
 	if (flags & PCI_IRQ_MSI) {
-		vecs = __pci_enable_msi_range(dev, min_vecs, max_vecs, affd);
-		if (vecs > 0)
-			return vecs;
+		msi_vecs = __pci_enable_msi_range(dev, min_vecs, max_vecs,
+						  affd);
+		if (msi_vecs > 0)
+			return msi_vecs;
 	}
 
 	/* use legacy irq if allowed */
@@ -1199,7 +1201,9 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
 		}
 	}
 
-	return vecs;
+	if (msix_vecs == -ENOSPC)
+		return -ENOSPC;
+	return msi_vecs;
 }
 EXPORT_SYMBOL(pci_alloc_irq_vectors_affinity);
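
Usage note (an editor's illustration, not text from the thread; the helper name and the halving policy are hypothetical): with this change a caller can key its retry logic purely on -ENOSPC, even when the device supports MSI-X but not MSI.

#include <linux/pci.h>

/*
 * Hypothetical helper: retry with a smaller request only when the
 * failure really means "not enough vectors".  Before this fix, an
 * MSI-X shortfall on a device without MSI surfaced as -EINVAL and
 * would wrongly be treated as a fatal error here.
 */
static int example_alloc_vectors(struct pci_dev *pdev, unsigned int min_vecs,
				 unsigned int max_vecs)
{
	int ret;

	while (min_vecs) {
		ret = pci_alloc_irq_vectors(pdev, min_vecs, max_vecs,
					    PCI_IRQ_MSIX | PCI_IRQ_MSI);
		if (ret != -ENOSPC)
			return ret;	/* success, or an unrelated error */

		/* Fewer than min_vecs vectors exist: lower the floor and retry. */
		max_vecs = min_vecs - 1;
		min_vecs /= 2;
	}
	return -ENOSPC;
}

The key point is the error check: only -ENOSPC means "fewer vectors than requested are available", so only that case warrants shrinking the request; any other negative value is propagated as a real error.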