Patchwork [3/4] AHCI: Conserve interrupts with pci_enable_msi_block_part() interface

login
register
mail settings
Submitter Alexander Gordeev
Date Sept. 2, 2013, 9 a.m.
Message ID <3bb1b4375655ecfde5017cc70973d078f2434d5d.1378111919.git.agordeev@redhat.com>
Download mbox | patch
Permalink /patch/271816/
State Not Applicable
Delegated to: David Miller
Headers show

Comments

Alexander Gordeev - Sept. 2, 2013, 9 a.m.
Make use of the new pci_enable_msi_block_part() interface
and conserve on othewise wasted interrupt resources for 10
of 16 unused MSI vectors on Intel chipsets.

Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
---
 drivers/ata/ahci.c |   48 +++++++++++++++++++++++++++++-------------------
 1 files changed, 29 insertions(+), 19 deletions(-)
Tejun Heo - Sept. 3, 2013, 2:18 p.m.
On Mon, Sep 02, 2013 at 11:00:28AM +0200, Alexander Gordeev wrote:
> +	if (hpriv->flags & AHCI_HFLAG_NO_MSI)
> +		goto intx;
> +
> +	rc = pci_enable_msi_block_part(pdev, n_ports, AHCI_MAX_PORTS);
> +	if (!rc)
> +		return AHCI_MAX_PORTS;
> +	if (rc < 0)
> +		goto intx;
> +
> +	maxvec = rc;
> +	rc = pci_enable_msi_block_part(pdev, n_ports, maxvec);
> +	if (!rc)
> +		return maxvec;
> +	if (rc < 0)
> +		goto intx;

Why is the above fallback necessary?  The only other number which
makes sense is roundup_pow_of_two(n_ports), right?  What does the
above fallback logic buy us?

Thanks.
Alexander Gordeev - Sept. 3, 2013, 4:19 p.m.
On Tue, Sep 03, 2013 at 10:18:24AM -0400, Tejun Heo wrote:
> On Mon, Sep 02, 2013 at 11:00:28AM +0200, Alexander Gordeev wrote:
> > +	if (hpriv->flags & AHCI_HFLAG_NO_MSI)
> > +		goto intx;
> > +
> > +	rc = pci_enable_msi_block_part(pdev, n_ports, AHCI_MAX_PORTS);

We start with maximum possible number of ports AHCI_MAX_PORTS

> > +	if (!rc)
> > +		return AHCI_MAX_PORTS;

If we succeeded the device is indeed supports all AHCI_MAX_PORTS
and we report it.

> > +	if (rc < 0)
> > +		goto intx;

If pci_enable_msi_block_part() failed we should not make further
attempts and fallback to simple IRQ.

> > +	maxvec = rc;

The device supports a lesser of AHCI_MAX_PORTS, because the previous
pci_enable_msi_block_part() has not succeeded nor failed. Thus, rc
contains number of supported MSIs. In case of ICH this will be 16
rather than 32.

Actually, while I was writing this I realized this could be a number
of MSIs that could have been enabled this device, not the maximum
number of supported MSIs - these two may differ. I think MRSM should
be checked. But I will continue as if it always the same.

> > +	rc = pci_enable_msi_block_part(pdev, n_ports, maxvec);

Try pci_enable_msi_block_part() with the maximum number of supported MSIs.

> > +	if (!rc)
> > +		return maxvec;

If we succeeded report the number of enabled MSIs.

> > +	if (rc < 0)
> > +		goto intx;

If pci_enable_msi_block_part() failed we should not make further
attempts and fallback to simple IRQ.

> Why is the above fallback necessary?  The only other number which
> makes sense is roundup_pow_of_two(n_ports), right?  What does the
> above fallback logic buy us?

We must enable maximum possible number of MSIs - the one reported in
Multiple Message Capable register. Otherwise ICH device will fallback
to MRSM. IOW, if the result of roundup_pow_of_two(n_ports) is not what
in Multiple Message Capable register (i.e. as roundup_pow_of_two(6) vs 16)
ICH will enforce MRSM mode.

> Thanks.
> 
> -- 
> tejun
Tejun Heo - Sept. 3, 2013, 6:27 p.m.
Hello,

On Tue, Sep 03, 2013 at 06:19:06PM +0200, Alexander Gordeev wrote:
> We must enable maximum possible number of MSIs - the one reported in
> Multiple Message Capable register. Otherwise ICH device will fallback
> to MRSM. IOW, if the result of roundup_pow_of_two(n_ports) is not what
> in Multiple Message Capable register (i.e. as roundup_pow_of_two(6) vs 16)
> ICH will enforce MRSM mode.

Hmmm... I think the interface in general is pretty messy.  Wouldn't it
be much cleaner to have a separate function to query MSICAP and let
the function just return success / failure?

Thanks.
Alexander Gordeev - Sept. 4, 2013, 7:22 a.m.
On Tue, Sep 03, 2013 at 02:27:31PM -0400, Tejun Heo wrote: > Hello,
> 
> On Tue, Sep 03, 2013 at 06:19:06PM +0200, Alexander Gordeev wrote:
> > We must enable maximum possible number of MSIs - the one reported in
> > Multiple Message Capable register. Otherwise ICH device will fallback
> > to MRSM. IOW, if the result of roundup_pow_of_two(n_ports) is not what
> > in Multiple Message Capable register (i.e. as roundup_pow_of_two(6) vs 16)
> > ICH will enforce MRSM mode.
> 
> Hmmm... I think the interface in general is pretty messy.  Wouldn't it
> be much cleaner to have a separate function to query MSICAP and let
> the function just return success / failure?

Actually, sorry for misleading. It is only ICH (I am aware of) that works
this way and I was focused on.

I think a general approach that will cover it all (including ICH and undesired
sharing of interrupt vectors) - start MME from roundup_pow_of_two(n_ports) and
ensure MRSM bit is unset. If not - double MME and retry. If still not and the
limit is reached - fall back to single MSI.

Makes sense?

> Thanks.
> 
> -- 
> tejun
Tejun Heo - Sept. 4, 2013, 2:55 p.m.
Hello,

On Wed, Sep 04, 2013 at 09:22:57AM +0200, Alexander Gordeev wrote:
> I think a general approach that will cover it all (including ICH and undesired
> sharing of interrupt vectors) - start MME from roundup_pow_of_two(n_ports) and
> ensure MRSM bit is unset. If not - double MME and retry. If still not and the
> limit is reached - fall back to single MSI.
> 
> Makes sense?

Ugh... I really don't want this sort of retry loop.  I'm still a bit
lost on why this is necessary.  Can you please elaborate?

Thanks.
Alexander Gordeev - Sept. 4, 2013, 4:14 p.m.
On Wed, Sep 04, 2013 at 10:55:59AM -0400, Tejun Heo wrote:
> Hello,
> 
> On Wed, Sep 04, 2013 at 09:22:57AM +0200, Alexander Gordeev wrote:
> > I think a general approach that will cover it all (including ICH and undesired
> > sharing of interrupt vectors) - start MME from roundup_pow_of_two(n_ports) and
> > ensure MRSM bit is unset. If not - double MME and retry. If still not and the
> > limit is reached - fall back to single MSI.
> > 
> > Makes sense?
> 
> Ugh... I really don't want this sort of retry loop.  I'm still a bit
> lost on why this is necessary.  Can you please elaborate?

Sure.

1. We do not support sharing MSI messages since there is no appropriate
interrupt handling for it. I am not sure if any hardware supports it at
all. This assumption is just for clarity here.

2. We can not just read out MMC value and try to write it to MME, because
we are not sure the hardware honours the specification.  I.e. in case of 6
ports and MMC value of 16 the value of 8 in MME could enable multiple MSIs
while the value of 16 could enforce MRSM. Contradicts to the AHCI
specification, but look how weird ICH is [4].

3. Enabling more MSIs than needed (MME == MMC instead of MME < MMC) could
lead to unnecessary allocation of internal device resources. Bit lame, but
still true.

4. We can not derive the value of MME needed from the number of ports, at
least in case of ICH. I.e. with 6 ports and MMC value of 16 the value of 8
in MME is what would be expected according to the AHCI specification. But
the ICH reserves it and MRSMs in case 8 is written to MME.

So to cover all the above assumptions we need to scan from lowest possible
and watch the MRSM bit is unset.

> Thanks.
> 
> -- 
> tejun
Tejun Heo - Sept. 4, 2013, 6:06 p.m.
Hello,

On Wed, Sep 04, 2013 at 06:14:43PM +0200, Alexander Gordeev wrote:
> 1. We do not support sharing MSI messages since there is no appropriate
> interrupt handling for it. I am not sure if any hardware supports it at
> all. This assumption is just for clarity here.

I don't think it even matters.  If we can configure all MSIs, good.
If not, using single interrupt is good enough.  Being able to allocate
only some of the interrupts is highly unlikely to begin with and
there's no reason to meddle with extra complexity.

> 2. We can not just read out MMC value and try to write it to MME, because
> we are not sure the hardware honours the specification.  I.e. in case of 6
> ports and MMC value of 16 the value of 8 in MME could enable multiple MSIs
> while the value of 16 could enforce MRSM. Contradicts to the AHCI
> specification, but look how weird ICH is [4].
> 
> 3. Enabling more MSIs than needed (MME == MMC instead of MME < MMC) could
> lead to unnecessary allocation of internal device resources. Bit lame, but
> still true.
> 
> 4. We can not derive the value of MME needed from the number of ports, at
> least in case of ICH. I.e. with 6 ports and MMC value of 16 the value of 8
> in MME is what would be expected according to the AHCI specification. But
> the ICH reserves it and MRSMs in case 8 is written to MME.
> 
> So to cover all the above assumptions we need to scan from lowest possible
> and watch the MRSM bit is unset.

I don't think it's necessary / a good idea to try to support
everything.  Just following the spec would be fine.  If that doesn't
work for too many devices, maybe just do one fallback?

Thanks.
Alexander Gordeev - Sept. 4, 2013, 6:47 p.m.
On Wed, Sep 04, 2013 at 02:06:07PM -0400, Tejun Heo wrote:
> I don't think it's necessary / a good idea to try to support
> everything.  Just following the spec would be fine.  If that doesn't
> work for too many devices, maybe just do one fallback?

Calling pci_enable_msi_block_part() in a loop may seem not that terrible
if you think of the original interface - pci_enable_msi_block(). If we
were to enable multiple MSIs using that one we still had to use it pretty
much the same way ;)

Anyway. I will send a version which reads MMC out.

> Thanks.
> 
> -- 
> tejun
Tejun Heo - Sept. 4, 2013, 6:51 p.m.
On Wed, Sep 04, 2013 at 08:47:16PM +0200, Alexander Gordeev wrote:
> On Wed, Sep 04, 2013 at 02:06:07PM -0400, Tejun Heo wrote:
> > I don't think it's necessary / a good idea to try to support
> > everything.  Just following the spec would be fine.  If that doesn't
> > work for too many devices, maybe just do one fallback?
> 
> Calling pci_enable_msi_block_part() in a loop may seem not that terrible
> if you think of the original interface - pci_enable_msi_block(). If we
> were to enable multiple MSIs using that one we still had to use it pretty
> much the same way ;)

I don't know.  The thing is, do we even want to meddle with
controllers which only accept a value which isn't the value specified
in the spec or the maximum value described as supported?  It's icky
and we are far better of playing it safe.  This has the possibility of
breaking boot on many configurations.

Thanks.

Patch

diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index db4380d..b0c1648 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -1091,26 +1091,36 @@  static inline void ahci_gtf_filter_workaround(struct ata_host *host)
 {}
 #endif
 
-int ahci_init_interrupts(struct pci_dev *pdev, struct ahci_host_priv *hpriv)
+int ahci_init_interrupts(struct pci_dev *pdev, unsigned int n_ports,
+			 struct ahci_host_priv *hpriv)
 {
 	int rc;
 	unsigned int maxvec;
 
-	if (!(hpriv->flags & AHCI_HFLAG_NO_MSI)) {
-		rc = pci_enable_msi_block_auto(pdev, &maxvec);
-		if (rc > 0) {
-			if ((rc == maxvec) || (rc == 1))
-				return rc;
-			/*
-			 * Assume that advantage of multipe MSIs is negated,
-			 * so fallback to single MSI mode to save resources
-			 */
-			pci_disable_msi(pdev);
-			if (!pci_enable_msi(pdev))
-				return 1;
-		}
-	}
+	if (hpriv->flags & AHCI_HFLAG_NO_MSI)
+		goto intx;
+
+	rc = pci_enable_msi_block_part(pdev, n_ports, AHCI_MAX_PORTS);
+	if (!rc)
+		return AHCI_MAX_PORTS;
+	if (rc < 0)
+		goto intx;
+
+	maxvec = rc;
+	rc = pci_enable_msi_block_part(pdev, n_ports, maxvec);
+	if (!rc)
+		return maxvec;
+	if (rc < 0)
+		goto intx;
+
+	/*
+	 * Assume that advantage of multipe MSIs is negated,
+	 * so fallback to single MSI mode to save resources
+	 */
+	if (!pci_enable_msi(pdev))
+		return 1;
 
+intx:
 	pci_intx(pdev, 1);
 	return 0;
 }
@@ -1277,10 +1287,6 @@  static int ahci_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	hpriv->mmio = pcim_iomap_table(pdev)[ahci_pci_bar];
 
-	n_msis = ahci_init_interrupts(pdev, hpriv);
-	if (n_msis > 1)
-		hpriv->flags |= AHCI_HFLAG_MULTI_MSI;
-
 	/* save initial config */
 	ahci_pci_save_initial_config(pdev, hpriv);
 
@@ -1327,6 +1333,10 @@  static int ahci_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	 */
 	n_ports = max(ahci_nr_ports(hpriv->cap), fls(hpriv->port_map));
 
+	n_msis = ahci_init_interrupts(pdev, n_ports, hpriv);
+	if (n_msis > 1)
+		hpriv->flags |= AHCI_HFLAG_MULTI_MSI;
+
 	host = ata_host_alloc_pinfo(&pdev->dev, ppi, n_ports);
 	if (!host)
 		return -ENOMEM;