Patchwork powerpc: pseries: Round up MSI-X requests

login
register
mail settings
Submitter Anton Blanchard
Date June 5, 2012, 2:47 a.m.
Message ID <20120605124703.12bb4cc3@kryten>
Download mbox | patch
Permalink /patch/162945/
State Accepted
Commit 752f5216f1eaabb0cfa84eaecd0ce17d79c7d2cf
Delegated to: Benjamin Herrenschmidt
Headers show

Comments

Anton Blanchard - June 5, 2012, 2:47 a.m.
Hi,

> On Mon, 2012-06-04 at 16:43 +1000, Michael Ellerman wrote:
> > There is some chance this will result in breakage because the driver
> > asks for N - and assumes that is what was allocated - and the
> > device is configured for > N.
> 
> We can fix that. We can whack the configuration back with N, just know
> that we have "allocated"  > N.

I agree we don't want to be giving back a larger value than requested.
There's only one place that can happen in theory and since firmware only
returns power of two values I dont think it will happen in practise.

Even so do we want to do something like this (as yet untested)? If the
rounded up request fails we retry with the original request.

The pseries msi free code just sets our vectors to 0 so it doesn't need
to know how many were originally allocated.
 
Anton
--

[PATCH] powerpc: pseries: Round up MSI-X requests

The pseries firmware currently refuses any non power of two MSI-X
request. Unfortunately most network drivers end up asking for that
because they want a power of two for RX queues and one or two extra
for everything else.

This patch rounds up the firmware request to the next power of two
if the quota allows it. If this fails we fall back to using the
original request size.

Signed-off-by: Anton Blanchard <anton@samba.org>
---
Michael Ellerman - June 13, 2012, 5:18 a.m.
On Tue, 2012-06-05 at 12:47 +1000, Anton Blanchard wrote:
> Hi,
> 
> > On Mon, 2012-06-04 at 16:43 +1000, Michael Ellerman wrote:
> > > There is some chance this will result in breakage because the driver
> > > asks for N - and assumes that is what was allocated - and the
> > > device is configured for > N.
> > 
> > We can fix that. We can whack the configuration back with N, just know
> > that we have "allocated"  > N.

I think whacking config space is more likely to break something than
just configuring more than the driver asked for.

> I agree we don't want to be giving back a larger value than requested.
> There's only one place that can happen in theory and since firmware only
> returns power of two values I dont think it will happen in practise.

Don't follow you here.

> Even so do we want to do something like this (as yet untested)? If the
> rounded up request fails we retry with the original request.

Yes we do. Had a chance to test it?

> The pseries msi free code just sets our vectors to 0 so it doesn't need
> to know how many were originally allocated.

Yep, looks like it will cope.

We only create virqs for what's in pdev->msi_list, which will be what
the driver originally asked for, and we free all those by walking the
list again. So the fact that firmware allocated a few extra for us is
OK, we have nothing extra to cleanup.

cheers
Benjamin Herrenschmidt - June 13, 2012, 5:25 a.m.
On Wed, 2012-06-13 at 15:18 +1000, Michael Ellerman wrote:
> On Tue, 2012-06-05 at 12:47 +1000, Anton Blanchard wrote:
> > Hi,
> > 
> > > On Mon, 2012-06-04 at 16:43 +1000, Michael Ellerman wrote:
> > > > There is some chance this will result in breakage because the driver
> > > > asks for N - and assumes that is what was allocated - and the
> > > > device is configured for > N.
> > > 
> > > We can fix that. We can whack the configuration back with N, just know
> > > that we have "allocated"  > N.
> 
> I think whacking config space is more likely to break something than
> just configuring more than the driver asked for.

How so ? I tend to disagree here..
.
> > I agree we don't want to be giving back a larger value than requested.
> > There's only one place that can happen in theory and since firmware only
> > returns power of two values I dont think it will happen in practise.
> 
> Don't follow you here.
> 
> > Even so do we want to do something like this (as yet untested)? If the
> > rounded up request fails we retry with the original request.
> 
> Yes we do. Had a chance to test it?
> 
> > The pseries msi free code just sets our vectors to 0 so it doesn't need
> > to know how many were originally allocated.
> 
> Yep, looks like it will cope.
> 
> We only create virqs for what's in pdev->msi_list, which will be what
> the driver originally asked for, and we free all those by walking the
> list again. So the fact that firmware allocated a few extra for us is
> OK, we have nothing extra to cleanup.

The firmware interface is busted anyway. We want to be able to
enable/allocate individual MSI-X at runtime, we need to get a new
interface sorted.

Ben.

Patch

Index: linux-build/arch/powerpc/platforms/pseries/msi.c
===================================================================
--- linux-build.orig/arch/powerpc/platforms/pseries/msi.c	2012-06-04 15:58:48.095833249 +1000
+++ linux-build/arch/powerpc/platforms/pseries/msi.c	2012-06-05 12:15:58.711074499 +1000
@@ -387,12 +387,13 @@  static int check_msix_entries(struct pci
 	return 0;
 }
 
-static int rtas_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
+static int rtas_setup_msi_irqs(struct pci_dev *pdev, int nvec_in, int type)
 {
 	struct pci_dn *pdn;
 	int hwirq, virq, i, rc;
 	struct msi_desc *entry;
 	struct msi_msg msg;
+	int nvec = nvec_in;
 
 	pdn = get_pdn(pdev);
 	if (!pdn)
@@ -402,10 +403,23 @@  static int rtas_setup_msi_irqs(struct pc
 		return -EINVAL;
 
 	/*
+	 * Firmware currently refuse any non power of two allocation
+	 * so we round up if the quota will allow it.
+	 */
+	if (type == PCI_CAP_ID_MSIX) {
+		int m = roundup_pow_of_two(nvec);
+		int quota = msi_quota_for_device(pdev, m);
+
+		if (quota >= m)
+			nvec = m;
+	}
+
+	/*
 	 * Try the new more explicit firmware interface, if that fails fall
 	 * back to the old interface. The old interface is known to never
 	 * return MSI-Xs.
 	 */
+again:
 	if (type == PCI_CAP_ID_MSI) {
 		rc = rtas_change_msi(pdn, RTAS_CHANGE_MSI_FN, nvec);
 
@@ -417,6 +431,10 @@  static int rtas_setup_msi_irqs(struct pc
 		rc = rtas_change_msi(pdn, RTAS_CHANGE_MSIX_FN, nvec);
 
 	if (rc != nvec) {
+		if (nvec != nvec_in) {
+			nvec = nvec_in;
+			goto again;
+		}
 		pr_debug("rtas_msi: rtas_change_msi() failed\n");
 		return rc;
 	}