Patchwork powerpc/pci: Improve device hotplug initialization

login
register
mail settings
Submitter Guenter Roeck
Date May 28, 2013, 5:35 p.m.
Message ID <1369762514-27352-1-git-send-email-linux@roeck-us.net>
Download mbox | patch
Permalink /patch/247046/
State Superseded
Headers show

Comments

Guenter Roeck - May 28, 2013, 5:35 p.m.
Commit 37f02195b (powerpc/pci: fix PCI-e devices rescan issue on powerpc
platform) fixes a problem with interrupt and DMA initialization on hot
plugged devices. With this commit, interrupt and DMA initialization for
hot plugged devices is handled in the pci device enable function.

This approach has a couple of drawbacks. First, it creates two code paths
for device initialization, one for hot plugged devices and another for devices
known during the initial PCI scan. Second, the initialization code for hot
plugged devices is only called when the device is enabled, ie typically
in the probe function. Also, the platform specific setup code is called each
time pci_enable_device() is called, not only once during device discovery,
meaning it is actually called multiple times, once for devices discovered
during the initial scan and again each time a driver is re-loaded.

The visible result is that interrupt pins are only assigned to hot plugged
devices when the device driver is loaded. Effectively this changes the PCI
probe API, since pci_dev->irq and the device's dma configuration will now
only be valid after pci_enable() was called at least once. A more subtle
change is that platform specific PCI device setup is moved from device
discovery into the driver's probe function, more specifically into the
pci_enable_device() call.

To fix the inconsistencies, replace pcibios_setup_device() with
pcibios_add_device(). Drop explicit calls to pcibios_setup_device();
this makes pcibios_setup_bus_devices() a noop function which could
eventually be dropped. With this change, device initialization is called
exactly once and consistently for both static and hot plugged devices.

Since pcibios_add_device() is called from pci_device_add(), which initializes
the numa node, drop the now redundant additional numa node initialization.

Cc: Yuanquan Chen <Yuanquan.Chen@freescale.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Hiroo Matsumoto <matsumoto.hiroo@jp.fujitsu.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
---
 arch/powerpc/kernel/pci-common.c |   25 ++++---------------------
 1 file changed, 4 insertions(+), 21 deletions(-)
Yuanquan Chen - May 29, 2013, 9:30 a.m.
On 05/29/2013 01:35 AM, Guenter Roeck wrote:
> bios_add_device(). Drop explicit calls to pcibios_setup_device();
> this makes pcibios_setup_bus_devices() a noop function which could
> eve

Yeah, it's more reasonable to do the irq and DMA related initialization
in one code path for all devices.

Regards,
Yuanquan
Guenter Roeck - May 31, 2013, 5:14 a.m.
On Wed, May 29, 2013 at 05:30:41PM +0800, Chen Yuanquan-B41889 wrote:
> On 05/29/2013 01:35 AM, Guenter Roeck wrote:
> >bios_add_device(). Drop explicit calls to pcibios_setup_device();
> >this makes pcibios_setup_bus_devices() a noop function which could
> >eve
> 
> Yeah, it's more reasonable to do the irq and DMA related initialization
> in one code path for all devices.
> 
Any comments / feedback on the code itself ?

Thanks,
Guenter
Benjamin Herrenschmidt - May 31, 2013, 5:44 a.m.
On Thu, 2013-05-30 at 22:14 -0700, Guenter Roeck wrote:
> On Wed, May 29, 2013 at 05:30:41PM +0800, Chen Yuanquan-B41889 wrote:
> > On 05/29/2013 01:35 AM, Guenter Roeck wrote:
> > >bios_add_device(). Drop explicit calls to pcibios_setup_device();
> > >this makes pcibios_setup_bus_devices() a noop function which could
> > >eve
> > 
> > Yeah, it's more reasonable to do the irq and DMA related initialization
> > in one code path for all devices.
> > 
> Any comments / feedback on the code itself ?

Sorry, I haven't had a chance to review it yet, I'm fairly bogged down
at the moment. I want to tread carefully because the previous iteration
of changing that stuff did break a few platforms in the end.

Cheers,
Ben.
Guenter Roeck - June 1, 2013, 1:58 p.m.
On Fri, May 31, 2013 at 03:44:07PM +1000, Benjamin Herrenschmidt wrote:
> On Thu, 2013-05-30 at 22:14 -0700, Guenter Roeck wrote:
> > On Wed, May 29, 2013 at 05:30:41PM +0800, Chen Yuanquan-B41889 wrote:
> > > On 05/29/2013 01:35 AM, Guenter Roeck wrote:
> > > >bios_add_device(). Drop explicit calls to pcibios_setup_device();
> > > >this makes pcibios_setup_bus_devices() a noop function which could
> > > >eve
> > > 
> > > Yeah, it's more reasonable to do the irq and DMA related initialization
> > > in one code path for all devices.
> > > 
> > Any comments / feedback on the code itself ?
> 
> Sorry, I haven't had a chance to review it yet, I'm fairly bogged down
> at the moment. I want to tread carefully because the previous iteration
> of changing that stuff did break a few platforms in the end.
> 
Hi Ben,

the comment was actuially directed towards Yuanquan.

No problem, take your time. I did my best to test it, but I agree that this is a
critical area of the code, and it would be desirable to get additional scrutiny
and test feedback.

The code has been running in our system (P2020 and P5040) for several months.
I was preparing a patch for upstream submission when I noticed commit 37f02195b.
After testing ithis commit, I noticed the problems with it and wrote this patch,
which aligns the code with our initial patch. I tested it as good as I could on
our systems as well as with a P5040 evaluation board and an Intel GE PCIe
card.

Thanks,
Guenter
Yuanquan Chen - June 3, 2013, 2:28 a.m.
On 06/01/2013 09:58 PM, Guenter Roeck wrote:
> On Fri, May 31, 2013 at 03:44:07PM +1000, Benjamin Herrenschmidt wrote:
>> On Thu, 2013-05-30 at 22:14 -0700, Guenter Roeck wrote:
>>> On Wed, May 29, 2013 at 05:30:41PM +0800, Chen Yuanquan-B41889 wrote:
>>>> On 05/29/2013 01:35 AM, Guenter Roeck wrote:
>>>>> bios_add_device(). Drop explicit calls to pcibios_setup_device();
>>>>> this makes pcibios_setup_bus_devices() a noop function which could
>>>>> eve
>>>> Yeah, it's more reasonable to do the irq and DMA related initialization
>>>> in one code path for all devices.
>>>>
>>> Any comments / feedback on the code itself ?
>> Sorry, I haven't had a chance to review it yet, I'm fairly bogged down
>> at the moment. I want to tread carefully because the previous iteration
>> of changing that stuff did break a few platforms in the end.
>>
> Hi Ben,
>
> the comment was actuially directed towards Yuanquan.
>
> No problem, take your time. I did my best to test it, but I agree that this is a
> critical area of the code, and it would be desirable to get additional scrutiny
> and test feedback.
>
> The code has been running in our system (P2020 and P5040) for several months.
> I was preparing a patch for upstream submission when I noticed commit 37f02195b.
> After testing ithis commit, I noticed the problems with it and wrote this patch,
> which aligns the code with our initial patch. I tested it as good as I could on
> our systems as well as with a P5040 evaluation board and an Intel GE PCIe
> card.
>
> Thanks,
> Guenter
>

Hi Guenter,

Your patch makes sure the initialization of DMA and irq in one code path 
and also
avoids the initialization called two times(in pci bus scanning and 
pci_enable_device()
in device driver) for no-hot-plugged pci device. It's much more reasonable!

I don't know why you also remove the function "set_dev_node()" ? As I 
know, the
function just be called here. I think it should remain in your new function
pcibios_add_device().

Regards,
Yuanquan

>
Guenter Roeck - June 3, 2013, 4:11 a.m.
On Mon, Jun 03, 2013 at 10:28:47AM +0800, Chen Yuanquan-B41889 wrote:
> On 06/01/2013 09:58 PM, Guenter Roeck wrote:
> >On Fri, May 31, 2013 at 03:44:07PM +1000, Benjamin Herrenschmidt wrote:
> >>On Thu, 2013-05-30 at 22:14 -0700, Guenter Roeck wrote:
> >>>On Wed, May 29, 2013 at 05:30:41PM +0800, Chen Yuanquan-B41889 wrote:
> >>>>On 05/29/2013 01:35 AM, Guenter Roeck wrote:
> >>>>>bios_add_device(). Drop explicit calls to pcibios_setup_device();
> >>>>>this makes pcibios_setup_bus_devices() a noop function which could
> >>>>>eve
> >>>>Yeah, it's more reasonable to do the irq and DMA related initialization
> >>>>in one code path for all devices.
> >>>>
> >>>Any comments / feedback on the code itself ?
> >>Sorry, I haven't had a chance to review it yet, I'm fairly bogged down
> >>at the moment. I want to tread carefully because the previous iteration
> >>of changing that stuff did break a few platforms in the end.
> >>
> >Hi Ben,
> >
> >the comment was actuially directed towards Yuanquan.
> >
> >No problem, take your time. I did my best to test it, but I agree that this is a
> >critical area of the code, and it would be desirable to get additional scrutiny
> >and test feedback.
> >
> >The code has been running in our system (P2020 and P5040) for several months.
> >I was preparing a patch for upstream submission when I noticed commit 37f02195b.
> >After testing ithis commit, I noticed the problems with it and wrote this patch,
> >which aligns the code with our initial patch. I tested it as good as I could on
> >our systems as well as with a P5040 evaluation board and an Intel GE PCIe
> >card.
> >
> >Thanks,
> >Guenter
> >
> 
> Hi Guenter,
> 
> Your patch makes sure the initialization of DMA and irq in one code
> path and also
> avoids the initialization called two times(in pci bus scanning and
> pci_enable_device()
> in device driver) for no-hot-plugged pci device. It's much more reasonable!
> 
> I don't know why you also remove the function "set_dev_node()" ? As
> I know, the
> function just be called here. I think it should remain in your new function
> pcibios_add_device().
> 
pcibios_add_device() is (only) called from pci_device_add(),
which already calls set_dev_node().

Guenter
Benjamin Herrenschmidt - June 6, 2013, 1 a.m.
On Sat, 2013-06-01 at 06:58 -0700, Guenter Roeck wrote:
> the comment was actuially directed towards Yuanquan.
> 
> No problem, take your time. I did my best to test it, but I agree that this is a
> critical area of the code, and it would be desirable to get additional scrutiny
> and test feedback.
> 
> The code has been running in our system (P2020 and P5040) for several months.
> I was preparing a patch for upstream submission when I noticed commit 37f02195b.
> After testing ithis commit, I noticed the problems with it and wrote this patch,
> which aligns the code with our initial patch. I tested it as good as I could on
> our systems as well as with a P5040 evaluation board and an Intel GE PCIe
> card.

Ok, so I like this very much. So much that I was considering still sneaking it
into 3.10, until I hit a snag...

[ Basically, the previous patch that moved the setup to pcibios_enable_device()
always made me nervous. It did regress at least one platform (mac stuff) due
to missed IRQ fixup, which I worked around later on, and I'm still not terribly
happy about it. Your approach is much cleaner. ]

I suppose that when I wrote the original setup stuff there wasn't an add
hook or I didn't see it...

In fact I would go further and completely remove pcibios_setup_bus_devices()
which is now empty since it's only called by the powerpc code, it's not
a generic hook.

However, here's the snag. Unless I missed something, we now setup the devices
DMA before we call pcibios_fixup_bus(). And *that* is going to break some
pseries.

We have an assumption in there that the bus fixup is done first, because in
some cases, the DMA windows are established at the bus level, and the "dev"
setup just picks up the bits.

Now looking at that code, it's not unfixable but it won't make 3.10. Maybe
we need a new pre-scan hook for busses... we can use the pcibios_add_device()
hook of the bridge itself for P2P but that won't do for the root bus and I
don't like having two different path here...

Cheers,
Ben.
Guenter Roeck - June 6, 2013, 5:25 a.m.
On Thu, Jun 06, 2013 at 11:00:04AM +1000, Benjamin Herrenschmidt wrote:
> On Sat, 2013-06-01 at 06:58 -0700, Guenter Roeck wrote:
> > the comment was actuially directed towards Yuanquan.
> > 
> > No problem, take your time. I did my best to test it, but I agree that this is a
> > critical area of the code, and it would be desirable to get additional scrutiny
> > and test feedback.
> > 
> > The code has been running in our system (P2020 and P5040) for several months.
> > I was preparing a patch for upstream submission when I noticed commit 37f02195b.
> > After testing ithis commit, I noticed the problems with it and wrote this patch,
> > which aligns the code with our initial patch. I tested it as good as I could on
> > our systems as well as with a P5040 evaluation board and an Intel GE PCIe
> > card.
> 
> Ok, so I like this very much. So much that I was considering still sneaking it
> into 3.10, until I hit a snag...
> 
> [ Basically, the previous patch that moved the setup to pcibios_enable_device()
> always made me nervous. It did regress at least one platform (mac stuff) due
> to missed IRQ fixup, which I worked around later on, and I'm still not terribly
> happy about it. Your approach is much cleaner. ]
> 
> I suppose that when I wrote the original setup stuff there wasn't an add
> hook or I didn't see it...
> 
> In fact I would go further and completely remove pcibios_setup_bus_devices()
> which is now empty since it's only called by the powerpc code, it's not
> a generic hook.
> 
> However, here's the snag. Unless I missed something, we now setup the devices
> DMA before we call pcibios_fixup_bus(). And *that* is going to break some
> pseries.
> 
> We have an assumption in there that the bus fixup is done first, because in
> some cases, the DMA windows are established at the bus level, and the "dev"
> setup just picks up the bits.
> 
> Now looking at that code, it's not unfixable but it won't make 3.10. Maybe
> we need a new pre-scan hook for busses... we can use the pcibios_add_device()
> hook of the bridge itself for P2P but that won't do for the root bus and I
> don't like having two different path here...
> 
Hi Ben,

you are right, pcibios_fixup_bus() is called after pcibios_add_device(),
at least in the initial scan.

Can you point me to some of the breaking code ? I guess it must be in some of
the pci_dma_dev_setup callbacks, but those I looked at only check devicetree
data or simply set function pointers, both of which should not be affected by
the call order.

How about pcibios_fixup_device, to be called after pcibios_fixup_bus ?

Thanks,
Guenter
Benjamin Herrenschmidt - June 6, 2013, 5:44 a.m.
On Wed, 2013-06-05 at 22:25 -0700, Guenter Roeck wrote:
> 
> 
> Can you point me to some of the breaking code ? I guess it must be in some of
> the pci_dma_dev_setup callbacks, but those I looked at only check devicetree
> data or simply set function pointers, both of which should not be affected by
> the call order.
> 
> How about pcibios_fixup_device, to be called after pcibios_fixup_bus ?

Mostly the pseries ones from a cursory glance. They could be improved to
be agnostic to the call order I suppose or simply done better but heh...

I like using pcibios_add though ... we just need to figure out how to
sort out that ordering if possible.

Cheers,
Ben.

Patch

diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 6053f03..41dd980 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -1004,13 +1004,8 @@  void pcibios_setup_bus_self(struct pci_bus *bus)
 		ppc_md.pci_dma_bus_setup(bus);
 }
 
-void pcibios_setup_device(struct pci_dev *dev)
+int pcibios_add_device(struct pci_dev *dev)
 {
-	/* Fixup NUMA node as it may not be setup yet by the generic
-	 * code and is needed by the DMA init
-	 */
-	set_dev_node(&dev->dev, pcibus_to_node(dev->bus));
-
 	/* Hook up default DMA ops */
 	set_dma_ops(&dev->dev, pci_dma_ops);
 	set_dma_offset(&dev->dev, PCI_DRAM_OFFSET);
@@ -1023,24 +1018,16 @@  void pcibios_setup_device(struct pci_dev *dev)
 	pci_read_irq_line(dev);
 	if (ppc_md.pci_irq_fixup)
 		ppc_md.pci_irq_fixup(dev);
+
+	return 0;
 }
 
 void pcibios_setup_bus_devices(struct pci_bus *bus)
 {
-	struct pci_dev *dev;
-
 	pr_debug("PCI: Fixup bus devices %d (%s)\n",
 		 bus->number, bus->self ? pci_name(bus->self) : "PHB");
 
-	list_for_each_entry(dev, &bus->devices, bus_list) {
-		/* Cardbus can call us to add new devices to a bus, so ignore
-		 * those who are already fully discovered
-		 */
-		if (dev->is_added)
-			continue;
-
-		pcibios_setup_device(dev);
-	}
+	/* nothing to do */
 }
 
 void pcibios_set_master(struct pci_dev *dev)
@@ -1479,10 +1466,6 @@  int pcibios_enable_device(struct pci_dev *dev, int mask)
 		if (ppc_md.pcibios_enable_device_hook(dev))
 			return -EINVAL;
 
-	/* avoid pcie irq fix up impact on cardbus */
-	if (dev->hdr_type != PCI_HEADER_TYPE_CARDBUS)
-		pcibios_setup_device(dev);
-
 	return pci_enable_resources(dev, mask);
 }