Patchwork powerpc: reduce multi-hit of pcibios_setup_device() in hotplug

login
register
mail settings
Submitter Wei Yang
Date May 8, 2014, 6:30 a.m.
Message ID <1399530602-4231-1-git-send-email-weiyang@linux.vnet.ibm.com>
Download mbox | patch
Permalink /patch/346900/
State New
Delegated to: Benjamin Herrenschmidt
Headers show

Comments

Wei Yang - May 8, 2014, 6:30 a.m.
During the EEH hotplug event, pcibios_setup_device() will be invoked two
times. And the last time will trigger a warning of re-attachment of iommu
group.

The two times are:

    pci_device_add
        ...
        pcibios_add_device
        pcibios_setup_device   <- 1st time
    pcibios_add_pci_devices
        ...
        pcibios_setup_bus_devices
        pcibios_setup_device   <- 2rd time

As we see, in pcibios_add_pci_devices() the pci_bus passed in as a parameter
is initialized and already added in the system. Which means the
pcibios_setup_device() in pcibios_add_device() will be called. Then the
pcibios_setup_device() in pcibios_setup_bus_devices() is the 2nd time to be
called on the same pci_dev.

After applying the patch, the warning of re-attach the iommu group will be
fixed.

[  161.133979] ------------[ cut here ]------------
[  161.134013] WARNING: at arch/powerpc/kernel/iommu.c:1125
[  161.134046] Modules linked in: xt_CHECKSUM bnep bluetooth 6lowpan_iphc rfkill nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6t_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw mlx4_ib ib_sa ib_mad ib_core ib_addr bnx2x ses enclosure mlx4_core(O) tg3 be2net ptp pps_core mdio libcrc32c shpchp kvm nfsd nfs_acl lockd sunrpc binfmt_misc uinput lpfc scsi_transport_fc ipr scsi_tgt
[  161.134678] CPU: 23 PID: 650 Comm: eehd Tainted: G           O 3.14.0-rc5yw+ #173
[  161.134727] task: c0000027ed485670 ti: c0000027ed50c000 task.ti: c0000027ed50c000
[  161.134777] NIP: c00000000003ca40 LR: c00000000006c738 CTR: c00000000006c6b0
[  161.134827] REGS: c0000027ed50f470 TRAP: 0700   Tainted: G           O  (3.14.0-rc5yw+)
[  161.134876] MSR: 9000000000029032 <SF,HV,EE,ME,IR,DR,RI>  CR: 88008084  XER: 20000000
[  161.134992] CFAR: c00000000006c734 SOFTE: 1
GPR00: c00000000006c738 c0000027ed50f6f0 c0000000013982f0 c0000027ed464000
GPR04: c0000027e8a87000 c00000000006a980 c0000000016e84a0 0000000000004400
GPR08: c0000000012cd490 0000000000000001 c0000027ed463fff 000000003003b868
GPR12: 0000000028008084 c00000000fdccf00 c0000000000d1970 c000002d74cf0840
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR24: 0000000000000001 000000005360a3b9 00000000000f2d17 c000000fe7659800
GPR28: 0000000000000000 c0000000016e3a80 c0000027e8a87090 c0000027e8a87000
[  161.135624] NIP [c00000000003ca40] .iommu_add_device+0x30/0x1f0
[  161.135668] LR [c00000000006c738] .pnv_pci_ioda_dma_dev_setup+0x88/0xb0
[  161.135709] Call Trace:
[  161.135728] [c0000027ed50f6f0] [c0000027ed50f780] 0xc0000027ed50f780 (unreliable)
[  161.135787] [c0000027ed50f780] [c00000000006c738] .pnv_pci_ioda_dma_dev_setup+0x88/0xb0
[  161.135845] [c0000027ed50f800] [c0000000000699c8] .pnv_pci_dma_dev_setup+0x78/0x310
[  161.135903] [c0000027ed50f8a0] [c000000000044008] .pcibios_setup_device+0x88/0x2f0
[  161.135961] [c0000027ed50f970] [c000000000045d40] .pcibios_setup_bus_devices+0x60/0xd0
[  161.136019] [c0000027ed50f9f0] [c0000000000436dc] .pcibios_add_pci_devices+0xdc/0x1c0
[  161.136078] [c0000027ed50fa80] [c00000000086f4ac] .eeh_reset_device+0x21c/0x2d8
[  161.136136] [c0000027ed50fb40] [c000000000039db8] .eeh_handle_normal_event+0x3b8/0x3f0
[  161.136193] [c0000027ed50fbd0] [c000000000039e38] .eeh_handle_event+0x48/0x320
[  161.136251] [c0000027ed50fc80] [c00000000003a20c] .eeh_event_handler+0xfc/0x1b0
[  161.136309] [c0000027ed50fd30] [c0000000000d1a80] .kthread+0x110/0x130
[  161.136360] [c0000027ed50fe30] [c00000000000a460] .ret_from_kernel_thread+0x5c/0x7c
[  161.136417] Instruction dump:
[  161.136442] 7c0802a6 fba1ffe8 fbc1fff0 fbe1fff8 f8010010 f821ff71 7c7e1b78 60000000
[  161.136524] 60000000 e87e0298 3143ffff 7d2a1910 <0b090000> 2fa90000 40de00c8 ebfe0218
[  161.136608] ---[ end trace b959c4ca6f08c270 ]---
[  161.136641] iommu_tce: device 0003:05:00.0 is already in iommu group 7, skipping

This patch removes the pcibios_setup_bus_devices() in
pcibios_add_pci_devices().

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/pci-hotplug.c |    1 -
 1 file changed, 1 deletion(-)
Benjamin Herrenschmidt - May 12, 2014, 2:59 a.m.
On Thu, 2014-05-08 at 14:30 +0800, Wei Yang wrote:
> During the EEH hotplug event, pcibios_setup_device() will be invoked two
> times. And the last time will trigger a warning of re-attachment of iommu
> group.
> 
> The two times are:
> 
>     pci_device_add
>         ...
>         pcibios_add_device
>         pcibios_setup_device   <- 1st time
>     pcibios_add_pci_devices
>         ...
>         pcibios_setup_bus_devices
>         pcibios_setup_device   <- 2rd time
> 
> As we see, in pcibios_add_pci_devices() the pci_bus passed in as a parameter
> is initialized and already added in the system. Which means the
> pcibios_setup_device() in pcibios_add_device() will be called. Then the
> pcibios_setup_device() in pcibios_setup_bus_devices() is the 2nd time to be
> called on the same pci_dev.

(CC'ing Bjorn to make sure I get the mess right :-)

So the patch makes me a bit nervous because we have convoluted
dependencies on some of that in the actual PCI hotplug code (the
real thing which rescans busses etc...).

I *think* the patch might be right (though incomplete) on those
grounds, but I'd like you to verify and test the various hotplug
cases in pHyp and I think issue comes from pcibios_add_device() being
somewhat a late addition.

Basically, what happens I think is at boot time:

 - bus->is_added is false, dev->is_added is false during initial probe
   of a given device. So pcibios_add_device() does nothing.

 - shortly afterward, the core calls pcibios_fixup_bus(), still with
bus->is_added set to false, which *will* do the setup because
dev->is_added is also false for all devices.

 - we set bus->is_added

 - we call pci_bus_add_devices() which sets all the dev->is_added

So far so good. Now, when we hotplug something (and there are some
distinction here depending on what hotplug path we take which is why I'd
like you to scrutinize things a bit more), we mostly hit powerpc's
pcibios_add_pci_devices().

Now this function will operate differently on a "devtree" style probe
(phyp/kvm guest) vs. a "normal" probe (other bare metal machines).

The normal case is what you are trying to fix here. It does an explicit
pcibios_setup_bus_devices() on the bus being rescanned. I think you are
right this isn't necessary. This bus has bus->is_added set to true and
thus pcibios_add_device() will do the setup for any new devices. 

That makes me think that we need a similar fix in pci_of_scan.c for
when we call of_rescan_bus(). The fix would be probably in
__of_scan_bus(), to move the call to pcibios_setup_bus_devices() inside
the if (!rescan_existing) statement, since if we are scanning an
existing bus (which thus has bus->is_added set), we know
pcibios_add_devices() will have done the updates.

In fact I wonder if we could just use bus->is_added for the test in
there and get rid of the "rescan_existing" argument completely. Worth
adding something like WARN_ON(rescan_existing != bus->is_added); and
do a few tests of hotplug to see what it looks like.

Now there are two different hotplug path at least in the pseries code,
so have a look make sure we aren't missing anything here and please test
all cases !

Cheers,
Ben.

Patch

diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
index c1e17ae..39a1bac 100644
--- a/arch/powerpc/kernel/pci-hotplug.c
+++ b/arch/powerpc/kernel/pci-hotplug.c
@@ -94,7 +94,6 @@  void pcibios_add_pci_devices(struct pci_bus * bus)
 		 */
 		slotno = PCI_SLOT(PCI_DN(dn->child)->devfn);
 		pci_scan_slot(bus, PCI_DEVFN(slotno, 0));
-		pcibios_setup_bus_devices(bus);
 		max = bus->busn_res.start;
 		for (pass = 0; pass < 2; pass++) {
 			list_for_each_entry(dev, &bus->devices, bus_list) {