diff mbox

[bisected] tg3 broken in 3.18.0?

Message ID 548EF90A.5070607@gmail.com
State Not Applicable, archived
Delegated to: David Miller
Headers show

Commit Message

Marcelo Ricardo Leitner Dec. 15, 2014, 3:06 p.m. UTC
On 13-12-2014 19:02, Nils Holland wrote:
> rajatxjain@gmail.com
> Bcc:
> Subject: Re: [bisected] tg3 broken in 3.18.0?
> Reply-To:
> In-Reply-To: <20141212.201831.186234837340644301.davem@davemloft.net>
>
> On Fri, Dec 12, 2014 at 08:18:31PM -0500, David Miller wrote:
>> From: Nils Holland <nholland@tisys.org>
>> Date: Sat, 13 Dec 2014 02:14:08 +0100
>>
>>>
>>> My bisect exercise suggests that the following commit is the culprit:
>>>
>>> 89665a6a71408796565bfd29cfa6a7877b17a667 (PCI: Check only the Vendor
>>> ID to identify Configuration Request Retry)
>>
>> You definitely need to bring this up with the author of that change
>> and the relevent list for the PCI subsystem and/or linux-kernel.
>
> I've now already sent an inquiry to Rajat Jain, the author of the
> patch in question, and this message here is now also CC'd to
> linux-pci@.
>
> With this message, I'd like to add one last result of investigation
> I've done today, in the hope that it will aid the folks with more
> knowledge to go after the issue.

FWIW, reverting this change fixes tg3 in here too.

Thanks Nils for doing the bisect!

With these debugs (note the re-revert):

                         return false;
+               }
                 /* Card hasn't responded in 60 seconds?  Must be stuck. */
                 if (delay > crs_timeout) {
                         printk(KERN_WARNING "pci %04x:%02x:%02x.%d: not 
responding\n",
@@ -1451,6 +1459,7 @@ bool pci_bus_read_dev_vendor_id(struct pci_bus *bus, int 
devfn, u32 *l,
                                PCI_FUNC(devfn));
                         return false;
                 }
+               printk ("pci %04x:%02x:%02x.%d: %x %x\n", pci_domain_nr(bus), 
bus->number, PCI_SLOT(devfn), PCI_FUNC(devfn), *l, *l & 0xffff);
         }

         return true;

I'm getting, with commit 89665a6a71408796565bfd29cfa6a7877b17a667:

$ grep 'pci 0000:02' tg3.bad
[    0.190733] pci 0000:02:00.0: 1st 165a14e4 14e4
[    0.190736] pci 0000:02:00.0: 1st 165a14e4 14e4
[    0.190810] pci 0000:02:00.0: [14e4:165a] type 00 class 0x020000
[    0.190885] pci 0000:02:00.0: reg 0x10: [mem 0xf7c40000-0xf7c4ffff 64bit]
[    0.191048] pci 0000:02:00.0: reg 0x30: [mem 0xf7c00000-0xf7c3ffff pref]
[    0.191382] pci 0000:02:00.0: PME# supported from D3hot D3cold
[    0.191438] pci 0000:02:00.0: System wakeup disabled by ACPI
[    1.561555] pci 0000:02:00.0: 1st 1 1
[    1.561558] pci 0000:02:00.0: crs_timeout: 0
[   20.412021] pci 0000:02:00.0: 1st 1 1
[   20.412022] pci 0000:02:00.0: crs_timeout: 0
[   20.413596] pci 0000:02:00.0: 1st 1 1
[   20.413598] pci 0000:02:00.0: crs_timeout: 0

And without it:

$ grep 'pci 0000:02' tg3.good
[    0.190734] pci 0000:02:00.0: 1st 165a14e4 14e4
[    0.190738] pci 0000:02:00.0: 1st 165a14e4 14e4
[    0.190811] pci 0000:02:00.0: [14e4:165a] type 00 class 0x020000
[    0.190884] pci 0000:02:00.0: reg 0x10: [mem 0xf7c40000-0xf7c4ffff 64bit]
[    0.191047] pci 0000:02:00.0: reg 0x30: [mem 0xf7c00000-0xf7c3ffff pref]
[    0.191380] pci 0000:02:00.0: PME# supported from D3hot D3cold
[    0.191439] pci 0000:02:00.0: System wakeup disabled by ACPI
[    1.576778] pci 0000:02:00.0: 1st 1 1
[   19.068517] pci 0000:02:00.0: 1st 165a14e4 14e4

Hope that helps!

   Marcelo

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Rajat Jain Dec. 16, 2014, 4:04 p.m. UTC | #1
Hello All,

Apologies for jumping in late, but for some reason I do not see the
original mail in my inbox. However I am taking a look at the mails as
sent on linux-pci (and I will keep an eye out for the bug report that
Bjorn asked for).


>
> I'm getting, with commit 89665a6a71408796565bfd29cfa6a7877b17a667:
>
> $ grep 'pci 0000:02' tg3.bad
> [    0.190733] pci 0000:02:00.0: 1st 165a14e4 14e4
> [    0.190736] pci 0000:02:00.0: 1st 165a14e4 14e4
> [    0.190810] pci 0000:02:00.0: [14e4:165a] type 00 class 0x020000
> [    0.190885] pci 0000:02:00.0: reg 0x10: [mem 0xf7c40000-0xf7c4ffff 64bit]
> [    0.191048] pci 0000:02:00.0: reg 0x30: [mem 0xf7c00000-0xf7c3ffff pref]
> [    0.191382] pci 0000:02:00.0: PME# supported from D3hot D3cold
> [    0.191438] pci 0000:02:00.0: System wakeup disabled by ACPI
> [    1.561555] pci 0000:02:00.0: 1st 1 1
> [    1.561558] pci 0000:02:00.0: crs_timeout: 0
> [   20.412021] pci 0000:02:00.0: 1st 1 1
> [   20.412022] pci 0000:02:00.0: crs_timeout: 0
> [   20.413596] pci 0000:02:00.0: 1st 1 1
> [   20.413598] pci 0000:02:00.0: crs_timeout: 0
>
> And without it:
>
> $ grep 'pci 0000:02' tg3.good
> [    0.190734] pci 0000:02:00.0: 1st 165a14e4 14e4
> [    0.190738] pci 0000:02:00.0: 1st 165a14e4 14e4
> [    0.190811] pci 0000:02:00.0: [14e4:165a] type 00 class 0x020000
> [    0.190884] pci 0000:02:00.0: reg 0x10: [mem 0xf7c40000-0xf7c4ffff 64bit]
> [    0.191047] pci 0000:02:00.0: reg 0x30: [mem 0xf7c00000-0xf7c3ffff pref]
> [    0.191380] pci 0000:02:00.0: PME# supported from D3hot D3cold
> [    0.191439] pci 0000:02:00.0: System wakeup disabled by ACPI
> [    1.576778] pci 0000:02:00.0: 1st 1 1
> [   19.068517] pci 0000:02:00.0: 1st 165a14e4 14e4
>

It seems that in the first 2 attempts that were made to probe the
device are all OK and return regular device ID and vendor ID for TG3
(CRS does not have a role to play). However, later attempts return a
CRS.

1) May I ask if you are using acpihp or pciehp? I assume pciehp?

2) Can you please also send dmesg output while passing
pciehp.pciehp_debug=1? In the fail case, do you see a message
indicating the pciehp gave up since it got CRS for a long time
(something like "pci 0000:02:00.0 id reading try 50 times with
interval 20 ms to get ffff0001")?

3) Currently the pciehp passes "0" for the argument "crs_timeout" to
pci_bus_read_dev_vendor_id(). Can you please try increasing it to, say
30 seconds (30 * 1000). (For comparison data, acpihp uses the value
60*1000 i.e. 60 seconds today) and run the fail case once again?

Thanks a lot in advance for the debugging help ;-)

Rajat
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas Dec. 16, 2014, 4:20 p.m. UTC | #2
[+cc Rafael, Prashant, Michael]

On Tue, Dec 16, 2014 at 9:04 AM, Rajat Jain <rajatxjain@gmail.com> wrote:
> Hello All,
>
> Apologies for jumping in late, but for some reason I do not see the
> original mail in my inbox. However I am taking a look at the mails as
> sent on linux-pci (and I will keep an eye out for the bug report that
> Bjorn asked for).
>
>
>>
>> I'm getting, with commit 89665a6a71408796565bfd29cfa6a7877b17a667:
>>
>> $ grep 'pci 0000:02' tg3.bad
>> [    0.190733] pci 0000:02:00.0: 1st 165a14e4 14e4
>> [    0.190736] pci 0000:02:00.0: 1st 165a14e4 14e4
>> [    0.190810] pci 0000:02:00.0: [14e4:165a] type 00 class 0x020000
>> [    0.190885] pci 0000:02:00.0: reg 0x10: [mem 0xf7c40000-0xf7c4ffff 64bit]
>> [    0.191048] pci 0000:02:00.0: reg 0x30: [mem 0xf7c00000-0xf7c3ffff pref]
>> [    0.191382] pci 0000:02:00.0: PME# supported from D3hot D3cold
>> [    0.191438] pci 0000:02:00.0: System wakeup disabled by ACPI
>> [    1.561555] pci 0000:02:00.0: 1st 1 1
>> [    1.561558] pci 0000:02:00.0: crs_timeout: 0
>> [   20.412021] pci 0000:02:00.0: 1st 1 1
>> [   20.412022] pci 0000:02:00.0: crs_timeout: 0
>> [   20.413596] pci 0000:02:00.0: 1st 1 1
>> [   20.413598] pci 0000:02:00.0: crs_timeout: 0
>>
>> And without it:
>>
>> $ grep 'pci 0000:02' tg3.good
>> [    0.190734] pci 0000:02:00.0: 1st 165a14e4 14e4
>> [    0.190738] pci 0000:02:00.0: 1st 165a14e4 14e4
>> [    0.190811] pci 0000:02:00.0: [14e4:165a] type 00 class 0x020000
>> [    0.190884] pci 0000:02:00.0: reg 0x10: [mem 0xf7c40000-0xf7c4ffff 64bit]
>> [    0.191047] pci 0000:02:00.0: reg 0x30: [mem 0xf7c00000-0xf7c3ffff pref]
>> [    0.191380] pci 0000:02:00.0: PME# supported from D3hot D3cold
>> [    0.191439] pci 0000:02:00.0: System wakeup disabled by ACPI
>> [    1.576778] pci 0000:02:00.0: 1st 1 1
>> [   19.068517] pci 0000:02:00.0: 1st 165a14e4 14e4
>>
>
> It seems that in the first 2 attempts that were made to probe the
> device are all OK and return regular device ID and vendor ID for TG3
> (CRS does not have a role to play). However, later attempts return a
> CRS.
>
> 1) May I ask if you are using acpihp or pciehp? I assume pciehp?
>
> 2) Can you please also send dmesg output while passing
> pciehp.pciehp_debug=1? In the fail case, do you see a message
> indicating the pciehp gave up since it got CRS for a long time
> (something like "pci 0000:02:00.0 id reading try 50 times with
> interval 20 ms to get ffff0001")?
>
> 3) Currently the pciehp passes "0" for the argument "crs_timeout" to
> pci_bus_read_dev_vendor_id(). Can you please try increasing it to, say
> 30 seconds (30 * 1000). (For comparison data, acpihp uses the value
> 60*1000 i.e. 60 seconds today) and run the fail case once again?

Using zero for the timeout seems bogus to me.  But I doubt pciehp is
involved in this situation.

I think we're in this path:

    tg3_init_hw
      tg3_reset_hw
        tg3_disable_ints
        tg3_stop_fw
        tg3_write_sig_pre_reset
        tg3_chip_reset
          pci_device_is_present
            pci_bus_read_dev_vendor_id

and in this case pci_device_is_present() also passes a timeout of zero
to pci_bus_read_dev_vendor_id().  My guess is that tg3 is resetting
the device, so it's not too surprising that the config read returns
CRS status immediately afterward.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michael Chan Dec. 16, 2014, 5:15 p.m. UTC | #3
On Tue, 2014-12-16 at 09:20 -0700, Bjorn Helgaas wrote:
> I think we're in this path:
> 
>     tg3_init_hw
>       tg3_reset_hw
>         tg3_disable_ints
>         tg3_stop_fw
>         tg3_write_sig_pre_reset
>         tg3_chip_reset
>           pci_device_is_present
>             pci_bus_read_dev_vendor_id
> 
> and in this case pci_device_is_present() also passes a timeout of zero
> to pci_bus_read_dev_vendor_id().  My guess is that tg3 is resetting
> the device, so it's not too surprising that the config read returns
> CRS status immediately afterward.
> 
At the point of calling pci_device_is_present(), chip reset hasn't
started yet, so there should be no problem reading config space.

In all the newer tg3 chips, chip reset does not reset the PCIE block.
So I think config space should always be accesible even during reset.
> 

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Marcelo Ricardo Leitner Dec. 16, 2014, 5:59 p.m. UTC | #4
On 16-12-2014 15:15, Michael Chan wrote:
> On Tue, 2014-12-16 at 09:20 -0700, Bjorn Helgaas wrote:
>> I think we're in this path:
>>
>>      tg3_init_hw
>>        tg3_reset_hw
>>          tg3_disable_ints
>>          tg3_stop_fw
>>          tg3_write_sig_pre_reset
>>          tg3_chip_reset
>>            pci_device_is_present
>>              pci_bus_read_dev_vendor_id
>>
>> and in this case pci_device_is_present() also passes a timeout of zero
>> to pci_bus_read_dev_vendor_id().  My guess is that tg3 is resetting
>> the device, so it's not too surprising that the config read returns
>> CRS status immediately afterward.
>>
> At the point of calling pci_device_is_present(), chip reset hasn't
> started yet, so there should be no problem reading config space.
> 
> In all the newer tg3 chips, chip reset does not reset the PCIE block.
> So I think config space should always be accesible even during reset.

It's a 
02:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5722 Gigabit Ethernet PCI Express
over here

I put a WARN_ON(1) after those printks, and this is what I got:

[    1.550640] pci 0000:02:00.0: 1st 1 1
[    1.550643] pci 0000:02:00.0: crs_timeout: 0
[    1.550645] ------------[ cut here ]------------
[    1.550651] WARNING: CPU: 6 PID: 364 at drivers/pci/probe.c:1445 pci_bus_read_dev_vendor_id+0x1d4/0x1e0()
[    1.550652] Modules linked in: i915(+) raid0 i2c_algo_bit drm_kms_helper drm e1000e(+) tg3(+) ptp pps_core video
[    1.550660] CPU: 6 PID: 364 Comm: systemd-udevd Not tainted 3.18.0-rc6+ #8
[    1.550661] Hardware name: Dell Inc. OptiPlex 9010/03K80F, BIOS A15 08/12/2013
[    1.550662]  0000000000000000 000000004de2d8dc ffff8807eabdf948 ffffffff8173db46
[    1.550665]  0000000000000000 0000000000000000 ffff8807eabdf988 ffffffff81094d41
[    1.550667]  ffff8807eabdf968 ffff8807f1e27000 0000000000000000 0000000000000000
[    1.550669] Call Trace:
[    1.550675]  [<ffffffff8173db46>] dump_stack+0x46/0x58
[    1.550679]  [<ffffffff81094d41>] warn_slowpath_common+0x81/0xa0
[    1.550681]  [<ffffffff81094e5a>] warn_slowpath_null+0x1a/0x20
[    1.550683]  [<ffffffff813b2864>] pci_bus_read_dev_vendor_id+0x1d4/0x1e0
[    1.550687]  [<ffffffff813b7c3e>] pci_device_is_present+0x2e/0x50
[    1.550693]  [<ffffffffa003364f>] tg3_chip_reset+0x2f/0x940 [tg3]
[    1.550697]  [<ffffffffa0033f9f>] tg3_halt+0x3f/0x1e0 [tg3]
[    1.550701]  [<ffffffffa0044f83>] tg3_init_one+0xb83/0x1a40 [tg3]
[    1.550705]  [<ffffffff8127d2bb>] ? kernfs_activate+0x7b/0xf0
[    1.550708]  [<ffffffff813bbdc5>] local_pci_probe+0x45/0xa0
[    1.550711]  [<ffffffff8127fa8d>] ? sysfs_do_create_link_sd.isra.2+0x6d/0xc0
[    1.550714]  [<ffffffff813bd1b9>] pci_device_probe+0xf9/0x150
[    1.550717]  [<ffffffff814906fd>] driver_probe_device+0x12d/0x3d0
[    1.550720]  [<ffffffff81490a7b>] __driver_attach+0x9b/0xa0
[    1.550722]  [<ffffffff814909e0>] ? __device_attach+0x40/0x40
[    1.550724]  [<ffffffff8148e4f3>] bus_for_each_dev+0x73/0xc0
[    1.550726]  [<ffffffff814900ee>] driver_attach+0x1e/0x20
[    1.550729]  [<ffffffff8148fcb0>] bus_add_driver+0x180/0x250
[    1.550731]  [<ffffffffa0050000>] ? 0xffffffffa0050000
[    1.550733]  [<ffffffff81491274>] driver_register+0x64/0xf0
[    1.550735]  [<ffffffff813bb72b>] __pci_register_driver+0x4b/0x50
[    1.550739]  [<ffffffffa005001e>] tg3_driver_init+0x1e/0x1000 [tg3]
[    1.550742]  [<ffffffff81002144>] do_one_initcall+0xd4/0x210
[    1.550747]  [<ffffffff811cbc42>] ? __vunmap+0xc2/0x110
[    1.550751]  [<ffffffff8111336b>] load_module+0x1cab/0x2730
[    1.550753]  [<ffffffff8110efc0>] ? store_uevent+0x70/0x70
[    1.550756]  [<ffffffff8120b090>] ? kernel_read+0x50/0x80
[    1.550760]  [<ffffffff81113fa6>] SyS_finit_module+0xa6/0xd0
[    1.550763]  [<ffffffff81745129>] system_call_fastpath+0x12/0x17
[    1.550764] ---[ end trace 4cc3153e369484ea ]---
[    1.550963] tg3 0000:02:00.0 eth0: Tigon3 [partno(BCM95722) rev a200] (PCI Express) MAC address 00:0a:f7:2b:9b:39
[    1.550965] tg3 0000:02:00.0 eth0: attached PHY is 5722/5756 (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[0])
[    1.550966] tg3 0000:02:00.0 eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] TSOcap[1]
[    1.550967] tg3 0000:02:00.0 eth0: dma_rwctrl[76180000] dma_mask[64-bit]
[    1.556112] tg3 0000:02:00.0 p1p1: renamed from eth0
...

[   23.545119] tg3 0000:02:00.0: irq 32 for MSI/MSI-X
[   25.424981] tg3 0000:02:00.0 p1p1: No firmware running
[   25.425686] pci 0000:02:00.0: 1st 1 1
[   25.425687] pci 0000:02:00.0: crs_timeout: 0
[   25.425687] ------------[ cut here ]------------
[   25.425691] WARNING: CPU: 0 PID: 1590 at drivers/pci/probe.c:1445 pci_bus_read_dev_vendor_id+0x1d4/0x1e0()
[   25.425692] Modules linked in: bridge stp llc openvswitch x86_pkg_temp_thermal coretemp kvm_intel kvm snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_co
dec_generic snd_hda_intel snd_hda_controller snd_hda_codec crct10dif_pclmul snd_hwdep crc32_pclmul crc32c_intel ghash_clmulni_intel snd_seq mei_me snd_seq_d
evice snd_pcm iTCO_wdt iTCO_vendor_support snd_timer mei snd lpc_ich i2c_i801 pcspkr mfd_core dcdbas serio_raw soundcore microcode ie31200_edac shpchp edac_
core nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc xfs libcrc32c i915 raid0 i2c_algo_bit drm_kms_helper drm e1000e tg3 ptp pps_core video
[   25.425714] CPU: 0 PID: 1590 Comm: ip Tainted: G        W      3.18.0-rc6+ #8
[   25.425715] Hardware name: Dell Inc. OptiPlex 9010/03K80F, BIOS A15 08/12/2013
[   25.425716]  0000000000000000 0000000097b01d0c ffff8807f0687408 ffffffff8173db46
[   25.425717]  0000000000000000 0000000000000000 ffff8807f0687448 ffffffff81094d41
[   25.425719]  ffff8807f0687428 ffff8807f1e27000 0000000000000000 0000000000000000
[   25.425720] Call Trace:
[   25.425723]  [<ffffffff8173db46>] dump_stack+0x46/0x58
[   25.425726]  [<ffffffff81094d41>] warn_slowpath_common+0x81/0xa0
[   25.425728]  [<ffffffff81094e5a>] warn_slowpath_null+0x1a/0x20
[   25.425729]  [<ffffffff813b2864>] pci_bus_read_dev_vendor_id+0x1d4/0x1e0
[   25.425733]  [<ffffffffa0028b47>] ? tg3_phy_auxctl_write+0x27/0x30 [tg3]
[   25.425735]  [<ffffffff813b7c3e>] pci_device_is_present+0x2e/0x50
[   25.425738]  [<ffffffffa003364f>] tg3_chip_reset+0x2f/0x940 [tg3]
[   25.425740]  [<ffffffffa003756d>] tg3_reset_hw+0x8d/0x2ce0 [tg3]
[   25.425743]  [<ffffffff81383f6a>] ? delay_tsc+0x4a/0x80
[   25.425744]  [<ffffffff81383eec>] ? __udelay+0x2c/0x30
[   25.425747]  [<ffffffffa0026be4>] ? _tw32_flush+0x44/0x80 [tg3]
[   25.425749]  [<ffffffffa003a216>] tg3_init_hw+0x56/0x60 [tg3]
[   25.425751]  [<ffffffffa003c0d5>] tg3_start+0xbe5/0x1210 [tg3]
[   25.425753]  [<ffffffff81383eec>] ? __udelay+0x2c/0x30
[   25.425755]  [<ffffffffa0026be4>] ? _tw32_flush+0x44/0x80 [tg3]
[   25.425757]  [<ffffffffa003c828>] tg3_open+0x128/0x2e0 [tg3]
[   25.425760]  [<ffffffff8162c6cf>] __dev_open+0xcf/0x140
[   25.425761]  [<ffffffff8162c9f1>] __dev_change_flags+0xa1/0x160
[   25.425762]  [<ffffffff8162cad9>] dev_change_flags+0x29/0x60
[   25.425764]  [<ffffffff8163a4a9>] do_setlink+0x399/0xa90
[   25.425766]  [<ffffffff8163ca7c>] rtnl_newlink+0x51c/0x740
[   25.425768]  [<ffffffff8163c653>] ? rtnl_newlink+0xf3/0x740
[   25.425771]  [<ffffffff811e730c>] ? new_slab+0x14c/0x490
[   25.425774]  [<ffffffff81303188>] ? security_capable+0x18/0x20
[   25.425776]  [<ffffffff8109cf7d>] ? ns_capable+0x2d/0x60
[   25.425778]  [<ffffffff816391a4>] rtnetlink_rcv_msg+0xa4/0x270
[   25.425780]  [<ffffffff8165840d>] ? __netlink_lookup+0x4d/0x70
[   25.425781]  [<ffffffff81639100>] ? rtnetlink_rcv+0x40/0x40
[   25.425783]  [<ffffffff8165c4a1>] netlink_rcv_skb+0xc1/0xe0
[   25.425784]  [<ffffffff816390ec>] rtnetlink_rcv+0x2c/0x40
[   25.425785]  [<ffffffff8165ba26>] netlink_unicast+0x106/0x210
[   25.425787]  [<ffffffff8165be55>] netlink_sendmsg+0x325/0x790
[   25.425788]  [<ffffffff8160de50>] sock_sendmsg+0xa0/0xe0
[   25.425791]  [<ffffffff8120e8cd>] ? lookup_real+0x1d/0x50
[   25.425792]  [<ffffffff8160e394>] ___sys_sendmsg+0x2f4/0x310
[   25.425794]  [<ffffffff8119bdf2>] ? lru_cache_add_active_or_unevictable+0x32/0xc0
[   25.425796]  [<ffffffff8160c673>] ? sock_destroy_inode+0x33/0x40
...
[   25.425794]  [<ffffffff8119bdf2>] ? lru_cache_add_active_or_unevictable+0x32/0xc0
[   25.425796]  [<ffffffff8160c673>] ? sock_destroy_inode+0x33/0x40
[   25.425798]  [<ffffffff8121bfd5>] ? __dentry_kill+0x145/0x1d0
[   25.425799]  [<ffffffff8121c105>] ? dput+0xa5/0x170
[   25.425800]  [<ffffffff81224f74>] ? mntput+0x24/0x40
[   25.425802]  [<ffffffff81206d6a>] ? __fput+0x17a/0x1e0
[   25.425803]  [<ffffffff8160ee21>] __sys_sendmsg+0x51/0x90
[   25.425805]  [<ffffffff8160ee72>] SyS_sendmsg+0x12/0x20
[   25.425807]  [<ffffffff81745129>] system_call_fastpath+0x12/0x17
[   25.425808] ---[ end trace 4cc3153e369484eb ]---
[   25.427385] pci 0000:02:00.0: 1st 1 1
[   25.427386] pci 0000:02:00.0: crs_timeout: 0
[   25.427387] ------------[ cut here ]------------
[   25.427389] WARNING: CPU: 0 PID: 1590 at drivers/pci/probe.c:1445 pci_bus_read_dev_vendor_id+0x1d4/0x1e0()
[   25.427389] Modules linked in: bridge stp llc openvswitch x86_pkg_temp_thermal coretemp kvm_intel kvm snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec crct10dif_pclmul snd_hwdep crc32_pclmul crc32c_intel ghash_clmulni_intel snd_seq mei_me snd_seq_device snd_pcm iTCO_wdt iTCO_vendor_support snd_timer mei snd lpc_ich i2c_i801 pcspkr mfd_core dcdbas serio_raw soundcore microcode ie31200_edac shpchp edac_core nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc xfs libcrc32c i915 raid0 i2c_algo_bit drm_kms_helper drm e1000e tg3 ptp pps_core video
[   25.427403] CPU: 0 PID: 1590 Comm: ip Tainted: G        W      3.18.0-rc6+ #8
[   25.427404] Hardware name: Dell Inc. OptiPlex 9010/03K80F, BIOS A15 08/12/2013
[   25.427405]  0000000000000000 0000000097b01d0c ffff8807f0687488 ffffffff8173db46
[   25.427406]  0000000000000000 0000000000000000 ffff8807f06874c8 ffffffff81094d41
[   25.427416]  ffff8807f06874a8 ffff8807f1e27000 0000000000000000 0000000000000000
[   25.427417] Call Trace:
[   25.427418]  [<ffffffff8173db46>] dump_stack+0x46/0x58
[   25.427420]  [<ffffffff81094d41>] warn_slowpath_common+0x81/0xa0
[   25.427421]  [<ffffffff81094e5a>] warn_slowpath_null+0x1a/0x20
[   25.427423]  [<ffffffff813b2864>] pci_bus_read_dev_vendor_id+0x1d4/0x1e0
[   25.427425]  [<ffffffff813b7c3e>] pci_device_is_present+0x2e/0x50
[   25.427427]  [<ffffffffa003364f>] tg3_chip_reset+0x2f/0x940 [tg3]
[   25.427430]  [<ffffffffa0033f9f>] tg3_halt+0x3f/0x1e0 [tg3]
[   25.427433]  [<ffffffffa003c208>] tg3_start+0xd18/0x1210 [tg3]
[   25.427434]  [<ffffffff81383eec>] ? __udelay+0x2c/0x30
[   25.427437]  [<ffffffffa0026be4>] ? _tw32_flush+0x44/0x80 [tg3]
[   25.427439]  [<ffffffffa003c828>] tg3_open+0x128/0x2e0 [tg3]
[   25.427441]  [<ffffffff8162c6cf>] __dev_open+0xcf/0x140
[   25.427442]  [<ffffffff8162c9f1>] __dev_change_flags+0xa1/0x160
[   25.427443]  [<ffffffff8162cad9>] dev_change_flags+0x29/0x60
[   25.427445]  [<ffffffff8163a4a9>] do_setlink+0x399/0xa90
[   25.427448]  [<ffffffff8163ca7c>] rtnl_newlink+0x51c/0x740
[   25.427449]  [<ffffffff8163c653>] ? rtnl_newlink+0xf3/0x740
[   25.427449]  [<ffffffff8163c653>] ? rtnl_newlink+0xf3/0x740
[   25.427452]  [<ffffffff811e730c>] ? new_slab+0x14c/0x490
[   25.427454]  [<ffffffff81303188>] ? security_capable+0x18/0x20
[   25.427455]  [<ffffffff8109cf7d>] ? ns_capable+0x2d/0x60
[   25.427457]  [<ffffffff816391a4>] rtnetlink_rcv_msg+0xa4/0x270
[   25.427459]  [<ffffffff8165840d>] ? __netlink_lookup+0x4d/0x70
[   25.427460]  [<ffffffff81639100>] ? rtnetlink_rcv+0x40/0x40
[   25.427462]  [<ffffffff8165c4a1>] netlink_rcv_skb+0xc1/0xe0
[   25.427464]  [<ffffffff816390ec>] rtnetlink_rcv+0x2c/0x40
[   25.427465]  [<ffffffff8165ba26>] netlink_unicast+0x106/0x210
[   25.427466]  [<ffffffff8165be55>] netlink_sendmsg+0x325/0x790
[   25.427468]  [<ffffffff8160de50>] sock_sendmsg+0xa0/0xe0
[   25.427469]  [<ffffffff8120e8cd>] ? lookup_real+0x1d/0x50
[   25.427471]  [<ffffffff8160e394>] ___sys_sendmsg+0x2f4/0x310
[   25.427472]  [<ffffffff8119bdf2>] ? lru_cache_add_active_or_unevictable+0x32/0xc0
[   25.427475]  [<ffffffff8160c673>] ? sock_destroy_inode+0x33/0x40
[   25.427477]  [<ffffffff8121bfd5>] ? __dentry_kill+0x145/0x1d0
[   25.427478]  [<ffffffff8121c105>] ? dput+0xa5/0x170
[   25.427479]  [<ffffffff81224f74>] ? mntput+0x24/0x40
[   25.427481]  [<ffffffff81206d6a>] ? __fput+0x17a/0x1e0
[   25.427482]  [<ffffffff8160ee21>] __sys_sendmsg+0x51/0x90
[   25.427483]  [<ffffffff8160ee72>] SyS_sendmsg+0x12/0x20
[   25.427493]  [<ffffffff81745129>] system_call_fastpath+0x12/0x17
[   25.427494] ---[ end trace 4cc3153e369484ec ]---

  Marcelo

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Marcelo Ricardo Leitner Dec. 16, 2014, 6 p.m. UTC | #5
On 16-12-2014 14:04, Rajat Jain wrote:
> Hello All,
>
> Apologies for jumping in late, but for some reason I do not see the
> original mail in my inbox. However I am taking a look at the mails as
> sent on linux-pci (and I will keep an eye out for the bug report that
> Bjorn asked for).
>

np!
Nils would you create that BZ please? As you did all the bisect.. :)

>
>>
>> I'm getting, with commit 89665a6a71408796565bfd29cfa6a7877b17a667:
>>
>> $ grep 'pci 0000:02' tg3.bad
>> [    0.190733] pci 0000:02:00.0: 1st 165a14e4 14e4
>> [    0.190736] pci 0000:02:00.0: 1st 165a14e4 14e4
>> [    0.190810] pci 0000:02:00.0: [14e4:165a] type 00 class 0x020000
>> [    0.190885] pci 0000:02:00.0: reg 0x10: [mem 0xf7c40000-0xf7c4ffff 64bit]
>> [    0.191048] pci 0000:02:00.0: reg 0x30: [mem 0xf7c00000-0xf7c3ffff pref]
>> [    0.191382] pci 0000:02:00.0: PME# supported from D3hot D3cold
>> [    0.191438] pci 0000:02:00.0: System wakeup disabled by ACPI
>> [    1.561555] pci 0000:02:00.0: 1st 1 1
>> [    1.561558] pci 0000:02:00.0: crs_timeout: 0
>> [   20.412021] pci 0000:02:00.0: 1st 1 1
>> [   20.412022] pci 0000:02:00.0: crs_timeout: 0
>> [   20.413596] pci 0000:02:00.0: 1st 1 1
>> [   20.413598] pci 0000:02:00.0: crs_timeout: 0
>>
>> And without it:
>>
>> $ grep 'pci 0000:02' tg3.good
>> [    0.190734] pci 0000:02:00.0: 1st 165a14e4 14e4
>> [    0.190738] pci 0000:02:00.0: 1st 165a14e4 14e4
>> [    0.190811] pci 0000:02:00.0: [14e4:165a] type 00 class 0x020000
>> [    0.190884] pci 0000:02:00.0: reg 0x10: [mem 0xf7c40000-0xf7c4ffff 64bit]
>> [    0.191047] pci 0000:02:00.0: reg 0x30: [mem 0xf7c00000-0xf7c3ffff pref]
>> [    0.191380] pci 0000:02:00.0: PME# supported from D3hot D3cold
>> [    0.191439] pci 0000:02:00.0: System wakeup disabled by ACPI
>> [    1.576778] pci 0000:02:00.0: 1st 1 1
>> [   19.068517] pci 0000:02:00.0: 1st 165a14e4 14e4
>>
>
> It seems that in the first 2 attempts that were made to probe the
> device are all OK and return regular device ID and vendor ID for TG3
> (CRS does not have a role to play). However, later attempts return a
> CRS.
>
> 1) May I ask if you are using acpihp or pciehp? I assume pciehp?

Well.. system doesn't support hotplug..
Chipset is a "Intel Corporation 5 Series/3400 Series", fwiw

> 2) Can you please also send dmesg output while passing
> pciehp.pciehp_debug=1? In the fail case, do you see a message
> indicating the pciehp gave up since it got CRS for a long time
> (something like "pci 0000:02:00.0 id reading try 50 times with
> interval 20 ms to get ffff0001")?

I did use that option anyway, but it resulted in no new messages.

> 3) Currently the pciehp passes "0" for the argument "crs_timeout" to
> pci_bus_read_dev_vendor_id(). Can you please try increasing it to, say
> 30 seconds (30 * 1000). (For comparison data, acpihp uses the value
> 60*1000 i.e. 60 seconds today) and run the fail case once again?
>
> Thanks a lot in advance for the debugging help ;-)
>

Seems it's not safe to do that with those backtraces..
I did it, system was very slow to boot, still didn't get the NIC on and 
got a bunch of "scheduling while atomic" due to that msleep() call.

The first invoke was fine:
Dec 16 15:40:00 odin kernel: [    0.190711] pci 0000:02:00.0: 1st 
165a14e4 14e4
Dec 16 15:40:00 odin kernel: [    0.190717] pci 0000:02:00.0: 1st 
165a14e4 14e4
Dec 16 15:40:00 odin kernel: [    0.191091] pci 0000:02:00.0: System 
wakeup disabled by ACPI
Dec 16 15:40:00 odin kernel: [    1.576061] pci 0000:02:00.0: 1st 1 1
Dec 16 15:40:00 odin kernel: [    1.577474] pci 0000:02:00.0: 1 1
Dec 16 15:40:00 odin kernel: [    1.580487] pci 0000:02:00.0: 1 1
Dec 16 15:40:00 odin kernel: [    1.585508] pci 0000:02:00.0: 1 1
Dec 16 15:40:00 odin kernel: [    1.594499] pci 0000:02:00.0: 1 1
Dec 16 15:40:00 odin kernel: [    1.611499] pci 0000:02:00.0: 1 1
Dec 16 15:40:00 odin kernel: [    1.644521] pci 0000:02:00.0: 1 1
Dec 16 15:40:00 odin kernel: [    1.709566] pci 0000:02:00.0: 1 1
Dec 16 15:40:00 odin kernel: [    1.838654] pci 0000:02:00.0: 1 1
Dec 16 15:40:00 odin kernel: [    2.095765] pci 0000:02:00.0: 1 1
Dec 16 15:40:00 odin kernel: [    2.608956] pci 0000:02:00.0: 1 1
Dec 16 15:40:00 odin kernel: [    3.634443] pci 0000:02:00.0: 1 1
Dec 16 15:40:00 odin kernel: [    5.684388] pci 0000:02:00.0: 1 1
Dec 16 15:40:00 odin kernel: [    9.783279] pci 0000:02:00.0: 1 1
Dec 16 15:40:00 odin kernel: [   17.980060] pci 0000:02:00.0: 1 1
Dec 16 15:40:00 odin kernel: [   34.372640] pci 0000:02:00.0: not responding

The other two...
Dec 16 15:40:09 odin kernel: [   54.154688] pci 0000:02:00.0: 1st 1 1
Dec 16 15:40:09 odin kernel: [   54.154690] BUG: scheduling while 
atomic: ip/1575/0x00000200
Dec 16 15:40:09 odin kernel: pci 0000:02:00.0: 1st 1 1
Dec 16 15:40:09 odin kernel: BUG: scheduling while atomic: 
ip/1575/0x00000200
Dec 16 15:40:09 odin kernel: pci 0000:02:00.0: 1 1
Dec 16 15:40:09 odin kernel: BUG: scheduling while atomic: 
ip/1575/0x00000200
(...)

BUG backtraces were very similar to the 2nd and 3rd I posted on the 
other email, it just pointed to the msleep() call instead of my BUG_ON(1).

I can dig deeper if you think it's worth, but as the 1st call didn't 
have this issue and it didn't complete either, seems we are good about 
the test.. right?

Thanks,
Marcelo

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michael Chan Dec. 16, 2014, 7:54 p.m. UTC | #6
On Tue, 2014-12-16 at 15:59 -0200, Marcelo Ricardo Leitner wrote:
> It's a 
> 02:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5722
> Gigabit Ethernet PCI Express
> over here
> 
> I put a WARN_ON(1) after those printks, and this is what I got:
> 
> [    1.550640] pci 0000:02:00.0: 1st 1 1
> [    1.550643] pci 0000:02:00.0: crs_timeout: 0
> [    1.550645] ------------[ cut here ]------------
> [    1.550651] WARNING: CPU: 6 PID: 364 at drivers/pci/probe.c:1445 pci_bus_read_dev_vendor_id+0x1d4/0x1e0()
> [    1.550652] Modules linked in: i915(+) raid0 i2c_algo_bit drm_kms_helper drm e1000e(+) tg3(+) ptp pps_core video
> [    1.550660] CPU: 6 PID: 364 Comm: systemd-udevd Not tainted 3.18.0-rc6+ #8
> [    1.550661] Hardware name: Dell Inc. OptiPlex 9010/03K80F, BIOS A15 08/12/2013
> [    1.550662]  0000000000000000 000000004de2d8dc ffff8807eabdf948 ffffffff8173db46
> [    1.550665]  0000000000000000 0000000000000000 ffff8807eabdf988 ffffffff81094d41
> [    1.550667]  ffff8807eabdf968 ffff8807f1e27000 0000000000000000 0000000000000000
> [    1.550669] Call Trace:
> [    1.550675]  [<ffffffff8173db46>] dump_stack+0x46/0x58
> [    1.550679]  [<ffffffff81094d41>] warn_slowpath_common+0x81/0xa0
> [    1.550681]  [<ffffffff81094e5a>] warn_slowpath_null+0x1a/0x20
> [    1.550683]  [<ffffffff813b2864>] pci_bus_read_dev_vendor_id+0x1d4/0x1e0
> [    1.550687]  [<ffffffff813b7c3e>] pci_device_is_present+0x2e/0x50
> [    1.550693]  [<ffffffffa003364f>] tg3_chip_reset+0x2f/0x940 [tg3]
> [    1.550697]  [<ffffffffa0033f9f>] tg3_halt+0x3f/0x1e0 [tg3]
> [    1.550701]  [<ffffffffa0044f83>] tg3_init_one+0xb83/0x1a40 [tg3] 

So does it work if you use a non-zero crs_timeout?  The driver has
called tg3_halt() which may affect configuration read responses.  I need
to check with the hardware team to see if the 5722 will return CRS in
this scenario.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Marcelo Ricardo Leitner Dec. 16, 2014, 8:02 p.m. UTC | #7
On 16-12-2014 17:54, Michael Chan wrote:
> On Tue, 2014-12-16 at 15:59 -0200, Marcelo Ricardo Leitner wrote:
>> It's a
>> 02:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5722
>> Gigabit Ethernet PCI Express
>> over here
>>
>> I put a WARN_ON(1) after those printks, and this is what I got:
>>
>> [    1.550640] pci 0000:02:00.0: 1st 1 1
>> [    1.550643] pci 0000:02:00.0: crs_timeout: 0
>> [    1.550645] ------------[ cut here ]------------
>> [    1.550651] WARNING: CPU: 6 PID: 364 at drivers/pci/probe.c:1445 pci_bus_read_dev_vendor_id+0x1d4/0x1e0()
>> [    1.550652] Modules linked in: i915(+) raid0 i2c_algo_bit drm_kms_helper drm e1000e(+) tg3(+) ptp pps_core video
>> [    1.550660] CPU: 6 PID: 364 Comm: systemd-udevd Not tainted 3.18.0-rc6+ #8
>> [    1.550661] Hardware name: Dell Inc. OptiPlex 9010/03K80F, BIOS A15 08/12/2013
>> [    1.550662]  0000000000000000 000000004de2d8dc ffff8807eabdf948 ffffffff8173db46
>> [    1.550665]  0000000000000000 0000000000000000 ffff8807eabdf988 ffffffff81094d41
>> [    1.550667]  ffff8807eabdf968 ffff8807f1e27000 0000000000000000 0000000000000000
>> [    1.550669] Call Trace:
>> [    1.550675]  [<ffffffff8173db46>] dump_stack+0x46/0x58
>> [    1.550679]  [<ffffffff81094d41>] warn_slowpath_common+0x81/0xa0
>> [    1.550681]  [<ffffffff81094e5a>] warn_slowpath_null+0x1a/0x20
>> [    1.550683]  [<ffffffff813b2864>] pci_bus_read_dev_vendor_id+0x1d4/0x1e0
>> [    1.550687]  [<ffffffff813b7c3e>] pci_device_is_present+0x2e/0x50
>> [    1.550693]  [<ffffffffa003364f>] tg3_chip_reset+0x2f/0x940 [tg3]
>> [    1.550697]  [<ffffffffa0033f9f>] tg3_halt+0x3f/0x1e0 [tg3]
>> [    1.550701]  [<ffffffffa0044f83>] tg3_init_one+0xb83/0x1a40 [tg3]
>
> So does it work if you use a non-zero crs_timeout?  The driver has
> called tg3_halt() which may affect configuration read responses.  I need
> to check with the hardware team to see if the 5722 will return CRS in
> this scenario.

Sorry, I replied to the thread that you weren't in yet.
It didn't..
http://thread.gmane.org/gmane.linux.network/342566/focus=37932

   Marcelo

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nils Holland Dec. 16, 2014, 8:38 p.m. UTC | #8
On Tue, Dec 16, 2014 at 04:00:04PM -0200, Marcelo Ricardo Leitner wrote:
> On 16-12-2014 14:04, Rajat Jain wrote:
> > Hello All,
> >
> > Apologies for jumping in late, but for some reason I do not see the
> > original mail in my inbox. However I am taking a look at the mails as
> > sent on linux-pci (and I will keep an eye out for the bug report that
> > Bjorn asked for).
> >
> 
> np!
> Nils would you create that BZ please? As you did all the bisect.. :)

I'm only just catching up with all the new messages in this thread,
but I've already opened a bug report, as requested. Come and find it
at:

https://bugzilla.kernel.org/show_bug.cgi?id=89821

Feel free to add more relevant details to it! :-)

Greetings,
Nils
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas Dec. 18, 2014, 7:15 p.m. UTC | #9
On Tue, Dec 16, 2014 at 12:54 PM, Michael Chan <mchan@broadcom.com> wrote:
> On Tue, 2014-12-16 at 15:59 -0200, Marcelo Ricardo Leitner wrote:
>> It's a
>> 02:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5722
>> Gigabit Ethernet PCI Express
>> over here
>>
>> I put a WARN_ON(1) after those printks, and this is what I got:
>>
>> [    1.550640] pci 0000:02:00.0: 1st 1 1
>> [    1.550643] pci 0000:02:00.0: crs_timeout: 0
>> [    1.550645] ------------[ cut here ]------------
>> [    1.550651] WARNING: CPU: 6 PID: 364 at drivers/pci/probe.c:1445 pci_bus_read_dev_vendor_id+0x1d4/0x1e0()
>> [    1.550652] Modules linked in: i915(+) raid0 i2c_algo_bit drm_kms_helper drm e1000e(+) tg3(+) ptp pps_core video
>> [    1.550660] CPU: 6 PID: 364 Comm: systemd-udevd Not tainted 3.18.0-rc6+ #8
>> [    1.550661] Hardware name: Dell Inc. OptiPlex 9010/03K80F, BIOS A15 08/12/2013
>> [    1.550662]  0000000000000000 000000004de2d8dc ffff8807eabdf948 ffffffff8173db46
>> [    1.550665]  0000000000000000 0000000000000000 ffff8807eabdf988 ffffffff81094d41
>> [    1.550667]  ffff8807eabdf968 ffff8807f1e27000 0000000000000000 0000000000000000
>> [    1.550669] Call Trace:
>> [    1.550675]  [<ffffffff8173db46>] dump_stack+0x46/0x58
>> [    1.550679]  [<ffffffff81094d41>] warn_slowpath_common+0x81/0xa0
>> [    1.550681]  [<ffffffff81094e5a>] warn_slowpath_null+0x1a/0x20
>> [    1.550683]  [<ffffffff813b2864>] pci_bus_read_dev_vendor_id+0x1d4/0x1e0
>> [    1.550687]  [<ffffffff813b7c3e>] pci_device_is_present+0x2e/0x50
>> [    1.550693]  [<ffffffffa003364f>] tg3_chip_reset+0x2f/0x940 [tg3]
>> [    1.550697]  [<ffffffffa0033f9f>] tg3_halt+0x3f/0x1e0 [tg3]
>> [    1.550701]  [<ffffffffa0044f83>] tg3_init_one+0xb83/0x1a40 [tg3]
>
> So does it work if you use a non-zero crs_timeout?  The driver has
> called tg3_halt() which may affect configuration read responses.  I need
> to check with the hardware team to see if the 5722 will return CRS in
> this scenario.

Any updates from the hardware team?

This is a pretty serious regression, but as far as I can tell, it is
not a PCI bug.  The device should respond to a config read of vendor
ID.  If the driver does something that make the read return CRS
status, I think the driver is responsible for doing whatever delay or
other fixup is required.

I'm inclined to reassign this bug to the tg3 driver unless you think
the PCI core is doing something wrong here.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c 

index 2306268..4474502 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1436,14 +1436,22 @@  bool pci_bus_read_dev_vendor_id(struct pci_bus *bus, 
int devfn, u32 *l,
                 return false;

         /* Configuration request Retry Status */
-       while (*l == 0xffff0001) {
-               if (!crs_timeout)
+       printk ("pci %04x:%02x:%02x.%d: 1st %x %x\n", pci_domain_nr(bus), 
bus->number,
+               PCI_SLOT(devfn), PCI_FUNC(devfn), *l, *l & 0xffff);
+       while ((*l & 0xffff) == 0x0001) {
+               if (!crs_timeout) {
+                       printk ("pci %04x:%02x:%02x.%d: crs_timeout: %d\n", 
pci_domain_nr(bus),
+                               bus->number, PCI_SLOT(devfn), PCI_FUNC(devfn), 
crs_timeout);
                         return false;
+               }

                 msleep(delay);
                 delay *= 2;
-               if (pci_bus_read_config_dword(bus, devfn, PCI_VENDOR_ID, l))
+               if (pci_bus_read_config_dword(bus, devfn, PCI_VENDOR_ID, l)) {
+                       printk ("pci %04x:%02x:%02x.%d: 
pci_bus_read_config_dword failed\n", pci_domain_nr(bus),
+                               bus->number, PCI_SLOT(devfn), PCI_FUNC(devfn));