Patchwork Dell Vostro 3550: pci_hotplug+acpiphp require 'pcie_aspm=force' on kernel command-line for hotplug to work

Submitter Martin Mokrejs
Date March 8, 2013, 1:47 a.m.
Message ID <51394317.20104@fold.natur.cuni.cz>
Permalink /patch/226016/
State Not Applicable

Comments

Martin Mokrejs - March 8, 2013, 1:47 a.m.
[+cc Alan because it looks like USB is involved in general: with the most restrictive
 BIOS settings, no eSATA/FireWireOHCI cards work under Linux ('nousb' on the kernel
 command line), while both do under MS Win7. Hopefully we will fork that stuff into
 a new bug.]

Bjorn Helgaas wrote:
> [+cc Sarah because problem only seems to happen with xhci.  I'm
> assuming this is a pciehp issue for now]
> 
> On Wed, Mar 6, 2013 at 3:30 AM, Martin Mokrejs
> <mmokrejs@fold.natur.cuni.cz> wrote:
>> Hi Bjorn,
>>   thank you for your time on this issue.
>>
>> Bjorn Helgaas wrote:
>>> On Wed, Jan 9, 2013 at 4:10 PM, Martin Mokrejs
>>> <mmokrejs@fold.natur.cuni.cz> wrote:
>>>> Hi,
>>>>   I am following up on a former thread
>>>> Re: 3.2.11: PCI Express card cannot be re-detected withing cca 60sec timeframe
>>>> about the same issue. I think I found some new info while playing with 3.7.1 kernel.
>>>> It happened to me that hotplug of my ExpressCards stopped working, so it made me
>>>> dive in and figure out what I did to my .config, and which combinations
>>>> of drivers and kernel command-line parameters work and which do not. This email will
>>>> cover just one case.
>>>>
>>>> On this Dell Vostro 3550, the ExpressCard slot works if the kernel is built without pciehp
>>>> altogether and pci_hotplug+acpiphp are loaded as modules later on. The problem
>>>> is that I must use pcie_aspm=off.
>>>
>>> I confess I am completely bewildered here.  Something is clearly badly
>>> broken, but I'm having a hard time figuring out exactly what it is.  I
>>> think I'm overwhelmed by all the data :)

It won't be easier now but you asked for it. ;)

[cut]

> 
> I'm going to ignore this speculation about xhci and other drivers for
> now and see if we can just get pciehp to work.  We need to reduce the
> number of variables here.
> 
> I opened https://bugzilla.kernel.org/show_bug.cgi?id=54921
> "pciehp/xhci ExpressCard failure on Dell Vostro 3550" for this issue.
> 
>> Or, the card is not PCI or PCIe hotplug capable [6]?
>> Could that be the reason for the difference in behavior?
> 
> All ExpressCards should support hotplug.  The hotplug support is in
> the root port and other circuitry on the motherboard.
> 
>>> But I think it's a bad idea to go down the road of using acpiphp.
>>
>> Unfortunately I was forced to switch to acpiphp because, after commit
>> 0d52f54e2ef64c189dedc332e680b2eb4a34590a (as diagnosed by Yinghai),
>> pciehp stopped working (although it had worked badly anyway).
> 
> I understand you have to use acpiphp to make things work right now.
> But I think we can make pciehp work, and I think that's what distros
> will want to use in this situation, so that's what I want to debug.
> 
> As background for the whole collection of hotplug drivers we have,
> here's my understanding of the history:
> 
> 1) Originally there was no standard for PCI hotplug hardware, and we
> have drivers like cpqphp, ibmphp, etc. to deal with various hardware
> designs.
> 
> 2) ACPI defined an abstract hotplug model so an OS can have a single
> driver, e.g., acpiphp, for that model, and the BIOS can map the
> abstract model to various hardware designs.
> 
> 3) PCIe defined a single hardware model, so an OS can have a single
> driver, e.g., pciehp.  ACPI is not really involved here except that
> the OS has to ask the BIOS for permission to use this native hotplug
> driver.
> 
> Many machines, including yours, support both ACPI (acpiphp) and PCIe
> native (pciehp) hotplug so they can run both old OSes that don't have
> pciehp, and newer OSes that prefer to use pciehp.
> 
>> Yinghai, in the 3.2.x thread, postulated that it is a BIOS or silicon bug
>> incorrectly reporting PresDet status. I was glad that I could show
>> that with acpiphp PresDet is *always* correct (3.7 kernel),
>> *provided* I disabled the MediaCard reader in BIOS.
> 
> Can you build a v3.9-rc1 kernel with this config:
> 
>   CONFIG_HOTPLUG_PCI_PCIE=y
>   CONFIG_HOTPLUG_PCI_ACPI=n
>   CONFIG_USB_XHCI_HCD=n
> 
> I want to use pciehp, not acpiphp, and leave the xhci driver out of
> the picture for now.  Boot it with pciehp.pciehp_debug=1 and the
> ExpressCard slot empty, and run this command:
> 
>   # while true; do echo -n "$(date +%T) SlotStatus "; setpci -s1c.7 0x5a.w; sleep 1; done
> 
> That command reads the SlotStatus register from the bridge leading to
> your ExpressCard slot every second.  While that command is running, do
> insertions and removals of all your ExpressCards.  Bit 6 (0x0040) is
> the Presence Detect bit.  It should change as you insert and remove
> cards.
> 
> I'd like to see the complete dmesg log, the output of "lspci -vv", and
> the output of the above command while you insert/remove cards (with
> notes about which card is being hotplugged when).  Since there seems
> to be some interaction with the MediaCard reader BIOS settings, maybe
> you could do this whole experiment with the reader disabled, then with
> it enabled.
> 
> You can attach these logs to the bugzilla
> (https://bugzilla.kernel.org/show_bug.cgi?id=54921) if you want, or
> point me to them and I'll do it.
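If you would rather script this measurement, here is a minimal Python sketch of the same polling loop. It rests on three assumptions. First, the root port that `setpci -s1c.7` addresses appears in sysfs as `0000:00:1c.7`. Second, its PCIe capability sits at config offset 0x40, which fits the `Capabilities: [40] Express (v2) Root Port (Slot+)` line in the lspci diff quoted in this thread. Third, it runs as root, because non-root reads of the sysfs `config` file stop after the first 64 bytes. Slot Status lives at offset 0x1A inside the PCIe capability, and 0x40 + 0x1A gives the 0x5a used in the command. The helper names are hypothetical.

```python
import struct
import time

# Assumptions (not from the thread): sysfs path of the root port that
# "setpci -s1c.7" addresses, and a PCIe capability at config offset 0x40.
CONFIG_PATH = "/sys/bus/pci/devices/0000:00:1c.7/config"
PCIE_CAP_OFFSET = 0x40
SLOT_STATUS_OFFSET = PCIE_CAP_OFFSET + 0x1A  # 0x5a, the register setpci reads
PRESENCE_DETECT_STATE = 0x0040               # Slot Status bit 6


def slot_status(config: bytes) -> int:
    """Return the 16-bit little-endian Slot Status value from a config-space dump."""
    if len(config) < SLOT_STATUS_OFFSET + 2:
        raise ValueError("config space dump too short; reading it needs root")
    (value,) = struct.unpack_from("<H", config, SLOT_STATUS_OFFSET)
    return value


def card_present(status: int) -> bool:
    """True when the Presence Detect State bit is set."""
    return bool(status & PRESENCE_DETECT_STATE)


def poll(path: str = CONFIG_PATH, interval: float = 1.0) -> None:
    """Print Slot Status once per interval, like the shell loop (Ctrl-C to stop)."""
    while True:
        with open(path, "rb") as f:
            status = slot_status(f.read())
        stamp = time.strftime("%H:%M:%S")
        print(f"{stamp} SlotStatus {status:04x} present={card_present(status)}")
        time.sleep(interval)
```

Run `poll()` as root while you insert and remove cards. On the assumptions above, it should print the same hex values as the `setpci` loop, plus a decoded presence flag.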

I put the collected data (500kB) at http://195.113.57.32/~mmokrejs/tmp/ExpressCard_hotplug_tests_3.9-rc1.tar.bz2


1)
This is the 3.9-rc1 kernel. From a quick glance, I think it differs from 3.7.x
and 3.2.x in that, after ejecting the eSATA card, there are too many differences in the
"lspci -vv" output when comparing the cold-boot state with the state after the card
is ejected. I don't remember that from the past, and I don't see it with the
USB3 card or the FireWireOHCI card. Could it be that in 3.9-rc1 the driver is still
loaded? If so, maybe the multiple attempts to insert/eject the card are affected
by this. Please verify what I say and summarize it in bugzilla. I won't clutter it
myself (this email is enough).


2)
I will go straight to where the USB interference is. I noticed in the past (the email
threads were recapped in this thread already) that after insertion of some cards
the "USB" bus is reset, or at least a Media Card reader is detected or re-detected;
it simply appears in dmesg. In the dmesg outputs you will see the ehci-pci driver being
involved. I tried to disable more and more USB devices in BIOS and also used the
'nousb' command-line option to disable USB. I do not understand why that completely
breaks ExpressCard slot functionality under Linux but not under Win7. In Linux, pciehp
still complains about Surprise Removal even when I insert the card for the very
first time after a cold boot (so the ExpressCard slot is not completely dead, but
neither sata_sil24 nor fw_ohci picks up the device). USB drivers had no chance
to bind because of 'nousb', of course. Alan/Sarah, please look, preferably in
this order, into the following dirs (well, don't waste too much time on the first
one; focus more on the difference between the second and third. I placed a
file Win7.txt in the third to recap the Win7 vs. Linux differences):

without_XHCI_with_EHCI_no_ACPIPHP_with_PCIEHP/
without_USB_no_ACPIPHP_with_PCIEHP/
without_USB_no_ACPIPHP_with_PCIEHP_without_microphone_USB_wakeup_USB_emulation_USB_powershare/

My question is: has the laptop hardwired the ExpressCard slot somehow through USB
to the SandyBridge chip? It seems that not only the UVC camera is, but also the CD/DVD
optical drive, and the "Microphone" item in BIOS possibly involves the whole Intel
HD Audio sound card. If that is true, then I understand why under Linux 'nousb'
prevents all three tested ExpressCards from working.

Puzzlingly, while 'nousb' was in use, my Alps touchpad still worked in the VT console.
Further, something I may always have had wrong in my .config is that there are SDHC drivers
and some others which might be competing with some USB drivers (under a normal .config)
for, say, the MediaCardReader, etc. Please compare without_XHCI_with_EHCI_no_ACPIPHP_with_PCIEHP/.config.gz
with without_USB_no_ACPIPHP_with_PCIEHP_without_microphone_USB_wakeup_USB_emulation_USB_powershare/config.gz.


3)
The hotplug issue itself. I do not understand PCI(e) hotplug or the lspci output, but
why is there any difference between the cold-booted status of an empty ExpressCard slot
and the status after a card is unloaded?


 # diff -u without_XHCI_with_EHCI_no_ACPIPHP_with_PCIEHP/disabled_MediaCard_reader/USB3/lspci_vv_initial.txt without_XHCI_with_EHCI_no_ACPIPHP_with_PCIEHP/disabled_MediaCard_reader/USB3/lspci_vv_unloading_USB3_card.txt
#

Shouldn't MAbort have been cleared at some point? Doesn't this confuse the kernel's interpretation of PresDet?

The eSATA card behaves differently, maybe because 0100 changes not to 0140, as with the USB3 card, but to 0103?
Um, the shell while loop calling setpci does not report 0103, but the driver says this:

[  211.879397] sata_sil24 0000:11:00.0: enabling device (0100 -> 0103)


I did not check myself whether the lspci output differs between odd and even unplugs of the USB3 card,
but judging from the Win7 behavior, it should still be the same under Linux as it
always was.

I do not remember seeing this with the USB3 card on 3.2.x and 3.7.x kernels:
+[  594.622211] pci 0000:11:00.0: calling quirk_usb_early_handoff+0x0/0x657
+[  594.622223] pci 0000:11:00.0: device not available (can't reserve [mem 0x00000000-0x00001fff 64bit])
+[  594.622225] pci 0000:11:00.0: Can't enable PCI device, BIOS handoff failed.
But somebody had better check the previous tarball. ;-) Ah, well, with 3.7 I used acpiphp,
hmm, and I have certainly forgotten how it was with 3.2.x. ;)


I do not know what else to disable in BIOS, apart from the eSATA port, to make the SandyBridge chip behave
better. Actually, I might also have tried disabling the ExpressCard slot to see what that really does.
;)


4)
A new 3.9-rc1 bug is that sometimes, while switching virtual consoles, I get
(file without_XHCI_with_EHCI_no_ACPIPHP_with_PCIEHP/disabled_MediaCard_reader/USB3/dmesg_VT12_switching_while_reading_slot_status_caused_slowpath.txt):
[  458.113692] WARNING: at drivers/tty/tty_buffer.c:428 flush_to_ldisc+0x55/0x189()
[  458.113693] Hardware name: Vostro 3550
[  458.113693] tty is NULL
I will try to find the proper maintainer for that.


5)
I believe I already reported in the past that repeatedly inserting and removing the FireWireOHCI
card can crash the kernel completely. That is nothing new; it has happened since at least 3.2, and although I captured
a partial stack trace, I won't do so again. Several OOPSes occur one after another.

Well, thank you for your time. I would have returned the whole laptop already,
but I don't know what other piece of HW to buy. I need eSATA, at least 2x USB 3.0 ports (4 USB ports
in total), an ExpressCard slot for 2 extra eSATA ports or FireWire, an anti-glare display, and at least HDMI or DVI.
A backlit keyboard is a plus. ;-) 8GB RAM is a minimum; I currently have 16GB in this thingie.
;)

Martin
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas - March 9, 2013, 3:51 a.m.
On Thu, Mar 7, 2013 at 6:47 PM, Martin Mokrejs
<mmokrejs@fold.natur.cuni.cz> wrote:
> Bjorn Helgaas wrote:
>> On Wed, Mar 6, 2013 at 3:30 AM, Martin Mokrejs
>> <mmokrejs@fold.natur.cuni.cz> wrote:
>>> Bjorn Helgaas wrote:
>>>> On Wed, Jan 9, 2013 at 4:10 PM, Martin Mokrejs
>>>> <mmokrejs@fold.natur.cuni.cz> wrote:
>>>>>   I am following up on a former thread
>>>>> Re: 3.2.11: PCI Express card cannot be re-detected withing cca 60sec timeframe
>>>>> about the same issue. I think I found some new info while playing with 3.7.1 kernel.
>>>>> It happened to me that hotplug of my ExpressCards stopped working, so it made me
>>>>> dive in and figure out what I did to my .config, and which combinations
>>>>> of drivers and kernel command-line parameters work and which do not. This email will
>>>>> cover just one case.
>>>>>
>>>>> On this Dell Vostro 3550, the ExpressCard slot works if the kernel is built without pciehp
>>>>> altogether and pci_hotplug+acpiphp are loaded as modules later on. The problem
>>>>> is that I must use pcie_aspm=off.
>>>>
>>>> I confess I am completely bewildered here.  Something is clearly badly
>>>> broken, but I'm having a hard time figuring out exactly what it is.  I
>>>> think I'm overwhelmed by all the data :)
>
> It won't be easier now but you asked for it. ;)

Well, I didn't really ask for 500kB of logs with 150-character
pathnames.  I am unable to process that much stuff.  I'm not
interested in random lspci differences, virtual console bugs, or OHCI
driver crashes at this point, so let's not muddy the waters with them.
I only want to figure out if pciehp can correctly detect and
enumerate a newly inserted card, and if it can correctly clean up when
a card is removed.  After we figure that out, we can worry about more
complicated issues.

I *think* the bottom line of the Slot Status experiment is that the
Presence Detect bit was reported correctly in every case, for every
card, regardless of BIOS settings.  Right?
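You can check that claim against a long capture with a short script. Here is a minimal Python sketch. It assumes the capture is the plain output of the `setpci` loop, with lines such as `HH:MM:SS SlotStatus 0140`, possibly interleaved with free-form notes about which card was being inserted. It reports the first sample and every change of the Presence Detect State bit (bit 6, 0x0040). The function name is hypothetical.

```python
import re
from typing import Iterable, List, Optional, Tuple

PRESENCE_DETECT_STATE = 0x0040  # Slot Status bit 6
# One sample from the shell loop: "HH:MM:SS SlotStatus XXXX" (trailing text tolerated).
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}) SlotStatus ([0-9a-fA-F]{4})\b")


def presence_transitions(lines: Iterable[str]) -> List[Tuple[str, bool]]:
    """Return (timestamp, present) for the first sample and each change of bit 6."""
    transitions: List[Tuple[str, bool]] = []
    previous: Optional[bool] = None
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip notes such as "inserting USB3 card"
        present = bool(int(match.group(2), 16) & PRESENCE_DETECT_STATE)
        if present != previous:
            transitions.append((match.group(1), present))
            previous = present
    return transitions
```

Each insertion should show up as exactly one `True` entry and each removal as one `False` entry. Extra or missing flips would point at the slot rather than at pciehp.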

There are three cards on the table now:

  pci 0000:11:00.0: [1095:3132] SiI 3132 Serial ATA Raid II Controller
  pci 0000:11:00.0: [1106:3403] VT6315 Series Firewire Controller
  pci 0000:11:00.0: [1033:0194] NEC Corporation uPD720200 USB 3.0 Host Controller
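When one dmesg log covers several hotplug cycles, it helps to tag each enumeration with the card it belongs to. Here is a minimal Python sketch. It assumes enumeration lines carry the `[vendor:device]` pair after the PCI address, as in the list above. The pattern and names are illustrative and do not come from the kernel.

```python
import re
from typing import Iterable, List, Tuple

# Matches lines such as "pci 0000:11:00.0: [1095:3132] ..." (address, then [vendor:device]).
ENUM_RE = re.compile(
    r"pci ([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): \[([0-9a-f]{4}):([0-9a-f]{4})\]"
)

# The three ExpressCards discussed in this thread, keyed by (vendor, device) ID.
KNOWN_CARDS = {
    ("1095", "3132"): "SiI 3132 Serial ATA Raid II Controller",
    ("1106", "3403"): "VT6315 Series Firewire Controller",
    ("1033", "0194"): "NEC uPD720200 USB 3.0 Host Controller",
}


def enumerated_cards(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (address, card name) for every enumeration line in a dmesg log."""
    found = []
    for line in lines:
        match = ENUM_RE.search(line)
        if match:
            address, vendor, device = match.groups()
            name = KNOWN_CARDS.get((vendor, device), f"unknown {vendor}:{device}")
            found.append((address, name))
    return found
```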

I assume all the cards work fine if there's no hotplug, i.e., if the
card is present at boot and you never remove it.  Right?

> In Linux pciehp
> still complains about Surprise Removal even when I insert the card for the very
> first time after a coldboot

Hmm.  pciehp prints "Surprise Removal" whether you inserted or removed
the card.  Stupid driver.

> (so the ExpressCard slot is not completely dead but
> neither sata_sil24 nor fw_ohci picks up the device).

I thought the only card with a problem was the USB3.0 card.  But here
you suggest that there *is* a problem with the SATA and Firewire
cards.  Can you describe that problem in one sentence?

> My question is. Has the laptop hardwired the ExpressCard slot somehow through USB
> to the SandyBridge chip?

An ExpressCard slot (spec at [1]) supports both a PCIe interface and a
USB interface, so the slot *should* be connected to a USB controller
as well as to a PCIe root port.  An ExpressCard can contain either a
PCIe device or a USB device or both.  Section 6.3 of the spec talks
about ACPI requirements to describe the relationship between the PCIe
and USB devices.  I'm not sure that Linux pays any attention to this
in the hotplug paths, so I'm a little worried about this.  (Maybe it
doesn't need to in the PCIe-aware case; I don't know.)

It would be interesting to know exactly what devices are on your
cards.  Assuming they all work when present at boot, you could find
that by doing a single "lspci -vv" and "lsusb -v" after a boot with an
empty slot, and doing it again after a boot with a card in the slot.
The difference should be the ExpressCard devices.  I'm sure this is
buried in your tarball somewhere, but all I want is the info from a
machine in default configuration -- MediaCard enabled, etc.  Just the
way a typical user would be using the machine.
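The comparison described above can be scripted. Here is a minimal Python sketch. It assumes `lspci -vv` output in which each device block starts with an unindented `bus:dev.fn` header line and blocks are separated by blank lines. It also assumes `lsusb` lines of the form `Bus NNN Device NNN: ID vvvv:pppp ...`. USB devices are compared by vendor:product ID because device numbers change across boots. They are counted rather than treated as a set, so that an extra root hub with an already-present ID still shows up. The function names are hypothetical.

```python
import re
from collections import Counter
from typing import Dict

USB_ID_RE = re.compile(
    r"^Bus \d{3} Device \d{3}: ID ([0-9a-fA-F]{4}:[0-9a-fA-F]{4})", re.MULTILINE
)


def lspci_devices(text: str) -> Dict[str, str]:
    """Map each PCI address in `lspci -vv` output to its header line."""
    devices = {}
    for block in text.split("\n\n"):
        block = block.strip()
        if not block:
            continue
        header = block.splitlines()[0]
        devices[header.split()[0]] = header
    return devices


def new_pci_devices(empty_boot: str, card_boot: str) -> Dict[str, str]:
    """PCI devices present only when the card was in the slot at boot."""
    before = lspci_devices(empty_boot)
    return {addr: hdr for addr, hdr in lspci_devices(card_boot).items() if addr not in before}


def new_usb_ids(empty_boot: str, card_boot: str) -> Counter:
    """USB vendor:product IDs (with multiplicity) added by the card."""
    return Counter(USB_ID_RE.findall(card_boot)) - Counter(USB_ID_RE.findall(empty_boot))
```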

[1] http://www.usb.org/developers/expresscard/EC_specifications/ExpressCard_2_0_FINAL.pdf

> The hotplug issue itself. I do not understand the PCI(e) hotplug, lspci output but
> why is there any difference between a cold booted status of an empty expresscard slot
> compared to the status when a card is unloaded?

In principle there shouldn't be any difference, but Linux isn't that good yet.

> --- without_XHCI_with_EHCI_no_ACPIPHP_with_PCIEHP/disabled_MediaCard_reader/USB3/lspci_vv_initial.txt   2013-03-07 22:27:30.000000000 +0100
> +++ without_XHCI_with_EHCI_no_ACPIPHP_with_PCIEHP/disabled_MediaCard_reader/USB3/lspci_vv_unloading_USB3_card.txt       2013-03-07

> -       Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
> +       Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
>
> Shouldn't the MAbort been cleared sometimes? Doesn't this fool the PresDet interpretation in kernel?

I doubt it.  Presence Detect is a very simple mechanism that basically
just reports the current state of the CPPE# signal in the ExpressCard
slot.  There's no reason this should be related to MAbort.
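The two flags live in different registers. `<MAbort` in the lspci `Secondary status` line is bit 13 (Received Master Abort) of the bridge's Secondary Status register. Presence Detect State is bit 6 of the PCIe Slot Status register that the `setpci` loop reads. On the clearing question: the error bits in Secondary Status are write-1-to-clear, so they stay set until software writes a 1 to them. Here is a minimal Python sketch that decodes both registers. The bit names follow the PCI-to-PCI bridge and PCI Express specifications; the decoder itself is hypothetical.

```python
from typing import Dict, List

# Bridge Secondary Status register (type 1 header, offset 0x1E): error bits lspci shows.
SECONDARY_STATUS_BITS = {
    11: ">TAbort (Signaled Target Abort)",
    12: "<TAbort (Received Target Abort)",
    13: "<MAbort (Received Master Abort)",
    14: "<SERR (Received System Error)",
    15: "<PERR (Detected Parity Error)",
}

# PCIe Slot Status register (PCIe capability + 0x1A).
SLOT_STATUS_BITS = {
    0: "Attention Button Pressed",
    1: "Power Fault Detected",
    2: "MRL Sensor Changed",
    3: "Presence Detect Changed",
    4: "Command Completed",
    5: "MRL Sensor State",
    6: "Presence Detect State",
    7: "Electromechanical Interlock Status",
    8: "Data Link Layer State Changed",
}


def decode(value: int, names: Dict[int, str]) -> List[str]:
    """Names of the bits from `names` that are set in `value`, lowest bit first."""
    return [name for bit, name in sorted(names.items()) if value & (1 << bit)]
```

For example, `decode(0x0140, SLOT_STATUS_BITS)` lists Presence Detect State and Data Link Layer State Changed. No Slot Status bit overlaps with the Received Master Abort bit of the other register.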

> The eSATA behaves differently maybe because 0100 does not change to 0140 like USB3 card but to 0103?
> Um, the shell while loop calling setpci does not report 0103 but the driver say this:
>
> [  211.879397] sata_sil24 0000:11:00.0: enabling device (0100 -> 0103)

This is completely unrelated.  The shell "setpci" command is printing
the Slot Status register; the "0100 -> 0103" above is a change in the
PCI Command register.
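Decoding both values from the `enabling device (0100 -> 0103)` message shows what changed. In the PCI Command register, 0x0100 is SERR# Enable alone. 0x0103 additionally sets I/O Space (bit 0) and Memory Space (bit 1), which is consistent with the driver turning on the card's I/O and memory decoding. Here is a minimal Python sketch. The regex assumes the message format quoted above, only commonly used Command bits are named, and the helper names are hypothetical.

```python
import re
from typing import List, Tuple

# PCI Command register (config offset 0x04), commonly used bits.
COMMAND_BITS = {
    0: "I/O Space",
    1: "Memory Space",
    2: "Bus Master",
    6: "Parity Error Response",
    8: "SERR# Enable",
    10: "Interrupt Disable",
}

ENABLE_RE = re.compile(r"enabling device \(([0-9a-f]{4}) -> ([0-9a-f]{4})\)")


def command_change(line: str) -> Tuple[List[str], List[str]]:
    """Return (bits newly set, bits cleared) for an 'enabling device (old -> new)' message."""
    match = ENABLE_RE.search(line)
    if match is None:
        raise ValueError("not an 'enabling device' message")
    old, new = (int(group, 16) for group in match.groups())
    turned_on = [name for bit, name in sorted(COMMAND_BITS.items()) if (new & ~old) >> bit & 1]
    turned_off = [name for bit, name in sorted(COMMAND_BITS.items()) if (old & ~new) >> bit & 1]
    return turned_on, turned_off
```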

> I do not remember with 3.2.x and 3.7x kernels seeing this with the USB3 card:
> +[  594.622211] pci 0000:11:00.0: calling quirk_usb_early_handoff+0x0/0x657
> +[  594.622223] pci 0000:11:00.0: device not available (can't reserve [mem 0x00000000-0x00001fff 64bit])
> +[  594.622225] pci 0000:11:00.0: Can't enable PCI device, BIOS handoff failed.

This is a result of trying to run the quirk on a hot-inserted device
where we haven't assigned resources to it yet.  I don't think we
should really be running quirks on a device that early.  We can look
at that later, if we think this is related to the immediate problem.

Bjorn
--

Patch

--- without_XHCI_with_EHCI_no_ACPIPHP_with_PCIEHP/disabled_MediaCard_reader/USB3/lspci_vv_initial.txt   2013-03-07 22:27:30.000000000 +0100
+++ without_XHCI_with_EHCI_no_ACPIPHP_with_PCIEHP/disabled_MediaCard_reader/USB3/lspci_vv_unloading_USB3_card.txt       2013-03-07 22:40:25.000000000 +0100
@@ -287,7 +287,7 @@ 
        I/O behind bridge: 0000c000-0000dfff
        Memory behind bridge: f6c00000-f7cfffff
        Prefetchable memory behind bridge: 00000000f0000000-00000000f10fffff
-       Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
+       Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
        BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00