Patchwork PCI / ACPI: Always resume devices on ACPI wakeup notifications

Submitter Rafael J. Wysocki
Date March 23, 2013, 2:33 p.m.
Message ID <2282655.IicBMMa6jN@vostro.rjw.lan>
Permalink /patch/230337/
State Superseded

Comments

Rafael J. Wysocki - March 23, 2013, 2:33 p.m.
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

It turns out that _Lxx control methods provided by some BIOSes clear
the PME Status bit of PCI devices they handle, which means that
pci_acpi_wake_dev() cannot really use that bit to check whether or
not the device has signalled wakeup.

For this reason, make pci_acpi_wake_dev() always attempt to resume
the device it is called for regardless of the device's PME Status bit
value (that bit still has to be cleared if set at this point,
though).

Reported-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/pci/pci-acpi.c |   15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

Index: linux-pm/drivers/pci/pci-acpi.c
===================================================================
--- linux-pm.orig/drivers/pci/pci-acpi.c
+++ linux-pm/drivers/pci/pci-acpi.c
@@ -53,14 +53,15 @@ static void pci_acpi_wake_dev(acpi_handl
 		return;
 	}
 
-	if (!pci_dev->pm_cap || !pci_dev->pme_support
-	     || pci_check_pme_status(pci_dev)) {
-		if (pci_dev->pme_poll)
-			pci_dev->pme_poll = false;
+	/* Clear PME Status if set. */
+	if (pci_dev->pme_support)
+		pci_check_pme_status(pci_dev);
 
-		pci_wakeup_event(pci_dev);
-		pm_runtime_resume(&pci_dev->dev);
-	}
+	if (pci_dev->pme_poll)
+		pci_dev->pme_poll = false;
+
+	pci_wakeup_event(pci_dev);
+	pm_runtime_resume(&pci_dev->dev);
 
 	if (pci_dev->subordinate)
 		pci_pme_wakeup_bus(pci_dev->subordinate);


--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Matthew Garrett - March 23, 2013, 4:22 p.m.
Looks good to me.
Sarah Sharp - March 25, 2013, 4:45 p.m.
On Sat, Mar 23, 2013 at 03:33:03PM +0100, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> It turns out that _Lxx control methods provided by some BIOSes clear
> the PME Status bit of PCI devices they handle, which means that
> pci_acpi_wake_dev() cannot really use that bit to check whether or
> not the device has signalled wakeup.
> 
> For this reason, make pci_acpi_wake_dev() always attempt to resume
> the device it is called for regardless of the device's PME Status bit
> value (that bit still has to be cleared if set at this point,
> though).
> 
> Reported-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Should this be marked for stable?  I had this issue on 3.7 and 3.8 as
well.

Sarah

> ---
>  drivers/pci/pci-acpi.c |   15 ++++++++-------
>  1 file changed, 8 insertions(+), 7 deletions(-)
> 
> Index: linux-pm/drivers/pci/pci-acpi.c
> ===================================================================
> --- linux-pm.orig/drivers/pci/pci-acpi.c
> +++ linux-pm/drivers/pci/pci-acpi.c
> @@ -53,14 +53,15 @@ static void pci_acpi_wake_dev(acpi_handl
>  		return;
>  	}
>  
> -	if (!pci_dev->pm_cap || !pci_dev->pme_support
> -	     || pci_check_pme_status(pci_dev)) {
> -		if (pci_dev->pme_poll)
> -			pci_dev->pme_poll = false;
> +	/* Clear PME Status if set. */
> +	if (pci_dev->pme_support)
> +		pci_check_pme_status(pci_dev);
>  
> -		pci_wakeup_event(pci_dev);
> -		pm_runtime_resume(&pci_dev->dev);
> -	}
> +	if (pci_dev->pme_poll)
> +		pci_dev->pme_poll = false;
> +
> +	pci_wakeup_event(pci_dev);
> +	pm_runtime_resume(&pci_dev->dev);
>  
>  	if (pci_dev->subordinate)
>  		pci_pme_wakeup_bus(pci_dev->subordinate);
> 
Rafael J. Wysocki - March 25, 2013, 10:34 p.m.
On Monday, March 25, 2013 09:45:51 AM Sarah Sharp wrote:
> On Sat, Mar 23, 2013 at 03:33:03PM +0100, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > 
> > It turns out that _Lxx control methods provided by some BIOSes clear
> > the PME Status bit of PCI devices they handle, which means that
> > pci_acpi_wake_dev() cannot really use that bit to check whether or
> > not the device has signalled wakeup.
> > 
> > For this reason, make pci_acpi_wake_dev() always attempt to resume
> > the device it is called for regardless of the device's PME Status bit
> > value (that bit still has to be cleared if set at this point,
> > though).
> > 
> > Reported-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Should this be marked for stable?  I had this issue on 3.7 and 3.8 as
> well.

Yes, it probably should, but that's the maintainer's call.

Thanks,
Rafael


> > ---
> >  drivers/pci/pci-acpi.c |   15 ++++++++-------
> >  1 file changed, 8 insertions(+), 7 deletions(-)
> > 
> > Index: linux-pm/drivers/pci/pci-acpi.c
> > ===================================================================
> > --- linux-pm.orig/drivers/pci/pci-acpi.c
> > +++ linux-pm/drivers/pci/pci-acpi.c
> > @@ -53,14 +53,15 @@ static void pci_acpi_wake_dev(acpi_handl
> >  		return;
> >  	}
> >  
> > -	if (!pci_dev->pm_cap || !pci_dev->pme_support
> > -	     || pci_check_pme_status(pci_dev)) {
> > -		if (pci_dev->pme_poll)
> > -			pci_dev->pme_poll = false;
> > +	/* Clear PME Status if set. */
> > +	if (pci_dev->pme_support)
> > +		pci_check_pme_status(pci_dev);
> >  
> > -		pci_wakeup_event(pci_dev);
> > -		pm_runtime_resume(&pci_dev->dev);
> > -	}
> > +	if (pci_dev->pme_poll)
> > +		pci_dev->pme_poll = false;
> > +
> > +	pci_wakeup_event(pci_dev);
> > +	pm_runtime_resume(&pci_dev->dev);
> >  
> >  	if (pci_dev->subordinate)
> >  		pci_pme_wakeup_bus(pci_dev->subordinate);
> >
Martin Mokrejs - April 2, 2013, 8:55 p.m.
[ +linux-pci and Yinghai, as they have already suffered through the many emails on
 individual threads, so one overview email hopefully won't hurt ] ;-)

Martin Mokrejs wrote:
> 
> 
> Bjorn Helgaas wrote:
>> On Tue, Apr 2, 2013 at 9:02 AM, Martin Mokrejs
>> <mmokrejs@fold.natur.cuni.cz> wrote:
>>> Hi Ying,
>>>
>>> huang ying wrote:
>>
>>>> And please give me the full dmesg for boot and incremental dmesg for
>>>> operations.
>>>
>>>
>>> The incremental bits here, the full dmesg will send only directly to your email, due to its size.
>>
>> Is there a bugzilla for this issue?  Please attach the complete dmesg
>> there or somewhere similar so we can all benefit.
> 
> I changed my mind. I am attaching the dmesg here but omitting linux-acpi
> list. After I hear a proposal from Rafel/Bjorn I will open separate bugs.
> I thought that the threads I started so far were enough but yes, dmesg
> files don't pass through list filters so I should move that to bugzilla.
> 
> so far my view of the the bugs was:
> 
> 1) acpiphp hotplug broken due to upstream pcieport 1c.7 PME# enabled
>   (eSATA-based card)

Fixed by Ying Huang's port_dbg.patch applied over 3.8.5 (it fixes acpiphp hotplug
of eSATA and FireWire cards, but NOT the hotplug of a NEC-based USB3 card, hence
bug 4) below). Now I can continue using laptop-mode-tools.


> 2) xHCI dead due to to its suspend - 3.8 series and above

Not fixed by port_dbg.patch applied over 3.8.5. Interestingly, a NEC-based
xHCI card *in an express card slot* does not suffer from this suspend issue,
although it is also put into suspend when a device is unplugged.


> 3) pciehp completely broken since about 3.6, still 3.9-rc5

Even with 3.9-rc5 plus patch 2368081 and port_dbg.patch from Ying Huang, this is
still broken (the eject of a cold-plugged device from an express card slot).
The eject leaves /proc/interrupts claiming that IRQ 19 is still used by the driver.
A non-forced, manual 'rmmod sata_sil24' removes IRQ 19 from the listing.
The rmmod also removes the association with sata_sil24 from /proc/iomem, but
the device 11:00 is retained in that file with its memory ranges.
lspci provides, as I have described many times, conflicting information.
Actually, I trust lspci more than the /proc/ files.
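
The check described above (does the driver still show up in /proc/interrupts
and /proc/iomem after the eject?) can be sketched as a small helper. This is
only an illustration: the function name leftovers and its optional file
arguments are assumptions made so the check can also be run against saved
interrupts_*.txt and iomem_*.txt captures.

```shell
#!/bin/sh
# Sketch only: "leftovers" is a hypothetical helper, not part of any tool.
# Usage: leftovers DRIVER [INTERRUPTS_FILE] [IOMEM_FILE]
leftovers() {
    drv=$1
    irq_file=${2:-/proc/interrupts}
    mem_file=${3:-/proc/iomem}
    # Print one line per listing that still mentions the driver name.
    grep -q -w -- "$drv" "$irq_file" && echo "IRQ still registered for $drv"
    grep -q -w -- "$drv" "$mem_file" && echo "iomem range still owned by $drv"
    return 0
}
```

For example, 'leftovers sata_sil24' after the eject prints nothing when both
listings are clean. It only matches the driver name, so a memory range that
stays listed under the bare device address (as with 11:00 above) is not reported.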


> 
> 
> 
> There is one more which actually brought me into all of this in May2012 at about
> 3.2.x kernels:
> 
> 4) Even when upstream port 1c.7 is force control to 'on' hot removal of
>    USB3 express card is broken, only every second eject is recognized.
>    Is likely related to xhci_hcd having a special privilege to handle IRQ/PM
>    in its own way. In contrast, Firewire and eSATA cards work under same
>    circumstances. I see different sleep states listed as supported by those
>    cards but my bet is that is due to the exceptional xhci_hcd privilege.
>    I briefly repeated that already with 3.9-rc5.

Still broken even with port_dbg.patch applied over 3.8.5. It turns out the unnoticed
ejects and inserts are actually detected, but later, with a delay of 30 sec or so.
Hmm, in my original thread back in 2012 I said a 60 sec delay, but it is likely
still the same problem:
3.2.11: PCI Express card cannot be re-detected withing cca 60sec timeframe




Before I forget, I will sketch several more bugs that I hit; all are documented
in my postings from the last week or two. I can provide the URLs to those postings,
which are already in the archives, and maybe summarize them in bugzilla after we agree
what will be worked on and where (email ... bugzilla), under the best matching subject
you propose.


5) lspci causes wake and suspend of devices handled by pcieport. I fear this is
not good. Maybe it does the same to other PCI devices, but the "problem" is
that no other PCI drivers report the same type of message. I would like to see
the PME# enabled/disabled messages generated by other drivers as well, ideally by
some upstream, common driver.


6) sata_sil24 sometimes initializes badly under pciehp. Once pciehp itself is
fixed, I would still like to get the init of sata_sil24 fixed as well.
There are two wrong paths in the driver. One is:

[  899.894862] sata_sil24 0000:11:00.0: version 1.1
[  899.894880] sata_sil24 0000:11:00.0: enabling device (0000 -> 0003)
[  899.985994] sata_sil24 0000:11:00.0: failed to clear port RST
[  900.086097] sata_sil24 0000:11:00.0: failed to clear port RST
[  900.086119] sata_sil24 0000:11:00.0: enabling bus mastering

while the other is:

[  974.021661] pcieport 0000:00:1c.0: PME# disabled
[  974.041697] pcieport 0000:00:1c.7: PME# disabled
[ 1048.450168] sata_sil24 0000:11:00.0: version 1.1
[ 1048.463692] sata_sil24 0000:11:00.0: Refused to change power state, currently in D3
[ 1048.563818] sata_sil24 0000:11:00.0: failed to clear port RST
[ 1048.663935] sata_sil24 0000:11:00.0: failed to clear port RST

Both lead to a broken device, and I would prefer the driver to fail to load.
It seems they are at least in part related to an early device eject, while the
driver had not yet turned down an unused external SATA port.


7) It seems Rafael or Bjorn have a clue why I sometimes see only PME# disabled
or only PME# enabled in dmesg for a particular device, and I am worried about when
it was silently switched to the other state. I would like to hear that this can be
prevented in the future by some cross-checks, by design.


8) I don't know whether one can ensure that a driver releases either both the
IRQ and the memory ranges it has allocated, or nothing at all, or an oops happens,
whatever. Maybe something could track what the driver grabbed and make
sure both are released. Even a background scan of /proc files would be fine.
The disagreement with lspci is not good.


9) In the thread
Re: 3.8.2: stale pci device info for a previously inserted express card
I already showed an example in which chimeric entries appear in 'lspci -vvv'
output. Some data describe the card previously loaded in an Express
Card Slot while other data describe the one currently loaded in the slot.
This might help explain why there are lines in lspci output like:

a)
Latency: 0
Latency: 0, Cache Line Size: 64 bytes
or the Latency: line missing altogether

b)
[virtual] Expansion ROM at f6c00000 [disabled] [size=512K]
Expansion ROM at f6c00000 [size=512K]

c)
Region 0: Memory at f6c84000 (64-bit, non-prefetchable) [size=128]
Region 0: Memory at f6c84000 (64-bit, non-prefetchable) [disabled] [size=128]


If the kernel does not give a hint about what is wrong with a device/driver, then
maybe lspci could do a runtime check and give a more useful, user-oriented warning.



>>
>> I think we have two problems that may be relevant to this discussion.
>>
>> 1) The _OSC "PCI Express Capability Structure control" bit.  I don't
>> think Linux pays attention to whether the BIOS has granted us control
>> over the capability, so we may do things to it that the BIOS doesn't
>> expect.
>>
>> 2) acpiphp currently uses the presence of _ADR/_EJ0/_RMV to detect
>> hotplug slots.  I don't think this is sufficient (see
>> https://bugzilla.kernel.org/show_bug.cgi?id=54981 for details).
>> Therefore, I don't think pci_bus_has_hotplug_slots() in port_dbg.patch
>> can be accurate.  I think it returns "false" for some buses where it
>> should return "true," such as the ExpressCard slot on Chris Clayton's
>> system (see bug 54981).
> 
> But, I do not how whether and how to split the above 4 bugs into maybe more,
> better described bugs. I will repeat them likely with 3.8.5 and 3.9-rc5,
> I got quite skilled running diff all the last days and weeks. ;-)
> 
> I am waiting for some answers from you before opening bug reports.
> Please tell me how to name them and what data you want to get where.
> After I open them will try to (re)attach your patches. Ying, do you have an
> update for the port_dbg.patch per Bjorns comments about the pci_bus_has_hotplug_slots() 
> being inaccurate? I would gladly wait for an updated patch catching rather
> more scenarios than less.

Feel free to comment on the listing of deemed bugs, add more you saw in the
logs or diffs yourself (especially those downstream, secondary bugs which will
be soon masked by the hotplug issues being *fixed*). ;)
I am quite optimistic. ;))

The above listings don't contain URLs but can be all sorted out in
those respective bugzilla entries.

Thank you,
Martin
Sarah Sharp - April 2, 2013, 10:16 p.m.
On Tue, Apr 02, 2013 at 10:55:02PM +0200, Martin Mokrejs wrote:
> > 2) xHCI dead due to to its suspend - 3.8 series and above
> 
> Not fixed by port_dbg.patch applied over 3.8.5. Interestingly, a NEC-based
> XHCI card *in an express card slot* does not suffer this suspend issue.
> Although it is being put into suspend if a device is unplugged.

Wait, wait, wait.  Time out.  You have *two* xHCI host controllers?  Are
they different vendors?  Are they exhibiting different broken behaviors?
Please state for each host controller exactly the symptoms you are
seeing (no dmesg or other log files yet, just one paragraph for each
host).

Sarah Sharp
huang ying - April 3, 2013, 2:34 a.m.
Hi, Martin,

On Wed, Apr 3, 2013 at 4:55 AM, Martin Mokrejs
<mmokrejs@fold.natur.cuni.cz> wrote:
> [ +linux-pci and Yinghai as they suffered already those many emails on individual
>  threads so one overviewing email hopefully won't harm] ;-)
>
> Martin Mokrejs wrote:
>>
>>
>> Bjorn Helgaas wrote:
>>> On Tue, Apr 2, 2013 at 9:02 AM, Martin Mokrejs
>>> <mmokrejs@fold.natur.cuni.cz> wrote:
>>>> Hi Ying,
>>>>
>>>> huang ying wrote:
>>>
>>>>> And please give me the full dmesg for boot and incremental dmesg for
>>>>> operations.
>>>>
>>>>
>>>> The incremental bits here, the full dmesg will send only directly to your email, due to its size.
>>>
>>> Is there a bugzilla for this issue?  Please attach the complete dmesg
>>> there or somewhere similar so we can all benefit.
>>
>> I changed my mind. I am attaching the dmesg here but omitting linux-acpi
>> list. After I hear a proposal from Rafel/Bjorn I will open separate bugs.
>> I thought that the threads I started so far were enough but yes, dmesg
>> files don't pass through list filters so I should move that to bugzilla.
>>
>> so far my view of the the bugs was:
>>
>> 1) acpiphp hotplug broken due to upstream pcieport 1c.7 PME# enabled
>>   (eSATA-based card)
>
> Fixed by Ying Huang port_dbg.patch applied over 3.8.5 (fixes acpiphp hotplug
> of eSATA and Firewire cards, NOT the hotplug of a NEC-based USB3 card -> hence
> the bug 4) below). Now I can continue using laptop-mode-tools.
>
>
>> 2) xHCI dead due to to its suspend - 3.8 series and above
>
> Not fixed by port_dbg.patch applied over 3.8.5. Interestingly, a NEC-based
> XHCI card *in an express card slot* does not suffer this suspend issue.
> Although it is being put into suspend if a device is unplugged.

I cannot find the dmesg or any other details about this.  Could you
provide some details?  Or did I miss some emails from you?

Best Regards,
Huang Ying
Martin Mokrejs - April 3, 2013, 10:35 a.m.
Sarah Sharp wrote:
> On Tue, Apr 02, 2013 at 10:55:02PM +0200, Martin Mokrejs wrote:
>>> 2) xHCI dead due to to its suspend - 3.8 series and above
>>
>> Not fixed by port_dbg.patch applied over 3.8.5. Interestingly, a NEC-based
>> XHCI card *in an express card slot* does not suffer this suspend issue.
>> Although it is being put into suspend if a device is unplugged.
> 
> Wait, wait, wait.  Time out.  You have *two* xHCI host controllers?  Are
> they different vendors?  Are they exhibiting different broken behaviors?
> Please state for each host controller exactly the symptoms you are
> seeing (no dmesg or other log files yet, just one paragraph for each
> host).

The laptop has a Texas Instruments controller, which suffers from the problem that
once it is suspended (0b:00) it does not notice that a new device was plugged
into the socket, so the end USB device gets no power and is dead. Manual wakeup
using echo 'on' > /sys/.../*0b:00/control wakes up the upstream PCIe root port
1c.4 (at least with the patch) and the 0b:00 itself, as intended by the echo
command. That lets the TI controller realize that e.g. a mouse is connected to
the socket, and it picks it up.
What is not clear to me is why the xHCI socket is not dead upon bootup with no
USB devices attached. That also leaves the controller 0b:00 in the suspended state,
but the very first plugin of e.g. the mouse is picked up and the mouse works.
Upon unplug of the mouse something gets screwed up. We thought that it was due to
the upstream port being suspended, but even with the patch preventing that
(port_dbg.patch) the broken state is entered: the 0b:00 falls asleep, its
runtime_status file says 'suspended' and the socket is dead.
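
The manual wakeup described above can be sketched as two small shell helpers.
This is only an illustration under stated assumptions: the helper names, the
SYSFS_PCI variable and the full address 0000:0b:00.0 (PCI domain 0000,
function 0) are assumed; the thread itself only names the device as 0b:00.

```shell
#!/bin/sh
# Sketch only: SYSFS_PCI and the helper names are assumptions for illustration.
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci/devices}"

# Print the runtime PM status and the control policy of one PCI device.
pm_state() {
    dev="$SYSFS_PCI/$1/power"
    printf '%s runtime_status=%s control=%s\n' "$1" \
        "$(cat "$dev/runtime_status")" "$(cat "$dev/control")"
}

# Forbid runtime suspend for the device, which resumes it if it was suspended
# (writing to the real sysfs file needs root).
pm_force_on() {
    echo on > "$SYSFS_PCI/$1/power/control"
}
```

A typical sequence would be 'pm_state 0000:0b:00.0', then
'pm_force_on 0000:0b:00.0', then 'pm_state 0000:0b:00.0' again to see whether
runtime_status changed from 'suspended' to 'active'.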

You may remember that a year ago I started the threads about Express Card
hotplug issues with another, NEC-based USB3 controller I have. To better test
the patch from Ying Huang I also tried what happens with the NEC-based controller.
It works. I did not provide you the logs, although from the debug info Ying added
it seems the code flows in a different way. I had XHCI_DEBUG enabled, but since no
other external USB devices were attached and I tested with a USB2 device (the mouse),
xhci_hcd did not flood the logs too much.


Martin
Martin Mokrejs - April 3, 2013, 10:39 a.m.
huang ying wrote:
> Hi, Martin,
> 
> On Wed, Apr 3, 2013 at 4:55 AM, Martin Mokrejs
> <mmokrejs@fold.natur.cuni.cz> wrote:
>> [ +linux-pci and Yinghai as they suffered already those many emails on individual
>>  threads so one overviewing email hopefully won't harm] ;-)
>>
>> Martin Mokrejs wrote:
>>>
>>>
>>> Bjorn Helgaas wrote:
>>>> On Tue, Apr 2, 2013 at 9:02 AM, Martin Mokrejs
>>>> <mmokrejs@fold.natur.cuni.cz> wrote:
>>>>> Hi Ying,
>>>>>
>>>>> huang ying wrote:
>>>>
>>>>>> And please give me the full dmesg for boot and incremental dmesg for
>>>>>> operations.
>>>>>
>>>>>
>>>>> The incremental bits here, the full dmesg will send only directly to your email, due to its size.
>>>>
>>>> Is there a bugzilla for this issue?  Please attach the complete dmesg
>>>> there or somewhere similar so we can all benefit.
>>>
>>> I changed my mind. I am attaching the dmesg here but omitting linux-acpi
>>> list. After I hear a proposal from Rafel/Bjorn I will open separate bugs.
>>> I thought that the threads I started so far were enough but yes, dmesg
>>> files don't pass through list filters so I should move that to bugzilla.
>>>
>>> so far my view of the the bugs was:
>>>
>>> 1) acpiphp hotplug broken due to upstream pcieport 1c.7 PME# enabled
>>>   (eSATA-based card)
>>
>> Fixed by Ying Huang port_dbg.patch applied over 3.8.5 (fixes acpiphp hotplug
>> of eSATA and Firewire cards, NOT the hotplug of a NEC-based USB3 card -> hence
>> the bug 4) below). Now I can continue using laptop-mode-tools.
>>
>>
>>> 2) xHCI dead due to to its suspend - 3.8 series and above
>>
>> Not fixed by port_dbg.patch applied over 3.8.5. Interestingly, a NEC-based
>> XHCI card *in an express card slot* does not suffer this suspend issue.
>> Although it is being put into suspend if a device is unplugged.
> 
> Do not find the dmesg or any other details about this.  Could you
> provide some details?  Or I miss some emails from you?

No, I have not sent them yet. ;-) I was really waiting for answers on how to separate
the bugs, how to name them, which components to use in bugzilla, etc.


So? ;)

> 
> Best Regards,
> Huang Ying
> 
> 
Martin Mokrejs - April 3, 2013, 12:16 p.m.
Meanwhile, the raw data: http://195.113.57.32/~mmokrejs/tmp/20130402.tar.bz2
(size 468641 bytes)

They were collected by:

# cat ~/bin/collect_runtime_status.sh 
#!/bin/sh
grep . /sys/bus/pci/devices/*/power/runtime_status > runtime_status_"$1".txt
grep . /sys/bus/pci/devices/*/power/control > control_"$1".txt
cat /proc/interrupts > interrupts_"$1".txt
cat /proc/iomem > iomem_"$1".txt
lspci -vvv > lspci_vvv_"$1".txt
dmesg > dmesg_"$1".txt
#

Just do 'ls -latr' to see the order in which the files were created.
The longer the filename, the later in the test process. The names should be
relatively self-explanatory. In any case, from the log files you should see
what really happened and can therefore figure out what the (maybe weird)
long filename really meant.

Sometimes I manually recorded lsusb or dmesg_final.txt, mostly after I did some
extra tests but did not want to record every step with the above 6 files.

In one or two places I added some of my own notes into a COMMENTS file.




Below I will try to guide you to where you can study each of the bugs. Mostly,
for each bug you need just one subdirectory to look into; the others just
repeat the same bug under a different kernel version or with another patch.
However, for the xHCI dead port issue Sarah will need to diff
two directories, one with the TI-based controller tests, the other with the
NEC-based tests. Especially there, I would do something like:

mkdir -p /tmp/TI /tmp/NEC
cd *TI-based; for f in dmesg*; do cut -c 15- "$f" > /tmp/TI/"$f"; done
cd ../*NEC-based; for f in dmesg*; do cut -c 15- "$f" > /tmp/NEC/"$f"; done

Then it should be easier to poke through files captured at the same test step,
like:

diff -u -w /tmp/TI/dmesg_initial__mouse_attached__unplugged__reattached_but_port_dead.txt \
/tmp/NEC/dmesg_initial__mouse_attached__detached__reattached.txt



Other than that, just diff pairs of files with each other, like:

diff -u -w lspci_vvv_initial.txt lspci_vvv_initial__mouse_attached.txt
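
The strip-timestamps-then-diff steps above can also be sketched as one helper
that does not depend on the timestamp column width. This is only an
illustration: strip_ts and dmesg_diff are hypothetical names, and the pattern
assumes the usual "[seconds.microseconds] " prefix seen in the dmesg excerpts
in this thread.

```shell
#!/bin/sh
# Sketch only: strip_ts and dmesg_diff are hypothetical helper names.

# Remove a leading "[ seconds.micros] " timestamp from each dmesg line.
# Reads the named files, or stdin when no file is given.
strip_ts() {
    sed 's/^\[ *[0-9][0-9]*\.[0-9][0-9]*] //' "$@"
}

# Diff two dmesg captures while ignoring their timestamps.
dmesg_diff() {
    a=$(mktemp) && b=$(mktemp) || return 2
    strip_ts "$1" > "$a"
    strip_ts "$2" > "$b"
    diff -u -w "$a" "$b"
    rc=$?
    rm -f "$a" "$b"
    return $rc
}
```

Called as 'dmesg_diff dmesg_initial.txt dmesg_initial__mouse_attached.txt', it
prints a unified diff of the messages only; the file names in the diff header
are those of the temporary files.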


Sorry that I sometimes used only a single underscore instead of double underscores
to separate the test steps from each other in the filename.


Martin Mokrejs wrote:
> [ +linux-pci and Yinghai as they suffered already those many emails on individual
>  threads so one overviewing email hopefully won't harm] ;-)
> 
> Martin Mokrejs wrote:
>>
>>
>> Bjorn Helgaas wrote:
>>> On Tue, Apr 2, 2013 at 9:02 AM, Martin Mokrejs
>>> <mmokrejs@fold.natur.cuni.cz> wrote:
>>>> Hi Ying,
>>>>
>>>> huang ying wrote:
>>>
>>>>> And please give me the full dmesg for boot and incremental dmesg for
>>>>> operations.
>>>>
>>>>
>>>> The incremental bits here, the full dmesg will send only directly to your email, due to its size.
>>>
>>> Is there a bugzilla for this issue?  Please attach the complete dmesg
>>> there or somewhere similar so we can all benefit.
>>
>> I changed my mind. I am attaching the dmesg here but omitting linux-acpi
>> list. After I hear a proposal from Rafel/Bjorn I will open separate bugs.
>> I thought that the threads I started so far were enough but yes, dmesg
>> files don't pass through list filters so I should move that to bugzilla.
>>
>> so far my view of the the bugs was:
>>
>> 1) acpiphp hotplug broken due to upstream pcieport 1c.7 PME# enabled
>>   (eSATA-based card)
> 
> Fixed by Ying Huang port_dbg.patch applied over 3.8.5 (fixes acpiphp hotplug
> of eSATA and Firewire cards, NOT the hotplug of a NEC-based USB3 card -> hence
> the bug 4) below). Now I can continue using laptop-mode-tools.

20130402/3.8.5-ying_port-dbg__with_laptop-mode-tools_eSATA_testing
20130402/3.8.3-vanilla__with_laptop-mode-tools (with some comments in
                                                COMMENTS file)


>> 2) xHCI dead due to to its suspend - 3.8 series and above
> 
> Not fixed by port_dbg.patch applied over 3.8.5. Interestingly, a NEC-based
> XHCI card *in an express card slot* does not suffer this suspend issue.
> Although it is being put into suspend if a device is unplugged.

20130402/3.8.5-ying_port-dbg__with_laptop-mode-tools_xHCI_test_TI-based
20130402/3.8.5-ying_port-dbg__with_laptop-mode-tools_xHCI_test_NEC-based

The same thing, but still without port_dbg.patch:
20130402/3.9-rc5__with_2368081__with-latop-mode-tools_xhci_testing/


>> 3) pciehp completely broken since about 3.6, still 3.9-rc5
> 
> Even 3.9-rc5 with patch 2368081 and port_dbg.patch from Ying Huang this is
> still broken (the eject of a cold plugged device from an express card slot).
> That results in /proc/interrupts claiming IRQ19 is still used by the driver.
> Non-forced but manual 'rmmod sata_sil24' removes the IRQ 19 from the listing.
> The rmmod also removes association with sata_sil24 from the /proc/iomem but
> the device 11:00 is retained in the file with its memory ranges.
> lspci provides, as many times described by me, conflicting information.
> Actually, I trust more lspci than /proc/ files.

Tests with express cards SATA SiI3132 and FireWire VT6315:
20130402/3.9-rc5__with_2368081__and__ying_port-dbg__with-latop-mode-tools_eSATA_testing
20130402/3.9-rc5__with_2368081__and__ying_port-dbg__with-latop-mode-tools_FireWire_testing

A bit more testing, still without port_dbg.patch (it contains more data for you,
so look into it after the above two):
20130402/3.9-rc5__with_2368081__with-latop-mode-tools_eSATA_testing


>> There is one more which actually brought me into all of this in May2012 at about
>> 3.2.x kernels:
>>
>> 4) Even when upstream port 1c.7 is force control to 'on' hot removal of
>>    USB3 express card is broken, only every second eject is recognized.
>>    Is likely related to xhci_hcd having a special privilege to handle IRQ/PM
>>    in its own way. In contrast, Firewire and eSATA cards work under same
>>    circumstances. I see different sleep states listed as supported by those
>>    cards but my bet is that is due to the exceptional xhci_hcd privilege.
>>    I briefly repeated that already with 3.9-rc5.
> 
> Still broken even with port_dbg.patch applied over 3.8.5. Turns out the unnoticed
> ejects and inserts are actually detected, but later, with 30sec delay of so.
> Hmm, in my original thread back in 2012 I said 60sec delay but seems is likely
> still the same problem:
> 3.2.11: PCI Express card cannot be re-detected withing cca 60sec timeframe

20130402/3.8.5-ying_port-dbg__with_laptop-mode-tools_NEC-based_eject_testing


> Before I forget, I will sketch several more bugs I hit and are all documented
> in my postings from last week or two. I can provide the URLs to those postings
> already in archives and maybe summarize them in bugzilla, after we agree what
> will be worked on and where (email ... bugzilla), under the best matching subject
> you will propose.
> 
> 
> 5) lspci causes wake and suspend of pcieport handled devices. I fear this is
> not good. Maybe it does the same to other pci devices but the "problem" is
> that no other pci drivers report same type of message. I would like to see
> the PME# enabled/disabled generated by other drivers as well, ideally by some
> upstream, common driver.

At least in some cases, lspci -vv causes 7x these:

lspci -vvv causes the same message 11x.
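
One way to count such messages is to compare two dmesg captures, one taken
before and one after running lspci. The following is a minimal sketch under
stated assumptions: count_pme and pme_delta are hypothetical helper names, and
the pattern assumes the "pcieport ...: PME# enabled/disabled" wording seen in
the dmesg excerpts in this thread.

```shell
#!/bin/sh
# Sketch only: count_pme and pme_delta are hypothetical helper names.

# Count pcieport "PME# enabled" / "PME# disabled" lines in a dmesg capture.
count_pme() {
    # grep -c still prints 0 when nothing matches; "|| true" hides its status.
    grep -c -E 'pcieport .*PME# (enabled|disabled)' "$1" || true
}

# Print how many PME# lines were added between two captures.
pme_delta() {
    before=$(count_pme "$1")
    after=$(count_pme "$2")
    echo $((after - before))
}
```

With captures made by the collection script, a call would look like
'pme_delta dmesg_before.txt dmesg_after.txt', where both file names are
placeholders for two consecutive test steps.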


> 
> 
> 6) sata_sil24 sometimes initializes badly under pciehp. Provided you once fix
> the pciehp and still would like to get the init of sata_sil24 fixed as well.
> The are two wrong paths in the driver. One is:
> 
> [  899.894862] sata_sil24 0000:11:00.0: version 1.1
> [  899.894880] sata_sil24 0000:11:00.0: enabling device (0000 -> 0003)
> [  899.985994] sata_sil24 0000:11:00.0: failed to clear port RST
> [  900.086097] sata_sil24 0000:11:00.0: failed to clear port RST
> [  900.086119] sata_sil24 0000:11:00.0: enabling bus mastering

20130402/3.9-rc5__with_2368081__with-laptop-mode-tools_eSATA_testing/

> 
> while the other is:
> 
> [  974.021661] pcieport 0000:00:1c.0: PME# disabled
> [  974.041697] pcieport 0000:00:1c.7: PME# disabled
> [ 1048.450168] sata_sil24 0000:11:00.0: version 1.1
> [ 1048.463692] sata_sil24 0000:11:00.0: Refused to change power state, currently in D3
> [ 1048.563818] sata_sil24 0000:11:00.0: failed to clear port RST
> [ 1048.663935] sata_sil24 0000:11:00.0: failed to clear port RST

20130402/3.8.5-ying_port-dbg__with_laptop-mode-tools_NEC-based_eject_testing






You will come across the bugs below in multiple places in the tar.bz2 archive, but
they were also well described in the past email threads. It does not make sense to
repeat all of that here. I suggest you come up with a debug patch to help with
these and then we can dive into more crafted log data.

> 
> Both lead to a broken device and I would prefer the driver to fail to load.
> It seems they are at least in part related to early device eject while the
> driver did not yet turn down an unused external SATA port.
> 
> 
> 7) It seems Rafael or Bjorn have a clue why sometimes I see only PME# disabled
> or just PME# enabled in dmesg for a particular device and I am worried when was
> it silently switched to the other state. I would like to hear this can be prevented
> in future by some cross-checks, by design.
> 
> 
> 8) I don't know whether one can ensure that a driver releases either both
> IRQ and memory ranges it has allocated, or just nothing, or an oops happens,
> whatever. Maybe something could track what the driver grabbed once and make
> sure both are released. even a background scan or /proc files would be fine.
> The disagreement with lspci is not good.
> 
> 
> 9) In the thread 
> Re: 3.8.2: stale pci device info for a previously inserted express card
> I already showed an example that chimeric entries in 'lspci -vvv' output
> can appear. Some data describe the previously loaded card in an Express
> Card Slot while the other the one currently loaded in the slot.
> This might lead to an explanation why are there those lines in lspci like:
> 
> a)
> Latency: 0
> Latency: 0, Cache Line Size: 64 bytes
> or the Latency: line missing altogether
> 
> b)
> [virtual] Expansion ROM at f6c00000 [disabled] [size=512K]
> Expansion ROM at f6c00000 [size=512K]
> 
> c)
> Region 0: Memory at f6c84000 (64-bit, non-prefetchable) [size=128]
> Region 0: Memory at f6c84000 (64-bit, non-prefetchable) [disabled] [size=128]
> 
> 
> If kernel does not give a hint what is wrong with a device/driver then
> maybe lspci do do a runtime check and give some more useful user-oriented warning.


Martin
Huang Ying - April 4, 2013, 11:30 a.m.
Hi, Martin,

On Wed, 2013-04-03 at 14:16 +0200, Martin Mokrejs wrote:
> Meanwhile, the raw data: http://195.113.57.32/~mmokrejs/tmp/20130402.tar.bz2
> (size 468641 bytes)

Thanks a lot!  Your information is very complete and clear :)

> They were collected by:
> 
> # cat ~/bin/collect_runtime_status.sh 
> #!/bin/sh
> grep . /sys/bus/pci/devices/*/power/runtime_status > runtime_status_"$1".txt
> grep . /sys/bus/pci/devices/*/power/control > control_"$1".txt
> cat /proc/interrupts > interrupts_"$1".txt
> cat /proc/iomem > iomem_"$1".txt
> lspci -vvv > lspci_vvv_"$1".txt
> dmesg > dmesg_"$1".txt
> #
> 
> Just do 'ls -latr' to see the order in which the files were created.
> The longer the filename, the later in the test process. The names should be
> relatively self-explanatory. In any case, from the log files you should see
> what really happened and can therefore figure out what the (maybe weird)
> long filename really meant.
> 
> Sometimes I manually recorded lsusb or dmesg_final.txt, mostly after I did some
> extra tests but did not want to record every step with the above 6 files.
> 
> In one or two places I added some notes of my own to the COMMENTS file.
> 
> 
> 
> 
> I will try to guide you below to where you can study each of the bugs. Mostly,
> for each bug you need just one subdirectory to look into; the others just
> repeat the same bug under a different kernel version or with another patch.
> However, for the xHCI dead port issue, Sarah will need to diff
> two directories, one with the TI-based controller tests, the other with the
> NEC-based tests. Especially there, I would do something like:
> 
> mkdir -p /tmp/TI /tmp/NEC
> cd *TI-based; for f in dmesg*; do cut -c 15- "$f" > "/tmp/TI/$f"; done
> cd ../*NEC-based; for f in dmesg*; do cut -c 15- "$f" > "/tmp/NEC/$f"; done
> 
> Then it should be easier to poke through file captured at the same test step,
> like:
> 
> diff -u -w /tmp/TI/dmesg_initial__mouse_attached__unplugged__reattached_but_port_dead.txt \
> /tmp/NEC/dmesg_initial__mouse_attached__detached__reattached.txt
> 
> 
> 
> Other than that, just diff pairs of files with each other, like:
> 
> diff -u -w lspci_vvv_initial.txt lspci_vvv_initial__mouse_attached.txt
> 
> 
> Sorry that I sometimes used only a single underscore instead of double underscores
> to separate the test steps from each other in the filename.
> 
> 
> Martin Mokrejs wrote:
> > [ +linux-pci and Yinghai as they suffered already those many emails on individual
> >  threads so one overviewing email hopefully won't harm] ;-)
> > 
> > Martin Mokrejs wrote:
> >>
> >>
> >> Bjorn Helgaas wrote:
> >>> On Tue, Apr 2, 2013 at 9:02 AM, Martin Mokrejs
> >>> <mmokrejs@fold.natur.cuni.cz> wrote:
> >>>> Hi Ying,
> >>>>
> >>>> huang ying wrote:
> >>>
> >>>>> And please give me the full dmesg for boot and incremental dmesg for
> >>>>> operations.
> >>>>
> >>>>
> >>>> The incremental bits are here; I will send the full dmesg only directly to your email, due to its size.
> >>>
> >>> Is there a bugzilla for this issue?  Please attach the complete dmesg
> >>> there or somewhere similar so we can all benefit.
> >>
> >> I changed my mind. I am attaching the dmesg here but omitting linux-acpi
> >> list. After I hear a proposal from Rafael/Bjorn I will open separate bugs.
> >> I thought that the threads I started so far were enough but yes, dmesg
> >> files don't pass through list filters so I should move that to bugzilla.
> >>
> >> So far my view of the bugs was:
> >>
> >> 1) acpiphp hotplug broken due to upstream pcieport 1c.7 PME# enabled
> >>   (eSATA-based card)
> > 
> > Fixed by Ying Huang port_dbg.patch applied over 3.8.5 (fixes acpiphp hotplug
> > of eSATA and Firewire cards, NOT the hotplug of a NEC-based USB3 card -> hence
> > the bug 4) below). Now I can continue using laptop-mode-tools.
> 
> 20130402/3.8.5-ying_port-dbg__with_laptop-mode-tools_eSATA_testing
> 20130402/3.8.3-vanilla__with_laptop-mode-tools (with some comments in
>                                                 COMMENTS file)

Thanks for your testing!

> >> 2) xHCI dead due to its suspend - 3.8 series and above
> > 
> > Not fixed by port_dbg.patch applied over 3.8.5. Interestingly, a NEC-based
> > XHCI card *in an express card slot* does not suffer this suspend issue.
> > Although it is being put into suspend if a device is unplugged.
> 
> 20130402/3.8.5-ying_port-dbg__with_laptop-mode-tools_xHCI_test_TI-based
> 20130402/3.8.5-ying_port-dbg__with_laptop-mode-tools_xHCI_test_NEC-based
> 
> Same thing but yet without the port_dbg.patch:
> 20130402/3.9-rc5__with_2368081__with-latop-mode-tools_xhci_testing/

It appears that the TI xHCI dead port issue is present even if the PCIe
port never gets suspended.  So I think this bug is not related to
PCIe port runtime PM but to USB xHCI.

Do you agree, Sarah?

[snip]

Best Regards,
Huang Ying


Sarah Sharp - April 4, 2013, 7:19 p.m.
On Thu, Apr 04, 2013 at 07:30:19PM +0800, Huang Ying wrote:
> Hi, Martin,
> 
> On Wed, 2013-04-03 at 14:16 +0200, Martin Mokrejs wrote:
> > >> 2) xHCI dead due to its suspend - 3.8 series and above
> > > 
> > > Not fixed by port_dbg.patch applied over 3.8.5. Interestingly, a NEC-based
> > > XHCI card *in an express card slot* does not suffer this suspend issue.
> > > Although it is being put into suspend if a device is unplugged.
> > 
> > 20130402/3.8.5-ying_port-dbg__with_laptop-mode-tools_xHCI_test_TI-based
> > 20130402/3.8.5-ying_port-dbg__with_laptop-mode-tools_xHCI_test_NEC-based
> > 
> > Same thing but yet without the port_dbg.patch:
> > 20130402/3.9-rc5__with_2368081__with-latop-mode-tools_xhci_testing/
> 
> It appears that TI xHCI dead port issue will present even if the PCIe
> port will never go suspended.  So I think this bug is not related to
> PCIe port runtime PM but related to USB xHCI.
> 
> Do you agree Sarah?

No.  The symptoms he described (in another email) were that the port
only becomes "dead" after a USB 2.0 device is removed, and the host was
suspended.  The issue was that the TI host is simply not reporting the
USB device connect, even if it is manually resumed.  The port status
registers do not show a device connect at all.

Martin, can you confirm this by trying this, and sending me dmesg of the
test with CONFIG_USB_DEBUG and CONFIG_USB_XHCI_HCD_DEBUGGING turned on:

1. Remove the laptop mode tools
2. Reboot with no USB devices attached to the TI host
3. Make sure the xHCI PCI device's power/control file is set to 'on'
   You will find that file in /sys/bus/pci/devices/.  Use lspci to
   figure out which directory is the xHCI PCI device.
4. Plug in a USB 2.0 device and make sure it works (e.g. wiggle a
   mouse)
5. Unplug the device, replug it, and check to see if it works.

If you have problems, stop here.  Otherwise try:

6. Unplug all USB devices
7. echo 'auto' to the xHCI PCI device's power/control file
8. echo 'auto' to both xHCI roothubs in /sys/bus/usb/devices/
   (i.e. all usbN directories)
9. Wait a few seconds or so until the xHCI PCI host suspends, meaning the
   power/runtime_status file reads as 'suspended'
10. Plug in the same USB 2.0 device, and check if it works.
11. Unplug the device, and wait until the PCI host is suspended.
12. Replug the device, and check to see if it works.
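
Steps 3, 7 and 9 above could be scripted roughly as below. This is only a
sketch: the helper names set_power_control and wait_for_runtime_status are
made up for illustration, and the device directory has to be supplied by
the caller (the one found via lspci under /sys/bus/pci/devices/).

```shell
#!/bin/sh
# Sketch only: both helper names are hypothetical, not part of any tool.

# set_power_control DEVDIR VALUE
# Write 'on' or 'auto' to DEVDIR/power/control.
set_power_control() {
    printf '%s\n' "$2" > "$1/power/control"
}

# wait_for_runtime_status DEVDIR WANTED [TIMEOUT_SECONDS]
# Poll DEVDIR/power/runtime_status about once per second.
# Returns 0 once it reads WANTED, 1 if the timeout (default 10) expires.
wait_for_runtime_status() {
    dev=$1
    wanted=$2
    remaining=${3:-10}
    while :; do
        status=$(cat "$dev/power/runtime_status" 2>/dev/null)
        [ "$status" = "$wanted" ] && return 0
        [ "$remaining" -le 0 ] && return 1
        sleep 1
        remaining=$((remaining - 1))
    done
}
```

With DEV set to the xHCI device's sysfs directory, step 7 followed by
step 9 would then be something like:
set_power_control "$DEV" auto && wait_for_runtime_status "$DEV" suspended 30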

Sarah Sharp
Martin Mokrejs - April 5, 2013, 12:30 p.m.
Sarah Sharp wrote:
> On Thu, Apr 04, 2013 at 07:30:19PM +0800, Huang Ying wrote:
>> Hi, Martin,
>>
>> On Wed, 2013-04-03 at 14:16 +0200, Martin Mokrejs wrote:
>>>>> 2) xHCI dead due to its suspend - 3.8 series and above
>>>>
>>>> Not fixed by port_dbg.patch applied over 3.8.5. Interestingly, a NEC-based
>>>> XHCI card *in an express card slot* does not suffer this suspend issue.
>>>> Although it is being put into suspend if a device is unplugged.
>>>
>>> 20130402/3.8.5-ying_port-dbg__with_laptop-mode-tools_xHCI_test_TI-based
>>> 20130402/3.8.5-ying_port-dbg__with_laptop-mode-tools_xHCI_test_NEC-based
>>>
>>> Same thing but yet without the port_dbg.patch:
>>> 20130402/3.9-rc5__with_2368081__with-latop-mode-tools_xhci_testing/
>>
>> It appears that TI xHCI dead port issue will present even if the PCIe
>> port will never go suspended.  So I think this bug is not related to
>> PCIe port runtime PM but related to USB xHCI.
>>
>> Do you agree Sarah?
> 
> No.  The symptoms he described (in another email) were that the port
> only becomes "dead" after a USB 2.0 device is removed, and the host was
> suspended.  The issue was that the TI host is simply not reporting the
> USB device connect, even if it is manually resumed.  The port status
> registers do not show a device connect at all.
> 
> Martin, can you confirm this by trying this, and sending me dmesg of the
> test with CONFIG_USB_DEBUG and CONFIG_USB_XHCI_HCD_DEBUGGING turned on:
> 
> 1. Remove the laptop mode tools
> 2. Reboot with no USB devices attached to the TI host
> 3. Make sure the xHCI PCI device's power/control file is set to 'on'
>    You will find that file in /sys/bus/pci/devices/.  Use lspci to
>    figure out which directory is the xHCI PCI device.
> 4. Plug in a USB 2.0 device and make sure it works (e.g. wiggle a
>    mouse)
> 5. Unplug the device, replug it, and check to see if it works.

Works. Actually, I plugged the mouse in and out several times
to show that the *unplug* alone does not kill the socket.


> If you have problems, stop here.  Otherwise try:
> 
> 6. Unplug all USB devices
> 7. echo 'auto' to the xHCI PCI device's power/control file in

The 0b:00.0 is already suspended after the echo 'auto', but I tried to continue
with step 8. Some default kicks in?


> 8. echo 'auto' to both xHCI roothubs in /sys/bus/usb/devices/
>    (i.e. all usbN directories)

No need, they are already suspended:

# cat /sys/devices/pci0000\:00/0000:00:1c.4/0000:0b:00.0/usb3/power/control
auto
# cat /sys/devices/pci0000\:00/0000:00:1c.4/0000:0b:00.0/usb3/power/runtime_status
suspended
# cat /sys/devices/pci0000\:00/0000:00:1c.4/0000:0b:00.0/usb4/power/runtime_status
suspended
#

> 9. Wait a few seconds or so until the xHCI PCI host suspends, meaning the
>    power/runtime_status file reads as 'suspended'
> 10. Plug in the same USB 2.0 device, and check if it works.

It works.

> 11. Unplug the device, and wait until the PCI host is suspended.

Unplug causes death per dmesg.

[  932.419828] xhci_hcd 0000:0b:00.0: Cached old ring, 1 ring cached
[  932.420240] xhci_hcd 0000:0b:00.0: // Ding dong!
[  932.420342] xhci_hcd 0000:0b:00.0: get port status, actual port 1 status  = 0x2a0
[  932.420344] xhci_hcd 0000:0b:00.0: Get port status returned 0x100
[  932.454637] xhci_hcd 0000:0b:00.0: get port status, actual port 1 status  = 0x2a0
[  932.454638] xhci_hcd 0000:0b:00.0: Get port status returned 0x100
[  932.494828] xhci_hcd 0000:0b:00.0: get port status, actual port 1 status  = 0x2a0
[  932.494831] xhci_hcd 0000:0b:00.0: Get port status returned 0x100
[  932.534856] xhci_hcd 0000:0b:00.0: get port status, actual port 1 status  = 0x2a0
[  932.534859] xhci_hcd 0000:0b:00.0: Get port status returned 0x100
[  932.574871] xhci_hcd 0000:0b:00.0: get port status, actual port 1 status  = 0x2a0
[  932.574874] xhci_hcd 0000:0b:00.0: Get port status returned 0x100
[  932.574888] hub 3-0:1.0: debounce: port 2: total 100ms stable 100ms status 0x100
[  932.574905] hub 3-0:1.0: hub_suspend
[  932.574912] usb usb3: bus auto-suspend, wakeup 1
[  932.574928] xhci_hcd 0000:0b:00.0: xhci_hub_status_data: stopping port polling.
[  932.574947] xhci_hcd 0000:0b:00.0: xhci_suspend: stopping port polling.
[  932.574974] xhci_hcd 0000:0b:00.0: // Setting command ring address to 0xd6007001
[  932.575026] xhci_hcd 0000:0b:00.0: hcd_pci_runtime_suspend: 0
[  932.575119] xhci_hcd 0000:0b:00.0: PME# enabled
[  932.594863] xhci_hcd 0000:0b:00.0: pfrs: target: 3, 0
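
For reading the port status values in the log above: assuming the
"actual port 1 status" value is the raw xHCI PORTSC register, bit 0 is
Current Connect Status, bit 1 is Port Enabled, bits 5-8 are the Port Link
State and bit 9 is Port Power. A small sketch that splits a value into
those fields (the function name decode_portsc is made up for illustration):

```shell
#!/bin/sh
# Sketch only: decode_portsc is a hypothetical helper name.
# Assumes the argument is a raw xHCI PORTSC value, e.g. 0x2a0.
decode_portsc() {
    v=$(($1))
    # ccs: bit 0, ped: bit 1, pls: bits 5-8, pp: bit 9
    printf 'ccs=%d ped=%d pls=%d pp=%d\n' \
        $((v & 1)) $(((v >> 1) & 1)) $(((v >> 5) & 15)) $(((v >> 9) & 1))
}
```

For 0x2a0 this gives ccs=0 ped=0 pls=5 pp=1, i.e. the port is powered but
reports no connection (link state 5 is RxDetect in the xHCI numbering).
The 0x100 returned to the hub driver is likewise only the hub port-power
bit, with the connection bit (0x001) clear.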


> 12. Replug the device, and check to see if it works.

It is dead.

Full logs at:
http://195.113.57.32/~mmokrejs/tmp/20130405.tar.bz2 (unpack, 'ls -latr', diff as you like).
Also .config is in there.
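
When diffing two dmesg captures, the timestamps make every line differ.
A possible way to strip them first, as an alternative to cutting a fixed
number of columns, is sketched below. The function name
strip_dmesg_timestamps is made up, and it assumes the usual
"[ seconds.microseconds] " prefix seen in the log excerpt above.

```shell
#!/bin/sh
# Sketch only: strip_dmesg_timestamps is a hypothetical helper name.
# Removes a leading "[  932.574905] " style prefix from each line.
# Reads the named files, or standard input if none are given.
strip_dmesg_timestamps() {
    sed 's/^\[ *[0-9][0-9]*\.[0-9][0-9]*\] //' "$@"
}
```

For two captures a.txt and b.txt (placeholder names), one could write
each stripped copy to a temporary file and then run 'diff -u -w' on the
two results.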

Martin
Martin Mokrejs - April 5, 2013, 12:40 p.m.
Huang Ying wrote:
> Hi, Martin,
> 
> It appears that TI xHCI dead port issue will present even if the PCIe
> port will never go suspended.  So I think this bug is not related to
> PCIe port runtime PM but related to USB xHCI.
> 
> Do you agree Sarah?

Although the 20130405.tar.bz2 dataset confirms what Sarah repeated from our
past findings (her email should already be in your inbox), one thing is
puzzling.
When I have power saving enabled at boot with NO USB devices attached to the TI
controller, 0b:00.0 is already in the suspended state by the time the system
reaches multiuser mode. Yet, somehow, the very first mouse plug-in works. Only the
eject causes a more 'aggressive' suspend.
Since it seems the upstream 1c.4 is not interfering here (in the test Sarah asked me
to do, all control files are 'on' except for the end device 0b:00.0), something
*else* must be causing the dead port *in conjunction* with the 'suspended' runtime state.
Please double check what I wrote initially about the 20130402.tar.bz2 dataset.
Notably, I would compare the lspci output from a cold boot with no devices
attached and 0b:00.0 suspended (the 20130402.tar.bz2 dataset) with the lspci output
in the dead port state (find any in 20130402.tar.bz2 or now in 20130405.tar.bz2).

Martin

> 
> [snip]
> 
> Best Regards,
> Huang Ying
> 
> 
> 

Patch

Index: linux-pm/drivers/pci/pci-acpi.c
===================================================================
--- linux-pm.orig/drivers/pci/pci-acpi.c
+++ linux-pm/drivers/pci/pci-acpi.c
@@ -53,14 +53,15 @@  static void pci_acpi_wake_dev(acpi_handl
 		return;
 	}
 
-	if (!pci_dev->pm_cap || !pci_dev->pme_support
-	     || pci_check_pme_status(pci_dev)) {
-		if (pci_dev->pme_poll)
-			pci_dev->pme_poll = false;
+	/* Clear PME Status if set. */
+	if (pci_dev->pme_support)
+		pci_check_pme_status(pci_dev);
 
-		pci_wakeup_event(pci_dev);
-		pm_runtime_resume(&pci_dev->dev);
-	}
+	if (pci_dev->pme_poll)
+		pci_dev->pme_poll = false;
+
+	pci_wakeup_event(pci_dev);
+	pm_runtime_resume(&pci_dev->dev);
 
 	if (pci_dev->subordinate)
 		pci_pme_wakeup_bus(pci_dev->subordinate);