diff mbox

[2/2] PCI,pciehp: avoid add a device already exist during pciehp_resume

Message ID 1373356564-21668-1-git-send-email-wangyijing@huawei.com
State Not Applicable
Headers show

Commit Message

Yijing Wang July 9, 2013, 7:56 a.m. UTC
Currently, pciehp_resume will call pciehp_enable_slot() to add
device if there is a device in the slot. But if the device was
present before suspend, it's no necessary to add again. Now in
such case, there is some uncomfortable message like

    pciehp 0000:00:1c.1:pcie04: Device 0000:03:00.0 already exists at 0000:03:00, cannot hot-add
	pciehp 0000:00:1c.1:pcie04: Cannot add device at 0000:03:00

This problem was reported by Paul Bolle <pebolle@tiscali.nl>
The discussion link: http://comments.gmane.org/gmane.linux.kernel.pci/19876

We can use PCIe Device Serial Number to identify the device if
device support DSN.

currently:
1. slot is empty before suspend, insert card during suspend.
	In this case, is safe, pciehp will add device by check adapter
status register in pciehp_resume.

2. slot is non empty before suspend, remove card during suspend.
	Also be safe, pciehp will remove device by check adapter
status register in pciehp_resume.

3. slot is non empty before suspend, remove card during suspend
and insert a new card.
	Now pciehp just call pciehp_enable_slot() roughly. We should
remove the old card firstly, then add the new card.

4. slot is non empty before suspend, no action during suspend.
	We should do nothing in pciehp_resume, but we call
pciehp_enable_slot(), so some uncomfortable messages show like above.
In this case, we can improve it a little by add a guard
if (!list_empty(bus->devices)).

Reported-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Cc: Paul Bolle <pebolle@tiscali.nl>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Oliver Neukum <oneukum@suse.de>
Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
Cc: linux-pci@vger.kernel.org
---
 drivers/pci/hotplug/pciehp_core.c |   38 +++++++++++++++++++++++++++++++++---
 1 files changed, 34 insertions(+), 4 deletions(-)

Comments

Paul Bolle July 9, 2013, 8:07 a.m. UTC | #1
On Tue, 2013-07-09 at 15:56 +0800, Yijing Wang wrote:
> Currently, pciehp_resume will call pciehp_enable_slot() to add
> device if there is a device in the slot. But if the device was
> present before suspend, it's no necessary to add again. Now in
> such case, there is some uncomfortable message like
> 
>     pciehp 0000:00:1c.1:pcie04: Device 0000:03:00.0 already exists at 0000:03:00, cannot hot-add
> 	pciehp 0000:00:1c.1:pcie04: Cannot add device at 0000:03:00
> 
> This problem was reported by Paul Bolle <pebolle@tiscali.nl>
> The discussion link: http://comments.gmane.org/gmane.linux.kernel.pci/19876
> 
> We can use PCIe Device Serial Number to identify the device if
> device support DSN.
> 
> currently:
> 1. slot is empty before suspend, insert card during suspend.
> 	In this case, is safe, pciehp will add device by check adapter
> status register in pciehp_resume.
> 
> 2. slot is non empty before suspend, remove card during suspend.
> 	Also be safe, pciehp will remove device by check adapter
> status register in pciehp_resume.
> 
> 3. slot is non empty before suspend, remove card during suspend
> and insert a new card.
> 	Now pciehp just call pciehp_enable_slot() roughly. We should
> remove the old card firstly, then add the new card.
> 
> 4. slot is non empty before suspend, no action during suspend.
> 	We should do nothing in pciehp_resume, but we call
> pciehp_enable_slot(), so some uncomfortable messages show like above.
> In this case, we can improve it a little by add a guard
> if (!list_empty(bus->devices)).

Great!

I'm currently trying to bisect another problem, but hope to test this
patch (and the preceding patch it apparently needs) in a few days.
Please feel free to prod me if you think testing is needed but I'm
taking to long to report back.

Thanks!


Paul Bolle

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yijing Wang July 9, 2013, 8:18 a.m. UTC | #2
>> In this case, we can improve it a little by add a guard
>> if (!list_empty(bus->devices)).
> 
> Great!
> 
> I'm currently trying to bisect another problem, but hope to test this
> patch (and the preceding patch it apparently needs) in a few days.
> Please feel free to prod me if you think testing is needed but I'm
> taking to long to report back.

Hi Paul,
   Thank you for helping test. Because I have no machine to test this case,
so your test report is very important, it can help us going in the right way
to solve this problem.

Thanks!
Yijing.

> 
> 
> .
>
Bjorn Helgaas July 9, 2013, 10:27 p.m. UTC | #3
On Tue, Jul 9, 2013 at 1:56 AM, Yijing Wang <wangyijing@huawei.com> wrote:
> Currently, pciehp_resume will call pciehp_enable_slot() to add
> device if there is a device in the slot. But if the device was
> present before suspend, it's no necessary to add again. Now in
> such case, there is some uncomfortable message like
>
>     pciehp 0000:00:1c.1:pcie04: Device 0000:03:00.0 already exists at 0000:03:00, cannot hot-add
>         pciehp 0000:00:1c.1:pcie04: Cannot add device at 0000:03:00
>
> This problem was reported by Paul Bolle <pebolle@tiscali.nl>
> The discussion link: http://comments.gmane.org/gmane.linux.kernel.pci/19876
>
> We can use PCIe Device Serial Number to identify the device if
> device support DSN.

I think I like the idea of this, especially because the Microsoft PCI
Hardware Compliance Test apparently requires DSN for hot-pluggable
PCIe devices [1], so it should be pretty universal.

[1] http://www.techtalkz.com/microsoft-device-drivers-dtm/341362-dtm-pcihct-test-violates-pci-express-base-specification-revision-1-a.html

> currently:
> 1. slot is empty before suspend, insert card during suspend.
>         In this case, is safe, pciehp will add device by check adapter
> status register in pciehp_resume.

Your patch doesn't change anything here.

> 2. slot is non empty before suspend, remove card during suspend.
>         Also be safe, pciehp will remove device by check adapter
> status register in pciehp_resume.

Your patch doesn't change anything here.  (But I think the driver
.remove() method will try to poke at the non-existent device; see
below.)

> 3. slot is non empty before suspend, remove card during suspend
> and insert a new card.
>         Now pciehp just call pciehp_enable_slot() roughly. We should
> remove the old card firstly, then add the new card.

With your patch, I think we'll call the old driver's .remove() method
on the new device.  This seems bad; see below.

With your patch, if we remove and reinsert the same device while
suspended, we do nothing because the DSN didn't change.  Previously we
called pciehp_enable_slot().  I don't know if we need to do anything
here or not.

> 4. slot is non empty before suspend, no action during suspend.
>         We should do nothing in pciehp_resume, but we call
> pciehp_enable_slot(), so some uncomfortable messages show like above.
> In this case, we can improve it a little by add a guard
> if (!list_empty(bus->devices)).

This is the common case.  Previously we called pciehp_enable_slot(),
and with your patch we do nothing.  I think that seems sensible, but
this part should be split into a separate patch.  That way we can keep
the benefit of this change even if we trip over something with the
other changes.

> Reported-by: Paul Bolle <pebolle@tiscali.nl>
> Signed-off-by: Yijing Wang <wangyijing@huawei.com>
> Cc: Paul Bolle <pebolle@tiscali.nl>
> Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
> Cc: Oliver Neukum <oneukum@suse.de>
> Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
> Cc: linux-pci@vger.kernel.org
> ---
>  drivers/pci/hotplug/pciehp_core.c |   38 +++++++++++++++++++++++++++++++++---
>  1 files changed, 34 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
> index 7d72c5e..d01e093 100644
> --- a/drivers/pci/hotplug/pciehp_core.c
> +++ b/drivers/pci/hotplug/pciehp_core.c
> @@ -291,6 +291,28 @@ static void pciehp_remove(struct pcie_device *dev)
>  }
>
>  #ifdef CONFIG_PM
> +
> +/* If device support Device Serial Numner, use DSN

s/support/supports/
s/Numner/Number/
Use conventional comment style:
  /*
   * If device ...
   */

> + * to identify the device
> + */
> +static bool device_in_slot_is_changed(struct pci_bus *pbus)
> +{
> +       u64 old_dsn, new_dsn;
> +       struct pci_dev *pdev;
> +
> +       pdev = pci_get_slot(pbus, PCI_DEVFN(0, 0));

pci_get_slot() can fail.

> +       old_dsn = pdev->sn;
> +
> +       /* get func 0 device serial number */
> +       pci_get_dsn(pdev, &new_dsn);
> +       pci_dev_put(pdev);
> +
> +       if (old_dsn != new_dsn)
> +               return true;
> +
> +       return false;
> +}
> +
>  static int pciehp_suspend (struct pcie_device *dev)
>  {
>         return 0;
> @@ -300,6 +322,7 @@ static int pciehp_resume (struct pcie_device *dev)
>  {
>         struct controller *ctrl;
>         struct slot *slot;
> +       struct pci_bus *pbus = dev->port->subordinate;
>         u8 status;
>
>         ctrl = get_service_data(dev);
> @@ -311,10 +334,17 @@ static int pciehp_resume (struct pcie_device *dev)
>
>         /* Check if slot is occupied */
>         pciehp_get_adapter_status(slot, &status);
> -       if (status)
> -               pciehp_enable_slot(slot);
> -       else
> -               pciehp_disable_slot(slot);
> +       if (status) {
> +               if (list_empty(&pbus->devices))
> +                       pciehp_enable_slot(slot);
> +               else if (device_in_slot_is_changed(pbus)) {
> +                       pciehp_disable_slot(slot);

pciehp_disable_slot() ultimately calls the .remove() method for the
device that has already been removed.  That method is likely to poke
at the device, which will do something unexpected because the new
device is likely to be completely different.

I think the call path is this:

    pciehp_resume
      pciehp_disable_slot
        remove_board
          pciehp_unconfigure_device
            pci_stop_and_remove_bus_device
              pci_stop_bus_device
                pci_stop_dev
                  device_del
                    bus_remove_device
                      device_release_driver
                        __device_release_driver
                          pci_device_remove     # pci_bus_type.remove
                            dev->driver->remove

I think we already had this problem for case 2 (card removed while
suspended), but at least in that case, the driver .remove() method
doesn't have a device to poke at, so it's less likely to do something
bad.

> +                       pciehp_enable_slot(slot);
> +               }
> +       } else {
> +               if (!list_empty(&pbus->devices))
> +                       pciehp_disable_slot(slot);
> +       }
>         return 0;
>  }
>  #endif /* PM */
> --
> 1.7.1
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yijing Wang July 10, 2013, 3 a.m. UTC | #4
Hi Bjorn,
   Thanks for your review and comments!

>> We can use PCIe Device Serial Number to identify the device if
>> device support DSN.
> 
> I think I like the idea of this, especially because the Microsoft PCI
> Hardware Compliance Test apparently requires DSN for hot-pluggable
> PCIe devices [1], so it should be pretty universal.
> 
> [1] http://www.techtalkz.com/microsoft-device-drivers-dtm/341362-dtm-pcihct-test-violates-pci-express-base-specification-revision-1-a.html
> 
>> currently:
>> 1. slot is empty before suspend, insert card during suspend.
>>         In this case, is safe, pciehp will add device by check adapter
>> status register in pciehp_resume.
> 
> Your patch doesn't change anything here.

Yes, I only to make some changes for case 3/4.

> 
>> 2. slot is non empty before suspend, remove card during suspend.
>>         Also be safe, pciehp will remove device by check adapter
>> status register in pciehp_resume.
> 
> Your patch doesn't change anything here.  (But I think the driver
> .remove() method will try to poke at the non-existent device; see
> below.)

I'm not sure the result of driver .remove() method to poke at the non-existent device.
If driver .remove() method cannot detect the real device, remove action will be block ?
If the slot support surprise hot remove, this action maybe safe. right?

If the slot does not support surprise hot remove, but the device was already removed,
we seem to have no other way to clean the stale data related to the old device.

Now if we check adapter status in slot and found adapter is non existent, pciehp resume
call pciehp_disable_slot() , in pciehp_disable_slot() function, we will check latch status,
I guess this case latch is open(because slot is empty), this action will abort.
But I have no platform to test it.

> 
>> 3. slot is non empty before suspend, remove card during suspend
>> and insert a new card.
>>         Now pciehp just call pciehp_enable_slot() roughly. We should
>> remove the old card firstly, then add the new card.
> 
> With your patch, I think we'll call the old driver's .remove() method
> on the new device.  This seems bad; see below.

Ah, this is issue.
What about power off slot first, then call the old driver's remove() method
will not touch the new physical device. After the old driver's remove() cleanup,
we call pciehp_enable_slot() to power on and enable the new device.

> 
> With your patch, if we remove and reinsert the same device while
> suspended, we do nothing because the DSN didn't change.  Previously we
> called pciehp_enable_slot().  I don't know if we need to do anything
> here or not.

Mainly to avoid the redundant device add, the same like the changes for case 4

> 
>> 4. slot is non empty before suspend, no action during suspend.
>>         We should do nothing in pciehp_resume, but we call
>> pciehp_enable_slot(), so some uncomfortable messages show like above.
>> In this case, we can improve it a little by add a guard
>> if (!list_empty(bus->devices)).
> 
> This is the common case.  Previously we called pciehp_enable_slot(),
> and with your patch we do nothing.  I think that seems sensible, but
> this part should be split into a separate patch.  That way we can keep
> the benefit of this change even if we trip over something with the
> other changes.

OK, I will split this changes into a new patch.

> 
>> Reported-by: Paul Bolle <pebolle@tiscali.nl>
>> Signed-off-by: Yijing Wang <wangyijing@huawei.com>
>> Cc: Paul Bolle <pebolle@tiscali.nl>
>> Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
>> Cc: Oliver Neukum <oneukum@suse.de>
>> Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
>> Cc: linux-pci@vger.kernel.org
>> ---
>>  drivers/pci/hotplug/pciehp_core.c |   38 +++++++++++++++++++++++++++++++++---
>>  1 files changed, 34 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
>> index 7d72c5e..d01e093 100644
>> --- a/drivers/pci/hotplug/pciehp_core.c
>> +++ b/drivers/pci/hotplug/pciehp_core.c
>> @@ -291,6 +291,28 @@ static void pciehp_remove(struct pcie_device *dev)
>>  }
>>
>>  #ifdef CONFIG_PM
>> +
>> +/* If device support Device Serial Numner, use DSN
> 
> s/support/supports/
> s/Numner/Number/
> Use conventional comment style:
>   /*
>    * If device ...
>    */
> 

Will update, thanks.


>> + * to identify the device
>> + */
>> +static bool device_in_slot_is_changed(struct pci_bus *pbus)
>> +{
>> +       u64 old_dsn, new_dsn;
>> +       struct pci_dev *pdev;
>> +
>> +       pdev = pci_get_slot(pbus, PCI_DEVFN(0, 0));
> 
> pci_get_slot() can fail.

Will add failure return check, thanks.

> 
>> +       old_dsn = pdev->sn;
>> +
>> +       /* get func 0 device serial number */
>> +       pci_get_dsn(pdev, &new_dsn);
>> +       if (status) {
>> +               if (list_empty(&pbus->devices))
>> +                       pciehp_enable_slot(slot);
>> +               else if (device_in_slot_is_changed(pbus)) {
>> +                       pciehp_disable_slot(slot);
> 
> pciehp_disable_slot() ultimately calls the .remove() method for the
> device that has already been removed.  That method is likely to poke
> at the device, which will do something unexpected because the new
> device is likely to be completely different.

Yes, this is a really issue, so what about firstly power off the slot before call
.remove() method for the old device.

> 
> I think the call path is this:
> 
>     pciehp_resume
>       pciehp_disable_slot
>         remove_board
>           pciehp_unconfigure_device
>             pci_stop_and_remove_bus_device
>               pci_stop_bus_device
>                 pci_stop_dev
>                   device_del
>                     bus_remove_device
>                       device_release_driver
>                         __device_release_driver
>                           pci_device_remove     # pci_bus_type.remove
>                             dev->driver->remove
> 
> I think we already had this problem for case 2 (card removed while
> suspended), but at least in that case, the driver .remove() method
> doesn't have a device to poke at, so it's less likely to do something
> bad.

Agree.

> 
>> +                       pciehp_enable_slot(slot);
>> +               }
>> +       } else {
>> +               if (!list_empty(&pbus->devices))
>> +                       pciehp_disable_slot(slot);
>> +       }
>>         return 0;
>>  }
>>  #endif /* PM */
>> --
>> 1.7.1
>>
>>
> 
> .
>
Bjorn Helgaas July 10, 2013, 8:27 p.m. UTC | #5
On Tue, Jul 9, 2013 at 9:00 PM, Yijing Wang <wangyijing@huawei.com> wrote:
> Hi Bjorn,
>    Thanks for your review and comments!
>
>>> We can use PCIe Device Serial Number to identify the device if
>>> device support DSN.
>>
>> I think I like the idea of this, especially because the Microsoft PCI
>> Hardware Compliance Test apparently requires DSN for hot-pluggable
>> PCIe devices [1], so it should be pretty universal.
>>
>> [1] http://www.techtalkz.com/microsoft-device-drivers-dtm/341362-dtm-pcihct-test-violates-pci-express-base-specification-revision-1-a.html
>>
>>> currently:
>>> 1. slot is empty before suspend, insert card during suspend.
>>>         In this case, is safe, pciehp will add device by check adapter
>>> status register in pciehp_resume.
>>
>> Your patch doesn't change anything here.
>
> Yes, I only to make some changes for case 3/4.
>
>>
>>> 2. slot is non empty before suspend, remove card during suspend.
>>>         Also be safe, pciehp will remove device by check adapter
>>> status register in pciehp_resume.
>>
>> Your patch doesn't change anything here.  (But I think the driver
>> .remove() method will try to poke at the non-existent device; see
>> below.)
>
> I'm not sure the result of driver .remove() method to poke at the non-existent device.
> If driver .remove() method cannot detect the real device, remove action will be block ?
> If the slot support surprise hot remove, this action maybe safe. right?

If there's no device, config space accesses performed by .remove()
will fail (reads will return -1 data or error; writes will be
dropped).  MMIO or I/O port accesses may fail with machine checks or
similar bad things.  But I don't see a way around that except by
fixing drivers as we encounter issues like that.

Since you're not changing anything here, I don't think we should worry
about it for now.

> If the slot does not support surprise hot remove, but the device was already removed,
> we seem to have no other way to clean the stale data related to the old device.
>
> Now if we check adapter status in slot and found adapter is non existent, pciehp resume
> call pciehp_disable_slot() , in pciehp_disable_slot() function, we will check latch status,
> I guess this case latch is open(because slot is empty), this action will abort.
> But I have no platform to test it.
>
>>
>>> 3. slot is non empty before suspend, remove card during suspend
>>> and insert a new card.
>>>         Now pciehp just call pciehp_enable_slot() roughly. We should
>>> remove the old card firstly, then add the new card.
>>
>> With your patch, I think we'll call the old driver's .remove() method
>> on the new device.  This seems bad; see below.
>
> Ah, this is issue.
> What about power off slot first, then call the old driver's remove() method
> will not touch the new physical device. After the old driver's remove() cleanup,
> we call pciehp_enable_slot() to power on and enable the new device.

Turning off power to the slot seems like a reasonable approach.  Then
we can run the old .remove() method in basically the same way we would
in case 2.

>> With your patch, if we remove and reinsert the same device while
>> suspended, we do nothing because the DSN didn't change.  Previously we
>> called pciehp_enable_slot().  I don't know if we need to do anything
>> here or not.
>
> Mainly to avoid the redundant device add, the same like the changes for case 4

I don't know whether it's redundant or not.  Obviously if we remove
and reinsert a device, we lose *all* state that was in the device.  If
we lose everything even if the card stayed inserted the whole time we
were suspended, we must already deal with that and the "add" would be
redundant.  But if the state of the card is different if it got pulled
and reinserted, the "add" would be necessary.

>>> 4. slot is non empty before suspend, no action during suspend.
>>>         We should do nothing in pciehp_resume, but we call
>>> pciehp_enable_slot(), so some uncomfortable messages show like above.
>>> In this case, we can improve it a little by add a guard
>>> if (!list_empty(bus->devices)).
>>
>> This is the common case.  Previously we called pciehp_enable_slot(),
>> and with your patch we do nothing.  I think that seems sensible, but
>> this part should be split into a separate patch.  That way we can keep
>> the benefit of this change even if we trip over something with the
>> other changes.
>
> OK, I will split this changes into a new patch.

Actually, without your DSN changes, I don't think we can distinguish
this from case 3.  So I doubt it really could be split out.
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yijing Wang July 11, 2013, 2:33 a.m. UTC | #6
>> If the slot support surprise hot remove, this action maybe safe. right?
> 
> If there's no device, config space accesses performed by .remove()
> will fail (reads will return -1 data or error; writes will be
> dropped).  MMIO or I/O port accesses may fail with machine checks or
> similar bad things.  But I don't see a way around that except by
> fixing drivers as we encounter issues like that.
> 
> Since you're not changing anything here, I don't think we should worry
> about it for now.

OK.

> 
>>>> remove the old card firstly, then add the new card.
>>>
>>> With your patch, I think we'll call the old driver's .remove() method
>>> on the new device.  This seems bad; see below.
>>
>> Ah, this is issue.
>> What about power off slot first, then call the old driver's remove() method
>> will not touch the new physical device. After the old driver's remove() cleanup,
>> we call pciehp_enable_slot() to power on and enable the new device.
> 
> Turning off power to the slot seems like a reasonable approach.  Then
> we can run the old .remove() method in basically the same way we would
> in case 2.

Hmmm, I will follow this method to rework this patch in next version.

> 
>>> With your patch, if we remove and reinsert the same device while
>>> suspended, we do nothing because the DSN didn't change.  Previously we
>>> called pciehp_enable_slot().  I don't know if we need to do anything
>>> here or not.
>>
>> Mainly to avoid the redundant device add, the same like the changes for case 4
> 
> I don't know whether it's redundant or not.  Obviously if we remove
> and reinsert a device, we lose *all* state that was in the device.  If
> we lose everything even if the card stayed inserted the whole time we
> were suspended, we must already deal with that and the "add" would be
> redundant.  But if the state of the card is different if it got pulled
> and reinserted, the "add" would be necessary.

This is a key issue, sorry, I'm not familiar with PM :(
In my opinion, the device driver .suspend() method will be called
regardless of system enter in suspend to RAM(S3) or suspend to Disk(S4). Driver will
save the pci/pcie/pci-x state in .suspend() method. So once device driver .resume()
method be called, the pci/pcie/pci-x state willl be restored.
Because suspend to disk will power off whole system, so I thought if the device
was removed and inserted same device(DSN) again, maybe .resume will enable this device ok
regardless of the pci config space state has been changed.

If I have any thing above understanding wrong, please correct me, thanks!

> 
>>>> 4. slot is non empty before suspend, no action during suspend.
>>>>         We should do nothing in pciehp_resume, but we call
>>>> pciehp_enable_slot(), so some uncomfortable messages show like above.
>>>> In this case, we can improve it a little by add a guard
>>>> if (!list_empty(bus->devices)).
>>>
>>> This is the common case.  Previously we called pciehp_enable_slot(),
>>> and with your patch we do nothing.  I think that seems sensible, but
>>> this part should be split into a separate patch.  That way we can keep
>>> the benefit of this change even if we trip over something with the
>>> other changes.
>>
>> OK, I will split this changes into a new patch.
> 
> Actually, without your DSN changes, I don't think we can distinguish
> this from case 3.  So I doubt it really could be split out.

I will try, but I think this is not a big deal :)

> 
> .
>
Yijing Wang July 11, 2013, 3:55 a.m. UTC | #7
>> 	We should do nothing in pciehp_resume, but we call
>> pciehp_enable_slot(), so some uncomfortable messages show like above.
>> In this case, we can improve it a little by add a guard
>> if (!list_empty(bus->devices)).
> 
> Great!
> 
> I'm currently trying to bisect another problem, but hope to test this
> patch (and the preceding patch it apparently needs) in a few days.
> Please feel free to prod me if you think testing is needed but I'm
> taking to long to report back.

Hi Paul,
   Can you provide the lspci -vvv and lspci -xxxx info messages ?
I want to confirm your hardware information which cause your resume error.
You can get these messages in any kernel version, that's ok.

When you suspend and resume the system, the wireless card is always present in pcie slot, right?


Thanks!
Yijing.



> 
> 
> .
>
Paul Bolle July 11, 2013, 10:19 a.m. UTC | #8
Yijing,

On Thu, 2013-07-11 at 11:55 +0800, Yijing Wang wrote:
> Can you provide the lspci -vvv and lspci -xxxx info messages ?
> I want to confirm your hardware information which cause your resume error.
> You can get these messages in any kernel version, that's ok.

Would it be sufficient to send that information just for the two pcie
ports on my laptop, and the wireless card that is present in one of
those two ports?

And would it be OK to send that information off list?

> When you suspend and resume the system, the wireless card is always
> present in pcie slot, right?

Yes. I actually wouldn't even know how to remove it. I only discovered
my wireless card was in a pcie slot because of the error messages at
resume (which started a few releases ago, I think v3.7).


Paul Bolle

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yijing Wang July 12, 2013, 1:49 a.m. UTC | #9
On 2013/7/11 18:19, Paul Bolle wrote:
> Yijing,
> 
> On Thu, 2013-07-11 at 11:55 +0800, Yijing Wang wrote:
>> Can you provide the lspci -vvv and lspci -xxxx info messages ?
>> I want to confirm your hardware information which cause your resume error.
>> You can get these messages in any kernel version, that's ok.
> 
> Would it be sufficient to send that information just for the two pcie
> ports on my laptop, and the wireless card that is present in one of
> those two ports?
> 

If you can provide both the lspci -vvv and lspci -xxxx info before suspend
and after resume, it's better.

> And would it be OK to send that information off list?

OK~


Thanks!
Yijing.


> 
>> When you suspend and resume the system, the wireless card is always
>> present in pcie slot, right?
> 
> Yes. I actually wouldn't even know how to remove it. I only discovered
> my wireless card was in a pcie slot because of the error messages at
> resume (which started a few releases ago, I think v3.7).
> 
> 
> Paul Bolle
> 
> 
> .
>
diff mbox

Patch

diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index 7d72c5e..d01e093 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -291,6 +291,28 @@  static void pciehp_remove(struct pcie_device *dev)
 }
 
 #ifdef CONFIG_PM
+
+/* If device support Device Serial Numner, use DSN
+ * to identify the device
+ */
+static bool device_in_slot_is_changed(struct pci_bus *pbus)
+{
+	u64 old_dsn, new_dsn;
+	struct pci_dev *pdev;
+
+	pdev = pci_get_slot(pbus, PCI_DEVFN(0, 0));
+	old_dsn = pdev->sn;
+
+	/* get func 0 device serial number */
+	pci_get_dsn(pdev, &new_dsn);
+	pci_dev_put(pdev);
+
+	if (old_dsn != new_dsn)
+		return true;
+
+	return false;
+}
+
 static int pciehp_suspend (struct pcie_device *dev)
 {
 	return 0;
@@ -300,6 +322,7 @@  static int pciehp_resume (struct pcie_device *dev)
 {
 	struct controller *ctrl;
 	struct slot *slot;
+	struct pci_bus *pbus = dev->port->subordinate;
 	u8 status;
 
 	ctrl = get_service_data(dev);
@@ -311,10 +334,17 @@  static int pciehp_resume (struct pcie_device *dev)
 
 	/* Check if slot is occupied */
 	pciehp_get_adapter_status(slot, &status);
-	if (status)
-		pciehp_enable_slot(slot);
-	else
-		pciehp_disable_slot(slot);
+	if (status) {
+		if (list_empty(&pbus->devices))
+			pciehp_enable_slot(slot);
+		else if (device_in_slot_is_changed(pbus)) {
+			pciehp_disable_slot(slot);
+			pciehp_enable_slot(slot);
+		}
+	} else {
+		if (!list_empty(&pbus->devices))
+			pciehp_disable_slot(slot);
+	}
 	return 0;
 }
 #endif /* PM */