pciehp_resume: don't add existing device

Message ID 1478714735-134347-1-git-send-email-ravisadineni@chromium.org
State Changes Requested

Commit Message

Ravi Chandra Sadineni Nov. 9, 2016, 6:05 p.m. UTC
If a slot was occupied before suspend, and nothing has changed after
resume, we call pciehp_enable_slot() although it fails a little
later with the message:
   Device XXXX:XX:XX.X already exists at XXXX:XX:XX, cannot hot-add
   Cannot add device at XXXX:XX:XX

This was partly discussed here: https://lkml.org/lkml/2013/7/9/452
and I'm pulling only part 4 of that patch, since it does not change
anything functionally (or at least does not seem to make it worse), but
prevents the uncomfortable messages pointed out above.
---
 drivers/pci/hotplug/pciehp_core.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

Comments

Lukas Wunner Nov. 9, 2016, 7:58 p.m. UTC | #1
On Wed, Nov 09, 2016 at 10:05:35AM -0800, Ravi Chandra Sadineni wrote:
> If a slot was occupied before suspend, and nothing has changed after
> resume, we call pciehp_enable_slot() although it fails a little
> later with the message:
>    Device XXXX:XX:XX.X already exists at XXXX:XX:XX, cannot hot-add
>    Cannot add device at XXXX:XX:XX
> 
> This was partly discussed here: https://lkml.org/lkml/2013/7/9/452
> and I'm pulling only part 4 of that patch, since it does not change
> anything functionally (or at least does not seem to make it worse), but
> prevents the uncomfortable messages pointed out above.
> ---
>  drivers/pci/hotplug/pciehp_core.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
> index 612b21a..873cff8 100644
> --- a/drivers/pci/hotplug/pciehp_core.c
> +++ b/drivers/pci/hotplug/pciehp_core.c
> @@ -290,6 +290,7 @@ static int pciehp_resume(struct pcie_device *dev)
>  {
>  	struct controller *ctrl;
>  	struct slot *slot;
> +	struct pci_bus *pbus = dev->port->subordinate;
>  	u8 status;
>  
>  	ctrl = get_service_data(dev);
> @@ -302,10 +303,13 @@ static int pciehp_resume(struct pcie_device *dev)
>  	/* Check if slot is occupied */
>  	pciehp_get_adapter_status(slot, &status);
>  	mutex_lock(&slot->hotplug_lock);
> -	if (status)
> -		pciehp_enable_slot(slot);
> -	else
> +	if (status) {
> +		if (list_empty(&pbus->devices))
> +			pciehp_enable_slot(slot);
> +	}
> +	else {
>  		pciehp_disable_slot(slot);
> +	}

What if the device plugged in after suspend is a different one
and requires e.g. an entirely different driver or different resource
allocations?

What if the device is the same but child devices have been plugged
in during system sleep? (E.g. additional devices attached to a
Thunderbolt daisy chain.)

At the very least you need to rescan the bus here.

Thanks,

Lukas

>  	mutex_unlock(&slot->hotplug_lock);
>  	return 0;
>  }
> -- 
> 2.6.6
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rajat Jain Nov. 9, 2016, 8:59 p.m. UTC | #2
On Wed, Nov 9, 2016 at 11:58 AM, Lukas Wunner <lukas@wunner.de> wrote:
> On Wed, Nov 09, 2016 at 10:05:35AM -0800, Ravi Chandra Sadineni wrote:
>> If a slot was occupied before suspend, and nothing has changed after
>> resume, we call pciehp_enable_slot() although it fails a little
>> later with the message:
>>    Device XXXX:XX:XX.X already exists at XXXX:XX:XX, cannot hot-add
>>    Cannot add device at XXXX:XX:XX
>>
>> This was partly discussed here: https://lkml.org/lkml/2013/7/9/452
>> and I'm pulling only part 4 of that patch, since it does not change
>> anything functionally (or at least does not seem to make it worse), but
>> prevents the uncomfortable messages pointed out above.

Missing "Signed-off-by" :-)

>> ---
>>  drivers/pci/hotplug/pciehp_core.c | 10 +++++++---
>>  1 file changed, 7 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
>> index 612b21a..873cff8 100644
>> --- a/drivers/pci/hotplug/pciehp_core.c
>> +++ b/drivers/pci/hotplug/pciehp_core.c
>> @@ -290,6 +290,7 @@ static int pciehp_resume(struct pcie_device *dev)
>>  {
>>       struct controller *ctrl;
>>       struct slot *slot;
>> +     struct pci_bus *pbus = dev->port->subordinate;
>>       u8 status;
>>
>>       ctrl = get_service_data(dev);
>> @@ -302,10 +303,13 @@ static int pciehp_resume(struct pcie_device *dev)
>>       /* Check if slot is occupied */
>>       pciehp_get_adapter_status(slot, &status);
>>       mutex_lock(&slot->hotplug_lock);
>> -     if (status)
>> -             pciehp_enable_slot(slot);
>> -     else
>> +     if (status) {

should be "} else {" (on the same line)


>> +             if (list_empty(&pbus->devices))
>> +                     pciehp_enable_slot(slot);
>> +     }
>> +     else {
>>               pciehp_disable_slot(slot);
>> +     }
>
> What if the device plugged in after suspend is a different one
> and requires e.g. an entirely different driver or different resource
> allocations?

I may be missing it completely, but this situation does not seem to be
handled well today (even without this patch). In the cases that you
mention, it seems to me that the code will bail out (as of today) a
little later in pciehp_configure_device().

        dev = pci_get_slot(parent, PCI_DEVFN(0, 0));
        if (dev) {
                ctrl_err(ctrl, "Device %s already exists at %04x:%02x:00, cannot hot-add\n",
                         pci_name(dev), pci_domain_nr(parent), parent->number);
                pci_dev_put(dev);
                ret = -EEXIST;
                goto out;
        }

I wasn't able to figure out how, if the device was replaced, the
old driver would be notified or the new driver's probe routine would get
called. Or how do the resources get assigned to the new device?

So I'm trying to understand whether this patch makes things any worse
than they are now, since at least it makes resume noticeably faster
(it avoids a delay of up to 1 second in pciehp_check_link_status()
waiting for the link).

>
> What if the device is the same but child devices have been plugged
> in during system sleep? (E.g. additional devices attached to a
> Thunderbolt daisy chain.)
>
> At the very least you need to rescan the bus here.

I agree. But how does this work today then?

Thanks,

Rajat




>
> Thanks,
>
> Lukas
>
>>       mutex_unlock(&slot->hotplug_lock);
>>       return 0;
>>  }
>> --
>> 2.6.6
Rajat Jain Nov. 9, 2016, 9:15 p.m. UTC | #3
On Wed, Nov 9, 2016 at 10:05 AM, Ravi Chandra Sadineni
<ravisadineni@chromium.org> wrote:
> If a slot was occupied before suspend, and nothing has changed after
> resume, we call pciehp_enable_slot() although it fails a little
> later with the message:
>    Device XXXX:XX:XX.X already exists at XXXX:XX:XX, cannot hot-add
>    Cannot add device at XXXX:XX:XX
>
> This was partly discussed here: https://lkml.org/lkml/2013/7/9/452
> and I'm pulling only part 4 of that patch, since it does not change
> anything functionally (or at least does not seem to make it worse), but
> prevents the uncomfortable messages pointed out above.
> ---
>  drivers/pci/hotplug/pciehp_core.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
> index 612b21a..873cff8 100644
> --- a/drivers/pci/hotplug/pciehp_core.c
> +++ b/drivers/pci/hotplug/pciehp_core.c
> @@ -290,6 +290,7 @@ static int pciehp_resume(struct pcie_device *dev)
>  {
>         struct controller *ctrl;
>         struct slot *slot;
> +       struct pci_bus *pbus = dev->port->subordinate;
>         u8 status;
>
>         ctrl = get_service_data(dev);
> @@ -302,10 +303,13 @@ static int pciehp_resume(struct pcie_device *dev)
>         /* Check if slot is occupied */
>         pciehp_get_adapter_status(slot, &status);
>         mutex_lock(&slot->hotplug_lock);
> -       if (status)
> -               pciehp_enable_slot(slot);
> -       else
> +       if (status) {
> +               if (list_empty(&pbus->devices))
> +                       pciehp_enable_slot(slot);
> +       }
> +       else {
>                 pciehp_disable_slot(slot);
> +       }
>         mutex_unlock(&slot->hotplug_lock);
>         return 0;
>  }
> --
> 2.6.6
>

One thing I'm a little concerned about is that if the controller has a
power controller, the current code would result in turning the power
on if it was not already on. But since we don't actually turn it off
when going into suspend, I'm wondering whether that is really a
problem in practice?
Lukas Wunner Nov. 11, 2016, 7:23 a.m. UTC | #4
On Wed, Nov 09, 2016 at 12:59:01PM -0800, Rajat Jain wrote:
> On Wed, Nov 9, 2016 at 11:58 AM, Lukas Wunner <lukas@wunner.de> wrote:
> > On Wed, Nov 09, 2016 at 10:05:35AM -0800, Ravi Chandra Sadineni wrote:
> >> If a slot was occupied before suspend, and nothing has changed after
> >> resume, we call pciehp_enable_slot() although it fails a little
> >> later with the message:
> >>    Device XXXX:XX:XX.X already exists at XXXX:XX:XX, cannot hot-add
> >>    Cannot add device at XXXX:XX:XX
> >>
> >> This was partly discussed here: https://lkml.org/lkml/2013/7/9/452
> >> and I'm pulling only part 4 of that patch, since it does not change
> >> anything functionally (or at least does not seem to make it worse), but
> >> prevents the uncomfortable messages pointed out above.
> 
> Missing "Signed-off-by" :-)
> 
> >> ---
> >>  drivers/pci/hotplug/pciehp_core.c | 10 +++++++---
> >>  1 file changed, 7 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
> >> index 612b21a..873cff8 100644
> >> --- a/drivers/pci/hotplug/pciehp_core.c
> >> +++ b/drivers/pci/hotplug/pciehp_core.c
> >> @@ -290,6 +290,7 @@ static int pciehp_resume(struct pcie_device *dev)
> >>  {
> >>       struct controller *ctrl;
> >>       struct slot *slot;
> >> +     struct pci_bus *pbus = dev->port->subordinate;
> >>       u8 status;
> >>
> >>       ctrl = get_service_data(dev);
> >> @@ -302,10 +303,13 @@ static int pciehp_resume(struct pcie_device *dev)
> >>       /* Check if slot is occupied */
> >>       pciehp_get_adapter_status(slot, &status);
> >>       mutex_lock(&slot->hotplug_lock);
> >> -     if (status)
> >> -             pciehp_enable_slot(slot);
> >> -     else
> >> +     if (status) {
> 
> should be "} else {" (on the same line)
> 
> 
> >> +             if (list_empty(&pbus->devices))
> >> +                     pciehp_enable_slot(slot);
> >> +     }
> >> +     else {
> >>               pciehp_disable_slot(slot);
> >> +     }
> >
> > What if the device plugged in after suspend is a different one
> > and requires e.g. an entirely different driver or different resource
> > allocations?
> 
> I may be missing it completely, but this situation does not seem to be
> handled well today (even without this patch).

Yes, what we currently have is not correct.

First of all we should call pci_rescan_bus() in pciehp_configure_device().

Second, pci_rescan_bus() is broken in that it only adds new devices,
it doesn't remove devices.

It should perform a first pass where it walks the bus, checks for each
device whether the immediate parent is a hotplug bridge, and if so,
compares the vendor and device ID to what we have cached.  If it's
the same, chances are it's the same device (could have been replaced
by an identical device, but that shouldn't be a problem at least after
waking from system sleep).

If it's not the same, pci_stop_and_remove_bus_device() needs to be called
on it. That way we remove all replaced or no longer present devices.  In a
subsequent pass, pci_rescan_bus() would discover any newly added devices,
as it does now.
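
The two-pass idea above can be illustrated with a small user-space model.
This is only a sketch under stated assumptions, not kernel code: the types
struct dev_id and struct slot_model, the enum rescan_action and the
functions needs_removal(), needs_add() and rescan_slot() are all
hypothetical names invented for illustration. In the kernel, the removal
step would correspond to pci_stop_and_remove_bus_device() and the add
step to what pci_rescan_bus() already does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the vendor/device IDs cached per device. */
struct dev_id {
	uint16_t vendor;
	uint16_t device;
};

/* Hypothetical model of one slot directly below a hotplug bridge. */
struct slot_model {
	bool cached_present;	/* a device was known before suspend */
	struct dev_id cached;	/* IDs cached before suspend */
	bool hw_present;	/* a device responds after resume */
	struct dev_id hw;	/* IDs read back from hardware after resume */
};

enum rescan_action {
	RESCAN_NONE,	/* slot empty before and after */
	RESCAN_KEEP,	/* same IDs: assume it is the same device */
	RESCAN_REMOVE,	/* device is gone: remove it */
	RESCAN_REPLACE,	/* IDs differ: remove the old device, add the new one */
	RESCAN_ADD,	/* device newly appeared: add it */
};

static bool same_id(const struct dev_id *a, const struct dev_id *b)
{
	return a->vendor == b->vendor && a->device == b->device;
}

/* Pass 1: a cached device must go if it vanished or its IDs changed. */
static bool needs_removal(const struct slot_model *s)
{
	if (!s->cached_present)
		return false;
	if (!s->hw_present)
		return true;
	return !same_id(&s->cached, &s->hw);
}

/* Pass 2: add whatever is present but not (or no longer) known. */
static bool needs_add(const struct slot_model *s, bool removed)
{
	return s->hw_present && (!s->cached_present || removed);
}

enum rescan_action rescan_slot(const struct slot_model *s)
{
	assert(s != NULL);

	bool removed = needs_removal(s);
	bool added = needs_add(s, removed);

	if (removed && added)
		return RESCAN_REPLACE;
	if (removed)
		return RESCAN_REMOVE;
	if (added)
		return RESCAN_ADD;
	return s->cached_present ? RESCAN_KEEP : RESCAN_NONE;
}
```

The model deliberately treats matching vendor and device IDs as "same
device", which is the heuristic described above; it cannot tell an
identical replacement apart from the original.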

Third, we're not allowed to remove devices during the ->resume phase,
actually we're not even allowed to add devices until after the
->complete phase. Perhaps this can be accomplished with a PM notifier.

Fourth, we're also not allowed to add new devices after the ->prepare
phase, so we should disable the interrupt there.


> I wasn't able to figure out how, if the device was replaced, the
> old driver would be notified or the new driver's probe routine would get
> called.

pci_stop_and_remove_bus_device() will unbind the old driver.
pci_bus_add_devices() causes the new driver to probe.  This is called
both from pci_rescan_bus() as well as pciehp_configure_device().


Thanks,

Lukas
Patch

diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index 612b21a..873cff8 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -290,6 +290,7 @@  static int pciehp_resume(struct pcie_device *dev)
 {
 	struct controller *ctrl;
 	struct slot *slot;
+	struct pci_bus *pbus = dev->port->subordinate;
 	u8 status;
 
 	ctrl = get_service_data(dev);
@@ -302,10 +303,13 @@  static int pciehp_resume(struct pcie_device *dev)
 	/* Check if slot is occupied */
 	pciehp_get_adapter_status(slot, &status);
 	mutex_lock(&slot->hotplug_lock);
-	if (status)
-		pciehp_enable_slot(slot);
-	else
+	if (status) {
+		if (list_empty(&pbus->devices))
+			pciehp_enable_slot(slot);
+	}
+	else {
 		pciehp_disable_slot(slot);
+	}
 	mutex_unlock(&slot->hotplug_lock);
 	return 0;
 }
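
For readers comparing the old and new behaviour, the decision made in
pciehp_resume() before and after this patch can be modelled in user
space. This is a sketch under stated assumptions, not the driver code:
struct resume_state, enum resume_action, resume_before_patch() and
resume_after_patch() are hypothetical names, adapter_present stands in
for the status returned by pciehp_get_adapter_status(), and
known_devices == 0 stands in for list_empty(&pbus->devices).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of the resume handler for one slot. */
enum resume_action {
	RESUME_NOTHING,		/* leave the slot untouched */
	RESUME_ENABLE_SLOT,	/* models a call to pciehp_enable_slot() */
	RESUME_DISABLE_SLOT,	/* models a call to pciehp_disable_slot() */
};

/* Hypothetical snapshot of what the handler sees on resume. */
struct resume_state {
	bool adapter_present;	/* slot reports an adapter after resume */
	size_t known_devices;	/* devices already on the subordinate bus */
};

/* Before the patch: an occupied slot is always re-enabled. */
enum resume_action resume_before_patch(const struct resume_state *st)
{
	assert(st != NULL);
	return st->adapter_present ? RESUME_ENABLE_SLOT : RESUME_DISABLE_SLOT;
}

/* After the patch: re-enable only if no device is known on the bus. */
enum resume_action resume_after_patch(const struct resume_state *st)
{
	assert(st != NULL);
	if (!st->adapter_present)
		return RESUME_DISABLE_SLOT;
	return st->known_devices == 0 ? RESUME_ENABLE_SLOT : RESUME_NOTHING;
}
```

The only case where the two functions differ is an occupied slot whose
device is already known, which is the case that produced the "already
exists" message. As the review comments point out, the model (like the
patch) does not check whether the device present after resume is the one
that was there before suspend.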