[RFC,v2,2/4] PCI: pciehp: Do not turn off slot if presence comes up after link

Message ID 20190220012031.10741-3-mr.nuke.me@gmail.com
State Changes Requested
Delegated to: Bjorn Helgaas
Series PCI: pciehp: Do not turn off slot if presence comes up after link

Commit Message

Alex G. Feb. 20, 2019, 1:20 a.m. UTC
According to PCIe 3.0, the presence detect state is a logical OR of
in-band and out-of-band presence. With this, we'd expect the presence
state to always be asserted when the link comes up.

Not all hardware follows this, and it is possible for the presence to
come up after the link. In this case, the PCIe device would be
erroneously disabled and re-probed. A delayed presence can be told
apart from a card swap by looking at the Data Link Layer State Changed
(DLLSC) bit: the link has to go down if the card is removed.

Thus, for a device that is probed, present and has its link active, a
lack of a link state change event guarantees we have the same device,
and shutdown is not needed.

Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
---
 drivers/pci/hotplug/pciehp_ctrl.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)
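
The decision described in the commit message can be modelled outside the kernel. The sketch below is illustrative only: `struct slot_snapshot` and `delayed_presence_up()` are hypothetical names that are not part of pciehp, and the booleans stand in for values the driver would read from the slot and link status registers. Only the `PCI_EXP_SLTSTA_DLLSC` value is taken from the kernel's pci_regs.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Data Link Layer State Changed bit of the Slot Status register,
 * same value as in include/uapi/linux/pci_regs.h. */
#define PCI_EXP_SLTSTA_DLLSC 0x0100

/* Hypothetical stand-in for the state the driver reads from hardware. */
struct slot_snapshot {
	bool inband_presence_disabled; /* controller reports out-of-band presence only */
	bool present;                  /* Presence Detect State */
	bool link_active;              /* Data Link Layer Link Active */
};

/*
 * Returns true when an event looks like presence arriving late for a card
 * whose link is already up, rather than a removal or a card swap.
 */
static bool delayed_presence_up(const struct slot_snapshot *s, uint32_t events)
{
	/* Presence can only lag the link when in-band presence is disabled. */
	if (!s->inband_presence_disabled)
		return false;

	/* The card must be there and its link must be up right now. */
	if (!s->present || !s->link_active)
		return false;

	/* A swap forces the link down, which sets DLLSC; its absence means
	 * the same card has been in the slot the whole time. */
	return !(events & PCI_EXP_SLTSTA_DLLSC);
}
```

With in-band presence disabled, a presence-change event that arrives while the card is present and the link is active is classified as delayed presence only when DLLSC is clear. Any event with DLLSC set still takes the normal power-off path.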

Comments

Lukas Wunner Feb. 21, 2019, 7:36 a.m. UTC | #1
On Tue, Feb 19, 2019 at 07:20:28PM -0600, Alexandru Gagniuc wrote:
> @@ -213,6 +213,21 @@ void pciehp_handle_disable_request(struct controller *ctrl)
>  	ctrl->request_result = pciehp_disable_slot(ctrl, SAFE_REMOVAL);
>  }
>  
> +static bool is_delayed_presence_up_event(struct controller *ctrl, u32 events)
> +{
> +	bool present, link_active;
> +
> +	if (!ctrl->inband_presence_disabled)
> +		return false;
> +
> +	present = pciehp_card_present(ctrl);
> +	link_active = pciehp_check_link_active(ctrl);
> +
> +	if (!present || !link_active || events & PCI_EXP_SLTSTA_DLLSC)
> +		return false;
> +
> +	return true;
> +}
>  void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)

Newline please after the closing curly brace.


> @@ -220,13 +235,22 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
>  	/*
>  	 * If the slot is on and presence or link has changed, turn it off.
>  	 * Even if it's occupied again, we cannot assume the card is the same.
> +	 * When the card is swapped, we also expect a change in link state,
> +	 * without which, it's likely presence became high after link-active.
>  	 */

Maybe it's just me but I find the code comment difficult to understand.
How about something along the lines of:

 	/*
 	 * If the slot is on and presence or link has changed, turn it off.
 	 * Even if it's occupied again, we cannot assume the card is the same.
+	 *
+	 * An exception is a delayed "Card present" after a "Link Up".
+	 * This can happen on controllers with in-band presence disabled,
+	 * PCIe r5.0 sec X.Y.Z.
 	 */


>  	mutex_lock(&ctrl->state_lock);
> +	present = pciehp_card_present(ctrl);
> +	link_active = pciehp_check_link_active(ctrl);
>  	switch (ctrl->state) {

These two assignments appear to be superfluous as you're also performing
them in pciehp_check_link_active().

Thanks,

Lukas

Alex_Gagniuc@Dellteam.com Feb. 22, 2019, 7:56 p.m. UTC | #2
On 2/21/19 1:36 AM, Lukas Wunner wrote:
> On Tue, Feb 19, 2019 at 07:20:28PM -0600, Alexandru Gagniuc wrote:
>>   	mutex_lock(&ctrl->state_lock);
>> +	present = pciehp_card_present(ctrl);
>> +	link_active = pciehp_check_link_active(ctrl);
>>   	switch (ctrl->state) {
> 
> These two assignments appear to be superfluous as you're also performing
> them in pciehp_check_link_active().

Not sure. Between the first check, and this check, you can have several 
seconds elapse depending on whether the driver's .probe()/remove() is 
invoked. Whatever you got at the beginning would be stale. If you had a 
picture dictionary and looked up 'bad idea', it would have a picture of 
the above code with the second check removed.

I've got all the other review comments addressed in my local branch. I'm 
waiting on Lord Helgaas' decision on which solution is better.

Alex

Lukas Wunner Feb. 23, 2019, 6:49 a.m. UTC | #3
On Fri, Feb 22, 2019 at 07:56:28PM +0000, Alex_Gagniuc@Dellteam.com wrote:
> On 2/21/19 1:36 AM, Lukas Wunner wrote:
> > On Tue, Feb 19, 2019 at 07:20:28PM -0600, Alexandru Gagniuc wrote:
> >>   	mutex_lock(&ctrl->state_lock);
> >> +	present = pciehp_card_present(ctrl);
> >> +	link_active = pciehp_check_link_active(ctrl);
> >>   	switch (ctrl->state) {
> > 
> > These two assignments appear to be superfluous as you're also performing
> > them in pciehp_check_link_active().
> 
> Not sure. Between the first check, and this check, you can have several 
> seconds elapse depending on whether the driver's .probe()/remove() is 
> invoked. Whatever you got at the beginning would be stale. If you had a 
> picture dictionary and looked up 'bad idea', it would have a picture of 
> the above code with the second check removed.

I don't quite follow.  You're no longer using the "present" and
"link_active" variables in pciehp_handle_presence_or_link_change(),
the variables are set again further down in the function and you're
*also* reading PDS and DLLLA in is_delayed_presence_up_event().
So the above-quoted assignments are superfluous.  Am I missing something?

(Sorry, I had copy-pasted the wrong function name, I meant
is_delayed_presence_up_event() above, not pciehp_check_link_active().)


> I've got all the other review comments addressed in my local branch. I'm 
> waiting on Lord Helgaas' decision on which solution is better.
             ^^^^^^^^^^^^

Can we keep this discussion in a neutral tone please?

Thanks,

Lukas

Alex_Gagniuc@Dellteam.com Feb. 24, 2019, 10:27 p.m. UTC | #4
On 2/23/19 12:50 AM, Lukas Wunner wrote:
> On Fri, Feb 22, 2019 at 07:56:28PM +0000, Alex_Gagniuc@Dellteam.com wrote:
>> On 2/21/19 1:36 AM, Lukas Wunner wrote:
>>> On Tue, Feb 19, 2019 at 07:20:28PM -0600, Alexandru Gagniuc wrote:
>>>>    	mutex_lock(&ctrl->state_lock);
>>>> +	present = pciehp_card_present(ctrl);
>>>> +	link_active = pciehp_check_link_active(ctrl);
>>>>    	switch (ctrl->state) {
>>>
>>> These two assignments appear to be superfluous as you're also performing
>>> them in pciehp_check_link_active().
>>
>> Not sure. Between the first check, and this check, you can have several
>> seconds elapse depending on whether the driver's .probe()/remove() is
>> invoked. Whatever you got at the beginning would be stale. If you had a
>> picture dictionary and looked up 'bad idea', it would have a picture of
>> the above code with the second check removed.
> 
> I don't quite follow.  You're no longer using the "present" and
> "link_active" variables in pciehp_handle_presence_or_link_change(),
> the variables are set again further down in the function and you're
> *also* reading PDS and DLLLA in is_delayed_presence_up_event().
> So the above-quoted assignments are superfluous.  Am I missing something?
> 
> (Sorry, I had copy-pasted the wrong function name, I meant
> is_delayed_presence_up_event() above, not pciehp_check_link_active().)


I see what I did. You're right. I should remove the following lines from 
the patch. I'll have that fixed when I re-submit this.

+       present = pciehp_card_present(ctrl);
+       link_active = pciehp_check_link_active(ctrl);

> 
>> I've got all the other review comments addressed in my local branch. I'm
>> waiting on Lord Helgaas' decision on which solution is better.
>               ^^^^^^^^^^^^
> 
> Can we keep this discussion in a neutral tone please?

I'm sorry. I thought comparing Linux to feudalism would be hilarious, 
but I now see not everyone agrees. Sorry, Bjorn.

Alex

Patch

diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
index 3f3df4c29f6e..28965995ebb9 100644
--- a/drivers/pci/hotplug/pciehp_ctrl.c
+++ b/drivers/pci/hotplug/pciehp_ctrl.c
@@ -213,6 +213,21 @@  void pciehp_handle_disable_request(struct controller *ctrl)
 	ctrl->request_result = pciehp_disable_slot(ctrl, SAFE_REMOVAL);
 }
 
+static bool is_delayed_presence_up_event(struct controller *ctrl, u32 events)
+{
+	bool present, link_active;
+
+	if (!ctrl->inband_presence_disabled)
+		return false;
+
+	present = pciehp_card_present(ctrl);
+	link_active = pciehp_check_link_active(ctrl);
+
+	if (!present || !link_active || events & PCI_EXP_SLTSTA_DLLSC)
+		return false;
+
+	return true;
+}
 void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
 {
 	bool present, link_active;
@@ -220,13 +235,22 @@  void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
 	/*
 	 * If the slot is on and presence or link has changed, turn it off.
 	 * Even if it's occupied again, we cannot assume the card is the same.
+	 * When the card is swapped, we also expect a change in link state,
+	 * without which, it's likely presence became high after link-active.
 	 */
 	mutex_lock(&ctrl->state_lock);
+	present = pciehp_card_present(ctrl);
+	link_active = pciehp_check_link_active(ctrl);
 	switch (ctrl->state) {
 	case BLINKINGOFF_STATE:
 		cancel_delayed_work(&ctrl->button_work);
 		/* fall through */
 	case ON_STATE:
+		if (is_delayed_presence_up_event(ctrl, events)) {
+			mutex_unlock(&ctrl->state_lock);
+			ctrl_dbg(ctrl, "Presence state came up after link");
+			return;
+		}
 		ctrl->state = POWEROFF_STATE;
 		mutex_unlock(&ctrl->state_lock);
 		if (events & PCI_EXP_SLTSTA_DLLSC)