diff mbox series

PCI: pciehp: Don't enable slot unless link or presence up

Message ID 20200310182100.102987-1-stuart.w.hayes@gmail.com
State New
Headers show
Series PCI: pciehp: Don't enable slot unless link or presence up | expand

Commit Message

stuart hayes March 10, 2020, 6:21 p.m. UTC
When a pciehp slot is powered down via /sys/bus/pci/slots/<slot>/power,
and then the card is physically removed, the kernel will sometimes try to
enable the slot as the card is removed, which results in an error in the
kernel log.

This can happen if the presence detect and link active bits don't go down
at the exact same time. When the card is disabled via /sys/.../power, the
card is placed in OFF_STATE, but the presence detect and link active bits
can still be up.  Then, when, say, presence detect goes down, an interrupt
reports that the presence detect has changed, and the code in
pciehp_handle_presence_or_link_change() will see that the link is up
(because it hasn't gone down yet), and it will try to enable the slot.

This patch modifies that code so it won't try to enable a slot in OFF_STATE
unless it sees the presence detect changed bit with presence detect active,
or the link status changed bit with an active link. This will prevent the
unwanted attempts to enable a card that's being removed, but will still
allow the slot to come up if the slot is re-enabled by writing to
/sys/.../power, or if a new card is added to the slot.

Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
---
 drivers/pci/hotplug/pciehp_ctrl.c | 24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)

Comments

Bjorn Helgaas March 28, 2020, 8:39 p.m. UTC | #1
On Tue, Mar 10, 2020 at 01:21:00PM -0500, Stuart Hayes wrote:
> When a pciehp slot is powered down via /sys/bus/pci/slots/<slot>/power,
> and then the card is physically removed, the kernel will sometimes try to
> enable the slot as the card is removed, which results in an error in the
> kernel log.
> 
> This can happen if the presence detect and link active bits don't go down
> at the exact same time. When the card is disabled via /sys/.../power, the
> card is placed in OFF_STATE, but the presence detect and link active bits
> can still be up.  Then, when, say, presence detect goes down, an interrupt
> reports that the presence detect has changed, and the code in
> pciehp_handle_presence_or_link_change() will see that the link is up
> (because it hasn't gone down yet), and it will try to enable the slot.
> 
> This patch modifies that code so it won't try to enable a slot in OFF_STATE
> unless it sees the presence detect changed bit with presence detect active,
> or the link status changed bit with an active link. This will prevent the
> unwanted attempts to enable a card that's being removed, but will still
> allow the slot to come up if the slot is re-enabled by writing to
> /sys/.../power, or if a new card is added to the slot.

Looking for a reviewed-by from Lukas for this.

> Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
> ---
>  drivers/pci/hotplug/pciehp_ctrl.c | 24 ++++++++++++++----------
>  1 file changed, 14 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
> index 6503d15effbb..f6cbf21711e0 100644
> --- a/drivers/pci/hotplug/pciehp_ctrl.c
> +++ b/drivers/pci/hotplug/pciehp_ctrl.c
> @@ -267,16 +267,20 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
>  		cancel_delayed_work(&ctrl->button_work);
>  		/* fall through */
>  	case OFF_STATE:
> -		ctrl->state = POWERON_STATE;
> -		mutex_unlock(&ctrl->state_lock);
> -		if (present)
> -			ctrl_info(ctrl, "Slot(%s): Card present\n",
> -				  slot_name(ctrl));
> -		if (link_active)
> -			ctrl_info(ctrl, "Slot(%s): Link Up\n",
> -				  slot_name(ctrl));
> -		ctrl->request_result = pciehp_enable_slot(ctrl);
> -		break;
> +		if ((events & PCI_EXP_SLTSTA_PDC && present) ||
> +		    (events & PCI_EXP_SLTSTA_DLLSC && link_active)) {
> +			ctrl->state = POWERON_STATE;
> +			mutex_unlock(&ctrl->state_lock);
> +			if (present)
> +				ctrl_info(ctrl, "Slot(%s): Card present\n",
> +					  slot_name(ctrl));
> +			if (link_active)
> +				ctrl_info(ctrl, "Slot(%s): Link Up\n",
> +					  slot_name(ctrl));
> +			ctrl->request_result = pciehp_enable_slot(ctrl);
> +			break;
> +		}
> +		/* fall through */
>  	default:
>  		mutex_unlock(&ctrl->state_lock);
>  		break;
> -- 
> 2.18.1
>
Lukas Wunner March 29, 2020, 12:36 p.m. UTC | #2
On Tue, Mar 10, 2020 at 01:21:00PM -0500, Stuart Hayes wrote:
> When a pciehp slot is powered down via /sys/bus/pci/slots/<slot>/power,
> and then the card is physically removed, the kernel will sometimes try to
> enable the slot as the card is removed, which results in an error in the
> kernel log.
> 
> This can happen if the presence detect and link active bits don't go down
> at the exact same time. When the card is disabled via /sys/.../power, the
> card is placed in OFF_STATE, but the presence detect and link active bits
> can still be up.  Then, when, say, presence detect goes down, an interrupt
> reports that the presence detect has changed, and the code in
> pciehp_handle_presence_or_link_change() will see that the link is up
> (because it hasn't gone down yet), and it will try to enable the slot.
> 
> This patch modifies that code so it won't try to enable a slot in OFF_STATE
> unless it sees the presence detect changed bit with presence detect active,
> or the link status changed bit with an active link. This will prevent the
> unwanted attempts to enable a card that's being removed, but will still
> allow the slot to come up if the slot is re-enabled by writing to
> /sys/.../power, or if a new card is added to the slot.
> 
> Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
> ---
>  drivers/pci/hotplug/pciehp_ctrl.c | 24 ++++++++++++++----------
>  1 file changed, 14 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
> index 6503d15effbb..f6cbf21711e0 100644
> --- a/drivers/pci/hotplug/pciehp_ctrl.c
> +++ b/drivers/pci/hotplug/pciehp_ctrl.c
> @@ -267,16 +267,20 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
>  		cancel_delayed_work(&ctrl->button_work);
>  		/* fall through */
>  	case OFF_STATE:
> -		ctrl->state = POWERON_STATE;
> -		mutex_unlock(&ctrl->state_lock);
> -		if (present)
> -			ctrl_info(ctrl, "Slot(%s): Card present\n",
> -				  slot_name(ctrl));
> -		if (link_active)
> -			ctrl_info(ctrl, "Slot(%s): Link Up\n",
> -				  slot_name(ctrl));
> -		ctrl->request_result = pciehp_enable_slot(ctrl);
> -		break;
> +		if ((events & PCI_EXP_SLTSTA_PDC && present) ||
> +		    (events & PCI_EXP_SLTSTA_DLLSC && link_active)) {
> +			ctrl->state = POWERON_STATE;
> +			mutex_unlock(&ctrl->state_lock);
> +			if (present)
> +				ctrl_info(ctrl, "Slot(%s): Card present\n",
> +					  slot_name(ctrl));
> +			if (link_active)
> +				ctrl_info(ctrl, "Slot(%s): Link Up\n",
> +					  slot_name(ctrl));
> +			ctrl->request_result = pciehp_enable_slot(ctrl);
> +			break;
> +		}
> +		/* fall through */
>  	default:
>  		mutex_unlock(&ctrl->state_lock);
>  		break;

First of all:

Up until now, when the controller is in BLINKINGON_STATE and a PDC or
DLLSC event is received and PDS or DLLLA is on, the button_work is
canceled and the slot is brought up immediately.  The rationale is
that is doesn't make much sense to wait until the 5 second delay has
elapsed if we know there's a card in the slot or the link is already up.

However with the above change, it could happen that the button_work is
canceled even though the slot *isn't* brought up.  If the slot is only
brought up if ((PDC && PDS) || (DLLSC && DLLLA)) and those conditions
aren't satisfied at the moment, they might be once the 5 seconds have
elapsed.  So I think we should continue blinking and not cancel the
button_work.

Secondly, I'm wondering if this change might break hardware which doesn't
behave well.  If the slot signals e.g. PDC but takes a little while to
set PDS even though DLLLA is already set, then the current code will react
tolerantly, whereas the change proposed above may result in the slot not
being brought up at all.  Not sure if such hardware exists, just slightly
concerned.

IIUC, the only downside of the current code is an error message in dmesg
when trying to bring up a slot after the card has already been removed.
Maybe just toning down the message to KERN_INFO would be appropriate?

Thanks,

Lukas
stuart hayes July 8, 2020, 10:39 p.m. UTC | #3
On 3/29/20 7:36 AM, Lukas Wunner wrote:
> On Tue, Mar 10, 2020 at 01:21:00PM -0500, Stuart Hayes wrote:
>> When a pciehp slot is powered down via /sys/bus/pci/slots/<slot>/power,
>> and then the card is physically removed, the kernel will sometimes try to
>> enable the slot as the card is removed, which results in an error in the
>> kernel log.
>>
>> This can happen if the presence detect and link active bits don't go down
>> at the exact same time. When the card is disabled via /sys/.../power, the
>> card is placed in OFF_STATE, but the presence detect and link active bits
>> can still be up.  Then, when, say, presence detect goes down, an interrupt
>> reports that the presence detect has changed, and the code in
>> pciehp_handle_presence_or_link_change() will see that the link is up
>> (because it hasn't gone down yet), and it will try to enable the slot.
>>
>> This patch modifies that code so it won't try to enable a slot in OFF_STATE
>> unless it sees the presence detect changed bit with presence detect active,
>> or the link status changed bit with an active link. This will prevent the
>> unwanted attempts to enable a card that's being removed, but will still
>> allow the slot to come up if the slot is re-enabled by writing to
>> /sys/.../power, or if a new card is added to the slot.
>>
>> Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
>> ---
>>  drivers/pci/hotplug/pciehp_ctrl.c | 24 ++++++++++++++----------
>>  1 file changed, 14 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
>> index 6503d15effbb..f6cbf21711e0 100644
>> --- a/drivers/pci/hotplug/pciehp_ctrl.c
>> +++ b/drivers/pci/hotplug/pciehp_ctrl.c
>> @@ -267,16 +267,20 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
>>  		cancel_delayed_work(&ctrl->button_work);
>>  		/* fall through */
>>  	case OFF_STATE:
>> -		ctrl->state = POWERON_STATE;
>> -		mutex_unlock(&ctrl->state_lock);
>> -		if (present)
>> -			ctrl_info(ctrl, "Slot(%s): Card present\n",
>> -				  slot_name(ctrl));
>> -		if (link_active)
>> -			ctrl_info(ctrl, "Slot(%s): Link Up\n",
>> -				  slot_name(ctrl));
>> -		ctrl->request_result = pciehp_enable_slot(ctrl);
>> -		break;
>> +		if ((events & PCI_EXP_SLTSTA_PDC && present) ||
>> +		    (events & PCI_EXP_SLTSTA_DLLSC && link_active)) {
>> +			ctrl->state = POWERON_STATE;
>> +			mutex_unlock(&ctrl->state_lock);
>> +			if (present)
>> +				ctrl_info(ctrl, "Slot(%s): Card present\n",
>> +					  slot_name(ctrl));
>> +			if (link_active)
>> +				ctrl_info(ctrl, "Slot(%s): Link Up\n",
>> +					  slot_name(ctrl));
>> +			ctrl->request_result = pciehp_enable_slot(ctrl);
>> +			break;
>> +		}
>> +		/* fall through */
>>  	default:
>>  		mutex_unlock(&ctrl->state_lock);
>>  		break;
> 
> First of all:
> 
> Up until now, when the controller is in BLINKINGON_STATE and a PDC or
> DLLSC event is received and PDS or DLLLA is on, the button_work is
> canceled and the slot is brought up immediately.  The rationale is
> that is doesn't make much sense to wait until the 5 second delay has
> elapsed if we know there's a card in the slot or the link is already up.
> 
> However with the above change, it could happen that the button_work is
> canceled even though the slot *isn't* brought up.  If the slot is only
> brought up if ((PDC && PDS) || (DLLSC && DLLLA)) and those conditions
> aren't satisfied at the moment, they might be once the 5 seconds have
> elapsed.  So I think we should continue blinking and not cancel the
> button_work.
> > Secondly, I'm wondering if this change might break hardware which doesn't
> behave well.  If the slot signals e.g. PDC but takes a little while to
> set PDS even though DLLLA is already set, then the current code will react
> tolerantly, whereas the change proposed above may result in the slot not
> being brought up at all.  Not sure if such hardware exists, just slightly
> concerned.
> > IIUC, the only downside of the current code is an error message in dmesg
> when trying to bring up a slot after the card has already been removed.
> Maybe just toning down the message to KERN_INFO would be appropriate?
> 
> Thanks,
> 
> Lukas
> 

(Sorry it has taken me so long to reply.)

I could easily change the patch so that the behavior is unchanged when the
controller is in the BLINKINGON_STATE.  My concern is only when it is in
the OFF_STATE (after being disabled via sysfs).

It's more than just the "Link Up" message.  When I "power down" an NVMe
drive via sysfs (on a system that doesn't actually shut off the power to
the slot), and then I physically remove the drive, I get this ugly string
of messages:

pcieport 0000:3c:06.0: pciehp: Slot(180): Link Up
pcieport 0000:3c:06.0: pciehp: Timeout waiting for Presence Detect
pcieport 0000:3c:06.0: pciehp: link training error: status 0x0001
pcieport 0000:3c:06.0: pciehp: Failed to check link status

But I've given it a lot of thought, and I don't see a good way to avoid
that, unless we can assume that PDS/DLLLA will get set no later than PDC/
DLLLA.  Without that assumption, even if we put the controller in a
new state like ON_BUT_DISABLED_STATE after it is disabled but there's no
hardware to actually turn off the power to the slot, we won't know when
we get a PDC if that's because the card has been removed, or if the card
was removed and re-inserted, and the PDS just hasn't gone active yet.

Even without this patch, the pciehp code could fail to connect newly-
inserted cards, if they don't set PDS/DLLLA before PDC/DLLSC.  For
example, if both PDC & DLLSC occur at about the same time, but neither
PDS nor DLLLA are set yet when pciehp_handle_presence_or_link_change()
runs, it will return without trying to bring the slot up.  And cards
that don't ever bother to turn on PDS would never connect automatically
if DLLLA comes up after DLLLC.

I would think that it would be worth making that assumption to get rid
of those ugly messages, since these messages could be quite common, but
it is (I suspect) very unlikely than any hardware would set PDC/DLLSC
before the corresponding status bits.

I did think of one other potential issue:  Right now, a fake PDC event
is used to trigger connecting of occupied slots during probe, resume, or
enabling via sysfs... I could change that to PDC | DLLSC to make
sure it connects cards that don't ever set PDS.

Would you consider the patch if I rework it to separate the
BLINKINGON_STATE so that its behavior isn't changed, and change the
fake PDC events to PDC | DLLSC?

Thanks,
Stuart
Lukas Wunner July 9, 2020, 4:05 p.m. UTC | #4
On Wed, Jul 08, 2020 at 05:39:49PM -0500, Stuart Hayes wrote:
> It's more than just the "Link Up" message.  When I "power down" an NVMe
> drive via sysfs (on a system that doesn't actually shut off the power to
> the slot), and then I physically remove the drive, I get this ugly string
> of messages:
> 
> pcieport 0000:3c:06.0: pciehp: Slot(180): Link Up
> pcieport 0000:3c:06.0: pciehp: Timeout waiting for Presence Detect
> pcieport 0000:3c:06.0: pciehp: link training error: status 0x0001
> pcieport 0000:3c:06.0: pciehp: Failed to check link status

I was just wondering if those messages could be avoided by setting
the Link Disable bit in the Link Control register if the slot
is brought down via sysfs.  The question is, if a new NVMe
drive is subsequently inserted into the slot, does the hotplug
controller signal a DLLSC event even though Link Disable is set?
Or at least a PDC event, whereupon we could clear Link Disable?

Or is there some other way to force the link into a state such that
it is off but DLLSC events continue to be received?

There's a function pciehp_link_enable() which can set or clear the
Link Disable flag.  But it's only ever invoked to clear it.  Hm, digging
in the git history shows that between 2012 and 2014 it was used to set
Link Disable when bringing down the slot, but b1811d2455f3 dropped that
because DLLSC was no longer received.

I guess we might be able to set Link Disable, then clear it once the
PDC event is received upon removal of the NVMe drive from the slot.
But I imagine that would complicate the state machine quite a bit.

Another idea:  If the hotplug controller is suspended to D3hot,
the link should be down but presence or link events should still
be received.  Sadly, runtime D3hot is disabled by default for
hotplug ports because it is known to not work on certain SkyLake
Xeon-SP systems and possibly others (see eb3b5bf1a88d).  You can
force it on by passing pcie_port_pm=force on the command line.


> Would you consider the patch if I rework it to separate the
> BLINKINGON_STATE so that its behavior isn't changed, and change the
> fake PDC events to PDC | DLLSC?

TBH I'd suggest to simply drop some of the messages and tone down the
remaining ones, in particular:

* Drop the "Timeout waiting for Presence Detect" message in
  pcie_wait_for_presence().  The sole caller of that function,
  pciehp_check_link_status(), doesn't bail out if it fails
  but continues and will emit an error message of its own.
  I don't think the message provides much additional value.

* Tone down the "link training error" message in
  pciehp_check_link_status() to ctrl_info().  That message at
  least prints the contents of the Link Status register, which
  may have some value.

* Add a message with ctrl_info() severity in the "if (!found)"
  clause in pciehp_check_link_status().  That way the function
  emits a message in all error cases.

* This in turn allows dropping the "Failed to check link status"
  message in board_added().

As a result you should just get two messages with KERN_INFO severity
which simply serve as an indication that DLLLA and PDS did not change
in unison.

With Thunderbolt, unplugging a daisy chain of devices always causes
a bunch of harmless "no response from device" messages with KERN_INFO
severity.  We never know if there's a real problem or not, so we try
to alert the user that there *may* be a problem, but at the same time
we should try to keep noisiness at a minimum.

Thanks,

Lukas
diff mbox series

Patch

diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
index 6503d15effbb..f6cbf21711e0 100644
--- a/drivers/pci/hotplug/pciehp_ctrl.c
+++ b/drivers/pci/hotplug/pciehp_ctrl.c
@@ -267,16 +267,20 @@  void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
 		cancel_delayed_work(&ctrl->button_work);
 		/* fall through */
 	case OFF_STATE:
-		ctrl->state = POWERON_STATE;
-		mutex_unlock(&ctrl->state_lock);
-		if (present)
-			ctrl_info(ctrl, "Slot(%s): Card present\n",
-				  slot_name(ctrl));
-		if (link_active)
-			ctrl_info(ctrl, "Slot(%s): Link Up\n",
-				  slot_name(ctrl));
-		ctrl->request_result = pciehp_enable_slot(ctrl);
-		break;
+		if ((events & PCI_EXP_SLTSTA_PDC && present) ||
+		    (events & PCI_EXP_SLTSTA_DLLSC && link_active)) {
+			ctrl->state = POWERON_STATE;
+			mutex_unlock(&ctrl->state_lock);
+			if (present)
+				ctrl_info(ctrl, "Slot(%s): Card present\n",
+					  slot_name(ctrl));
+			if (link_active)
+				ctrl_info(ctrl, "Slot(%s): Link Up\n",
+					  slot_name(ctrl));
+			ctrl->request_result = pciehp_enable_slot(ctrl);
+			break;
+		}
+		/* fall through */
 	default:
 		mutex_unlock(&ctrl->state_lock);
 		break;