diff mbox series

[v4] PCI: pciehp: Fix the slot in BLINKINGON_STATE when Presence Detect Changed event occurred

Message ID 20230512021518.336460-1-clementwei90@163.com
State New
Headers show
Series [v4] PCI: pciehp: Fix the slot in BLINKINGON_STATE when Presence Detect Changed event occurred | expand

Commit Message

Rongguang Wei May 12, 2023, 2:15 a.m. UTC
From: Rongguang Wei <weirongguang@kylinos.cn>

pciehp's behavior is incorrect if the Attention Button is pressed
on an unoccupied slot.

When a Presence Detect Changed event has occurred, the slot status
in either BLINKINGOFF_STATE or OFF_STATE, turn it off unconditionally.
But if the slot status is in BLINKINGON_STATE and the slot is currently
empty, the slot status was staying in BLINKINGON_STATE.

The message print like this:
    pcieport 0000:00:01.5: pciehp: Slot(0-5): Attention button pressed
    pcieport 0000:00:01.5: pciehp: Slot(0-5) Powering on due to button press
    pcieport 0000:00:01.5: pciehp: Slot(0-5): Attention button pressed
    pcieport 0000:00:01.5: pciehp: Slot(0-5): Button cancel
    pcieport 0000:00:01.5: pciehp: Slot(0-5): Action canceled due to button press

It cause the next Attention Button Pressed event become Button cancel
and missing the Presence Detect Changed event with this button press
though this button presses event is occurred after 5s.

According to the Commit d331710ea78f ("PCI: pciehp: Become resilient
to missed events"), if the slot is currently occupied, turn it on and
if the slot is empty, it need to set in OFF_STATE rather than stay in
current status when pciehp_handle_presence_or_link_change() bails out.

V2: Update to simple code and avoid gratuitous message.
V3: Add Suggested-by.
V4: Add state change conditional and message.

Fixes: d331710ea78f ("PCI: pciehp: Become resilient to missed events")
Link: https://lore.kernel.org/linux-pci/20230403054619.19163-1-clementwei90@163.com/
Link: https://lore.kernel.org/linux-pci/20230421025641.655991-1-clementwei90@163.com/
Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Rongguang Wei <weirongguang@kylinos.cn>
---
 drivers/pci/hotplug/pciehp_ctrl.c | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Lukas Wunner May 12, 2023, 5:20 a.m. UTC | #1
On Fri, May 12, 2023 at 10:15:18AM +0800, Rongguang Wei wrote:
> pciehp's behavior is incorrect if the Attention Button is pressed
> on an unoccupied slot.
[...]
> V2: Update to simple code and avoid gratuitous message.
> V3: Add Suggested-by.
> V4: Add state change conditional and message.
> 
> Fixes: d331710ea78f ("PCI: pciehp: Become resilient to missed events")
> Link: https://lore.kernel.org/linux-pci/20230403054619.19163-1-clementwei90@163.com/
> Link: https://lore.kernel.org/linux-pci/20230421025641.655991-1-clementwei90@163.com/

The changes from previous versions of the patch as well as the links to
those previous versions are usually placed below the three dashes.
That way they don't become part of the git history as they're generally
not interesting for future readers.  (The exception of that rule is the
gpu subsystem which habitually puts the changelog in the commit message.)

However I don't think you need to respin the patch because of that as
Bjorn can fix up the commit message when applying.  Just so you know
for future submissions.

> Suggested-by: Lukas Wunner <lukas@wunner.de>
> Signed-off-by: Rongguang Wei <weirongguang@kylinos.cn>

Reviewed-by: Lukas Wunner <lukas@wunner.de>
Cc: stable@vger.kernel.org # v4.19+

Thanks!

Lukas

> ---
>  drivers/pci/hotplug/pciehp_ctrl.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
> index 529c34808440..32baba1b7f13 100644
> --- a/drivers/pci/hotplug/pciehp_ctrl.c
> +++ b/drivers/pci/hotplug/pciehp_ctrl.c
> @@ -256,6 +256,14 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
>  	present = pciehp_card_present(ctrl);
>  	link_active = pciehp_check_link_active(ctrl);
>  	if (present <= 0 && link_active <= 0) {
> +		if (ctrl->state == BLINKINGON_STATE) {
> +			ctrl->state = OFF_STATE;
> +			cancel_delayed_work(&ctrl->button_work);
> +			pciehp_set_indicators(ctrl, PCI_EXP_SLTCTL_PWR_IND_OFF,
> +					      INDICATOR_NOOP);
> +			ctrl_info(ctrl, "Slot(%s): Card not present\n",
> +				  slot_name(ctrl));
> +		}
>  		mutex_unlock(&ctrl->state_lock);
>  		return;
>  	}
> -- 
> 2.25.1
Bjorn Helgaas May 17, 2023, 9:02 p.m. UTC | #2
On Fri, May 12, 2023 at 10:15:18AM +0800, Rongguang Wei wrote:
> From: Rongguang Wei <weirongguang@kylinos.cn>
> 
> pciehp's behavior is incorrect if the Attention Button is pressed
> on an unoccupied slot.
> 
> When a Presence Detect Changed event has occurred, the slot status
> in either BLINKINGOFF_STATE or OFF_STATE, turn it off unconditionally.
> But if the slot status is in BLINKINGON_STATE and the slot is currently
> empty, the slot status was staying in BLINKINGON_STATE.

Thanks for the patch!

I don't quite follow the events here.  I think the current behavior is
this (tell me if I'm going wrong):

  - Slot is empty (OFF_STATE).

  - User presses Attention Button.  pciehp_handle_button_press() sets
    state to BLINKINGON_STATE, sets power indicator to blinking,
    schedules pciehp_queue_pushbutton_work() to turn on power after
    5 seconds.

  - When pciehp_queue_pushbutton_work() runs 5 seconds later, it
    synthesizes a PCI_EXP_SLTSTA_PDC event and wakes the IRQ thread.

  - The IRQ thread (pciehp_ist()) calls
    pciehp_handle_presence_or_link_change(), which does nothing since
    the slot is in BLINKINGON_STATE, the slot is empty, and the link
    is not active.

  - Slot incorrectly remains in BLINKINGON_STATE and power indicator
    remains blinking.

And this patch changes pciehp_handle_presence_or_link_change() so that
if the slot is empty, the link is not acive, and the slot is in
BLINKINGON_STATE, we put it in OFF_STATE, cancel the delayed work, and
turn off the power indicator.

After this patch, the user experience is this:

  - Slot is empty (OFF_STATE).

  - User presses Attention Button.

  - Power indicator blinks for 5 seconds.

  - Power indicator turns off.

which definitely seems better.

I'm curious why we want the 5 seconds of blinking power indicator at
all.  We can't really do anything in response to an Attention Button
on an empty slot, so could we just ignore it completely in
pciehp_handle_button_press()?

IIUC, this patch leads to messages like these, which are slightly
confusing because we say we're powering up the slot, then later decide
"oops, there's nothing here, never mind" (or, I guess the user could
push the button, *then* insert the card, and we would power it up,
which seems a little sketchy):

  [ 0.000] pcieport 0000:00:01.5: pciehp: Slot(0-5): Attention button pressed
  [ 0.001] pcieport 0000:00:01.5: pciehp: Slot(0-5): Powering on due to button press
  [ 5.001] pcieport 0000:00:01.5: pciehp: Slot(0-5): Card not present

Is there a spec that covers the user experience of this case?  The
closest I could find are SHPC r1.0, sec 2.5, and PCIe r6.0, sec
6.7.1.5.  Both mention the 5-second abort interval with the power
indicator blinking, but they implicitly assume the slot is occupied.
Neither mentions the empty slot case.

> The message print like this:
>     pcieport 0000:00:01.5: pciehp: Slot(0-5): Attention button pressed
>     pcieport 0000:00:01.5: pciehp: Slot(0-5) Powering on due to button press
>     pcieport 0000:00:01.5: pciehp: Slot(0-5): Attention button pressed
>     pcieport 0000:00:01.5: pciehp: Slot(0-5): Button cancel
>     pcieport 0000:00:01.5: pciehp: Slot(0-5): Action canceled due to button press
>
> It cause the next Attention Button Pressed event become Button cancel
> and missing the Presence Detect Changed event with this button press
> though this button presses event is occurred after 5s.

It seems like the problem ("empty slot staying in BLINKINGON_STATE
forever after one Attention Button event") only requires one button
press.

If so, why do we talk about the *next* button press here?

> According to the Commit d331710ea78f ("PCI: pciehp: Become resilient
> to missed events"), if the slot is currently occupied, turn it on and
> if the slot is empty, it need to set in OFF_STATE rather than stay in
> current status when pciehp_handle_presence_or_link_change() bails out.
>
> Fixes: d331710ea78f ("PCI: pciehp: Become resilient to missed events")
> Link: https://lore.kernel.org/linux-pci/20230403054619.19163-1-clementwei90@163.com/
> Link: https://lore.kernel.org/linux-pci/20230421025641.655991-1-clementwei90@163.com/
> Suggested-by: Lukas Wunner <lukas@wunner.de>
> Signed-off-by: Rongguang Wei <weirongguang@kylinos.cn>
> ---
>  drivers/pci/hotplug/pciehp_ctrl.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
> index 529c34808440..32baba1b7f13 100644
> --- a/drivers/pci/hotplug/pciehp_ctrl.c
> +++ b/drivers/pci/hotplug/pciehp_ctrl.c
> @@ -256,6 +256,14 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
>  	present = pciehp_card_present(ctrl);
>  	link_active = pciehp_check_link_active(ctrl);
>  	if (present <= 0 && link_active <= 0) {
> +		if (ctrl->state == BLINKINGON_STATE) {
> +			ctrl->state = OFF_STATE;
> +			cancel_delayed_work(&ctrl->button_work);
> +			pciehp_set_indicators(ctrl, PCI_EXP_SLTCTL_PWR_IND_OFF,
> +					      INDICATOR_NOOP);
> +			ctrl_info(ctrl, "Slot(%s): Card not present\n",
> +				  slot_name(ctrl));
> +		}
>  		mutex_unlock(&ctrl->state_lock);
>  		return;
>  	}
Lukas Wunner May 18, 2023, 6:25 a.m. UTC | #3
On Wed, May 17, 2023 at 04:02:01PM -0500, Bjorn Helgaas wrote:
> I'm curious why we want the 5 seconds of blinking power indicator at
> all.  We can't really do anything in response to an Attention Button
> on an empty slot, so could we just ignore it completely in
> pciehp_handle_button_press()?

That wouldn't cover the case where the slot is occupied when the
button is pressed, but the card is yanked out during the 5 second
blinking interval.

We'd still need the present commit to handle that case.

Thanks,

Lukas
Bjorn Helgaas May 18, 2023, 11:09 a.m. UTC | #4
On Thu, May 18, 2023 at 08:25:57AM +0200, Lukas Wunner wrote:
> On Wed, May 17, 2023 at 04:02:01PM -0500, Bjorn Helgaas wrote:
> > I'm curious why we want the 5 seconds of blinking power indicator at
> > all.  We can't really do anything in response to an Attention Button
> > on an empty slot, so could we just ignore it completely in
> > pciehp_handle_button_press()?
> 
> That wouldn't cover the case where the slot is occupied when the
> button is pressed, but the card is yanked out during the 5 second
> blinking interval.

Obviously we can't ignore a button press when the slot is occupied,
because that's part of the "insert card, press button to power it up"
and "press button to power down card, remove card" flows.

I'm asking about ignoring it when the slot is empty, which would mean
adding a check for card presence in pciehp_handle_button_press().  But
maybe there's a reason why we can't do that there?

Bjorn
Rongguang Wei May 19, 2023, 1:27 a.m. UTC | #5
Hi

On 5/18/23 7:09 PM, Bjorn Helgaas wrote:
> On Thu, May 18, 2023 at 08:25:57AM +0200, Lukas Wunner wrote:
>> On Wed, May 17, 2023 at 04:02:01PM -0500, Bjorn Helgaas wrote:
>>> I'm curious why we want the 5 seconds of blinking power indicator at
>>> all.  We can't really do anything in response to an Attention Button
>>> on an empty slot, so could we just ignore it completely in
>>> pciehp_handle_button_press()?
>>
>> That wouldn't cover the case where the slot is occupied when the
>> button is pressed, but the card is yanked out during the 5 second
>> blinking interval.
> 
> Obviously we can't ignore a button press when the slot is occupied,
> because that's part of the "insert card, press button to power it up"
> and "press button to power down card, remove card" flows.
> 
> I'm asking about ignoring it when the slot is empty, which would mean
> adding a check for card presence in pciehp_handle_button_press().  But
> maybe there's a reason why we can't do that there?
> 
> Bjorn
> 
I think we can't add a check in pciehp_handle_button_press() because
this function is handle the ABP event and the slot is occupied or empty
is PDC event. Those two events are better not control in one function.

Thanks.
Lukas Wunner May 19, 2023, 6:22 a.m. UTC | #6
On Thu, May 18, 2023 at 06:09:41AM -0500, Bjorn Helgaas wrote:
> On Thu, May 18, 2023 at 08:25:57AM +0200, Lukas Wunner wrote:
> > On Wed, May 17, 2023 at 04:02:01PM -0500, Bjorn Helgaas wrote:
> > > I'm curious why we want the 5 seconds of blinking power indicator at
> > > all.  We can't really do anything in response to an Attention Button
> > > on an empty slot, so could we just ignore it completely in
> > > pciehp_handle_button_press()?
> > 
> > That wouldn't cover the case where the slot is occupied when the
> > button is pressed, but the card is yanked out during the 5 second
> > blinking interval.
> 
> Obviously we can't ignore a button press when the slot is occupied,
> because that's part of the "insert card, press button to power it up"
> and "press button to power down card, remove card" flows.
> 
> I'm asking about ignoring it when the slot is empty, which would mean
> adding a check for card presence in pciehp_handle_button_press().  But
> maybe there's a reason why we can't do that there?

It would of course be possible to copy/paste the pciehp_card_present() +
pciehp_check_link_active() check from pciehp_handle_presence_or_link_change().

The only downside is that the symmetry between the ON_STATE / OFF_STATE
cases in pciehp_handle_button_press() could no longer be preserved.
(Because the additional check only applies to OFF_STATE.)  So it could
be argued that readability becomes a little worse and the logic of the
state machine slightly more difficult to follow.

Ultimately any engineering discipline boils down to balancing various
competing traits (such as simplicity of code versus usability) and
personally I would decide to continue allowing the "press button first,
insert card afterwards" usage model because it keeps the code lean.

Unfortunately back in the day the PCISIG decided to saddle PCIe hotplug
with numerous optional features which now complicate implementations.
Form factors implementing the Attention Button seem pretty rare,
as evidenced by the fact this glitch was discovered by Rongguang Wei
almost 5 years after the issue was introduced.  I think most modern
form factors just use surprise-removal instead (NVMe drives in data
centers specifically, and Thunderbolt).

The present commit is necessary anyway to deal with the "card is in slot
when button is pressed, but yanked within 5 seconds" case.  The additional
check in pciehp_handle_button_press() you're contemplating to avoid bringup
attempts if the card is not in the slot upon button press is optional.

Your call. :)

Thanks,

Lukas
Bjorn Helgaas May 19, 2023, 8:55 p.m. UTC | #7
On Wed, May 17, 2023 at 04:02:01PM -0500, Bjorn Helgaas wrote:
> On Fri, May 12, 2023 at 10:15:18AM +0800, Rongguang Wei wrote:
> > From: Rongguang Wei <weirongguang@kylinos.cn>
> > 
> > pciehp's behavior is incorrect if the Attention Button is pressed
> > on an unoccupied slot.
> > 
> > When a Presence Detect Changed event has occurred, the slot status
> > in either BLINKINGOFF_STATE or OFF_STATE, turn it off unconditionally.

Was this supposed to say "BLINKINGOFF_STATE or ON_STATE"
(not "OFF_STATE")?

BLINKINGOFF_STATE and ON_STATE are the only cases where
pciehp_handle_presence_or_link_change() calls pciehp_disable_slot() to
turn off slot power.

> > The message print like this:
> >     pcieport 0000:00:01.5: pciehp: Slot(0-5): Attention button pressed
> >     pcieport 0000:00:01.5: pciehp: Slot(0-5) Powering on due to button press
> >     pcieport 0000:00:01.5: pciehp: Slot(0-5): Attention button pressed
> >     pcieport 0000:00:01.5: pciehp: Slot(0-5): Button cancel
> >     pcieport 0000:00:01.5: pciehp: Slot(0-5): Action canceled due to button press

I think the above messages are from *two* button presses, which
complicates the description unnecessarily, since IIUC we only need one
press to see the problem.

I propose the following commit log:

  If a PCIe hotplug slot has an Attention Button, the normal hot-add
  flow is:

    - Slot is empty and slot power is off
    - User inserts card in slot and presses Attention Button
    - OS blinks Power Indicator for 5 seconds
    - After 5 seconds, OS turns on Power Indicator, turns on slot
      power, and enumerates the device

  Previously, if a user pressed the Attention Button on an *empty*
  slot, pciehp logged the following messages and blinked the Power
  Indicator indefinitely:

    [0.000] pcieport 0000:00:01.5: pciehp: Slot(0-5): Attention button pressed
    [0.001] pcieport 0000:00:01.5: pciehp: Slot(0-5): Powering on due to button press

  An empty slot is in OFF_STATE.  When the Attention Button is pressed,
  pciehp_handle_button_press() puts the slot in BLINKINGON_STATE, sets
  the Power Indicator blinking, and schedules pciehp_queue_pushbutton_work()
  to run 5 seconds later.

  pciehp_queue_pushbutton_work() synthesizes a Presence Detect Changed
  event, and pciehp_handle_presence_or_link_change() exits when it
  finds the slot empty, leaving the slot in BLINKINGON_STATE with the
  Power Indicator blinking.

  To fix the indefinitely blinking Power Indicator, change
  pciehp_handle_presence_or_link_change() to put the empty slot back
  in OFF_STATE and turn off the Power Indicator before exiting.

  The Power Indicator will blink for 5 seconds before stopping, and
  messages like this will be logged:

    [0.000] pcieport 0000:00:01.5: pciehp: Slot(0-5): Attention button pressed
    [0.001] pcieport 0000:00:01.5: pciehp: Slot(0-5): Powering on due to button press
    [5.001] pcieport 0000:00:01.5: pciehp: Slot(0-5): Card not present

Let me know if I got anything wrong above.

Bjorn

> > Fixes: d331710ea78f ("PCI: pciehp: Become resilient to missed events")
> > Link: https://lore.kernel.org/linux-pci/20230403054619.19163-1-clementwei90@163.com/
> > Link: https://lore.kernel.org/linux-pci/20230421025641.655991-1-clementwei90@163.com/
> > Suggested-by: Lukas Wunner <lukas@wunner.de>
> > Signed-off-by: Rongguang Wei <weirongguang@kylinos.cn>
> > ---
> >  drivers/pci/hotplug/pciehp_ctrl.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
> > index 529c34808440..32baba1b7f13 100644
> > --- a/drivers/pci/hotplug/pciehp_ctrl.c
> > +++ b/drivers/pci/hotplug/pciehp_ctrl.c
> > @@ -256,6 +256,14 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
> >  	present = pciehp_card_present(ctrl);
> >  	link_active = pciehp_check_link_active(ctrl);
> >  	if (present <= 0 && link_active <= 0) {
> > +		if (ctrl->state == BLINKINGON_STATE) {
> > +			ctrl->state = OFF_STATE;
> > +			cancel_delayed_work(&ctrl->button_work);
> > +			pciehp_set_indicators(ctrl, PCI_EXP_SLTCTL_PWR_IND_OFF,
> > +					      INDICATOR_NOOP);
> > +			ctrl_info(ctrl, "Slot(%s): Card not present\n",
> > +				  slot_name(ctrl));
> > +		}
> >  		mutex_unlock(&ctrl->state_lock);
> >  		return;
> >  	}
Lukas Wunner May 20, 2023, 8:31 a.m. UTC | #8
On Fri, May 19, 2023 at 03:55:35PM -0500, Bjorn Helgaas wrote:
> On Wed, May 17, 2023 at 04:02:01PM -0500, Bjorn Helgaas wrote:
> > On Fri, May 12, 2023 at 10:15:18AM +0800, Rongguang Wei wrote:
> > > From: Rongguang Wei <weirongguang@kylinos.cn>
> > > 
> > > pciehp's behavior is incorrect if the Attention Button is pressed
> > > on an unoccupied slot.
> > > 
> > > When a Presence Detect Changed event has occurred, the slot status
> > > in either BLINKINGOFF_STATE or OFF_STATE, turn it off unconditionally.
> 
> Was this supposed to say "BLINKINGOFF_STATE or ON_STATE"
> (not "OFF_STATE")?

Yes I think you're right.


> I propose the following commit log:
[...]
>   pciehp_queue_pushbutton_work() synthesizes a Presence Detect Changed
>   event, and pciehp_handle_presence_or_link_change() exits when it
>   finds the slot empty, leaving the slot in BLINKINGON_STATE with the
>   Power Indicator blinking.
> 
>   To fix the indefinitely blinking Power Indicator, change
>   pciehp_handle_presence_or_link_change() to put the empty slot back
>   in OFF_STATE and turn off the Power Indicator before exiting.

The indefinitely blinking Power Indicator is only one half of the problem.
The other half is that the next button press doesn't result in slot
bringup, even if the slot is occupied and the 5 second timeout has
elapsed.  Suggested wording, feel free to rephrase as you see fit:

  Because the slot was previously left in BLINKINGON_STATE, the next
  button press was interpreted as a "button cancel" event, even if the
  slot was occupied upon that next button press:  pciehp stopped blinking
  and did not perform another slot bringup attempt.

  By putting the slot in OFF_STATE, such user-unfriendly behavior is
  avoided:  Instead, the next button press will result in the slot
  starting to blink again and another bringup attempt after 5 seconds.

Thanks,

Lukas
Bjorn Helgaas May 22, 2023, 9:10 p.m. UTC | #9
On Sat, May 20, 2023 at 10:31:18AM +0200, Lukas Wunner wrote:
> On Fri, May 19, 2023 at 03:55:35PM -0500, Bjorn Helgaas wrote:
> > On Wed, May 17, 2023 at 04:02:01PM -0500, Bjorn Helgaas wrote:
> > > On Fri, May 12, 2023 at 10:15:18AM +0800, Rongguang Wei wrote:
> > > > From: Rongguang Wei <weirongguang@kylinos.cn>
> > > > 
> > > > pciehp's behavior is incorrect if the Attention Button is pressed
> > > > on an unoccupied slot.
> > > > 
> > > > When a Presence Detect Changed event has occurred, the slot status
> > > > in either BLINKINGOFF_STATE or OFF_STATE, turn it off unconditionally.
> > 
> > Was this supposed to say "BLINKINGOFF_STATE or ON_STATE"
> > (not "OFF_STATE")?
> 
> Yes I think you're right.
> 
> > I propose the following commit log:
> [...]
> >   pciehp_queue_pushbutton_work() synthesizes a Presence Detect Changed
> >   event, and pciehp_handle_presence_or_link_change() exits when it
> >   finds the slot empty, leaving the slot in BLINKINGON_STATE with the
> >   Power Indicator blinking.
> > 
> >   To fix the indefinitely blinking Power Indicator, change
> >   pciehp_handle_presence_or_link_change() to put the empty slot back
> >   in OFF_STATE and turn off the Power Indicator before exiting.
> 
> The indefinitely blinking Power Indicator is only one half of the problem.
> The other half is that the next button press doesn't result in slot
> bringup, even if the slot is occupied and the 5 second timeout has
> elapsed.  

Thanks for your patience, I think I understand that.  Here's another
try:

  Previously, if a user pressed the Attention Button on an *empty* slot,
  pciehp logged the following messages and blinked the Power Indicator
  until a second button press:

    [0.000] pciehp: Attention button pressed
    [0.001] pciehp: Powering on due to button press
    [0.002] # Power Indicator starts blinking
    [5.002] # 5 second timeout should abort power-on sequence, but doesn't

    [8.000] # Power Indicator should be off, but is still blinking
    [9.000] # possible card insertion
    [9.000] pciehp: Attention button pressed
    [9.001] pciehp: Button cancel
    [9.002] pciehp: Action canceled due to button press

  The first button press incorrectly left the slot in BLINKINGON_STATE,
  so the second was interpreted as a "button cancel" event regardless of
  whether a card was present.

  If the slot is empty, turn off the Power Indicator and return from
  BLINKINGON_STATE to OFF_STATE after 5 seconds.  Putting the slot in
  OFF_STATE also means the second button press will correctly start a
  bringup attempt if the slot is occupied.

Maybe the above is enough for a commit log.  The notes below are my
attempt to work through in more detail:

IIUC, if the button is pressed twice on an empty slot, we end up back
in the "Empty slot, OFF" state (although the indicator blinks until
the second press, when it should stop after 5 seconds), and inserting
a card and pressing the button works as expected.

The problem is when the card is inserted between first and second
button presses, where the second press cancels the BLINKINGON when it
should *start* BLINKINGON.  A third press would power on the slot,
when it should go to BLINKINGOFF to power it off:

                    Slot        v6.4               Expected
                    --------    -----------        -----------
  Slot empty        Empty       OFF                OFF
  Button press 1    Empty       BLINKINGON         BLINKINGON
                                "Powering on"      "Powering on"
                                sched-work         sched-work
    +5s synth PDC   Empty       BLINKINGON         OFF
                                (a)                "Card not present"
  Insert card       Occupied    BLINKINGON         OFF
  Button press 2    Occupied    OFF                BLINKINGON
                                "Button cancel"    "Powering on"
                                                   sched-work
    +5s synth PDC   Occupied    (b, N/A)           POWERON
    Power control   Occupied    (b, N/A)           ON
  Button press 3    Occupied    BLINKINGON         BLINKINGOFF
                                "Powering on"      "Powering off"
                                sched-work         sched-work
    +5s synth PDC   Occupied    POWERON            POWEROFF
    Power control   Occupied    ON                 OFF

At (a), v6.4-rc1 will blink until another button press.  At (b), the
button press generates a "Button cancel" message and does not schedule
button_work.

And (b) is the situation you refer to where the second button press
doesn't bring the slot up when it should.  Right?

> Suggested wording, feel free to rephrase as you see fit:
> 
>   Because the slot was previously left in BLINKINGON_STATE, the next
>   button press was interpreted as a "button cancel" event, even if the
>   slot was occupied upon that next button press:  pciehp stopped blinking
>   and did not perform another slot bringup attempt.
> 
>   By putting the slot in OFF_STATE, such user-unfriendly behavior is
>   avoided:  Instead, the next button press will result in the slot
>   starting to blink again and another bringup attempt after 5 seconds.
> 
> Thanks,
> 
> Lukas
Lukas Wunner May 23, 2023, 5:03 a.m. UTC | #10
On Mon, May 22, 2023 at 04:10:36PM -0500, Bjorn Helgaas wrote:
> Thanks for your patience, I think I understand that.  Here's another
> try:

The wording LGTM.


> And (b) is the situation you refer to where the second button press
> doesn't bring the slot up when it should.  Right?

Yes, exactly.

Thanks,

Lukas
diff mbox series

Patch

diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
index 529c34808440..32baba1b7f13 100644
--- a/drivers/pci/hotplug/pciehp_ctrl.c
+++ b/drivers/pci/hotplug/pciehp_ctrl.c
@@ -256,6 +256,14 @@  void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
 	present = pciehp_card_present(ctrl);
 	link_active = pciehp_check_link_active(ctrl);
 	if (present <= 0 && link_active <= 0) {
+		if (ctrl->state == BLINKINGON_STATE) {
+			ctrl->state = OFF_STATE;
+			cancel_delayed_work(&ctrl->button_work);
+			pciehp_set_indicators(ctrl, PCI_EXP_SLTCTL_PWR_IND_OFF,
+					      INDICATOR_NOOP);
+			ctrl_info(ctrl, "Slot(%s): Card not present\n",
+				  slot_name(ctrl));
+		}
 		mutex_unlock(&ctrl->state_lock);
 		return;
 	}