PCI,pciehp: Don't handle PDC for cards with attention button

Message ID 20170217061247.5591-1-yinghai@kernel.org
State New
Headers show

Commit Message

Yinghai Lu Feb. 17, 2017, 6:12 a.m.
On one system during power off slot, find card get power on right away.
 # echo 0 > /sys/bus/pci/slots/1/power
 pci_hotplug: power_write_file: power = 0
 pciehp 0000:16:00.0:pcie004: pciehp_get_power_status: SLOTCTRL a8 value read 11f1
 pciehp 0000:16:00.0:pcie004: pciehp_unconfigure_device: domain:bus:dev = 0000:17:00
 pci 0000:17:00.0: PME# disabled
 pci 0000:17:00.0: freeing pci_dev info
 pciehp 0000:16:00.0:pcie004: pending interrupts 0x0010 from Slot Status
 pciehp 0000:16:00.0:pcie004: pciehp_power_off_slot: SLOTCTRL a8 write cmd 400
 pciehp 0000:16:00.0:pcie004: pending interrupts 0x0108 from Slot Status
 pciehp 0000:16:00.0:pcie004: Slot(1): Link Down
 pciehp 0000:16:00.0:pcie004: Slot(1): Link Down event ignored; already powering off
 pciehp 0000:16:00.0:pcie004: pciehp_green_led_off: SLOTCTRL a8 write cmd 300
 pciehp 0000:16:00.0:pcie004: pending interrupts 0x0018 from Slot Status  <======
 pciehp 0000:16:00.0:pcie004: Slot(1): Card present
 pciehp 0000:16:00.0:pcie004: pciehp_get_power_status: SLOTCTRL a8 value read 17f1
 pciehp 0000:16:00.0:pcie004: pending interrupts 0x0010 from Slot Status
 pciehp 0000:16:00.0:pcie004: pciehp_power_on_slot: SLOTCTRL a8 write cmd 0
 pciehp 0000:16:00.0:pcie004: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200
 pciehp 0000:16:00.0:pcie004: pending interrupts 0x0010 from Slot Status
 pciehp 0000:16:00.0:pcie004: pciehp_check_link_active: lnk_status = f103
 pciehp 0000:16:00.0:pcie004: pending interrupts 0x0108 from Slot Status
 pciehp 0000:16:00.0:pcie004: Slot(1): Link Up
...

Somehow PDC bit get set, and during handling interrupt that is caused by
CC, that PDC also get handled, and the card get powered on again.

In pcie_enable_notification(), we already only enable notification
for PDC when attention button is not there.
So we can safely add checking in pciehp_isr() to skip that PDC handling.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/hotplug/pciehp_hpc.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Ashok Raj Feb. 17, 2017, 5:40 p.m. | #1
Hi Yinghai

Which version of linux did you apply this? 

I'm not sure if you can ignore PDC when ATTN isn't present. Surprise hot-add on
systems with Power Control would depend on PDC.

There is some new code to deal with both Presence detect and DLLSC events since
4.10-rc1. 


On Thu, Feb 16, 2017 at 10:12:47PM -0800, Yinghai Lu wrote:
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> 
> ---
>  drivers/pci/hotplug/pciehp_hpc.c |    5 +++--

> +		if (!ATTN_BUTTN(ctrl))
> +			pciehp_queue_interrupt_event(slot, present ?
> +					INT_PRESENCE_ON : INT_PRESENCE_OFF);
>  	}
>  
>  	/* Check Power Fault Detected */

This is what we have since v4.10-rc1

    /*
     * Check Link Status Changed at higher precedence than Presence
     * Detect Changed.  The PDS value may be set to "card present" from
     * out-of-band detection, which may be in conflict with a Link Down
     * and cause the wrong event to queue.
     */
    if (events & PCI_EXP_SLTSTA_DLLSC) {
        ctrl_info(ctrl, "Slot(%s): Link %s\n", slot_name(slot),
              link ? "Up" : "Down");
        pciehp_queue_interrupt_event(slot, link ? INT_LINK_UP :
                         INT_LINK_DOWN);
    } else if (events & PCI_EXP_SLTSTA_PDC) {
        present = !!(status & PCI_EXP_SLTSTA_PDS);
        ctrl_info(ctrl, "Slot(%s): Card %spresent\n", slot_name(slot),
              present ? "" : "not ");
        pciehp_queue_interrupt_event(slot, present ? INT_PRESENCE_ON :
                         INT_PRESENCE_OFF);
    }


Cheers
Ashok
Yinghai Lu Feb. 17, 2017, 6:56 p.m. | #2
On Fri, Feb 17, 2017 at 9:40 AM, Raj, Ashok <ashok.raj@intel.com> wrote:
> Hi Yinghai
>
> Which version of linux did you apply this?

current linus tree + pci/next

>
> I'm not sure if you can ignore PDC when ATTN isn't present. Surprise hot-add on
> systems with Power Control would depend on PDC.

in pcie_enable_notification() we don't enable PDCE when ATTN is not there.

        cmd = PCI_EXP_SLTCTL_DLLSCE;
        if (ATTN_BUTTN(ctrl))
                cmd |= PCI_EXP_SLTCTL_ABPE;
        else
                cmd |= PCI_EXP_SLTCTL_PDCE;
        if (!pciehp_poll_mode)
                cmd |= PCI_EXP_SLTCTL_HPIE | PCI_EXP_SLTCTL_CCIE;

        mask = (PCI_EXP_SLTCTL_PDCE | PCI_EXP_SLTCTL_ABPE |
                PCI_EXP_SLTCTL_PFDE |
                PCI_EXP_SLTCTL_HPIE | PCI_EXP_SLTCTL_CCIE |
                PCI_EXP_SLTCTL_DLLSCE);

        pcie_write_cmd_nowait(ctrl, cmd, mask);

So we should handle PDC at all.

Thanks

Yinghai
Bjorn Helgaas Feb. 17, 2017, 10:39 p.m. | #3
On Thu, Feb 16, 2017 at 10:12:47PM -0800, Yinghai Lu wrote:
> On one system during power off slot, find card get power on right away.
>  # echo 0 > /sys/bus/pci/slots/1/power
>  pci_hotplug: power_write_file: power = 0
>  pciehp 0000:16:00.0:pcie004: pciehp_get_power_status: SLOTCTRL a8 value read 11f1
>  pciehp 0000:16:00.0:pcie004: pciehp_unconfigure_device: domain:bus:dev = 0000:17:00
>  pci 0000:17:00.0: PME# disabled
>  pci 0000:17:00.0: freeing pci_dev info
>  pciehp 0000:16:00.0:pcie004: pending interrupts 0x0010 from Slot Status
>  pciehp 0000:16:00.0:pcie004: pciehp_power_off_slot: SLOTCTRL a8 write cmd 400
>  pciehp 0000:16:00.0:pcie004: pending interrupts 0x0108 from Slot Status
>  pciehp 0000:16:00.0:pcie004: Slot(1): Link Down
>  pciehp 0000:16:00.0:pcie004: Slot(1): Link Down event ignored; already powering off
>  pciehp 0000:16:00.0:pcie004: pciehp_green_led_off: SLOTCTRL a8 write cmd 300
>  pciehp 0000:16:00.0:pcie004: pending interrupts 0x0018 from Slot Status  <======
>  pciehp 0000:16:00.0:pcie004: Slot(1): Card present
>  pciehp 0000:16:00.0:pcie004: pciehp_get_power_status: SLOTCTRL a8 value read 17f1
>  pciehp 0000:16:00.0:pcie004: pending interrupts 0x0010 from Slot Status
>  pciehp 0000:16:00.0:pcie004: pciehp_power_on_slot: SLOTCTRL a8 write cmd 0
>  pciehp 0000:16:00.0:pcie004: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200
>  pciehp 0000:16:00.0:pcie004: pending interrupts 0x0010 from Slot Status
>  pciehp 0000:16:00.0:pcie004: pciehp_check_link_active: lnk_status = f103
>  pciehp 0000:16:00.0:pcie004: pending interrupts 0x0108 from Slot Status
>  pciehp 0000:16:00.0:pcie004: Slot(1): Link Up
> ...
> 
> Somehow PDC bit get set, and during handling interrupt that is caused by
> CC, that PDC also get handled, and the card get powered on again.
> 
> In pcie_enable_notification(), we already only enable notification
> for PDC when attention button is not there.
> So we can safely add checking in pciehp_isr() to skip that PDC handling.
> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> 
> ---
>  drivers/pci/hotplug/pciehp_hpc.c |    5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> Index: linux-2.6/drivers/pci/hotplug/pciehp_hpc.c
> ===================================================================
> --- linux-2.6.orig/drivers/pci/hotplug/pciehp_hpc.c
> +++ linux-2.6/drivers/pci/hotplug/pciehp_hpc.c
> @@ -618,8 +618,9 @@ static irqreturn_t pciehp_isr(int irq, v
>  		present = !!(status & PCI_EXP_SLTSTA_PDS);
>  		ctrl_info(ctrl, "Slot(%s): Card %spresent\n", slot_name(slot),
>  			  present ? "" : "not ");
> -		pciehp_queue_interrupt_event(slot, present ? INT_PRESENCE_ON :
> -					     INT_PRESENCE_OFF);
> +		if (!ATTN_BUTTN(ctrl))
> +			pciehp_queue_interrupt_event(slot, present ?
> +					INT_PRESENCE_ON : INT_PRESENCE_OFF);

I don't think it really makes sense to tie PDC handling with the
attention button.  It might happen to avoid the problem on your
system, but I don't see the logical connection between them.

Can you reproduce this by disabling pciehp and driving this sequence
manually with setpci?  I suspect that we are tripping over our own
feet because we read PCI_EXP_SLTSTA once, clear it (probably too
early), then queue multiple events, then process those events possibly
simultaneously.

>  	}
>  
>  	/* Check Power Fault Detected */
Yinghai Lu Feb. 17, 2017, 11:36 p.m. | #4
On Fri, Feb 17, 2017 at 2:39 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> On Thu, Feb 16, 2017 at 10:12:47PM -0800, Yinghai Lu wrote:
>
> I don't think it really makes sense to tie PDC handling with the
> attention button.  It might happen to avoid the problem on your
> system, but I don't see the logical connection between them.

but in pcie_enable_notification() we don't enable PDCE when ATTN is not there.

        cmd = PCI_EXP_SLTCTL_DLLSCE;
        if (ATTN_BUTTN(ctrl))
                cmd |= PCI_EXP_SLTCTL_ABPE;
        else
                cmd |= PCI_EXP_SLTCTL_PDCE;
        if (!pciehp_poll_mode)
                cmd |= PCI_EXP_SLTCTL_HPIE | PCI_EXP_SLTCTL_CCIE;

        mask = (PCI_EXP_SLTCTL_PDCE | PCI_EXP_SLTCTL_ABPE |
                PCI_EXP_SLTCTL_PFDE |
                PCI_EXP_SLTCTL_HPIE | PCI_EXP_SLTCTL_CCIE |
                PCI_EXP_SLTCTL_DLLSCE);

        pcie_write_cmd_nowait(ctrl, cmd, mask);

should we remove that check there ?

>
> Can you reproduce this by disabling pciehp and driving this sequence
> manually with setpci?  I suspect that we are tripping over our own
> feet because we read PCI_EXP_SLTSTA once, clear it (probably too
> early), then queue multiple events, then process those events possibly
> simultaneously.

sca15-2243-0a818158:~ # echo 1 > /sys/bus/pci/devices/0000\:3b\:00.0/remove
[  171.769322] pci 0000:3b:00.0: PME# disabled
[  171.774459] iommu: Removing device 0000:3b:00.0 from group 36
[  171.780984] pci 0000:3b:00.0: freeing pci_dev info
sca15-2243-0a818158:~ # setpci -s 0000:3a:00.0 0xa8.w
01cb
sca15-2243-0a818158:~ # setpci -s 0000:3a:00.0 0xaa.w
0050
sca15-2243-0a818158:~ # setpci -s 0000:3a:00.0 0xa8.w=0x05cb
sca15-2243-0a818158:~ # setpci -s 0000:3a:00.0 0xaa.w
0158

so after power off, status bit 3 the PDC get set.

Patch

Index: linux-2.6/drivers/pci/hotplug/pciehp_hpc.c
===================================================================
--- linux-2.6.orig/drivers/pci/hotplug/pciehp_hpc.c
+++ linux-2.6/drivers/pci/hotplug/pciehp_hpc.c
@@ -618,8 +618,9 @@  static irqreturn_t pciehp_isr(int irq, v
 		present = !!(status & PCI_EXP_SLTSTA_PDS);
 		ctrl_info(ctrl, "Slot(%s): Card %spresent\n", slot_name(slot),
 			  present ? "" : "not ");
-		pciehp_queue_interrupt_event(slot, present ? INT_PRESENCE_ON :
-					     INT_PRESENCE_OFF);
+		if (!ATTN_BUTTN(ctrl))
+			pciehp_queue_interrupt_event(slot, present ?
+					INT_PRESENCE_ON : INT_PRESENCE_OFF);
 	}
 
 	/* Check Power Fault Detected */