diff mbox

pciehp LinkState

Message ID 20131125194044.GA9710@google.com
State Not Applicable
Headers show

Commit Message

Bjorn Helgaas Nov. 25, 2013, 7:40 p.m. UTC
On Mon, Nov 25, 2013 at 07:00:01PM +0000, Rajat Jain wrote:
> Hello Martin,
> 
> > >
> > > Below is the overview since cold boot with no card inserted with all
> > those hotplug events. You see,
> > > pciehp never sets LinkState back to minus once the slot is empty:
> > >
> > > vostro ~ # grep LinkState
> > pciehp/3.12/cold_boot_no_card_in_slot/lspci_vvv_initial__inserted__eject
> > ed_but_not_realized__inserted__ejected.txt
> > >                         Changed: MRL- PresDet- LinkState-
> > >                         Changed: MRL- PresDet- LinkState+
> > >                         Changed: MRL- PresDet- LinkState+
> > >                         Changed: MRL- PresDet- LinkState+
> > >                         Changed: MRL- PresDet- LinkState+
> > 
> > I opened https://bugzilla.kernel.org/show_bug.cgi?id=65521 for this
> > issue.  I agree that pciehp should probably clear this bit.  Rajat
> > recently posted some patches [1] that add support for link state
> > changes.  I haven't look at them in detail yet, but I wouldn't be
> > surprised if they will fix this issue.  If you wanted to test them,
> > that would be awesome!
> > 
> > Bjorn
> > 
> > [1] http://lkml.kernel.org/r/528BE8B6.9080007@gmail.com
> > --
> 
> Yes, this issue should be fixed by the following 2 patches:
> 
> https://lkml.org/lkml/2013/11/19/542
> https://lkml.org/lkml/2013/11/19/543
> 
> The particular hunk you want to be looking at is this:
> 
> @@ -661,7 +669,7 @@ static irqreturn_t pcie_isr(int irq, void *dev_id)
>  
>  		detected &= (PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD |
>  			     PCI_EXP_SLTSTA_MRLSC | PCI_EXP_SLTSTA_PDC |
> -			     PCI_EXP_SLTSTA_CC);
> +			     PCI_EXP_SLTSTA_CC | PCI_EXP_SLTSTA_DLLSC);
>  		detected &= ~intr_loc;
>  		intr_loc |= detected;
>  		if (!intr_loc)
> 
> 
> Would you be please kind enough to test the patch out? No need to pass any module parameter (Just apply patch and build & boot normally like you would).

Martin, here are Rajat's patches (all four of them), in case it's easier
than collecting them from lkml.org.  The following applies to v3.12
with "patch -p1 < <MSG>".


-------------------------------------------------------------------------------
pciehp-make-check_link_active
-------------------------------------------------------------------------------
pciehp: Make check_link_active() non-static

From: Rajat Jain <rajatjain.linux@gmail.com>

check_link_active() functionality needs to be used by subsequent patches
(that introduce link state change based hotplug). Thus make the function
non-static, and rename it to pciehp_check_link_active() so as to be
consistent with other non-static functions.

Signed-off-by: Rajat Jain <rajatjain@juniper.net>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/hotplug/pciehp.h     |    1 +
 drivers/pci/hotplug/pciehp_hpc.c |    6 +++---
 2 files changed, 4 insertions(+), 3 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Bjorn Helgaas Nov. 26, 2013, 3:13 a.m. UTC | #1
On Tue, Nov 26, 2013 at 02:06:03AM +0100, Martin Mokrejs wrote:
> Hi Rajat and Bjorn,
>   I tested all the patches, comments below:
> 
> Bjorn Helgaas wrote:
> > On Mon, Nov 25, 2013 at 07:00:01PM +0000, Rajat Jain wrote:
> >> Hello Martin,
> >>
> >>>>
> >>>> Below is the overview since cold boot with no card inserted with all
> >>> those hotplug events. You see,
> >>>> pciehp never sets LinkState back to minus once the slot is empty:
> >>>>
> >>>> vostro ~ # grep LinkState
> >>> pciehp/3.12/cold_boot_no_card_in_slot/lspci_vvv_initial__inserted__eject
> >>> ed_but_not_realized__inserted__ejected.txt
> >>>>                         Changed: MRL- PresDet- LinkState-
> >>>>                         Changed: MRL- PresDet- LinkState+
> >>>>                         Changed: MRL- PresDet- LinkState+
> >>>>                         Changed: MRL- PresDet- LinkState+
> >>>>                         Changed: MRL- PresDet- LinkState+
> >>>
> >>> I opened https://bugzilla.kernel.org/show_bug.cgi?id=65521 for this
> >>> issue.  I agree that pciehp should probably clear this bit.  Rajat
> >>> recently posted some patches [1] that add support for link state
> >>> changes.  I haven't look at them in detail yet, but I wouldn't be
> >>> surprised if they will fix this issue.  If you wanted to test them,
> >>> that would be awesome!
> 
> In brief, the patch works, note below on every 5th line there is LinkState-,
> so the LinkState is not stuck at plus anymore. Why I never see it plus
> I do not know, maybe it is cleared too quickly?
> 
> # grep Changed lspci_vvv_initial* | grep LinkState
> lspci_vvv_initial.txt:                  Changed: MRL- PresDet- LinkState-
> lspci_vvv_initial.txt:                  Changed: MRL- PresDet- LinkState+
> lspci_vvv_initial.txt:                  Changed: MRL- PresDet- LinkState+
> lspci_vvv_initial.txt:                  Changed: MRL- PresDet- LinkState+
> lspci_vvv_initial.txt:                  Changed: MRL- PresDet- LinkState-


Great, thanks for testing this.  The bit labelled "LinkState" here is the
"Data Link Layer State *Changed*" bit in the Slot Status register.  It is
set whenever the Data Link Layer Link Active bit in the Link Status
register is changed.  So this LinkState bit doesn't tell you the state of
the link; basically it just tells you that the link changed from "down" to
"up", or from "up" to "down."

You should not normally see this "LinkStateChanged" bit set via lspci
because pciehp will get an interrupt when it is set, and the ISR will
notice that the link state has changed and immediately clear the
LinkStateChanged bit.

> During very first card hotinsert I get on the upstream rootport:
> 
> 
> # diff -u30 -w lspci_vvv_initial.txt lspci_vvv_initial__inserted.txt 
> --- lspci_vvv_initial.txt       2013-11-26 01:12:19.260000366 +0100
> +++ lspci_vvv_initial__inserted.txt     2013-11-26 01:13:02.780000480 +0100
>  00:1c.7 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 8 (rev b5) (prog-if 00 [Normal decode])
>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Latency: 0, Cache Line Size: 64 bytes
>         Bus: primary=00, secondary=11, subordinate=16, sec-latency=0
>         I/O behind bridge: 0000c000-0000dfff
>         Memory behind bridge: f6c00000-f7cfffff
>         Prefetchable memory behind bridge: 00000000f0000000-00000000f10fffff
>         Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
>         BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
>                 PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
>         Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
>                 DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
>                         ExtTag- RBE+ FLReset-
>                 DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
>                         RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
>                         MaxPayload 128 bytes, MaxReadReq 128 bytes
>                 DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
> -               LnkCap: Port #8, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 <1us, L1 <16us
> +               LnkCap: Port #8, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 <512ns, L1 <16us

L0 exit latency depends on clock configuration, which we changed, so this
difference is normal.

>                         ClockPM- Surprise- LLActRep+ BwNot-
> -               LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk-
> +               LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk+

Here's where we changed the clock config.  This is part of ASPM, and these
differences look normal.

>                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> -               LnkSta: Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> +               LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-

This looks normal.

>                 SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+
>                         Slot #7, PowerLimit 10.000W; Interlock- NoCompl+
>                 SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet+ CmdCplt- HPIrq+ LinkChg-
>                         Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
> -               SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet- Interlock-
> +               SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-

This looks normal.

>                         Changed: MRL- PresDet- LinkState-
>                 RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
>                 RootCap: CRSVisible-
>                 RootSta: PME ReqID 0000, PMEStatus- PMEPending-
>                 DevCap2: Completion Timeout: Range BC, TimeoutDis+, LTR-, OBFF Not Supported ARIFwd-
>                 DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
>                 LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
>                          Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
>                          Compliance De-emphasis: -6dB
>                 LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
>                          EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
>         Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
>                 Address: fee00318  Data: 0000
>         Capabilities: [90] Subsystem: Dell Device 04b3
>         Capabilities: [a0] Power Management version 2
>                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
>                 Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
>         Kernel driver in use: pcieport
>  
> 
> and the card itself is detected:
> 
> +11:00.0 USB controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 03) (prog-if 30 [XHCI])
> +       Subsystem: NEC Corporation uPD720200 USB 3.0 Host Controller
> +       Physical Slot: 7
> +       Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
> +       Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> +       Latency: 0, Cache Line Size: 64 bytes
> +       Interrupt: pin A routed to IRQ 19
> +       Region 0: Memory at f6c00000 (64-bit, non-prefetchable) [size=8K]
> +       Capabilities: [50] Power Management version 3
> +               Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
> +               Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> +       Capabilities: [70] MSI: Enable- Count=1/8 Maskable- 64bit+
> +               Address: 0000000000000000  Data: 0000
> +       Capabilities: [90] MSI-X: Enable+ Count=8 Masked-
> +               Vector table: BAR=0 offset=00001000
> +               PBA: BAR=0 offset=00001080
> +       Capabilities: [a0] Express (v2) Endpoint, MSI 00
> +               DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
> +                       ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
> +               DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
> +                       RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
> +                       MaxPayload 128 bytes, MaxReadReq 512 bytes
> +               DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
> +               LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 <4us, L1 unlimited
> +                       ClockPM+ Surprise- LLActRep- BwNot-
> +               LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk+
> +                       ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
> +               LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> +               DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR+, OBFF Not Supported
> +               DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
> +               LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
> +                        Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> +                        Compliance De-emphasis: -6dB
> +               LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
> +                        EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> +       Capabilities: [100 v1] Advanced Error Reporting
> +               UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> +               UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> +               UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> +               CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
> +               CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
> +               AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
> +       Capabilities: [140 v1] Device Serial Number ff-ff-ff-ff-ff-ff-ff-ff
> +       Capabilities: [150 v1] Latency Tolerance Reporting
> +               Max snoop latency: 0ns
> +               Max no snoop latency: 0ns
> +       Kernel driver in use: xhci_hcd
> 
> 
> After eject of the card the system differs from a cold-boot state so I get some diffs those are maybe irrelevant?:
> 
> # diff -u30 -w lspci_vvv_initial.txt lspci_vvv_initial__inserted__ejected.txt 
> --- lspci_vvv_initial.txt       2013-11-26 01:12:19.260000366 +0100
> +++ lspci_vvv_initial__inserted__ejected.txt    2013-11-26 01:13:42.060000582 +0100
>  00:1c.7 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 8 (rev b5) (prog-if 00 [Normal decode])
>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Latency: 0, Cache Line Size: 64 bytes
>         Bus: primary=00, secondary=11, subordinate=16, sec-latency=0
>         I/O behind bridge: 0000c000-0000dfff
>         Memory behind bridge: f6c00000-f7cfffff
>         Prefetchable memory behind bridge: 00000000f0000000-00000000f10fffff
> -       Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
> +       Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-

This is https://bugzilla.kernel.org/show_bug.cgi?id=65511

>         BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
>                 PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
>         Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
>                 DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
>                         ExtTag- RBE+ FLReset-
>                 DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
>                         RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
>                         MaxPayload 128 bytes, MaxReadReq 128 bytes
>                 DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
> -               LnkCap: Port #8, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 <1us, L1 <16us
> +               LnkCap: Port #8, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 <512ns, L1 <16us
>                         ClockPM- Surprise- LLActRep+ BwNot-
> -               LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk-
> +               LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+

If we restored the ASPM & clock config after removal, these differences
would probably go away.  We could try to do that, but I think there are
more important ASPM things to worry about.

>                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> -               LnkSta: Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> +               LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt+ ABWMgmt-

Linux currently doesn't have monitor these Link Bandwith management things:
we don't enable the interrupts in the Link Control register, and there's no
ISR that would read and clear these Link Status bits.  It might be nice to
do that, but again, there's no pressing need for it.

>                 SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+
>                         Slot #7, PowerLimit 10.000W; Interlock- NoCompl+
>                 SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet+ CmdCplt- HPIrq+ LinkChg-
>                         Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
>                 SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet- Interlock-
>                         Changed: MRL- PresDet- LinkState-
>                 RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
>                 RootCap: CRSVisible-
>                 RootSta: PME ReqID 0000, PMEStatus- PMEPending-
>                 DevCap2: Completion Timeout: Range BC, TimeoutDis+, LTR-, OBFF Not Supported ARIFwd-
>                 DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
>                 LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
>                          Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
>                          Compliance De-emphasis: -6dB
>                 LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
>                          EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
>         Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
>                 Address: fee00318  Data: 0000
>         Capabilities: [90] Subsystem: Dell Device 04b3
>         Capabilities: [a0] Power Management version 2
>                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
>                 Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
>         Kernel driver in use: pcieport
> 
> 
> Notably, upon eject the status line of 00:1c.7 from vanilla 3.12 was always stuck at plus while the patched kernel flipped it back to minus:
> -Changed: MRL- PresDet- LinkState+
> +Changed: MRL- PresDet- LinkState-
> 
> So, I think your patch works fine.
> ...

> If it is clearer, a view froma diffent angle comparing vanilla and patched kernel states after
> very first hotinsert:
> 
> # diff -u40 -w 3.12/cold_boot_no_card_in_slot/lspci_vvv_initial__inserted.txt 3.12-linkchangepatch/cold_boot_no_card_in_slot/lspci_vvv_initial__inserted.txt 
> --- 3.12/cold_boot_no_card_in_slot/lspci_vvv_initial__inserted.txt      2013-11-20 19:34:47.990000788 +0100
> +++ 3.12-linkchangepatch/cold_boot_no_card_in_slot/lspci_vvv_initial__inserted.txt      2013-11-26 01:13:02.780000480 +0100
>  00:1c.7 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 8 (rev b5) (prog-if 00 [Normal decode])
>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Latency: 0, Cache Line Size: 64 bytes
>         Bus: primary=00, secondary=11, subordinate=16, sec-latency=0
>         I/O behind bridge: 0000c000-0000dfff
>         Memory behind bridge: f6c00000-f7cfffff
>         Prefetchable memory behind bridge: 00000000f0000000-00000000f10fffff
>         Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
>         BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
>                 PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
>         Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
>                 DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
>                         ExtTag- RBE+ FLReset-
>                 DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
>                         RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
>                         MaxPayload 128 bytes, MaxReadReq 128 bytes
>                 DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
>                 LnkCap: Port #8, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 <512ns, L1 <16us
>                         ClockPM- Surprise- LLActRep+ BwNot-
>                 LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk+
>                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>                 LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
>                 SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+
>                         Slot #7, PowerLimit 10.000W; Interlock- NoCompl+
>                 SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet+ CmdCplt- HPIrq+ LinkChg-
>                         Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
>                 SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
> -                       Changed: MRL- PresDet- LinkState+
> +                       Changed: MRL- PresDet- LinkState-
>                 RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
>                 RootCap: CRSVisible-
>                 RootSta: PME ReqID 0000, PMEStatus- PMEPending-
>                 DevCap2: Completion Timeout: Range BC, TimeoutDis+, LTR-, OBFF Not Supported ARIFwd-
>                 DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
>                 LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
>                          Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
>                          Compliance De-emphasis: -6dB
>                 LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
>                          EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
>         Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
>                 Address: fee00318  Data: 0000
>         Capabilities: [90] Subsystem: Dell Device 04b3
>         Capabilities: [a0] Power Management version 2
>                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
>                 Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
>         Kernel driver in use: pcieport
> 
> 
> 
> I read a bit too quickly your comments in the patch and while am no kernel developer I will post
> silly questions. Maybe you would like to polish the comments in patches.
> 1. You speak about some 'bells and whistles' ... you mean PresDet bit handled by my SandyBridge chip
> is also one of those fancy things like an eject button? The LinkState is another example of fancy thing?
> Why aren't or weren't these supported before some eject buttons you speak about?

I think this is just a consequence of the fact that we typically add
support for things as we need them, and people haven't needed LinkState in
the past.  Rajat has a somewhat unusual system that needs it, so he's now
adding support for it.

> 2. Concerning eject detection, is there any order of precedence of the information being collected
> and interpreted? Like, is PresDet more important than LinkChange or some "eject button" of whatever
> you mean under thos 'bells and whistles'?
> 
> 3. Related to point 2., should xhci_hcd behave in a different way and say let more happily a USB device
> fall asleep if the LinkState is now being cleared? (Note: to do the above tests I had to redo my test
> of teh patched kernel because I forgot that likely while testing vanilla kernel I had attached USB
> mouse and keyboard to my SandyBridge chip and the TexaInstruments chip. While inspecting lspci differences
> I wondered why in unpatched kernel I had changes between ASPM L0s L1 vs. Disabled while on patched
> kernel the insert/eject changes resulted in ASPM L0s vs L1 only difference. It seem that was because
> I did not have attached USB devices to the ports. But, funny is that these differences were on all
> USB-capable PCI devices (hence am speaking about SandyBridge's USB2 ports, the onboard TexasInstuments
> USB3 chip while affected should have been only the NEC-based USB3 chip presenton the hotpluggable card --
> or its rootport. So, I somehow suspect your patch changed a bit more in behavior or my system.
> 
> 
> I do not want to spoil this message thread anymore, try to check this ASPM stuff yourself if you care.
> I am not the right person to test this. ;-)
> 
> LinkState is being changed and that is enough for my curiosity ATM. ;) You can add my Tested-by:
> if you do not wonder why there is no LinkState+ in my tests. If anybody wants tar.bz2 of the collected
> logs please email me. I have attached only the very last dmesg collected covering all the tests.
> 
> Thank you,
> Martin

> [  174.623769] pciehp 0000:00:1c.7:pcie04: pcie_isr: intr_loc 108
> [  174.623781] pciehp 0000:00:1c.7:pcie04: Presence/Notify input change
> [  174.623784] pciehp 0000:00:1c.7:pcie04: Card present on Slot(7)
> [  174.623798] pciehp 0000:00:1c.7:pcie04: Data Link Layer State change
> [  174.623801] pciehp 0000:00:1c.7:pcie04: pciehp_check_link_active: lnk_status = 7011
> [  174.623802] pciehp 0000:00:1c.7:pcie04: slot(7): Link Up event
> [  174.624040] pciehp 0000:00:1c.7:pcie04: Surprise Removal
> [  174.624075] pciehp 0000:00:1c.7:pcie04: Link Up event ignored on slot(7): already powering on
> [  174.625660] pciehp 0000:00:1c.7:pcie04: Enabling domain:bus:device=0000:11:00
> [  174.625690] pciehp 0000:00:1c.7:pcie04: pciehp_check_link_active: lnk_status = 7011
> [  174.726003] pciehp 0000:00:1c.7:pcie04: pciehp_check_link_status: lnk_status = 7011

These messages are more verbose and possibly more alarming than necessary.
We should clean this up somehow.  Some of it is probably debug stuff that
is no longer relevant.  But I guess you're booting with "pciehp_debug=1",
so maybe the normal messages aren't so verbose.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rajat Jain Nov. 26, 2013, 5:53 a.m. UTC | #2
Hello,

> -----Original Message-----
> From: Bjorn Helgaas [mailto:bhelgaas@google.com]
> Sent: Monday, November 25, 2013 7:13 PM
> To: Martin Mokrejs
> Cc: Rajat Jain; linux-pci@vger.kernel.org; Rajat Jain; Guenter Roeck
> Subject: Re: pciehp LinkState
> 
> On Tue, Nov 26, 2013 at 02:06:03AM +0100, Martin Mokrejs wrote:
> > Hi Rajat and Bjorn,
> >   I tested all the patches, comments below:

Thanks a lot! This was very helpful.

> >
> > Bjorn Helgaas wrote:
> > > On Mon, Nov 25, 2013 at 07:00:01PM +0000, Rajat Jain wrote:
> > >> Hello Martin,
> > >>
> > >>>>
> > >>>> Below is the overview since cold boot with no card inserted with
> all
> > >>> those hotplug events. You see,
> > >>>> pciehp never sets LinkState back to minus once the slot is empty:
> > >>>>
> > >>>> vostro ~ # grep LinkState
> > >>>
> pciehp/3.12/cold_boot_no_card_in_slot/lspci_vvv_initial__inserted__eject
> > >>> ed_but_not_realized__inserted__ejected.txt
> > >>>>                         Changed: MRL- PresDet- LinkState-
> > >>>>                         Changed: MRL- PresDet- LinkState+
> > >>>>                         Changed: MRL- PresDet- LinkState+
> > >>>>                         Changed: MRL- PresDet- LinkState+
> > >>>>                         Changed: MRL- PresDet- LinkState+
> > >>>
> > >>> I opened https://bugzilla.kernel.org/show_bug.cgi?id=65521 for
> this
> > >>> issue.  I agree that pciehp should probably clear this bit.  Rajat
> > >>> recently posted some patches [1] that add support for link state
> > >>> changes.  I haven't look at them in detail yet, but I wouldn't be
> > >>> surprised if they will fix this issue.  If you wanted to test
> them,
> > >>> that would be awesome!
> >
> > In brief, the patch works, note below on every 5th line there is
> LinkState-,
> > so the LinkState is not stuck at plus anymore. Why I never see it plus
> > I do not know, maybe it is cleared too quickly?
> >
> > # grep Changed lspci_vvv_initial* | grep LinkState
> > lspci_vvv_initial.txt:                  Changed: MRL- PresDet-
> LinkState-
> > lspci_vvv_initial.txt:                  Changed: MRL- PresDet-
> LinkState+
> > lspci_vvv_initial.txt:                  Changed: MRL- PresDet-
> LinkState+
> > lspci_vvv_initial.txt:                  Changed: MRL- PresDet-
> LinkState+
> > lspci_vvv_initial.txt:                  Changed: MRL- PresDet-
> LinkState-
> 
> 
> Great, thanks for testing this.  The bit labelled "LinkState" here is
> the
> "Data Link Layer State *Changed*" bit in the Slot Status register.  It
> is
> set whenever the Data Link Layer Link Active bit in the Link Status
> register is changed.  So this LinkState bit doesn't tell you the state
> of
> the link; basically it just tells you that the link changed from "down"
> to
> "up", or from "up" to "down."
> 
> You should not normally see this "LinkStateChanged" bit set via lspci
> because pciehp will get an interrupt when it is set, and the ISR will
> notice that the link state has changed and immediately clear the
> LinkStateChanged bit.

Martin: You mention "LinkState-" on every 5th line of the log, I am assuming that the remaining 4 lines (specially the ones with "LinkState+") are for PCI bridges that are not handled by the pciehp - may be because they do not represent hot-pluggable ports. With this patch, I'll be surprised if you ever saw "LinkState+" in the lspci output of a bridge for hot-pluggable port.

> >
> >
> > I read a bit too quickly your comments in the patch and while am no
> kernel developer I will post
> > silly questions. Maybe you would like to polish the comments in
> patches.
> > 1. You speak about some 'bells and whistles' ... you mean PresDet bit
> handled by my SandyBridge chip
> > is also one of those fancy things like an eject button? The LinkState
> is another example of fancy thing?
> > Why aren't or weren't these supported before some eject buttons you
> speak about?

The PresDet changed bit *IS* supported today - however only if the port supports "Surprise" addition or removal. My patch is aimed at systems that do not even have that in place, and want to only rely on link state changes. 

> 
> I think this is just a consequence of the fact that we typically add
> support for things as we need them, and people haven't needed LinkState
> in
> the past.  Rajat has a somewhat unusual system that needs it, so he's
> now
> adding support for it.

:-)

> 
> > 2. Concerning eject detection, is there any order of precedence of the
> information being collected
> > and interpreted? Like, is PresDet more important than LinkChange or
> some "eject button" of whatever
> > you mean under those 'bells and whistles'.

No, all events are processed eventually, and there is no precedence as such. The only "ordering" (if you may call it so) is the order in which handler functions are called from pcie_isr():

      /* Check Command Complete Interrupt Pending */
        if (intr_loc & PCI_EXP_SLTSTA_CC) {
                ctrl->cmd_busy = 0;
                smp_mb();
                wake_up(&ctrl->queue);
        }

        if (!(intr_loc & ~PCI_EXP_SLTSTA_CC))
                return IRQ_HANDLED;

        /* Check MRL Sensor Changed */
        if (intr_loc & PCI_EXP_SLTSTA_MRLSC)
                pciehp_handle_switch_change(slot);

        /* Check Attention Button Pressed */
        if (intr_loc & PCI_EXP_SLTSTA_ABP)
                pciehp_handle_attention_button(slot);

        /* Check Presence Detect Changed */
        if (intr_loc & PCI_EXP_SLTSTA_PDC)
                pciehp_handle_presence_change(slot);

        /* Check Power Fault Detected */
        if ((intr_loc & PCI_EXP_SLTSTA_PFD) && !ctrl->power_fault_detected) {
                ctrl->power_fault_detected = 1;
                pciehp_handle_power_fault(slot);
        }

        if (intr_loc & PCI_EXP_SLTSTA_DLLSC)
                pciehp_handle_linkstate_change(slot);


> >
> > 3. Related to point 2., should xhci_hcd behave in a different way and
> say let more happily a USB device
> > fall asleep if the LinkState is now being cleared? (Note: to do the
> above tests I had to redo my test
> > of teh patched kernel because I forgot that likely while testing
> vanilla kernel I had attached USB
> > mouse and keyboard to my SandyBridge chip and the TexaInstruments
> chip. While inspecting lspci differences
> > I wondered why in unpatched kernel I had changes between ASPM L0s L1
> vs. Disabled while on patched
> > kernel the insert/eject changes resulted in ASPM L0s vs L1 only
> difference. It seem that was because
> > I did not have attached USB devices to the ports. But, funny is that
> these differences were on all
> > USB-capable PCI devices (hence am speaking about SandyBridge's USB2
> ports, the onboard TexasInstuments
> > USB3 chip while affected should have been only the NEC-based USB3 chip
> presenton the hotpluggable card --
> > or its rootport. So, I somehow suspect your patch changed a bit more
> in behavior or my system.

Ummm, I doubt if my patch could have any effect at all on the ASPM, but I'll let more experts comment. 

> >
> >
> > I do not want to spoil this message thread anymore, try to check this
> ASPM stuff yourself if you care.
> > I am not the right person to test this. ;-)
> >
> > LinkState is being changed and that is enough for my curiosity ATM. ;)
> You can add my Tested-by:
> > if you do not wonder why there is no LinkState+ in my tests. If
> anybody wants tar.bz2 of the collected
> > logs please email me. I have attached only the very last dmesg
> collected covering all the tests.
> >
> > Thank you,
> > Martin
> 
> > [  174.623769] pciehp 0000:00:1c.7:pcie04: pcie_isr: intr_loc 108
> > [  174.623781] pciehp 0000:00:1c.7:pcie04: Presence/Notify input
> change
> > [  174.623784] pciehp 0000:00:1c.7:pcie04: Card present on Slot(7)
> > [  174.623798] pciehp 0000:00:1c.7:pcie04: Data Link Layer State
> change
> > [  174.623801] pciehp 0000:00:1c.7:pcie04: pciehp_check_link_active:
> lnk_status = 7011
> > [  174.623802] pciehp 0000:00:1c.7:pcie04: slot(7): Link Up event
> > [  174.624040] pciehp 0000:00:1c.7:pcie04: Surprise Removal
> > [  174.624075] pciehp 0000:00:1c.7:pcie04: Link Up event ignored on
> slot(7): already powering on
> > [  174.625660] pciehp 0000:00:1c.7:pcie04: Enabling
> domain:bus:device=0000:11:00
> > [  174.625690] pciehp 0000:00:1c.7:pcie04: pciehp_check_link_active:
> lnk_status = 7011
> > [  174.726003] pciehp 0000:00:1c.7:pcie04: pciehp_check_link_status:
> lnk_status = 7011
> 
> These messages are more verbose and possibly more alarming than
> necessary.
> We should clean this up somehow.  Some of it is probably debug stuff
> that
> is no longer relevant.  But I guess you're booting with
> "pciehp_debug=1",
> so maybe the normal messages aren't so verbose.

Bjorn: Actually this threads brings up an good point. Is "DLL State changed" bit in "Slot Status" ALWAYS updated without regard to whether the Link state notifications were enabled or not? Looks like it. If yes, I'll take this as an input to make a 1 line change to my patch:

Currently in my patch, the pciehp_use_link_events=1 is used to only enable link state change notifications, but in case pcie_isr() finds "DLL changed" bit set in "Slot status", it processes the link state change. I'll make the single line change to pcie_isr() to process the event only if pciehp_use_link_events=1. Let me know if this sounds alright.

I'd still wait for other comments before sending out a revised version of the patch though.

Thanks both Bjorn & Martin for paying attention to this patch set!

Rajat

> 
> Bjorn
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Martin Mokrejs Nov. 26, 2013, 8:40 a.m. UTC | #3
Bjorn Helgaas wrote:
> On Tue, Nov 26, 2013 at 02:06:03AM +0100, Martin Mokrejs wrote:

Bjorn, thanks for all you clarifications, they are helpful.

> 
>> [  174.623769] pciehp 0000:00:1c.7:pcie04: pcie_isr: intr_loc 108
>> [  174.623781] pciehp 0000:00:1c.7:pcie04: Presence/Notify input change
>> [  174.623784] pciehp 0000:00:1c.7:pcie04: Card present on Slot(7)
>> [  174.623798] pciehp 0000:00:1c.7:pcie04: Data Link Layer State change
>> [  174.623801] pciehp 0000:00:1c.7:pcie04: pciehp_check_link_active: lnk_status = 7011
>> [  174.623802] pciehp 0000:00:1c.7:pcie04: slot(7): Link Up event
>> [  174.624040] pciehp 0000:00:1c.7:pcie04: Surprise Removal
>> [  174.624075] pciehp 0000:00:1c.7:pcie04: Link Up event ignored on slot(7): already powering on
>> [  174.625660] pciehp 0000:00:1c.7:pcie04: Enabling domain:bus:device=0000:11:00
>> [  174.625690] pciehp 0000:00:1c.7:pcie04: pciehp_check_link_active: lnk_status = 7011
>> [  174.726003] pciehp 0000:00:1c.7:pcie04: pciehp_check_link_status: lnk_status = 7011
> 
> These messages are more verbose and possibly more alarming than necessary.
> We should clean this up somehow.  Some of it is probably debug stuff that
> is no longer relevant.  But I guess you're booting with "pciehp_debug=1",
> so maybe the normal messages aren't so verbose.

Yes, I have debug enabled (kernel config line should be in dmesg somewhere), I am glad debug
output says more, it helps when there is a problem popping up in future. ;-)

Martin
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Martin Mokrejs Nov. 26, 2013, 10:40 a.m. UTC | #4
Hi Rajat,

Rajat Jain wrote:
>>> Bjorn Helgaas wrote:
>>>> On Mon, Nov 25, 2013 at 07:00:01PM +0000, Rajat Jain wrote:
>>>>> Hello Martin,
>>>>>
>>>>>>>
>>>>>>> Below is the overview since cold boot with no card inserted with
>> all
>>>>>> those hotplug events. You see,
>>>>>>> pciehp never sets LinkState back to minus once the slot is empty:
>>>>>>>
>>>>>>> vostro ~ # grep LinkState
>>>>>>
>> pciehp/3.12/cold_boot_no_card_in_slot/lspci_vvv_initial__inserted__eject
>>>>>> ed_but_not_realized__inserted__ejected.txt
>>>>>>>                         Changed: MRL- PresDet- LinkState-
>>>>>>>                         Changed: MRL- PresDet- LinkState+
>>>>>>>                         Changed: MRL- PresDet- LinkState+
>>>>>>>                         Changed: MRL- PresDet- LinkState+
>>>>>>>                         Changed: MRL- PresDet- LinkState+
>>>>>>
>>>>>> I opened https://bugzilla.kernel.org/show_bug.cgi?id=65521 for
>> this
>>>>>> issue.  I agree that pciehp should probably clear this bit.  Rajat
>>>>>> recently posted some patches [1] that add support for link state
>>>>>> changes.  I haven't look at them in detail yet, but I wouldn't be
>>>>>> surprised if they will fix this issue.  If you wanted to test
>> them,
>>>>>> that would be awesome!
>>>
>>> In brief, the patch works, note below on every 5th line there is
>> LinkState-,
>>> so the LinkState is not stuck at plus anymore. Why I never see it plus
>>> I do not know, maybe it is cleared too quickly?
>>>
>>> # grep Changed lspci_vvv_initial* | grep LinkState
>>> lspci_vvv_initial.txt:                  Changed: MRL- PresDet-
>> LinkState-
>>> lspci_vvv_initial.txt:                  Changed: MRL- PresDet-
>> LinkState+
>>> lspci_vvv_initial.txt:                  Changed: MRL- PresDet-
>> LinkState+
>>> lspci_vvv_initial.txt:                  Changed: MRL- PresDet-
>> LinkState+
>>> lspci_vvv_initial.txt:                  Changed: MRL- PresDet-
>> LinkState-
>>
>>
>> Great, thanks for testing this.  The bit labelled "LinkState" here is
>> the
>> "Data Link Layer State *Changed*" bit in the Slot Status register.  It
>> is
>> set whenever the Data Link Layer Link Active bit in the Link Status
>> register is changed.  So this LinkState bit doesn't tell you the state
>> of
>> the link; basically it just tells you that the link changed from "down"
>> to
>> "up", or from "up" to "down."
>>
>> You should not normally see this "LinkStateChanged" bit set via lspci
>> because pciehp will get an interrupt when it is set, and the ISR will
>> notice that the link state has changed and immediately clear the
>> LinkStateChanged bit.
> 
> Martin: You mention "LinkState-" on every 5th line of the log, I am assuming that the remaining 4 lines (specially the ones with "LinkState+") are for PCI bridges that are not handled by the pciehp - may be because they do not represent hot-pluggable ports. With this patch, I'll be surprised if you ever saw "LinkState+" in the lspci output of a bridge for hot-pluggable port.

Note to others, I sent to Bjorn and Rajat the collected data files so I believe
one of you can post an answer on my behalf. ;) Thank you. :-))

> 
>>>
>>>
>>> I read a bit too quickly your comments in the patch and while am no
>> kernel developer I will post
>>> silly questions. Maybe you would like to polish the comments in
>> patches.
>>> 1. You speak about some 'bells and whistles' ... you mean PresDet bit
>> handled by my SandyBridge chip
>>> is also one of those fancy things like an eject button? The LinkState
>> is another example of fancy thing?
>>> Why aren't or weren't these supported before some eject buttons you
>> speak about?
> 
> The PresDet changed bit *IS* supported today - however only if the port supports "Surprise" addition or removal. My patch is aimed at systems that do not even have that in place, and want to only rely on link state changes. 

OK, about 1.5 year ago it seemed my system has broken PresDet. I think it works
fine in general, I have mixed up my opinions on that meanwhile but lets say in
brief that if say PresDet is broken a valid question is whether LinkState
recognition will help to overcome my say broken HW or BIOS. That is what I meant
with the "order of precedence" question. ;) In some scenarios only every second
eject is noticed on my system but that depends on the ExpressCard being removed,
and it always seemed to me related/caused by its power saving state capabilities.
I will get to slowly re-test in more detail pciehp and acpiphp, not sure if the
every even eject was broken under pciehp or acpiphp.
As a final note, this was another reason why I was curious now about the ASPM
differences caused by your patch, especially because it was exactly this NEC-based
USB3 card displaying the issues, unlike two other express cards.


> 
>>
>> I think this is just a consequence of the fact that we typically add
>> support for things as we need them, and people haven't needed LinkState
>> in
>> the past.  Rajat has a somewhat unusual system that needs it, so he's
>> now
>> adding support for it.
> 
> :-)
> 
>>
>>> 2. Concerning eject detection, is there any order of precedence of the
>> information being collected
>>> and interpreted? Like, is PresDet more important than LinkChange or
>> some "eject button" of whatever
>>> you mean under those 'bells and whistles'.
> 
> No, all events are processed eventually, and there is no precedence as such. The only "ordering" (if you may call it so) is the order in which handler functions are called from pcie_isr():
> 
>       /* Check Command Complete Interrupt Pending */
>         if (intr_loc & PCI_EXP_SLTSTA_CC) {
>                 ctrl->cmd_busy = 0;
>                 smp_mb();
>                 wake_up(&ctrl->queue);
>         }
> 
>         if (!(intr_loc & ~PCI_EXP_SLTSTA_CC))
>                 return IRQ_HANDLED;
> 
>         /* Check MRL Sensor Changed */
>         if (intr_loc & PCI_EXP_SLTSTA_MRLSC)
>                 pciehp_handle_switch_change(slot);
> 
>         /* Check Attention Button Pressed */
>         if (intr_loc & PCI_EXP_SLTSTA_ABP)
>                 pciehp_handle_attention_button(slot);
> 
>         /* Check Presence Detect Changed */
>         if (intr_loc & PCI_EXP_SLTSTA_PDC)
>                 pciehp_handle_presence_change(slot);
> 
>         /* Check Power Fault Detected */
>         if ((intr_loc & PCI_EXP_SLTSTA_PFD) && !ctrl->power_fault_detected) {
>                 ctrl->power_fault_detected = 1;
>                 pciehp_handle_power_fault(slot);
>         }
> 
>         if (intr_loc & PCI_EXP_SLTSTA_DLLSC)
>                 pciehp_handle_linkstate_change(slot);


So if PresDet is not set to minus if a card is ejected (say by broken SandyBridge)
but LinkState is flipped to minus, will Linux realize the card is removed?

Could there be a check, a sanity check, verifying that the values are either both
LinkState+ and PresDet+ or both PresDet- and LinkState-? If it is meaningful.
No matter to me if that is about the Changed or Status bits, ideally check both for
sanity. Just a Debug message if they do not correspond to each other.

> 
> 
>>>
>>> 3. Related to point 2., should xhci_hcd behave in a different way and
>> say let more happily a USB device
>>> fall asleep if the LinkState is now being cleared? (Note: to do the
>> above tests I had to redo my test
>>> of teh patched kernel because I forgot that likely while testing
>> vanilla kernel I had attached USB
>>> mouse and keyboard to my SandyBridge chip and the TexaInstruments
>> chip. While inspecting lspci differences
>>> I wondered why in unpatched kernel I had changes between ASPM L0s L1
>> vs. Disabled while on patched
>>> kernel the insert/eject changes resulted in ASPM L0s vs L1 only
>> difference. It seem that was because
>>> I did not have attached USB devices to the ports. But, funny is that
>> these differences were on all
>>> USB-capable PCI devices (hence am speaking about SandyBridge's USB2
>> ports, the onboard TexasInstuments
>>> USB3 chip while affected should have been only the NEC-based USB3 chip
>> presenton the hotpluggable card --
>>> or its rootport. So, I somehow suspect your patch changed a bit more
>> in behavior or my system.
> 
> Ummm, I doubt if my patch could have any effect at all on the ASPM, but I'll let more experts comment.

I fear you will have to check. USB devs always told me PCI/PCIe hotplug is
broken on my system so they could do anything in that regard.


Martin
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas Nov. 26, 2013, 3:50 p.m. UTC | #5
On Mon, Nov 25, 2013 at 10:53 PM, Rajat Jain <rajatjain@juniper.net> wrote:

> Bjorn: Actually this threads brings up an good point. Is "DLL State changed" bit in "Slot Status" ALWAYS updated without regard to whether the Link state notifications were enabled or not?

That's the way I read it.  I assume you're referring to the "DLL State
Changed Enable" bit in Slot Control.  I don't see anything in the spec
about "DLL State Changed" depending on that.

> Looks like it. If yes, I'll take this as an input to make a 1 line change to my patch:
>
> Currently in my patch, the pciehp_use_link_events=1 is used to only enable link state change notifications, but in case pcie_isr() finds "DLL changed" bit set in "Slot status", it processes the link state change. I'll make the single line change to pcie_isr() to process the event only if pciehp_use_link_events=1. Let me know if this sounds alright.

If we can avoid it, I'd prefer not to add the "pciehp_use_link_events"
module parameter.  What would break if we just *always* used link
events?

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rajat Jain Nov. 26, 2013, 10:06 p.m. UTC | #6
> -----Original Message-----
> From: Bjorn Helgaas [mailto:bhelgaas@google.com]
> 
> >
> > Currently in my patch, the pciehp_use_link_events=1 is used to only
> enable link state change notifications, but in case pcie_isr() finds
> "DLL changed" bit set in "Slot status", it processes the link state
> change. I'll make the single line change to pcie_isr() to process the
> event only if pciehp_use_link_events=1. Let me know if this sounds
> alright.
> 
> If we can avoid it, I'd prefer not to add the "pciehp_use_link_events"
> module parameter.  What would break if we just *always* used link
> events?

Not really "break", but I'd think some changes in behavior shall be visible:

1) When a card is pulled out (thus causing the link to go down), it shall always be disabled - whether or not it supports "surprise removal" (in the capabilities). I think this would be the right thing to do in this situation.

2) Even other scenarios where the link may go down - e.g. device reset using an external reset pin, or a firmware initiated reset etc, shall cause the device to be immediately removed and re-added again when the link comes back up). This may also be desired.

3) If the link comes up automatically as soon as a card is plugged in (i.e. without the need to turn on power or do anything else from SW), the card shall be immediately added - even if surprise capability is not present. I'm not sure if this is desired. That's because the current behavior is that for slots that do not support surprise events, the user has to initiate the hot-add using the attention button. Even after that, he gets a 5 second time to cancel the operation in case he wishes. This will no longer be the case, if use of link events is always enabled.

I think it would be fair to say that if use of link events is always supported, it would be roughly equivalent to a slot supporting surprise events. (Surprise events work on "presence detect" pin, and this will work on the link state changes).

Thanks,

Rajat

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas Nov. 26, 2013, 10:26 p.m. UTC | #7
[+cc Alex]

On Tue, Nov 26, 2013 at 3:06 PM, Rajat Jain <rajatjain@juniper.net> wrote:
>> From: Bjorn Helgaas [mailto:bhelgaas@google.com]
>> > Currently in my patch, the pciehp_use_link_events=1 is used to only
>> enable link state change notifications, but in case pcie_isr() finds
>> "DLL changed" bit set in "Slot status", it processes the link state
>> change. I'll make the single line change to pcie_isr() to process the
>> event only if pciehp_use_link_events=1. Let me know if this sounds
>> alright.
>>
>> If we can avoid it, I'd prefer not to add the "pciehp_use_link_events"
>> module parameter.  What would break if we just *always* used link
>> events?
>
> Not really "break", but I'd think some changes in behavior shall be visible:
>
> 1) When a card is pulled out (thus causing the link to go down), it shall always be disabled - whether or not it supports "surprise removal" (in the capabilities). I think this would be the right thing to do in this situation.

Yep, this sounds like it would be OK.

> 2) Even other scenarios where the link may go down - e.g. device reset using an external reset pin, or a firmware initiated reset etc, shall cause the device to be immediately removed and re-added again when the link comes back up). This may also be desired.

This sounds OK, too.  There's already some fancy dancing for similar
scenarios in pciehp_reset_slot(); maybe that would have to be
extended.  Alex might have input here.

> 3) If the link comes up automatically as soon as a card is plugged in (i.e. without the need to turn on power or do anything else from SW), the card shall be immediately added - even if surprise capability is not present. I'm not sure if this is desired. That's because the current behavior is that for slots that do not support surprise events, the user has to initiate the hot-add using the attention button. Even after that, he gets a 5 second time to cancel the operation in case he wishes. This will no longer be the case, if use of link events is always enabled.

This one sounds like a problem.  But I think we could manage this,
e.g., by noticing the link-up interrupt, but waiting for the 5-second
timeout if an attention button is present.

> I think it would be fair to say that if use of link events is always supported, it would be roughly equivalent to a slot supporting surprise events. (Surprise events work on "presence detect" pin, and this will work on the link state changes).
>
> Thanks,
>
> Rajat
>
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alex Williamson Nov. 26, 2013, 11:12 p.m. UTC | #8
On Tue, 2013-11-26 at 15:26 -0700, Bjorn Helgaas wrote:
> [+cc Alex]
> 
> On Tue, Nov 26, 2013 at 3:06 PM, Rajat Jain <rajatjain@juniper.net> wrote:
> >> From: Bjorn Helgaas [mailto:bhelgaas@google.com]
> >> > Currently in my patch, the pciehp_use_link_events=1 is used to only
> >> enable link state change notifications, but in case pcie_isr() finds
> >> "DLL changed" bit set in "Slot status", it processes the link state
> >> change. I'll make the single line change to pcie_isr() to process the
> >> event only if pciehp_use_link_events=1. Let me know if this sounds
> >> alright.
> >>
> >> If we can avoid it, I'd prefer not to add the "pciehp_use_link_events"
> >> module parameter.  What would break if we just *always* used link
> >> events?
> >
> > Not really "break", but I'd think some changes in behavior shall be visible:
> >
> > 1) When a card is pulled out (thus causing the link to go down), it shall always be disabled - whether or not it supports "surprise removal" (in the capabilities). I think this would be the right thing to do in this situation.
> 
> Yep, this sounds like it would be OK.
> 
> > 2) Even other scenarios where the link may go down - e.g. device reset using an external reset pin, or a firmware initiated reset etc, shall cause the device to be immediately removed and re-added again when the link comes back up). This may also be desired.
> 
> This sounds OK, too.  There's already some fancy dancing for similar
> scenarios in pciehp_reset_slot(); maybe that would have to be
> extended.  Alex might have input here.

So long as reset initiated through pci-core doesn't cause a hotplug I
think I'm ok with it.  I saw code was already being added around slot
reset to mask link changes, just like we do with presence detection.
Thanks,

Alex

> > 3) If the link comes up automatically as soon as a card is plugged in (i.e. without the need to turn on power or do anything else from SW), the card shall be immediately added - even if surprise capability is not present. I'm not sure if this is desired. That's because the current behavior is that for slots that do not support surprise events, the user has to initiate the hot-add using the attention button. Even after that, he gets a 5 second time to cancel the operation in case he wishes. This will no longer be the case, if use of link events is always enabled.
> 
> This one sounds like a problem.  But I think we could manage this,
> e.g., by noticing the link-up interrupt, but waiting for the 5-second
> timeout if an attention button is present.
> 
> > I think it would be fair to say that if use of link events is always supported, it would be roughly equivalent to a slot supporting surprise events. (Surprise events work on "presence detect" pin, and this will work on the link state changes).
> >
> > Thanks,
> >
> > Rajat
> >



--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rajat Jain Nov. 27, 2013, 7:02 a.m. UTC | #9
> > >> > Currently in my patch, the pciehp_use_link_events=1 is used to

> > >> > only

> > >> enable link state change notifications, but in case pcie_isr()

> > >> finds "DLL changed" bit set in "Slot status", it processes the link

> > >> state change. I'll make the single line change to pcie_isr() to

> > >> process the event only if pciehp_use_link_events=1. Let me know if

> > >> this sounds alright.

> > >>

> > >> If we can avoid it, I'd prefer not to add the

> "pciehp_use_link_events"

> > >> module parameter.  What would break if we just *always* used link

> > >> events?

> > >

> > > Not really "break", but I'd think some changes in behavior shall be

> visible:

> > >

> > > 1) When a card is pulled out (thus causing the link to go down), it

> shall always be disabled - whether or not it supports "surprise removal"

> (in the capabilities). I think this would be the right thing to do in

> this situation.

> >

> > Yep, this sounds like it would be OK.

> >

> > > 2) Even other scenarios where the link may go down - e.g. device

> reset using an external reset pin, or a firmware initiated reset etc,

> shall cause the device to be immediately removed and re-added again when

> the link comes back up). This may also be desired.

> >

> > This sounds OK, too.  There's already some fancy dancing for similar

> > scenarios in pciehp_reset_slot(); maybe that would have to be

> > extended.  Alex might have input here.

> 

> So long as reset initiated through pci-core doesn't cause a hotplug I

> think I'm ok with it.  I saw code was already being added around slot

> reset to mask link changes, just like we do with presence detection.


Thanks Alex for taking a look at my patch. Yes, I have taken care in the pciehp_reset_slot().

> Thanks,

> 

> Alex

> 

> > > 3) If the link comes up automatically as soon as a card is plugged

> in (i.e. without the need to turn on power or do anything else from SW),

> the card shall be immediately added - even if surprise capability is not

> present. I'm not sure if this is desired. That's because the current

> behavior is that for slots that do not support surprise events, the user

> has to initiate the hot-add using the attention button. Even after that,

> he gets a 5 second time to cancel the operation in case he wishes. This

> will no longer be the case, if use of link events is always enabled.

> >

> > This one sounds like a problem.  But I think we could manage this,

> > e.g., by noticing the link-up interrupt, but waiting for the 5-second

> > timeout if an attention button is present.


I think that one aspect might still not be right. The spec seems to indicate that the hot-plug should be *initiated* by the press of attention button, and after attention button press user has a time of 5 seconds. So I think if we want to remove that module parameter and also retain attention button behavior, our options might be:

* If attention button is present, do not use link state events all together. (i.e. user has to use attention button always to initiate enabling or disabling device. This'd be compliant with current behavior and expectations I guess). In all the remaining cases, the link events can be used just fine.

* If attention button is present, process LINK-DOWN events, but not LINK-UP events. Any devices whose link suddenly goes down, will be disabled (sounds quite right). But when Link comes back up, they'll not be enabled (although this may not sound right, but this complies with the current behavior and the semantics of attention button).

Or else, we can leave the module parameter in place.

BTW, what are the odds of coming across a slot where the link comes up automatically without the need for SW to do anything (despite having an attention button there), but "surprise" events are not supported :-)? Just wondering if we are discussing a realistic scenario. If we don't see this as realistic, then we can safely drop the module parameter IMHO. 

Thanks,

Rajat




> >

> > > I think it would be fair to say that if use of link events is always

> supported, it would be roughly equivalent to a slot supporting surprise

> events. (Surprise events work on "presence detect" pin, and this will

> work on the link state changes).

> > >

> > > Thanks,

> > >

> > > Rajat

> > >

> 

> 

> 

> --

> To unsubscribe from this list: send the line "unsubscribe linux-pci" in

> the body of a message to majordomo@vger.kernel.org More majordomo info

> at  http://vger.kernel.org/majordomo-info.html

>
Bjorn Helgaas Nov. 27, 2013, 6:16 p.m. UTC | #10
On Wed, Nov 27, 2013 at 12:02 AM, Rajat Jain <rajatjain@juniper.net> wrote:
>
>
>> > >> > Currently in my patch, the pciehp_use_link_events=1 is used to
>> > >> > only
>> > >> enable link state change notifications, but in case pcie_isr()
>> > >> finds "DLL changed" bit set in "Slot status", it processes the link
>> > >> state change. I'll make the single line change to pcie_isr() to
>> > >> process the event only if pciehp_use_link_events=1. Let me know if
>> > >> this sounds alright.
>> > >>
>> > >> If we can avoid it, I'd prefer not to add the
>> "pciehp_use_link_events"
>> > >> module parameter.  What would break if we just *always* used link
>> > >> events?
>> > >
>> > > Not really "break", but I'd think some changes in behavior shall be
>> visible:
>> > >
>> > > 1) When a card is pulled out (thus causing the link to go down), it
>> shall always be disabled - whether or not it supports "surprise removal"
>> (in the capabilities). I think this would be the right thing to do in
>> this situation.
>> >
>> > Yep, this sounds like it would be OK.
>> >
>> > > 2) Even other scenarios where the link may go down - e.g. device
>> reset using an external reset pin, or a firmware initiated reset etc,
>> shall cause the device to be immediately removed and re-added again when
>> the link comes back up). This may also be desired.
>> >
>> > This sounds OK, too.  There's already some fancy dancing for similar
>> > scenarios in pciehp_reset_slot(); maybe that would have to be
>> > extended.  Alex might have input here.
>>
>> So long as reset initiated through pci-core doesn't cause a hotplug I
>> think I'm ok with it.  I saw code was already being added around slot
>> reset to mask link changes, just like we do with presence detection.
>
> Thanks Alex for taking a look at my patch. Yes, I have taken care in the pciehp_reset_slot().
>
>> Thanks,
>>
>> Alex
>>
>> > > 3) If the link comes up automatically as soon as a card is plugged
>> in (i.e. without the need to turn on power or do anything else from SW),
>> the card shall be immediately added - even if surprise capability is not
>> present. I'm not sure if this is desired. That's because the current
>> behavior is that for slots that do not support surprise events, the user
>> has to initiate the hot-add using the attention button. Even after that,
>> he gets a 5 second time to cancel the operation in case he wishes. This
>> will no longer be the case, if use of link events is always enabled.
>> >
>> > This one sounds like a problem.  But I think we could manage this,
>> > e.g., by noticing the link-up interrupt, but waiting for the 5-second
>> > timeout if an attention button is present.
>
> I think that one aspect might still not be right. The spec seems to indicate that the hot-plug should be *initiated* by the press of attention button, and after attention button press user has a time of 5 seconds.

Yes, sorry, my response didn't make sense.  BTW, what spec are you
reading that describes the hot-plug flow?  I don't see many details in
the PCIe spec, so I'm looking at the Standard Hot-Plug Controller
Specification, rev 1.0.  I think PCIe hotplug is modeled after that
and keeps the same UI model.  It might be nice if we had a summary in
Documentation/PCI with pointers to the specs.  The model feels a bit
like folklore that everybody is "supposed to know" but I don't know
how everybody is supposed to learn it.

As I read the SHPC spec, the attention button is basically a request
to apply power to the slot:

  - user inserts a card into a powered-off slot,
  - user presses attention button,
  - OS blinks power indicator,
  - after 5 seconds, OS applies power
  - link becomes up,
  - OS enumerates device,
  - OS turns power indicator on

If user presses attention button during 5-second interval while power
indicator is blinking, operation is cancelled and OS turns off power
indicator and leaves slot powered off.

> So I think if we want to remove that module parameter and also retain attention button behavior, our options might be:
>
> * If attention button is present, do not use link state events all together. (i.e. user has to use attention button always to initiate enabling or disabling device. This'd be compliant with current behavior and expectations I guess). In all the remaining cases, the link events can be used just fine.
>
> * If attention button is present, process LINK-DOWN events, but not LINK-UP events. Any devices whose link suddenly goes down, will be disabled (sounds quite right). But when Link comes back up, they'll not be enabled (although this may not sound right, but this complies with the current behavior and the semantics of attention button).
>
> Or else, we can leave the module parameter in place.
>
> BTW, what are the odds of coming across a slot where the link comes up automatically without the need for SW to do anything (despite having an attention button there), but "surprise" events are not supported :-)? Just wondering if we are discussing a realistic scenario. If we don't see this as realistic, then we can safely drop the module parameter IMHO.

If the link comes up, that means power is applied already, and I don't
know what else the OS could or should do before enumerating the
device.  It seems like this is the case regardless of whether there's
an attention button or other hardware.  So my vote is to just drop the
module parameter.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rajat Jain Nov. 28, 2013, 12:58 a.m. UTC | #11
> >>
> >> > > 3) If the link comes up automatically as soon as a card is
> >> > > plugged
> >> in (i.e. without the need to turn on power or do anything else from
> >> SW), the card shall be immediately added - even if surprise
> >> capability is not present. I'm not sure if this is desired. That's
> >> because the current behavior is that for slots that do not support
> >> surprise events, the user has to initiate the hot-add using the
> >> attention button. Even after that, he gets a 5 second time to cancel
> >> the operation in case he wishes. This will no longer be the case, if
> use of link events is always enabled.
> >> >
> >> > This one sounds like a problem.  But I think we could manage this,
> >> > e.g., by noticing the link-up interrupt, but waiting for the
> >> > 5-second timeout if an attention button is present.
> >
> > I think that one aspect might still not be right. The spec seems to
> indicate that the hot-plug should be *initiated* by the press of
> attention button, and after attention button press user has a time of 5
> seconds.
> 
> Yes, sorry, my response didn't make sense.  BTW, what spec are you
> reading that describes the hot-plug flow?  I don't see many details in
> the PCIe spec, so I'm looking at the Standard Hot-Plug Controller
> Specification, rev 1.0.  I think PCIe hotplug is modeled after that and
> keeps the same UI model.  It might be nice if we had a summary in
> Documentation/PCI with pointers to the specs.  The model feels a bit
> like folklore that everybody is "supposed to know" but I don't know how
> everybody is supposed to learn it.

Ha Ha. Actually right. My knowledge of "spec" is also based mostly on folklore, and reading between the lines of PCIe and SHPC specs.

> 
> As I read the SHPC spec, the attention button is basically a request to
> apply power to the slot:
> 
>   - user inserts a card into a powered-off slot,
>   - user presses attention button,
>   - OS blinks power indicator,
>   - after 5 seconds, OS applies power
>   - link becomes up,
>   - OS enumerates device,
>   - OS turns power indicator on
> 
> If user presses attention button during 5-second interval while power
> indicator is blinking, operation is cancelled and OS turns off power
> indicator and leaves slot powered off.
> 
> > So I think if we want to remove that module parameter and also retain
> attention button behavior, our options might be:
> >
> > * If attention button is present, do not use link state events all
> together. (i.e. user has to use attention button always to initiate
> enabling or disabling device. This'd be compliant with current behavior
> and expectations I guess). In all the remaining cases, the link events
> can be used just fine.
> >
> > * If attention button is present, process LINK-DOWN events, but not
> LINK-UP events. Any devices whose link suddenly goes down, will be
> disabled (sounds quite right). But when Link comes back up, they'll not
> be enabled (although this may not sound right, but this complies with
> the current behavior and the semantics of attention button).
> >
> > Or else, we can leave the module parameter in place.
> >
> > BTW, what are the odds of coming across a slot where the link comes up
> automatically without the need for SW to do anything (despite having an
> attention button there), but "surprise" events are not supported :-)?
> Just wondering if we are discussing a realistic scenario. If we don't
> see this as realistic, then we can safely drop the module parameter
> IMHO.
> 
> If the link comes up, that means power is applied already, and I don't
> know what else the OS could or should do before enumerating the device.
> It seems like this is the case regardless of whether there's an
> attention button or other hardware.  So my vote is to just drop the
> module parameter.

I concur. I'd drop the module parameter. 

Thanks,

Rajat

> 
> Bjorn
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org More majordomo info
> at  http://vger.kernel.org/majordomo-info.html
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
index 541bbe6d5343..fc322ed29c56 100644
--- a/drivers/pci/hotplug/pciehp.h
+++ b/drivers/pci/hotplug/pciehp.h
@@ -154,6 +154,7 @@  void pciehp_green_led_on(struct slot *slot);
 void pciehp_green_led_off(struct slot *slot);
 void pciehp_green_led_blink(struct slot *slot);
 int pciehp_check_link_status(struct controller *ctrl);
+bool pciehp_check_link_active(struct controller *ctrl);
 void pciehp_release_ctrl(struct controller *ctrl);
 int pciehp_reset_slot(struct slot *slot, int probe);
 
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 51f56ef4ab6f..3a5eee781bd6 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -241,7 +241,7 @@  static int pcie_write_cmd(struct controller *ctrl, u16 cmd, u16 mask)
 	return retval;
 }
 
-static bool check_link_active(struct controller *ctrl)
+bool pciehp_check_link_active(struct controller *ctrl)
 {
 	bool ret = false;
 	u16 lnk_status;
@@ -261,12 +261,12 @@  static void __pcie_wait_link_active(struct controller *ctrl, bool active)
 {
 	int timeout = 1000;
 
-	if (check_link_active(ctrl) == active)
+	if (pciehp_check_link_active(ctrl) == active)
 		return;
 	while (timeout > 0) {
 		msleep(10);
 		timeout -= 10;
-		if (check_link_active(ctrl) == active)
+		if (pciehp_check_link_active(ctrl) == active)
 			return;
 	}
 	ctrl_dbg(ctrl, "Data Link Layer Link Active not %s in 1000 msec\n",
-------------------------------------------------------------------------------
pciehp-use-link-change
-------------------------------------------------------------------------------
pciehp: Use link change notifications for hot-plug and, removal

From: Rajat Jain <rajatjain.linux@gmail.com>

A lot of systems do not have the fancy buttons and LEDs, and instead
want to rely only on the Link state change events to drive the hotplug
and removal state machinery.
(http://www.spinics.net/lists/hotplug/msg05802.html)

This patch adds support for that functionality. Here are the details
about the patch itself:

* Add "pciehp_use_link_events" module parameter to indicate link state
  changes to be used for hot-plug.

* Define and use interrupt events for linkup / linkdown.

* Introduce the functions to handle link-up and link-down events and
  queue the work in the slot->wq to be processed by pciehp_power_thread

* Many ports use inband mechanism such as receiver detection to find out
  whether an adapter is present. In such HP ports, when link goes down,
  adapter presence will also toggle. Thus don't bail out the removal
  process initiated by link down, if adapter is not found to be present.

* The current pciehp driver disabled the link in case of a hot-removal.
  Since for link change based hot-plug to work, we need that enabled,
  hence make sure to not disable the link permanently if link state
  based hot-plug is to be used.

* pciehp_reset_slot - reset of secondary bus may cause undesirable
  spurious link notifications. Thus disable these around the secondary
  bus reset.

Signed-off-by: Rajat Jain <rajatjain@juniper.net>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/hotplug/pciehp.h      |    4 +
 drivers/pci/hotplug/pciehp_core.c |    3 +
 drivers/pci/hotplug/pciehp_ctrl.c |  132 +++++++++++++++++++++++++++++++++++++
 drivers/pci/hotplug/pciehp_hpc.c  |   56 +++++++++++-----
 4 files changed, 179 insertions(+), 16 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
index fc322ed29c56..64a6840c32a4 100644
--- a/drivers/pci/hotplug/pciehp.h
+++ b/drivers/pci/hotplug/pciehp.h
@@ -44,6 +44,7 @@  extern bool pciehp_poll_mode;
 extern int pciehp_poll_time;
 extern bool pciehp_debug;
 extern bool pciehp_force;
+extern bool pciehp_use_link_events;      /* Use link state events for hotplug */
 
 #define dbg(format, arg...)						\
 do {									\
@@ -110,6 +111,8 @@  struct controller {
 #define INT_BUTTON_PRESS		7
 #define INT_BUTTON_RELEASE		8
 #define INT_BUTTON_CANCEL		9
+#define INT_LINK_UP			10
+#define INT_LINK_DOWN			11
 
 #define STATIC_STATE			0
 #define BLINKINGON_STATE		1
@@ -133,6 +136,7 @@  u8 pciehp_handle_attention_button(struct slot *p_slot);
 u8 pciehp_handle_switch_change(struct slot *p_slot);
 u8 pciehp_handle_presence_change(struct slot *p_slot);
 u8 pciehp_handle_power_fault(struct slot *p_slot);
+u8 pciehp_handle_linkstate_change(struct slot *p_slot);
 int pciehp_configure_device(struct slot *p_slot);
 int pciehp_unconfigure_device(struct slot *p_slot);
 void pciehp_queue_pushbutton_work(struct work_struct *work);
diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index f4a18f51a29c..5a2c43cb0ce8 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -42,6 +42,7 @@  bool pciehp_debug;
 bool pciehp_poll_mode;
 int pciehp_poll_time;
 bool pciehp_force;
+bool pciehp_use_link_events;
 
 #define DRIVER_VERSION	"0.4"
 #define DRIVER_AUTHOR	"Dan Zink <dan.zink@compaq.com>, Greg Kroah-Hartman <greg@kroah.com>, Dely Sy <dely.l.sy@intel.com>"
@@ -55,10 +56,12 @@  module_param(pciehp_debug, bool, 0644);
 module_param(pciehp_poll_mode, bool, 0644);
 module_param(pciehp_poll_time, int, 0644);
 module_param(pciehp_force, bool, 0644);
+module_param(pciehp_use_link_events, bool, 0644);
 MODULE_PARM_DESC(pciehp_debug, "Debugging mode enabled or not");
 MODULE_PARM_DESC(pciehp_poll_mode, "Using polling mechanism for hot-plug events or not");
 MODULE_PARM_DESC(pciehp_poll_time, "Polling mechanism frequency, in seconds");
 MODULE_PARM_DESC(pciehp_force, "Force pciehp, even if OSHP is missing");
+MODULE_PARM_DESC(pciehp_use_link_events, "Use link state change events to drive pciehp hot-plug");
 
 #define PCIE_MODULE_NAME "pciehp"
 
diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
index 38f018679175..3899b526f3b5 100644
--- a/drivers/pci/hotplug/pciehp_ctrl.c
+++ b/drivers/pci/hotplug/pciehp_ctrl.c
@@ -150,6 +150,29 @@  u8 pciehp_handle_power_fault(struct slot *p_slot)
 	return 1;
 }
 
+u8 pciehp_handle_linkstate_change(struct slot *p_slot)
+{
+	u32 event_type;
+	struct controller *ctrl = p_slot->ctrl;
+
+	/* Link Status Change */
+	ctrl_dbg(ctrl, "Data Link Layer State change\n");
+
+	if (pciehp_check_link_active(ctrl)) {
+		ctrl_info(ctrl, "slot(%s): Link Up event\n",
+			  slot_name(p_slot));
+		event_type = INT_LINK_UP;
+	} else {
+		ctrl_info(ctrl, "slot(%s): Link Down event\n",
+			  slot_name(p_slot));
+		event_type = INT_LINK_DOWN;
+	}
+
+	queue_interrupt_event(p_slot, event_type);
+
+	return 1;
+}
+
 /* The following routines constitute the bulk of the
    hotplug controller logic
  */
@@ -442,6 +465,100 @@  static void handle_surprise_event(struct slot *p_slot)
 	queue_work(p_slot->wq, &info->work);
 }
 
+/*
+ * Note: This function must be called with slot->lock held
+ */
+static void handle_link_up_event(struct slot *p_slot)
+{
+	struct controller *ctrl = p_slot->ctrl;
+	struct power_work_info *info;
+
+	info = kmalloc(sizeof(*info), GFP_KERNEL);
+	if (!info) {
+		ctrl_err(p_slot->ctrl, "%s: Cannot allocate memory\n",
+			 __func__);
+		return;
+	}
+	info->p_slot = p_slot;
+	INIT_WORK(&info->work, pciehp_power_thread);
+
+	switch (p_slot->state) {
+	case BLINKINGON_STATE:
+	case BLINKINGOFF_STATE:
+		cancel_delayed_work(&p_slot->work);
+		/* Fall through */
+	case STATIC_STATE:
+		p_slot->state = POWERON_STATE;
+		queue_work(p_slot->wq, &info->work);
+		break;
+	case POWERON_STATE:
+		ctrl_info(ctrl,
+			  "Link Up event ignored on slot(%s): already powering on\n",
+			  slot_name(p_slot));
+		kfree(info);
+		break;
+	case POWEROFF_STATE:
+		ctrl_info(ctrl,
+			  "Link Up event queued on slot(%s): currently getting powered off\n",
+			  slot_name(p_slot));
+		p_slot->state = POWERON_STATE;
+		queue_work(p_slot->wq, &info->work);
+		break;
+	default:
+		ctrl_err(ctrl, "Not a valid state on slot(%s)\n",
+			 slot_name(p_slot));
+		kfree(info);
+		break;
+	}
+}
+
+/*
+ * Note: This function must be called with slot->lock held
+ */
+static void handle_link_down_event(struct slot *p_slot)
+{
+	struct controller *ctrl = p_slot->ctrl;
+	struct power_work_info *info;
+
+	info = kmalloc(sizeof(*info), GFP_KERNEL);
+	if (!info) {
+		ctrl_err(p_slot->ctrl, "%s: Cannot allocate memory\n",
+			 __func__);
+		return;
+	}
+	info->p_slot = p_slot;
+	INIT_WORK(&info->work, pciehp_power_thread);
+
+	switch (p_slot->state) {
+	case BLINKINGON_STATE:
+	case BLINKINGOFF_STATE:
+		cancel_delayed_work(&p_slot->work);
+		/* Fall through */
+	case STATIC_STATE:
+		p_slot->state = POWEROFF_STATE;
+		queue_work(p_slot->wq, &info->work);
+		break;
+	case POWEROFF_STATE:
+		ctrl_info(ctrl,
+			  "Link Down event ignored on slot(%s): already powering off\n",
+			  slot_name(p_slot));
+		kfree(info);
+		break;
+	case POWERON_STATE:
+		ctrl_info(ctrl,
+			  "Link Down event queued on slot(%s): currently getting powered on\n",
+			  slot_name(p_slot));
+		p_slot->state = POWEROFF_STATE;
+		queue_work(p_slot->wq, &info->work);
+		break;
+	default:
+		ctrl_err(ctrl, "Not a valid state on slot %s\n",
+			 slot_name(p_slot));
+		kfree(info);
+		break;
+	}
+}
+
 static void interrupt_event_handler(struct work_struct *work)
 {
 	struct event_info *info = container_of(work, struct event_info, work);
@@ -468,6 +585,12 @@  static void interrupt_event_handler(struct work_struct *work)
 		ctrl_dbg(ctrl, "Surprise Removal\n");
 		handle_surprise_event(p_slot);
 		break;
+	case INT_LINK_UP:
+		handle_link_up_event(p_slot);
+		break;
+	case INT_LINK_DOWN:
+		handle_link_down_event(p_slot);
+		break;
 	default:
 		break;
 	}
@@ -524,7 +647,14 @@  int pciehp_disable_slot(struct slot *p_slot)
 	if (!p_slot->ctrl)
 		return 1;
 
-	if (!HP_SUPR_RM(p_slot->ctrl)) {
+	/*
+	 * In certain scenarios, adapter may have already disappeared by this
+	 * time. E.g (1) Surprise Removal (2) Many ports use in-band
+	 * mechanisms to signal if adapter is present or not. Thus when a
+	 * link goes down, the device may have already gone away by the time
+	 * this code executes.
+	 */
+	if (!HP_SUPR_RM(p_slot->ctrl) && !pciehp_use_link_events) {
 		ret = pciehp_get_adapter_status(p_slot, &getstatus);
 		if (ret || !getstatus) {
 			ctrl_info(ctrl, "No adapter on slot(%s)\n",
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 3a5eee781bd6..90eb8c681596 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -620,13 +620,21 @@  int pciehp_power_off_slot(struct slot * slot)
 	u16 cmd_mask;
 	int retval;
 
-	/* Disable the link at first */
-	pciehp_link_disable(ctrl);
-	/* wait the link is down */
-	if (ctrl->link_active_reporting)
-		pcie_wait_link_not_active(ctrl);
-	else
-		msleep(1000);
+	if (!pciehp_use_link_events) {
+		/*
+		 * In case it is desired to use the link state events for
+		 * hot-plug, make sure to NOT disable the link permanently, or
+		 * else we might never get a link-up hotplug event again.
+		 */
+
+		/* Disable the link first */
+		pciehp_link_disable(ctrl);
+		/* wait until the link is down */
+		if (ctrl->link_active_reporting)
+			pcie_wait_link_not_active(ctrl);
+		else
+			msleep(1000);
+	}
 
 	slot_cmd = POWER_OFF;
 	cmd_mask = PCI_EXP_SLTCTL_PCC;
@@ -661,7 +669,7 @@  static irqreturn_t pcie_isr(int irq, void *dev_id)
 
 		detected &= (PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD |
 			     PCI_EXP_SLTSTA_MRLSC | PCI_EXP_SLTSTA_PDC |
-			     PCI_EXP_SLTSTA_CC);
+			     PCI_EXP_SLTSTA_CC | PCI_EXP_SLTSTA_DLLSC);
 		detected &= ~intr_loc;
 		intr_loc |= detected;
 		if (!intr_loc)
@@ -702,6 +710,10 @@  static irqreturn_t pcie_isr(int irq, void *dev_id)
 		ctrl->power_fault_detected = 1;
 		pciehp_handle_power_fault(slot);
 	}
+
+	if (intr_loc & PCI_EXP_SLTSTA_DLLSC)
+		pciehp_handle_linkstate_change(slot);
+
 	return IRQ_HANDLED;
 }
 
@@ -724,12 +736,15 @@  int pcie_enable_notification(struct controller *ctrl)
 		cmd |= PCI_EXP_SLTCTL_ABPE;
 	if (MRL_SENS(ctrl))
 		cmd |= PCI_EXP_SLTCTL_MRLSCE;
+	if (pciehp_use_link_events)
+		cmd |= PCI_EXP_SLTCTL_DLLSCE;
 	if (!pciehp_poll_mode)
 		cmd |= PCI_EXP_SLTCTL_HPIE | PCI_EXP_SLTCTL_CCIE;
 
 	mask = (PCI_EXP_SLTCTL_PDCE | PCI_EXP_SLTCTL_ABPE |
 		PCI_EXP_SLTCTL_MRLSCE | PCI_EXP_SLTCTL_PFDE |
-		PCI_EXP_SLTCTL_HPIE | PCI_EXP_SLTCTL_CCIE);
+		PCI_EXP_SLTCTL_HPIE | PCI_EXP_SLTCTL_CCIE |
+		PCI_EXP_SLTCTL_DLLSCE);
 
 	if (pcie_write_cmd(ctrl, cmd, mask)) {
 		ctrl_err(ctrl, "Cannot enable software notification\n");
@@ -751,28 +766,39 @@  static void pcie_disable_notification(struct controller *ctrl)
 
 /*
  * pciehp has a 1:1 bus:slot relationship so we ultimately want a secondary
- * bus reset of the bridge, but if the slot supports surprise removal we need
- * to disable presence detection around the bus reset and clear any spurious
+ * bus reset of the bridge, but if the slot supports surprise removal (or
+ * link state change based hotplug), we need to disable presence detection
+ * (or link state notifications) around the bus reset and clear any spurious
  * events after.
  */
 int pciehp_reset_slot(struct slot *slot, int probe)
 {
 	struct controller *ctrl = slot->ctrl;
+	u16 stat_mask = 0, ctrl_mask = 0;
 
 	if (probe)
 		return 0;
 
 	if (HP_SUPR_RM(ctrl)) {
-		pcie_write_cmd(ctrl, 0, PCI_EXP_SLTCTL_PDCE);
+		ctrl_mask |= PCI_EXP_SLTCTL_PDCE;
+		stat_mask |= PCI_EXP_SLTSTA_PDC;
+	}
+	if (pciehp_use_link_events) {
+		ctrl_mask |= PCI_EXP_SLTCTL_DLLSCE;
+		stat_mask |= PCI_EXP_SLTSTA_DLLSC;
+	}
+
+	if (ctrl_mask) {
+		pcie_write_cmd(ctrl, 0, ctrl_mask);
 		if (pciehp_poll_mode)
 			del_timer_sync(&ctrl->poll_timer);
 	}
 
 	pci_reset_bridge_secondary_bus(ctrl->pcie->port);
 
-	if (HP_SUPR_RM(ctrl)) {
-		pciehp_writew(ctrl, PCI_EXP_SLTSTA, PCI_EXP_SLTSTA_PDC);
-		pcie_write_cmd(ctrl, PCI_EXP_SLTCTL_PDCE, PCI_EXP_SLTCTL_PDCE);
+	if (ctrl_mask) {
+		pciehp_writew(ctrl, PCI_EXP_SLTSTA, stat_mask);
+		pcie_write_cmd(ctrl, ctrl_mask, ctrl_mask);
 		if (pciehp_poll_mode)
 			int_poll_timeout(ctrl->poll_timer.data);
 	}
-------------------------------------------------------------------------------
pciehp-ensure-very-fast
-------------------------------------------------------------------------------
pciehp: Ensure very fast hotplug events are also processed.

From: Rajat Jain <rajatjain.linux@gmail.com>

Today, this is how all the hotplug and unplug events work:

Hotplug / Removal needs to be done
  => Set slot->state (protected by slot->lock) to either
    POWERON_STATE (for enabling) or POWEROFF_STATE (for disabling).
  => Submit the work item for pciehp_power_thread() to slot->wq.

Problem:
  There is a problem if the hotplug events can happen fast enough that
  they do not give SW enough time to add or remove the new devices.

  => Assume: Event for unplug comes (e.g. surprise removal). But
     before the pciehp_power_thread() work item was executed, the
     card was replaced by another card, causing surprise hotplug event.

  => What goes wrong:
    => The hot-removal event sets slot->state to POWEROFF_STATE, and
       schedules the pciehp_power_thread().
    => The hot-add event sets slot->state to POWERON_STATE, and
       schedules the pciehp_power_thread().
    => Now the pciehp_power_thread() is scheduled twice, and on both
       occasions it will find POWERON_STATE and will try to add the
       devices on the slot, and will fail complaining that the devices
       already exist.

  => Why this is a problem: If the device was replaced between the hot
     removal and hot-add, then we should unload the old driver and
     reload the new one. This does not happen today. The kernel or the
     driver is not even aware that the device was replaced.

     The problem is that the pciehp_power_thread() only looks at the
     slot->state which would only contain the *latest* state - not
     the actual event (add / remove) that was the intent of the IRQ
     handler who submitted the work.

What this patch does:

  => Hotplug events pass on an actual request (for addition or removal)
     to pciehp_power_thread() which is local to that work item
     submission.

  => pciehp_power_thread() does not need to look at slote->state and
     hence no locks needed in that.

  => Essentially this results in all the hotplug and unplug events
     "replayed" by pciehp_power_thread().

Signed-off-by: Rajat Jain <rajatjain@juniper.net>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/hotplug/pciehp_ctrl.c |   32 +++++++++++++++++++++++---------
 1 file changed, 23 insertions(+), 9 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
index 3899b526f3b5..83dbe0867c36 100644
--- a/drivers/pci/hotplug/pciehp_ctrl.c
+++ b/drivers/pci/hotplug/pciehp_ctrl.c
@@ -299,6 +299,9 @@  static int remove_board(struct slot *p_slot)
 struct power_work_info {
 	struct slot *p_slot;
 	struct work_struct work;
+	unsigned int req;
+#define DISABLE_REQ 0
+#define ENABLE_REQ  1
 };
 
 /**
@@ -314,10 +317,8 @@  static void pciehp_power_thread(struct work_struct *work)
 		container_of(work, struct power_work_info, work);
 	struct slot *p_slot = info->p_slot;
 
-	mutex_lock(&p_slot->lock);
-	switch (p_slot->state) {
-	case POWEROFF_STATE:
-		mutex_unlock(&p_slot->lock);
+	switch (info->req) {
+	case DISABLE_REQ:
 		ctrl_dbg(p_slot->ctrl,
 			 "Disabling domain:bus:device=%04x:%02x:00\n",
 			 pci_domain_nr(p_slot->ctrl->pcie->port->subordinate),
@@ -325,18 +326,22 @@  static void pciehp_power_thread(struct work_struct *work)
 		pciehp_disable_slot(p_slot);
 		mutex_lock(&p_slot->lock);
 		p_slot->state = STATIC_STATE;
-		break;
-	case POWERON_STATE:
 		mutex_unlock(&p_slot->lock);
+		break;
+	case ENABLE_REQ:
+		ctrl_dbg(p_slot->ctrl,
+			 "Enabling domain:bus:device=%04x:%02x:00\n",
+			 pci_domain_nr(p_slot->ctrl->pcie->port->subordinate),
+			 p_slot->ctrl->pcie->port->subordinate->number);
 		if (pciehp_enable_slot(p_slot) && PWR_LED(p_slot->ctrl))
 			pciehp_green_led_off(p_slot);
 		mutex_lock(&p_slot->lock);
 		p_slot->state = STATIC_STATE;
+		mutex_unlock(&p_slot->lock);
 		break;
 	default:
 		break;
 	}
-	mutex_unlock(&p_slot->lock);
 
 	kfree(info);
 }
@@ -359,9 +364,11 @@  void pciehp_queue_pushbutton_work(struct work_struct *work)
 	switch (p_slot->state) {
 	case BLINKINGOFF_STATE:
 		p_slot->state = POWEROFF_STATE;
+		info->req = DISABLE_REQ;
 		break;
 	case BLINKINGON_STATE:
 		p_slot->state = POWERON_STATE;
+		info->req = ENABLE_REQ;
 		break;
 	default:
 		kfree(info);
@@ -457,10 +464,13 @@  static void handle_surprise_event(struct slot *p_slot)
 	INIT_WORK(&info->work, pciehp_power_thread);
 
 	pciehp_get_adapter_status(p_slot, &getstatus);
-	if (!getstatus)
+	if (!getstatus) {
 		p_slot->state = POWEROFF_STATE;
-	else
+		info->req = DISABLE_REQ;
+	} else {
 		p_slot->state = POWERON_STATE;
+		info->req = ENABLE_REQ;
+	}
 
 	queue_work(p_slot->wq, &info->work);
 }
@@ -489,6 +499,7 @@  static void handle_link_up_event(struct slot *p_slot)
 		/* Fall through */
 	case STATIC_STATE:
 		p_slot->state = POWERON_STATE;
+		info->req = ENABLE_REQ;
 		queue_work(p_slot->wq, &info->work);
 		break;
 	case POWERON_STATE:
@@ -502,6 +513,7 @@  static void handle_link_up_event(struct slot *p_slot)
 			  "Link Up event queued on slot(%s): currently getting powered off\n",
 			  slot_name(p_slot));
 		p_slot->state = POWERON_STATE;
+		info->req = ENABLE_REQ;
 		queue_work(p_slot->wq, &info->work);
 		break;
 	default:
@@ -536,6 +548,7 @@  static void handle_link_down_event(struct slot *p_slot)
 		/* Fall through */
 	case STATIC_STATE:
 		p_slot->state = POWEROFF_STATE;
+		info->req = DISABLE_REQ;
 		queue_work(p_slot->wq, &info->work);
 		break;
 	case POWEROFF_STATE:
@@ -549,6 +562,7 @@  static void handle_link_down_event(struct slot *p_slot)
 			  "Link Down event queued on slot(%s): currently getting powered on\n",
 			  slot_name(p_slot));
 		p_slot->state = POWEROFF_STATE;
+		info->req = DISABLE_REQ;
 		queue_work(p_slot->wq, &info->work);
 		break;
 	default:
-------------------------------------------------------------------------------
pciehp-introduce-hotplug_lock
-------------------------------------------------------------------------------
pciehp: Introduce hotplug_lock to serialize HP events

From: Rajat Jain <rajatjain.linux@gmail.com>

Today it is there is no protection around pciehp_enable_slot() and
pciehp_disable_slot() to ensure that they complete before another
hot-plug operation can be done on that particular slot.

This patch introduces the slot->hotplug_lock to ensure that any
hotplug operations (add / remove) complete before another HP event
can begin processing on that particular slot.

Signed-off-by: Rajat Jain <rajatjain@juniper.net>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/hotplug/pciehp.h      |    1 +
 drivers/pci/hotplug/pciehp_core.c |    7 ++++++-
 drivers/pci/hotplug/pciehp_ctrl.c |   17 +++++++++++++++--
 drivers/pci/hotplug/pciehp_hpc.c  |    1 +
 4 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
index 64a6840c32a4..fdf2038b53c8 100644
--- a/drivers/pci/hotplug/pciehp.h
+++ b/drivers/pci/hotplug/pciehp.h
@@ -78,6 +78,7 @@  struct slot {
 	struct hotplug_slot *hotplug_slot;
 	struct delayed_work work;	/* work for button event */
 	struct mutex lock;
+	struct mutex hotplug_lock;
 	struct workqueue_struct *wq;
 };
 
diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index 5a2c43cb0ce8..2175f7ab6f7e 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -281,8 +281,11 @@  static int pciehp_probe(struct pcie_device *dev)
 	slot = ctrl->slot;
 	pciehp_get_adapter_status(slot, &occupied);
 	pciehp_get_power_status(slot, &poweron);
-	if (occupied && pciehp_force)
+	if (occupied && pciehp_force) {
+		mutex_lock(&slot->hotplug_lock);
 		pciehp_enable_slot(slot);
+		mutex_unlock(&slot->hotplug_lock);
+	}
 	/* If empty slot's power status is on, turn power off */
 	if (!occupied && poweron && POWER_CTRL(ctrl))
 		pciehp_power_off_slot(slot);
@@ -326,10 +329,12 @@  static int pciehp_resume (struct pcie_device *dev)
 
 	/* Check if slot is occupied */
 	pciehp_get_adapter_status(slot, &status);
+	mutex_lock(&slot->hotplug_lock);
 	if (status)
 		pciehp_enable_slot(slot);
 	else
 		pciehp_disable_slot(slot);
+	mutex_unlock(&slot->hotplug_lock);
 	return 0;
 }
 #endif /* PM */
diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
index 83dbe0867c36..8ad63a44b279 100644
--- a/drivers/pci/hotplug/pciehp_ctrl.c
+++ b/drivers/pci/hotplug/pciehp_ctrl.c
@@ -316,6 +316,7 @@  static void pciehp_power_thread(struct work_struct *work)
 	struct power_work_info *info =
 		container_of(work, struct power_work_info, work);
 	struct slot *p_slot = info->p_slot;
+	int ret;
 
 	switch (info->req) {
 	case DISABLE_REQ:
@@ -323,7 +324,9 @@  static void pciehp_power_thread(struct work_struct *work)
 			 "Disabling domain:bus:device=%04x:%02x:00\n",
 			 pci_domain_nr(p_slot->ctrl->pcie->port->subordinate),
 			 p_slot->ctrl->pcie->port->subordinate->number);
+		mutex_lock(&p_slot->hotplug_lock);
 		pciehp_disable_slot(p_slot);
+		mutex_unlock(&p_slot->hotplug_lock);
 		mutex_lock(&p_slot->lock);
 		p_slot->state = STATIC_STATE;
 		mutex_unlock(&p_slot->lock);
@@ -333,7 +336,10 @@  static void pciehp_power_thread(struct work_struct *work)
 			 "Enabling domain:bus:device=%04x:%02x:00\n",
 			 pci_domain_nr(p_slot->ctrl->pcie->port->subordinate),
 			 p_slot->ctrl->pcie->port->subordinate->number);
-		if (pciehp_enable_slot(p_slot) && PWR_LED(p_slot->ctrl))
+		mutex_lock(&p_slot->hotplug_lock);
+		ret = pciehp_enable_slot(p_slot);
+		mutex_unlock(&p_slot->hotplug_lock);
+		if (ret && PWR_LED(p_slot->ctrl))
 			pciehp_green_led_off(p_slot);
 		mutex_lock(&p_slot->lock);
 		p_slot->state = STATIC_STATE;
@@ -613,6 +619,9 @@  static void interrupt_event_handler(struct work_struct *work)
 	kfree(info);
 }
 
+/*
+ * Note: This function must be called with slot->hotplug_lock held
+ */
 int pciehp_enable_slot(struct slot *p_slot)
 {
 	u8 getstatus = 0;
@@ -651,7 +660,9 @@  int pciehp_enable_slot(struct slot *p_slot)
 	return rc;
 }
 
-
+/*
+ * Note: This function must be called with slot->hotplug_lock held
+ */
 int pciehp_disable_slot(struct slot *p_slot)
 {
 	u8 getstatus = 0;
@@ -710,7 +721,9 @@  int pciehp_sysfs_enable_slot(struct slot *p_slot)
 	case STATIC_STATE:
 		p_slot->state = POWERON_STATE;
 		mutex_unlock(&p_slot->lock);
+		mutex_lock(&p_slot->hotplug_lock);
 		retval = pciehp_enable_slot(p_slot);
+		mutex_unlock(&p_slot->hotplug_lock);
 		mutex_lock(&p_slot->lock);
 		p_slot->state = STATIC_STATE;
 		break;
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 90eb8c681596..740d883f1fd7 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -841,6 +841,7 @@  static int pcie_init_slot(struct controller *ctrl)
 
 	slot->ctrl = ctrl;
 	mutex_init(&slot->lock);
+	mutex_init(&slot->hotplug_lock);
 	INIT_DELAYED_WORK(&slot->work, pciehp_queue_pushbutton_work);
 	ctrl->slot = slot;
 	return 0;