diff mbox

[v1] PCI: pciehp: Fix presence detect change interrupt handling

Message ID 92EBB4272BF81E4089A7126EC1E7B284665970B5@IRSMSX101.ger.corp.intel.com
State Changes Requested
Headers show

Commit Message

Patel, Mayurkumar Aug. 17, 2016, 1:42 p.m. UTC
Currently, if very fast hotplug removal and insertion event comes
as following

[  608.823412] pciehp 0000:00:1c.1:pcie04: Card not present on Slot(1)
[  608.835249] pciehp 0000:00:1c.1:pcie04: Card present on Slot(1)

In this case following scenario happens,

While removal:
pcie_isr() -> pciehp_queue_interrupt_event() -> triggers queue_work().
work invokes interrupt_event_handler() -> case INT_PRESENCE_OFF
and calls handle_surprise_event().

handle_surprise_event() again calls pciehp_get_adapter_status()
and reads slot status which might have been changed
already due to PCI_EXP_SLTSTA_PDC event for hotplug insertion
has happened. So it queues, ENABLE_REQ for both removal
and insertion interrupt based on latest slot status.

In this case, PCIe device can not be hot-add again because
it was never removed due to which device can not get enabled.

handle_surprise_event() can be removed and pciehp_queue_power_work()
can be directly triggered based on INT_PRESENCE_ON and INT_PRESENCE_OFF
from the switch case exist in interrupt_event_hanlder().

The patch ensures the pciehp_queue_power_work() processes
presence detect change for removal and insertion correctly.

Signed-off-by: Mayurkumar Patel <mayurkumar.patel@intel.com>
---
 Resending the patch addressing to PCI Maintainer Bjorn Helgaas.

 drivers/pci/hotplug/pciehp_ctrl.c |   18 ++----------------
 1 file changed, 2 insertions(+), 16 deletions(-)

--
1.7.9.5



Mayurkumar Patel
Intel Deutschland GmbH
Registered Address: Am Campeon 10-12, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de
Managing Directors: Christin Eisenschmid, Christian Lamprechter Chairperson of the Supervisory Board: Nicole Lau Registered Office: Munich Commercial Register: Amtsgericht Muenchen HRB 186928




Intel Deutschland GmbH
Registered Address: Am Campeon 10-12, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de
Managing Directors: Christin Eisenschmid, Christian Lamprechter
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Bjorn Helgaas Aug. 17, 2016, 5:12 p.m. UTC | #1
Hi Mayurkumar,

On Wed, Aug 17, 2016 at 01:42:18PM +0000, Patel, Mayurkumar wrote:
> Currently, if very fast hotplug removal and insertion event comes
> as following
> 
> [  608.823412] pciehp 0000:00:1c.1:pcie04: Card not present on Slot(1)
> [  608.835249] pciehp 0000:00:1c.1:pcie04: Card present on Slot(1)
> 
> In this case following scenario happens,
> 
> While removal:
> pcie_isr() -> pciehp_queue_interrupt_event() -> triggers queue_work().
> work invokes interrupt_event_handler() -> case INT_PRESENCE_OFF
> and calls handle_surprise_event().
> 
> handle_surprise_event() again calls pciehp_get_adapter_status()
> and reads slot status which might have been changed
> already due to PCI_EXP_SLTSTA_PDC event for hotplug insertion
> has happened. So it queues, ENABLE_REQ for both removal
> and insertion interrupt based on latest slot status.
> 
> In this case, PCIe device can not be hot-add again because
> it was never removed due to which device can not get enabled.
> 
> handle_surprise_event() can be removed and pciehp_queue_power_work()
> can be directly triggered based on INT_PRESENCE_ON and INT_PRESENCE_OFF
> from the switch case exist in interrupt_event_hanlder().
> 
> The patch ensures the pciehp_queue_power_work() processes
> presence detect change for removal and insertion correctly.
> 
> Signed-off-by: Mayurkumar Patel <mayurkumar.patel@intel.com>
> ---
>  Resending the patch addressing to PCI Maintainer Bjorn Helgaas.
> 
>  drivers/pci/hotplug/pciehp_ctrl.c |   18 ++----------------
>  1 file changed, 2 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
> index 880978b..87c5bea 100644
> --- a/drivers/pci/hotplug/pciehp_ctrl.c
> +++ b/drivers/pci/hotplug/pciehp_ctrl.c
> @@ -301,20 +301,6 @@ static void handle_button_press_event(struct slot *p_slot)
>  /*
>   * Note: This function must be called with slot->lock held
>   */
> -static void handle_surprise_event(struct slot *p_slot)
> -{
> -	u8 getstatus;
> -
> -	pciehp_get_adapter_status(p_slot, &getstatus);
> -	if (!getstatus)
> -		pciehp_queue_power_work(p_slot, DISABLE_REQ);
> -	else
> -		pciehp_queue_power_work(p_slot, ENABLE_REQ);
> -}
> -
> -/*
> - * Note: This function must be called with slot->lock held
> - */
>  static void handle_link_event(struct slot *p_slot, u32 event)
>  {
>  	struct controller *ctrl = p_slot->ctrl;
> @@ -377,14 +363,14 @@ static void interrupt_event_handler(struct work_struct *work)
>  		pciehp_green_led_off(p_slot);
>  		break;
>  	case INT_PRESENCE_ON:
> -		handle_surprise_event(p_slot);
> +		pciehp_queue_power_work(p_slot, ENABLE_REQ);
>  		break;
>  	case INT_PRESENCE_OFF:
>  		/*
>  		 * Regardless of surprise capability, we need to
>  		 * definitely remove a card that has been pulled out!
>  		 */
> -		handle_surprise_event(p_slot);
> +		pciehp_queue_power_work(p_slot, DISABLE_REQ);
>  		break;
>  	case INT_LINK_UP:
>  	case INT_LINK_DOWN:

Thanks a lot for this.  I think other people have seen the same issue.

Even with this fix, don't we have essentially the same problem one
layer back?  The first thing pcie_isr() does is read PCI_EXP_SLTSTA,
then few lines down, we call pciehp_get_adapter_status(), which reads
PCI_EXP_SLTSTA *again*.  So I think the window is smaller but still
there.

I think what we really should do is read the status registers
(PCI_EXP_SLTSTA and probably also PCI_EXP_LNKSTA) *once* in
pcie_isr(), before we write PCI_EXP_SLTSTA to clear the RW1C bits
there, and then queue up events based on those values, without
re-reading the registers.

What do you think?

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rajat Jain Aug. 17, 2016, 5:54 p.m. UTC | #2
On Wed, Aug 17, 2016 at 10:12 AM, Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> Hi Mayurkumar,
>
> On Wed, Aug 17, 2016 at 01:42:18PM +0000, Patel, Mayurkumar wrote:
> > Currently, if very fast hotplug removal and insertion event comes
> > as following
> >
> > [  608.823412] pciehp 0000:00:1c.1:pcie04: Card not present on Slot(1)
> > [  608.835249] pciehp 0000:00:1c.1:pcie04: Card present on Slot(1)
> >
> > In this case following scenario happens,
> >
> > While removal:
> > pcie_isr() -> pciehp_queue_interrupt_event() -> triggers queue_work().
> > work invokes interrupt_event_handler() -> case INT_PRESENCE_OFF
> > and calls handle_surprise_event().
> >
> > handle_surprise_event() again calls pciehp_get_adapter_status()
> > and reads slot status which might have been changed
> > already due to PCI_EXP_SLTSTA_PDC event for hotplug insertion
> > has happened. So it queues, ENABLE_REQ for both removal
> > and insertion interrupt based on latest slot status.
> >
> > In this case, PCIe device can not be hot-add again because
> > it was never removed due to which device can not get enabled.
> >
> > handle_surprise_event() can be removed and pciehp_queue_power_work()
> > can be directly triggered based on INT_PRESENCE_ON and INT_PRESENCE_OFF
> > from the switch case exist in interrupt_event_hanlder().
> >
> > The patch ensures the pciehp_queue_power_work() processes
> > presence detect change for removal and insertion correctly.
> >
> > Signed-off-by: Mayurkumar Patel <mayurkumar.patel@intel.com>

Acked-by: Rajat Jain <rajatxjain@gmail.com>

>
> > ---
> >  Resending the patch addressing to PCI Maintainer Bjorn Helgaas.
> >
> >  drivers/pci/hotplug/pciehp_ctrl.c |   18 ++----------------
> >  1 file changed, 2 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
> > index 880978b..87c5bea 100644
> > --- a/drivers/pci/hotplug/pciehp_ctrl.c
> > +++ b/drivers/pci/hotplug/pciehp_ctrl.c
> > @@ -301,20 +301,6 @@ static void handle_button_press_event(struct slot *p_slot)
> >  /*
> >   * Note: This function must be called with slot->lock held
> >   */
> > -static void handle_surprise_event(struct slot *p_slot)
> > -{
> > -     u8 getstatus;
> > -
> > -     pciehp_get_adapter_status(p_slot, &getstatus);
> > -     if (!getstatus)
> > -             pciehp_queue_power_work(p_slot, DISABLE_REQ);
> > -     else
> > -             pciehp_queue_power_work(p_slot, ENABLE_REQ);
> > -}
> > -
> > -/*
> > - * Note: This function must be called with slot->lock held
> > - */
> >  static void handle_link_event(struct slot *p_slot, u32 event)
> >  {
> >       struct controller *ctrl = p_slot->ctrl;
> > @@ -377,14 +363,14 @@ static void interrupt_event_handler(struct work_struct *work)
> >               pciehp_green_led_off(p_slot);
> >               break;
> >       case INT_PRESENCE_ON:
> > -             handle_surprise_event(p_slot);
> > +             pciehp_queue_power_work(p_slot, ENABLE_REQ);
> >               break;
> >       case INT_PRESENCE_OFF:
> >               /*
> >                * Regardless of surprise capability, we need to
> >                * definitely remove a card that has been pulled out!
> >                */
> > -             handle_surprise_event(p_slot);
> > +             pciehp_queue_power_work(p_slot, DISABLE_REQ);
> >               break;
> >       case INT_LINK_UP:
> >       case INT_LINK_DOWN:
>
> Thanks a lot for this.  I think other people have seen the same issue.
>
> Even with this fix, don't we have essentially the same problem one
> layer back?  The first thing pcie_isr() does is read PCI_EXP_SLTSTA,
> then few lines down, we call pciehp_get_adapter_status(), which reads
> PCI_EXP_SLTSTA *again*.  So I think the window is smaller but still
> there.
>
> I think what we really should do is read the status registers
> (PCI_EXP_SLTSTA and probably also PCI_EXP_LNKSTA) *once* in
> pcie_isr(), before we write PCI_EXP_SLTSTA to clear the RW1C bits
> there, and then queue up events based on those values, without
> re-reading the registers.
>
> What do you think?


Yes, I agree. We need to do something about that *in addition * to the
above patch to cover the
whole story. However I think there still will be a room for some
interrupt misses because we are
collecting the interrupts in intr_loc, and theoretically we could be
in a situation where in the pcie_isr, the

do {
    ...
} while(detected)

loop gets a removal->insertion->removal all while in the same
invocation of pcie_isr().
If this happens, the intr_loc will have recorded a single insertion
and a single removal, and
the final result will depend on the order in which we decide to
process the events in intr_loc.

Or, may be we can make the calls to pciehp_queue_interrupt_event()
before clearing the
RW1C in the slot status register (in the loop)?

Thanks,

rajat

>
>
> Bjorn
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas Aug. 17, 2016, 6:14 p.m. UTC | #3
Hi Rajat, thanks for chiming in!

On Wed, Aug 17, 2016 at 10:54:12AM -0700, Rajat Jain wrote:
> On Wed, Aug 17, 2016 at 10:12 AM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > Hi Mayurkumar,
> >
> > On Wed, Aug 17, 2016 at 01:42:18PM +0000, Patel, Mayurkumar wrote:
> > > Currently, if very fast hotplug removal and insertion event comes
> > > as following
> > >
> > > [  608.823412] pciehp 0000:00:1c.1:pcie04: Card not present on Slot(1)
> > > [  608.835249] pciehp 0000:00:1c.1:pcie04: Card present on Slot(1)
> > >
> > > In this case following scenario happens,
> > >
> > > While removal:
> > > pcie_isr() -> pciehp_queue_interrupt_event() -> triggers queue_work().
> > > work invokes interrupt_event_handler() -> case INT_PRESENCE_OFF
> > > and calls handle_surprise_event().
> > >
> > > handle_surprise_event() again calls pciehp_get_adapter_status()
> > > and reads slot status which might have been changed
> > > already due to PCI_EXP_SLTSTA_PDC event for hotplug insertion
> > > has happened. So it queues, ENABLE_REQ for both removal
> > > and insertion interrupt based on latest slot status.
> > >
> > > In this case, PCIe device can not be hot-add again because
> > > it was never removed due to which device can not get enabled.
> > >
> > > handle_surprise_event() can be removed and pciehp_queue_power_work()
> > > can be directly triggered based on INT_PRESENCE_ON and INT_PRESENCE_OFF
> > > from the switch case exist in interrupt_event_hanlder().
> > >
> > > The patch ensures the pciehp_queue_power_work() processes
> > > presence detect change for removal and insertion correctly.
> > >
> > > Signed-off-by: Mayurkumar Patel <mayurkumar.patel@intel.com>
> 
> Acked-by: Rajat Jain <rajatxjain@gmail.com>
> 
> >
> > > ---
> > >  Resending the patch addressing to PCI Maintainer Bjorn Helgaas.
> > >
> > >  drivers/pci/hotplug/pciehp_ctrl.c |   18 ++----------------
> > >  1 file changed, 2 insertions(+), 16 deletions(-)
> > >
> > > diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
> > > index 880978b..87c5bea 100644
> > > --- a/drivers/pci/hotplug/pciehp_ctrl.c
> > > +++ b/drivers/pci/hotplug/pciehp_ctrl.c
> > > @@ -301,20 +301,6 @@ static void handle_button_press_event(struct slot *p_slot)
> > >  /*
> > >   * Note: This function must be called with slot->lock held
> > >   */
> > > -static void handle_surprise_event(struct slot *p_slot)
> > > -{
> > > -     u8 getstatus;
> > > -
> > > -     pciehp_get_adapter_status(p_slot, &getstatus);
> > > -     if (!getstatus)
> > > -             pciehp_queue_power_work(p_slot, DISABLE_REQ);
> > > -     else
> > > -             pciehp_queue_power_work(p_slot, ENABLE_REQ);
> > > -}
> > > -
> > > -/*
> > > - * Note: This function must be called with slot->lock held
> > > - */
> > >  static void handle_link_event(struct slot *p_slot, u32 event)
> > >  {
> > >       struct controller *ctrl = p_slot->ctrl;
> > > @@ -377,14 +363,14 @@ static void interrupt_event_handler(struct work_struct *work)
> > >               pciehp_green_led_off(p_slot);
> > >               break;
> > >       case INT_PRESENCE_ON:
> > > -             handle_surprise_event(p_slot);
> > > +             pciehp_queue_power_work(p_slot, ENABLE_REQ);
> > >               break;
> > >       case INT_PRESENCE_OFF:
> > >               /*
> > >                * Regardless of surprise capability, we need to
> > >                * definitely remove a card that has been pulled out!
> > >                */
> > > -             handle_surprise_event(p_slot);
> > > +             pciehp_queue_power_work(p_slot, DISABLE_REQ);
> > >               break;
> > >       case INT_LINK_UP:
> > >       case INT_LINK_DOWN:
> >
> > Thanks a lot for this.  I think other people have seen the same issue.
> >
> > Even with this fix, don't we have essentially the same problem one
> > layer back?  The first thing pcie_isr() does is read PCI_EXP_SLTSTA,
> > then few lines down, we call pciehp_get_adapter_status(), which reads
> > PCI_EXP_SLTSTA *again*.  So I think the window is smaller but still
> > there.
> >
> > I think what we really should do is read the status registers
> > (PCI_EXP_SLTSTA and probably also PCI_EXP_LNKSTA) *once* in
> > pcie_isr(), before we write PCI_EXP_SLTSTA to clear the RW1C bits
> > there, and then queue up events based on those values, without
> > re-reading the registers.
> >
> > What do you think?
> 
> 
> Yes, I agree. We need to do something about that *in addition * to the
> above patch to cover the
> whole story. However I think there still will be a room for some
> interrupt misses because we are
> collecting the interrupts in intr_loc, and theoretically we could be
> in a situation where in the pcie_isr, the
> 
> do {
>     ...
> } while(detected)
> 
> loop gets a removal->insertion->removal all while in the same
> invocation of pcie_isr().
> If this happens, the intr_loc will have recorded a single insertion
> and a single removal, and
> the final result will depend on the order in which we decide to
> process the events in intr_loc.

I don't quite understand how that "do { .. } while (detected)" loop
works or why it's done that way.  Collecting interrupt status bits in
an ISR is obviously a very common task; it seems like there should be
a standard, idiomatic way of doing it, but I don't know it.

> Or, may be we can make the calls to pciehp_queue_interrupt_event()
> before clearing the
> RW1C in the slot status register (in the loop)?

Yeah, it seems like we should read PCI_EXP_SLTSTA once, queue up any
events related to it, then clear the relevant SLTSTA bits.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patel, Mayurkumar Aug. 17, 2016, 10:37 p.m. UTC | #4
Hi Bjorn and Rajat
Thanks for replying.
 
> Hi Rajat, thanks for chiming in!
> 
> On Wed, Aug 17, 2016 at 10:54:12AM -0700, Rajat Jain wrote:
> > On Wed, Aug 17, 2016 at 10:12 AM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> > >
> > > Hi Mayurkumar,
> > >
> > > On Wed, Aug 17, 2016 at 01:42:18PM +0000, Patel, Mayurkumar wrote:
> > > > Currently, if very fast hotplug removal and insertion event comes
> > > > as following
> > > >
> > > > [  608.823412] pciehp 0000:00:1c.1:pcie04: Card not present on Slot(1)
> > > > [  608.835249] pciehp 0000:00:1c.1:pcie04: Card present on Slot(1)
> > > >
> > > > In this case following scenario happens,
> > > >
> > > > While removal:
> > > > pcie_isr() -> pciehp_queue_interrupt_event() -> triggers queue_work().
> > > > work invokes interrupt_event_handler() -> case INT_PRESENCE_OFF
> > > > and calls handle_surprise_event().
> > > >
> > > > handle_surprise_event() again calls pciehp_get_adapter_status()
> > > > and reads slot status which might have been changed
> > > > already due to PCI_EXP_SLTSTA_PDC event for hotplug insertion
> > > > has happened. So it queues, ENABLE_REQ for both removal
> > > > and insertion interrupt based on latest slot status.
> > > >
> > > > In this case, PCIe device can not be hot-add again because
> > > > it was never removed due to which device can not get enabled.
> > > >
> > > > handle_surprise_event() can be removed and pciehp_queue_power_work()
> > > > can be directly triggered based on INT_PRESENCE_ON and INT_PRESENCE_OFF
> > > > from the switch case exist in interrupt_event_hanlder().
> > > >
> > > > The patch ensures the pciehp_queue_power_work() processes
> > > > presence detect change for removal and insertion correctly.
> > > >
> > > > Signed-off-by: Mayurkumar Patel <mayurkumar.patel@intel.com>
> >
> > Acked-by: Rajat Jain <rajatxjain@gmail.com>
> >
> > >
> > > > ---
> > > >  Resending the patch addressing to PCI Maintainer Bjorn Helgaas.
> > > >
> > > >  drivers/pci/hotplug/pciehp_ctrl.c |   18 ++----------------
> > > >  1 file changed, 2 insertions(+), 16 deletions(-)
> > > >
> > > > diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
> > > > index 880978b..87c5bea 100644
> > > > --- a/drivers/pci/hotplug/pciehp_ctrl.c
> > > > +++ b/drivers/pci/hotplug/pciehp_ctrl.c
> > > > @@ -301,20 +301,6 @@ static void handle_button_press_event(struct slot *p_slot)
> > > >  /*
> > > >   * Note: This function must be called with slot->lock held
> > > >   */
> > > > -static void handle_surprise_event(struct slot *p_slot)
> > > > -{
> > > > -     u8 getstatus;
> > > > -
> > > > -     pciehp_get_adapter_status(p_slot, &getstatus);
> > > > -     if (!getstatus)
> > > > -             pciehp_queue_power_work(p_slot, DISABLE_REQ);
> > > > -     else
> > > > -             pciehp_queue_power_work(p_slot, ENABLE_REQ);
> > > > -}
> > > > -
> > > > -/*
> > > > - * Note: This function must be called with slot->lock held
> > > > - */
> > > >  static void handle_link_event(struct slot *p_slot, u32 event)
> > > >  {
> > > >       struct controller *ctrl = p_slot->ctrl;
> > > > @@ -377,14 +363,14 @@ static void interrupt_event_handler(struct work_struct *work)
> > > >               pciehp_green_led_off(p_slot);
> > > >               break;
> > > >       case INT_PRESENCE_ON:
> > > > -             handle_surprise_event(p_slot);
> > > > +             pciehp_queue_power_work(p_slot, ENABLE_REQ);
> > > >               break;
> > > >       case INT_PRESENCE_OFF:
> > > >               /*
> > > >                * Regardless of surprise capability, we need to
> > > >                * definitely remove a card that has been pulled out!
> > > >                */
> > > > -             handle_surprise_event(p_slot);
> > > > +             pciehp_queue_power_work(p_slot, DISABLE_REQ);
> > > >               break;
> > > >       case INT_LINK_UP:
> > > >       case INT_LINK_DOWN:
> > >
> > > Thanks a lot for this.  I think other people have seen the same issue.
> > >
> > > Even with this fix, don't we have essentially the same problem one
> > > layer back?  The first thing pcie_isr() does is read PCI_EXP_SLTSTA,
> > > then few lines down, we call pciehp_get_adapter_status(), which reads
> > > PCI_EXP_SLTSTA *again*.  So I think the window is smaller but still
> > > there.
> > >
> > > I think what we really should do is read the status registers
> > > (PCI_EXP_SLTSTA and probably also PCI_EXP_LNKSTA) *once* in
> > > pcie_isr(), before we write PCI_EXP_SLTSTA to clear the RW1C bits
> > > there, and then queue up events based on those values, without
> > > re-reading the registers.
> > >
> > > What do you think?
> >
> >
> > Yes, I agree. 

Yes indeed that should be done too.

> > We need to do something about that *in addition * to the
> > above patch to cover the
> > whole story. However I think there still will be a room for some
> > interrupt misses because we are
> > collecting the interrupts in intr_loc, and theoretically we could be
> > in a situation where in the pcie_isr, the
> >
> > do {
> >     ...
> > } while(detected)
> >
> > loop gets a removal->insertion->removal all while in the same
> > invocation of pcie_isr().
> > If this happens, the intr_loc will have recorded a single insertion
> > and a single removal, and
> > the final result will depend on the order in which we decide to
> > process the events in intr_loc.
> 
> I don't quite understand how that "do { .. } while (detected)" loop
> works or why it's done that way.  Collecting interrupt status bits in
> an ISR is obviously a very common task; it seems like there should be
> a standard, idiomatic way of doing it, but I don't know it.
> 
> > Or, may be we can make the calls to pciehp_queue_interrupt_event()
> > before clearing the
> > RW1C in the slot status register (in the loop)?
> 
> Yeah, it seems like we should read PCI_EXP_SLTSTA once, queue up any
> events related to it, then clear the relevant SLTSTA bits.
> 

Do you mean to remove the do {...} while loop and just
read PCI_EXP_SLTSTA once in ISR , queue the work and clear interrupts?


> Bjorn
Intel Deutschland GmbH
Registered Address: Am Campeon 10-12, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de
Managing Directors: Christin Eisenschmid, Christian Lamprechter
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas Aug. 18, 2016, 12:52 p.m. UTC | #5
On Wed, Aug 17, 2016 at 10:37:13PM +0000, Patel, Mayurkumar wrote:
> > On Wed, Aug 17, 2016 at 10:54:12AM -0700, Rajat Jain wrote:

> > > We need to do something about that *in addition * to the
> > > above patch to cover the
> > > whole story. However I think there still will be a room for some
> > > interrupt misses because we are
> > > collecting the interrupts in intr_loc, and theoretically we could be
> > > in a situation where in the pcie_isr, the
> > >
> > > do {
> > >     ...
> > > } while(detected)
> > >
> > > loop gets a removal->insertion->removal all while in the same
> > > invocation of pcie_isr().
> > > If this happens, the intr_loc will have recorded a single insertion
> > > and a single removal, and
> > > the final result will depend on the order in which we decide to
> > > process the events in intr_loc.
> > 
> > I don't quite understand how that "do { .. } while (detected)" loop
> > works or why it's done that way.  Collecting interrupt status bits in
> > an ISR is obviously a very common task; it seems like there should be
> > a standard, idiomatic way of doing it, but I don't know it.
> > 
> > > Or, may be we can make the calls to pciehp_queue_interrupt_event()
> > > before clearing the
> > > RW1C in the slot status register (in the loop)?
> > 
> > Yeah, it seems like we should read PCI_EXP_SLTSTA once, queue up any
> > events related to it, then clear the relevant SLTSTA bits.
> > 
> 
> Do you mean to remove the do {...} while loop and just
> read PCI_EXP_SLTSTA once in ISR , queue the work and clear interrupts?

I don't know if removing the loop is the right thing or not.  We need
to understand why the loop is there in the first place and make sure
removing it wouldn't break things.

But I do think that in the resulting code, the connection between

  (1) the events we learn from the interrupt and
  (2) the queued work items

needs to be crystal clear.  Right now it's a bit muddy because of
things like the case you fixed: a work item that goes back and looks
at PCI_EXP_SLTSTA after it's been cleared and the hardware may have
already set bits for new events.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patel, Mayurkumar Aug. 18, 2016, 8:59 p.m. UTC | #6
> On Wed, Aug 17, 2016 at 10:37:13PM +0000, Patel, Mayurkumar wrote:
> > > On Wed, Aug 17, 2016 at 10:54:12AM -0700, Rajat Jain wrote:
> 
> > > > We need to do something about that *in addition * to the
> > > > above patch to cover the
> > > > whole story. However I think there still will be a room for some
> > > > interrupt misses because we are
> > > > collecting the interrupts in intr_loc, and theoretically we could be
> > > > in a situation where in the pcie_isr, the
> > > >
> > > > do {
> > > >     ...
> > > > } while(detected)
> > > >
> > > > loop gets a removal->insertion->removal all while in the same
> > > > invocation of pcie_isr().
> > > > If this happens, the intr_loc will have recorded a single insertion
> > > > and a single removal, and
> > > > the final result will depend on the order in which we decide to
> > > > process the events in intr_loc.
> > >
> > > I don't quite understand how that "do { .. } while (detected)" loop
> > > works or why it's done that way.  Collecting interrupt status bits in
> > > an ISR is obviously a very common task; it seems like there should be
> > > a standard, idiomatic way of doing it, but I don't know it.
> > >
> > > > Or, may be we can make the calls to pciehp_queue_interrupt_event()
> > > > before clearing the
> > > > RW1C in the slot status register (in the loop)?
> > >
> > > Yeah, it seems like we should read PCI_EXP_SLTSTA once, queue up any
> > > events related to it, then clear the relevant SLTSTA bits.
> > >
> >
> > Do you mean to remove the do {...} while loop and just
> > read PCI_EXP_SLTSTA once in ISR , queue the work and clear interrupts?
> 
> I don't know if removing the loop is the right thing or not.  We need
> to understand why the loop is there in the first place and make sure
> removing it wouldn't break things.
> 


The pcie base spec. 3.0 says from chapter 6.7.3.1. Slot Events:

	A Downstream Port with hot-plug capabilities monitors the slot it controls for the slot events listed
	above. When one of these slot events is detected, the Port indicates that the event has occurred by
	setting the status field associated with the event. At that point, the event is pending until software
	clears the status field.
	Once a slot event is pending on a particular slot, all subsequent events of that type are ignored on
	that slot until the event is cleared. The Port must continue to monitor the slot for all other slot
	event types and report them as they occur.


So does it mean that the port would continue providing MSI if there has been
any other events occurred apart from the event which is not cleared? If that
is the case then it's not sure why the loop is still needed.

Also with having a loop, pcie_isr() is reading the PCI_EXP_SLTSTA again and taking action just based on
latest event happens. That may mean removal->insertion happens fast enough then
removal could be overwritten by insertion and pciehp_get_adapter_status ()
will return insertion only and we may miss the removal event to be processed for the loop.

If we don't clear the Slot event quick enough then according to spec. statement above
that event could get ignored and SW may never get notified.



> But I do think that in the resulting code, the connection between
> 
>   (1) the events we learn from the interrupt and
>   (2) the queued work items
> 
> needs to be crystal clear.  Right now it's a bit muddy because of
> things like the case you fixed: a work item that goes back and looks
> at PCI_EXP_SLTSTA after it's been cleared and the hardware may have
> already set bits for new events.
> 

I have made a prototype patch in my follow up reply and tested ok on my existing setup 
on which I caught the previous issue although I cant say it will work on any HW.
Clearing events after queuing events gave some problems and did not work properly on my
existing HW where I test very fast insertion and removal events, in which
case only "removal" event comes and "insertion" does not occur even if HW gets powered.

> Bjorn
Intel Deutschland GmbH
Registered Address: Am Campeon 10-12, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de
Managing Directors: Christin Eisenschmid, Christian Lamprechter
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rajat Jain Aug. 23, 2016, 11:47 p.m. UTC | #7
On Thu, Aug 18, 2016 at 1:59 PM, Patel, Mayurkumar
<mayurkumar.patel@intel.com> wrote:
>
>
>
> > On Wed, Aug 17, 2016 at 10:37:13PM +0000, Patel, Mayurkumar wrote:
> > > > On Wed, Aug 17, 2016 at 10:54:12AM -0700, Rajat Jain wrote:
> >
> > > > > We need to do something about that *in addition * to the
> > > > > above patch to cover the
> > > > > whole story. However I think there still will be a room for some
> > > > > interrupt misses because we are
> > > > > collecting the interrupts in intr_loc, and theoretically we could be
> > > > > in a situation where in the pcie_isr, the
> > > > >
> > > > > do {
> > > > >     ...
> > > > > } while(detected)
> > > > >
> > > > > loop gets a removal->insertion->removal all while in the same
> > > > > invocation of pcie_isr().
> > > > > If this happens, the intr_loc will have recorded a single insertion
> > > > > and a single removal, and
> > > > > the final result will depend on the order in which we decide to
> > > > > process the events in intr_loc.
> > > >
> > > > I don't quite understand how that "do { .. } while (detected)" loop
> > > > works or why it's done that way.  Collecting interrupt status bits in
> > > > an ISR is obviously a very common task; it seems like there should be
> > > > a standard, idiomatic way of doing it, but I don't know it.
> > > >
> > > > > Or, may be we can make the calls to pciehp_queue_interrupt_event()
> > > > > before clearing the
> > > > > RW1C in the slot status register (in the loop)?
> > > >
> > > > Yeah, it seems like we should read PCI_EXP_SLTSTA once, queue up any
> > > > events related to it, then clear the relevant SLTSTA bits.
> > > >
> > >
> > > Do you mean to remove the do {...} while loop and just
> > > read PCI_EXP_SLTSTA once in ISR , queue the work and clear interrupts?
> >
> > I don't know if removing the loop is the right thing or not.  We need
> > to understand why the loop is there in the first place and make sure
> > removing it wouldn't break things.
> >
>
>
> The pcie base spec. 3.0 says from chapter 6.7.3.1. Slot Events:
>
>         A Downstream Port with hot-plug capabilities monitors the slot it controls for the slot events listed
>         above. When one of these slot events is detected, the Port indicates that the event has occurred by
>         setting the status field associated with the event. At that point, the event is pending until software
>         clears the status field.
>         Once a slot event is pending on a particular slot, all subsequent events of that type are ignored on
>         that slot until the event is cleared. The Port must continue to monitor the slot for all other slot
>         event types and report them as they occur.
>
>
> So does it mean that the port would continue providing MSI if there has been
> any other events occurred apart from the event which is not cleared? If that
> is the case then it's not sure why the loop is still needed.

I'd think that the loop is needed because we don't want to handle just
one event on one interrupt. We want to handle as many events as we can
(to keep the interrupt latency overhead low), that have happened
during the ISR invocation. And hence the loop.

>
> Also with having a loop, pcie_isr() is reading the PCI_EXP_SLTSTA again and taking action just based on
> latest event happens. That may mean removal->insertion happens fast enough then
> removal could be overwritten by insertion and pciehp_get_adapter_status ()
> will return insertion only and we may miss the removal event to be processed for the loop.

Current. This is the problem that Bjorn highlighted above. I think the
current code works well when events of different kinds have happened
(due to the loop).

In case of similar kind of events (insertion / removal), I think there
are 2 problems: (1) we use the transient values of slot status which
may not be valid at that time. (2) because we ack the event before we
queue it up, it leaves chance for error. I think we'd need more
careful examination in order to fix these issues.

>
> If we don't clear the Slot event quick enough then according to spec. statement above
> that event could get ignored and SW may never get notified.
>
>
>
> > But I do think that in the resulting code, the connection between
> >
> >   (1) the events we learn from the interrupt and
> >   (2) the queued work items
> >
> > needs to be crystal clear.  Right now it's a bit muddy because of
> > things like the case you fixed: a work item that goes back and looks
> > at PCI_EXP_SLTSTA after it's been cleared and the hardware may have
> > already set bits for new events.
> >
>
> I have made a prototype patch in my follow up reply and tested ok on my existing setup
> on which I caught the previous issue although I cant say it will work on any HW.
> Clearing events after queuing events gave some problems and did not work properly on my
> existing HW where I test very fast insertion and removal events, in which
> case only "removal" event comes and "insertion" does not occur even if HW gets powered.
>
> > Bjorn
> Intel Deutschland GmbH
> Registered Address: Am Campeon 10-12, 85579 Neubiberg, Germany
> Tel: +49 89 99 8853-0, www.intel.de
> Managing Directors: Christin Eisenschmid, Christian Lamprechter
> Chairperson of the Supervisory Board: Nicole Lau
> Registered Office: Munich
> Commercial Register: Amtsgericht Muenchen HRB 186928
>
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patel, Mayurkumar Aug. 24, 2016, 9 a.m. UTC | #8
> >

> > > On Wed, Aug 17, 2016 at 10:37:13PM +0000, Patel, Mayurkumar wrote:

> > > > > On Wed, Aug 17, 2016 at 10:54:12AM -0700, Rajat Jain wrote:

> > >

> > > > > > We need to do something about that *in addition * to the

> > > > > > above patch to cover the

> > > > > > whole story. However I think there still will be a room for some

> > > > > > interrupt misses because we are

> > > > > > collecting the interrupts in intr_loc, and theoretically we could be

> > > > > > in a situation where in the pcie_isr, the

> > > > > >

> > > > > > do {

> > > > > >     ...

> > > > > > } while(detected)

> > > > > >

> > > > > > loop gets a removal->insertion->removal all while in the same

> > > > > > invocation of pcie_isr().

> > > > > > If this happens, the intr_loc will have recorded a single insertion

> > > > > > and a single removal, and

> > > > > > the final result will depend on the order in which we decide to

> > > > > > process the events in intr_loc.

> > > > >

> > > > > I don't quite understand how that "do { .. } while (detected)" loop

> > > > > works or why it's done that way.  Collecting interrupt status bits in

> > > > > an ISR is obviously a very common task; it seems like there should be

> > > > > a standard, idiomatic way of doing it, but I don't know it.

> > > > >

> > > > > > Or, may be we can make the calls to pciehp_queue_interrupt_event()

> > > > > > before clearing the

> > > > > > RW1C in the slot status register (in the loop)?

> > > > >

> > > > > Yeah, it seems like we should read PCI_EXP_SLTSTA once, queue up any

> > > > > events related to it, then clear the relevant SLTSTA bits.

> > > > >

> > > >

> > > > Do you mean to remove the do {...} while loop and just

> > > > read PCI_EXP_SLTSTA once in ISR , queue the work and clear interrupts?

> > >

> > > I don't know if removing the loop is the right thing or not.  We need

> > > to understand why the loop is there in the first place and make sure

> > > removing it wouldn't break things.

> > >

> >

> >

> > The pcie base spec. 3.0 says from chapter 6.7.3.1. Slot Events:

> >

> >         A Downstream Port with hot-plug capabilities monitors the slot it controls for the slot events listed

> >         above. When one of these slot events is detected, the Port indicates that the event has occurred by

> >         setting the status field associated with the event. At that point, the event is pending until software

> >         clears the status field.

> >         Once a slot event is pending on a particular slot, all subsequent events of that type are ignored on

> >         that slot until the event is cleared. The Port must continue to monitor the slot for all other slot

> >         event types and report them as they occur.

> >

> >

> > So does it mean that the port would continue providing MSI if there has been

> > any other events occurred apart from the event which is not cleared? If that

> > is the case then it's not sure why the loop is still needed.

> 

> I'd think that the loop is needed because we don't want to handle just

> one event on one interrupt. We want to handle as many events as we can

> (to keep the interrupt latency overhead low), that have happened

> during the ISR invocation. And hence the loop.

> 


Hmm, Right. I created a proposal patch
(=>[v1,2/2] PCI: pciehp: Rework hotplug interrupt routine)
to remove the do {...} while loop which 
I guess then should be rejected, right? As we decide to keep this loop?

> >

> > Also with having a loop, pcie_isr() is reading the PCI_EXP_SLTSTA again and taking action just based on

> > latest event happens. That may mean removal->insertion happens fast enough then

> > removal could be overwritten by insertion and pciehp_get_adapter_status ()

> > will return insertion only and we may miss the removal event to be processed for the loop.

> 

> Current. This is the problem that Bjorn highlighted above. I think the

> current code works well when events of different kinds have happened

> (due to the loop).

> 

> In case of similar kind of events (insertion / removal), I think there

> are 2 problems: (1) we use the transient values of slot status which

> may not be valid at that time. (2) because we ack the event before we

> queue it up, it leaves chance for error. I think we'd need more

> careful examination in order to fix these issues.

> 


I also see 3rd problem too with this Loop. 
For example, if we are in the loop,
If only slot removal comes -> detect = PCI_EXP_SLTSTA_PDC -> intr_lock |= detected
-> detected==true so clear PDC interrupt bit => So we have a chance to detect
another insertion event.

We are in the same loop & if insertion comes again -> so detected = PCI_EXP_SLTSTA_PDC
-> then detected &= ~intr_loc -> detected == false => This means
PCI_EXP_SLTSTA would not be cleared for insertion event happened 
and there we could get a problem that No further PDC events can get triggered by HW.

So if we want to keep the loop, I think clearing PCI_EXP_SLTSTA  should be shifted
outside of the loop to clear Slot events(but not based on detected) to ensure that loop
just looks for the events Which has not occurred, also before clearing SLTSTA outside
the loop, we can update link and slot_status to trigger the right events for DLLSC and PDC types.

What do you think about following proposal?

@@ -542,7 +542,7 @@ static irqreturn_t pcie_isr(int irq, void *dev_id)
 	struct pci_bus *subordinate = pdev->subordinate;
 	struct pci_dev *dev;
 	struct slot *slot = ctrl->slot;
-	u16 detected, intr_loc;
+	u16 detected, intr_loc, slot_status;
 	u8 present;
 	bool link;
 
@@ -553,26 +553,30 @@ static irqreturn_t pcie_isr(int irq, void *dev_id)
 	 */
 	intr_loc = 0;
 	do {
-		pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &detected);
-		if (detected == (u16) ~0) {
+		pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &slot_status);
+		if (slot_status == (u16) ~0) {
 			ctrl_info(ctrl, "%s: no response from device\n",
 				  __func__);
 			return IRQ_HANDLED;
 		}
 
-		detected &= (PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD |
+		detected = slot_status & (PCI_EXP_SLTSTA_ABP |
+			     PCI_EXP_SLTSTA_PFD |
 			     PCI_EXP_SLTSTA_PDC |
 			     PCI_EXP_SLTSTA_CC | PCI_EXP_SLTSTA_DLLSC);
 		detected &= ~intr_loc;
 		intr_loc |= detected;
 		if (!intr_loc)
 			return IRQ_NONE;
-		if (detected)
-			pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
-						   intr_loc);
 	} while (detected);
 
 	ctrl_dbg(ctrl, "pending interrupts %#06x from Slot Status\n", intr_loc);
+	if (intr_loc & PCI_EXP_SLTSTA_PDC)
+		present = !!(slot_status & PCI_EXP_SLTSTA_PDS);
+	if (intr_loc & PCI_EXP_SLTSTA_DLLSC)
+		link = pciehp_check_link_active(ctrl);
+
+	pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, intr_loc);
 
 	/* Check Command Complete Interrupt Pending */
 	if (intr_loc & PCI_EXP_SLTSTA_CC) {
@@ -603,7 +607,6 @@ static irqreturn_t pcie_isr(int irq, void *dev_id)
 
 	/* Check Presence Detect Changed */
 	if (intr_loc & PCI_EXP_SLTSTA_PDC) {
-		pciehp_get_adapter_status(slot, &present);
 		ctrl_info(ctrl, "Card %spresent on Slot(%s)\n",
 			  present ? "" : "not ", slot_name(slot));
 		pciehp_queue_interrupt_event(slot, present ? INT_PRESENCE_ON :
@@ -618,7 +621,6 @@ static irqreturn_t pcie_isr(int irq, void *dev_id)
 	}
 
 	if (intr_loc & PCI_EXP_SLTSTA_DLLSC) {
-		link = pciehp_check_link_active(ctrl);
 		ctrl_info(ctrl, "slot(%s): Link %s event\n",
 			  slot_name(slot), link ? "Up" : "Down");
 		pciehp_queue_interrupt_event(slot, link ? INT_LINK_UP :


> >

> > If we don't clear the Slot event quick enough then according to spec. statement above

> > that event could get ignored and SW may never get notified.

> >

> >

> >

> > > But I do think that in the resulting code, the connection between

> > >

> > >   (1) the events we learn from the interrupt and

> > >   (2) the queued work items

> > >

> > > needs to be crystal clear.  Right now it's a bit muddy because of

> > > things like the case you fixed: a work item that goes back and looks

> > > at PCI_EXP_SLTSTA after it's been cleared and the hardware may have

> > > already set bits for new events.

> > >

> >

> > I have made a prototype patch in my follow up reply and tested ok on my existing setup

> > on which I caught the previous issue although I cant say it will work on any HW.

> > Clearing events after queuing events gave some problems and did not work properly on my

> > existing HW where I test very fast insertion and removal events, in which

> > case only "removal" event comes and "insertion" does not occur even if HW gets powered.

> >



Intel Deutschland GmbH
Registered Address: Am Campeon 10-12, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de
Managing Directors: Christin Eisenschmid, Christian Lamprechter
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928
Keith Busch Aug. 24, 2016, 2:59 p.m. UTC | #9
On Tue, Aug 23, 2016 at 04:47:44PM -0700, Rajat Jain wrote:
> On Thu, Aug 18, 2016 at 1:59 PM, Patel, Mayurkumar
> <mayurkumar.patel@intel.com> wrote:
> >
> > So does it mean that the port would continue providing MSI if there has been
> > any other events occurred apart from the event which is not cleared? If that
> > is the case then it's not sure why the loop is still needed.
> 
> I'd think that the loop is needed because we don't want to handle just
> one event on one interrupt. We want to handle as many events as we can
> (to keep the interrupt latency overhead low), that have happened
> during the ISR invocation. And hence the loop.

The Slot Status doesn't have to provide a single event at a time. But
even if it does, this isn't exactly a performance path that would be
hurt by having one interrupt per event, right?
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patel, Mayurkumar Sept. 1, 2016, 10:44 a.m. UTC | #10
> 

> > >

> > > > On Wed, Aug 17, 2016 at 10:37:13PM +0000, Patel, Mayurkumar wrote:

> > > > > > On Wed, Aug 17, 2016 at 10:54:12AM -0700, Rajat Jain wrote:

> > > >

> > > > > > > We need to do something about that *in addition * to the

> > > > > > > above patch to cover the

> > > > > > > whole story. However I think there still will be a room for some

> > > > > > > interrupt misses because we are

> > > > > > > collecting the interrupts in intr_loc, and theoretically we could be

> > > > > > > in a situation where in the pcie_isr, the

> > > > > > >

> > > > > > > do {

> > > > > > >     ...

> > > > > > > } while(detected)

> > > > > > >

> > > > > > > loop gets a removal->insertion->removal all while in the same

> > > > > > > invocation of pcie_isr().

> > > > > > > If this happens, the intr_loc will have recorded a single insertion

> > > > > > > and a single removal, and

> > > > > > > the final result will depend on the order in which we decide to

> > > > > > > process the events in intr_loc.

> > > > > >

> > > > > > I don't quite understand how that "do { .. } while (detected)" loop

> > > > > > works or why it's done that way.  Collecting interrupt status bits in

> > > > > > an ISR is obviously a very common task; it seems like there should be

> > > > > > a standard, idiomatic way of doing it, but I don't know it.

> > > > > >

> > > > > > > Or, may be we can make the calls to pciehp_queue_interrupt_event()

> > > > > > > before clearing the

> > > > > > > RW1C in the slot status register (in the loop)?

> > > > > >

> > > > > > Yeah, it seems like we should read PCI_EXP_SLTSTA once, queue up any

> > > > > > events related to it, then clear the relevant SLTSTA bits.

> > > > > >

> > > > >

> > > > > Do you mean to remove the do {...} while loop and just

> > > > > read PCI_EXP_SLTSTA once in ISR , queue the work and clear interrupts?

> > > >

> > > > I don't know if removing the loop is the right thing or not.  We need

> > > > to understand why the loop is there in the first place and make sure

> > > > removing it wouldn't break things.

> > > >

> > >

> > >

> > > The pcie base spec. 3.0 says from chapter 6.7.3.1. Slot Events:

> > >

> > >         A Downstream Port with hot-plug capabilities monitors the slot it controls for the slot events listed

> > >         above. When one of these slot events is detected, the Port indicates that the event has occurred by

> > >         setting the status field associated with the event. At that point, the event is pending until software

> > >         clears the status field.

> > >         Once a slot event is pending on a particular slot, all subsequent events of that type are ignored on

> > >         that slot until the event is cleared. The Port must continue to monitor the slot for all other slot

> > >         event types and report them as they occur.

> > >

> > >

> > > So does it mean that the port would continue providing MSI if there has been

> > > any other events occurred apart from the event which is not cleared? If that

> > > is the case then it's not sure why the loop is still needed.

> >

> > I'd think that the loop is needed because we don't want to handle just

> > one event on one interrupt. We want to handle as many events as we can

> > (to keep the interrupt latency overhead low), that have happened

> > during the ISR invocation. And hence the loop.

> >

> 

> Hmm, Right. I created a proposal patch

> (=>[v1,2/2] PCI: pciehp: Rework hotplug interrupt routine)

> to remove the do {...} while loop which

> I guess then should be rejected, right? As we decide to keep this loop?

> 

> > >

> > > Also with having a loop, pcie_isr() is reading the PCI_EXP_SLTSTA again and taking action just based on

> > > latest event happens. That may mean removal->insertion happens fast enough then

> > > removal could be overwritten by insertion and pciehp_get_adapter_status ()

> > > will return insertion only and we may miss the removal event to be processed for the loop.

> >

> > Current. This is the problem that Bjorn highlighted above. I think the

> > current code works well when events of different kinds have happened

> > (due to the loop).

> >

> > In case of similar kind of events (insertion / removal), I think there

> > are 2 problems: (1) we use the transient values of slot status which

> > may not be valid at that time. (2) because we ack the event before we

> > queue it up, it leaves chance for error. I think we'd need more

> > careful examination in order to fix these issues.

> >

> 

> I also see 3rd problem too with this Loop.

> For example, if we are in the loop,

> If only slot removal comes -> detect = PCI_EXP_SLTSTA_PDC -> intr_lock |= detected

> -> detected==true so clear PDC interrupt bit => So we have a chance to detect

> another insertion event.

> 

> We are in the same loop & if insertion comes again -> so detected = PCI_EXP_SLTSTA_PDC

> -> then detected &= ~intr_loc -> detected == false => This means

> PCI_EXP_SLTSTA would not be cleared for insertion event happened

> and there we could get a problem that No further PDC events can get triggered by HW.

> 

> So if we want to keep the loop, I think clearing PCI_EXP_SLTSTA  should be shifted

> outside of the loop to clear Slot events(but not based on detected) to ensure that loop

> just looks for the events Which has not occurred, also before clearing SLTSTA outside

> the loop, we can update link and slot_status to trigger the right events for DLLSC and PDC types.

> 

> What do you think about following proposal?

> 

> @@ -542,7 +542,7 @@ static irqreturn_t pcie_isr(int irq, void *dev_id)

>  	struct pci_bus *subordinate = pdev->subordinate;

>  	struct pci_dev *dev;

>  	struct slot *slot = ctrl->slot;

> -	u16 detected, intr_loc;

> +	u16 detected, intr_loc, slot_status;

>  	u8 present;

>  	bool link;

> 

> @@ -553,26 +553,30 @@ static irqreturn_t pcie_isr(int irq, void *dev_id)

>  	 */

>  	intr_loc = 0;

>  	do {

> -		pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &detected);

> -		if (detected == (u16) ~0) {

> +		pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &slot_status);

> +		if (slot_status == (u16) ~0) {

>  			ctrl_info(ctrl, "%s: no response from device\n",

>  				  __func__);

>  			return IRQ_HANDLED;

>  		}

> 

> -		detected &= (PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD |

> +		detected = slot_status & (PCI_EXP_SLTSTA_ABP |

> +			     PCI_EXP_SLTSTA_PFD |

>  			     PCI_EXP_SLTSTA_PDC |

>  			     PCI_EXP_SLTSTA_CC | PCI_EXP_SLTSTA_DLLSC);

>  		detected &= ~intr_loc;

>  		intr_loc |= detected;

>  		if (!intr_loc)

>  			return IRQ_NONE;

> -		if (detected)

> -			pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,

> -						   intr_loc);

>  	} while (detected);

> 

>  	ctrl_dbg(ctrl, "pending interrupts %#06x from Slot Status\n", intr_loc);

> +	if (intr_loc & PCI_EXP_SLTSTA_PDC)

> +		present = !!(slot_status & PCI_EXP_SLTSTA_PDS);

> +	if (intr_loc & PCI_EXP_SLTSTA_DLLSC)

> +		link = pciehp_check_link_active(ctrl);

> +

> +	pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, intr_loc);

> 

>  	/* Check Command Complete Interrupt Pending */

>  	if (intr_loc & PCI_EXP_SLTSTA_CC) {

> @@ -603,7 +607,6 @@ static irqreturn_t pcie_isr(int irq, void *dev_id)

> 

>  	/* Check Presence Detect Changed */

>  	if (intr_loc & PCI_EXP_SLTSTA_PDC) {

> -		pciehp_get_adapter_status(slot, &present);

>  		ctrl_info(ctrl, "Card %spresent on Slot(%s)\n",

>  			  present ? "" : "not ", slot_name(slot));

>  		pciehp_queue_interrupt_event(slot, present ? INT_PRESENCE_ON :

> @@ -618,7 +621,6 @@ static irqreturn_t pcie_isr(int irq, void *dev_id)

>  	}

> 

>  	if (intr_loc & PCI_EXP_SLTSTA_DLLSC) {

> -		link = pciehp_check_link_active(ctrl);

>  		ctrl_info(ctrl, "slot(%s): Link %s event\n",

>  			  slot_name(slot), link ? "Up" : "Down");

>  		pciehp_queue_interrupt_event(slot, link ? INT_LINK_UP :

> 

> 


Hi Rajat/Bjorn
Could you please comment, if the proposal can be worked out or we should consider
removing do {..} while at all as Keith, Busch commented as that's not really performance
path or is there any other way/suggestion you think of?


> > >

> > > If we don't clear the Slot event quick enough then according to spec. statement above

> > > that event could get ignored and SW may never get notified.

> > >

> > >

> > >

> > > > But I do think that in the resulting code, the connection between

> > > >

> > > >   (1) the events we learn from the interrupt and

> > > >   (2) the queued work items

> > > >

> > > > needs to be crystal clear.  Right now it's a bit muddy because of

> > > > things like the case you fixed: a work item that goes back and looks

> > > > at PCI_EXP_SLTSTA after it's been cleared and the hardware may have

> > > > already set bits for new events.

> > > >

> > >

> > > I have made a prototype patch in my follow up reply and tested ok on my existing setup

> > > on which I caught the previous issue although I cant say it will work on any HW.

> > > Clearing events after queuing events gave some problems and did not work properly on my

> > > existing HW where I test very fast insertion and removal events, in which

> > > case only "removal" event comes and "insertion" does not occur even if HW gets powered.

> > >

> 

Intel Deutschland GmbH
Registered Address: Am Campeon 10-12, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de
Managing Directors: Christin Eisenschmid, Christian Lamprechter
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928
diff mbox

Patch

diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
index 880978b..87c5bea 100644
--- a/drivers/pci/hotplug/pciehp_ctrl.c
+++ b/drivers/pci/hotplug/pciehp_ctrl.c
@@ -301,20 +301,6 @@  static void handle_button_press_event(struct slot *p_slot)
 /*
  * Note: This function must be called with slot->lock held
  */
-static void handle_surprise_event(struct slot *p_slot)
-{
-	u8 getstatus;
-
-	pciehp_get_adapter_status(p_slot, &getstatus);
-	if (!getstatus)
-		pciehp_queue_power_work(p_slot, DISABLE_REQ);
-	else
-		pciehp_queue_power_work(p_slot, ENABLE_REQ);
-}
-
-/*
- * Note: This function must be called with slot->lock held
- */
 static void handle_link_event(struct slot *p_slot, u32 event)
 {
 	struct controller *ctrl = p_slot->ctrl;
@@ -377,14 +363,14 @@  static void interrupt_event_handler(struct work_struct *work)
 		pciehp_green_led_off(p_slot);
 		break;
 	case INT_PRESENCE_ON:
-		handle_surprise_event(p_slot);
+		pciehp_queue_power_work(p_slot, ENABLE_REQ);
 		break;
 	case INT_PRESENCE_OFF:
 		/*
 		 * Regardless of surprise capability, we need to
 		 * definitely remove a card that has been pulled out!
 		 */
-		handle_surprise_event(p_slot);
+		pciehp_queue_power_work(p_slot, DISABLE_REQ);
 		break;
 	case INT_LINK_UP:
 	case INT_LINK_DOWN: