diff mbox

pciehp: Acknowledge the spurious "cmd completed" event.

Message ID 20131123005102.GA14690@google.com
State Not Applicable
Headers show

Commit Message

Bjorn Helgaas Nov. 23, 2013, 12:51 a.m. UTC
On Mon, Nov 11, 2013 at 01:26:06PM -0800, Rajat Jain wrote:
> Hello,
> 
> >> With my patch, we'd eliminate the 1 second delay and the second line
> >> of output above. If we also want to get rid of the first line
> >> "Unexpected CMD_COMPLETED..." also, that would probably have to be
> >> solved by changing the behavior to assume that CC shall be generated
> >> for every SLOTCTRL register write.
> >
> > I *do* want to get rid of the "Unexpected CMD_COMPLETED" complaint.
> > We shouldn't complain about hardware that is working perfectly fine.
> > I don't know the best way to do that yet.  I have found a box with the
> > same hardware that was fixed by 5808639bfa98, so I hope to play with
> > it and figure out something that will work nicely for both scenarios.
> 
> Please keep posted :-)
> 
> >
> > BTW, can you open a report at bugzilla.kernel.org and attach your
> > "lspci -vvxxx" and dmesg output?  When we eventually merge a fix, I'd
> > like to have the details archived somewhere.
> 
> Done:
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=64821
> 
> Please let me know if you need any help in trying something out - I'd
> be more than keen to help - write code or test.

What do you think of the patch below?  I'm afraid we'll trip over
a few other old parts similar to the 82801H, but I'd rather do that
than penalize the parts that work correctly.


PCI: pciehp: Support command completion even with no hotplug hardware

From: Bjorn Helgaas <bhelgaas@google.com>

Commit 5808639bfa98 ("pciehp: fix slow probing") fixed a slow probing
problem on hardware that doesn't conform to the spec, but caused a
similar problem for hardware that *does* conform to the spec.

Per PCIe 3.0, sec 7.8.10 and 6.7.3.2, any write to Slot Control generates a
hot-plug command.  Ports that can accept new commands with no delay can set
the "No Command Completed Support" bit.  Otherwise the port must indicate
its completion of the command and readiness to accept a new command with a
"command completed event."

5808639bfa98 assumes ports that lack a power controller, power indicator,
attention indicator, and interlock will not generate completion events,
even if they neglect to set "No Command Completed Support." But on ports
that lack those elements and *do* support command completion notification,
it causes:

  Unexpected CMD_COMPLETED. Need to wait for command completed event.
  Command not completed in 1000 msec

and forces us to wait for a 1 second timeout.

This patch makes the 5808639bfa98 workaround into a quirk that's applied
only to devices known to be broken, currently just Intel 82801H ports.
There are probably other similarly-broken devices that may now probe
slowly, but I don't know how to catch them all without penalizing the
ones that play by the rules.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=64821
Reported-by: Rajat Jain <rajatjain@juniper.net>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/hotplug/pciehp_hpc.c |   39 +++++++++++++++++++++++++++++---------
 1 file changed, 30 insertions(+), 9 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Guenter Roeck Nov. 23, 2013, 1:59 a.m. UTC | #1
On Fri, Nov 22, 2013 at 05:51:02PM -0700, Bjorn Helgaas wrote:
> On Mon, Nov 11, 2013 at 01:26:06PM -0800, Rajat Jain wrote:
> > Hello,
> > 
> > >> With my patch, we'd eliminate the 1 second delay and the second line
> > >> of output above. If we also want to get rid of the first line
> > >> "Unexpected CMD_COMPLETED..." also, that would probably have to be
> > >> solved by changing the behavior to assume that CC shall be generated
> > >> for every SLOTCTRL register write.
> > >
> > > I *do* want to get rid of the "Unexpected CMD_COMPLETED" complaint.
> > > We shouldn't complain about hardware that is working perfectly fine.
> > > I don't know the best way to do that yet.  I have found a box with the
> > > same hardware that was fixed by 5808639bfa98, so I hope to play with
> > > it and figure out something that will work nicely for both scenarios.
> > 
> > Please keep posted :-)
> > 
> > >
> > > BTW, can you open a report at bugzilla.kernel.org and attach your
> > > "lspci -vvxxx" and dmesg output?  When we eventually merge a fix, I'd
> > > like to have the details archived somewhere.
> > 
> > Done:
> > 
> > https://bugzilla.kernel.org/show_bug.cgi?id=64821
> > 
> > Please let me know if you need any help in trying something out - I'd
> > be more than keen to help - write code or test.
> 
> What do you think of the patch below?  I'm afraid we'll trip over
> a few other old parts similar to the 82801H, but I'd rather do that
> than penalize the parts that work correctly.
> 
Works nicely as far as I can see. Tested on P2020 and P5040 based systems
with IDT 89HPES48H12G2.

Tested-by: Guenter Roeck <groeck@juniper.net>

Guenter

> 
> PCI: pciehp: Support command completion even with no hotplug hardware
> 
> From: Bjorn Helgaas <bhelgaas@google.com>
> 
> Commit 5808639bfa98 ("pciehp: fix slow probing") fixed a slow probing
> problem on hardware that doesn't conform to the spec, but caused a
> similar problem for hardware that *does* conform to the spec.
> 
> Per PCIe 3.0, sec 7.8.10 and 6.7.3.2, any write to Slot Control generates a
> hot-plug command.  Ports that can accept new commands with no delay can set
> the "No Command Completed Support" bit.  Otherwise the port must indicate
> its completion of the command and readiness to accept a new command with a
> "command completed event."
> 
> 5808639bfa98 assumes ports that lack a power controller, power indicator,
> attention indicator, and interlock will not generate completion events,
> even if they neglect to set "No Command Completed Support." But on ports
> that lack those elements and *do* support command completion notification,
> it causes:
> 
>   Unexpected CMD_COMPLETED. Need to wait for command completed event.
>   Command not completed in 1000 msec
> 
> and forces us to wait for a 1 second timeout.
> 
> This patch makes the 5808639bfa98 workaround into a quirk that's applied
> only to devices known to be broken, currently just Intel 82801H ports.
> There are probably other similarly-broken devices that may now probe
> slowly, but I don't know how to catch them all without penalizing the
> ones that play by the rules.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=64821
> Reported-by: Rajat Jain <rajatjain@juniper.net>
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> ---
>  drivers/pci/hotplug/pciehp_hpc.c |   39 +++++++++++++++++++++++++++++---------
>  1 file changed, 30 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
> index 3eea3fdd4b0b..2fd2bd59e07f 100644
> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -881,6 +881,34 @@ static inline void dbg_ctrl(struct controller *ctrl)
>  	ctrl_info(ctrl, "Slot Control           : 0x%04x\n", reg16);
>  }
>  
> +static int pciehp_no_command_complete(struct controller *ctrl)
> +{
> +	struct pcie_device *dev = ctrl->pcie;
> +	struct pci_dev *pdev = dev->port;
> +	u16 vendor, device;
> +
> +	if (NO_CMD_CMPL(ctrl))
> +		return 1;
> +
> +	/*
> +	 * Controller should notify on command completion unless the "No
> +	 * Command Completed Support" bit is set.  But some hardware does
> +	 * not.  See https://bugzilla.kernel.org/show_bug.cgi?id=10751
> +	 */
> +	if (!(POWER_CTRL(ctrl) | ATTN_LED(ctrl) | PWR_LED(ctrl) | EMI(ctrl))) {
> +		vendor = pdev->vendor;
> +		device = pdev->device;
> +		if (vendor == PCI_VENDOR_ID_INTEL &&
> +		    device >= 0x283f && device <= 0x2849) {
> +			dev_info(&dev->device, "device [%04x:%04x] does not notify on hotplug command completion\n",
> +				 vendor, device);
> +			return 1;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
>  struct controller *pcie_init(struct pcie_device *dev)
>  {
>  	struct controller *ctrl;
> @@ -902,15 +930,8 @@ struct controller *pcie_init(struct pcie_device *dev)
>  	mutex_init(&ctrl->ctrl_lock);
>  	init_waitqueue_head(&ctrl->queue);
>  	dbg_ctrl(ctrl);
> -	/*
> -	 * Controller doesn't notify of command completion if the "No
> -	 * Command Completed Support" bit is set in Slot Capability
> -	 * register or the controller supports none of power
> -	 * controller, attention led, power led and EMI.
> -	 */
> -	if (NO_CMD_CMPL(ctrl) ||
> -	    !(POWER_CTRL(ctrl) | ATTN_LED(ctrl) | PWR_LED(ctrl) | EMI(ctrl)))
> -	    ctrl->no_cmd_complete = 1;
> +
> +	ctrl->no_cmd_complete = pciehp_no_command_complete(ctrl);
>  
>          /* Check if Data Link Layer Link Active Reporting is implemented */
>          if (pciehp_readl(ctrl, PCI_EXP_LNKCAP, &link_cap)) {
> --
> To unsubscribe from this list: send the line "unsubscribe linux-hotplug" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rajat Jain Nov. 23, 2013, 2:56 p.m. UTC | #2
Hello Bjorn,

> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=64821
> >
> > Please let me know if you need any help in trying something out - I'd
> > be more than keen to help - write code or test.
> 
> What do you think of the patch below?  I'm afraid we'll trip over a few
> other old parts similar to the 82801H, but I'd rather do that than
> penalize the parts that work correctly.

Tested and works fine. Looks good (on a side note, we are introducing device specific check in a common code, but I do not think there is any other cleaner way). I think this patch should be applied as it solves the problem of slow probing.

On a different note, I feel there is still a need to apply my original patch. There is still an open problem in case of spurious interrupts (or in any case where the condition "if (slot_status & PCI_EXP_SLTSTA_CC)" becomes true in pcie_write_cmd()). That is because once that happens, we never clear that interrupt, and no further hotplug interrupts shall be received unless we do that.

Thanks,

Rajat

> 
> 
> PCI: pciehp: Support command completion even with no hotplug hardware
> 
> From: Bjorn Helgaas <bhelgaas@google.com>
> 
> Commit 5808639bfa98 ("pciehp: fix slow probing") fixed a slow probing
> problem on hardware that doesn't conform to the spec, but caused a
> similar problem for hardware that *does* conform to the spec.
> 
> Per PCIe 3.0, sec 7.8.10 and 6.7.3.2, any write to Slot Control
> generates a hot-plug command.  Ports that can accept new commands with
> no delay can set the "No Command Completed Support" bit.  Otherwise the
> port must indicate its completion of the command and readiness to accept
> a new command with a "command completed event."
> 
> 5808639bfa98 assumes ports that lack a power controller, power
> indicator, attention indicator, and interlock will not generate
> completion events, even if they neglect to set "No Command Completed
> Support." But on ports that lack those elements and *do* support command
> completion notification, it causes:
> 
>   Unexpected CMD_COMPLETED. Need to wait for command completed event.
>   Command not completed in 1000 msec
> 
> and forces us to wait for a 1 second timeout.
> 
> This patch makes the 5808639bfa98 workaround into a quirk that's applied
> only to devices known to be broken, currently just Intel 82801H ports.
> There are probably other similarly-broken devices that may now probe
> slowly, but I don't know how to catch them all without penalizing the
> ones that play by the rules.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=64821
> Reported-by: Rajat Jain <rajatjain@juniper.net>
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> ---
>  drivers/pci/hotplug/pciehp_hpc.c |   39 +++++++++++++++++++++++++++++--
> -------
>  1 file changed, 30 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/pci/hotplug/pciehp_hpc.c
> b/drivers/pci/hotplug/pciehp_hpc.c
> index 3eea3fdd4b0b..2fd2bd59e07f 100644
> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -881,6 +881,34 @@ static inline void dbg_ctrl(struct controller
> *ctrl)
>  	ctrl_info(ctrl, "Slot Control           : 0x%04x\n", reg16);
>  }
> 
> +static int pciehp_no_command_complete(struct controller *ctrl) {
> +	struct pcie_device *dev = ctrl->pcie;
> +	struct pci_dev *pdev = dev->port;
> +	u16 vendor, device;
> +
> +	if (NO_CMD_CMPL(ctrl))
> +		return 1;
> +
> +	/*
> +	 * Controller should notify on command completion unless the "No
> +	 * Command Completed Support" bit is set.  But some hardware does
> +	 * not.  See https://bugzilla.kernel.org/show_bug.cgi?id=10751
> +	 */
> +	if (!(POWER_CTRL(ctrl) | ATTN_LED(ctrl) | PWR_LED(ctrl) |
> EMI(ctrl))) {
> +		vendor = pdev->vendor;
> +		device = pdev->device;
> +		if (vendor == PCI_VENDOR_ID_INTEL &&
> +		    device >= 0x283f && device <= 0x2849) {
> +			dev_info(&dev->device, "device [%04x:%04x] does not
> notify on hotplug command completion\n",
> +				 vendor, device);
> +			return 1;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
>  struct controller *pcie_init(struct pcie_device *dev)  {
>  	struct controller *ctrl;
> @@ -902,15 +930,8 @@ struct controller *pcie_init(struct pcie_device
> *dev)
>  	mutex_init(&ctrl->ctrl_lock);
>  	init_waitqueue_head(&ctrl->queue);
>  	dbg_ctrl(ctrl);
> -	/*
> -	 * Controller doesn't notify of command completion if the "No
> -	 * Command Completed Support" bit is set in Slot Capability
> -	 * register or the controller supports none of power
> -	 * controller, attention led, power led and EMI.
> -	 */
> -	if (NO_CMD_CMPL(ctrl) ||
> -	    !(POWER_CTRL(ctrl) | ATTN_LED(ctrl) | PWR_LED(ctrl) |
> EMI(ctrl)))
> -	    ctrl->no_cmd_complete = 1;
> +
> +	ctrl->no_cmd_complete = pciehp_no_command_complete(ctrl);
> 
>          /* Check if Data Link Layer Link Active Reporting is
> implemented */
>          if (pciehp_readl(ctrl, PCI_EXP_LNKCAP, &link_cap)) {
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas Nov. 23, 2013, 7:32 p.m. UTC | #3
On Sat, Nov 23, 2013 at 7:56 AM, Rajat Jain <rajatjain@juniper.net> wrote:
> Hello Bjorn,
>
>> >
>> > https://bugzilla.kernel.org/show_bug.cgi?id=64821
>> >
>> > Please let me know if you need any help in trying something out - I'd
>> > be more than keen to help - write code or test.
>>
>> What do you think of the patch below?  I'm afraid we'll trip over a few
>> other old parts similar to the 82801H, but I'd rather do that than
>> penalize the parts that work correctly.
>
> Tested and works fine. Looks good (on a side note, we are introducing device specific check in a common code, but I do not think there is any other cleaner way). I think this patch should be applied as it solves the problem of slow probing.

Thanks for testing.  I don't like the device-specific code either, but
it does seem like a device-specific defect, and I didn't have any
better ideas.

> On a different note, I feel there is still a need to apply my original patch. There is still an open problem in case of spurious interrupts (or in any case where the condition "if (slot_status & PCI_EXP_SLTSTA_CC)" becomes true in pcie_write_cmd()). That is because once that happens, we never clear that interrupt, and no further hotplug interrupts shall be received unless we do that.

I agree this is an issue and we should address it somehow.  My
hesitation is just that I'd prefer to do some more aggressive
restructuring rather than apply a point fix.  For example:

- We currently look at PCI_EXP_SLTSTA_CC in pcie_isr(),
pcie_poll_cmd(), and pcie_write_cmd().  I think it would be better to
look at it only in pcie_isr().

- I don't think pcie_poll_cmd() should exist at all; we should poll by
calling pcie_isr() instead.

- We need pcie_write_cmd(), but I think the way it waits is backwards.
 Currently we issue the command, then wait for it to complete.  I
think we should issue the command, note the current time, and return
without waiting.  The *next* time we need to issue a command, we can
wait for completion of the previous one (or timeout) if necessary.

But maybe we need the point fix in the interim, especially if anybody
can actually produce the scenario you mention.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rajat Jain Nov. 25, 2013, 7:03 p.m. UTC | #4
Hello,

> > On a different note, I feel there is still a need to apply my original
> patch. There is still an open problem in case of spurious interrupts (or
> in any case where the condition "if (slot_status & PCI_EXP_SLTSTA_CC)"
> becomes true in pcie_write_cmd()). That is because once that happens, we
> never clear that interrupt, and no further hotplug interrupts shall be
> received unless we do that.
> 
> I agree this is an issue and we should address it somehow.  My
> hesitation is just that I'd prefer to do some more aggressive
> restructuring rather than apply a point fix.  For example:

OK, I'll attempt to fix it that way when I get time.

> 
> - We currently look at PCI_EXP_SLTSTA_CC in pcie_isr(), pcie_poll_cmd(),
> and pcie_write_cmd().  I think it would be better to look at it only in
> pcie_isr().
> 
> - I don't think pcie_poll_cmd() should exist at all; we should poll by
> calling pcie_isr() instead.
> 
> - We need pcie_write_cmd(), but I think the way it waits is backwards.
>  Currently we issue the command, then wait for it to complete.  I think
> we should issue the command, note the current time, and return without
> waiting.  The *next* time we need to issue a command, we can wait for
> completion of the previous one (or timeout) if necessary.
> 
> But maybe we need the point fix in the interim, especially if anybody
> can actually produce the scenario you mention.

Ok.

Thanks,

Rajat

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas Feb. 12, 2014, 12:34 a.m. UTC | #5
On Mon, Nov 25, 2013 at 07:03:11PM +0000, Rajat Jain wrote:
> Hello,
> 
> > > On a different note, I feel there is still a need to apply my original
> > patch. There is still an open problem in case of spurious interrupts (or
> > in any case where the condition "if (slot_status & PCI_EXP_SLTSTA_CC)"
> > becomes true in pcie_write_cmd()). That is because once that happens, we
> > never clear that interrupt, and no further hotplug interrupts shall be
> > received unless we do that.
> > 
> > I agree this is an issue and we should address it somehow.  My
> > hesitation is just that I'd prefer to do some more aggressive
> > restructuring rather than apply a point fix.  For example:
> 
> OK, I'll attempt to fix it that way when I get time.
> 
> > 
> > - We currently look at PCI_EXP_SLTSTA_CC in pcie_isr(), pcie_poll_cmd(),
> > and pcie_write_cmd().  I think it would be better to look at it only in
> > pcie_isr().
> > 
> > - I don't think pcie_poll_cmd() should exist at all; we should poll by
> > calling pcie_isr() instead.
> > 
> > - We need pcie_write_cmd(), but I think the way it waits is backwards.
> >  Currently we issue the command, then wait for it to complete.  I think
> > we should issue the command, note the current time, and return without
> > waiting.  The *next* time we need to issue a command, we can wait for
> > completion of the previous one (or timeout) if necessary.
> > 
> > But maybe we need the point fix in the interim, especially if anybody
> > can actually produce the scenario you mention.
> 
> Ok.

This patch is still in patchwork, but I've lost track of where we are.
Did you resolve this in the series that I just applied, or is it still
an outstanding issue?

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rajat Jain Feb. 12, 2014, 1:08 a.m. UTC | #6
Hello Bjorn,

> -----Original Message-----

> From: linux-hotplug-owner@vger.kernel.org [mailto:linux-hotplug-

> owner@vger.kernel.org] On Behalf Of Bjorn Helgaas

> Sent: Tuesday, February 11, 2014 4:35 PM

> To: Rajat Jain

> Cc: Rajat Jain; linux-pci@vger.kernel.org; linux hotplug mailing; Kenji

> Kaneshige; Yijing Wang; Greg KH; Tom Nguyen; Kristen Accardi; Rajat

> Jain; Guenter Roeck

> Subject: Re: [PATCH] pciehp: Acknowledge the spurious "cmd completed"

> event.

> 

> On Mon, Nov 25, 2013 at 07:03:11PM +0000, Rajat Jain wrote:

> > Hello,

> >

> > > > On a different note, I feel there is still a need to apply my

> > > > original

> > > patch. There is still an open problem in case of spurious interrupts

> > > (or in any case where the condition "if (slot_status &

> PCI_EXP_SLTSTA_CC)"

> > > becomes true in pcie_write_cmd()). That is because once that

> > > happens, we never clear that interrupt, and no further hotplug

> > > interrupts shall be received unless we do that.

> > >

> > > I agree this is an issue and we should address it somehow.  My

> > > hesitation is just that I'd prefer to do some more aggressive

> > > restructuring rather than apply a point fix.  For example:

> >

> > OK, I'll attempt to fix it that way when I get time.

> >

> > >

> > > - We currently look at PCI_EXP_SLTSTA_CC in pcie_isr(),

> > > pcie_poll_cmd(), and pcie_write_cmd().  I think it would be better

> > > to look at it only in pcie_isr().

> > >

> > > - I don't think pcie_poll_cmd() should exist at all; we should poll

> > > by calling pcie_isr() instead.

> > >

> > > - We need pcie_write_cmd(), but I think the way it waits is

> backwards.

> > >  Currently we issue the command, then wait for it to complete.  I

> > > think we should issue the command, note the current time, and return

> > > without waiting.  The *next* time we need to issue a command, we can

> > > wait for completion of the previous one (or timeout) if necessary.

> > >

> > > But maybe we need the point fix in the interim, especially if

> > > anybody can actually produce the scenario you mention.

> >

> > Ok.

> 

> This patch is still in patchwork, but I've lost track of where we are.

> Did you resolve this in the series that I just applied, or is it still

> an outstanding issue?


No, I did not solve it. It is still an outstanding issue. So far I am using your patch to overcome this:

http://www.spinics.net/lists/hotplug/msg05830.html

I'll just attempt to conclude the status on this issue so that you can make the decision on the course of action. IMHO there are 2 independent issues that we discussed in this thread:
	
1) PCIe compliant HW (that generates cmd completed interrupts at every write of Slot_ctrl register) being penalized with 1 second delay during the boot up. Your patch solves this.

2) If there is a genuine spurious interrupt, it does not get acknowledged. I had originally posted a patch for THIS problem.
http://www.spinics.net/lists/hotplug/msg05815.html

You had indicated that you would rather want a bigger restructuring of the driver to solve (2).

My observation: MY problem (in my setup) is not seen if I use either of the patches (yours or mine).

My opinion: I think my patch solves (2) but might not solve (1) for all corner cases. Also your patch solves (1) but may not solve (2) for all corner cases -Thus we should probably solve both of these problems individually.

Thanks,

Rajat



 


> 

> Bjorn

> --

> To unsubscribe from this list: send the line "unsubscribe linux-hotplug"

> in the body of a message to majordomo@vger.kernel.org More majordomo

> info at  http://vger.kernel.org/majordomo-info.html

>
Rajat Jain Feb. 20, 2014, 7:42 a.m. UTC | #7
Hello Bjorn,

>>
>> On Mon, Nov 25, 2013 at 07:03:11PM +0000, Rajat Jain wrote:
>> > Hello,
>> >
>> > > > On a different note, I feel there is still a need to apply my
>> > > > original
>> > > patch. There is still an open problem in case of spurious interrupts
>> > > (or in any case where the condition "if (slot_status &
>> PCI_EXP_SLTSTA_CC)"
>> > > becomes true in pcie_write_cmd()). That is because once that
>> > > happens, we never clear that interrupt, and no further hotplug
>> > > interrupts shall be received unless we do that.
>> > >
>> > > I agree this is an issue and we should address it somehow.  My
>> > > hesitation is just that I'd prefer to do some more aggressive
>> > > restructuring rather than apply a point fix.  For example:
>> >
>> > OK, I'll attempt to fix it that way when I get time.
>> >
>> > >
>> > > - We currently look at PCI_EXP_SLTSTA_CC in pcie_isr(),
>> > > pcie_poll_cmd(), and pcie_write_cmd().  I think it would be better
>> > > to look at it only in pcie_isr().
>> > >
>> > > - I don't think pcie_poll_cmd() should exist at all; we should poll
>> > > by calling pcie_isr() instead.
>> > >
>> > > - We need pcie_write_cmd(), but I think the way it waits is
>> backwards.
>> > >  Currently we issue the command, then wait for it to complete.  I
>> > > think we should issue the command, note the current time, and return
>> > > without waiting.  The *next* time we need to issue a command, we can
>> > > wait for completion of the previous one (or timeout) if necessary.
>> > >
>> > > But maybe we need the point fix in the interim, especially if
>> > > anybody can actually produce the scenario you mention.
>> >
>> > Ok.
>>
>> This patch is still in patchwork, but I've lost track of where we are.
>> Did you resolve this in the series that I just applied, or is it still
>> an outstanding issue?
>
> No, I did not solve it. It is still an outstanding issue. So far I am using your patch to overcome this:
>
> http://www.spinics.net/lists/hotplug/msg05830.html
>
> I'll just attempt to conclude the status on this issue so that you can make the decision on the course of action. IMHO there are 2 independent issues that we discussed in this thread:
>
> 1) PCIe compliant HW (that generates cmd completed interrupts at every write of Slot_ctrl register) being penalized with 1 second delay during the boot up. Your patch solves this.
>
> 2) If there is a genuine spurious interrupt, it does not get acknowledged. I had originally posted a patch for THIS problem.
> http://www.spinics.net/lists/hotplug/msg05815.html
>
> You had indicated that you would rather want a bigger restructuring of the driver to solve (2).
>
> My observation: MY problem (in my setup) is not seen if I use either of the patches (yours or mine).
>
> My opinion: I think my patch solves (2) but might not solve (1) for all corner cases. Also your patch solves (1) but may not solve (2) for all corner cases -Thus we should probably solve both of these problems individually.
>

Just wondering if you decided on how to solve this problem?

Are you planning this for 3.15?

Thanks,

Rajat
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas Feb. 20, 2014, 10:20 p.m. UTC | #8
On Thu, Feb 20, 2014 at 12:42 AM, Rajat Jain <rajatxjain@gmail.com> wrote:
> Hello Bjorn,
>
>>>
>>> On Mon, Nov 25, 2013 at 07:03:11PM +0000, Rajat Jain wrote:
>>> > Hello,
>>> >
>>> > > > On a different note, I feel there is still a need to apply my
>>> > > > original
>>> > > patch. There is still an open problem in case of spurious interrupts
>>> > > (or in any case where the condition "if (slot_status &
>>> PCI_EXP_SLTSTA_CC)"
>>> > > becomes true in pcie_write_cmd()). That is because once that
>>> > > happens, we never clear that interrupt, and no further hotplug
>>> > > interrupts shall be received unless we do that.
>>> > >
>>> > > I agree this is an issue and we should address it somehow.  My
>>> > > hesitation is just that I'd prefer to do some more aggressive
>>> > > restructuring rather than apply a point fix.  For example:
>>> >
>>> > OK, I'll attempt to fix it that way when I get time.
>>> >
>>> > >
>>> > > - We currently look at PCI_EXP_SLTSTA_CC in pcie_isr(),
>>> > > pcie_poll_cmd(), and pcie_write_cmd().  I think it would be better
>>> > > to look at it only in pcie_isr().
>>> > >
>>> > > - I don't think pcie_poll_cmd() should exist at all; we should poll
>>> > > by calling pcie_isr() instead.
>>> > >
>>> > > - We need pcie_write_cmd(), but I think the way it waits is
>>> backwards.
>>> > >  Currently we issue the command, then wait for it to complete.  I
>>> > > think we should issue the command, note the current time, and return
>>> > > without waiting.  The *next* time we need to issue a command, we can
>>> > > wait for completion of the previous one (or timeout) if necessary.
>>> > >
>>> > > But maybe we need the point fix in the interim, especially if
>>> > > anybody can actually produce the scenario you mention.
>>> >
>>> > Ok.
>>>
>>> This patch is still in patchwork, but I've lost track of where we are.
>>> Did you resolve this in the series that I just applied, or is it still
>>> an outstanding issue?
>>
>> No, I did not solve it. It is still an outstanding issue. So far I am using your patch to overcome this:
>>
>> http://www.spinics.net/lists/hotplug/msg05830.html
>>
>> I'll just attempt to conclude the status on this issue so that you can make the decision on the course of action. IMHO there are 2 independent issues that we discussed in this thread:
>>
>> 1) PCIe compliant HW (that generates cmd completed interrupts at every write of Slot_ctrl register) being penalized with 1 second delay during the boot up. Your patch solves this.
>>
>> 2) If there is a genuine spurious interrupt, it does not get acknowledged. I had originally posted a patch for THIS problem.
>> http://www.spinics.net/lists/hotplug/msg05815.html
>>
>> You had indicated that you would rather want a bigger restructuring of the driver to solve (2).
>>
>> My observation: MY problem (in my setup) is not seen if I use either of the patches (yours or mine).
>>
>> My opinion: I think my patch solves (2) but might not solve (1) for all corner cases. Also your patch solves (1) but may not solve (2) for all corner cases -Thus we should probably solve both of these problems individually.
>>
>
> Just wondering if you decided on how to solve this problem?
>
> Are you planning this for 3.15?

Sorry, I haven't had a chance to work on this, so I don't think *I*
will get anything done for v3.15.  To make forward progress, maybe we
should merge your original patch?  Would you mind posting it again so
it gets into patchwork again?

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rajat Jain Feb. 21, 2014, 1:43 a.m. UTC | #9
On Thu, Feb 20, 2014 at 2:20 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Thu, Feb 20, 2014 at 12:42 AM, Rajat Jain <rajatxjain@gmail.com> wrote:
>> Hello Bjorn,
>>
>>>>
>>>> On Mon, Nov 25, 2013 at 07:03:11PM +0000, Rajat Jain wrote:
>>>> > Hello,
>>>> >
>>>> > > > On a different note, I feel there is still a need to apply my
>>>> > > > original
>>>> > > patch. There is still an open problem in case of spurious interrupts
>>>> > > (or in any case where the condition "if (slot_status &
>>>> PCI_EXP_SLTSTA_CC)"
>>>> > > becomes true in pcie_write_cmd()). That is because once that
>>>> > > happens, we never clear that interrupt, and no further hotplug
>>>> > > interrupts shall be received unless we do that.
>>>> > >
>>>> > > I agree this is an issue and we should address it somehow.  My
>>>> > > hesitation is just that I'd prefer to do some more aggressive
>>>> > > restructuring rather than apply a point fix.  For example:
>>>> >
>>>> > OK, I'll attempt to fix it that way when I get time.
>>>> >
>>>> > >
>>>> > > - We currently look at PCI_EXP_SLTSTA_CC in pcie_isr(),
>>>> > > pcie_poll_cmd(), and pcie_write_cmd().  I think it would be better
>>>> > > to look at it only in pcie_isr().
>>>> > >
>>>> > > - I don't think pcie_poll_cmd() should exist at all; we should poll
>>>> > > by calling pcie_isr() instead.
>>>> > >
>>>> > > - We need pcie_write_cmd(), but I think the way it waits is
>>>> backwards.
>>>> > >  Currently we issue the command, then wait for it to complete.  I
>>>> > > think we should issue the command, note the current time, and return
>>>> > > without waiting.  The *next* time we need to issue a command, we can
>>>> > > wait for completion of the previous one (or timeout) if necessary.
>>>> > >
>>>> > > But maybe we need the point fix in the interim, especially if
>>>> > > anybody can actually produce the scenario you mention.
>>>> >
>>>> > Ok.
>>>>
>>>> This patch is still in patchwork, but I've lost track of where we are.
>>>> Did you resolve this in the series that I just applied, or is it still
>>>> an outstanding issue?
>>>
>>> No, I did not solve it. It is still an outstanding issue. So far I am using your patch to overcome this:
>>>
>>> http://www.spinics.net/lists/hotplug/msg05830.html
>>>
>>> I'll just attempt to conclude the status on this issue so that you can make the decision on the course of action. IMHO there are 2 independent issues that we discussed in this thread:
>>>
>>> 1) PCIe compliant HW (that generates cmd completed interrupts at every write of Slot_ctrl register) being penalized with 1 second delay during the boot up. Your patch solves this.
>>>
>>> 2) If there is a genuine spurious interrupt, it does not get acknowledged. I had originally posted a patch for THIS problem.
>>> http://www.spinics.net/lists/hotplug/msg05815.html
>>>
>>> You had indicated that you would rather want a bigger restructuring of the driver to solve (2).
>>>
>>> My observation: MY problem (in my setup) is not seen if I use either of the patches (yours or mine).
>>>
>>> My opinion: I think my patch solves (2) but might not solve (1) for all corner cases. Also your patch solves (1) but may not solve (2) for all corner cases -Thus we should probably solve both of these problems individually.
>>>
>>
>> Just wondering if you decided on how to solve this problem?
>>
>> Are you planning this for 3.15?
>
> Sorry, I haven't had a chance to work on this, so I don't think *I*
> will get anything done for v3.15.  To make forward progress, maybe we
> should merge your original patch?  Would you mind posting it again so
> it gets into patchwork again?

Just sent it again, Thanks!
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 3eea3fdd4b0b..2fd2bd59e07f 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -881,6 +881,34 @@  static inline void dbg_ctrl(struct controller *ctrl)
 	ctrl_info(ctrl, "Slot Control           : 0x%04x\n", reg16);
 }
 
+static int pciehp_no_command_complete(struct controller *ctrl)
+{
+	struct pcie_device *dev = ctrl->pcie;
+	struct pci_dev *pdev = dev->port;
+	u16 vendor, device;
+
+	if (NO_CMD_CMPL(ctrl))
+		return 1;
+
+	/*
+	 * Controller should notify on command completion unless the "No
+	 * Command Completed Support" bit is set.  But some hardware does
+	 * not.  See https://bugzilla.kernel.org/show_bug.cgi?id=10751
+	 */
+	if (!(POWER_CTRL(ctrl) | ATTN_LED(ctrl) | PWR_LED(ctrl) | EMI(ctrl))) {
+		vendor = pdev->vendor;
+		device = pdev->device;
+		if (vendor == PCI_VENDOR_ID_INTEL &&
+		    device >= 0x283f && device <= 0x2849) {
+			dev_info(&dev->device, "device [%04x:%04x] does not notify on hotplug command completion\n",
+				 vendor, device);
+			return 1;
+		}
+	}
+
+	return 0;
+}
+
 struct controller *pcie_init(struct pcie_device *dev)
 {
 	struct controller *ctrl;
@@ -902,15 +930,8 @@  struct controller *pcie_init(struct pcie_device *dev)
 	mutex_init(&ctrl->ctrl_lock);
 	init_waitqueue_head(&ctrl->queue);
 	dbg_ctrl(ctrl);
-	/*
-	 * Controller doesn't notify of command completion if the "No
-	 * Command Completed Support" bit is set in Slot Capability
-	 * register or the controller supports none of power
-	 * controller, attention led, power led and EMI.
-	 */
-	if (NO_CMD_CMPL(ctrl) ||
-	    !(POWER_CTRL(ctrl) | ATTN_LED(ctrl) | PWR_LED(ctrl) | EMI(ctrl)))
-	    ctrl->no_cmd_complete = 1;
+
+	ctrl->no_cmd_complete = pciehp_no_command_complete(ctrl);
 
         /* Check if Data Link Layer Link Active Reporting is implemented */
         if (pciehp_readl(ctrl, PCI_EXP_LNKCAP, &link_cap)) {