diff mbox

[v4] pciehp: only wait command complete for real hotplug control

Message ID 1398966055-19325-1-git-send-email-yinghai@kernel.org
State Rejected
Headers show

Commit Message

Yinghai Lu May 1, 2014, 5:40 p.m. UTC
On system with 16 PCI express hotplug slots, customer complain every slot
will report "Command not completed in 1000 msec" during initialization.

Intel says that we should only wait command complete only for
           Electromechanical Interlock Control
           Power Controller Control
           Power Indicator Control
           Attention Indicator Control

For detail please check CF118 on
  http://www.intel.com/content/www/us/en/processors/xeon/xeon-e7-v2-spec-update.html

We saw same problems on system with AMD/Nvidia chipsets.

Two ways to address the problem:
1. change to only wait when (EIC, PCC, PIC, AIC) bits get changed.
2. or check CMD_COMPLETE and wait before write command. Only wait
   when CC is set and cmd_busy is true. For chipset from intel
   will only have CC set for real hotplug control, so we could
   skip the wait for others.

This patch is using second way.

With that we could save 16 seconds during booting, later with 32 sockets system
with 64 pcie hotplug slots we could save 64 seconds.

This patch address spurious "cmd completed" event problem that that
Rajat met.

-v2: use second way that is suggested by Bjorn.
-v3: rebase after
	PCI: pciehp: Acknowledge spurious "cmd completed" event
-v4: add comments according to Bjorn.
     also add URL for Intel doc about the CC set on intel chipset.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Tested-by: Rajat Jain <rajatxjain@gmail.com>

---
 drivers/pci/hotplug/pciehp_hpc.c |   48 ++++++++++++++++++++++++---------------
 1 file changed, 30 insertions(+), 18 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Rajat Jain May 1, 2014, 5:56 p.m. UTC | #1
Hello Yinghai,

On Thu, May 1, 2014 at 10:40 AM, Yinghai Lu <yinghai@kernel.org> wrote:
> On system with 16 PCI express hotplug slots, customer complain every slot
> will report "Command not completed in 1000 msec" during initialization.
>
> Intel says that we should only wait command complete only for
>            Electromechanical Interlock Control
>            Power Controller Control
>            Power Indicator Control
>            Attention Indicator Control

The document that you sent is an Intel ERRATA. Thus it would be nice
if you can please also mention CF118 in the context of it being an
Intel "errata", so that people do not confuse it with being expected /
default / standard behavior on all platforms.

Thanks,

Rajat

>
> For detail please check CF118 on
>   http://www.intel.com/content/www/us/en/processors/xeon/xeon-e7-v2-spec-update.html
>
> We saw same problems on system with AMD/Nvidia chipsets.
>
> Two ways to address the problem:
> 1. change to only wait when (EIC, PCC, PIC, AIC) bits get changed.
> 2. or check CMD_COMPLETE and wait before write command. Only wait
>    when CC is set and cmd_busy is true. For chipset from intel
>    will only have CC set for real hotplug control, so we could
>    skip the wait for others.
>
> This patch is using second way.
>
> With that we could save 16 seconds during booting, later with 32 sockets system
> with 64 pcie hotplug slots we could save 64 seconds.
>
> This patch address spurious "cmd completed" event problem that that
> Rajat met.
>
> -v2: use second way that is suggested by Bjorn.
> -v3: rebase after
>         PCI: pciehp: Acknowledge spurious "cmd completed" event
> -v4: add comments according to Bjorn.
>      also add URL for Intel doc about the CC set on intel chipset.
>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Tested-by: Rajat Jain <rajatxjain@gmail.com>
>
> ---
>  drivers/pci/hotplug/pciehp_hpc.c |   48 ++++++++++++++++++++++++---------------
>  1 file changed, 30 insertions(+), 18 deletions(-)
>
> Index: linux-2.6/drivers/pci/hotplug/pciehp_hpc.c
> ===================================================================
> --- linux-2.6.orig/drivers/pci/hotplug/pciehp_hpc.c
> +++ linux-2.6/drivers/pci/hotplug/pciehp_hpc.c
> @@ -155,20 +155,30 @@ static void pcie_write_cmd(struct contro
>         u16 slot_status;
>         u16 slot_ctrl;
>
> +       /*
> +        * We basically have two strategies for dealing with Command Complete.
> +        * Both are legal according to the PCIe spec:
> +        * A: Write command at first and wait for Command Complete is set.
> +        * B: Wait at first iff Command Complete and cmd_busy are set,
> +        *    that means previous command is not complete yet.
> +        *    Then write command.
> +        *
> +        * Second way also cover "Command not completed in 1000 msec"
> +        * Problem on system with chipsets that only set Command Complete
> +        * for some "real" hotplug events like:
> +        *      Electromechanical Interlock Control
> +        *      Power Controller Control
> +        *      Power Indicator Control
> +        *      Attention Indicator Control
> +        * like CF118 in
> +        * http://www.intel.com/content/www/us/en/processors/xeon/xeon-e7-v2-spec-update.html
> +        */
> +
>         mutex_lock(&ctrl->ctrl_lock);
>
>         pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &slot_status);
> -       if (slot_status & PCI_EXP_SLTSTA_CC) {
> -               pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
> -                                          PCI_EXP_SLTSTA_CC);
> -               if (!ctrl->no_cmd_complete) {
> -                       /*
> -                        * After 1 sec and CMD_COMPLETED still not set, just
> -                        * proceed forward to issue the next command according
> -                        * to spec. Just print out the error message.
> -                        */
> -                       ctrl_dbg(ctrl, "CMD_COMPLETED not clear after 1 sec\n");
> -               } else if (!NO_CMD_CMPL(ctrl)) {
> +       if (ctrl->no_cmd_complete && slot_status & PCI_EXP_SLTSTA_CC) {
> +               if (!NO_CMD_CMPL(ctrl)) {
>                         /*
>                          * This controller seems to notify of command completed
>                          * event even though it supports none of power
> @@ -184,16 +194,11 @@ static void pcie_write_cmd(struct contro
>         }
>
>         pcie_capability_read_word(pdev, PCI_EXP_SLTCTL, &slot_ctrl);
> -       slot_ctrl &= ~mask;
> -       slot_ctrl |= (cmd & mask);
> -       ctrl->cmd_busy = 1;
> -       smp_mb();
> -       pcie_capability_write_word(pdev, PCI_EXP_SLTCTL, slot_ctrl);
> -
>         /*
>          * Wait for command completion.
>          */
> -       if (!ctrl->no_cmd_complete) {
> +       if (!ctrl->no_cmd_complete && (slot_status & PCI_EXP_SLTSTA_CC) &&
> +           ctrl->cmd_busy) {
>                 int poll = 0;
>                 /*
>                  * if hotplug interrupt is not enabled or command
> @@ -205,6 +210,13 @@ static void pcie_write_cmd(struct contro
>                         poll = 1;
>                  pcie_wait_cmd(ctrl, poll);
>         }
> +
> +       slot_ctrl &= ~mask;
> +       slot_ctrl |= (cmd & mask);
> +       ctrl->cmd_busy = 1;
> +       smp_mb();
> +       pcie_capability_write_word(pdev, PCI_EXP_SLTCTL, slot_ctrl);
> +
>         mutex_unlock(&ctrl->ctrl_lock);
>  }
>
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas May 1, 2014, 5:59 p.m. UTC | #2
On Thu, May 1, 2014 at 11:56 AM, Rajat Jain <rajatxjain@gmail.com> wrote:
> Hello Yinghai,
>
> On Thu, May 1, 2014 at 10:40 AM, Yinghai Lu <yinghai@kernel.org> wrote:
>> On system with 16 PCI express hotplug slots, customer complain every slot
>> will report "Command not completed in 1000 msec" during initialization.
>>
>> Intel says that we should only wait command complete only for
>>            Electromechanical Interlock Control
>>            Power Controller Control
>>            Power Indicator Control
>>            Attention Indicator Control
>
> The document that you sent is an Intel ERRATA. Thus it would be nice
> if you can please also mention CF118 in the context of it being an
> Intel "errata", so that people do not confuse it with being expected /
> default / standard behavior on all platforms.

Yep, I'll fix that up when I merge the patch.  I think we've gotten as
far as we can by iterating on this.

>> For detail please check CF118 on
>>   http://www.intel.com/content/www/us/en/processors/xeon/xeon-e7-v2-spec-update.html
>>
>> We saw same problems on system with AMD/Nvidia chipsets.
>>
>> Two ways to address the problem:
>> 1. change to only wait when (EIC, PCC, PIC, AIC) bits get changed.
>> 2. or check CMD_COMPLETE and wait before write command. Only wait
>>    when CC is set and cmd_busy is true. For chipset from intel
>>    will only have CC set for real hotplug control, so we could
>>    skip the wait for others.
>>
>> This patch is using second way.
>>
>> With that we could save 16 seconds during booting, later with 32 sockets system
>> with 64 pcie hotplug slots we could save 64 seconds.
>>
>> This patch address spurious "cmd completed" event problem that that
>> Rajat met.
>>
>> -v2: use second way that is suggested by Bjorn.
>> -v3: rebase after
>>         PCI: pciehp: Acknowledge spurious "cmd completed" event
>> -v4: add comments according to Bjorn.
>>      also add URL for Intel doc about the CC set on intel chipset.
>>
>> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>> Tested-by: Rajat Jain <rajatxjain@gmail.com>
>>
>> ---
>>  drivers/pci/hotplug/pciehp_hpc.c |   48 ++++++++++++++++++++++++---------------
>>  1 file changed, 30 insertions(+), 18 deletions(-)
>>
>> Index: linux-2.6/drivers/pci/hotplug/pciehp_hpc.c
>> ===================================================================
>> --- linux-2.6.orig/drivers/pci/hotplug/pciehp_hpc.c
>> +++ linux-2.6/drivers/pci/hotplug/pciehp_hpc.c
>> @@ -155,20 +155,30 @@ static void pcie_write_cmd(struct contro
>>         u16 slot_status;
>>         u16 slot_ctrl;
>>
>> +       /*
>> +        * We basically have two strategies for dealing with Command Complete.
>> +        * Both are legal according to the PCIe spec:
>> +        * A: Write command at first and wait for Command Complete is set.
>> +        * B: Wait at first iff Command Complete and cmd_busy are set,
>> +        *    that means previous command is not complete yet.
>> +        *    Then write command.
>> +        *
>> +        * Second way also cover "Command not completed in 1000 msec"
>> +        * Problem on system with chipsets that only set Command Complete
>> +        * for some "real" hotplug events like:
>> +        *      Electromechanical Interlock Control
>> +        *      Power Controller Control
>> +        *      Power Indicator Control
>> +        *      Attention Indicator Control
>> +        * like CF118 in
>> +        * http://www.intel.com/content/www/us/en/processors/xeon/xeon-e7-v2-spec-update.html
>> +        */
>> +
>>         mutex_lock(&ctrl->ctrl_lock);
>>
>>         pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &slot_status);
>> -       if (slot_status & PCI_EXP_SLTSTA_CC) {
>> -               pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
>> -                                          PCI_EXP_SLTSTA_CC);
>> -               if (!ctrl->no_cmd_complete) {
>> -                       /*
>> -                        * After 1 sec and CMD_COMPLETED still not set, just
>> -                        * proceed forward to issue the next command according
>> -                        * to spec. Just print out the error message.
>> -                        */
>> -                       ctrl_dbg(ctrl, "CMD_COMPLETED not clear after 1 sec\n");
>> -               } else if (!NO_CMD_CMPL(ctrl)) {
>> +       if (ctrl->no_cmd_complete && slot_status & PCI_EXP_SLTSTA_CC) {
>> +               if (!NO_CMD_CMPL(ctrl)) {
>>                         /*
>>                          * This controller seems to notify of command completed
>>                          * event even though it supports none of power
>> @@ -184,16 +194,11 @@ static void pcie_write_cmd(struct contro
>>         }
>>
>>         pcie_capability_read_word(pdev, PCI_EXP_SLTCTL, &slot_ctrl);
>> -       slot_ctrl &= ~mask;
>> -       slot_ctrl |= (cmd & mask);
>> -       ctrl->cmd_busy = 1;
>> -       smp_mb();
>> -       pcie_capability_write_word(pdev, PCI_EXP_SLTCTL, slot_ctrl);
>> -
>>         /*
>>          * Wait for command completion.
>>          */
>> -       if (!ctrl->no_cmd_complete) {
>> +       if (!ctrl->no_cmd_complete && (slot_status & PCI_EXP_SLTSTA_CC) &&
>> +           ctrl->cmd_busy) {
>>                 int poll = 0;
>>                 /*
>>                  * if hotplug interrupt is not enabled or command
>> @@ -205,6 +210,13 @@ static void pcie_write_cmd(struct contro
>>                         poll = 1;
>>                  pcie_wait_cmd(ctrl, poll);
>>         }
>> +
>> +       slot_ctrl &= ~mask;
>> +       slot_ctrl |= (cmd & mask);
>> +       ctrl->cmd_busy = 1;
>> +       smp_mb();
>> +       pcie_capability_write_word(pdev, PCI_EXP_SLTCTL, slot_ctrl);
>> +
>>         mutex_unlock(&ctrl->ctrl_lock);
>>  }
>>
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas May 20, 2014, 8:53 p.m. UTC | #3
On Thu, May 01, 2014 at 10:40:55AM -0700, Yinghai Lu wrote:
> On system with 16 PCI express hotplug slots, customer complain every slot
> will report "Command not completed in 1000 msec" during initialization.
> 
> Intel says that we should only wait command complete only for
>            Electromechanical Interlock Control
>            Power Controller Control
>            Power Indicator Control
>            Attention Indicator Control
> 
> For detail please check CF118 on
>   http://www.intel.com/content/www/us/en/processors/xeon/xeon-e7-v2-spec-update.html
> 
> We saw same problems on system with AMD/Nvidia chipsets.
> 
> Two ways to address the problem:
> 1. change to only wait when (EIC, PCC, PIC, AIC) bits get changed.
> 2. or check CMD_COMPLETE and wait before write command. Only wait
>    when CC is set and cmd_busy is true. For chipset from intel
>    will only have CC set for real hotplug control, so we could
>    skip the wait for others.
> 
> This patch is using second way.
> 
> With that we could save 16 seconds during booting, later with 32 sockets system
> with 64 pcie hotplug slots we could save 64 seconds.

Doesn't this just move a timeout out of the boot path and into a future
hotplug event?

These controllers do not set the "No Command Completed Support" bit, so
pciehp expects that they will generate Command Completed events after they
process commands.

During initialization, pcie_enable_notification() writes a command to set
the DLLSCE, ABPE, PDCE, MRLSCE, HPIE, CCIE bits.  The (EIC, PCC, PIC, AIC)
bits aren't changed, so per CF118, the controller will not set the Command
Completed bit.

Later, when we turn on power to the slot, ctrl->cmd_busy will still be set,
so won't we timeout waiting for the Command Completed bit?

(I think there's a bug in your patch (see below) that would explain why you
wouldn't see this timeout in testing.)

The CF118 erratum only affects commands that change DLLSCE, ABPE, PDCE,
etc., without also changing (EIC, PCC, PIC, AIC).  Any command *after* one
of these should time out.

But we only issue commands like that at boot/suspend/resume/reset time.
Maybe it's acceptable for the first "real" hotplug command to timeout.
Subsequent commands shouldn't have any problems.

Or, since the actual hotplug events where we do change (EIC, PCC, PIC, AIC)
normally occur long after boot, maybe we could avoid the timeout by noting
the time when the initialization command (to set DLLSCE, ABPE, etc.) was
issued.  It's very likely that the next command will be more than 1000 msec
later, so there's no point in waiting for another timeout period.

> This patch address spurious "cmd completed" event problem that that
> Rajat met.
> 
> -v2: use second way that is suggested by Bjorn.
> -v3: rebase after
> 	PCI: pciehp: Acknowledge spurious "cmd completed" event
> -v4: add comments according to Bjorn.
>      also add URL for Intel doc about the CC set on intel chipset.
> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Tested-by: Rajat Jain <rajatxjain@gmail.com>
> 
> ---
>  drivers/pci/hotplug/pciehp_hpc.c |   48 ++++++++++++++++++++++++---------------
>  1 file changed, 30 insertions(+), 18 deletions(-)
> 
> Index: linux-2.6/drivers/pci/hotplug/pciehp_hpc.c
> ===================================================================
> --- linux-2.6.orig/drivers/pci/hotplug/pciehp_hpc.c
> +++ linux-2.6/drivers/pci/hotplug/pciehp_hpc.c
> @@ -155,20 +155,30 @@ static void pcie_write_cmd(struct contro
>  	u16 slot_status;
>  	u16 slot_ctrl;
>  
> +	/*
> +	 * We basically have two strategies for dealing with Command Complete.
> +	 * Both are legal according to the PCIe spec:
> +	 * A: Write command at first and wait for Command Complete is set.
> +	 * B: Wait at first iff Command Complete and cmd_busy are set,
> +	 *    that means previous command is not complete yet.
> +	 *    Then write command.
> +	 *
> +	 * Second way also cover "Command not completed in 1000 msec"
> +	 * Problem on system with chipsets that only set Command Complete
> +	 * for some "real" hotplug events like:
> +	 *	Electromechanical Interlock Control
> +	 *	Power Controller Control
> +	 *	Power Indicator Control
> +	 *	Attention Indicator Control
> +	 * like CF118 in
> +	 * http://www.intel.com/content/www/us/en/processors/xeon/xeon-e7-v2-spec-update.html
> +	 */
> +
>  	mutex_lock(&ctrl->ctrl_lock);
>  
>  	pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &slot_status);
> -	if (slot_status & PCI_EXP_SLTSTA_CC) {
> -		pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
> -					   PCI_EXP_SLTSTA_CC);
> -		if (!ctrl->no_cmd_complete) {
> -			/*
> -			 * After 1 sec and CMD_COMPLETED still not set, just
> -			 * proceed forward to issue the next command according
> -			 * to spec. Just print out the error message.
> -			 */
> -			ctrl_dbg(ctrl, "CMD_COMPLETED not clear after 1 sec\n");
> -		} else if (!NO_CMD_CMPL(ctrl)) {
> +	if (ctrl->no_cmd_complete && slot_status & PCI_EXP_SLTSTA_CC) {
> +		if (!NO_CMD_CMPL(ctrl)) {
>  			/*
>  			 * This controller seems to notify of command completed
>  			 * event even though it supports none of power
> @@ -184,16 +194,11 @@ static void pcie_write_cmd(struct contro
>  	}
>  
>  	pcie_capability_read_word(pdev, PCI_EXP_SLTCTL, &slot_ctrl);
> -	slot_ctrl &= ~mask;
> -	slot_ctrl |= (cmd & mask);
> -	ctrl->cmd_busy = 1;
> -	smp_mb();
> -	pcie_capability_write_word(pdev, PCI_EXP_SLTCTL, slot_ctrl);
> -
>  	/*
>  	 * Wait for command completion.
>  	 */
> -	if (!ctrl->no_cmd_complete) {
> +	if (!ctrl->no_cmd_complete && (slot_status & PCI_EXP_SLTSTA_CC) &&
> +	    ctrl->cmd_busy) {

I don't understand this change.  Why do you test "slot_status &
PCI_EXP_SLTSTA_CC" here?  Did you mean to write this:

  if (!ctrl->no_cmd_complete && !(slot_status & PCI_EXP_SLTSTA_CC) && ...

instead?  I think we want to wait if (1) the controller generates Command
Complete events and (2) Slot Status indicates that a command is still in
progress.

>  		int poll = 0;
>  		/*
>  		 * if hotplug interrupt is not enabled or command
> @@ -205,6 +210,13 @@ static void pcie_write_cmd(struct contro
>  			poll = 1;
>                  pcie_wait_cmd(ctrl, poll);
>  	}
> +
> +	slot_ctrl &= ~mask;
> +	slot_ctrl |= (cmd & mask);
> +	ctrl->cmd_busy = 1;
> +	smp_mb();
> +	pcie_capability_write_word(pdev, PCI_EXP_SLTCTL, slot_ctrl);
> +
>  	mutex_unlock(&ctrl->ctrl_lock);
>  }
>  
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yinghai Lu May 30, 2014, 10:28 p.m. UTC | #4
On Tue, May 20, 2014 at 1:53 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Thu, May 01, 2014 at 10:40:55AM -0700, Yinghai Lu wrote:

>> With that we could save 16 seconds during booting, later with 32 sockets system
>> with 64 pcie hotplug slots we could save 64 seconds.
>
> Doesn't this just move a timeout out of the boot path and into a future
> hotplug event?

Both. But not affect real Hotplug events (EIC, PCC, PIC, AIC).

>
> These controllers do not set the "No Command Completed Support" bit, so
> pciehp expects that they will generate Command Completed events after they
> process commands.
>
> During initialization, pcie_enable_notification() writes a command to set
> the DLLSCE, ABPE, PDCE, MRLSCE, HPIE, CCIE bits.  The (EIC, PCC, PIC, AIC)
> bits aren't changed, so per CF118, the controller will not set the Command
> Completed bit.
>
> Later, when we turn on power to the slot, ctrl->cmd_busy will still be set,
> so won't we timeout waiting for the Command Completed bit?

If it is real hotplug events, cmd_busy will be cleared by pcie_isr.

If it is not real hotplug events, cmd_busy is still set, and but CC is not set.

later before we try to write cmd again, but only cmd_busy is set &&
CC is not set, that means previous write cmd is not real hotplug event.
so we should not wait.


>
> (I think there's a bug in your patch (see below) that would explain why you
> wouldn't see this timeout in testing.)
>
>>       /*
>>        * Wait for command completion.
>>        */
>> -     if (!ctrl->no_cmd_complete) {
>> +     if (!ctrl->no_cmd_complete && (slot_status & PCI_EXP_SLTSTA_CC) &&
>> +         ctrl->cmd_busy) {
>
> I don't understand this change.  Why do you test "slot_status &
> PCI_EXP_SLTSTA_CC" here?  Did you mean to write this:
>
>   if (!ctrl->no_cmd_complete && !(slot_status & PCI_EXP_SLTSTA_CC) && ...
>
> instead?  I think we want to wait if (1) the controller generates Command
> Complete events and (2) Slot Status indicates that a command is still in
> progress.

Yes. your logic is right for those real hotplug event, that means
CC is not set, and isr is not called to clear cmd_busy.

but it will wait for real or non-real cmd right after non-real hotplug cmd.

looks like we need to start one timer for non-real hotplug to clear cmd_busy?

Thanks

Yinghai
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yinghai Lu May 30, 2014, 10:50 p.m. UTC | #5
On Fri, May 30, 2014 at 3:28 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Tue, May 20, 2014 at 1:53 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> I don't understand this change.  Why do you test "slot_status &
>> PCI_EXP_SLTSTA_CC" here?  Did you mean to write this:
>>
>>   if (!ctrl->no_cmd_complete && !(slot_status & PCI_EXP_SLTSTA_CC) && ...
>>
>> instead?  I think we want to wait if (1) the controller generates Command
>> Complete events and (2) Slot Status indicates that a command is still in
>> progress.
>
> Yes. your logic is right for those real hotplug event, that means
> CC is not set, and isr is not called to clear cmd_busy.
>
> but it will wait for real or non-real cmd right after non-real hotplug cmd.
>
> looks like we need to start one timer for non-real hotplug to clear cmd_busy?

That looks make it too complicated.

I would revert back to first way
   change to only wait when (EIC, PCC, PIC, AIC) bits get changed.
and add one flag in ctrl to see if need to apply the rule.

Thanks

Yinghai
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yinghai Lu May 30, 2014, 11:08 p.m. UTC | #6
On Fri, May 30, 2014 at 3:50 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Fri, May 30, 2014 at 3:28 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> On Tue, May 20, 2014 at 1:53 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>>> I don't understand this change.  Why do you test "slot_status &
>>> PCI_EXP_SLTSTA_CC" here?  Did you mean to write this:
>>>
>>>   if (!ctrl->no_cmd_complete && !(slot_status & PCI_EXP_SLTSTA_CC) && ...
>>>
>>> instead?  I think we want to wait if (1) the controller generates Command
>>> Complete events and (2) Slot Status indicates that a command is still in
>>> progress.
>>
>> Yes. your logic is right for those real hotplug event, that means
>> CC is not set, and isr is not called to clear cmd_busy.
>>
>> but it will wait for real or non-real cmd right after non-real hotplug cmd.
>>
>> looks like we need to start one timer for non-real hotplug to clear cmd_busy?
>
> That looks make it too complicated.
>
> I would revert back to first way
>    change to only wait when (EIC, PCC, PIC, AIC) bits get changed.
> and add one flag in ctrl to see if need to apply the rule.
>

Please check update one with method 1 at

http://patchwork.ozlabs.org/patch/354365/

Thanks

Yinghai
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas May 30, 2014, 11:14 p.m. UTC | #7
On Fri, May 30, 2014 at 03:28:16PM -0700, Yinghai Lu wrote:
> On Tue, May 20, 2014 at 1:53 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> > On Thu, May 01, 2014 at 10:40:55AM -0700, Yinghai Lu wrote:
> 
> >> With that we could save 16 seconds during booting, later with 32 sockets system
> >> with 64 pcie hotplug slots we could save 64 seconds.
> >
> > Doesn't this just move a timeout out of the boot path and into a future
> > hotplug event?
> 
> Both. But not affect real Hotplug events (EIC, PCC, PIC, AIC).
> 
> >
> > These controllers do not set the "No Command Completed Support" bit, so
> > pciehp expects that they will generate Command Completed events after they
> > process commands.
> >
> > During initialization, pcie_enable_notification() writes a command to set
> > the DLLSCE, ABPE, PDCE, MRLSCE, HPIE, CCIE bits.  The (EIC, PCC, PIC, AIC)
> > bits aren't changed, so per CF118, the controller will not set the Command
> > Completed bit.
> >
> > Later, when we turn on power to the slot, ctrl->cmd_busy will still be set,
> > so won't we timeout waiting for the Command Completed bit?
> 
> If it is real hotplug events, cmd_busy will be cleared by pcie_isr.
> 
> If it is not real hotplug events, cmd_busy is still set, and but CC is not set.

The first command is not a "real" hotplug event (it doesn't change (EIC,
PCC, PIC, AIC)).  So these broken controllers won't set CC.  The second
command probably *will* be a real hotplug event.  But we have to wait for
CC to be set before we can issue it, and on these broken controllers, that
wait will time out.

> later before we try to write cmd again, but only cmd_busy is set &&
> CC is not set, that means previous write cmd is not real hotplug event.
> so we should not wait.

It means no such thing.  If cmd_busy is set and CC is not set, that means
we've written a command to SLTCTL, but it hasn't completed yet, and we
can't issue another command until either CC is set or a timeout period has
elapsed.

> > (I think there's a bug in your patch (see below) that would explain why you
> > wouldn't see this timeout in testing.)
> >
> >>       /*
> >>        * Wait for command completion.
> >>        */
> >> -     if (!ctrl->no_cmd_complete) {
> >> +     if (!ctrl->no_cmd_complete && (slot_status & PCI_EXP_SLTSTA_CC) &&
> >> +         ctrl->cmd_busy) {
> >
> > I don't understand this change.  Why do you test "slot_status &
> > PCI_EXP_SLTSTA_CC" here?  Did you mean to write this:
> >
> >   if (!ctrl->no_cmd_complete && !(slot_status & PCI_EXP_SLTSTA_CC) && ...
> >
> > instead?  I think we want to wait if (1) the controller generates Command
> > Complete events and (2) Slot Status indicates that a command is still in
> > progress.
> 
> Yes. your logic is right for those real hotplug event, that means
> CC is not set, and isr is not called to clear cmd_busy.
> 
> but it will wait for real or non-real cmd right after non-real hotplug cmd.
> 
> looks like we need to start one timer for non-real hotplug to clear cmd_busy?

I think your current patch is incorrect because you wait if the previous
command has completed, and you write a new command if the previous one is
still in progress.  That's backwards.

The code has to work correctly for spec-compliant hardware.  If we have to
wait for timeouts on hardware with errata, that's unfortunate, but at least
things will still work correctly.

If you can make it work correctly with no timeouts on both working and
broken hardware, that's obviously ideal.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yinghai Lu May 31, 2014, 1:25 a.m. UTC | #8
On Fri, May 30, 2014 at 4:14 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> I think your current patch is incorrect because you wait if the previous
> command has completed, and you write a new command if the previous one is
> still in progress.  That's backwards.
>

Agreed.
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

Index: linux-2.6/drivers/pci/hotplug/pciehp_hpc.c
===================================================================
--- linux-2.6.orig/drivers/pci/hotplug/pciehp_hpc.c
+++ linux-2.6/drivers/pci/hotplug/pciehp_hpc.c
@@ -155,20 +155,30 @@  static void pcie_write_cmd(struct contro
 	u16 slot_status;
 	u16 slot_ctrl;
 
+	/*
+	 * We basically have two strategies for dealing with Command Complete.
+	 * Both are legal according to the PCIe spec:
+	 * A: Write command at first and wait for Command Complete is set.
+	 * B: Wait at first iff Command Complete and cmd_busy are set,
+	 *    that means previous command is not complete yet.
+	 *    Then write command.
+	 *
+	 * Second way also cover "Command not completed in 1000 msec"
+	 * Problem on system with chipsets that only set Command Complete
+	 * for some "real" hotplug events like:
+	 *	Electromechanical Interlock Control
+	 *	Power Controller Control
+	 *	Power Indicator Control
+	 *	Attention Indicator Control
+	 * like CF118 in
+	 * http://www.intel.com/content/www/us/en/processors/xeon/xeon-e7-v2-spec-update.html
+	 */
+
 	mutex_lock(&ctrl->ctrl_lock);
 
 	pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &slot_status);
-	if (slot_status & PCI_EXP_SLTSTA_CC) {
-		pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
-					   PCI_EXP_SLTSTA_CC);
-		if (!ctrl->no_cmd_complete) {
-			/*
-			 * After 1 sec and CMD_COMPLETED still not set, just
-			 * proceed forward to issue the next command according
-			 * to spec. Just print out the error message.
-			 */
-			ctrl_dbg(ctrl, "CMD_COMPLETED not clear after 1 sec\n");
-		} else if (!NO_CMD_CMPL(ctrl)) {
+	if (ctrl->no_cmd_complete && slot_status & PCI_EXP_SLTSTA_CC) {
+		if (!NO_CMD_CMPL(ctrl)) {
 			/*
 			 * This controller seems to notify of command completed
 			 * event even though it supports none of power
@@ -184,16 +194,11 @@  static void pcie_write_cmd(struct contro
 	}
 
 	pcie_capability_read_word(pdev, PCI_EXP_SLTCTL, &slot_ctrl);
-	slot_ctrl &= ~mask;
-	slot_ctrl |= (cmd & mask);
-	ctrl->cmd_busy = 1;
-	smp_mb();
-	pcie_capability_write_word(pdev, PCI_EXP_SLTCTL, slot_ctrl);
-
 	/*
 	 * Wait for command completion.
 	 */
-	if (!ctrl->no_cmd_complete) {
+	if (!ctrl->no_cmd_complete && (slot_status & PCI_EXP_SLTSTA_CC) &&
+	    ctrl->cmd_busy) {
 		int poll = 0;
 		/*
 		 * if hotplug interrupt is not enabled or command
@@ -205,6 +210,13 @@  static void pcie_write_cmd(struct contro
 			poll = 1;
                 pcie_wait_cmd(ctrl, poll);
 	}
+
+	slot_ctrl &= ~mask;
+	slot_ctrl |= (cmd & mask);
+	ctrl->cmd_busy = 1;
+	smp_mb();
+	pcie_capability_write_word(pdev, PCI_EXP_SLTCTL, slot_ctrl);
+
 	mutex_unlock(&ctrl->ctrl_lock);
 }