diff mbox

powerpc/pseries: Avoid context switch in EEH reset if required

Message ID 1421621243-21265-1-git-send-email-gwshan@linux.vnet.ibm.com (mailing list archive)
State Rejected
Delegated to: Benjamin Herrenschmidt
Headers show

Commit Message

Gavin Shan Jan. 18, 2015, 10:47 p.m. UTC
On pseries platform, the EEH reset backend pseries_eeh_reset() can
be called in atomic context as follows. For this case, we should
call udelay() instead of msleep() to avoid context switching.

     drivers/scsi/ipr.c::ipr_reset_slot_reset_done()
     drivers/pci/pci.c::pci_set_pcie_reset_state()
     arch/powerpc/kernel/eeh.c::pcibios_set_pcie_reset_state()
     arch/powerpc/platforms/pseries/eeh_pseries.c::pseries_eeh_reset()

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Tested-by: Wen Xiong<wenxiong@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/eeh_pseries.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

Comments

Benjamin Herrenschmidt Jan. 20, 2015, 9:28 a.m. UTC | #1
On Mon, 2015-01-19 at 09:47 +1100, Gavin Shan wrote:
> On pseries platform, the EEH reset backend pseries_eeh_reset() can
> be called in atomic context as follows. For this case, we should
> call udelay() instead of msleep() to avoid context switching.
> 
>      drivers/scsi/ipr.c::ipr_reset_slot_reset_done()
>      drivers/pci/pci.c::pci_set_pcie_reset_state()
>      arch/powerpc/kernel/eeh.c::pcibios_set_pcie_reset_state()
>      arch/powerpc/platforms/pseries/eeh_pseries.c::pseries_eeh_reset()

It's not acceptable to introduce multi-millisecond delays at interrupt
time. In fact, we should generally not use udelay in such context.

I understand that this is an exceptional error handling case but it's
still not right.

Are there many other users of pci_set_pcie_reset_state() at interrupt
time ? Can we have a discussion with the PCI folks as to whether that
should be legal or not ?

I'm tempted to require that it's made illegal.

Ben.

> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> Tested-by: Wen Xiong<wenxiong@linux.vnet.ibm.com>
> ---
>  arch/powerpc/platforms/pseries/eeh_pseries.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
> index a6c7e19..67623a3 100644
> --- a/arch/powerpc/platforms/pseries/eeh_pseries.c
> +++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
> @@ -503,8 +503,7 @@ static int pseries_eeh_get_state(struct eeh_pe *pe, int *state)
>   */
>  static int pseries_eeh_reset(struct eeh_pe *pe, int option)
>  {
> -	int config_addr;
> -	int ret;
> +	int config_addr, delay, ret;
>  
>  	/* Figure out PE address */
>  	config_addr = pe->config_addr;
> @@ -528,9 +527,14 @@ static int pseries_eeh_reset(struct eeh_pe *pe, int option)
>  	/* We need reset hold or settlement delay */
>  	if (option == EEH_RESET_FUNDAMENTAL ||
>  	    option == EEH_RESET_HOT)
> -		msleep(EEH_PE_RST_HOLD_TIME);
> +		delay = EEH_PE_RST_HOLD_TIME;
> +	else
> +		delay = EEH_PE_RST_SETTLE_TIME;
> +
> +	if (in_atomic())
> +		udelay(delay * 1000);
>  	else
> -		msleep(EEH_PE_RST_SETTLE_TIME);
> +		msleep(delay);
>  
>  	return ret;
>  }
Gavin Shan Jan. 20, 2015, 10:56 p.m. UTC | #2
On Tue, Jan 20, 2015 at 10:28:16AM +0100, Benjamin Herrenschmidt wrote:
>On Mon, 2015-01-19 at 09:47 +1100, Gavin Shan wrote:
>> On pseries platform, the EEH reset backend pseries_eeh_reset() can
>> be called in atomic context as follows. For this case, we should
>> call udelay() instead of msleep() to avoid context switching.
>> 
>>      drivers/scsi/ipr.c::ipr_reset_slot_reset_done()
>>      drivers/pci/pci.c::pci_set_pcie_reset_state()
>>      arch/powerpc/kernel/eeh.c::pcibios_set_pcie_reset_state()
>>      arch/powerpc/platforms/pseries/eeh_pseries.c::pseries_eeh_reset()
>
>It's not acceptable to introduce multi-millisecond delays at interrupt
>time. In fact, we should generally not use udelay in such context.
>
>I understand that this is an exceptional error handling case but it's
>still not right.
>

Yes, I agree it's unsafe to udelay for multi-milliseconds as the queued
works in atomic context is expected to be completed as soon as possible.

>Are there many other users of pci_set_pcie_reset_state() at interrupt
>time ? Can we have a discussion with the PCI folks as to whether that
>should be legal or not ?
>
>I'm tempted to require that it's made illegal.

Currently, there are 2 drivers calling this function: IPR and misc/genwqe.
Also, VFIO would call this function for IBM and Mellanox adapters in PowerKVM
repository. For now, IPR driver is the only one call this function in atomic
context. 

Sure, I'll send one email to confirm with PCI folks. I guess it's illegal
to call pci_set_pcie_reset_state() in atomic context. If it's the case,
I'm afraid Wendy has to change IPR driver to replace the reset timer with
something else (e.g. workqueue).

Thanks,
Gavin

>
>Ben.
>
>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>> Tested-by: Wen Xiong<wenxiong@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/platforms/pseries/eeh_pseries.c | 12 ++++++++----
>>  1 file changed, 8 insertions(+), 4 deletions(-)
>> 
>> diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
>> index a6c7e19..67623a3 100644
>> --- a/arch/powerpc/platforms/pseries/eeh_pseries.c
>> +++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
>> @@ -503,8 +503,7 @@ static int pseries_eeh_get_state(struct eeh_pe *pe, int *state)
>>   */
>>  static int pseries_eeh_reset(struct eeh_pe *pe, int option)
>>  {
>> -	int config_addr;
>> -	int ret;
>> +	int config_addr, delay, ret;
>>  
>>  	/* Figure out PE address */
>>  	config_addr = pe->config_addr;
>> @@ -528,9 +527,14 @@ static int pseries_eeh_reset(struct eeh_pe *pe, int option)
>>  	/* We need reset hold or settlement delay */
>>  	if (option == EEH_RESET_FUNDAMENTAL ||
>>  	    option == EEH_RESET_HOT)
>> -		msleep(EEH_PE_RST_HOLD_TIME);
>> +		delay = EEH_PE_RST_HOLD_TIME;
>> +	else
>> +		delay = EEH_PE_RST_SETTLE_TIME;
>> +
>> +	if (in_atomic())
>> +		udelay(delay * 1000);
>>  	else
>> -		msleep(EEH_PE_RST_SETTLE_TIME);
>> +		msleep(delay);
>>  
>>  	return ret;
>>  }
>
>
Gavin Shan Jan. 20, 2015, 11:53 p.m. UTC | #3
On Wed, Jan 21, 2015 at 09:56:07AM +1100, Gavin Shan wrote:
>On Tue, Jan 20, 2015 at 10:28:16AM +0100, Benjamin Herrenschmidt wrote:
>>On Mon, 2015-01-19 at 09:47 +1100, Gavin Shan wrote:
>>> On pseries platform, the EEH reset backend pseries_eeh_reset() can
>>> be called in atomic context as follows. For this case, we should
>>> call udelay() instead of msleep() to avoid context switching.
>>> 
>>>      drivers/scsi/ipr.c::ipr_reset_slot_reset_done()
>>>      drivers/pci/pci.c::pci_set_pcie_reset_state()
>>>      arch/powerpc/kernel/eeh.c::pcibios_set_pcie_reset_state()
>>>      arch/powerpc/platforms/pseries/eeh_pseries.c::pseries_eeh_reset()
>>
>>It's not acceptable to introduce multi-millisecond delays at interrupt
>>time. In fact, we should generally not use udelay in such context.
>>
>>I understand that this is an exceptional error handling case but it's
>>still not right.
>>
>
>Yes, I agree it's unsafe to udelay for multi-milliseconds as the queued
>works in atomic context is expected to be completed as soon as possible.
>
>>Are there many other users of pci_set_pcie_reset_state() at interrupt
>>time ? Can we have a discussion with the PCI folks as to whether that
>>should be legal or not ?
>>
>>I'm tempted to require that it's made illegal.
>
>Currently, there are 2 drivers calling this function: IPR and misc/genwqe.
>Also, VFIO would call this function for IBM and Mellanox adapters in PowerKVM
>repository. For now, IPR driver is the only one call this function in atomic
>context. 
>
>Sure, I'll send one email to confirm with PCI folks. I guess it's illegal
>to call pci_set_pcie_reset_state() in atomic context. If it's the case,
>I'm afraid Wendy has to change IPR driver to replace the reset timer with
>something else (e.g. workqueue).
>

Another way is to drop the hold/settle delays for pcibios_set_pcie_reset_state()
and IPR relies on the timer interval to cover them. Wendy, could you please
let me know if it would work for you or not?

    Start reset timer;
    Timer expires, assert the reset. Restart the timer with assert delay;
    Timer expires, deassert the reset. Restart the timer with settle delay;
    Timer expires, ready for subsequent works;

Thanks,
Gavin

>>
>>Ben.
>>
>>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>> Tested-by: Wen Xiong<wenxiong@linux.vnet.ibm.com>
>>> ---
>>>  arch/powerpc/platforms/pseries/eeh_pseries.c | 12 ++++++++----
>>>  1 file changed, 8 insertions(+), 4 deletions(-)
>>> 
>>> diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
>>> index a6c7e19..67623a3 100644
>>> --- a/arch/powerpc/platforms/pseries/eeh_pseries.c
>>> +++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
>>> @@ -503,8 +503,7 @@ static int pseries_eeh_get_state(struct eeh_pe *pe, int *state)
>>>   */
>>>  static int pseries_eeh_reset(struct eeh_pe *pe, int option)
>>>  {
>>> -	int config_addr;
>>> -	int ret;
>>> +	int config_addr, delay, ret;
>>>  
>>>  	/* Figure out PE address */
>>>  	config_addr = pe->config_addr;
>>> @@ -528,9 +527,14 @@ static int pseries_eeh_reset(struct eeh_pe *pe, int option)
>>>  	/* We need reset hold or settlement delay */
>>>  	if (option == EEH_RESET_FUNDAMENTAL ||
>>>  	    option == EEH_RESET_HOT)
>>> -		msleep(EEH_PE_RST_HOLD_TIME);
>>> +		delay = EEH_PE_RST_HOLD_TIME;
>>> +	else
>>> +		delay = EEH_PE_RST_SETTLE_TIME;
>>> +
>>> +	if (in_atomic())
>>> +		udelay(delay * 1000);
>>>  	else
>>> -		msleep(EEH_PE_RST_SETTLE_TIME);
>>> +		msleep(delay);
>>>  
>>>  	return ret;
>>>  }
>>
>>
Gavin Shan Jan. 23, 2015, 3:50 a.m. UTC | #4
On Wed, Jan 21, 2015 at 10:53:38AM +1100, Gavin Shan wrote:
>On Wed, Jan 21, 2015 at 09:56:07AM +1100, Gavin Shan wrote:
>>On Tue, Jan 20, 2015 at 10:28:16AM +0100, Benjamin Herrenschmidt wrote:
>>>On Mon, 2015-01-19 at 09:47 +1100, Gavin Shan wrote:
>>>> On pseries platform, the EEH reset backend pseries_eeh_reset() can
>>>> be called in atomic context as follows. For this case, we should
>>>> call udelay() instead of msleep() to avoid context switching.
>>>> 
>>>>      drivers/scsi/ipr.c::ipr_reset_slot_reset_done()
>>>>      drivers/pci/pci.c::pci_set_pcie_reset_state()
>>>>      arch/powerpc/kernel/eeh.c::pcibios_set_pcie_reset_state()
>>>>      arch/powerpc/platforms/pseries/eeh_pseries.c::pseries_eeh_reset()
>>>
>>>It's not acceptable to introduce multi-millisecond delays at interrupt
>>>time. In fact, we should generally not use udelay in such context.
>>>
>>>I understand that this is an exceptional error handling case but it's
>>>still not right.
>>>
>>
>>Yes, I agree it's unsafe to udelay for multi-milliseconds as the queued
>>works in atomic context is expected to be completed as soon as possible.
>>
>>>Are there many other users of pci_set_pcie_reset_state() at interrupt
>>>time ? Can we have a discussion with the PCI folks as to whether that
>>>should be legal or not ?
>>>
>>>I'm tempted to require that it's made illegal.
>>
>>Currently, there are 2 drivers calling this function: IPR and misc/genwqe.
>>Also, VFIO would call this function for IBM and Mellanox adapters in PowerKVM
>>repository. For now, IPR driver is the only one call this function in atomic
>>context. 
>>
>>Sure, I'll send one email to confirm with PCI folks. I guess it's illegal
>>to call pci_set_pcie_reset_state() in atomic context. If it's the case,
>>I'm afraid Wendy has to change IPR driver to replace the reset timer with
>>something else (e.g. workqueue).
>>
>
>Another way is to drop the hold/settle delays for pcibios_set_pcie_reset_state()
>and IPR relies on the timer interval to cover them. Wendy, could you please
>let me know if it would work for you or not?
>
>    Start reset timer;
>    Timer expires, assert the reset. Restart the timer with assert delay;
>    Timer expires, deassert the reset. Restart the timer with settle delay;
>    Timer expires, ready for subsequent works;
>

Brian King is the author of pci_set_pcie_reset_state() and as he said, it was
expected to be called in atomic context. So I'm going to come up with another
approach as above, caller of pci_set_pcie_reset_state() should take corresponding 
delays as what IPR driver is doing.

Messages from Brian for reference:

| The API has changed. I wrote the pci_set_pcie_reset_state API originally.
| When this API was put in place initially, it was perfectly legal to call it
| from an atomic context. Can you clarify why we have to have the delay in the
| pci_set_pcie_reset_state function? Shouldn't it be the responsibility of the
| callers to ensure a proper delay is used? This was always the case until
| recently.

So please ignore this patch and I'll send another one, which is implemented in
above approach.

Thanks,
Gavin

>>>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>>> Tested-by: Wen Xiong<wenxiong@linux.vnet.ibm.com>
>>>> ---
>>>>  arch/powerpc/platforms/pseries/eeh_pseries.c | 12 ++++++++----
>>>>  1 file changed, 8 insertions(+), 4 deletions(-)
>>>> 
>>>> diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
>>>> index a6c7e19..67623a3 100644
>>>> --- a/arch/powerpc/platforms/pseries/eeh_pseries.c
>>>> +++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
>>>> @@ -503,8 +503,7 @@ static int pseries_eeh_get_state(struct eeh_pe *pe, int *state)
>>>>   */
>>>>  static int pseries_eeh_reset(struct eeh_pe *pe, int option)
>>>>  {
>>>> -	int config_addr;
>>>> -	int ret;
>>>> +	int config_addr, delay, ret;
>>>>  
>>>>  	/* Figure out PE address */
>>>>  	config_addr = pe->config_addr;
>>>> @@ -528,9 +527,14 @@ static int pseries_eeh_reset(struct eeh_pe *pe, int option)
>>>>  	/* We need reset hold or settlement delay */
>>>>  	if (option == EEH_RESET_FUNDAMENTAL ||
>>>>  	    option == EEH_RESET_HOT)
>>>> -		msleep(EEH_PE_RST_HOLD_TIME);
>>>> +		delay = EEH_PE_RST_HOLD_TIME;
>>>> +	else
>>>> +		delay = EEH_PE_RST_SETTLE_TIME;
>>>> +
>>>> +	if (in_atomic())
>>>> +		udelay(delay * 1000);
>>>>  	else
>>>> -		msleep(EEH_PE_RST_SETTLE_TIME);
>>>> +		msleep(delay);
>>>>  
>>>>  	return ret;
>>>>  }
>>>
>>>
Benjamin Herrenschmidt Jan. 24, 2015, 9:57 a.m. UTC | #5
On Fri, 2015-01-23 at 14:50 +1100, Gavin Shan wrote:

> Messages from Brian for reference:
> 
> | The API has changed. I wrote the pci_set_pcie_reset_state API originally.
> | When this API was put in place initially, it was perfectly legal to call it
> | from an atomic context. Can you clarify why we have to have the delay in the
> | pci_set_pcie_reset_state function? Shouldn't it be the responsibility of the
> | callers to ensure a proper delay is used? This was always the case until
> | recently.
> 
> So please ignore this patch and I'll send another one, which is implemented in
> above approach.

I still think it's not a great idea to allow that API to be called in
atomic context.

Brian, the reset API for PCIe involves FW calls which might have to do
a bunch of stuff under the hood, including potentially significant
delays.

For example, under OPAL (and I suppose PowerVM), if doing a PERST, the
API calls will loop until the link is back up, at least when "releasing"
the reset line.

I wouldn't be surprised if on x86, similar kinds of ACPI calls are
needed which may not be the best thing to do in atomic context.

I don't see any specific performance issues with issuing resets, so I
would strongly advocate for changing the API requirements instead so
that it's called from a task context.

Cheers,
Ben.



> Thanks,
> Gavin
> 
> >>>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> >>>> Tested-by: Wen Xiong<wenxiong@linux.vnet.ibm.com>
> >>>> ---
> >>>>  arch/powerpc/platforms/pseries/eeh_pseries.c | 12 ++++++++----
> >>>>  1 file changed, 8 insertions(+), 4 deletions(-)
> >>>> 
> >>>> diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
> >>>> index a6c7e19..67623a3 100644
> >>>> --- a/arch/powerpc/platforms/pseries/eeh_pseries.c
> >>>> +++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
> >>>> @@ -503,8 +503,7 @@ static int pseries_eeh_get_state(struct eeh_pe *pe, int *state)
> >>>>   */
> >>>>  static int pseries_eeh_reset(struct eeh_pe *pe, int option)
> >>>>  {
> >>>> -	int config_addr;
> >>>> -	int ret;
> >>>> +	int config_addr, delay, ret;
> >>>>  
> >>>>  	/* Figure out PE address */
> >>>>  	config_addr = pe->config_addr;
> >>>> @@ -528,9 +527,14 @@ static int pseries_eeh_reset(struct eeh_pe *pe, int option)
> >>>>  	/* We need reset hold or settlement delay */
> >>>>  	if (option == EEH_RESET_FUNDAMENTAL ||
> >>>>  	    option == EEH_RESET_HOT)
> >>>> -		msleep(EEH_PE_RST_HOLD_TIME);
> >>>> +		delay = EEH_PE_RST_HOLD_TIME;
> >>>> +	else
> >>>> +		delay = EEH_PE_RST_SETTLE_TIME;
> >>>> +
> >>>> +	if (in_atomic())
> >>>> +		udelay(delay * 1000);
> >>>>  	else
> >>>> -		msleep(EEH_PE_RST_SETTLE_TIME);
> >>>> +		msleep(delay);
> >>>>  
> >>>>  	return ret;
> >>>>  }
> >>>
> >>>
Brian King Jan. 26, 2015, 11:36 p.m. UTC | #6
On 01/24/2015 03:57 AM, Benjamin Herrenschmidt wrote:
> On Fri, 2015-01-23 at 14:50 +1100, Gavin Shan wrote:
> 
>> Messages from Brian for reference:
>>
>> | The API has changed. I wrote the pci_set_pcie_reset_state API originally.
>> | When this API was put in place initially, it was perfectly legal to call it
>> | from an atomic context. Can you clarify why we have to have the delay in the
>> | pci_set_pcie_reset_state function? Shouldn't it be the responsibility of the
>> | callers to ensure a proper delay is used? This was always the case until
>> | recently.
>>
>> So please ignore this patch and I'll send another one, which is implemented in
>> above approach.
> 
> I still think it's not a great idea to allow that API to be called in
> atomic context.

That was the entire purpose of the API though. If a driver doesn't need to
do the reset in atomic context, why bother having separate assert / deassert
APIs and just have an API that does the reset, delay, and deassert?


> Brian, the reset API for PCIe involves FW calls which might have to do
> a bunch of stuff under the hood, including potentially significant
> delays.
> 
> For example, under OPAL (and I suppose PowerVM), if doing a PERST, the
> API calls will loop until the link is back up, at least when "releasing"
> the reset line.

Under PowerVM, this maps to the ibm,set-slot-reset. According to PAPR+ V2.7,
R1-7.2.4-5:

For RTAS calls which do not allow the Status of -2 (Busy), there may be “rare” instances where an
anomaly may occur and the call may take longer than a “very short period of time.” In these cases, the call
must complete within 250 microseconds. Otherwise, a hardware error response must be given.


> I wouldn't be surprised if on x86, similar kinds of ACPI calls are
> needed which may not be the best thing to do in atomic context.

x86 hasn't wired up the function yet, so we don't have any code
to look at here. In fact, Power is the only architecture that has
wired up this API at all, all other architectures will return -EINVAL
if it is called.


> I don't see any specific performance issues with issuing resets, so I
> would strongly advocate for changing the API requirements instead so
> that it's called from a task context.

To set some context, this function is only used by ipr for some old
broken adapters. These are adapters that are not supported on p8,
so will never show up under OPAL, only PowerVM. I'm fine with looking
at alternatives for the future, but I can't say I'm too excited about
changing the calling requirements for an API that has been around
for many years. Particularly given that this code is only needed for
these old adapters. If its difficult to implement this for OPAL without
noticeable delays, could we just return -EINVAL for this function on OPAL?,
since its not needed there today anyway.

Thanks,

Brian

> 
> Cheers,
> Ben.
> 
> 
> 
>> Thanks,
>> Gavin
>>
>>>>>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>>>>> Tested-by: Wen Xiong<wenxiong@linux.vnet.ibm.com>
>>>>>> ---
>>>>>>  arch/powerpc/platforms/pseries/eeh_pseries.c | 12 ++++++++----
>>>>>>  1 file changed, 8 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
>>>>>> index a6c7e19..67623a3 100644
>>>>>> --- a/arch/powerpc/platforms/pseries/eeh_pseries.c
>>>>>> +++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
>>>>>> @@ -503,8 +503,7 @@ static int pseries_eeh_get_state(struct eeh_pe *pe, int *state)
>>>>>>   */
>>>>>>  static int pseries_eeh_reset(struct eeh_pe *pe, int option)
>>>>>>  {
>>>>>> -	int config_addr;
>>>>>> -	int ret;
>>>>>> +	int config_addr, delay, ret;
>>>>>>  
>>>>>>  	/* Figure out PE address */
>>>>>>  	config_addr = pe->config_addr;
>>>>>> @@ -528,9 +527,14 @@ static int pseries_eeh_reset(struct eeh_pe *pe, int option)
>>>>>>  	/* We need reset hold or settlement delay */
>>>>>>  	if (option == EEH_RESET_FUNDAMENTAL ||
>>>>>>  	    option == EEH_RESET_HOT)
>>>>>> -		msleep(EEH_PE_RST_HOLD_TIME);
>>>>>> +		delay = EEH_PE_RST_HOLD_TIME;
>>>>>> +	else
>>>>>> +		delay = EEH_PE_RST_SETTLE_TIME;
>>>>>> +
>>>>>> +	if (in_atomic())
>>>>>> +		udelay(delay * 1000);
>>>>>>  	else
>>>>>> -		msleep(EEH_PE_RST_SETTLE_TIME);
>>>>>> +		msleep(delay);
>>>>>>  
>>>>>>  	return ret;
>>>>>>  }
>>>>>
>>>>>
> 
> 
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
>
Benjamin Herrenschmidt Jan. 27, 2015, 4:31 a.m. UTC | #7
On Mon, 2015-01-26 at 17:36 -0600, Brian King wrote:
> To set some context, this function is only used by ipr for some old
> broken adapters. These are adapters that are not supported on p8,
> so will never show up under OPAL, only PowerVM. I'm fine with looking
> at alternatives for the future, but I can't say I'm too excited about
> changing the calling requirements for an API that has been around
> for many years. Particularly given that this code is only needed for
> these old adapters. If its difficult to implement this for OPAL without
> noticeable delays, could we just return -EINVAL for this function on OPAL?,
> since its not needed there today anyway.

Because it's needed for other things nowadays afaik, though IPR is the only one
that needs this to be done at interrupt time...

In fact, even with IPR and the existing call, how do you wait for the link to come
back for a PERST ? That can take a while...

Ben.
Brian King Jan. 27, 2015, 10:58 p.m. UTC | #8
On 01/26/2015 10:31 PM, Benjamin Herrenschmidt wrote:
> On Mon, 2015-01-26 at 17:36 -0600, Brian King wrote:
>> To set some context, this function is only used by ipr for some old
>> broken adapters. These are adapters that are not supported on p8,
>> so will never show up under OPAL, only PowerVM. I'm fine with looking
>> at alternatives for the future, but I can't say I'm too excited about
>> changing the calling requirements for an API that has been around
>> for many years. Particularly given that this code is only needed for
>> these old adapters. If its difficult to implement this for OPAL without
>> noticeable delays, could we just return -EINVAL for this function on OPAL?,
>> since its not needed there today anyway.
> 
> Because it's needed for other things nowadays afaik, though IPR is the only one
> that needs this to be done at interrupt time...

I'd argue we are our own worst enemy here really. The new user is EEH code.
I don't see a huge reason that code would need to use this exact same API.

> In fact, even with IPR and the existing call, how do you wait for the link to come
> back for a PERST ? That can take a while...

Basically, I assert reset, delay for 1/2 second via a timer interrupt, deassert reset,
delay for 2 seconds via another timer interrupt, then proceed with adapter initialization.

Thanks,

Brian
Benjamin Herrenschmidt Jan. 27, 2015, 11:58 p.m. UTC | #9
On Tue, 2015-01-27 at 16:58 -0600, Brian King wrote:
> I'd argue we are our own worst enemy here really. The new user is EEH
> code.
> I don't see a huge reason that code would need to use this exact same
> API.
> 
> > In fact, even with IPR and the existing call, how do you wait for
> the link to come
> > back for a PERST ? That can take a while...
> 
> Basically, I assert reset, delay for 1/2 second via a timer interrupt,
> deassert reset,
> delay for 2 seconds via another timer interrupt, then proceed with
> adapter initialization.

I'm surprised that even works properly... for example in the case of
PERST we need to mask various error traps before asserting and unmask
them when the link comes up (such as the surprise link down error), I
don't see an opportunity in that scheme for FW to do that latter...

Ben.
Gavin Shan Jan. 30, 2015, 1:37 a.m. UTC | #10
On Wed, Jan 28, 2015 at 10:58:42AM +1100, Benjamin Herrenschmidt wrote:
>On Tue, 2015-01-27 at 16:58 -0600, Brian King wrote:
>> I'd argue we are our own worst enemy here really. The new user is EEH
>> code.
>> I don't see a huge reason that code would need to use this exact same
>> API.
>> 
>> > In fact, even with IPR and the existing call, how do you wait for
>> the link to come
>> > back for a PERST ? That can take a while...
>> 
>> Basically, I assert reset, delay for 1/2 second via a timer interrupt,
>> deassert reset,
>> delay for 2 seconds via another timer interrupt, then proceed with
>> adapter initialization.
>
>I'm surprised that even works properly... for example in the case of
>PERST we need to mask various error traps before asserting and unmask
>them when the link comes up (such as the surprise link down error), I
>don't see an opportunity in that scheme for FW to do that latter...
>

The FW perhaps does more than what's supposed to do for assert, and
less than what's supposed to do for deassert, but need confirm with
FW developers later. In this case, the link should come up in 1/2
second, which is really short. Otherwise, FW need implement deassert
function in blocking mode to wait the link to come back, which forces
the API to be called in non-atomic context. I'll check with FW developer
later on this.

I guess we have to change the API to be called in non-atomic context in
long run. For now, Wendy is waiting for the fix and port it to RHEL7.1.
I also sent another alternative patch, which was verified by Wendy.
I'm not sure if it's reasonable to include the following patch and
change driver's code to call this API under non-atomic context later
as proceeding enhancement?

https://patchwork.ozlabs.org/patch/432065/

Thanks,
Gavin
diff mbox

Patch

diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
index a6c7e19..67623a3 100644
--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
@@ -503,8 +503,7 @@  static int pseries_eeh_get_state(struct eeh_pe *pe, int *state)
  */
 static int pseries_eeh_reset(struct eeh_pe *pe, int option)
 {
-	int config_addr;
-	int ret;
+	int config_addr, delay, ret;
 
 	/* Figure out PE address */
 	config_addr = pe->config_addr;
@@ -528,9 +527,14 @@  static int pseries_eeh_reset(struct eeh_pe *pe, int option)
 	/* We need reset hold or settlement delay */
 	if (option == EEH_RESET_FUNDAMENTAL ||
 	    option == EEH_RESET_HOT)
-		msleep(EEH_PE_RST_HOLD_TIME);
+		delay = EEH_PE_RST_HOLD_TIME;
+	else
+		delay = EEH_PE_RST_SETTLE_TIME;
+
+	if (in_atomic())
+		udelay(delay * 1000);
 	else
-		msleep(EEH_PE_RST_SETTLE_TIME);
+		msleep(delay);
 
 	return ret;
 }