diff mbox series

powerpc/rtas: Introduce rtas_get_sensor_nonblocking() for pci hotplug driver.

Message ID 163817631601.2016996.16085383012429651821.stgit@jupiter (mailing list archive)
State Changes Requested
Headers show
Series powerpc/rtas: Introduce rtas_get_sensor_nonblocking() for pci hotplug driver. | expand

Checks

Context Check Description
snowpatch_ozlabs/github-powerpc_ppctests success Successfully ran 8 jobs.
snowpatch_ozlabs/github-powerpc_selftests success Successfully ran 8 jobs.
snowpatch_ozlabs/github-powerpc_kernel_qemu success Successfully ran 24 jobs.
snowpatch_ozlabs/github-powerpc_sparse success Successfully ran 4 jobs.
snowpatch_ozlabs/github-powerpc_clang success Successfully ran 7 jobs.

Commit Message

Mahesh J Salgaonkar Nov. 29, 2021, 8:58 a.m. UTC
When certain PHB HW failure causes phyp to recover PHB, it marks the PE
state as temporarily unavailable until recovery is complete. This also
triggers an EEH handler in Linux which needs to notify drivers, and perform
recovery. But before notifying the driver about the pci error it uses
get_adapter_state()->get-sesnor-state() operation of the hotplug_slot to
determine if the slot contains a device or not. if the slot is empty, the
recovery is skipped entirely.

However on certain PHB failures, the rtas call get-sesnor-state() returns
extended busy error (9902) until PHB is recovered by phyp. Once PHB is
recovered, the get-sensor-state() returns success with correct presence
status. The rtas call interface rtas_get_sensor() loops over the rtas call
on extended delay return code (9902) until the return value is either
success (0) or error (-1). This causes the EEH handler to get stuck for ~6
seconds before it could notify that the pci error has been detected and
stop any active operations. Hence with running I/O traffic, during this 6
seconds, the network driver continues its operation and hits a timeout
(netdev watchdog). On timeouts, network driver go into ffdc capture mode
and reset path assuming the PCI device is in fatal condition. This
sometimes causes EEH recovery to fail. This impacts the ssh connection and
leads to the system being inaccessible.

------------
[52732.244731] DEBUG: ibm_read_slot_reset_state2()
[52732.244762] DEBUG: ret = 0, rets[0]=5, rets[1]=1, rets[2]=4000, rets[3]=>
[52732.244798] DEBUG: in eeh_slot_presence_check
[52732.244804] DEBUG: error state check
[52732.244807] DEBUG: Is slot hotpluggable
[52732.244810] DEBUG: hotpluggable ops ?
[52732.244953] DEBUG: Calling ops->get_adapter_status
[52732.244958] DEBUG: calling rpaphp_get_sensor_state
[52736.564262] ------------[ cut here ]------------
[52736.564299] NETDEV WATCHDOG: enP64p1s0f3 (tg3): transmit queue 0 timed o>
[52736.564324] WARNING: CPU: 1442 PID: 0 at net/sched/sch_generic.c:478 dev>
[...]
[52736.564505] NIP [c000000000c32368] dev_watchdog+0x438/0x440
[52736.564513] LR [c000000000c32364] dev_watchdog+0x434/0x440
------------

Fix this issue by introducing a new rtas_get_sensor_nonblocking() that does
not get blocked on BUSY condition and returns immediately with error. Use
this function in pseries pci hotplug driver which can return an error if
slot presence state can not be detected immediately. Please note that only
in certain PHB failures, the slot presence check returns BUSY condition. In
normal cases it returns immediately with a correct presence state value.
Hence this change has no impact on normal pci dlpar operations.

We could use rtas_get_sensor_fast() variant, but it thorws WARN_ON on BUSY
condition. The rtas_get_sensor_nonblocking() suppresses WARN_ON.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
---

This is an alternate approach to fix the EEH issue instead of delaying slot
presence check proposed at
https://lists.ozlabs.org/pipermail/linuxppc-dev/2021-November/236956.html

Also refer:
https://lists.ozlabs.org/pipermail/linuxppc-dev/2021-November/237027.html
---
 arch/powerpc/include/asm/rtas.h  |    1 +
 arch/powerpc/kernel/rtas.c       |   19 ++++++++++++++++---
 drivers/pci/hotplug/rpaphp_pci.c |    8 ++++----
 3 files changed, 21 insertions(+), 7 deletions(-)

Comments

Tyrel Datwyler Nov. 29, 2021, 10:54 p.m. UTC | #1
On 11/29/21 12:58 AM, Mahesh Salgaonkar wrote:
> When certain PHB HW failure causes phyp to recover PHB, it marks the PE
> state as temporarily unavailable until recovery is complete. This also
> triggers an EEH handler in Linux which needs to notify drivers, and perform
> recovery. But before notifying the driver about the pci error it uses
> get_adapter_state()->get-sesnor-state() operation of the hotplug_slot to
> determine if the slot contains a device or not. if the slot is empty, the
> recovery is skipped entirely.
> 
> However on certain PHB failures, the rtas call get-sesnor-state() returns
> extended busy error (9902) until PHB is recovered by phyp. Once PHB is
> recovered, the get-sensor-state() returns success with correct presence
> status. The rtas call interface rtas_get_sensor() loops over the rtas call
> on extended delay return code (9902) until the return value is either
> success (0) or error (-1). This causes the EEH handler to get stuck for ~6
> seconds before it could notify that the pci error has been detected and
> stop any active operations. Hence with running I/O traffic, during this 6
> seconds, the network driver continues its operation and hits a timeout
> (netdev watchdog). On timeouts, network driver go into ffdc capture mode
> and reset path assuming the PCI device is in fatal condition. This
> sometimes causes EEH recovery to fail. This impacts the ssh connection and
> leads to the system being inaccessible.
> 
> ------------
> [52732.244731] DEBUG: ibm_read_slot_reset_state2()
> [52732.244762] DEBUG: ret = 0, rets[0]=5, rets[1]=1, rets[2]=4000, rets[3]=>
> [52732.244798] DEBUG: in eeh_slot_presence_check
> [52732.244804] DEBUG: error state check
> [52732.244807] DEBUG: Is slot hotpluggable
> [52732.244810] DEBUG: hotpluggable ops ?
> [52732.244953] DEBUG: Calling ops->get_adapter_status
> [52732.244958] DEBUG: calling rpaphp_get_sensor_state
> [52736.564262] ------------[ cut here ]------------
> [52736.564299] NETDEV WATCHDOG: enP64p1s0f3 (tg3): transmit queue 0 timed o>
> [52736.564324] WARNING: CPU: 1442 PID: 0 at net/sched/sch_generic.c:478 dev>
> [...]
> [52736.564505] NIP [c000000000c32368] dev_watchdog+0x438/0x440
> [52736.564513] LR [c000000000c32364] dev_watchdog+0x434/0x440
> ------------
> 
> Fix this issue by introducing a new rtas_get_sensor_nonblocking() that does
> not get blocked on BUSY condition and returns immediately with error. Use
> this function in pseries pci hotplug driver which can return an error if
> slot presence state can not be detected immediately. Please note that only
> in certain PHB failures, the slot presence check returns BUSY condition. In
> normal cases it returns immediately with a correct presence state value.
> Hence this change has no impact on normal pci dlpar operations.
> 
> We could use rtas_get_sensor_fast() variant, but it thorws WARN_ON on BUSY
> condition. The rtas_get_sensor_nonblocking() suppresses WARN_ON.
> 
> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
> ---
> 
> This is an alternate approach to fix the EEH issue instead of delaying slot
> presence check proposed at
> https://lists.ozlabs.org/pipermail/linuxppc-dev/2021-November/236956.html
> 
> Also refer:
> https://lists.ozlabs.org/pipermail/linuxppc-dev/2021-November/237027.html
> ---
>  arch/powerpc/include/asm/rtas.h  |    1 +
>  arch/powerpc/kernel/rtas.c       |   19 ++++++++++++++++---
>  drivers/pci/hotplug/rpaphp_pci.c |    8 ++++----
>  3 files changed, 21 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
> index 9dc97d2f9d27e..d8e8befb1c193 100644
> --- a/arch/powerpc/include/asm/rtas.h
> +++ b/arch/powerpc/include/asm/rtas.h
> @@ -250,6 +250,7 @@ extern void rtas_os_term(char *str);
>  void rtas_activate_firmware(void);
>  extern int rtas_get_sensor(int sensor, int index, int *state);
>  extern int rtas_get_sensor_fast(int sensor, int index, int *state);
> +int rtas_get_sensor_nonblocking(int sensor, int index, int *state);
>  extern int rtas_get_power_level(int powerdomain, int *level);
>  extern int rtas_set_power_level(int powerdomain, int level, int *setlevel);
>  extern bool rtas_indicator_present(int token, int *maxindex);
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index ac61e226c9af6..fd5aa3bbd46c5 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -609,7 +609,8 @@ int rtas_get_sensor(int sensor, int index, int *state)
>  }
>  EXPORT_SYMBOL(rtas_get_sensor);
> 
> -int rtas_get_sensor_fast(int sensor, int index, int *state)
> +static int
> +__rtas_get_sensor(int sensor, int index, int *state, bool warn_on)
>  {
>  	int token = rtas_token("get-sensor-state");
>  	int rc;
> @@ -618,14 +619,26 @@ int rtas_get_sensor_fast(int sensor, int index, int *state)
>  		return -ENOENT;
> 
>  	rc = rtas_call(token, 2, 2, state, sensor, index);
> -	WARN_ON(rc == RTAS_BUSY || (rc >= RTAS_EXTENDED_DELAY_MIN &&
> -				    rc <= RTAS_EXTENDED_DELAY_MAX));
> +	WARN_ON(warn_on &&
> +		(rc == RTAS_BUSY || (rc >= RTAS_EXTENDED_DELAY_MIN &&
> +				    rc <= RTAS_EXTENDED_DELAY_MAX)));

The whole point of rtas_get_sensor_fast() is that on busy we will just let it
error out because we don't want to wait. I'm not sure I see the point of the
spurious WARN_ONs anytime we hit a BUSY or DELAY return code. Maybe converting
that to a pr_debug() might be better and save expanding the API with a _fast and
_nonblocking variant that do the same thing minus one surpressing a WARN_ON splat.

-Tyrel

> 
>  	if (rc < 0)
>  		return rtas_error_rc(rc);
>  	return rc;
>  }
> 
> +int rtas_get_sensor_fast(int sensor, int index, int *state)
> +{
> +	return __rtas_get_sensor(sensor, index, state, true);
> +}
> +
> +int rtas_get_sensor_nonblocking(int sensor, int index, int *state)
> +{
> +	return __rtas_get_sensor(sensor, index, state, false);
> +}
> +EXPORT_SYMBOL(rtas_get_sensor_nonblocking);
> +
>  bool rtas_indicator_present(int token, int *maxindex)
>  {
>  	int proplen, count, i;
> diff --git a/drivers/pci/hotplug/rpaphp_pci.c b/drivers/pci/hotplug/rpaphp_pci.c
> index c380bdacd1466..8a7d681254ce9 100644
> --- a/drivers/pci/hotplug/rpaphp_pci.c
> +++ b/drivers/pci/hotplug/rpaphp_pci.c
> @@ -23,7 +23,7 @@ int rpaphp_get_sensor_state(struct slot *slot, int *state)
>  	int rc;
>  	int setlevel;
> 
> -	rc = rtas_get_sensor(DR_ENTITY_SENSE, slot->index, state);
> +	rc = rtas_get_sensor_nonblocking(DR_ENTITY_SENSE, slot->index, state);
> 
>  	if (rc < 0) {
>  		if (rc == -EFAULT || rc == -EEXIST) {
> @@ -38,10 +38,10 @@ int rpaphp_get_sensor_state(struct slot *slot, int *state)
>  			if (rc < 0) {
>  				dbg("%s: power on slot[%s] failed rc=%d.\n",
>  				    __func__, slot->name, rc);
> -			} else {
> -				rc = rtas_get_sensor(DR_ENTITY_SENSE,
> -						     slot->index, state);
> +				return rc;
>  			}
> +			rc = rtas_get_sensor_nonblocking(DR_ENTITY_SENSE,
> +							 slot->index, state);
>  		} else if (rc == -ENODEV)
>  			info("%s: slot is unusable\n", __func__);
>  		else
> 
>
Nathan Lynch Nov. 30, 2021, 1:06 a.m. UTC | #2
Tyrel Datwyler <tyreld@linux.ibm.com> writes:
> On 11/29/21 12:58 AM, Mahesh Salgaonkar wrote:
>> -int rtas_get_sensor_fast(int sensor, int index, int *state)
>> +static int
>> +__rtas_get_sensor(int sensor, int index, int *state, bool warn_on)
>>  {
>>  	int token = rtas_token("get-sensor-state");
>>  	int rc;
>> @@ -618,14 +619,26 @@ int rtas_get_sensor_fast(int sensor, int index, int *state)
>>  		return -ENOENT;
>> 
>>  	rc = rtas_call(token, 2, 2, state, sensor, index);
>> -	WARN_ON(rc == RTAS_BUSY || (rc >= RTAS_EXTENDED_DELAY_MIN &&
>> -				    rc <= RTAS_EXTENDED_DELAY_MAX));
>> +	WARN_ON(warn_on &&
>> +		(rc == RTAS_BUSY || (rc >= RTAS_EXTENDED_DELAY_MIN &&
>> +				    rc <= RTAS_EXTENDED_DELAY_MAX)));
>
> The whole point of rtas_get_sensor_fast() is that on busy we will just let it
> error out because we don't want to wait. I'm not sure I see the point of the
> spurious WARN_ONs anytime we hit a BUSY or DELAY return code. Maybe converting
> that to a pr_debug() might be better and save expanding the API with a _fast and
> _nonblocking variant that do the same thing minus one surpressing a
> WARN_ON splat.

There is a subset of sensors that are specified to not ever return busy
or delay statuses. rtas_get_sensor_fast() is meant to be used with
those, and it would be an error to use it on a sensor not in that set.
So the WARN_ON() is appropriate IMO; if it triggers it indicates either
a misuse of the API or a firmware bug. See commit 1c2cb594441d
"powerpc/rtas: Introduce rtas_get_sensor_fast() for IRQ handlers"
Tyrel Datwyler Nov. 30, 2021, 1:21 a.m. UTC | #3
On 11/29/21 5:06 PM, Nathan Lynch wrote:
> Tyrel Datwyler <tyreld@linux.ibm.com> writes:
>> On 11/29/21 12:58 AM, Mahesh Salgaonkar wrote:
>>> -int rtas_get_sensor_fast(int sensor, int index, int *state)
>>> +static int
>>> +__rtas_get_sensor(int sensor, int index, int *state, bool warn_on)
>>>  {
>>>  	int token = rtas_token("get-sensor-state");
>>>  	int rc;
>>> @@ -618,14 +619,26 @@ int rtas_get_sensor_fast(int sensor, int index, int *state)
>>>  		return -ENOENT;
>>>
>>>  	rc = rtas_call(token, 2, 2, state, sensor, index);
>>> -	WARN_ON(rc == RTAS_BUSY || (rc >= RTAS_EXTENDED_DELAY_MIN &&
>>> -				    rc <= RTAS_EXTENDED_DELAY_MAX));
>>> +	WARN_ON(warn_on &&
>>> +		(rc == RTAS_BUSY || (rc >= RTAS_EXTENDED_DELAY_MIN &&
>>> +				    rc <= RTAS_EXTENDED_DELAY_MAX)));
>>
>> The whole point of rtas_get_sensor_fast() is that on busy we will just let it
>> error out because we don't want to wait. I'm not sure I see the point of the
>> spurious WARN_ONs anytime we hit a BUSY or DELAY return code. Maybe converting
>> that to a pr_debug() might be better and save expanding the API with a _fast and
>> _nonblocking variant that do the same thing minus one surpressing a
>> WARN_ON splat.
> 
> There is a subset of sensors that are specified to not ever return busy
> or delay statuses. rtas_get_sensor_fast() is meant to be used with
> those, and it would be an error to use it on a sensor not in that set.
> So the WARN_ON() is appropriate IMO; if it triggers it indicates either
> a misuse of the API or a firmware bug. See commit 1c2cb594441d
> "powerpc/rtas: Introduce rtas_get_sensor_fast() for IRQ handlers"
> 

Fair enough. Seems I misremembered the nature of the original problem and should
have looked back at the commit to completely jog my memory.
Nathan Lynch Nov. 30, 2021, 4:53 a.m. UTC | #4
Mahesh Salgaonkar <mahesh@linux.ibm.com> writes:
> When certain PHB HW failure causes phyp to recover PHB, it marks the PE
> state as temporarily unavailable until recovery is complete. This also
> triggers an EEH handler in Linux which needs to notify drivers, and perform
> recovery. But before notifying the driver about the pci error it uses
> get_adapter_state()->get-sesnor-state() operation of the hotplug_slot to
> determine if the slot contains a device or not. if the slot is empty, the
> recovery is skipped entirely.
>
> However on certain PHB failures, the rtas call get-sesnor-state() returns
> extended busy error (9902) until PHB is recovered by phyp. Once PHB is
> recovered, the get-sensor-state() returns success with correct presence
> status. The rtas call interface rtas_get_sensor() loops over the rtas call
> on extended delay return code (9902) until the return value is either
> success (0) or error (-1). This causes the EEH handler to get stuck for ~6
> seconds before it could notify that the pci error has been detected and
> stop any active operations.

I am curious whether you see any difference with "powerpc/rtas:
rtas_busy_delay() improvements" which was recently applied. It will
cause the the calling task to sleep in response to a 990x status instead
of immediately retrying:

https://git.kernel.org/powerpc/c/38f7b7067dae0c101be573106018e8af22a90fdf

If that commit helps then maybe this change isn't needed.

Otherwise, see my comments below.


> -int rtas_get_sensor_fast(int sensor, int index, int *state)
> +static int
> +__rtas_get_sensor(int sensor, int index, int *state, bool warn_on)

Boolean flag parameters in this style are undesirable. As a reader you
can't infer the significance of a 'true' or 'false' in the argument list
at the call site.

>  {
>  	int token = rtas_token("get-sensor-state");
>  	int rc;
> @@ -618,14 +619,26 @@ int rtas_get_sensor_fast(int sensor, int index, int *state)
>  		return -ENOENT;
>  
>  	rc = rtas_call(token, 2, 2, state, sensor, index);
> -	WARN_ON(rc == RTAS_BUSY || (rc >= RTAS_EXTENDED_DELAY_MIN &&
> -				    rc <= RTAS_EXTENDED_DELAY_MAX));
> +	WARN_ON(warn_on &&
> +		(rc == RTAS_BUSY || (rc >= RTAS_EXTENDED_DELAY_MIN &&
> +				    rc <= RTAS_EXTENDED_DELAY_MAX)));
>  
>  	if (rc < 0)
>  		return rtas_error_rc(rc);
>  	return rc;
>  }

Issues I see with this, in terms of correctness and convention:

* On non-negative status from rtas_call(), including 990x,
  __rtas_get_sensor() returns the RTAS status unchanged. On a negative
  status, it returns a Linux errno value. On a -2 (busy) status
  rtas_error_rc() prints an error message and returns -ERANGE. Seems
  difficult for a caller to handle. Generally we want rtas_* APIs to
  adhere to a Linux 0/-errno convention or to return the RTAS
  status unchanged, but not a mixture.

* __rtas_get_sensor() is called by rtas_get_sensor_fast() and
  rtas_get_sensor_nonblocking(), but is not called by rtas_get_sensor(),
  despite common practice with __-prefixed functions.

> +int rtas_get_sensor_fast(int sensor, int index, int *state)
> +{
> +	return __rtas_get_sensor(sensor, index, state, true);
> +}
> +
> +int rtas_get_sensor_nonblocking(int sensor, int index, int *state)
> +{
> +	return __rtas_get_sensor(sensor, index, state, false);
> +}
> +EXPORT_SYMBOL(rtas_get_sensor_nonblocking);
> +
>  bool rtas_indicator_present(int token, int *maxindex)
>  {
>  	int proplen, count, i;
> diff --git a/drivers/pci/hotplug/rpaphp_pci.c b/drivers/pci/hotplug/rpaphp_pci.c
> index c380bdacd1466..8a7d681254ce9 100644
> --- a/drivers/pci/hotplug/rpaphp_pci.c
> +++ b/drivers/pci/hotplug/rpaphp_pci.c
> @@ -23,7 +23,7 @@ int rpaphp_get_sensor_state(struct slot *slot, int *state)
>  	int rc;
>  	int setlevel;
>  
> -	rc = rtas_get_sensor(DR_ENTITY_SENSE, slot->index, state);
> +	rc = rtas_get_sensor_nonblocking(DR_ENTITY_SENSE, slot->index, state);
>  
>  	if (rc < 0) {
>  		if (rc == -EFAULT || rc == -EEXIST) {
> @@ -38,10 +38,10 @@ int rpaphp_get_sensor_state(struct slot *slot, int *state)
>  			if (rc < 0) {
>  				dbg("%s: power on slot[%s] failed rc=%d.\n",
>  				    __func__, slot->name, rc);
> -			} else {
> -				rc = rtas_get_sensor(DR_ENTITY_SENSE,
> -						     slot->index, state);
> +				return rc;
>  			}
> +			rc = rtas_get_sensor_nonblocking(DR_ENTITY_SENSE,
> +							 slot->index, state);
>  		} else if (rc == -ENODEV)
>  			info("%s: slot is unusable\n", __func__);
>  		else

If I'm reading it right rpaphp_get_sensor_state() now returns 9902 in
the situation this change is trying to address. I checked a couple of
its call sites and it seems like this is going to propagate back into
the PCI hotplug core which of course doesn't understand RTAS call
statuses. So this doesn't seem right.

Maybe it would be better to have rpaphp_get_sensor_state() invoke
rtas_call("get-sensor-state", ...) directly and code whatever special
behavior is needed there, instead of introducing a new exported API. The
driver seems to want to deal with the RTAS return values anyway - it's
implicitly mapping ENODEV, EFAULT, EEXIST from rtas_get_sensor() back to
-9002, -9000, -9001 respectively.
Mahesh J Salgaonkar Nov. 30, 2021, 9:31 a.m. UTC | #5
On 2021-11-29 22:53:41 Mon, Nathan Lynch wrote:
> Mahesh Salgaonkar <mahesh@linux.ibm.com> writes:
> > When certain PHB HW failure causes phyp to recover PHB, it marks the PE
> > state as temporarily unavailable until recovery is complete. This also
> > triggers an EEH handler in Linux which needs to notify drivers, and perform
> > recovery. But before notifying the driver about the pci error it uses
> > get_adapter_state()->get-sesnor-state() operation of the hotplug_slot to
> > determine if the slot contains a device or not. if the slot is empty, the
> > recovery is skipped entirely.
> >
> > However on certain PHB failures, the rtas call get-sesnor-state() returns
> > extended busy error (9902) until PHB is recovered by phyp. Once PHB is
> > recovered, the get-sensor-state() returns success with correct presence
> > status. The rtas call interface rtas_get_sensor() loops over the rtas call
> > on extended delay return code (9902) until the return value is either
> > success (0) or error (-1). This causes the EEH handler to get stuck for ~6
> > seconds before it could notify that the pci error has been detected and
> > stop any active operations.
> 
> I am curious whether you see any difference with "powerpc/rtas:
> rtas_busy_delay() improvements" which was recently applied. It will
> cause the the calling task to sleep in response to a 990x status instead
> of immediately retrying:

If it is still sleeping it may not help, however I will give a try.

> 
> https://git.kernel.org/powerpc/c/38f7b7067dae0c101be573106018e8af22a90fdf
> 
> If that commit helps then maybe this change isn't needed.
> 
> Otherwise, see my comments below.
> 
> 
> > -int rtas_get_sensor_fast(int sensor, int index, int *state)
> > +static int
> > +__rtas_get_sensor(int sensor, int index, int *state, bool warn_on)
> 
> Boolean flag parameters in this style are undesirable. As a reader you
> can't infer the significance of a 'true' or 'false' in the argument list
> at the call site.
> 
> >  {
> >  	int token = rtas_token("get-sensor-state");
> >  	int rc;
> > @@ -618,14 +619,26 @@ int rtas_get_sensor_fast(int sensor, int index, int *state)
> >  		return -ENOENT;
> >  
> >  	rc = rtas_call(token, 2, 2, state, sensor, index);
> > -	WARN_ON(rc == RTAS_BUSY || (rc >= RTAS_EXTENDED_DELAY_MIN &&
> > -				    rc <= RTAS_EXTENDED_DELAY_MAX));
> > +	WARN_ON(warn_on &&
> > +		(rc == RTAS_BUSY || (rc >= RTAS_EXTENDED_DELAY_MIN &&
> > +				    rc <= RTAS_EXTENDED_DELAY_MAX)));
> >  
> >  	if (rc < 0)
> >  		return rtas_error_rc(rc);
> >  	return rc;
> >  }
> 
> Issues I see with this, in terms of correctness and convention:
> 
> * On non-negative status from rtas_call(), including 990x,
>   __rtas_get_sensor() returns the RTAS status unchanged. On a negative
>   status, it returns a Linux errno value. On a -2 (busy) status
>   rtas_error_rc() prints an error message and returns -ERANGE. Seems
>   difficult for a caller to handle. Generally we want rtas_* APIs to
>   adhere to a Linux 0/-errno convention or to return the RTAS
>   status unchanged, but not a mixture.
> 
> * __rtas_get_sensor() is called by rtas_get_sensor_fast() and
>   rtas_get_sensor_nonblocking(), but is not called by rtas_get_sensor(),
>   despite common practice with __-prefixed functions.
> 
> > +int rtas_get_sensor_fast(int sensor, int index, int *state)
> > +{
> > +	return __rtas_get_sensor(sensor, index, state, true);
> > +}
> > +
> > +int rtas_get_sensor_nonblocking(int sensor, int index, int *state)
> > +{
> > +	return __rtas_get_sensor(sensor, index, state, false);
> > +}
> > +EXPORT_SYMBOL(rtas_get_sensor_nonblocking);
> > +
> >  bool rtas_indicator_present(int token, int *maxindex)
> >  {
> >  	int proplen, count, i;
> > diff --git a/drivers/pci/hotplug/rpaphp_pci.c b/drivers/pci/hotplug/rpaphp_pci.c
> > index c380bdacd1466..8a7d681254ce9 100644
> > --- a/drivers/pci/hotplug/rpaphp_pci.c
> > +++ b/drivers/pci/hotplug/rpaphp_pci.c
> > @@ -23,7 +23,7 @@ int rpaphp_get_sensor_state(struct slot *slot, int *state)
> >  	int rc;
> >  	int setlevel;
> >  
> > -	rc = rtas_get_sensor(DR_ENTITY_SENSE, slot->index, state);
> > +	rc = rtas_get_sensor_nonblocking(DR_ENTITY_SENSE, slot->index, state);
> >  
> >  	if (rc < 0) {
> >  		if (rc == -EFAULT || rc == -EEXIST) {
> > @@ -38,10 +38,10 @@ int rpaphp_get_sensor_state(struct slot *slot, int *state)
> >  			if (rc < 0) {
> >  				dbg("%s: power on slot[%s] failed rc=%d.\n",
> >  				    __func__, slot->name, rc);
> > -			} else {
> > -				rc = rtas_get_sensor(DR_ENTITY_SENSE,
> > -						     slot->index, state);
> > +				return rc;
> >  			}
> > +			rc = rtas_get_sensor_nonblocking(DR_ENTITY_SENSE,
> > +							 slot->index, state);
> >  		} else if (rc == -ENODEV)
> >  			info("%s: slot is unusable\n", __func__);
> >  		else
> 
> If I'm reading it right rpaphp_get_sensor_state() now returns 9902 in
> the situation this change is trying to address. I checked a couple of
> its call sites and it seems like this is going to propagate back into
> the PCI hotplug core which of course doesn't understand RTAS call
> statuses. So this doesn't seem right.

Thanks for pointing it out. I should convert that into an error before
returning. I overlooked this when I moved away from get_sensor_state().

> 
> Maybe it would be better to have rpaphp_get_sensor_state() invoke
> rtas_call("get-sensor-state", ...) directly and code whatever special
> behavior is needed there, instead of introducing a new exported API. The
> driver seems to want to deal with the RTAS return values anyway - it's
> implicitly mapping ENODEV, EFAULT, EEXIST from rtas_get_sensor() back to
> -9002, -9000, -9001 respectively.

Sure I will try this.

Thanks,
-Mahesh.
Nathan Lynch Nov. 30, 2021, 12:39 p.m. UTC | #6
Mahesh J Salgaonkar <mahesh@linux.ibm.com> writes:

> On 2021-11-29 22:53:41 Mon, Nathan Lynch wrote:
>> Mahesh Salgaonkar <mahesh@linux.ibm.com> writes:
>> > When certain PHB HW failure causes phyp to recover PHB, it marks the PE
>> > state as temporarily unavailable until recovery is complete. This also
>> > triggers an EEH handler in Linux which needs to notify drivers, and perform
>> > recovery. But before notifying the driver about the pci error it uses
>> > get_adapter_state()->get-sesnor-state() operation of the hotplug_slot to
>> > determine if the slot contains a device or not. if the slot is empty, the
>> > recovery is skipped entirely.
>> >
>> > However on certain PHB failures, the rtas call get-sesnor-state() returns
>> > extended busy error (9902) until PHB is recovered by phyp. Once PHB is
>> > recovered, the get-sensor-state() returns success with correct presence
>> > status. The rtas call interface rtas_get_sensor() loops over the rtas call
>> > on extended delay return code (9902) until the return value is either
>> > success (0) or error (-1). This causes the EEH handler to get stuck for ~6
>> > seconds before it could notify that the pci error has been detected and
>> > stop any active operations.
>> 
>> I am curious whether you see any difference with "powerpc/rtas:
>> rtas_busy_delay() improvements" which was recently applied. It will
>> cause the the calling task to sleep in response to a 990x status instead
>> of immediately retrying:
>
> If it is still sleeping it may not help, however I will give a try.

Thanks. My thought is that with the old behavior of rtas_busy_delay(),
the repeated retries (potentially hundreds or thousands of
get-sensor-state calls) that occur before finally sleeping may be
prolonging the ongoing PHB recovery.
Mahesh J Salgaonkar Dec. 3, 2021, 1:42 p.m. UTC | #7
On 2021-11-29 22:53:41 Mon, Nathan Lynch wrote:
> Mahesh Salgaonkar <mahesh@linux.ibm.com> writes:
> > When certain PHB HW failure causes phyp to recover PHB, it marks the PE
> > state as temporarily unavailable until recovery is complete. This also
> > triggers an EEH handler in Linux which needs to notify drivers, and perform
> > recovery. But before notifying the driver about the pci error it uses
> > get_adapter_state()->get-sesnor-state() operation of the hotplug_slot to
> > determine if the slot contains a device or not. if the slot is empty, the
> > recovery is skipped entirely.
> >
> > However on certain PHB failures, the rtas call get-sesnor-state() returns
> > extended busy error (9902) until PHB is recovered by phyp. Once PHB is
> > recovered, the get-sensor-state() returns success with correct presence
> > status. The rtas call interface rtas_get_sensor() loops over the rtas call
> > on extended delay return code (9902) until the return value is either
> > success (0) or error (-1). This causes the EEH handler to get stuck for ~6
> > seconds before it could notify that the pci error has been detected and
> > stop any active operations.
> 
> I am curious whether you see any difference with "powerpc/rtas:
> rtas_busy_delay() improvements" which was recently applied. It will
> cause the the calling task to sleep in response to a 990x status instead
> of immediately retrying:
> 
> https://git.kernel.org/powerpc/c/38f7b7067dae0c101be573106018e8af22a90fdf
> 
> If that commit helps then maybe this change isn't needed.

I tried with above commit but it didn't help.

> 
> Maybe it would be better to have rpaphp_get_sensor_state() invoke
> rtas_call("get-sensor-state", ...) directly and code whatever special
> behavior is needed there, instead of introducing a new exported API. The

Posted v3 with above approach.
https://lists.ozlabs.org/pipermail/linuxppc-dev/2021-December/237538.html

Thanks for your review.
-Mahesh.
Nathan Lynch Dec. 9, 2021, 3:03 p.m. UTC | #8
Mahesh J Salgaonkar <mahesh@linux.ibm.com> writes:
> On 2021-11-29 22:53:41 Mon, Nathan Lynch wrote:
>> Mahesh Salgaonkar <mahesh@linux.ibm.com> writes:
>> >
>> > However on certain PHB failures, the rtas call get-sesnor-state() returns
>> > extended busy error (9902) until PHB is recovered by phyp. Once PHB is
>> > recovered, the get-sensor-state() returns success with correct presence
>> > status. The rtas call interface rtas_get_sensor() loops over the rtas call
>> > on extended delay return code (9902) until the return value is either
>> > success (0) or error (-1). This causes the EEH handler to get stuck for ~6
>> > seconds before it could notify that the pci error has been detected and
>> > stop any active operations.
>> 
>> I am curious whether you see any difference with "powerpc/rtas:
>> rtas_busy_delay() improvements" which was recently applied. It will
>> cause the the calling task to sleep in response to a 990x status instead
>> of immediately retrying:
>> 
>> https://git.kernel.org/powerpc/c/38f7b7067dae0c101be573106018e8af22a90fdf
>> 
>> If that commit helps then maybe this change isn't needed.
>
> I tried with above commit but it didn't help.

OK, not too surprising, but thank you for checking.
diff mbox series

Patch

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 9dc97d2f9d27e..d8e8befb1c193 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -250,6 +250,7 @@  extern void rtas_os_term(char *str);
 void rtas_activate_firmware(void);
 extern int rtas_get_sensor(int sensor, int index, int *state);
 extern int rtas_get_sensor_fast(int sensor, int index, int *state);
+int rtas_get_sensor_nonblocking(int sensor, int index, int *state);
 extern int rtas_get_power_level(int powerdomain, int *level);
 extern int rtas_set_power_level(int powerdomain, int level, int *setlevel);
 extern bool rtas_indicator_present(int token, int *maxindex);
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index ac61e226c9af6..fd5aa3bbd46c5 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -609,7 +609,8 @@  int rtas_get_sensor(int sensor, int index, int *state)
 }
 EXPORT_SYMBOL(rtas_get_sensor);
 
-int rtas_get_sensor_fast(int sensor, int index, int *state)
+static int
+__rtas_get_sensor(int sensor, int index, int *state, bool warn_on)
 {
 	int token = rtas_token("get-sensor-state");
 	int rc;
@@ -618,14 +619,26 @@  int rtas_get_sensor_fast(int sensor, int index, int *state)
 		return -ENOENT;
 
 	rc = rtas_call(token, 2, 2, state, sensor, index);
-	WARN_ON(rc == RTAS_BUSY || (rc >= RTAS_EXTENDED_DELAY_MIN &&
-				    rc <= RTAS_EXTENDED_DELAY_MAX));
+	WARN_ON(warn_on &&
+		(rc == RTAS_BUSY || (rc >= RTAS_EXTENDED_DELAY_MIN &&
+				    rc <= RTAS_EXTENDED_DELAY_MAX)));
 
 	if (rc < 0)
 		return rtas_error_rc(rc);
 	return rc;
 }
 
+int rtas_get_sensor_fast(int sensor, int index, int *state)
+{
+	return __rtas_get_sensor(sensor, index, state, true);
+}
+
+int rtas_get_sensor_nonblocking(int sensor, int index, int *state)
+{
+	return __rtas_get_sensor(sensor, index, state, false);
+}
+EXPORT_SYMBOL(rtas_get_sensor_nonblocking);
+
 bool rtas_indicator_present(int token, int *maxindex)
 {
 	int proplen, count, i;
diff --git a/drivers/pci/hotplug/rpaphp_pci.c b/drivers/pci/hotplug/rpaphp_pci.c
index c380bdacd1466..8a7d681254ce9 100644
--- a/drivers/pci/hotplug/rpaphp_pci.c
+++ b/drivers/pci/hotplug/rpaphp_pci.c
@@ -23,7 +23,7 @@  int rpaphp_get_sensor_state(struct slot *slot, int *state)
 	int rc;
 	int setlevel;
 
-	rc = rtas_get_sensor(DR_ENTITY_SENSE, slot->index, state);
+	rc = rtas_get_sensor_nonblocking(DR_ENTITY_SENSE, slot->index, state);
 
 	if (rc < 0) {
 		if (rc == -EFAULT || rc == -EEXIST) {
@@ -38,10 +38,10 @@  int rpaphp_get_sensor_state(struct slot *slot, int *state)
 			if (rc < 0) {
 				dbg("%s: power on slot[%s] failed rc=%d.\n",
 				    __func__, slot->name, rc);
-			} else {
-				rc = rtas_get_sensor(DR_ENTITY_SENSE,
-						     slot->index, state);
+				return rc;
 			}
+			rc = rtas_get_sensor_nonblocking(DR_ENTITY_SENSE,
+							 slot->index, state);
 		} else if (rc == -ENODEV)
 			info("%s: slot is unusable\n", __func__);
 		else