
[3/3] powerpc/eeh: Synchronize recovery in host/guest

Message ID 1456445092-18337-4-git-send-email-gwshan@linux.vnet.ibm.com
State Superseded

Commit Message

Gavin Shan Feb. 26, 2016, 12:04 a.m. UTC
When passing SR-IOV VFs through to a guest, we may encounter an EEH
error on the PF. In this case, the VF PEs are put into the frozen
state. The error could be reported to the guest before it's captured
by the host. That means the guest could attempt to recover errors on
the VFs before the host gets a chance to recover errors on the PF.
The VFs won't be recovered successfully in that case.

This enforces the recovery order for the above case: recovery on the
child PE in the guest is held until recovery on the parent PE in the
host is completed.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/eeh.c | 11 +++++++++++
 1 file changed, 11 insertions(+)
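The ordering rule above can be modeled outside the kernel. The following is a minimal user-space sketch, not the kernel implementation: `struct model_pe`, the `MODEL_PE_*` flag values, the `MODEL_STATE_*` codes, and `model_pe_get_state()` are hypothetical stand-ins for `struct eeh_pe`, the `EEH_PE_*` flags, the `EEH_PE_STATE_*` values, and `eeh_pe_get_state()`. It shows only the short-circuit this patch adds: a child PE whose parent is isolated or recovering reports itself as temporarily unavailable instead of querying the platform.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values; the real EEH_PE_* bits are defined in the kernel. */
#define MODEL_PE_ISOLATED   (1 << 0)
#define MODEL_PE_RECOVERING (1 << 1)
#define MODEL_PE_REMOVED    (1 << 2)

/* Hypothetical return codes standing in for EEH_PE_STATE_* values. */
enum model_state {
	MODEL_STATE_NORMAL = 0,
	MODEL_STATE_UNAVAIL = 5,
};

struct model_pe {
	int state;                 /* bitmask of MODEL_PE_* flags */
	struct model_pe *parent;   /* NULL for a top-level PE */
	enum model_state hw_state; /* what the (simulated) platform reports */
};

/*
 * Mirrors the check added by the patch: if this PE has not been removed
 * and its parent (owned by the host) is isolated or recovering, report
 * "temporarily unavailable" so the guest holds off its own recovery.
 */
enum model_state model_pe_get_state(const struct model_pe *pe)
{
	assert(pe != NULL);

	if (pe->parent &&
	    !(pe->state & MODEL_PE_REMOVED) &&
	    (pe->parent->state & (MODEL_PE_ISOLATED | MODEL_PE_RECOVERING)))
		return MODEL_STATE_UNAVAIL;

	/* Otherwise fall through to the simulated platform query. */
	return pe->hw_state;
}
```

For example, a VF PE whose parent has `MODEL_PE_RECOVERING` set reports `MODEL_STATE_UNAVAIL`; once the host clears the flag, the platform state is returned again. A PE flagged as removed is never short-circuited and always falls through to the platform query.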

Comments

Russell Currey March 2, 2016, 1:03 a.m. UTC | #1
On Fri, 2016-02-26 at 11:04 +1100, Gavin Shan wrote:
> When passing SR-IOV VFs through to a guest, we may encounter an EEH
> error on the PF. In this case, the VF PEs are put into the frozen
> state. The error could be reported to the guest before it's captured
> by the host. That means the guest could attempt to recover errors on
> the VFs before the host gets a chance to recover errors on the PF.
> The VFs won't be recovered successfully in that case.
> 
> This enforces the recovery order for the above case: recovery on the
> child PE in the guest is held until recovery on the parent PE in the
> host is completed.
> 
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>  arch/powerpc/kernel/eeh.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> index fd9c782..42bd546 100644
> --- a/arch/powerpc/kernel/eeh.c
> +++ b/arch/powerpc/kernel/eeh.c
> @@ -1541,6 +1541,17 @@ int eeh_pe_get_state(struct eeh_pe *pe)
>  	if (!eeh_ops || !eeh_ops->get_state)
>  		return -ENOENT;
>  
> +	/*
> +	 * If the parent PE, which is owned by host kernel, is experiencing
> +	 * error recovery. We should return temporarily unavailable PE state
> +	 * so that the recovery on guest side is suspended until the error
> +	 * recovery is completed on host side.
> +	 */

Hi Gavin,

I think this could be worded a little better.  For example:

/*
 * If the parent PE is owned by the host kernel and is undergoing
 * error recovery, we should return the PE state as temporarily
 * unavailable so that the error recovery on the guest is suspended
 * until the recovery completes on the host.
 */

> +	if (pe->parent &&
> +	    !(pe->state & EEH_PE_REMOVED) &&
> +	    (pe->parent->state & (EEH_PE_ISOLATED | EEH_PE_RECOVERING)))
> +		return EEH_PE_STATE_UNAVAIL;
> +
>  	result = eeh_ops->get_state(pe, NULL);
>  	rst_active = !!(result & EEH_STATE_RESET_ACTIVE);
>  	dma_en = !!(result & EEH_STATE_DMA_ENABLED);
Gavin Shan March 2, 2016, 2:13 a.m. UTC | #2
On Wed, Mar 02, 2016 at 12:03:20PM +1100, Russell Currey wrote:
>On Fri, 2016-02-26 at 11:04 +1100, Gavin Shan wrote:
>> When passing SR-IOV VFs through to a guest, we may encounter an EEH
>> error on the PF. In this case, the VF PEs are put into the frozen
>> state. The error could be reported to the guest before it's captured
>> by the host. That means the guest could attempt to recover errors on
>> the VFs before the host gets a chance to recover errors on the PF.
>> The VFs won't be recovered successfully in that case.
>> 
>> This enforces the recovery order for the above case: recovery on the
>> child PE in the guest is held until recovery on the parent PE in the
>> host is completed.
>> 
>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/kernel/eeh.c | 11 +++++++++++
>>  1 file changed, 11 insertions(+)
>> 
>> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
>> index fd9c782..42bd546 100644
>> --- a/arch/powerpc/kernel/eeh.c
>> +++ b/arch/powerpc/kernel/eeh.c
>> @@ -1541,6 +1541,17 @@ int eeh_pe_get_state(struct eeh_pe *pe)
>>  	if (!eeh_ops || !eeh_ops->get_state)
>>  		return -ENOENT;
>>  
>> +	/*
>> +	 * If the parent PE, which is owned by host kernel, is experiencing
>> +	 * error recovery. We should return temporarily unavailable PE state
>> +	 * so that the recovery on guest side is suspended until the error
>> +	 * recovery is completed on host side.
>> +	 */
>
>Hi Gavin,
>
>I think this could be worded a little better.  For example:
>
>/*
> * If the parent PE is owned by the host kernel and is undergoing
> * error recovery, we should return the PE state as temporarily
> * unavailable so that the error recovery on the guest is suspended
> * until the recovery completes on the host.
> */
>

Yes, it will be integrated into v2. Thanks for the review.

>> +	if (pe->parent &&
>> +	    !(pe->state & EEH_PE_REMOVED) &&
>> +	    (pe->parent->state & (EEH_PE_ISOLATED | EEH_PE_RECOVERING)))
>> +		return EEH_PE_STATE_UNAVAIL;
>> +
>>  	result = eeh_ops->get_state(pe, NULL);
>>  	rst_active = !!(result & EEH_STATE_RESET_ACTIVE);
>>  	dma_en = !!(result & EEH_STATE_DMA_ENABLED);
>

Thanks,
Gavin

Patch

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index fd9c782..42bd546 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -1541,6 +1541,17 @@ int eeh_pe_get_state(struct eeh_pe *pe)
 	if (!eeh_ops || !eeh_ops->get_state)
 		return -ENOENT;
 
+	/*
+	 * If the parent PE, which is owned by host kernel, is experiencing
+	 * error recovery. We should return temporarily unavailable PE state
+	 * so that the recovery on guest side is suspended until the error
+	 * recovery is completed on host side.
+	 */
+	if (pe->parent &&
+	    !(pe->state & EEH_PE_REMOVED) &&
+	    (pe->parent->state & (EEH_PE_ISOLATED | EEH_PE_RECOVERING)))
+		return EEH_PE_STATE_UNAVAIL;
+
 	result = eeh_ops->get_state(pe, NULL);
 	rst_active = !!(result & EEH_STATE_RESET_ACTIVE);
 	dma_en = !!(result & EEH_STATE_DMA_ENABLED);
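The comment in the patch says guest-side recovery should be suspended while the parent PE is recovering, so a caller that sees the temporarily unavailable state is expected to wait and query again rather than start its own recovery. Below is a minimal sketch of such a polling loop, assuming a caller that can re-query the PE state through a callback. It is not taken from this patch, the guest kernel, or QEMU: `wait_for_available()`, `state_query_fn`, `scripted_query()`, the `SKETCH_STATE_*` codes, and the retry and delay parameters are all hypothetical. `scripted_query()` only replays a fixed sequence of states so the loop can be exercised without hardware.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical state codes; only "unavailable" matters to the loop. */
#define SKETCH_STATE_NORMAL  0
#define SKETCH_STATE_UNAVAIL 5

/* Hypothetical query callback: returns the current PE state. */
typedef int (*state_query_fn)(void *ctx);

static void sleep_ms(long ms)
{
	struct timespec ts = {
		.tv_sec = ms / 1000,
		.tv_nsec = (ms % 1000) * 1000000L,
	};

	nanosleep(&ts, NULL);
}

/*
 * Poll the PE state until it is no longer "temporarily unavailable" or
 * the retry budget runs out. Returns the last state observed; a caller
 * would only begin its own recovery once this is not UNAVAIL.
 */
int wait_for_available(state_query_fn query, void *ctx,
		       int max_tries, long delay_ms)
{
	int state = SKETCH_STATE_UNAVAIL;

	assert(query != NULL);
	for (int i = 0; i < max_tries; i++) {
		state = query(ctx);
		if (state != SKETCH_STATE_UNAVAIL)
			break;
		/* Back off before the next query, but not after the last one. */
		if (i + 1 < max_tries)
			sleep_ms(delay_ms);
	}
	return state;
}

/* Stand-in query source that replays a fixed, non-empty sequence. */
struct scripted_states {
	const int *seq;
	size_t len;
	size_t pos;
};

int scripted_query(void *ctx)
{
	struct scripted_states *s = ctx;

	assert(s->len > 0);
	if (s->pos < s->len)
		return s->seq[s->pos++];
	/* Sequence exhausted: keep reporting the last state. */
	return s->seq[s->len - 1];
}
```

For example, replaying UNAVAIL, UNAVAIL, NORMAL with a 0 ms delay returns NORMAL on the third query. If every query reports UNAVAIL, the loop gives up after `max_tries` and returns UNAVAIL, leaving the decision to the caller.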