[PATCH-RESEND] capi: Disable CAPP virtual machines

Message ID 20180118040943.13135-1-vaibhav@linux.vnet.ibm.com
State Accepted

Commit Message

Vaibhav Jain Jan. 18, 2018, 4:09 a.m. UTC
When exercising more than one CAPI accelerator simultaneously in
cache coherency mode, the verification team is seeing a deadlock. The
suggested workaround is to disable CAPP virtual machines. These
'virtual machines' let the PSL queue multiple CAPP commands for
servicing by the CAPP, thereby increasing throughput. Below is the
error scenario described by the h/w team:

" With virtual machines enabled we had a deadlock scenario where with 2
or more CAPI's in a system you could get in a deadlock scenario due to
cast-outs that are required break the deadlock (evict lines that
another CAPI is requesting) get stuck in the virtual machine queue by
a command ahead of it that is being retried by the same scenario in
the other CAPI. "

So this patch updates the CAPP APC Master Powerbus control register
during CAPP init to also set Bit(12), which disables CAPP virtual
machines. This forces CAPP commands from the PSL to be processed one
at a time, thereby preventing the deadlock scenario described above.

Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
---
Change-log:
Resend -> Updated the patch description with more info on CAPP virtual
          machines and the error scenario.
---
 hw/phb4.c | 1 +
 1 file changed, 1 insertion(+)
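
As a rough illustration of the register update described in the commit
message, the sketch below ORs the same three bits into a register image.
It assumes PPC_BIT() follows IBM bit numbering, where bit 0 is the most
significant bit of a 64-bit value. hyp_ppc_bit() and
apply_capp_pb_ctrl_bits() are hypothetical names used only for this
example; they are not part of hw/phb4.c, and the xscom read and write
around the update are omitted.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for PPC_BIT(): IBM bit numbering, so bit 0 is
 * the most significant bit of the 64-bit register.
 */
static inline uint64_t hyp_ppc_bit(unsigned int bit)
{
	assert(bit < 64);
	return 0x8000000000000000ULL >> bit;
}

/*
 * Hypothetical helper: OR the bits named in the patch context into a
 * register image. Existing bits in 'reg' are preserved.
 */
static uint64_t apply_capp_pb_ctrl_bits(uint64_t reg)
{
	reg |= hyp_ppc_bit(0);  /* enable cResp exam */
	reg |= hyp_ppc_bit(3);  /* disable vg not sys */
	reg |= hyp_ppc_bit(12); /* HW417025: disable capp virtual machines */
	return reg;
}
```

Under these assumptions, bit 12 corresponds to the mask
0x0008000000000000, and applying the helper to a zeroed register image
yields 0x9008000000000000. Because the update is a plain OR, applying it
twice gives the same result as applying it once.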

Comments

Andrew Donnellan Jan. 18, 2018, 4:38 a.m. UTC | #1
On 18/01/18 15:09, Vaibhav Jain wrote:

Thanks for the description - that makes a lot more sense.

Should this be heading to stable?

Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

Christophe Lombard Jan. 22, 2018, 9:55 a.m. UTC | #2
On 18/01/2018 at 05:09, Vaibhav Jain wrote:
> diff --git a/hw/phb4.c b/hw/phb4.c
> index ff912e1f..8e660b66 100644
> --- a/hw/phb4.c
> +++ b/hw/phb4.c
> @@ -3581,6 +3581,7 @@ static void phb4_init_capp_regs(struct phb4 *p, uint32_t capp_eng)
>   	xscom_read(p->chip_id, APC_MASTER_PB_CTRL + offset, &reg);
>   	reg |= PPC_BIT(0); /* enable cResp exam */
>   	reg |= PPC_BIT(3); /* disable vg not sys */
> +	reg |= PPC_BIT(12);/* HW417025: disable capp virtual machines */

Should this patch be applied on all chips?
And the same question for all devices using a PSL.

>   	if (p->rev == PHB4_REV_NIMBUS_DD10) {
>   		reg |= PPC_BIT(1);
>   	} else {
>
Vaibhav Jain Jan. 22, 2018, 10:54 a.m. UTC | #3
Hi Christophe,

christophe lombard <clombard@linux.vnet.ibm.com> writes:

> Should this patch be applied on all chips ?
> And same question for all devices using a PSL
The h/w team has confirmed that this needs to be done for all current P9
chips. This is the only workaround for HW417025 in the errata section of
the CAPP workbook.
Christophe Lombard Jan. 31, 2018, 8:23 a.m. UTC | #4
On 18/01/2018 at 05:09, Vaibhav Jain wrote:

Acked-by: Christophe Lombard clombard@linux.vnet.ibm.com
Stewart Smith Feb. 1, 2018, 8:29 a.m. UTC | #5
Vaibhav Jain <vaibhav@linux.vnet.ibm.com> writes:

Thanks! Merged to master as of 5a959af3fb417c4269b625d9ff2cb204f20728d5
Stewart Smith Feb. 1, 2018, 8:29 a.m. UTC | #6
christophe lombard <clombard@linux.vnet.ibm.com> writes:
> Acked-by: Christophe Lombard clombard@linux.vnet.ibm.com

You seem to be missing the < > around the email address, which caused
patchwork not to pick up the acked-by. Might want to fix that :)
Andrew Donnellan Feb. 1, 2018, 8:35 a.m. UTC | #7
On 01/02/18 19:29, Stewart Smith wrote:
>> Acked-by: Christophe Lombard clombard@linux.vnet.ibm.com
> 
> You seem to be missing the < > around the email address, which caused
> patchwork not to pick up the acked-by. Might want to fix that :)
> 

The next version of patchwork will fix that particular regex, but yes, 
gotta maintain the official style ;)
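
The tag format discussed above can be illustrated with a small check.
This is a hypothetical sketch, not patchwork's actual regex or parser:
tag_has_bracketed_email() is an invented helper that accepts a trailer
line only when it has the form "Tag: Name <address>".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical check (not patchwork's implementation): return true only
 * if 'line' starts with "<tag>:" and ends with an address wrapped in
 * angle brackets that contains an '@'.
 */
static bool tag_has_bracketed_email(const char *line, const char *tag)
{
	size_t tag_len;
	const char *open, *close;

	assert(line != NULL && tag != NULL);
	tag_len = strlen(tag);

	/* The line must begin with the tag name followed by a colon. */
	if (strncmp(line, tag, tag_len) != 0 || line[tag_len] != ':')
		return false;

	open = strchr(line + tag_len, '<');
	if (open == NULL)
		return false;

	close = strchr(open, '>');
	if (close == NULL)
		return false;

	/* Require an '@' between the brackets and nothing after '>'. */
	return memchr(open, '@', (size_t)(close - open)) != NULL &&
	       close[1] == '\0';
}
```

With this sketch, the trailer as sent in the thread (address without
< >) is rejected, while the same line with the address wrapped in angle
brackets is accepted.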
Christophe Lombard Feb. 1, 2018, 9:07 a.m. UTC | #8
On 01/02/2018 at 09:29, Stewart Smith wrote:
> christophe lombard <clombard@linux.vnet.ibm.com> writes:
>> Acked-by: Christophe Lombard clombard@linux.vnet.ibm.com
> 
> You seem to be missing the < > around the email address, which caused
> patchwork not to pick up the acked-by. Might want to fix that :)
> 

Oops, sorry about that. I will look into it.
Patch

diff --git a/hw/phb4.c b/hw/phb4.c
index ff912e1f..8e660b66 100644
--- a/hw/phb4.c
+++ b/hw/phb4.c
@@ -3581,6 +3581,7 @@  static void phb4_init_capp_regs(struct phb4 *p, uint32_t capp_eng)
 	xscom_read(p->chip_id, APC_MASTER_PB_CTRL + offset, &reg);
 	reg |= PPC_BIT(0); /* enable cResp exam */
 	reg |= PPC_BIT(3); /* disable vg not sys */
+	reg |= PPC_BIT(12);/* HW417025: disable capp virtual machines */
 	if (p->rev == PHB4_REV_NIMBUS_DD10) {
 		reg |= PPC_BIT(1);
 	} else {