diff mbox series

[iwl-net] ice: implement AQ download pkg retry

Message ID 20240516140426.60439-1-wojciech.drewek@intel.com
State Changes Requested
Headers show
Series [iwl-net] ice: implement AQ download pkg retry | expand

Commit Message

Wojciech Drewek May 16, 2024, 2:04 p.m. UTC
ice_aqc_opc_download_pkg (0x0C40) AQ sporadically returns error due
to FW issue. Fix this by retrying five times before moving to
Safe Mode.

Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_ddp.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

Comments

Simon Horman May 16, 2024, 2:56 p.m. UTC | #1
On Thu, May 16, 2024 at 04:04:26PM +0200, Wojciech Drewek wrote:
> ice_aqc_opc_download_pkg (0x0C40) AQ sporadically returns error due
> to FW issue. Fix this by retrying five times before moving to
> Safe Mode.

Hi Wojciech,

As this is for iwl-net, and therefore net, please consider supplying a
Fixes tag.

Otherwise this looks good to me.

> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>

...
Brett Creeley May 16, 2024, 4:36 p.m. UTC | #2
On 5/16/2024 7:04 AM, Wojciech Drewek wrote:
> Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
> 
> 
> ice_aqc_opc_download_pkg (0x0C40) AQ sporadically returns error due
> to FW issue. Fix this by retrying five times before moving to
> Safe Mode.
> 
> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
> ---
>   drivers/net/ethernet/intel/ice/ice_ddp.c | 19 +++++++++++++++++--
>   1 file changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/ice/ice_ddp.c b/drivers/net/ethernet/intel/ice/ice_ddp.c
> index ce5034ed2b24..19e2111fcf08 100644
> --- a/drivers/net/ethernet/intel/ice/ice_ddp.c
> +++ b/drivers/net/ethernet/intel/ice/ice_ddp.c
> @@ -1339,6 +1339,7 @@ ice_dwnld_cfg_bufs_no_lock(struct ice_hw *hw, struct ice_buf *bufs, u32 start,
> 
>          for (i = 0; i < count; i++) {
>                  bool last = false;
> +               int try_cnt = 0;
>                  int status;
> 
>                  bh = (struct ice_buf_hdr *)(bufs + start + i);
> @@ -1346,8 +1347,22 @@ ice_dwnld_cfg_bufs_no_lock(struct ice_hw *hw, struct ice_buf *bufs, u32 start,
>                  if (indicate_last)
>                          last = ice_is_last_download_buffer(bh, i, count);
> 
> -               status = ice_aq_download_pkg(hw, bh, ICE_PKG_BUF_SIZE, last,
> -                                            &offset, &info, NULL);
> +               while (try_cnt < 5) {
> +                       status = ice_aq_download_pkg(hw, bh, ICE_PKG_BUF_SIZE,
> +                                                    last, &offset, &info,
> +                                                    NULL);
> +                       if (hw->adminq.sq_last_status != ICE_AQ_RC_ENOSEC &&
> +                           hw->adminq.sq_last_status != ICE_AQ_RC_EBADSIG)

Are these the only 2 sporadic errors that FW will return?

> +                               break;
> +
> +                       try_cnt++;
> +                       msleep(20);
> +               }
> +
> +               if (try_cnt)
> +                       dev_dbg(ice_hw_to_dev(hw),
> +                               "ice_aq_download_pkg failed, number of retries: %d\n",
> +                               try_cnt);

If try_cnt is non-zero it doesn't mean the last download failed, it just 
means one or more attempts to download failed right? Maybe just 
"ice_aq_download_pkg number of retries: %d" since the if (status) check 
below will print on failure?

> 
>                  /* Save AQ status from download package */
>                  if (status) {
> --
> 2.40.1
> 
>
Naveen Mamindlapalli May 17, 2024, 7:26 a.m. UTC | #3
> -----Original Message-----
> From: Wojciech Drewek <wojciech.drewek@intel.com>
> Sent: Thursday, May 16, 2024 7:34 PM
> To: netdev@vger.kernel.org
> Cc: intel-wired-lan@lists.osuosl.org
> Subject: [PATCH iwl-net] ice: implement AQ download pkg retry
> 
> ice_aqc_opc_download_pkg (0x0C40) AQ sporadically returns error due to FW
> issue. Fix this by retrying five times before moving to Safe Mode.
> 
> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
> ---
>  drivers/net/ethernet/intel/ice/ice_ddp.c | 19 +++++++++++++++++--
>  1 file changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/ice/ice_ddp.c
> b/drivers/net/ethernet/intel/ice/ice_ddp.c
> index ce5034ed2b24..19e2111fcf08 100644
> --- a/drivers/net/ethernet/intel/ice/ice_ddp.c
> +++ b/drivers/net/ethernet/intel/ice/ice_ddp.c
> @@ -1339,6 +1339,7 @@ ice_dwnld_cfg_bufs_no_lock(struct ice_hw *hw, struct
> ice_buf *bufs, u32 start,
> 
>  	for (i = 0; i < count; i++) {
>  		bool last = false;
> +		int try_cnt = 0;
>  		int status;
> 
>  		bh = (struct ice_buf_hdr *)(bufs + start + i); @@ -1346,8
> +1347,22 @@ ice_dwnld_cfg_bufs_no_lock(struct ice_hw *hw, struct ice_buf
> *bufs, u32 start,
>  		if (indicate_last)
>  			last = ice_is_last_download_buffer(bh, i, count);
> 
> -		status = ice_aq_download_pkg(hw, bh, ICE_PKG_BUF_SIZE,
> last,
> -					     &offset, &info, NULL);
> +		while (try_cnt < 5) {
> +			status = ice_aq_download_pkg(hw, bh,
> ICE_PKG_BUF_SIZE,
> +						     last, &offset, &info,
> +						     NULL);
> +			if (hw->adminq.sq_last_status != ICE_AQ_RC_ENOSEC
> &&
> +			    hw->adminq.sq_last_status != ICE_AQ_RC_EBADSIG)
> +				break;
> +
> +			try_cnt++;
> +			msleep(20);
> +		}
> +
> +		if (try_cnt)
> +			dev_dbg(ice_hw_to_dev(hw),
> +				"ice_aq_download_pkg failed, number of retries:
> %d\n",
> +				try_cnt);

Do you really need this dbg statement when try_cnt < 5? Is it not misleading in success case (with retries)?

Thanks,
Naveen

> 
>  		/* Save AQ status from download package */
>  		if (status) {
> --
> 2.40.1
>
Wojciech Drewek May 17, 2024, 7:42 a.m. UTC | #4
On 16.05.2024 16:56, Simon Horman wrote:
> On Thu, May 16, 2024 at 04:04:26PM +0200, Wojciech Drewek wrote:
>> ice_aqc_opc_download_pkg (0x0C40) AQ sporadically returns error due
>> to FW issue. Fix this by retrying five times before moving to
>> Safe Mode.
> 
> Hi Wojciech,
> 
> As this is for iwl-net, and therefore net, please consider supplying a
> Fixes tag.

Sure thing.

> 
> Otherwise this looks good to me.
> 
>> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
>> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
> 
> ...
Wojciech Drewek May 17, 2024, 7:49 a.m. UTC | #5
On 16.05.2024 18:36, Brett Creeley wrote:
> 
> 
> On 5/16/2024 7:04 AM, Wojciech Drewek wrote:
>> Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
>>
>>
>> ice_aqc_opc_download_pkg (0x0C40) AQ sporadically returns error due
>> to FW issue. Fix this by retrying five times before moving to
>> Safe Mode.
>>
>> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
>> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
>> ---
>>   drivers/net/ethernet/intel/ice/ice_ddp.c | 19 +++++++++++++++++--
>>   1 file changed, 17 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/ice/ice_ddp.c b/drivers/net/ethernet/intel/ice/ice_ddp.c
>> index ce5034ed2b24..19e2111fcf08 100644
>> --- a/drivers/net/ethernet/intel/ice/ice_ddp.c
>> +++ b/drivers/net/ethernet/intel/ice/ice_ddp.c
>> @@ -1339,6 +1339,7 @@ ice_dwnld_cfg_bufs_no_lock(struct ice_hw *hw, struct ice_buf *bufs, u32 start,
>>
>>          for (i = 0; i < count; i++) {
>>                  bool last = false;
>> +               int try_cnt = 0;
>>                  int status;
>>
>>                  bh = (struct ice_buf_hdr *)(bufs + start + i);
>> @@ -1346,8 +1347,22 @@ ice_dwnld_cfg_bufs_no_lock(struct ice_hw *hw, struct ice_buf *bufs, u32 start,
>>                  if (indicate_last)
>>                          last = ice_is_last_download_buffer(bh, i, count);
>>
>> -               status = ice_aq_download_pkg(hw, bh, ICE_PKG_BUF_SIZE, last,
>> -                                            &offset, &info, NULL);
>> +               while (try_cnt < 5) {
>> +                       status = ice_aq_download_pkg(hw, bh, ICE_PKG_BUF_SIZE,
>> +                                                    last, &offset, &info,
>> +                                                    NULL);
>> +                       if (hw->adminq.sq_last_status != ICE_AQ_RC_ENOSEC &&
>> +                           hw->adminq.sq_last_status != ICE_AQ_RC_EBADSIG)
> 
> Are these the only 2 sporadic errors that FW will return?

Yes, that's right. We don't want to retry in case of other errors since those might be valid.

> 
>> +                               break;
>> +
>> +                       try_cnt++;
>> +                       msleep(20);
>> +               }
>> +
>> +               if (try_cnt)
>> +                       dev_dbg(ice_hw_to_dev(hw),
>> +                               "ice_aq_download_pkg failed, number of retries: %d\n",
>> +                               try_cnt);
> 
> If try_cnt is non-zero it doesn't mean the last download failed, it just means one or more attempts to download failed right? Maybe just "ice_aq_download_pkg number of retries: %d" since the if (status) check below will print on failure?

Sounds reasonable, we want this log only because we want to know if we hit this sporadic failure.

> 
>>
>>                  /* Save AQ status from download package */
>>                  if (status) {
>> -- 
>> 2.40.1
>>
>>
Wojciech Drewek May 17, 2024, 7:51 a.m. UTC | #6
On 17.05.2024 09:26, Naveen Mamindlapalli wrote:
> 
>> -----Original Message-----
>> From: Wojciech Drewek <wojciech.drewek@intel.com>
>> Sent: Thursday, May 16, 2024 7:34 PM
>> To: netdev@vger.kernel.org
>> Cc: intel-wired-lan@lists.osuosl.org
>> Subject: [PATCH iwl-net] ice: implement AQ download pkg retry
>>
>> ice_aqc_opc_download_pkg (0x0C40) AQ sporadically returns error due to FW
>> issue. Fix this by retrying five times before moving to Safe Mode.
>>
>> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
>> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
>> ---
>>  drivers/net/ethernet/intel/ice/ice_ddp.c | 19 +++++++++++++++++--
>>  1 file changed, 17 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/ice/ice_ddp.c
>> b/drivers/net/ethernet/intel/ice/ice_ddp.c
>> index ce5034ed2b24..19e2111fcf08 100644
>> --- a/drivers/net/ethernet/intel/ice/ice_ddp.c
>> +++ b/drivers/net/ethernet/intel/ice/ice_ddp.c
>> @@ -1339,6 +1339,7 @@ ice_dwnld_cfg_bufs_no_lock(struct ice_hw *hw, struct
>> ice_buf *bufs, u32 start,
>>
>>  	for (i = 0; i < count; i++) {
>>  		bool last = false;
>> +		int try_cnt = 0;
>>  		int status;
>>
>>  		bh = (struct ice_buf_hdr *)(bufs + start + i); @@ -1346,8
>> +1347,22 @@ ice_dwnld_cfg_bufs_no_lock(struct ice_hw *hw, struct ice_buf
>> *bufs, u32 start,
>>  		if (indicate_last)
>>  			last = ice_is_last_download_buffer(bh, i, count);
>>
>> -		status = ice_aq_download_pkg(hw, bh, ICE_PKG_BUF_SIZE,
>> last,
>> -					     &offset, &info, NULL);
>> +		while (try_cnt < 5) {
>> +			status = ice_aq_download_pkg(hw, bh,
>> ICE_PKG_BUF_SIZE,
>> +						     last, &offset, &info,
>> +						     NULL);
>> +			if (hw->adminq.sq_last_status != ICE_AQ_RC_ENOSEC
>> &&
>> +			    hw->adminq.sq_last_status != ICE_AQ_RC_EBADSIG)
>> +				break;
>> +
>> +			try_cnt++;
>> +			msleep(20);
>> +		}
>> +
>> +		if (try_cnt)
>> +			dev_dbg(ice_hw_to_dev(hw),
>> +				"ice_aq_download_pkg failed, number of retries:
>> %d\n",
>> +				try_cnt);
> 
> Do you really need this dbg statement when try_cnt < 5? Is it not misleading in success case (with retries)?

Agree. I'm going to change it to "ice_aq_download_pkg number of retries: %d" as Brett suggested.

> 
> Thanks,
> Naveen
> 
>>
>>  		/* Save AQ status from download package */
>>  		if (status) {
>> --
>> 2.40.1
>>
>
Przemek Kitszel May 17, 2024, 8:29 a.m. UTC | #7
On 5/17/24 09:49, Wojciech Drewek wrote:
> 
> 
> On 16.05.2024 18:36, Brett Creeley wrote:
>>
>>
>> On 5/16/2024 7:04 AM, Wojciech Drewek wrote:
>>> Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
>>>
>>>
>>> ice_aqc_opc_download_pkg (0x0C40) AQ sporadically returns error due
>>> to FW issue. Fix this by retrying five times before moving to
>>> Safe Mode.
>>>
>>> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
>>> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
>>> ---
>>>    drivers/net/ethernet/intel/ice/ice_ddp.c | 19 +++++++++++++++++--
>>>    1 file changed, 17 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/intel/ice/ice_ddp.c b/drivers/net/ethernet/intel/ice/ice_ddp.c
>>> index ce5034ed2b24..19e2111fcf08 100644
>>> --- a/drivers/net/ethernet/intel/ice/ice_ddp.c
>>> +++ b/drivers/net/ethernet/intel/ice/ice_ddp.c
>>> @@ -1339,6 +1339,7 @@ ice_dwnld_cfg_bufs_no_lock(struct ice_hw *hw, struct ice_buf *bufs, u32 start,
>>>
>>>           for (i = 0; i < count; i++) {
>>>                   bool last = false;
>>> +               int try_cnt = 0;
>>>                   int status;
>>>
>>>                   bh = (struct ice_buf_hdr *)(bufs + start + i);
>>> @@ -1346,8 +1347,22 @@ ice_dwnld_cfg_bufs_no_lock(struct ice_hw *hw, struct ice_buf *bufs, u32 start,
>>>                   if (indicate_last)
>>>                           last = ice_is_last_download_buffer(bh, i, count);
>>>
>>> -               status = ice_aq_download_pkg(hw, bh, ICE_PKG_BUF_SIZE, last,
>>> -                                            &offset, &info, NULL);
>>> +               while (try_cnt < 5) {
>>> +                       status = ice_aq_download_pkg(hw, bh, ICE_PKG_BUF_SIZE,
>>> +                                                    last, &offset, &info,
>>> +                                                    NULL);
>>> +                       if (hw->adminq.sq_last_status != ICE_AQ_RC_ENOSEC &&
>>> +                           hw->adminq.sq_last_status != ICE_AQ_RC_EBADSIG)
>>
>> Are these the only 2 sporadic errors that FW will return?
> 
> Yes, that's right. We don't want to retry in case of other errors since those might be valid.

I would say that those are the only two non-sporadic errors ;)

> 
>>
>>> +                               break;
>>> +
>>> +                       try_cnt++;
>>> +                       msleep(20);
>>> +               }
>>> +
>>> +               if (try_cnt)
>>> +                       dev_dbg(ice_hw_to_dev(hw),
>>> +                               "ice_aq_download_pkg failed, number of retries: %d\n",

s/retries/attempts/
(as retries = attempts + 1 ;))

>>> +                               try_cnt);
>>
>> If try_cnt is non-zero it doesn't mean the last download failed, it just means one or more attempts to download failed right? Maybe just "ice_aq_download_pkg number of retries: %d" since the if (status) check below will print on failure?
> 
> Sounds reasonable, we want this log only because we want to know if we hit this sporadic failure.
> 
>>
>>>
>>>                   /* Save AQ status from download package */
>>>                   if (status) {
>>> -- 
>>> 2.40.1
>>>
>>>
diff mbox series

Patch

diff --git a/drivers/net/ethernet/intel/ice/ice_ddp.c b/drivers/net/ethernet/intel/ice/ice_ddp.c
index ce5034ed2b24..19e2111fcf08 100644
--- a/drivers/net/ethernet/intel/ice/ice_ddp.c
+++ b/drivers/net/ethernet/intel/ice/ice_ddp.c
@@ -1339,6 +1339,7 @@  ice_dwnld_cfg_bufs_no_lock(struct ice_hw *hw, struct ice_buf *bufs, u32 start,
 
 	for (i = 0; i < count; i++) {
 		bool last = false;
+		int try_cnt = 0;
 		int status;
 
 		bh = (struct ice_buf_hdr *)(bufs + start + i);
@@ -1346,8 +1347,22 @@  ice_dwnld_cfg_bufs_no_lock(struct ice_hw *hw, struct ice_buf *bufs, u32 start,
 		if (indicate_last)
 			last = ice_is_last_download_buffer(bh, i, count);
 
-		status = ice_aq_download_pkg(hw, bh, ICE_PKG_BUF_SIZE, last,
-					     &offset, &info, NULL);
+		while (try_cnt < 5) {
+			status = ice_aq_download_pkg(hw, bh, ICE_PKG_BUF_SIZE,
+						     last, &offset, &info,
+						     NULL);
+			if (hw->adminq.sq_last_status != ICE_AQ_RC_ENOSEC &&
+			    hw->adminq.sq_last_status != ICE_AQ_RC_EBADSIG)
+				break;
+
+			try_cnt++;
+			msleep(20);
+		}
+
+		if (try_cnt)
+			dev_dbg(ice_hw_to_dev(hw),
+				"ice_aq_download_pkg failed, number of retries: %d\n",
+				try_cnt);
 
 		/* Save AQ status from download package */
 		if (status) {