Message ID | 20200514095727.1361-1-irusskikh@marvell.com |
---|---|
Headers | show |
Series | net: qed/qede: critical hw error handling | expand |
On Thu, 14 May 2020 12:57:16 +0300 Igor Russkikh wrote: > FastLinQ devices as a complex systems may observe various hardware > level error conditions, both severe and recoverable. > > Driver is able to detect and report this, but so far it only did > trace/dmesg based reporting. > > Here we implement an extended hw error detection, service task > handler captures a dump for the later analysis. > > I also resubmit a patch from Denis Bolotin on tx timeout handler, > addressing David's comment regarding recovery procedure as an extra > reaction on this event. > > v2: > > Removing the patch with ethtool dump and udev magic. Its quite isolated, > I'm working on devlink based logic for this separately. > > v1: > > https://patchwork.ozlabs.org/project/netdev/cover/cover.1588758463.git.irusskikh@marvell.com/ I'm not 100% happy that the debug data gets reported to the management FW before the devlink health code is in place. For the Linux community, I think, having standard Linux interfaces implemented first is the priority.
> I'm not 100% happy that the debug data gets reported to the management > FW before the devlink health code is in place. For the Linux community, > I think, having standard Linux interfaces implemented first is the > priority. Hi Jakub, Thanks for the comment. I feel these two are a bit separate. We try to push important messages to MFW, not debug data. And all these messages are as well perfectly being reported on device level error printouts, they are not kind of lost. And for devlink, we anyway will need all the above infrastructure, to eventually implement devlink dumps and other features. Or, may be I didn't get your point? Thanks, Igor
From: Igor Russkikh <irusskikh@marvell.com> Date: Thu, 14 May 2020 12:57:16 +0300 > FastLinQ devices as a complex systems may observe various hardware > level error conditions, both severe and recoverable. > > Driver is able to detect and report this, but so far it only did > trace/dmesg based reporting. > > Here we implement an extended hw error detection, service task > handler captures a dump for the later analysis. > > I also resubmit a patch from Denis Bolotin on tx timeout handler, > addressing David's comment regarding recovery procedure as an extra > reaction on this event. > > v2: > > Removing the patch with ethtool dump and udev magic. Its quite isolated, > I'm working on devlink based logic for this separately. > > v1: > > https://patchwork.ozlabs.org/project/netdev/cover/cover.1588758463.git.irusskikh@marvell.com/ I'm only applying this series because I trust that you will actually do the devlink work, and you will have it done and submitted in a reasonable amount of ti me. Also, patch #4 had trailing empty lines added to a file, which is warned about by 'git' when I apply your patches. I fixed it up, but this is the kind of thing you should have sorted out before you submit changes to the list. Thank you.
On Thu, 14 May 2020 22:40:12 +0300 Igor Russkikh wrote: > > I'm not 100% happy that the debug data gets reported to the management > > FW before the devlink health code is in place. For the Linux community, > > I think, having standard Linux interfaces implemented first is the > > priority. > > Hi Jakub, > > Thanks for the comment. I feel these two are a bit separate. We try to push > important messages to MFW, not debug data. And all these messages are as well > perfectly being reported on device level error printouts, they are not kind of > lost. > > And for devlink, we anyway will need all the above infrastructure, to > eventually implement devlink dumps and other features. > > Or, may be I didn't get your point? That's fine, I'm just saying - I hope the devlink part doesn't take too long to implement :)
On 14/05/2020 11:02 pm, Jakub Kicinski wrote: > That's fine, I'm just saying - I hope the devlink part doesn't take too > long to implement :) > > I'm only applying this series because I trust that you will actually do > the > devlink work, and you will have it done and submitted in a reasonable > amount > of ti me. I see. Thanks Jakub, David. Doing hard already on devlink side. > Also, patch #4 had trailing empty lines added to a file, which is > warned about by 'git' when I apply your patches. I fixed it up, but > this is the kind of thing you should have sorted out before you submit > changes to the list. Sorry for that miss, will do in future. Thanks Igor
From: David Miller <davem@davemloft.net> Date: Thu, 14 May 2020 13:01:59 -0700 (PDT) > From: Igor Russkikh <irusskikh@marvell.com> > Date: Thu, 14 May 2020 12:57:16 +0300 > >> FastLinQ devices as a complex systems may observe various hardware >> level error conditions, both severe and recoverable. >> >> Driver is able to detect and report this, but so far it only did >> trace/dmesg based reporting. >> >> Here we implement an extended hw error detection, service task >> handler captures a dump for the later analysis. >> >> I also resubmit a patch from Denis Bolotin on tx timeout handler, >> addressing David's comment regarding recovery procedure as an extra >> reaction on this event. >> >> v2: >> >> Removing the patch with ethtool dump and udev magic. Its quite isolated, >> I'm working on devlink based logic for this separately. >> >> v1: >> >> https://patchwork.ozlabs.org/project/netdev/cover/cover.1588758463.git.irusskikh@marvell.com/ > > I'm only applying this series because I trust that you will actually do the > devlink work, and you will have it done and submitted in a reasonable amount > of ti me. > > Also, patch #4 had trailing empty lines added to a file, which is > warned about by 'git' when I apply your patches. I fixed it up, but > this is the kind of thing you should have sorted out before you submit > changes to the list. Actually, I had to revert, please fix these warnings (with gcc-10.1.1 on Fedora)_: drivers/net/ethernet/qlogic/qed/qed_dev.c: In function ‘qed_llh_add_mac_filter’: ./include/linux/printk.h:303:2: warning: ‘abs_ppfid’ may be used uninitialized in this function [-Wmaybe-uninitialized] 303 | printk(KERN_NOTICE pr_fmt(fmt), ##__VA_ARGS__) | ^~~~~~ drivers/net/ethernet/qlogic/qed/qed_dev.c:983:17: note: ‘abs_ppfid’ was declared here 983 | u8 filter_idx, abs_ppfid; | ^~~~~~~~~
From: David Miller <davem@davemloft.net> Date: Thu, 14 May 2020 13:09:52 -0700 (PDT) > Actually, I had to revert, please fix these warnings (with gcc-10.1.1 on Fedora)_: > > drivers/net/ethernet/qlogic/qed/qed_dev.c: In function ‘qed_llh_add_mac_filter’: > ./include/linux/printk.h:303:2: warning: ‘abs_ppfid’ may be used uninitialized in this function [-Wmaybe-uninitialized] > 303 | printk(KERN_NOTICE pr_fmt(fmt), ##__VA_ARGS__) > | ^~~~~~ > drivers/net/ethernet/qlogic/qed/qed_dev.c:983:17: note: ‘abs_ppfid’ was declared here > 983 | u8 filter_idx, abs_ppfid; > | ^~~~~~~~~ Hmm, this seems to actually be an existing warning, sorry. I'll reapply this.