Message ID | 1455634911-31206-5-git-send-email-ogerlitz@mellanox.com |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
Headers | show |
Hello. On 02/16/2016 06:01 PM, Or Gerlitz wrote: > From: Daniel Jurgens <danielj@mellanox.com> > > The PCI channel could go offline during reset due to EEH. Don't bug on in > this case, the error is recoverable. > > Fixes: f6bc11e42646 ('net/mlx4_core: Enhance the catas flow to support device reset') > Signed-off-by: Daniel Jurgens <danielj@mellanox.com> > Reviewed-by: Yishai Hadas <yishaih@mellanox.com> > --- > drivers/net/ethernet/mellanox/mlx4/catas.c | 11 +++++++++-- > 1 file changed, 9 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/ethernet/mellanox/mlx4/catas.c b/drivers/net/ethernet/mellanox/mlx4/catas.c > index 715de8a..e866082 100644 > --- a/drivers/net/ethernet/mellanox/mlx4/catas.c > +++ b/drivers/net/ethernet/mellanox/mlx4/catas.c > @@ -182,10 +182,17 @@ void mlx4_enter_error_state(struct mlx4_dev_persistent *persist) > err = mlx4_reset_slave(dev); > else > err = mlx4_reset_master(dev); > - BUG_ON(err != 0); > + > + if (!err) > + mlx4_err(dev, "device was reset successfully\n"); > + else > + /* EEH could have disabled the PCI channel during reset. That's > + * recoverable and the PCI error flow will handle it. > + */ > + if (!pci_channel_offline(dev->persist->pdev)) > + BUG_ON(1); I'm afraid this needs {}. [...] MBR, Sergei
Hello. On 02/16/2016 06:01 PM, Or Gerlitz wrote: > From: Daniel Jurgens <danielj@mellanox.com> > > The PCI channel could go offline during reset due to EEH. Don't bug on in > this case, the error is recoverable. > > Fixes: f6bc11e42646 ('net/mlx4_core: Enhance the catas flow to support device reset') > Signed-off-by: Daniel Jurgens <danielj@mellanox.com> > Reviewed-by: Yishai Hadas <yishaih@mellanox.com> > --- > drivers/net/ethernet/mellanox/mlx4/catas.c | 11 +++++++++-- > 1 file changed, 9 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/ethernet/mellanox/mlx4/catas.c b/drivers/net/ethernet/mellanox/mlx4/catas.c > index 715de8a..e866082 100644 > --- a/drivers/net/ethernet/mellanox/mlx4/catas.c > +++ b/drivers/net/ethernet/mellanox/mlx4/catas.c > @@ -182,10 +182,17 @@ void mlx4_enter_error_state(struct mlx4_dev_persistent *persist) > err = mlx4_reset_slave(dev); > else > err = mlx4_reset_master(dev); > - BUG_ON(err != 0); > + Just noticed: the empty line here isn't really needed. > + if (!err) > + mlx4_err(dev, "device was reset successfully\n"); > + else > + /* EEH could have disabled the PCI channel during reset. That's > + * recoverable and the PCI error flow will handle it. > + */ > + if (!pci_channel_offline(dev->persist->pdev)) > + BUG_ON(1); [...] MBR, Sergei
On 2/16/2016 8:30 PM, Sergei Shtylyov wrote: >> --- a/drivers/net/ethernet/mellanox/mlx4/catas.c >> +++ b/drivers/net/ethernet/mellanox/mlx4/catas.c >> @@ -182,10 +182,17 @@ void mlx4_enter_error_state(struct >> mlx4_dev_persistent *persist) >> err = mlx4_reset_slave(dev); >> else >> err = mlx4_reset_master(dev); >> - BUG_ON(err != 0); >> + >> + if (!err) >> + mlx4_err(dev, "device was reset successfully\n"); >> + else >> + /* EEH could have disabled the PCI channel during reset. That's >> + * recoverable and the PCI error flow will handle it. >> + */ >> + if (!pci_channel_offline(dev->persist->pdev)) >> + BUG_ON(1); > > I'm afraid this needs {}. Hey, don't be afraid just for that, stay cool... we can add that here if it helps, as for the blank line not deletedby this patch on which you commented later, will not remove it as part of a patch which is a strict fix. Or.
Hello. On 2/17/2016 12:21 PM, Or Gerlitz wrote: >>> --- a/drivers/net/ethernet/mellanox/mlx4/catas.c >>> +++ b/drivers/net/ethernet/mellanox/mlx4/catas.c >>> @@ -182,10 +182,17 @@ void mlx4_enter_error_state(struct >>> mlx4_dev_persistent *persist) >>> err = mlx4_reset_slave(dev); >>> else >>> err = mlx4_reset_master(dev); >>> - BUG_ON(err != 0); >>> + >>> + if (!err) >>> + mlx4_err(dev, "device was reset successfully\n"); >>> + else >>> + /* EEH could have disabled the PCI channel during reset. That's >>> + * recoverable and the PCI error flow will handle it. >>> + */ >>> + if (!pci_channel_offline(dev->persist->pdev)) >>> + BUG_ON(1); >> >> I'm afraid this needs {}. > > Hey, don't be afraid just for that, stay cool... we can add that here if it > helps, as for the blank line not deletedby this patch on which you commented > later, will not remove it as part of a patch which is a strict fix. Actually it's *added* by the patch, that's why I asked to not do that. > Or. MBR, Sergei
Hello. On 02/16/2016 06:01 PM, Or Gerlitz wrote: > From: Daniel Jurgens <danielj@mellanox.com> > > The PCI channel could go offline during reset due to EEH. Don't bug on in > this case, the error is recoverable. > > Fixes: f6bc11e42646 ('net/mlx4_core: Enhance the catas flow to support device reset') > Signed-off-by: Daniel Jurgens <danielj@mellanox.com> > Reviewed-by: Yishai Hadas <yishaih@mellanox.com> > --- > drivers/net/ethernet/mellanox/mlx4/catas.c | 11 +++++++++-- > 1 file changed, 9 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/ethernet/mellanox/mlx4/catas.c b/drivers/net/ethernet/mellanox/mlx4/catas.c > index 715de8a..e866082 100644 > --- a/drivers/net/ethernet/mellanox/mlx4/catas.c > +++ b/drivers/net/ethernet/mellanox/mlx4/catas.c > @@ -182,10 +182,17 @@ void mlx4_enter_error_state(struct mlx4_dev_persistent *persist) > err = mlx4_reset_slave(dev); > else > err = mlx4_reset_master(dev); > - BUG_ON(err != 0); > + > + if (!err) > + mlx4_err(dev, "device was reset successfully\n"); > + else > + /* EEH could have disabled the PCI channel during reset. That's > + * recoverable and the PCI error flow will handle it. > + */ > + if (!pci_channel_offline(dev->persist->pdev)) > + BUG_ON(1); BTW, why not just BUG_ON(!pci_channel_offline(dev->persist->pdev)); WBR, Sergei
diff --git a/drivers/net/ethernet/mellanox/mlx4/catas.c b/drivers/net/ethernet/mellanox/mlx4/catas.c index 715de8a..e866082 100644 --- a/drivers/net/ethernet/mellanox/mlx4/catas.c +++ b/drivers/net/ethernet/mellanox/mlx4/catas.c @@ -182,10 +182,17 @@ void mlx4_enter_error_state(struct mlx4_dev_persistent *persist) err = mlx4_reset_slave(dev); else err = mlx4_reset_master(dev); - BUG_ON(err != 0); + + if (!err) + mlx4_err(dev, "device was reset successfully\n"); + else + /* EEH could have disabled the PCI channel during reset. That's + * recoverable and the PCI error flow will handle it. + */ + if (!pci_channel_offline(dev->persist->pdev)) + BUG_ON(1); dev->persist->state |= MLX4_DEVICE_STATE_INTERNAL_ERROR; - mlx4_err(dev, "device was reset successfully\n"); mutex_unlock(&persist->device_state_mutex); /* At that step HW was already reset, now notify clients */