diff mbox series

[net] hv_netvsc: Fix hibernation for mlx5 VF driver

Message ID 20200905025218.45268-1-decui@microsoft.com
State Changes Requested
Delegated to: David Miller

Commit Message

Dexuan Cui Sept. 5, 2020, 2:52 a.m. UTC
mlx5_suspend()/resume() keep the network interface, so during hibernation
netvsc_unregister_vf() and netvsc_register_vf() are not called, and hence
netvsc_resume() should call netvsc_vf_changed() to switch the data path
back to the VF after hibernation. Similarly, netvsc_suspend() should
not call netvsc_unregister_vf().

BTW, mlx4_suspend()/resume() are different in that they destroy and
re-create the network device, so netvsc_register_vf() and
netvsc_unregister_vf() are automatically called. Note: mlx4 can also work
with the changes here because in netvsc_suspend()/resume()
ndev_ctx->vf_netdev is NULL for mlx4.

Fixes: 0efeea5fb153 ("hv_netvsc: Add the support of hibernation")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
---
 drivers/net/hyperv/netvsc_drv.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

Comments

Jakub Kicinski Sept. 5, 2020, 11:27 p.m. UTC | #1
On Fri,  4 Sep 2020 19:52:18 -0700 Dexuan Cui wrote:
> mlx5_suspend()/resume() keep the network interface, so during hibernation
> netvsc_unregister_vf() and netvsc_register_vf() are not called, and hence
> netvsc_resume() should call netvsc_vf_changed() to switch the data path
> back to the VF after hibernation.

Does suspending the system automatically switch back to the synthetic
datapath? Please clarify this in the commit message and/or add a code
comment.

> Similarly, netvsc_suspend() should not call netvsc_unregister_vf().
> 
> BTW, mlx4_suspend()/resume() are different in that they destroy and
> re-create the network device, so netvsc_register_vf() and
> netvsc_unregister_vf() are automatically called. Note: mlx4 can also work
> with the changes here because in netvsc_suspend()/resume()
> ndev_ctx->vf_netdev is NULL for mlx4.
> 
> Fixes: 0efeea5fb153 ("hv_netvsc: Add the support of hibernation")
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> ---
>  drivers/net/hyperv/netvsc_drv.c | 11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
> index 64b0a74c1523..f896059a9588 100644
> --- a/drivers/net/hyperv/netvsc_drv.c
> +++ b/drivers/net/hyperv/netvsc_drv.c
> @@ -2587,7 +2587,7 @@ static int netvsc_remove(struct hv_device *dev)
>  static int netvsc_suspend(struct hv_device *dev)
>  {
>  	struct net_device_context *ndev_ctx;
> -	struct net_device *vf_netdev, *net;
> +	struct net_device *net;
>  	struct netvsc_device *nvdev;
>  	int ret;

Please keep reverse xmas tree variable ordering.
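
"Reverse xmas tree" means that local variable declarations are ordered from the longest line to the shortest. A minimal sketch of how the declarations in netvsc_suspend() might be ordered once vf_netdev is dropped; the empty struct stand-ins and the function name netvsc_suspend_decl_order() are placeholders added here so the fragment compiles outside the kernel tree, and are not part of the patch.

```c
#include <stddef.h>

/* Stand-in types so the fragment compiles outside the kernel tree. */
struct net_device_context { int unused; };
struct netvsc_device { int unused; };
struct net_device { int unused; };

/* Hypothetical stub: only the order of the declarations matters here. */
int netvsc_suspend_decl_order(void)
{
	/* Longest declaration first, shortest last. */
	struct net_device_context *ndev_ctx = NULL;
	struct netvsc_device *nvdev = NULL;
	struct net_device *net = NULL;
	int ret = 0;

	/* Silence unused-variable warnings in this stub. */
	(void)ndev_ctx;
	(void)nvdev;
	(void)net;

	return ret;
}
```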

> @@ -2604,10 +2604,6 @@ static int netvsc_suspend(struct hv_device *dev)
>  		goto out;
>  	}
>  
> -	vf_netdev = rtnl_dereference(ndev_ctx->vf_netdev);
> -	if (vf_netdev)
> -		netvsc_unregister_vf(vf_netdev);
> -
>  	/* Save the current config info */
>  	ndev_ctx->saved_netvsc_dev_info = netvsc_devinfo_get(nvdev);
>  
> @@ -2623,6 +2619,7 @@ static int netvsc_resume(struct hv_device *dev)
>  	struct net_device *net = hv_get_drvdata(dev);
>  	struct net_device_context *net_device_ctx;
>  	struct netvsc_device_info *device_info;
> +	struct net_device *vf_netdev;
>  	int ret;
>  
>  	rtnl_lock();
> @@ -2635,6 +2632,10 @@ static int netvsc_resume(struct hv_device *dev)
>  	netvsc_devinfo_put(device_info);
>  	net_device_ctx->saved_netvsc_dev_info = NULL;
>  
> +	vf_netdev = rtnl_dereference(net_device_ctx->vf_netdev);
> +	if (vf_netdev && netvsc_vf_changed(vf_netdev) != NOTIFY_OK)
> +		ret = -EINVAL;

Should you perhaps remove the VF in case of the failure?

>  	rtnl_unlock();
>  
>  	return ret;

Dexuan Cui Sept. 6, 2020, 3:05 a.m. UTC | #2
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Saturday, September 5, 2020 4:27 PM
> [...]
> On Fri,  4 Sep 2020 19:52:18 -0700 Dexuan Cui wrote:
> > mlx5_suspend()/resume() keep the network interface, so during hibernation
> > netvsc_unregister_vf() and netvsc_register_vf() are not called, and hence
> > netvsc_resume() should call netvsc_vf_changed() to switch the data path
> > back to the VF after hibernation.
> 
> Does suspending the system automatically switch back to the synthetic
> datapath? 
Yes. 

For mlx4, since the VF network interface is explicitly destroyed and re-created
during hibernation (i.e. suspend + resume), hv_netvsc explicitly switches the
data path from and to the VF.

For mlx5, the VF network interface persists across hibernation, so there is no
explicit switch-over, but after we close and re-open the vmbus channel of
the netvsc NIC in netvsc_suspend() and netvsc_resume(), the data path is
implicitly switched to the netvsc NIC, and with this patch netvsc_resume() ->
netvsc_vf_changed() switches the data path back to the mlx5 NIC.
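
The two cases described above can be sketched as a small user-space state model. This is only an illustration under simplifying assumptions: the type struct model_netvsc and all model_*() functions are hypothetical names, the model collapses "VF registered" and "VF up" into one step, and none of it is kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Which NIC currently carries traffic (hypothetical model, not kernel code). */
enum datapath { DATAPATH_SYNTHETIC, DATAPATH_VF };

struct model_netvsc {
	const char *vf_netdev;	/* non-NULL while a VF is registered */
	enum datapath path;
};

/* Closing the vmbus channel implicitly falls back to the synthetic NIC. */
void model_suspend(struct model_netvsc *nv)
{
	assert(nv != NULL);
	nv->path = DATAPATH_SYNTHETIC;
}

/* With the patch: resume switches back to a VF that persisted (mlx5). */
void model_resume(struct model_netvsc *nv)
{
	assert(nv != NULL);
	if (nv->vf_netdev)
		nv->path = DATAPATH_VF;
}

/* mlx4 style: the VF netdev is destroyed, so the switch is explicit. */
void model_unregister_vf(struct model_netvsc *nv)
{
	assert(nv != NULL);
	nv->vf_netdev = NULL;
	nv->path = DATAPATH_SYNTHETIC;
}

/* mlx4 style: the VF netdev is re-created and takes over the data path. */
void model_register_vf(struct model_netvsc *nv, const char *name)
{
	assert(nv != NULL);
	nv->vf_netdev = name;
	nv->path = DATAPATH_VF;
}
```

In this model an mlx5-style VF stays attached across model_suspend() and model_resume() and ends up on the VF path again, while an mlx4-style VF is already NULL during both calls, so model_resume() leaves the path alone and model_register_vf() restores it later.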

> Please clarify this in the commit message and/or add a code
> comment.
I will add a comment in the commit message and the code.
 
> > @@ -2587,7 +2587,7 @@ static int netvsc_remove(struct hv_device *dev)
> >  static int netvsc_suspend(struct hv_device *dev)
> >  {
> >  	struct net_device_context *ndev_ctx;
> > -	struct net_device *vf_netdev, *net;
> > +	struct net_device *net;
> >  	struct netvsc_device *nvdev;
> >  	int ret;
> 
> Please keep reverse xmas tree variable ordering.

Will do.

> > @@ -2635,6 +2632,10 @@ static int netvsc_resume(struct hv_device *dev)
> >  	netvsc_devinfo_put(device_info);
> >  	net_device_ctx->saved_netvsc_dev_info = NULL;
> >
> > +	vf_netdev = rtnl_dereference(net_device_ctx->vf_netdev);
> > +	if (vf_netdev && netvsc_vf_changed(vf_netdev) != NOTIFY_OK)
> > +		ret = -EINVAL;
> 
> Should you perhaps remove the VF in case of the failure?
IMO this failure actually should not happen since we're resuming the netvsc
NIC, so we're sure we have a valid pointer to the netvsc net device, and
netvsc_vf_changed() should be able to find the netvsc pointer and return
NOTIFY_OK. In case of a failure, something really bad must be happening,
and I'm not sure if it's safe to simply remove the VF, so I just return
-EINVAL for simplicity, since I believe the failure should not happen in practice.

I would rather keep the code as-is, but I'm OK to add a WARN_ON(1) if you
think that's necessary.

Thanks,
-- Dexuan

Jakub Kicinski Sept. 6, 2020, 4:26 p.m. UTC | #3
On Sun, 6 Sep 2020 03:05:48 +0000 Dexuan Cui wrote:
> > > @@ -2635,6 +2632,10 @@ static int netvsc_resume(struct hv_device *dev)
> > >  	netvsc_devinfo_put(device_info);
> > >  	net_device_ctx->saved_netvsc_dev_info = NULL;
> > >
> > > +	vf_netdev = rtnl_dereference(net_device_ctx->vf_netdev);
> > > +	if (vf_netdev && netvsc_vf_changed(vf_netdev) != NOTIFY_OK)
> > > +		ret = -EINVAL;  
> > 
> > Should you perhaps remove the VF in case of the failure?  
> IMO this failure actually should not happen since we're resuming the netvsc
> NIC, so we're sure we have a valid pointer to the netvsc net device, and
> netvsc_vf_changed() should be able to find the netvsc pointer and return
> NOTIFY_OK. In case of a failure, something really bad must be happening,
> and I'm not sure if it's safe to simply remove the VF, so I just return
> -EINVAL for simplicity, since I believe the failure should not happen in practice.

Okay, I see that the errors propagated by netvsc_vf_changed() aren't
actually coming from netvsc_switch_datapath(), so you're right. The
failures here won't be meaningful.
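
The return-value handling under discussion can be sketched in isolation. This is a hedged illustration, not the driver's code: resume_result() is a hypothetical helper, have_vf stands in for the vf_netdev NULL check, and the NOTIFY_* values are copied from the kernel's include/linux/notifier.h so the fragment builds in user space.

```c
#include <errno.h>

/* Values mirrored from the kernel's include/linux/notifier.h. */
#define NOTIFY_DONE 0x0000
#define NOTIFY_OK   0x0001

/*
 * Hypothetical helper mirroring the check added to netvsc_resume():
 * a VF that is present but whose switch-back did not report NOTIFY_OK
 * turns the resume result into -EINVAL; otherwise the earlier result
 * is passed through unchanged.
 */
int resume_result(int ret, int have_vf, int vf_changed_rc)
{
	if (have_vf && vf_changed_rc != NOTIFY_OK)
		return -EINVAL;

	return ret;
}
```

Note that, as in the patch, the VF check overrides whatever value ret already held, and a missing VF (the mlx4 case) never produces an error from this check.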

> I would rather keep the code as-is, but I'm OK to add a WARN_ON(1) if you
> think that's necessary.

No need, I think the core will complain when the resume callback fails. That
should be sufficient.

Patch

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 64b0a74c1523..f896059a9588 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -2587,7 +2587,7 @@  static int netvsc_remove(struct hv_device *dev)
 static int netvsc_suspend(struct hv_device *dev)
 {
 	struct net_device_context *ndev_ctx;
-	struct net_device *vf_netdev, *net;
+	struct net_device *net;
 	struct netvsc_device *nvdev;
 	int ret;
 
@@ -2604,10 +2604,6 @@  static int netvsc_suspend(struct hv_device *dev)
 		goto out;
 	}
 
-	vf_netdev = rtnl_dereference(ndev_ctx->vf_netdev);
-	if (vf_netdev)
-		netvsc_unregister_vf(vf_netdev);
-
 	/* Save the current config info */
 	ndev_ctx->saved_netvsc_dev_info = netvsc_devinfo_get(nvdev);
 
@@ -2623,6 +2619,7 @@  static int netvsc_resume(struct hv_device *dev)
 	struct net_device *net = hv_get_drvdata(dev);
 	struct net_device_context *net_device_ctx;
 	struct netvsc_device_info *device_info;
+	struct net_device *vf_netdev;
 	int ret;
 
 	rtnl_lock();
@@ -2635,6 +2632,10 @@  static int netvsc_resume(struct hv_device *dev)
 	netvsc_devinfo_put(device_info);
 	net_device_ctx->saved_netvsc_dev_info = NULL;
 
+	vf_netdev = rtnl_dereference(net_device_ctx->vf_netdev);
+	if (vf_netdev && netvsc_vf_changed(vf_netdev) != NOTIFY_OK)
+		ret = -EINVAL;
+
 	rtnl_unlock();
 
 	return ret;