diff mbox

[net-next] hv_netvsc: Increase delay for RNDIS_STATUS_NETWORK_CHANGE

Message ID 1454458556-10591-1-git-send-email-haiyangz@microsoft.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Haiyang Zhang Feb. 3, 2016, 12:15 a.m. UTC
We simulates a link down period for RNDIS_STATUS_NETWORK_CHANGE message to
trigger DHCP renew. User daemons may need multiple seconds to trigger the
link down event. (e.g. ifplugd: 5sec, network-manager: 4sec.) So update
this link down period to 10 sec to properly trigger DHCP renew.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
---
 drivers/net/hyperv/netvsc_drv.c |   10 ++++++++--
 1 files changed, 8 insertions(+), 2 deletions(-)

Comments

Vitaly Kuznetsov Feb. 3, 2016, 1:05 p.m. UTC | #1
Haiyang Zhang <haiyangz@microsoft.com> writes:

> We simulates a link down period for RNDIS_STATUS_NETWORK_CHANGE message to
> trigger DHCP renew. User daemons may need multiple seconds to trigger the
> link down event. (e.g. ifplugd: 5sec, network-manager: 4sec.) So update
> this link down period to 10 sec to properly trigger DHCP renew.
>

I probably don't follow: why do we need sucha a delay? If (with real
hardware) you plug network cable out and in one second you plug it in
you'll get DHCP renewed, right?

When I introduced RNDIS_STATUS_NETWORK_CHANGE handling by emulating a
pair of up/down events I put 2 second delay to make link_watch happy (as
we only handle 1 event per second there) but 10 seconds sounds to much
to me.

> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
> ---
>  drivers/net/hyperv/netvsc_drv.c |   10 ++++++++--
>  1 files changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
> index 1d3a665..6f23973 100644
> --- a/drivers/net/hyperv/netvsc_drv.c
> +++ b/drivers/net/hyperv/netvsc_drv.c
> @@ -43,6 +43,8 @@
>
>  #define RING_SIZE_MIN 64
>  #define LINKCHANGE_INT (2 * HZ)
> +/* Extra delay for RNDIS_STATUS_NETWORK_CHANGE: */
> +#define LINKCHANGE_DELAY (8 * HZ)
>  static int ring_size = 128;
>  module_param(ring_size, int, S_IRUGO);
>  MODULE_PARM_DESC(ring_size, "Ring buffer size (# of pages)");
> @@ -964,6 +966,7 @@ static void netvsc_link_change(struct work_struct *w)
>  		return;
>  	}
>  	ndev_ctx->last_reconfig = jiffies;
> +	delay = LINKCHANGE_INT;
>
>  	spin_lock_irqsave(&ndev_ctx->lock, flags);
>  	if (!list_empty(&ndev_ctx->reconfig_events)) {
> @@ -1009,8 +1012,11 @@ static void netvsc_link_change(struct work_struct *w)
>  			netif_tx_stop_all_queues(net);
>  			event->event = RNDIS_STATUS_MEDIA_CONNECT;
>  			spin_lock_irqsave(&ndev_ctx->lock, flags);
> -			list_add_tail(&event->list, &ndev_ctx->reconfig_events);
> +			list_add(&event->list, &ndev_ctx->reconfig_events);

Why? Adding to tail was here to not screw the order of events...

>  			spin_unlock_irqrestore(&ndev_ctx->lock, flags);
> +
> +			ndev_ctx->last_reconfig += LINKCHANGE_DELAY;
> +			delay = LINKCHANGE_INT + LINKCHANGE_DELAY;
>  			reschedule = true;
>  		}
>  		break;
> @@ -1025,7 +1031,7 @@ static void netvsc_link_change(struct work_struct *w)
>  	 * second, handle next reconfig event in 2 seconds.
>  	 */
>  	if (reschedule)
> -		schedule_delayed_work(&ndev_ctx->dwork, LINKCHANGE_INT);
> +		schedule_delayed_work(&ndev_ctx->dwork, delay);
>  }
>
>  static void netvsc_free_netdev(struct net_device *netdev)
Haiyang Zhang Feb. 3, 2016, 3:46 p.m. UTC | #2
> -----Original Message-----
> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
> Sent: Wednesday, February 3, 2016 8:06 AM
> To: Haiyang Zhang <haiyangz@microsoft.com>
> Cc: davem@davemloft.net; netdev@vger.kernel.org; KY Srinivasan
> <kys@microsoft.com>; olaf@aepfle.de; linux-kernel@vger.kernel.org;
> driverdev-devel@linuxdriverproject.org
> Subject: Re: [PATCH net-next] hv_netvsc: Increase delay for
> RNDIS_STATUS_NETWORK_CHANGE
> 
> Haiyang Zhang <haiyangz@microsoft.com> writes:
> 
> > We simulates a link down period for RNDIS_STATUS_NETWORK_CHANGE
> message to
> > trigger DHCP renew. User daemons may need multiple seconds to trigger
> the
> > link down event. (e.g. ifplugd: 5sec, network-manager: 4sec.) So update
> > this link down period to 10 sec to properly trigger DHCP renew.
> >
> 
> I probably don't follow: why do we need sucha a delay? If (with real
> hardware) you plug network cable out and in one second you plug it in
> you'll get DHCP renewed, right?
> 
> When I introduced RNDIS_STATUS_NETWORK_CHANGE handling by
> emulating a
> pair of up/down events I put 2 second delay to make link_watch happy (as
> we only handle 1 event per second there) but 10 seconds sounds to much
> to me.

In the test on Hyper-V, our software on host side  wants to trigger DHCP 
renew by sending only a RNDIS_STATUS_NETWORK_CHANGE message to 
a guest without physically unplug the cable. But, this message didn't trigger 
DHCP renew within 2 seconds. The VM is Centos 7.1 using Networkmanager, 
which needs 4 seconds after link down to renew IP. Some daemon, like 
ifplugd, needs 5 sec to renew. That's why we increase the simulated link 
down time for RNDIS_STATUS_NETWORK_CHANGE message. 


> > @@ -1009,8 +1012,11 @@ static void netvsc_link_change(struct
> work_struct *w)
> >  			netif_tx_stop_all_queues(net);
> >  			event->event = RNDIS_STATUS_MEDIA_CONNECT;
> >  			spin_lock_irqsave(&ndev_ctx->lock, flags);
> > -			list_add_tail(&event->list, &ndev_ctx-
> >reconfig_events);
> > +			list_add(&event->list, &ndev_ctx->reconfig_events);
> 
> Why? Adding to tail was here to not screw the order of events...

The RNDIS_STATUS_MEDIA_CONNECT message triggers two "half" events -- 
link down & up. After "link down", we want the paired "link up" to be the 
immediate next event. Since the function picks the next event from the list 
head, so it should be inserted to list head. Otherwise, the possible existing 
events in the list will be processed in the middle of these two "half events" 
-- link down & up.

Thanks,
- Haiyang
Vitaly Kuznetsov Feb. 3, 2016, 4:06 p.m. UTC | #3
Haiyang Zhang <haiyangz@microsoft.com> writes:

>> -----Original Message-----
>> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
>> Sent: Wednesday, February 3, 2016 8:06 AM
>> To: Haiyang Zhang <haiyangz@microsoft.com>
>> Cc: davem@davemloft.net; netdev@vger.kernel.org; KY Srinivasan
>> <kys@microsoft.com>; olaf@aepfle.de; linux-kernel@vger.kernel.org;
>> driverdev-devel@linuxdriverproject.org
>> Subject: Re: [PATCH net-next] hv_netvsc: Increase delay for
>> RNDIS_STATUS_NETWORK_CHANGE
>> 
>> Haiyang Zhang <haiyangz@microsoft.com> writes:
>> 
>> > We simulates a link down period for RNDIS_STATUS_NETWORK_CHANGE
>> message to
>> > trigger DHCP renew. User daemons may need multiple seconds to trigger
>> the
>> > link down event. (e.g. ifplugd: 5sec, network-manager: 4sec.) So update
>> > this link down period to 10 sec to properly trigger DHCP renew.
>> >
>> 
>> I probably don't follow: why do we need sucha a delay? If (with real
>> hardware) you plug network cable out and in one second you plug it in
>> you'll get DHCP renewed, right?
>> 
>> When I introduced RNDIS_STATUS_NETWORK_CHANGE handling by
>> emulating a
>> pair of up/down events I put 2 second delay to make link_watch happy (as
>> we only handle 1 event per second there) but 10 seconds sounds to much
>> to me.
>
> In the test on Hyper-V, our software on host side  wants to trigger DHCP 
> renew by sending only a RNDIS_STATUS_NETWORK_CHANGE message to 
> a guest without physically unplug the cable. But, this message didn't trigger 
> DHCP renew within 2 seconds. The VM is Centos 7.1 using Networkmanager, 
> which needs 4 seconds after link down to renew IP. Some daemon, like 
> ifplugd, needs 5 sec to renew. That's why we increase the simulated link 
> down time for RNDIS_STATUS_NETWORK_CHANGE message.

Yes, I understand the motivation but sorry if I was unclear with my
question. I meant to say that with physical network adapters it's
possible to trigger same two events by plugging your cable out and then
plugging it back in and it is certailnly doable in less than 10
seconds. NetworkManager or whoever is supposed to handle these events
and we don't really care how fast -- I think that 4 or 5 seconds
mentioned above is just an observation.

>
>> > @@ -1009,8 +1012,11 @@ static void netvsc_link_change(struct
>> work_struct *w)
>> >  			netif_tx_stop_all_queues(net);
>> >  			event->event = RNDIS_STATUS_MEDIA_CONNECT;
>> >  			spin_lock_irqsave(&ndev_ctx->lock, flags);
>> > -			list_add_tail(&event->list, &ndev_ctx-
>> >reconfig_events);
>> > +			list_add(&event->list, &ndev_ctx->reconfig_events);
>> 
>> Why? Adding to tail was here to not screw the order of events...
>
> The RNDIS_STATUS_MEDIA_CONNECT message triggers two "half" events -- 
> link down & up. After "link down", we want the paired "link up" to be the 
> immediate next event. Since the function picks the next event from the list 
> head, so it should be inserted to list head. Otherwise, the possible existing 
> events in the list will be processed in the middle of these two "half events" 
> -- link down & up.

Ah, yes, you're probably right.
Haiyang Zhang Feb. 3, 2016, 4:39 p.m. UTC | #4
> -----Original Message-----
> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
> Sent: Wednesday, February 3, 2016 11:06 AM
> To: Haiyang Zhang <haiyangz@microsoft.com>
> Cc: davem@davemloft.net; netdev@vger.kernel.org; KY Srinivasan
> <kys@microsoft.com>; olaf@aepfle.de; linux-kernel@vger.kernel.org;
> driverdev-devel@linuxdriverproject.org
> Subject: Re: [PATCH net-next] hv_netvsc: Increase delay for
> RNDIS_STATUS_NETWORK_CHANGE
> 
> Haiyang Zhang <haiyangz@microsoft.com> writes:
> 
> >> -----Original Message-----
> >> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
> >> Sent: Wednesday, February 3, 2016 8:06 AM
> >> To: Haiyang Zhang <haiyangz@microsoft.com>
> >> Cc: davem@davemloft.net; netdev@vger.kernel.org; KY Srinivasan
> >> <kys@microsoft.com>; olaf@aepfle.de; linux-kernel@vger.kernel.org;
> >> driverdev-devel@linuxdriverproject.org
> >> Subject: Re: [PATCH net-next] hv_netvsc: Increase delay for
> >> RNDIS_STATUS_NETWORK_CHANGE
> >>
> >> Haiyang Zhang <haiyangz@microsoft.com> writes:
> >>
> >> > We simulates a link down period for
> RNDIS_STATUS_NETWORK_CHANGE
> >> message to
> >> > trigger DHCP renew. User daemons may need multiple seconds to
> trigger
> >> the
> >> > link down event. (e.g. ifplugd: 5sec, network-manager: 4sec.) So update
> >> > this link down period to 10 sec to properly trigger DHCP renew.
> >> >
> >>
> >> I probably don't follow: why do we need sucha a delay? If (with real
> >> hardware) you plug network cable out and in one second you plug it in
> >> you'll get DHCP renewed, right?
> >>
> >> When I introduced RNDIS_STATUS_NETWORK_CHANGE handling by
> >> emulating a
> >> pair of up/down events I put 2 second delay to make link_watch happy
> (as
> >> we only handle 1 event per second there) but 10 seconds sounds to much
> >> to me.
> >
> > In the test on Hyper-V, our software on host side  wants to trigger DHCP
> > renew by sending only a RNDIS_STATUS_NETWORK_CHANGE message to
> > a guest without physically unplug the cable. But, this message didn't trigger
> > DHCP renew within 2 seconds. The VM is Centos 7.1 using
> Networkmanager,
> > which needs 4 seconds after link down to renew IP. Some daemon, like
> > ifplugd, needs 5 sec to renew. That's why we increase the simulated link
> > down time for RNDIS_STATUS_NETWORK_CHANGE message.
> 
> Yes, I understand the motivation but sorry if I was unclear with my
> question. I meant to say that with physical network adapters it's
> possible to trigger same two events by plugging your cable out and then
> plugging it back in and it is certailnly doable in less than 10
> seconds. NetworkManager or whoever is supposed to handle these events
> and we don't really care how fast -- I think that 4 or 5 seconds
> mentioned above is just an observation.

(forgot mailing lists in my last reply.... re-sending...)

Our test failed (i.e. not triggering DHCP renew) with existing 2sec delay. According 
to the ifplugd man page, it ignores link down time <5sec:
  http://linux.die.net/man/8/ifplugd
    -d | --delay-down= SECS Specify delay for deconfiguring interface (default: 5)

Networkmanager also has a waiting period (4sec).

Thanks,
- Haiyang
David Miller Feb. 9, 2016, 10:04 a.m. UTC | #5
From: Haiyang Zhang <haiyangz@microsoft.com>
Date: Tue,  2 Feb 2016 16:15:56 -0800

> We simulates a link down period for RNDIS_STATUS_NETWORK_CHANGE message to
> trigger DHCP renew. User daemons may need multiple seconds to trigger the
> link down event. (e.g. ifplugd: 5sec, network-manager: 4sec.) So update
> this link down period to 10 sec to properly trigger DHCP renew.
> 
> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>

Two things look really bad about this to me:

1) Any value you choose is arbitrary.  If some new network configuration daemon
   is slower, you will have to change this value again.

   This is _NOT_ sustainable in the long term.

2) It is completely unclear to me why this driver needs to delay at all or
   wait for anything.  I see no other driver having to deal with this issue.

Until you address both of these points I am not going to apply this patch.

Thanks.
Haiyang Zhang Feb. 9, 2016, 3:31 p.m. UTC | #6
> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: Tuesday, February 9, 2016 5:05 AM
> To: Haiyang Zhang <haiyangz@microsoft.com>
> Cc: netdev@vger.kernel.org; KY Srinivasan <kys@microsoft.com>;
> olaf@aepfle.de; vkuznets@redhat.com; linux-kernel@vger.kernel.org;
> driverdev-devel@linuxdriverproject.org
> Subject: Re: [PATCH net-next] hv_netvsc: Increase delay for
> RNDIS_STATUS_NETWORK_CHANGE
> 
> From: Haiyang Zhang <haiyangz@microsoft.com>
> Date: Tue,  2 Feb 2016 16:15:56 -0800
> 
> > We simulates a link down period for RNDIS_STATUS_NETWORK_CHANGE
> > message to trigger DHCP renew. User daemons may need multiple seconds
> > to trigger the link down event. (e.g. ifplugd: 5sec, network-manager:
> > 4sec.) So update this link down period to 10 sec to properly trigger DHCP
> renew.
> >
> > Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
> 
> Two things look really bad about this to me:
> 
> 1) Any value you choose is arbitrary.  If some new network configuration
> daemon
>    is slower, you will have to change this value again.
> 
>    This is _NOT_ sustainable in the long term.
> 
> 2) It is completely unclear to me why this driver needs to delay at all or
>    wait for anything.  I see no other driver having to deal with this issue.
> 
> Until you address both of these points I am not going to apply this patch.

1) I share your concern as well. Is there a universal way to immediately trigger 
DHCP renew of all current and future daemons with a single event from kernel? 
If not, can we put the delay (RNDIS_STATUS_NETWORK_CHANGE only) into a 
tunable variable of this driver?

2) We used to have the call_usermodehelper "/etc/init.d/network restart" to 
trigger DHCP renew. In commit 27a70af3f4, Vitaly has replaced it with the current 
code that updates the link status with at least 2 seconds interval, so that the 
"link_watch infrastructure" can send notification out. link_watch infrastructure 
only sends one notification per second.

Thanks,
- Haiyang
David Miller Feb. 16, 2016, 8:28 p.m. UTC | #7
From: Haiyang Zhang <haiyangz@microsoft.com>
Date: Tue, 9 Feb 2016 15:31:34 +0000

> 1) I share your concern as well. Is there a universal way to immediately trigger 
> DHCP renew of all current and future daemons with a single event from kernel? 
> If not, can we put the delay (RNDIS_STATUS_NETWORK_CHANGE only) into a 
> tunable variable of this driver?
> 
> 2) We used to have the call_usermodehelper "/etc/init.d/network restart" to 
> trigger DHCP renew. In commit 27a70af3f4, Vitaly has replaced it with the current 
> code that updates the link status with at least 2 seconds interval, so that the 
> "link_watch infrastructure" can send notification out. link_watch infrastructure 
> only sends one notification per second.

If the daemon is waiting for the link state change properly, there should be
no delay necessary at all.
Vitaly Kuznetsov Feb. 17, 2016, 12:53 p.m. UTC | #8
David Miller <davem@davemloft.net> writes:

> From: Haiyang Zhang <haiyangz@microsoft.com>
> Date: Tue, 9 Feb 2016 15:31:34 +0000
>
>> 1) I share your concern as well. Is there a universal way to immediately trigger 
>> DHCP renew of all current and future daemons with a single event from kernel? 
>> If not, can we put the delay (RNDIS_STATUS_NETWORK_CHANGE only) into a 
>> tunable variable of this driver?
>> 
>> 2) We used to have the call_usermodehelper "/etc/init.d/network restart" to 
>> trigger DHCP renew. In commit 27a70af3f4, Vitaly has replaced it with the current 
>> code that updates the link status with at least 2 seconds interval, so that the 
>> "link_watch infrastructure" can send notification out. link_watch infrastructure 
>> only sends one notification per second.
>
> If the daemon is waiting for the link state change properly, there should be
> no delay necessary at all.

The daemon won't get 2 state change notifications if they happen within
1 second, it will get the last state only so our link will transition
from 'UP' to 'UP'. Why is that? To signal link state change we call
netif_carrier_on()/netif_carrier_off() from the driver. These functions
do linkwatch_fire_event() which has the following code for non-urgent events:

        if (!test_and_set_bit(__LINK_STATE_LINKWATCH_PENDING, &dev->state)) {
		linkwatch_add_event(dev);
	} else if (!urgent)
		return;

	linkwatch_schedule_work(urgent);

So we'll add just one event (because of test_and_set_bit) for a pair of
consequent netif_carrier_off()/netif_carrier_on() calls.

linkwatch_schedule_work() does the following:
	unsigned long delay = linkwatch_nextevent - jiffies;

        ....

	/* If we wrap around we'll delay it by at most HZ. */
	if (delay > HZ)
		delay = 0;

so here is where mandatory ' > 1s' wait comes from.
        ....
        schedule_delayed_work(&linkwatch_work, delay);        

linkwatch_work is linkwatch_event() which calls __linkwatch_run_queue()
which does linkwatch_do_dev() for the list of events we have. But
linkwatch_do_dev() checks current carrier status (which in our case if
'UP' if we didn't wait for > 1s before doing /netif_carrier_on()).

Hyper-V driver is not the only one which has this delay. e1000 driver,
for example, has the following:

...
 * Need to wait a few seconds after link up to get diagnostic information from
 * the phy
 */
static void e1000_update_phy_info_task(struct work_struct *work)
...

which we schedule with
      schedule_delayed_work(&adapter->phy_info_task, 2 * HZ);

To my understanding this code serves the same purpose, so even if you're
super fast with unplugging and plugging back your cable you'll have 2
seconds between 'down' and 'up'. I don't think there is something wrong
with linkwatch, the '1s' protection against drivers going mad is fine so
'2s' workarounds in drivers seem legit.
diff mbox

Patch

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 1d3a665..6f23973 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -43,6 +43,8 @@ 
 
 #define RING_SIZE_MIN 64
 #define LINKCHANGE_INT (2 * HZ)
+/* Extra delay for RNDIS_STATUS_NETWORK_CHANGE: */
+#define LINKCHANGE_DELAY (8 * HZ)
 static int ring_size = 128;
 module_param(ring_size, int, S_IRUGO);
 MODULE_PARM_DESC(ring_size, "Ring buffer size (# of pages)");
@@ -964,6 +966,7 @@  static void netvsc_link_change(struct work_struct *w)
 		return;
 	}
 	ndev_ctx->last_reconfig = jiffies;
+	delay = LINKCHANGE_INT;
 
 	spin_lock_irqsave(&ndev_ctx->lock, flags);
 	if (!list_empty(&ndev_ctx->reconfig_events)) {
@@ -1009,8 +1012,11 @@  static void netvsc_link_change(struct work_struct *w)
 			netif_tx_stop_all_queues(net);
 			event->event = RNDIS_STATUS_MEDIA_CONNECT;
 			spin_lock_irqsave(&ndev_ctx->lock, flags);
-			list_add_tail(&event->list, &ndev_ctx->reconfig_events);
+			list_add(&event->list, &ndev_ctx->reconfig_events);
 			spin_unlock_irqrestore(&ndev_ctx->lock, flags);
+
+			ndev_ctx->last_reconfig += LINKCHANGE_DELAY;
+			delay = LINKCHANGE_INT + LINKCHANGE_DELAY;
 			reschedule = true;
 		}
 		break;
@@ -1025,7 +1031,7 @@  static void netvsc_link_change(struct work_struct *w)
 	 * second, handle next reconfig event in 2 seconds.
 	 */
 	if (reschedule)
-		schedule_delayed_work(&ndev_ctx->dwork, LINKCHANGE_INT);
+		schedule_delayed_work(&ndev_ctx->dwork, delay);
 }
 
 static void netvsc_free_netdev(struct net_device *netdev)