[RFC,v2,5/4] virtio-net: send gratuitous packet when needed

Message ID 20111022054311.21798.3340.stgit@dhcp-8-146.nay.redhat.com
State New

Commit Message

Jason Wang Oct. 22, 2011, 5:43 a.m. UTC
This patch lets the virtio-net driver send gratuitous packets in
response to a new config status bit, VIRTIO_NET_S_ANNOUNCE, which is
checked on each config update interrupt. When the backend sets this
bit, the driver schedules a work item to send gratuitous packets
through NETDEV_NOTIFY_PEERS.

This feature is negotiated through bit VIRTIO_NET_F_GUEST_ANNOUNCE.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/virtio_net.c   |   31 ++++++++++++++++++++++++++++++-
 include/linux/virtio_net.h |    2 ++
 2 files changed, 32 insertions(+), 1 deletions(-)
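
For readers following the thread, the handshake described in the commit
message can be modelled in userspace roughly as below. This is only a
sketch under stated assumptions, not the driver code: struct fake_dev,
fake_update_status() and the announces counter are hypothetical names
made up for illustration, and only the two status bit values are taken
from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values taken from the patch to include/linux/virtio_net.h. */
#define VIRTIO_NET_S_LINK_UP  1
#define VIRTIO_NET_S_ANNOUNCE 2

/* Hypothetical stand-in for the device plus driver state. */
struct fake_dev {
	uint16_t config_status;  /* status field as seen in config space */
	uint16_t cached_status;  /* driver's copy (vi->status in the patch) */
	bool guest_announce;     /* feature bit was negotiated */
	unsigned int announces;  /* times the announce work was queued */
};

/* Models what the patch does on a config change notification. */
static void fake_update_status(struct fake_dev *d)
{
	uint16_t v;

	assert(d);
	v = d->config_status;

	/* Ignore unknown (future) status bits. */
	v &= VIRTIO_NET_S_LINK_UP | VIRTIO_NET_S_ANNOUNCE;
	if (d->cached_status == v)
		return;

	if (v & VIRTIO_NET_S_ANNOUNCE) {
		/* Only announce when the link is up and the feature is on. */
		if ((v & VIRTIO_NET_S_LINK_UP) && d->guest_announce)
			d->announces++;
		/* Clear the bit and write it back so the backend sees it. */
		v = (uint16_t)(v & ~VIRTIO_NET_S_ANNOUNCE);
		d->config_status = v;
	}
	d->cached_status = v;
}
```

In this model a device with LINK_UP and ANNOUNCE both set queues one
announcement and ends up with only LINK_UP in both copies of the
status; a second notification with no change does nothing.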

Comments

Rusty Russell Oct. 24, 2011, 4:24 a.m. UTC | #1
On Sat, 22 Oct 2011 13:43:11 +0800, Jason Wang <jasowang@redhat.com> wrote:
> This make let virtio-net driver can send gratituous packet by a new
> config bit - VIRTIO_NET_S_ANNOUNCE in each config update
> interrupt. When this bit is set by backend, the driver would schedule
> a workqueue to send gratituous packet through NETDEV_NOTIFY_PEERS.
> 
> This feature is negotiated through bit VIRTIO_NET_F_GUEST_ANNOUNCE.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>

This seems like a huge layering violation.  Imagine this in real
hardware, for example.

There may be a good reason why virtual devices might want this kind of
reconfiguration cheat, which is unnecessary for normal machines, but
it'd have to be spelled out clearly in the spec to justify it...

Cheers,
Rusty.
Michael S. Tsirkin Oct. 24, 2011, 5:25 a.m. UTC | #2
On Mon, Oct 24, 2011 at 02:54:59PM +1030, Rusty Russell wrote:
> On Sat, 22 Oct 2011 13:43:11 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > This make let virtio-net driver can send gratituous packet by a new
> > config bit - VIRTIO_NET_S_ANNOUNCE in each config update
> > interrupt. When this bit is set by backend, the driver would schedule
> > a workqueue to send gratituous packet through NETDEV_NOTIFY_PEERS.
> > 
> > This feature is negotiated through bit VIRTIO_NET_F_GUEST_ANNOUNCE.
> > 
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> 
> This seems like a huge layering violation.  Imagine this in real
> hardware, for example.

commits 06c4648d46d1b757d6b9591a86810be79818b60c
and 99606477a5888b0ead0284fecb13417b1da8e3af
document the need for this:

NETDEV_NOTIFY_PEERS notifier indicates that a device moved to a 
different physical link.
	and
In real hardware such notifications are only
generated when the device comes up or the address changes.

So the hypervisor could get the same behaviour by sending link up/down
events; this is just an optimization so the guest won't do
unnecessary stuff like trying to reconfigure an IP address.


Maybe LOCATION_CHANGE would be a better name?


> There may be a good reason why virtual devices might want this kind of
> reconfiguration cheat, which is unnecessary for normal machines,

I think yes: the difference from real hardware is that the guest can
change location without the link getting dropped.
FWIW, Xen seems to use this capability too.

> but
> it'd have to be spelled out clearly in the spec to justify it...
> 
> Cheers,
> Rusty.

Agree, and I'd like to see the spec too. The interface seems
to involve the guest clearing the status bit when it detects
an event?

Also - how does it interact with the link up event?
We probably don't want to schedule this when we detect
a link status change or during initialization, as
this patch seems to do? What if link goes down
while the work is running? Is that OK?
Ben Hutchings Oct. 24, 2011, 9:11 a.m. UTC | #3
On Mon, 2011-10-24 at 07:25 +0200, Michael S. Tsirkin wrote:
> On Mon, Oct 24, 2011 at 02:54:59PM +1030, Rusty Russell wrote:
> > On Sat, 22 Oct 2011 13:43:11 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > This make let virtio-net driver can send gratituous packet by a new
> > > config bit - VIRTIO_NET_S_ANNOUNCE in each config update
> > > interrupt. When this bit is set by backend, the driver would schedule
> > > a workqueue to send gratituous packet through NETDEV_NOTIFY_PEERS.
> > > 
> > > This feature is negotiated through bit VIRTIO_NET_F_GUEST_ANNOUNCE.
> > > 
> > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > 
> > This seems like a huge layering violation.  Imagine this in real
> > hardware, for example.
> 
> commits 06c4648d46d1b757d6b9591a86810be79818b60c
> and 99606477a5888b0ead0284fecb13417b1da8e3af
> document the need for this:
> 
> NETDEV_NOTIFY_PEERS notifier indicates that a device moved to a 
> different physical link.
> 	and
> In real hardware such notifications are only
> generated when the device comes up or the address changes.
> 
> So hypervisor could get the same behaviour by sending link up/down
> events, this is just an optimization so guest won't do
> unecessary stuff like try to reconfigure an IP address.
> 
> 
> Maybe LOCATION_CHANGE would be a better name?
[...]

We also use this in bonding failover, where the system location doesn't
change but a different link is used.  However, I do recognise that the
name ought to indicate what kind of change happened and not what the
expected action is.

Ben.
Jason Wang Oct. 25, 2011, 2:50 a.m. UTC | #4
On 10/24/2011 01:25 PM, Michael S. Tsirkin wrote:
> On Mon, Oct 24, 2011 at 02:54:59PM +1030, Rusty Russell wrote:
>> On Sat, 22 Oct 2011 13:43:11 +0800, Jason Wang <jasowang@redhat.com> wrote:
>>> This make let virtio-net driver can send gratituous packet by a new
>>> config bit - VIRTIO_NET_S_ANNOUNCE in each config update
>>> interrupt. When this bit is set by backend, the driver would schedule
>>> a workqueue to send gratituous packet through NETDEV_NOTIFY_PEERS.
>>>
>>> This feature is negotiated through bit VIRTIO_NET_F_GUEST_ANNOUNCE.
>>>
>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>
>> This seems like a huge layering violation.  Imagine this in real
>> hardware, for example.
> 
> commits 06c4648d46d1b757d6b9591a86810be79818b60c
> and 99606477a5888b0ead0284fecb13417b1da8e3af
> document the need for this:
> 
> NETDEV_NOTIFY_PEERS notifier indicates that a device moved to a 
> different physical link.
> 	and
> In real hardware such notifications are only
> generated when the device comes up or the address changes.
> 
> So hypervisor could get the same behaviour by sending link up/down
> events, this is just an optimization so guest won't do
> unecessary stuff like try to reconfigure an IP address.
> 
> 
> Maybe LOCATION_CHANGE would be a better name?
> 

ANNOUNCE_SELF?

> 
>> There may be a good reason why virtual devices might want this kind of
>> reconfiguration cheat, which is unnecessary for normal machines,
> 
> I think yes, the difference with real hardware is guest can change
> location without link getting dropped.
> FWIW, Xen seems to use this capability too.

So does ms netvsc.

> 
>> but
>> it'd have to be spelled out clearly in the spec to justify it...
>>
>> Cheers,
>> Rusty.
> 
> Agree, and I'd like to see the spec too. The interface seems
> to involve the guest clearing the status bit when it detects
> an event?

I will describe this in the spec. The interface needs the guest to clear
the status bit; this lets the back-end know the guest has finished the
work, as we may need to send the gratuitous packets many times.

> 
> Also - how does it interact with the link up event?
> We probably don't want to schedule this when we detect
> a link status change or during initialization, as
> this patch seems to do? What if link goes down
> while the work is running? Is that OK?
> 

Looks like there are duplicates if the guest enables arp_notify when the
VM is started, but we need to handle the case of resuming a stopped
virtual machine.

For the link down race, I don't see any real issue: the packets are
either dropped or queued.
Michael S. Tsirkin Oct. 25, 2011, 3:41 p.m. UTC | #5
On Tue, Oct 25, 2011 at 10:50:41AM +0800, Jason Wang wrote:
> On 10/24/2011 01:25 PM, Michael S. Tsirkin wrote:
> > On Mon, Oct 24, 2011 at 02:54:59PM +1030, Rusty Russell wrote:
> >> On Sat, 22 Oct 2011 13:43:11 +0800, Jason Wang <jasowang@redhat.com> wrote:
> >>> This make let virtio-net driver can send gratituous packet by a new
> >>> config bit - VIRTIO_NET_S_ANNOUNCE in each config update
> >>> interrupt. When this bit is set by backend, the driver would schedule
> >>> a workqueue to send gratituous packet through NETDEV_NOTIFY_PEERS.
> >>>
> >>> This feature is negotiated through bit VIRTIO_NET_F_GUEST_ANNOUNCE.
> >>>
> >>> Signed-off-by: Jason Wang <jasowang@redhat.com>
> >>
> >> This seems like a huge layering violation.  Imagine this in real
> >> hardware, for example.
> > 
> > commits 06c4648d46d1b757d6b9591a86810be79818b60c
> > and 99606477a5888b0ead0284fecb13417b1da8e3af
> > document the need for this:
> > 
> > NETDEV_NOTIFY_PEERS notifier indicates that a device moved to a 
> > different physical link.
> > 	and
> > In real hardware such notifications are only
> > generated when the device comes up or the address changes.
> > 
> > So hypervisor could get the same behaviour by sending link up/down
> > events, this is just an optimization so guest won't do
> > unecessary stuff like try to reconfigure an IP address.
> > 
> > 
> > Maybe LOCATION_CHANGE would be a better name?
> > 
> 
> ANNOUNCE_SELF?

It would be nice to formulate what kind of event
we are notifying the guest about.
The announce part of it is really up to the guest, isn't it?

> > 
> >> There may be a good reason why virtual devices might want this kind of
> >> reconfiguration cheat, which is unnecessary for normal machines,
> > 
> > I think yes, the difference with real hardware is guest can change
> > location without link getting dropped.
> > FWIW, Xen seems to use this capability too.
> 
> So does ms netvsc.
> 
> > 
> >> but
> >> it'd have to be spelled out clearly in the spec to justify it...
> >>
> >> Cheers,
> >> Rusty.
> > 
> > Agree, and I'd like to see the spec too. The interface seems
> > to involve the guest clearing the status bit when it detects
> > an event?
> 
> I would describe this in spec. The interface need guest to clear the
> status bit, this would let the back-end know it has finished the work as
> we may need to send the gratuitous packets many times.
> 
> > 
> > Also - how does it interact with the link up event?
> > We probably don't want to schedule this when we detect
> > a link status change or during initialization, as
> > this patch seems to do? What if link goes down
> > while the work is running? Is that OK?
> > 
> 
> Looks like there's are duplications if guest enable arp_notify vm is
> started,

How hard would it be to avoid these duplicates?

> but we need to handle the situation that resuming a stopped
> virtual machine.
> 
> For the link down race, I don't see any real issue, either dropping or
> queued.

For example, you do
        unregister_netdev(vi->dev);
        cancel_work_sync(&vi->announce);

which looks scary as announce seems to use the netdev.
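
The ordering concern can be illustrated with a small userspace model.
This is a sketch only: every fake_* name below is hypothetical, the
"workqueue" is a single pending flag, and whether the window is actually
harmful in the driver depends on details not shown in this thread. It
just contrasts cancelling the work after unregistering the device (as
posted) with cancelling it first.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the netdev and the driver's private data. */
struct fake_netdev {
	bool registered;
};

struct fake_vi {
	struct fake_netdev *dev;
	bool announce_pending;      /* announce work is queued */
	unsigned int bad_notifies;  /* work ran on an unregistered device */
};

/* Models announce_work(): it touches the netdev. */
static void fake_announce_work(struct fake_vi *vi)
{
	if (!vi->dev->registered)
		vi->bad_notifies++;
}

/* Models the workqueue getting around to running a queued item. */
static void fake_run_pending(struct fake_vi *vi)
{
	if (vi->announce_pending) {
		vi->announce_pending = false;
		fake_announce_work(vi);
	}
}

/* Models cancel_work_sync(): here a queued item is simply dropped. */
static void fake_cancel_work_sync(struct fake_vi *vi)
{
	vi->announce_pending = false;
}

/* Order used in the posted patch: unregister, then cancel. */
static void fake_remove_as_posted(struct fake_vi *vi)
{
	assert(vi && vi->dev);
	vi->dev->registered = false;
	fake_run_pending(vi);       /* queued work may fire in this window */
	fake_cancel_work_sync(vi);
}

/* Alternative order: cancel first, then unregister. */
static void fake_remove_cancel_first(struct fake_vi *vi)
{
	assert(vi && vi->dev);
	fake_cancel_work_sync(vi);
	vi->dev->registered = false;
	fake_run_pending(vi);       /* nothing left to run */
}
```

Note that in a real driver, cancelling first would only help if nothing
can requeue the work afterwards; the model does not cover that.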
Jason Wang Oct. 26, 2011, 4:49 a.m. UTC | #6
On 10/25/2011 11:41 PM, Michael S. Tsirkin wrote:
> On Tue, Oct 25, 2011 at 10:50:41AM +0800, Jason Wang wrote:
>> On 10/24/2011 01:25 PM, Michael S. Tsirkin wrote:
>>> On Mon, Oct 24, 2011 at 02:54:59PM +1030, Rusty Russell wrote:
>>>> On Sat, 22 Oct 2011 13:43:11 +0800, Jason Wang <jasowang@redhat.com> wrote:
>>>>> This make let virtio-net driver can send gratituous packet by a new
>>>>> config bit - VIRTIO_NET_S_ANNOUNCE in each config update
>>>>> interrupt. When this bit is set by backend, the driver would schedule
>>>>> a workqueue to send gratituous packet through NETDEV_NOTIFY_PEERS.
>>>>>
>>>>> This feature is negotiated through bit VIRTIO_NET_F_GUEST_ANNOUNCE.
>>>>>
>>>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>>>
>>>> This seems like a huge layering violation.  Imagine this in real
>>>> hardware, for example.
>>>
>>> commits 06c4648d46d1b757d6b9591a86810be79818b60c
>>> and 99606477a5888b0ead0284fecb13417b1da8e3af
>>> document the need for this:
>>>
>>> NETDEV_NOTIFY_PEERS notifier indicates that a device moved to a 
>>> different physical link.
>>> 	and
>>> In real hardware such notifications are only
>>> generated when the device comes up or the address changes.
>>>
>>> So hypervisor could get the same behaviour by sending link up/down
>>> events, this is just an optimization so guest won't do
>>> unecessary stuff like try to reconfigure an IP address.
>>>
>>>
>>> Maybe LOCATION_CHANGE would be a better name?
>>>
>>
>> ANNOUNCE_SELF?
> 
> It would be nice to formulate what kind of event
> are we notifying the guest about.
> The announce part of it is really up to the guest, isn't it?
> 

Right.

>>>
>>>> There may be a good reason why virtual devices might want this kind of
>>>> reconfiguration cheat, which is unnecessary for normal machines,
>>>
>>> I think yes, the difference with real hardware is guest can change
>>> location without link getting dropped.
>>> FWIW, Xen seems to use this capability too.
>>
>> So does ms netvsc.
>>
>>>
>>>> but
>>>> it'd have to be spelled out clearly in the spec to justify it...
>>>>
>>>> Cheers,
>>>> Rusty.
>>>
>>> Agree, and I'd like to see the spec too. The interface seems
>>> to involve the guest clearing the status bit when it detects
>>> an event?
>>
>> I would describe this in spec. The interface need guest to clear the
>> status bit, this would let the back-end know it has finished the work as
>> we may need to send the gratuitous packets many times.
>>
>>>
>>> Also - how does it interact with the link up event?
>>> We probably don't want to schedule this when we detect
>>> a link status change or during initialization, as
>>> this patch seems to do? What if link goes down
>>> while the work is running? Is that OK?
>>>
>>
>> Looks like there's are duplications if guest enable arp_notify vm is
>> started,
> 
> How hard would it be to avoid these duplicates?

Not hard; it could be done in the backend by distinguishing the reason:
fresh start, or cont after migration or stop.

> 
>> but we need to handle the situation that resuming a stopped
>> virtual machine.
>>
>> For the link down race, I don't see any real issue, either dropping or
>> queued.
> 
> For example, you do
>         unregister_netdev(vi->dev);
>         cancel_work_sync(&vi->announce);
> 
> which looks scary as announce seems to use the netdev.
> 

Oops, it's a bug; I will fix it. Thanks.
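
On avoiding the duplicate announcements in the backend: one way to read
"distinguishing the reason" is sketched below. This is an assumption
about how a backend might do it, not code from any hypervisor; the enum
and fake_backend_status() are hypothetical names, and only the status
bit values come from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the patch to include/linux/virtio_net.h. */
#define VIRTIO_NET_S_LINK_UP  1
#define VIRTIO_NET_S_ANNOUNCE 2

/* Hypothetical reasons for the guest starting to run. */
enum fake_run_reason {
	FAKE_FRESH_START,
	FAKE_CONT_AFTER_MIGRATION,
	FAKE_CONT_AFTER_STOP,
};

/*
 * Returns the status the backend would expose. The announce bit is
 * requested only when the guest resumes, not on a fresh start, where
 * the guest may already announce itself on link up (arp_notify).
 */
static uint16_t fake_backend_status(enum fake_run_reason why, uint16_t status)
{
	/* Sketch precondition: the caller passes status without ANNOUNCE. */
	assert((status & VIRTIO_NET_S_ANNOUNCE) == 0);

	switch (why) {
	case FAKE_CONT_AFTER_MIGRATION:
	case FAKE_CONT_AFTER_STOP:
		return (uint16_t)(status | VIRTIO_NET_S_ANNOUNCE);
	case FAKE_FRESH_START:
	default:
		return status;
	}
}
```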

Patch

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index b8225f3..1cdecf7 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -71,6 +71,9 @@  struct virtnet_info {
 	/* Work struct for refilling if we run low on memory. */
 	struct delayed_work refill;
 
+	/* Work struct for sending gratuitous packets. */
+	struct work_struct announce;
+
 	/* Chain pages by the private ptr. */
 	struct page *pages;
 
@@ -507,6 +510,13 @@  static void refill_work(struct work_struct *work)
 		schedule_delayed_work(&vi->refill, HZ/2);
 }
 
+static void announce_work(struct work_struct *work)
+{
+	struct virtnet_info *vi = container_of(work, struct virtnet_info,
+					       announce);
+	netif_notify_peers(vi->dev);
+}
+
 static int virtnet_poll(struct napi_struct *napi, int budget)
 {
 	struct virtnet_info *vi = container_of(napi, struct virtnet_info, napi);
@@ -923,11 +933,22 @@  static void virtnet_update_status(struct virtnet_info *vi)
 			      &v, sizeof(v));
 
 	/* Ignore unknown (future) status bits */
-	v &= VIRTIO_NET_S_LINK_UP;
+	v &= VIRTIO_NET_S_LINK_UP | VIRTIO_NET_S_ANNOUNCE;
 
 	if (vi->status == v)
 		return;
 
+	if (v & VIRTIO_NET_S_ANNOUNCE) {
+		if ((v & VIRTIO_NET_S_LINK_UP) &&
+		    virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_ANNOUNCE))
+			schedule_work(&vi->announce);
+		v &= ~VIRTIO_NET_S_ANNOUNCE;
+		vi->vdev->config->set(vi->vdev,
+				      offsetof(struct virtio_net_config,
+					       status),
+				      &v, sizeof(v));
+	}
+
 	vi->status = v;
 
 	if (vi->status & VIRTIO_NET_S_LINK_UP) {
@@ -937,6 +958,7 @@  static void virtnet_update_status(struct virtnet_info *vi)
 		netif_carrier_off(vi->dev);
 		netif_stop_queue(vi->dev);
 	}
+
 }
 
 static void virtnet_config_changed(struct virtio_device *vdev)
@@ -1016,6 +1038,8 @@  static int virtnet_probe(struct virtio_device *vdev)
 		goto free;
 
 	INIT_DELAYED_WORK(&vi->refill, refill_work);
+	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ANNOUNCE))
+		INIT_WORK(&vi->announce, announce_work);
 	sg_init_table(vi->rx_sg, ARRAY_SIZE(vi->rx_sg));
 	sg_init_table(vi->tx_sg, ARRAY_SIZE(vi->tx_sg));
 
@@ -1077,6 +1101,8 @@  static int virtnet_probe(struct virtio_device *vdev)
 unregister:
 	unregister_netdev(dev);
 	cancel_delayed_work_sync(&vi->refill);
+	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_ANNOUNCE))
+		cancel_work_sync(&vi->announce);
 free_vqs:
 	vdev->config->del_vqs(vdev);
 free_stats:
@@ -1118,6 +1144,8 @@  static void __devexit virtnet_remove(struct virtio_device *vdev)
 
 	unregister_netdev(vi->dev);
 	cancel_delayed_work_sync(&vi->refill);
+	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_ANNOUNCE))
+		cancel_work_sync(&vi->announce);
 
 	/* Free unused buffers in both send and recv, if any. */
 	free_unused_bufs(vi);
@@ -1144,6 +1172,7 @@  static unsigned int features[] = {
 	VIRTIO_NET_F_GUEST_ECN, VIRTIO_NET_F_GUEST_UFO,
 	VIRTIO_NET_F_MRG_RXBUF, VIRTIO_NET_F_STATUS, VIRTIO_NET_F_CTRL_VQ,
 	VIRTIO_NET_F_CTRL_RX, VIRTIO_NET_F_CTRL_VLAN,
+	VIRTIO_NET_F_GUEST_ANNOUNCE,
 };
 
 static struct virtio_driver virtio_net_driver = {
diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
index 970d5a2..44a38d6 100644
--- a/include/linux/virtio_net.h
+++ b/include/linux/virtio_net.h
@@ -49,8 +49,10 @@ 
 #define VIRTIO_NET_F_CTRL_RX	18	/* Control channel RX mode support */
 #define VIRTIO_NET_F_CTRL_VLAN	19	/* Control channel VLAN filtering */
 #define VIRTIO_NET_F_CTRL_RX_EXTRA 20	/* Extra RX mode control support */
+#define VIRTIO_NET_F_GUEST_ANNOUNCE 21  /* Guest can send gratuitous packets */
 
 #define VIRTIO_NET_S_LINK_UP	1	/* Link is up */
+#define VIRTIO_NET_S_ANNOUNCE   2       /* Announcement is needed */
 
 struct virtio_net_config {
 	/* The config defining mac address (if VIRTIO_NET_F_MAC) */