diff mbox

virtio-net: announce self by guest

Message ID 1400138207-16202-1-git-send-email-jasowang@redhat.com
State New
Headers show

Commit Message

Jason Wang May 15, 2014, 7:16 a.m. UTC
It's hard to track all mac addresses and their configurations (e.g
vlan or ipv6) in qemu. Without those informations, it's impossible to
build proper garp packet after migration. The only possible solution
to this is let guest (who knew all configurations) to do this.

So, this patch introduces a new readonly config status bit of virtio-net,
VIRTIO_NET_S_ANNOUNCE which is used to notify guest to announce
presence of its link through config update interrupt.When guest has
done the announcement, it should ack the notification through
VIRTIO_NET_CTRL_ANNOUNCE_ACK cmd. This feature is negotiated by a new
feature bit VIRTIO_NET_F_ANNOUNCE (which has already been supported by
Linux guest).

During load, a counter of announcing rounds were set so that the after
the vm is running it can trigger rounds of config interrupts to notify
the guest to build and send the correct garps.

Tested with ping to guest with vlan during migration. Without the
patch, lots of the packets were lost after migration. With the patch,
could not notice packet loss after migration.

Reference:
RFC v2: https://lists.gnu.org/archive/html/qemu-devel/2014-04/msg01750.html
RFC v1: https://lists.gnu.org/archive/html/qemu-devel/2014-03/msg02648.html
V7:     https://lists.gnu.org/archive/html/qemu-devel/2013-03/msg01127.html

Changes from RFC v2:
- use QEMU_CLOCK_VIRTUAL instead of QEMU_CLOCK_REALTIME
- compat self announce for 2.0 machine type

Changes from RFC v1:
- clean VIRTIO_NET_S_ANNOUNCE bit during reset
- free announce timer during clean
- make announce work for non-vhost case

Changes from V7:
- Instead of introducing a global method for each kind of nic, this
  version limits the changes to virtio-net itself.

Cc: Liuyongan <liuyongan@huawei.com>
Cc: Amos Kong <akong@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/net/virtio-net.c            |   48 ++++++++++++++++++++++++++++++++++++++++
 include/hw/i386/pc.h           |    5 ++++
 include/hw/virtio/virtio-net.h |   16 +++++++++++++
 3 files changed, 69 insertions(+), 0 deletions(-)

Comments

Michael S. Tsirkin May 15, 2014, 8:28 a.m. UTC | #1
Thanks, looks good.
Some minor comments below,

On Thu, May 15, 2014 at 03:16:47PM +0800, Jason Wang wrote:
> It's hard to track all mac addresses and their configurations (e.g
> vlan or ipv6) in qemu. Without those informations, it's impossible to

s/those informations/this information/

> build proper garp packet after migration. The only possible solution
> to this is let guest (who knew all configurations) to do this.

s/knew/knows/

> 
> So, this patch introduces a new readonly config status bit of virtio-net,
> VIRTIO_NET_S_ANNOUNCE which is used to notify guest to announce
> presence of its link through config update interrupt.When guest has
> done the announcement, it should ack the notification through
> VIRTIO_NET_CTRL_ANNOUNCE_ACK cmd. This feature is negotiated by a new
> feature bit VIRTIO_NET_F_ANNOUNCE (which has already been supported by
> Linux guest).
> 
> During load, a counter of announcing rounds were set so that the after

s/were/is/
s/the after/after/

> the vm is running it can trigger rounds of config interrupts to notify
> the guest to build and send the correct garps.
> 
> Tested with ping to guest with vlan during migration. Without the
> patch, lots of the packets were lost after migration. With the patch,
> could not notice packet loss after migration.

below changelog should go after ---, until the ack list.

> 
> Reference:
> RFC v2: https://lists.gnu.org/archive/html/qemu-devel/2014-04/msg01750.html
> RFC v1: https://lists.gnu.org/archive/html/qemu-devel/2014-03/msg02648.html
> V7:     https://lists.gnu.org/archive/html/qemu-devel/2013-03/msg01127.html
> 
> Changes from RFC v2:
> - use QEMU_CLOCK_VIRTUAL instead of QEMU_CLOCK_REALTIME
> - compat self announce for 2.0 machine type
> 
> Changes from RFC v1:
> - clean VIRTIO_NET_S_ANNOUNCE bit during reset
> - free announce timer during clean
> - make announce work for non-vhost case
> 
> Changes from V7:
> - Instead of introducing a global method for each kind of nic, this
>   version limits the changes to virtio-net itself.
> 
> Cc: Liuyongan <liuyongan@huawei.com>
> Cc: Amos Kong <akong@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  hw/net/virtio-net.c            |   48 ++++++++++++++++++++++++++++++++++++++++
>  include/hw/i386/pc.h           |    5 ++++
>  include/hw/virtio/virtio-net.h |   16 +++++++++++++
>  3 files changed, 69 insertions(+), 0 deletions(-)
> 
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 940a7cf..98d59e9 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -25,6 +25,7 @@
>  #include "monitor/monitor.h"
>  
>  #define VIRTIO_NET_VM_VERSION    11
> +#define VIRTIO_NET_ANNOUNCE_ROUNDS    3
>  
>  #define MAC_TABLE_ENTRIES    64
>  #define MAX_VLAN    (1 << 12)   /* Per 802.1Q definition */

I would make it  5 to be consistent with SELF_ANNOUNCE_ROUNDS
in savevm.c

in fact, let's move SELF_ANNOUNCE_ROUNDS to include/migration/vmstate.h
and reuse it.

> @@ -99,6 +100,25 @@ static bool virtio_net_started(VirtIONet *n, uint8_t status)
>          (n->status & VIRTIO_NET_S_LINK_UP) && vdev->vm_running;
>  }
>  
> +static void virtio_net_announce_timer(void *opaque)
> +{
> +    VirtIONet *n = opaque;
> +    VirtIODevice *vdev = VIRTIO_DEVICE(n);
> +
> +    if (n->announce &&

I would make it > 0 here, just in case it becomes negative as a result
of some bug.

> +        virtio_net_started(n, vdev->status) &&
> +        vdev->guest_features & (0x1 << VIRTIO_NET_F_GUEST_ANNOUNCE) &&
> +        vdev->guest_features & (0x1 << VIRTIO_NET_F_CTRL_VQ)) {
> +
> +        n->announce--;
> +        n->status |= VIRTIO_NET_S_ANNOUNCE;
> +        virtio_notify_config(vdev);
> +    } else {
> +        timer_del(n->announce_timer);

why do this here?

> +        n->announce = 0;

why is this here?

> +    }
> +}
> +
>  static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
>  {
>      VirtIODevice *vdev = VIRTIO_DEVICE(n);
> @@ -147,6 +167,8 @@ static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t status)
>  
>      virtio_net_vhost_status(n, status);
>  
> +    virtio_net_announce_timer(n);
> +

why do this here? why not right after we set announce counter?

>      for (i = 0; i < n->max_queues; i++) {
>          q = &n->vqs[i];
>  
> @@ -322,6 +344,9 @@ static void virtio_net_reset(VirtIODevice *vdev)
>      n->nobcast = 0;
>      /* multiqueue is disabled by default */
>      n->curr_queues = 1;
> +    timer_del(n->announce_timer);
> +    n->announce = 0;
> +    n->status &= ~VIRTIO_NET_S_ANNOUNCE;
>  
>      /* Flush any MAC and VLAN filter table state */
>      n->mac_table.in_use = 0;
> @@ -731,6 +756,22 @@ static int virtio_net_handle_vlan_table(VirtIONet *n, uint8_t cmd,
>      return VIRTIO_NET_OK;
>  }
>  
> +static int virtio_net_handle_announce(VirtIONet *n, uint8_t cmd,
> +                                      struct iovec *iov, unsigned int iov_cnt)
> +{
> +    if (cmd == VIRTIO_NET_CTRL_ANNOUNCE_ACK) {
> +        n->status &= ~VIRTIO_NET_S_ANNOUNCE;
> +        if (n->announce) {

I would make it > 0 here, just in case it becomes negative as a result
of some bug.

> +            timer_mod(n->announce_timer,
> +                      qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 50 +
> +                      (VIRTIO_NET_ANNOUNCE_ROUNDS - n->announce - 1) * 100);

savevm.c has this code, and a comment:
        /* delay 50ms, 150ms, 250ms, ... */
	50 + (SELF_ANNOUNCE_ROUNDS - count - 1) * 100

how about an API in include/migration/vmstate.h

static inline
int64_t self_announce_delay(int round)
{
	assert(round < SELF_ANNOUNCE_ROUNDS && round > 0);
        /* delay 50ms, 150ms, 250ms, ... */
	return 50 + (SELF_ANNOUNCE_ROUNDS - round - 1) * 100;
}

or something to this end.


> +        }
> +        return VIRTIO_NET_OK;
> +    } else {
> +        return VIRTIO_NET_ERR;
> +    }
> +}
> +
>  static int virtio_net_handle_mq(VirtIONet *n, uint8_t cmd,
>                                  struct iovec *iov, unsigned int iov_cnt)
>  {
> @@ -794,6 +835,8 @@ static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
>              status = virtio_net_handle_mac(n, ctrl.cmd, iov, iov_cnt);
>          } else if (ctrl.class == VIRTIO_NET_CTRL_VLAN) {
>              status = virtio_net_handle_vlan_table(n, ctrl.cmd, iov, iov_cnt);
> +        } else if (ctrl.class == VIRTIO_NET_CTRL_ANNOUNCE) {
> +            status = virtio_net_handle_announce(n, ctrl.cmd, iov, iov_cnt);
>          } else if (ctrl.class == VIRTIO_NET_CTRL_MQ) {
>              status = virtio_net_handle_mq(n, ctrl.cmd, iov, iov_cnt);
>          } else if (ctrl.class == VIRTIO_NET_CTRL_GUEST_OFFLOADS) {
> @@ -1451,6 +1494,7 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
>          qemu_get_subqueue(n->nic, i)->link_down = link_down;
>      }
>  
> +    n->announce = VIRTIO_NET_ANNOUNCE_ROUNDS;

Well if virtio_net_handle_announce runs now it will start timer
in the past, right?
This seems wrong.

>      return 0;
>  }
>  
> @@ -1562,6 +1606,8 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp)
>      qemu_macaddr_default_if_unset(&n->nic_conf.macaddr);
>      memcpy(&n->mac[0], &n->nic_conf.macaddr, sizeof(n->mac));
>      n->status = VIRTIO_NET_S_LINK_UP;
> +    n->announce_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
> +                                     virtio_net_announce_timer, n);
>  
>      if (n->netclient_type) {
>          /*
> @@ -1642,6 +1688,8 @@ static void virtio_net_device_unrealize(DeviceState *dev, Error **errp)
>          }
>      }
>  
> +    timer_del(n->announce_timer);
> +    timer_free(n->announce_timer);
>      g_free(n->vqs);
>      qemu_del_nic(n->nic);
>      virtio_cleanup(vdev);
> diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
> index 32a7687..f93b427 100644
> --- a/include/hw/i386/pc.h
> +++ b/include/hw/i386/pc.h
> @@ -271,6 +271,11 @@ bool e820_get_entry(int, uint32_t, uint64_t *, uint64_t *);
>              .driver   = "apic",\
>              .property = "version",\
>              .value    = stringify(0x11),\
> +        },\
> +        {\
> +            .driver   = "virtio-net-pci",\
> +            .property = "guest_announce",\
> +            .value    = "off",\
>          }
>  
>  #define PC_COMPAT_1_7 \
> diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
> index 4b32440..ca1a9c5 100644
> --- a/include/hw/virtio/virtio-net.h
> +++ b/include/hw/virtio/virtio-net.h
> @@ -49,12 +49,14 @@
>  #define VIRTIO_NET_F_CTRL_RX    18      /* Control channel RX mode support */
>  #define VIRTIO_NET_F_CTRL_VLAN  19      /* Control channel VLAN filtering */
>  #define VIRTIO_NET_F_CTRL_RX_EXTRA 20   /* Extra RX mode control support */
> +#define VIRTIO_NET_F_GUEST_ANNOUNCE 21  /* Guest can announce itself */
>  #define VIRTIO_NET_F_MQ         22      /* Device supports Receive Flow
>                                           * Steering */
>  
>  #define VIRTIO_NET_F_CTRL_MAC_ADDR   23 /* Set MAC address */
>  
>  #define VIRTIO_NET_S_LINK_UP    1       /* Link is up */
> +#define VIRTIO_NET_S_ANNOUNCE   2       /* Announcement is needed */
>  
>  #define TX_TIMER_INTERVAL 150000 /* 150 us */
>  
> @@ -193,6 +195,8 @@ typedef struct VirtIONet {
>      char *netclient_name;
>      char *netclient_type;
>      uint64_t curr_guest_offloads;
> +    QEMUTimer *announce_timer;
> +    int announce;


rename announce_counter pls.

>  } VirtIONet;
>  
>  #define VIRTIO_NET_CTRL_MAC    1
> @@ -213,6 +217,17 @@ typedef struct VirtIONet {
>   #define VIRTIO_NET_CTRL_VLAN_DEL             1
>  
>  /*
> + * Control link announce acknowledgement
> + *

bunch of typos and vagueness below

> + * The command VIRTIO_NET_CTRL_ANNOUNCE_ACK is used to indicate that
> + * driver has recevied the notification and device would clear the
> + * VIRTIO_NET_S_ANNOUNCE bit in the status filed after it received
> + * this command.

fixed version:

VIRTIO_NET_S_ANNOUNCE bit in the status field requests link
announcement from guest driver. The driver is notified by config space change
interrupt.  The command VIRTIO_NET_CTRL_ANNOUNCE_ACK is used to indicate
that the driver has received the notification. It makes the device clear the
bit VIRTIO_NET_S_ANNOUNCE in the status field.


> + */
> +#define VIRTIO_NET_CTRL_ANNOUNCE       3
> + #define VIRTIO_NET_CTRL_ANNOUNCE_ACK         0
> +
> +/*
>   * Control Multiqueue
>   *
>   * The command VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET
> @@ -251,6 +266,7 @@ struct virtio_net_ctrl_mq {
>          DEFINE_PROP_BIT("guest_tso6", _state, _field, VIRTIO_NET_F_GUEST_TSO6, true), \
>          DEFINE_PROP_BIT("guest_ecn", _state, _field, VIRTIO_NET_F_GUEST_ECN, true), \
>          DEFINE_PROP_BIT("guest_ufo", _state, _field, VIRTIO_NET_F_GUEST_UFO, true), \
> +        DEFINE_PROP_BIT("guest_announce", _state, _field, VIRTIO_NET_F_GUEST_ANNOUNCE, true), \
>          DEFINE_PROP_BIT("host_tso4", _state, _field, VIRTIO_NET_F_HOST_TSO4, true), \
>          DEFINE_PROP_BIT("host_tso6", _state, _field, VIRTIO_NET_F_HOST_TSO6, true), \
>          DEFINE_PROP_BIT("host_ecn", _state, _field, VIRTIO_NET_F_HOST_ECN, true), \
> -- 
> 1.7.1
Jason Wang May 15, 2014, 9:22 a.m. UTC | #2
On 05/15/2014 04:28 PM, Michael S. Tsirkin wrote:
> Thanks, looks good.
> Some minor comments below,
>
> On Thu, May 15, 2014 at 03:16:47PM +0800, Jason Wang wrote:
>> It's hard to track all mac addresses and their configurations (e.g
>> vlan or ipv6) in qemu. Without those informations, it's impossible to
> s/those informations/this information/
>
>> build proper garp packet after migration. The only possible solution
>> to this is let guest (who knew all configurations) to do this.
> s/knew/knows/
>
>> So, this patch introduces a new readonly config status bit of virtio-net,
>> VIRTIO_NET_S_ANNOUNCE which is used to notify guest to announce
>> presence of its link through config update interrupt.When guest has
>> done the announcement, it should ack the notification through
>> VIRTIO_NET_CTRL_ANNOUNCE_ACK cmd. This feature is negotiated by a new
>> feature bit VIRTIO_NET_F_ANNOUNCE (which has already been supported by
>> Linux guest).
>>
>> During load, a counter of announcing rounds were set so that the after
> s/were/is/
> s/the after/after/

Will correct those typos.
>> the vm is running it can trigger rounds of config interrupts to notify
>> the guest to build and send the correct garps.
>>
>> Tested with ping to guest with vlan during migration. Without the
>> patch, lots of the packets were lost after migration. With the patch,
>> could not notice packet loss after migration.
> below changelog should go after ---, until the ack list.
>

Ok.
>> Reference:
>> RFC v2: https://lists.gnu.org/archive/html/qemu-devel/2014-04/msg01750.html
>> RFC v1: https://lists.gnu.org/archive/html/qemu-devel/2014-03/msg02648.html
>> V7:     https://lists.gnu.org/archive/html/qemu-devel/2013-03/msg01127.html
>>
>> Changes from RFC v2:
>> - use QEMU_CLOCK_VIRTUAL instead of QEMU_CLOCK_REALTIME
>> - compat self announce for 2.0 machine type
>>
>> Changes from RFC v1:
>> - clean VIRTIO_NET_S_ANNOUNCE bit during reset
>> - free announce timer during clean
>> - make announce work for non-vhost case
>>
>> Changes from V7:
>> - Instead of introducing a global method for each kind of nic, this
>>   version limits the changes to virtio-net itself.
>>
>> Cc: Liuyongan <liuyongan@huawei.com>
>> Cc: Amos Kong <akong@redhat.com>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>>  hw/net/virtio-net.c            |   48 ++++++++++++++++++++++++++++++++++++++++
>>  include/hw/i386/pc.h           |    5 ++++
>>  include/hw/virtio/virtio-net.h |   16 +++++++++++++
>>  3 files changed, 69 insertions(+), 0 deletions(-)
>>
>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>> index 940a7cf..98d59e9 100644
>> --- a/hw/net/virtio-net.c
>> +++ b/hw/net/virtio-net.c
>> @@ -25,6 +25,7 @@
>>  #include "monitor/monitor.h"
>>  
>>  #define VIRTIO_NET_VM_VERSION    11
>> +#define VIRTIO_NET_ANNOUNCE_ROUNDS    3
>>  
>>  #define MAC_TABLE_ENTRIES    64
>>  #define MAX_VLAN    (1 << 12)   /* Per 802.1Q definition */
> I would make it  5 to be consistent with SELF_ANNOUNCE_ROUNDS
> in savevm.c
>
> in fact, let's move SELF_ANNOUNCE_ROUNDS to include/migration/vmstate.h
> and reuse it.

Ok.
>> @@ -99,6 +100,25 @@ static bool virtio_net_started(VirtIONet *n, uint8_t status)
>>          (n->status & VIRTIO_NET_S_LINK_UP) && vdev->vm_running;
>>  }
>>  
>> +static void virtio_net_announce_timer(void *opaque)
>> +{
>> +    VirtIONet *n = opaque;
>> +    VirtIODevice *vdev = VIRTIO_DEVICE(n);
>> +
>> +    if (n->announce &&
> I would make it > 0 here, just in case it becomes negative as a result
> of some bug.

Sure.
>> +        virtio_net_started(n, vdev->status) &&
>> +        vdev->guest_features & (0x1 << VIRTIO_NET_F_GUEST_ANNOUNCE) &&
>> +        vdev->guest_features & (0x1 << VIRTIO_NET_F_CTRL_VQ)) {
>> +
>> +        n->announce--;
>> +        n->status |= VIRTIO_NET_S_ANNOUNCE;
>> +        virtio_notify_config(vdev);
>> +    } else {
>> +        timer_del(n->announce_timer);
> why do this here?
>
>> +        n->announce = 0;
> why is this here?
>

If guest shuts down the device, there's no need to do the announcing.
>> +    }
>> +}
>> +
>>  static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
>>  {
>>      VirtIODevice *vdev = VIRTIO_DEVICE(n);
>> @@ -147,6 +167,8 @@ static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t status)
>>  
>>      virtio_net_vhost_status(n, status);
>>  
>> +    virtio_net_announce_timer(n);
>> +
> why do this here? why not right after we set announce counter?

The reasons are:

- The counters were set in load, but the device is not running so we
could not inject the interrupt at that time.
- We can stop the progress when guest is shutting down the device.
>
>>      for (i = 0; i < n->max_queues; i++) {
>>          q = &n->vqs[i];
>>  
>> @@ -322,6 +344,9 @@ static void virtio_net_reset(VirtIODevice *vdev)
>>      n->nobcast = 0;
>>      /* multiqueue is disabled by default */
>>      n->curr_queues = 1;
>> +    timer_del(n->announce_timer);
>> +    n->announce = 0;
>> +    n->status &= ~VIRTIO_NET_S_ANNOUNCE;
>>  
>>      /* Flush any MAC and VLAN filter table state */
>>      n->mac_table.in_use = 0;
>> @@ -731,6 +756,22 @@ static int virtio_net_handle_vlan_table(VirtIONet *n, uint8_t cmd,
>>      return VIRTIO_NET_OK;
>>  }
>>  
>> +static int virtio_net_handle_announce(VirtIONet *n, uint8_t cmd,
>> +                                      struct iovec *iov, unsigned int iov_cnt)
>> +{
>> +    if (cmd == VIRTIO_NET_CTRL_ANNOUNCE_ACK) {
>> +        n->status &= ~VIRTIO_NET_S_ANNOUNCE;
>> +        if (n->announce) {
> I would make it > 0 here, just in case it becomes negative as a result
> of some bug.

Ok.
>> +            timer_mod(n->announce_timer,
>> +                      qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 50 +
>> +                      (VIRTIO_NET_ANNOUNCE_ROUNDS - n->announce - 1) * 100);
> savevm.c has this code, and a comment:
>         /* delay 50ms, 150ms, 250ms, ... */
> 	50 + (SELF_ANNOUNCE_ROUNDS - count - 1) * 100
>
> how about an API in include/migration/vmstate.h
>
> static inline
> int64_t self_announce_delay(int round)
> {
> 	assert(round < SELF_ANNOUNCE_ROUNDS && round > 0);
>         /* delay 50ms, 150ms, 250ms, ... */
> 	return 50 + (SELF_ANNOUNCE_ROUNDS - round - 1) * 100;
> }
>
> or something to this end.
>

Good idea, will do this.
>> +        }
>> +        return VIRTIO_NET_OK;
>> +    } else {
>> +        return VIRTIO_NET_ERR;
>> +    }
>> +}
>> +
>>  static int virtio_net_handle_mq(VirtIONet *n, uint8_t cmd,
>>                                  struct iovec *iov, unsigned int iov_cnt)
>>  {
>> @@ -794,6 +835,8 @@ static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
>>              status = virtio_net_handle_mac(n, ctrl.cmd, iov, iov_cnt);
>>          } else if (ctrl.class == VIRTIO_NET_CTRL_VLAN) {
>>              status = virtio_net_handle_vlan_table(n, ctrl.cmd, iov, iov_cnt);
>> +        } else if (ctrl.class == VIRTIO_NET_CTRL_ANNOUNCE) {
>> +            status = virtio_net_handle_announce(n, ctrl.cmd, iov, iov_cnt);
>>          } else if (ctrl.class == VIRTIO_NET_CTRL_MQ) {
>>              status = virtio_net_handle_mq(n, ctrl.cmd, iov, iov_cnt);
>>          } else if (ctrl.class == VIRTIO_NET_CTRL_GUEST_OFFLOADS) {
>> @@ -1451,6 +1494,7 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
>>          qemu_get_subqueue(n->nic, i)->link_down = link_down;
>>      }
>>  
>> +    n->announce = VIRTIO_NET_ANNOUNCE_ROUNDS;
> Well if virtio_net_handle_announce runs now it will start timer
> in the past, right?
> This seems wrong.

Not sure I get the case. When in virtio_net_load() the vm is not even
running so looks like virtio_net_handle_announce() could not run in the
same time.
>
>>      return 0;
>>  }
>>  
>> @@ -1562,6 +1606,8 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp)
>>      qemu_macaddr_default_if_unset(&n->nic_conf.macaddr);
>>      memcpy(&n->mac[0], &n->nic_conf.macaddr, sizeof(n->mac));
>>      n->status = VIRTIO_NET_S_LINK_UP;
>> +    n->announce_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
>> +                                     virtio_net_announce_timer, n);
>>  
>>      if (n->netclient_type) {
>>          /*
>> @@ -1642,6 +1688,8 @@ static void virtio_net_device_unrealize(DeviceState *dev, Error **errp)
>>          }
>>      }
>>  
>> +    timer_del(n->announce_timer);
>> +    timer_free(n->announce_timer);
>>      g_free(n->vqs);
>>      qemu_del_nic(n->nic);
>>      virtio_cleanup(vdev);
>> diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
>> index 32a7687..f93b427 100644
>> --- a/include/hw/i386/pc.h
>> +++ b/include/hw/i386/pc.h
>> @@ -271,6 +271,11 @@ bool e820_get_entry(int, uint32_t, uint64_t *, uint64_t *);
>>              .driver   = "apic",\
>>              .property = "version",\
>>              .value    = stringify(0x11),\
>> +        },\
>> +        {\
>> +            .driver   = "virtio-net-pci",\
>> +            .property = "guest_announce",\
>> +            .value    = "off",\
>>          }
>>  
>>  #define PC_COMPAT_1_7 \
>> diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
>> index 4b32440..ca1a9c5 100644
>> --- a/include/hw/virtio/virtio-net.h
>> +++ b/include/hw/virtio/virtio-net.h
>> @@ -49,12 +49,14 @@
>>  #define VIRTIO_NET_F_CTRL_RX    18      /* Control channel RX mode support */
>>  #define VIRTIO_NET_F_CTRL_VLAN  19      /* Control channel VLAN filtering */
>>  #define VIRTIO_NET_F_CTRL_RX_EXTRA 20   /* Extra RX mode control support */
>> +#define VIRTIO_NET_F_GUEST_ANNOUNCE 21  /* Guest can announce itself */
>>  #define VIRTIO_NET_F_MQ         22      /* Device supports Receive Flow
>>                                           * Steering */
>>  
>>  #define VIRTIO_NET_F_CTRL_MAC_ADDR   23 /* Set MAC address */
>>  
>>  #define VIRTIO_NET_S_LINK_UP    1       /* Link is up */
>> +#define VIRTIO_NET_S_ANNOUNCE   2       /* Announcement is needed */
>>  
>>  #define TX_TIMER_INTERVAL 150000 /* 150 us */
>>  
>> @@ -193,6 +195,8 @@ typedef struct VirtIONet {
>>      char *netclient_name;
>>      char *netclient_type;
>>      uint64_t curr_guest_offloads;
>> +    QEMUTimer *announce_timer;
>> +    int announce;
>
> rename announce_counter pls.
>

Ok.
>>  } VirtIONet;
>>  
>>  #define VIRTIO_NET_CTRL_MAC    1
>> @@ -213,6 +217,17 @@ typedef struct VirtIONet {
>>   #define VIRTIO_NET_CTRL_VLAN_DEL             1
>>  
>>  /*
>> + * Control link announce acknowledgement
>> + *
> bunch of typos and vagueness below
>
>> + * The command VIRTIO_NET_CTRL_ANNOUNCE_ACK is used to indicate that
>> + * driver has recevied the notification and device would clear the
>> + * VIRTIO_NET_S_ANNOUNCE bit in the status filed after it received
>> + * this command.
> fixed version:
>
> VIRTIO_NET_S_ANNOUNCE bit in the status field requests link
> announcement from guest driver. The driver is notified by config space change
> interrupt.  The command VIRTIO_NET_CTRL_ANNOUNCE_ACK is used to indicate
> that the driver has received the notification. It makes the device clear the
> bit VIRTIO_NET_S_ANNOUNCE in the status field.
>

Sorry, look like I just copied those from a very old RFC. Will correct
this. Thanks
>> + */
>> +#define VIRTIO_NET_CTRL_ANNOUNCE       3
>> + #define VIRTIO_NET_CTRL_ANNOUNCE_ACK         0
>> +
>> +/*
>>   * Control Multiqueue
>>   *
>>   * The command VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET
>> @@ -251,6 +266,7 @@ struct virtio_net_ctrl_mq {
>>          DEFINE_PROP_BIT("guest_tso6", _state, _field, VIRTIO_NET_F_GUEST_TSO6, true), \
>>          DEFINE_PROP_BIT("guest_ecn", _state, _field, VIRTIO_NET_F_GUEST_ECN, true), \
>>          DEFINE_PROP_BIT("guest_ufo", _state, _field, VIRTIO_NET_F_GUEST_UFO, true), \
>> +        DEFINE_PROP_BIT("guest_announce", _state, _field, VIRTIO_NET_F_GUEST_ANNOUNCE, true), \
>>          DEFINE_PROP_BIT("host_tso4", _state, _field, VIRTIO_NET_F_HOST_TSO4, true), \
>>          DEFINE_PROP_BIT("host_tso6", _state, _field, VIRTIO_NET_F_HOST_TSO6, true), \
>>          DEFINE_PROP_BIT("host_ecn", _state, _field, VIRTIO_NET_F_HOST_ECN, true), \
>> -- 
>> 1.7.1
Michael S. Tsirkin May 15, 2014, 9:45 a.m. UTC | #3
On Thu, May 15, 2014 at 05:22:28PM +0800, Jason Wang wrote:
> On 05/15/2014 04:28 PM, Michael S. Tsirkin wrote:
> > Thanks, looks good.
> > Some minor comments below,
> >
> > On Thu, May 15, 2014 at 03:16:47PM +0800, Jason Wang wrote:
> >> It's hard to track all mac addresses and their configurations (e.g
> >> vlan or ipv6) in qemu. Without those informations, it's impossible to
> > s/those informations/this information/
> >
> >> build proper garp packet after migration. The only possible solution
> >> to this is let guest (who knew all configurations) to do this.
> > s/knew/knows/
> >
> >> So, this patch introduces a new readonly config status bit of virtio-net,
> >> VIRTIO_NET_S_ANNOUNCE which is used to notify guest to announce
> >> presence of its link through config update interrupt.When guest has
> >> done the announcement, it should ack the notification through
> >> VIRTIO_NET_CTRL_ANNOUNCE_ACK cmd. This feature is negotiated by a new
> >> feature bit VIRTIO_NET_F_ANNOUNCE (which has already been supported by
> >> Linux guest).
> >>
> >> During load, a counter of announcing rounds were set so that the after
> > s/were/is/
> > s/the after/after/
> 
> Will correct those typos.
> >> the vm is running it can trigger rounds of config interrupts to notify
> >> the guest to build and send the correct garps.
> >>
> >> Tested with ping to guest with vlan during migration. Without the
> >> patch, lots of the packets were lost after migration. With the patch,
> >> could not notice packet loss after migration.
> > below changelog should go after ---, until the ack list.
> >
> 
> Ok.
> >> Reference:
> >> RFC v2: https://lists.gnu.org/archive/html/qemu-devel/2014-04/msg01750.html
> >> RFC v1: https://lists.gnu.org/archive/html/qemu-devel/2014-03/msg02648.html
> >> V7:     https://lists.gnu.org/archive/html/qemu-devel/2013-03/msg01127.html
> >>
> >> Changes from RFC v2:
> >> - use QEMU_CLOCK_VIRTUAL instead of QEMU_CLOCK_REALTIME
> >> - compat self announce for 2.0 machine type
> >>
> >> Changes from RFC v1:
> >> - clean VIRTIO_NET_S_ANNOUNCE bit during reset
> >> - free announce timer during clean
> >> - make announce work for non-vhost case
> >>
> >> Changes from V7:
> >> - Instead of introducing a global method for each kind of nic, this
> >>   version limits the changes to virtio-net itself.
> >>
> >> Cc: Liuyongan <liuyongan@huawei.com>
> >> Cc: Amos Kong <akong@redhat.com>
> >> Signed-off-by: Jason Wang <jasowang@redhat.com>
> >> ---
> >>  hw/net/virtio-net.c            |   48 ++++++++++++++++++++++++++++++++++++++++
> >>  include/hw/i386/pc.h           |    5 ++++
> >>  include/hw/virtio/virtio-net.h |   16 +++++++++++++
> >>  3 files changed, 69 insertions(+), 0 deletions(-)
> >>
> >> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> >> index 940a7cf..98d59e9 100644
> >> --- a/hw/net/virtio-net.c
> >> +++ b/hw/net/virtio-net.c
> >> @@ -25,6 +25,7 @@
> >>  #include "monitor/monitor.h"
> >>  
> >>  #define VIRTIO_NET_VM_VERSION    11
> >> +#define VIRTIO_NET_ANNOUNCE_ROUNDS    3
> >>  
> >>  #define MAC_TABLE_ENTRIES    64
> >>  #define MAX_VLAN    (1 << 12)   /* Per 802.1Q definition */
> > I would make it  5 to be consistent with SELF_ANNOUNCE_ROUNDS
> > in savevm.c
> >
> > in fact, let's move SELF_ANNOUNCE_ROUNDS to include/migration/vmstate.h
> > and reuse it.
> 
> Ok.
> >> @@ -99,6 +100,25 @@ static bool virtio_net_started(VirtIONet *n, uint8_t status)
> >>          (n->status & VIRTIO_NET_S_LINK_UP) && vdev->vm_running;
> >>  }
> >>  
> >> +static void virtio_net_announce_timer(void *opaque)
> >> +{
> >> +    VirtIONet *n = opaque;
> >> +    VirtIODevice *vdev = VIRTIO_DEVICE(n);
> >> +
> >> +    if (n->announce &&
> > I would make it > 0 here, just in case it becomes negative as a result
> > of some bug.
> 
> Sure.
> >> +        virtio_net_started(n, vdev->status) &&
> >> +        vdev->guest_features & (0x1 << VIRTIO_NET_F_GUEST_ANNOUNCE) &&
> >> +        vdev->guest_features & (0x1 << VIRTIO_NET_F_CTRL_VQ)) {
> >> +
> >> +        n->announce--;
> >> +        n->status |= VIRTIO_NET_S_ANNOUNCE;
> >> +        virtio_notify_config(vdev);
> >> +    } else {
> >> +        timer_del(n->announce_timer);
> > why do this here?
> >
> >> +        n->announce = 0;
> > why is this here?
> >
> 
> If guest shuts down the device, there's no need to do the announcing.

It's still weird.
Either announce is 0 and then timer was not running
or it's > 0 and then this won't trigger.

> >> +    }
> >> +}
> >> +
> >>  static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
> >>  {
> >>      VirtIODevice *vdev = VIRTIO_DEVICE(n);
> >> @@ -147,6 +167,8 @@ static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t status)
> >>  
> >>      virtio_net_vhost_status(n, status);
> >>  
> >> +    virtio_net_announce_timer(n);
> >> +
> > why do this here? why not right after we set announce counter?
> 
> The reasons are:
> 
> - The counters were set in load, but the device is not running so we
> could not inject the interrupt at that time.

I see. This makes sense - but this isn't intuitive.
Why don't we simply start timer with current time?
Need to make sure it runs fine if time passes, but
I think it's fine.

> - We can stop the progress when guest is shutting down the device.

On shut down guest will reset device stopping timer - this seems enough.

> >
> >>      for (i = 0; i < n->max_queues; i++) {
> >>          q = &n->vqs[i];
> >>  
> >> @@ -322,6 +344,9 @@ static void virtio_net_reset(VirtIODevice *vdev)
> >>      n->nobcast = 0;
> >>      /* multiqueue is disabled by default */
> >>      n->curr_queues = 1;
> >> +    timer_del(n->announce_timer);
> >> +    n->announce = 0;
> >> +    n->status &= ~VIRTIO_NET_S_ANNOUNCE;
> >>  
> >>      /* Flush any MAC and VLAN filter table state */
> >>      n->mac_table.in_use = 0;
> >> @@ -731,6 +756,22 @@ static int virtio_net_handle_vlan_table(VirtIONet *n, uint8_t cmd,
> >>      return VIRTIO_NET_OK;
> >>  }
> >>  
> >> +static int virtio_net_handle_announce(VirtIONet *n, uint8_t cmd,
> >> +                                      struct iovec *iov, unsigned int iov_cnt)
> >> +{
> >> +    if (cmd == VIRTIO_NET_CTRL_ANNOUNCE_ACK) {
> >> +        n->status &= ~VIRTIO_NET_S_ANNOUNCE;
> >> +        if (n->announce) {
> > I would make it > 0 here, just in case it becomes negative as a result
> > of some bug.
> 
> Ok.
> >> +            timer_mod(n->announce_timer,
> >> +                      qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 50 +
> >> +                      (VIRTIO_NET_ANNOUNCE_ROUNDS - n->announce - 1) * 100);
> > savevm.c has this code, and a comment:
> >         /* delay 50ms, 150ms, 250ms, ... */
> > 	50 + (SELF_ANNOUNCE_ROUNDS - count - 1) * 100
> >
> > how about an API in include/migration/vmstate.h
> >
> > static inline
> > int64_t self_announce_delay(int round)
> > {
> > 	assert(round < SELF_ANNOUNCE_ROUNDS && round > 0);
> >         /* delay 50ms, 150ms, 250ms, ... */
> > 	return 50 + (SELF_ANNOUNCE_ROUNDS - round - 1) * 100;
> > }
> >
> > or something to this end.
> >
> 
> Good idea, will do this.
> >> +        }
> >> +        return VIRTIO_NET_OK;
> >> +    } else {
> >> +        return VIRTIO_NET_ERR;
> >> +    }
> >> +}
> >> +
> >>  static int virtio_net_handle_mq(VirtIONet *n, uint8_t cmd,
> >>                                  struct iovec *iov, unsigned int iov_cnt)
> >>  {
> >> @@ -794,6 +835,8 @@ static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
> >>              status = virtio_net_handle_mac(n, ctrl.cmd, iov, iov_cnt);
> >>          } else if (ctrl.class == VIRTIO_NET_CTRL_VLAN) {
> >>              status = virtio_net_handle_vlan_table(n, ctrl.cmd, iov, iov_cnt);
> >> +        } else if (ctrl.class == VIRTIO_NET_CTRL_ANNOUNCE) {
> >> +            status = virtio_net_handle_announce(n, ctrl.cmd, iov, iov_cnt);
> >>          } else if (ctrl.class == VIRTIO_NET_CTRL_MQ) {
> >>              status = virtio_net_handle_mq(n, ctrl.cmd, iov, iov_cnt);
> >>          } else if (ctrl.class == VIRTIO_NET_CTRL_GUEST_OFFLOADS) {
> >> @@ -1451,6 +1494,7 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
> >>          qemu_get_subqueue(n->nic, i)->link_down = link_down;
> >>      }
> >>  
> >> +    n->announce = VIRTIO_NET_ANNOUNCE_ROUNDS;
> > Well if virtio_net_handle_announce runs now it will start timer
> > in the past, right?
> > This seems wrong.
> 
> Not sure I get the case. When in virtio_net_load() the vm is not even
> running so looks like virtio_net_handle_announce() could not run in the
> same time.

I see, this works because you decrement it when VM starts running.
I think it's not a good idea to rely on this though,
better do everything from timer, and handle
the case of command arriving too early.


> >
> >>      return 0;
> >>  }
> >>  
> >> @@ -1562,6 +1606,8 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp)
> >>      qemu_macaddr_default_if_unset(&n->nic_conf.macaddr);
> >>      memcpy(&n->mac[0], &n->nic_conf.macaddr, sizeof(n->mac));
> >>      n->status = VIRTIO_NET_S_LINK_UP;
> >> +    n->announce_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
> >> +                                     virtio_net_announce_timer, n);
> >>  
> >>      if (n->netclient_type) {
> >>          /*
> >> @@ -1642,6 +1688,8 @@ static void virtio_net_device_unrealize(DeviceState *dev, Error **errp)
> >>          }
> >>      }
> >>  
> >> +    timer_del(n->announce_timer);
> >> +    timer_free(n->announce_timer);
> >>      g_free(n->vqs);
> >>      qemu_del_nic(n->nic);
> >>      virtio_cleanup(vdev);
> >> diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
> >> index 32a7687..f93b427 100644
> >> --- a/include/hw/i386/pc.h
> >> +++ b/include/hw/i386/pc.h
> >> @@ -271,6 +271,11 @@ bool e820_get_entry(int, uint32_t, uint64_t *, uint64_t *);
> >>              .driver   = "apic",\
> >>              .property = "version",\
> >>              .value    = stringify(0x11),\
> >> +        },\
> >> +        {\
> >> +            .driver   = "virtio-net-pci",\
> >> +            .property = "guest_announce",\
> >> +            .value    = "off",\
> >>          }
> >>  
> >>  #define PC_COMPAT_1_7 \
> >> diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
> >> index 4b32440..ca1a9c5 100644
> >> --- a/include/hw/virtio/virtio-net.h
> >> +++ b/include/hw/virtio/virtio-net.h
> >> @@ -49,12 +49,14 @@
> >>  #define VIRTIO_NET_F_CTRL_RX    18      /* Control channel RX mode support */
> >>  #define VIRTIO_NET_F_CTRL_VLAN  19      /* Control channel VLAN filtering */
> >>  #define VIRTIO_NET_F_CTRL_RX_EXTRA 20   /* Extra RX mode control support */
> >> +#define VIRTIO_NET_F_GUEST_ANNOUNCE 21  /* Guest can announce itself */
> >>  #define VIRTIO_NET_F_MQ         22      /* Device supports Receive Flow
> >>                                           * Steering */
> >>  
> >>  #define VIRTIO_NET_F_CTRL_MAC_ADDR   23 /* Set MAC address */
> >>  
> >>  #define VIRTIO_NET_S_LINK_UP    1       /* Link is up */
> >> +#define VIRTIO_NET_S_ANNOUNCE   2       /* Announcement is needed */
> >>  
> >>  #define TX_TIMER_INTERVAL 150000 /* 150 us */
> >>  
> >> @@ -193,6 +195,8 @@ typedef struct VirtIONet {
> >>      char *netclient_name;
> >>      char *netclient_type;
> >>      uint64_t curr_guest_offloads;
> >> +    QEMUTimer *announce_timer;
> >> +    int announce;
> >
> > rename announce_counter pls.
> >
> 
> Ok.
> >>  } VirtIONet;
> >>  
> >>  #define VIRTIO_NET_CTRL_MAC    1
> >> @@ -213,6 +217,17 @@ typedef struct VirtIONet {
> >>   #define VIRTIO_NET_CTRL_VLAN_DEL             1
> >>  
> >>  /*
> >> + * Control link announce acknowledgement
> >> + *
> > bunch of typos and vagueness below
> >
> >> + * The command VIRTIO_NET_CTRL_ANNOUNCE_ACK is used to indicate that
> >> + * driver has recevied the notification and device would clear the
> >> + * VIRTIO_NET_S_ANNOUNCE bit in the status filed after it received
> >> + * this command.
> > fixed version:
> >
> > VIRTIO_NET_S_ANNOUNCE bit in the status field requests link
> > announcement from guest driver. The driver is notified by config space change
> > interrupt.  The command VIRTIO_NET_CTRL_ANNOUNCE_ACK is used to indicate
> > that the driver has received the notification. It makes the device clear the
> > bit VIRTIO_NET_S_ANNOUNCE in the status field.
> >
> 
> Sorry, look like I just copied those from a very old RFC. Will correct
> this. Thanks
> >> + */
> >> +#define VIRTIO_NET_CTRL_ANNOUNCE       3
> >> + #define VIRTIO_NET_CTRL_ANNOUNCE_ACK         0
> >> +
> >> +/*
> >>   * Control Multiqueue
> >>   *
> >>   * The command VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET
> >> @@ -251,6 +266,7 @@ struct virtio_net_ctrl_mq {
> >>          DEFINE_PROP_BIT("guest_tso6", _state, _field, VIRTIO_NET_F_GUEST_TSO6, true), \
> >>          DEFINE_PROP_BIT("guest_ecn", _state, _field, VIRTIO_NET_F_GUEST_ECN, true), \
> >>          DEFINE_PROP_BIT("guest_ufo", _state, _field, VIRTIO_NET_F_GUEST_UFO, true), \
> >> +        DEFINE_PROP_BIT("guest_announce", _state, _field, VIRTIO_NET_F_GUEST_ANNOUNCE, true), \
> >>          DEFINE_PROP_BIT("host_tso4", _state, _field, VIRTIO_NET_F_HOST_TSO4, true), \
> >>          DEFINE_PROP_BIT("host_tso6", _state, _field, VIRTIO_NET_F_HOST_TSO6, true), \
> >>          DEFINE_PROP_BIT("host_ecn", _state, _field, VIRTIO_NET_F_HOST_ECN, true), \
> >> -- 
> >> 1.7.1
Jason Wang May 16, 2014, 5:02 a.m. UTC | #4
On 05/15/2014 05:45 PM, Michael S. Tsirkin wrote:
> On Thu, May 15, 2014 at 05:22:28PM +0800, Jason Wang wrote:
>> On 05/15/2014 04:28 PM, Michael S. Tsirkin wrote:
>>> Thanks, looks good.
>>> Some minor comments below,
>>>
>>> On Thu, May 15, 2014 at 03:16:47PM +0800, Jason Wang wrote:
>>>> It's hard to track all mac addresses and their configurations (e.g
>>>> vlan or ipv6) in qemu. Without those informations, it's impossible to
>>> s/those informations/this information/
>>>
>>>> build proper garp packet after migration. The only possible solution
>>>> to this is let guest (who knew all configurations) to do this.
>>> s/knew/knows/
>>>
>>>> So, this patch introduces a new readonly config status bit of virtio-net,
>>>> VIRTIO_NET_S_ANNOUNCE which is used to notify guest to announce
>>>> presence of its link through config update interrupt.When guest has
>>>> done the announcement, it should ack the notification through
>>>> VIRTIO_NET_CTRL_ANNOUNCE_ACK cmd. This feature is negotiated by a new
>>>> feature bit VIRTIO_NET_F_ANNOUNCE (which has already been supported by
>>>> Linux guest).
>>>>
>>>> During load, a counter of announcing rounds were set so that the after
>>> s/were/is/
>>> s/the after/after/
>> Will correct those typos.
>>>> the vm is running it can trigger rounds of config interrupts to notify
>>>> the guest to build and send the correct garps.
>>>>
>>>> Tested with ping to guest with vlan during migration. Without the
>>>> patch, lots of the packets were lost after migration. With the patch,
>>>> could not notice packet loss after migration.
>>> below changelog should go after ---, until the ack list.
>>>
>> Ok.
>>>> Reference:
>>>> RFC v2: https://lists.gnu.org/archive/html/qemu-devel/2014-04/msg01750.html
>>>> RFC v1: https://lists.gnu.org/archive/html/qemu-devel/2014-03/msg02648.html
>>>> V7:     https://lists.gnu.org/archive/html/qemu-devel/2013-03/msg01127.html
>>>>
>>>> Changes from RFC v2:
>>>> - use QEMU_CLOCK_VIRTUAL instead of QEMU_CLOCK_REALTIME
>>>> - compat self announce for 2.0 machine type
>>>>
>>>> Changes from RFC v1:
>>>> - clean VIRTIO_NET_S_ANNOUNCE bit during reset
>>>> - free announce timer during clean
>>>> - make announce work for non-vhost case
>>>>
>>>> Changes from V7:
>>>> - Instead of introducing a global method for each kind of nic, this
>>>>   version limits the changes to virtio-net itself.
>>>>
>>>> Cc: Liuyongan <liuyongan@huawei.com>
>>>> Cc: Amos Kong <akong@redhat.com>
>>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>>> ---
>>>>  hw/net/virtio-net.c            |   48 ++++++++++++++++++++++++++++++++++++++++
>>>>  include/hw/i386/pc.h           |    5 ++++
>>>>  include/hw/virtio/virtio-net.h |   16 +++++++++++++
>>>>  3 files changed, 69 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>>>> index 940a7cf..98d59e9 100644
>>>> --- a/hw/net/virtio-net.c
>>>> +++ b/hw/net/virtio-net.c
>>>> @@ -25,6 +25,7 @@
>>>>  #include "monitor/monitor.h"
>>>>  
>>>>  #define VIRTIO_NET_VM_VERSION    11
>>>> +#define VIRTIO_NET_ANNOUNCE_ROUNDS    3
>>>>  
>>>>  #define MAC_TABLE_ENTRIES    64
>>>>  #define MAX_VLAN    (1 << 12)   /* Per 802.1Q definition */
>>> I would make it  5 to be consistent with SELF_ANNOUNCE_ROUNDS
>>> in savevm.c
>>>
>>> in fact, let's move SELF_ANNOUNCE_ROUNDS to include/migration/vmstate.h
>>> and reuse it.
>> Ok.
>>>> @@ -99,6 +100,25 @@ static bool virtio_net_started(VirtIONet *n, uint8_t status)
>>>>          (n->status & VIRTIO_NET_S_LINK_UP) && vdev->vm_running;
>>>>  }
>>>>  
>>>> +static void virtio_net_announce_timer(void *opaque)
>>>> +{
>>>> +    VirtIONet *n = opaque;
>>>> +    VirtIODevice *vdev = VIRTIO_DEVICE(n);
>>>> +
>>>> +    if (n->announce &&
>>> I would make it > 0 here, just in case it becomes negative as a result
>>> of some bug.
>> Sure.
>>>> +        virtio_net_started(n, vdev->status) &&
>>>> +        vdev->guest_features & (0x1 << VIRTIO_NET_F_GUEST_ANNOUNCE) &&
>>>> +        vdev->guest_features & (0x1 << VIRTIO_NET_F_CTRL_VQ)) {
>>>> +
>>>> +        n->announce--;
>>>> +        n->status |= VIRTIO_NET_S_ANNOUNCE;
>>>> +        virtio_notify_config(vdev);
>>>> +    } else {
>>>> +        timer_del(n->announce_timer);
>>> why do this here?
>>>
>>>> +        n->announce = 0;
>>> why is this here?
>>>
>> If guest shuts down the device, there's no need to do the announcing.
> It's still weird.
> Either announce is 0 and then timer was not running
> or it's > 0 and then this won't trigger.

Right, the logic here is for QEMU_CLOCK_REALTIME. But there's another
question, we use QEMU_CLOCK_VIRTUAL while qemu is using
QEMU_CLOCK_REALTIME for its announcing. This looks fine but not sure
whether this is safe. And if the link was down, it's better for us to
stop the announcing?

>
>>>> +    }
>>>> +}
>>>> +
>>>>  static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
>>>>  {
>>>>      VirtIODevice *vdev = VIRTIO_DEVICE(n);
>>>> @@ -147,6 +167,8 @@ static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t status)
>>>>  
>>>>      virtio_net_vhost_status(n, status);
>>>>  
>>>> +    virtio_net_announce_timer(n);
>>>> +
>>> why do this here? why not right after we set announce counter?
>> The reasons are:
>>
>> - The counters were set in load, but the device is not running so we
>> could not inject the interrupt at that time.
> I see. This makes sense - but this isn't intuitive.
> Why don't we simply start timer with current time?
> Need to make sure it runs fine if time passes, but
> I think it's fine.

Not sure I get the point, I didn't see any differences except for an
extra timer fire.
>
>> - We can stop the progress when guest is shutting down the device.
> On shut down guest will reset device stopping timer - this seems enough.

Yes, I see.
>>>>      for (i = 0; i < n->max_queues; i++) {
>>>>          q = &n->vqs[i];
>>>>  
>>>> @@ -322,6 +344,9 @@ static void virtio_net_reset(VirtIODevice *vdev)
>>>>      n->nobcast = 0;
>>>>      /* multiqueue is disabled by default */
>>>>      n->curr_queues = 1;
>>>> +    timer_del(n->announce_timer);
>>>> +    n->announce = 0;
>>>> +    n->status &= ~VIRTIO_NET_S_ANNOUNCE;
>>>>  
>>>>      /* Flush any MAC and VLAN filter table state */
>>>>      n->mac_table.in_use = 0;
>>>> @@ -731,6 +756,22 @@ static int virtio_net_handle_vlan_table(VirtIONet *n, uint8_t cmd,
>>>>      return VIRTIO_NET_OK;
>>>>  }
>>>>  
>>>> +static int virtio_net_handle_announce(VirtIONet *n, uint8_t cmd,
>>>> +                                      struct iovec *iov, unsigned int iov_cnt)
>>>> +{
>>>> +    if (cmd == VIRTIO_NET_CTRL_ANNOUNCE_ACK) {
>>>> +        n->status &= ~VIRTIO_NET_S_ANNOUNCE;
>>>> +        if (n->announce) {
>>> I would make it > 0 here, just in case it becomes negative as a result
>>> of some bug.
>> Ok.
>>>> +            timer_mod(n->announce_timer,
>>>> +                      qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 50 +
>>>> +                      (VIRTIO_NET_ANNOUNCE_ROUNDS - n->announce - 1) * 100);
>>> savevm.c has this code, and a comment:
>>>         /* delay 50ms, 150ms, 250ms, ... */
>>> 	50 + (SELF_ANNOUNCE_ROUNDS - count - 1) * 100
>>>
>>> how about an API in include/migration/vmstate.h
>>>
>>> static inline
>>> int64_t self_announce_delay(int round)
>>> {
>>> 	assert(round < SELF_ANNOUNCE_ROUNDS && round > 0);
>>>         /* delay 50ms, 150ms, 250ms, ... */
>>> 	return 50 + (SELF_ANNOUNCE_ROUNDS - round - 1) * 100;
>>> }
>>>
>>> or something to this end.
>>>
>> Good idea, will do this.
>>>> +        }
>>>> +        return VIRTIO_NET_OK;
>>>> +    } else {
>>>> +        return VIRTIO_NET_ERR;
>>>> +    }
>>>> +}
>>>> +
>>>>  static int virtio_net_handle_mq(VirtIONet *n, uint8_t cmd,
>>>>                                  struct iovec *iov, unsigned int iov_cnt)
>>>>  {
>>>> @@ -794,6 +835,8 @@ static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
>>>>              status = virtio_net_handle_mac(n, ctrl.cmd, iov, iov_cnt);
>>>>          } else if (ctrl.class == VIRTIO_NET_CTRL_VLAN) {
>>>>              status = virtio_net_handle_vlan_table(n, ctrl.cmd, iov, iov_cnt);
>>>> +        } else if (ctrl.class == VIRTIO_NET_CTRL_ANNOUNCE) {
>>>> +            status = virtio_net_handle_announce(n, ctrl.cmd, iov, iov_cnt);
>>>>          } else if (ctrl.class == VIRTIO_NET_CTRL_MQ) {
>>>>              status = virtio_net_handle_mq(n, ctrl.cmd, iov, iov_cnt);
>>>>          } else if (ctrl.class == VIRTIO_NET_CTRL_GUEST_OFFLOADS) {
>>>> @@ -1451,6 +1494,7 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
>>>>          qemu_get_subqueue(n->nic, i)->link_down = link_down;
>>>>      }
>>>>  
>>>> +    n->announce = VIRTIO_NET_ANNOUNCE_ROUNDS;
>>> Well if virtio_net_handle_announce runs now it will start timer
>>> in the past, right?
>>> This seems wrong.
>> Not sure I get the case. When in virtio_net_load() the vm is not even
>> running so looks like virtio_net_handle_announce() could not run in the
>> same time.
> I see, this works because you decrement it when VM starts running.
> I think it's not a good idea to rely on this though,
> better do everything from timer, and handle
> the case of command arriving too early.
>

Right, if QEMU_CLOCK_VIRTUAL is fine, we can do everything in a timer.

For the case of command arriving too early. Is there anything else we
need to do? Since we only start the next timer when we get ack command.

Thanks
Michael S. Tsirkin May 18, 2014, 9:04 a.m. UTC | #5
On Fri, May 16, 2014 at 01:02:51PM +0800, Jason Wang wrote:
> On 05/15/2014 05:45 PM, Michael S. Tsirkin wrote:
> > On Thu, May 15, 2014 at 05:22:28PM +0800, Jason Wang wrote:
> >> On 05/15/2014 04:28 PM, Michael S. Tsirkin wrote:
> >>> Thanks, looks good.
> >>> Some minor comments below,
> >>>
> >>> On Thu, May 15, 2014 at 03:16:47PM +0800, Jason Wang wrote:
> >>>> It's hard to track all mac addresses and their configurations (e.g
> >>>> vlan or ipv6) in qemu. Without those informations, it's impossible to
> >>> s/those informations/this information/
> >>>
> >>>> build proper garp packet after migration. The only possible solution
> >>>> to this is let guest (who knew all configurations) to do this.
> >>> s/knew/knows/
> >>>
> >>>> So, this patch introduces a new readonly config status bit of virtio-net,
> >>>> VIRTIO_NET_S_ANNOUNCE which is used to notify guest to announce
> >>>> presence of its link through config update interrupt.When guest has
> >>>> done the announcement, it should ack the notification through
> >>>> VIRTIO_NET_CTRL_ANNOUNCE_ACK cmd. This feature is negotiated by a new
> >>>> feature bit VIRTIO_NET_F_ANNOUNCE (which has already been supported by
> >>>> Linux guest).
> >>>>
> >>>> During load, a counter of announcing rounds were set so that the after
> >>> s/were/is/
> >>> s/the after/after/
> >> Will correct those typos.
> >>>> the vm is running it can trigger rounds of config interrupts to notify
> >>>> the guest to build and send the correct garps.
> >>>>
> >>>> Tested with ping to guest with vlan during migration. Without the
> >>>> patch, lots of the packets were lost after migration. With the patch,
> >>>> could not notice packet loss after migration.
> >>> below changelog should go after ---, until the ack list.
> >>>
> >> Ok.
> >>>> Reference:
> >>>> RFC v2: https://lists.gnu.org/archive/html/qemu-devel/2014-04/msg01750.html
> >>>> RFC v1: https://lists.gnu.org/archive/html/qemu-devel/2014-03/msg02648.html
> >>>> V7:     https://lists.gnu.org/archive/html/qemu-devel/2013-03/msg01127.html
> >>>>
> >>>> Changes from RFC v2:
> >>>> - use QEMU_CLOCK_VIRTUAL instead of QEMU_CLOCK_REALTIME
> >>>> - compat self announce for 2.0 machine type
> >>>>
> >>>> Changes from RFC v1:
> >>>> - clean VIRTIO_NET_S_ANNOUNCE bit during reset
> >>>> - free announce timer during clean
> >>>> - make announce work for non-vhost case
> >>>>
> >>>> Changes from V7:
> >>>> - Instead of introducing a global method for each kind of nic, this
> >>>>   version limits the changes to virtio-net itself.
> >>>>
> >>>> Cc: Liuyongan <liuyongan@huawei.com>
> >>>> Cc: Amos Kong <akong@redhat.com>
> >>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
> >>>> ---
> >>>>  hw/net/virtio-net.c            |   48 ++++++++++++++++++++++++++++++++++++++++
> >>>>  include/hw/i386/pc.h           |    5 ++++
> >>>>  include/hw/virtio/virtio-net.h |   16 +++++++++++++
> >>>>  3 files changed, 69 insertions(+), 0 deletions(-)
> >>>>
> >>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> >>>> index 940a7cf..98d59e9 100644
> >>>> --- a/hw/net/virtio-net.c
> >>>> +++ b/hw/net/virtio-net.c
> >>>> @@ -25,6 +25,7 @@
> >>>>  #include "monitor/monitor.h"
> >>>>  
> >>>>  #define VIRTIO_NET_VM_VERSION    11
> >>>> +#define VIRTIO_NET_ANNOUNCE_ROUNDS    3
> >>>>  
> >>>>  #define MAC_TABLE_ENTRIES    64
> >>>>  #define MAX_VLAN    (1 << 12)   /* Per 802.1Q definition */
> >>> I would make it  5 to be consistent with SELF_ANNOUNCE_ROUNDS
> >>> in savevm.c
> >>>
> >>> in fact, let's move SELF_ANNOUNCE_ROUNDS to include/migration/vmstate.h
> >>> and reuse it.
> >> Ok.
> >>>> @@ -99,6 +100,25 @@ static bool virtio_net_started(VirtIONet *n, uint8_t status)
> >>>>          (n->status & VIRTIO_NET_S_LINK_UP) && vdev->vm_running;
> >>>>  }
> >>>>  
> >>>> +static void virtio_net_announce_timer(void *opaque)
> >>>> +{
> >>>> +    VirtIONet *n = opaque;
> >>>> +    VirtIODevice *vdev = VIRTIO_DEVICE(n);
> >>>> +
> >>>> +    if (n->announce &&
> >>> I would make it > 0 here, just in case it becomes negative as a result
> >>> of some bug.
> >> Sure.
> >>>> +        virtio_net_started(n, vdev->status) &&
> >>>> +        vdev->guest_features & (0x1 << VIRTIO_NET_F_GUEST_ANNOUNCE) &&
> >>>> +        vdev->guest_features & (0x1 << VIRTIO_NET_F_CTRL_VQ)) {
> >>>> +
> >>>> +        n->announce--;
> >>>> +        n->status |= VIRTIO_NET_S_ANNOUNCE;
> >>>> +        virtio_notify_config(vdev);
> >>>> +    } else {
> >>>> +        timer_del(n->announce_timer);
> >>> why do this here?
> >>>
> >>>> +        n->announce = 0;
> >>> why is this here?
> >>>
> >> If guest shuts down the device, there's no need to do the announcing.
> > It's still weird.
> > Either announce is 0 and then timer was not running
> > or it's > 0 and then this won't trigger.
> 
> Right, the logic here is for QEMU_CLOCK_REALTIME. But there's another
> question, we use QEMU_CLOCK_VIRTUAL while qemu is using
> QEMU_CLOCK_REALTIME for its announcing. This looks fine but not sure
> whether this is safe.

Meaning QEMU_CLOCK_REALTIME that qemu uses?
Not sure either but it doesn't modify guest state so it seems safe.


> And if the link was down, it's better for us to
> stop the announcing?

I think that it doesn't matter: guest won't announce when
link is down, anyway.
Not worth it to write extra logic here in the host.

> >
> >>>> +    }
> >>>> +}
> >>>> +
> >>>>  static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
> >>>>  {
> >>>>      VirtIODevice *vdev = VIRTIO_DEVICE(n);
> >>>> @@ -147,6 +167,8 @@ static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t status)
> >>>>  
> >>>>      virtio_net_vhost_status(n, status);
> >>>>  
> >>>> +    virtio_net_announce_timer(n);
> >>>> +
> >>> why do this here? why not right after we set announce counter?
> >> The reasons are:
> >>
> >> - The counters were set in load, but the device is not running so we
> >> could not inject the interrupt at that time.
> > I see. This makes sense - but this isn't intuitive.
> > Why don't we simply start timer with current time?
> > Need to make sure it runs fine if time passes, but
> > I think it's fine.
> 
> Not sure I get the point, I didn't see any differences except for an
> extra timer fire.

The only reason you call virtio_net_announce_timer from  set_status
is because it gets run on vm start/stop.
It's true but not intuitive.
Just run timer always from timer, it's clearer this way :)


> >
> >> - We can stop the progress when guest is shutting down the device.
> > On shut down guest will reset device stopping timer - this seems enough.
> 
> Yes, I see.
> >>>>      for (i = 0; i < n->max_queues; i++) {
> >>>>          q = &n->vqs[i];
> >>>>  
> >>>> @@ -322,6 +344,9 @@ static void virtio_net_reset(VirtIODevice *vdev)
> >>>>      n->nobcast = 0;
> >>>>      /* multiqueue is disabled by default */
> >>>>      n->curr_queues = 1;
> >>>> +    timer_del(n->announce_timer);
> >>>> +    n->announce = 0;
> >>>> +    n->status &= ~VIRTIO_NET_S_ANNOUNCE;
> >>>>  
> >>>>      /* Flush any MAC and VLAN filter table state */
> >>>>      n->mac_table.in_use = 0;
> >>>> @@ -731,6 +756,22 @@ static int virtio_net_handle_vlan_table(VirtIONet *n, uint8_t cmd,
> >>>>      return VIRTIO_NET_OK;
> >>>>  }
> >>>>  
> >>>> +static int virtio_net_handle_announce(VirtIONet *n, uint8_t cmd,
> >>>> +                                      struct iovec *iov, unsigned int iov_cnt)
> >>>> +{
> >>>> +    if (cmd == VIRTIO_NET_CTRL_ANNOUNCE_ACK) {
> >>>> +        n->status &= ~VIRTIO_NET_S_ANNOUNCE;
> >>>> +        if (n->announce) {
> >>> I would make it > 0 here, just in case it becomes negative as a result
> >>> of some bug.
> >> Ok.
> >>>> +            timer_mod(n->announce_timer,
> >>>> +                      qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 50 +
> >>>> +                      (VIRTIO_NET_ANNOUNCE_ROUNDS - n->announce - 1) * 100);
> >>> savevm.c has this code, and a comment:
> >>>         /* delay 50ms, 150ms, 250ms, ... */
> >>> 	50 + (SELF_ANNOUNCE_ROUNDS - count - 1) * 100
> >>>
> >>> how about an API in include/migration/vmstate.h
> >>>
> >>> static inline
> >>> int64_t self_announce_delay(int round)
> >>> {
> >>> 	assert(round < SELF_ANNOUNCE_ROUNDS && round > 0);
> >>>         /* delay 50ms, 150ms, 250ms, ... */
> >>> 	return 50 + (SELF_ANNOUNCE_ROUNDS - round - 1) * 100;
> >>> }
> >>>
> >>> or something to this end.
> >>>
> >> Good idea, will do this.
> >>>> +        }
> >>>> +        return VIRTIO_NET_OK;
> >>>> +    } else {
> >>>> +        return VIRTIO_NET_ERR;
> >>>> +    }
> >>>> +}
> >>>> +
> >>>>  static int virtio_net_handle_mq(VirtIONet *n, uint8_t cmd,
> >>>>                                  struct iovec *iov, unsigned int iov_cnt)
> >>>>  {
> >>>> @@ -794,6 +835,8 @@ static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
> >>>>              status = virtio_net_handle_mac(n, ctrl.cmd, iov, iov_cnt);
> >>>>          } else if (ctrl.class == VIRTIO_NET_CTRL_VLAN) {
> >>>>              status = virtio_net_handle_vlan_table(n, ctrl.cmd, iov, iov_cnt);
> >>>> +        } else if (ctrl.class == VIRTIO_NET_CTRL_ANNOUNCE) {
> >>>> +            status = virtio_net_handle_announce(n, ctrl.cmd, iov, iov_cnt);
> >>>>          } else if (ctrl.class == VIRTIO_NET_CTRL_MQ) {
> >>>>              status = virtio_net_handle_mq(n, ctrl.cmd, iov, iov_cnt);
> >>>>          } else if (ctrl.class == VIRTIO_NET_CTRL_GUEST_OFFLOADS) {
> >>>> @@ -1451,6 +1494,7 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
> >>>>          qemu_get_subqueue(n->nic, i)->link_down = link_down;
> >>>>      }
> >>>>  
> >>>> +    n->announce = VIRTIO_NET_ANNOUNCE_ROUNDS;
> >>> Well if virtio_net_handle_announce runs now it will start timer
> >>> in the past, right?
> >>> This seems wrong.
> >> Not sure I get the case. When in virtio_net_load() the vm is not even
> >> running so looks like virtio_net_handle_announce() could not run in the
> >> same time.
> > I see, this works because you decrement it when VM starts running.
> > I think it's not a good idea to rely on this though,
> > better do everything from timer, and handle
> > the case of command arriving too early.
> >
> 
> Right, if QEMU_CLOCK_VIRTUAL is fine, we can do everything in a timer.
> 
> For the case of command arriving too early. Is there anything else we
> need to do? Since we only start the next timer when we get ack command.
> 
> Thanks

I think we need to make sure we don't set the timer in the past or
very far in the future.
Jason Wang May 19, 2014, 9:34 a.m. UTC | #6
On 05/18/2014 05:04 PM, Michael S. Tsirkin wrote:
> On Fri, May 16, 2014 at 01:02:51PM +0800, Jason Wang wrote:
>> On 05/15/2014 05:45 PM, Michael S. Tsirkin wrote:
>>> On Thu, May 15, 2014 at 05:22:28PM +0800, Jason Wang wrote:
>>>> On 05/15/2014 04:28 PM, Michael S. Tsirkin wrote:
>>>>> Thanks, looks good.
>>>>> Some minor comments below,
>>>>>
>>>>> On Thu, May 15, 2014 at 03:16:47PM +0800, Jason Wang wrote:
>>>>>> It's hard to track all mac addresses and their configurations (e.g
>>>>>> vlan or ipv6) in qemu. Without those informations, it's impossible to
>>>>> s/those informations/this information/
>>>>>
>>>>>> build proper garp packet after migration. The only possible solution
>>>>>> to this is let guest (who knew all configurations) to do this.
>>>>> s/knew/knows/
>>>>>
>>>>>> So, this patch introduces a new readonly config status bit of virtio-net,
>>>>>> VIRTIO_NET_S_ANNOUNCE which is used to notify guest to announce
>>>>>> presence of its link through config update interrupt.When guest has
>>>>>> done the announcement, it should ack the notification through
>>>>>> VIRTIO_NET_CTRL_ANNOUNCE_ACK cmd. This feature is negotiated by a new
>>>>>> feature bit VIRTIO_NET_F_ANNOUNCE (which has already been supported by
>>>>>> Linux guest).
>>>>>>
>>>>>> During load, a counter of announcing rounds were set so that the after
>>>>> s/were/is/
>>>>> s/the after/after/
>>>> Will correct those typos.
>>>>>> the vm is running it can trigger rounds of config interrupts to notify
>>>>>> the guest to build and send the correct garps.
>>>>>>
>>>>>> Tested with ping to guest with vlan during migration. Without the
>>>>>> patch, lots of the packets were lost after migration. With the patch,
>>>>>> could not notice packet loss after migration.
>>>>> below changelog should go after ---, until the ack list.
>>>>>
>>>> Ok.
>>>>>> Reference:
>>>>>> RFC v2: https://lists.gnu.org/archive/html/qemu-devel/2014-04/msg01750.html
>>>>>> RFC v1: https://lists.gnu.org/archive/html/qemu-devel/2014-03/msg02648.html
>>>>>> V7:     https://lists.gnu.org/archive/html/qemu-devel/2013-03/msg01127.html
>>>>>>
>>>>>> Changes from RFC v2:
>>>>>> - use QEMU_CLOCK_VIRTUAL instead of QEMU_CLOCK_REALTIME
>>>>>> - compat self announce for 2.0 machine type
>>>>>>
>>>>>> Changes from RFC v1:
>>>>>> - clean VIRTIO_NET_S_ANNOUNCE bit during reset
>>>>>> - free announce timer during clean
>>>>>> - make announce work for non-vhost case
>>>>>>
>>>>>> Changes from V7:
>>>>>> - Instead of introducing a global method for each kind of nic, this
>>>>>>   version limits the changes to virtio-net itself.
>>>>>>
>>>>>> Cc: Liuyongan <liuyongan@huawei.com>
>>>>>> Cc: Amos Kong <akong@redhat.com>
>>>>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>>>>> ---
>>>>>>  hw/net/virtio-net.c            |   48 ++++++++++++++++++++++++++++++++++++++++
>>>>>>  include/hw/i386/pc.h           |    5 ++++
>>>>>>  include/hw/virtio/virtio-net.h |   16 +++++++++++++
>>>>>>  3 files changed, 69 insertions(+), 0 deletions(-)
>>>>>>
>>>>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>>>>>> index 940a7cf..98d59e9 100644
>>>>>> --- a/hw/net/virtio-net.c
>>>>>> +++ b/hw/net/virtio-net.c
>>>>>> @@ -25,6 +25,7 @@
>>>>>>  #include "monitor/monitor.h"
>>>>>>  
>>>>>>  #define VIRTIO_NET_VM_VERSION    11
>>>>>> +#define VIRTIO_NET_ANNOUNCE_ROUNDS    3
>>>>>>  
>>>>>>  #define MAC_TABLE_ENTRIES    64
>>>>>>  #define MAX_VLAN    (1 << 12)   /* Per 802.1Q definition */
>>>>> I would make it  5 to be consistent with SELF_ANNOUNCE_ROUNDS
>>>>> in savevm.c
>>>>>
>>>>> in fact, let's move SELF_ANNOUNCE_ROUNDS to include/migration/vmstate.h
>>>>> and reuse it.
>>>> Ok.
>>>>>> @@ -99,6 +100,25 @@ static bool virtio_net_started(VirtIONet *n, uint8_t status)
>>>>>>          (n->status & VIRTIO_NET_S_LINK_UP) && vdev->vm_running;
>>>>>>  }
>>>>>>  
>>>>>> +static void virtio_net_announce_timer(void *opaque)
>>>>>> +{
>>>>>> +    VirtIONet *n = opaque;
>>>>>> +    VirtIODevice *vdev = VIRTIO_DEVICE(n);
>>>>>> +
>>>>>> +    if (n->announce &&
>>>>> I would make it > 0 here, just in case it becomes negative as a result
>>>>> of some bug.
>>>> Sure.
>>>>>> +        virtio_net_started(n, vdev->status) &&
>>>>>> +        vdev->guest_features & (0x1 << VIRTIO_NET_F_GUEST_ANNOUNCE) &&
>>>>>> +        vdev->guest_features & (0x1 << VIRTIO_NET_F_CTRL_VQ)) {
>>>>>> +
>>>>>> +        n->announce--;
>>>>>> +        n->status |= VIRTIO_NET_S_ANNOUNCE;
>>>>>> +        virtio_notify_config(vdev);
>>>>>> +    } else {
>>>>>> +        timer_del(n->announce_timer);
>>>>> why do this here?
>>>>>
>>>>>> +        n->announce = 0;
>>>>> why is this here?
>>>>>
>>>> If guest shuts down the device, there's no need to do the announcing.
>>> It's still weird.
>>> Either announce is 0 and then timer was not running
>>> or it's > 0 and then this won't trigger.
>> Right, the logic here is for QEMU_CLOCK_REALTIME. But there's another
>> question, we use QEMU_CLOCK_VIRTUAL while qemu is using
>> QEMU_CLOCK_REALTIME for its announcing. This looks fine but not sure
>> whether this is safe.
> Meaning QEMU_CLOCK_REALTIME that qemu uses?
> Not sure either but it doesn't modify guest state so it seems safe.
>
>
>> And if the link was down, it's better for us to
>> stop the announcing?
> I think that it doesn't matter: guest won't announce when
> link is down, anyway.
> Not worth it to write extra logic here in the host.

Ok.
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>>  static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
>>>>>>  {
>>>>>>      VirtIODevice *vdev = VIRTIO_DEVICE(n);
>>>>>> @@ -147,6 +167,8 @@ static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t status)
>>>>>>  
>>>>>>      virtio_net_vhost_status(n, status);
>>>>>>  
>>>>>> +    virtio_net_announce_timer(n);
>>>>>> +
>>>>> why do this here? why not right after we set announce counter?
>>>> The reasons are:
>>>>
>>>> - The counters were set in load, but the device is not running so we
>>>> could not inject the interrupt at that time.
>>> I see. This makes sense - but this isn't intuitive.
>>> Why don't we simply start timer with current time?
>>> Need to make sure it runs fine if time passes, but
>>> I think it's fine.
>> Not sure I get the point, I didn't see any differences except for an
>> extra timer fire.
> The only reason you call virtio_net_announce_timer from  set_status
> is because it gets run on vm start/stop.
> It's true but not intuitive.
> Just run timer always from timer, it's clearer this way :)
>

Sure.
>>>> - We can stop the progress when guest is shutting down the device.
>>> On shut down guest will reset device stopping timer - this seems enough.
>> Yes, I see.
>>>>>>      for (i = 0; i < n->max_queues; i++) {
>>>>>>          q = &n->vqs[i];
>>>>>>  
>>>>>> @@ -322,6 +344,9 @@ static void virtio_net_reset(VirtIODevice *vdev)
>>>>>>      n->nobcast = 0;
>>>>>>      /* multiqueue is disabled by default */
>>>>>>      n->curr_queues = 1;
>>>>>> +    timer_del(n->announce_timer);
>>>>>> +    n->announce = 0;
>>>>>> +    n->status &= ~VIRTIO_NET_S_ANNOUNCE;
>>>>>>  
>>>>>>      /* Flush any MAC and VLAN filter table state */
>>>>>>      n->mac_table.in_use = 0;
>>>>>> @@ -731,6 +756,22 @@ static int virtio_net_handle_vlan_table(VirtIONet *n, uint8_t cmd,
>>>>>>      return VIRTIO_NET_OK;
>>>>>>  }
>>>>>>  
>>>>>> +static int virtio_net_handle_announce(VirtIONet *n, uint8_t cmd,
>>>>>> +                                      struct iovec *iov, unsigned int iov_cnt)
>>>>>> +{
>>>>>> +    if (cmd == VIRTIO_NET_CTRL_ANNOUNCE_ACK) {
>>>>>> +        n->status &= ~VIRTIO_NET_S_ANNOUNCE;
>>>>>> +        if (n->announce) {
>>>>> I would make it > 0 here, just in case it becomes negative as a result
>>>>> of some bug.
>>>> Ok.
>>>>>> +            timer_mod(n->announce_timer,
>>>>>> +                      qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 50 +
>>>>>> +                      (VIRTIO_NET_ANNOUNCE_ROUNDS - n->announce - 1) * 100);
>>>>> savevm.c has this code, and a comment:
>>>>>         /* delay 50ms, 150ms, 250ms, ... */
>>>>> 	50 + (SELF_ANNOUNCE_ROUNDS - count - 1) * 100
>>>>>
>>>>> how about an API in include/migration/vmstate.h
>>>>>
>>>>> static inline
>>>>> int64_t self_announce_delay(int round)
>>>>> {
>>>>> 	assert(round < SELF_ANNOUNCE_ROUNDS && round > 0);
>>>>>         /* delay 50ms, 150ms, 250ms, ... */
>>>>> 	return 50 + (SELF_ANNOUNCE_ROUNDS - round - 1) * 100;
>>>>> }
>>>>>
>>>>> or something to this end.
>>>>>
>>>> Good idea, will do this.
>>>>>> +        }
>>>>>> +        return VIRTIO_NET_OK;
>>>>>> +    } else {
>>>>>> +        return VIRTIO_NET_ERR;
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>>  static int virtio_net_handle_mq(VirtIONet *n, uint8_t cmd,
>>>>>>                                  struct iovec *iov, unsigned int iov_cnt)
>>>>>>  {
>>>>>> @@ -794,6 +835,8 @@ static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
>>>>>>              status = virtio_net_handle_mac(n, ctrl.cmd, iov, iov_cnt);
>>>>>>          } else if (ctrl.class == VIRTIO_NET_CTRL_VLAN) {
>>>>>>              status = virtio_net_handle_vlan_table(n, ctrl.cmd, iov, iov_cnt);
>>>>>> +        } else if (ctrl.class == VIRTIO_NET_CTRL_ANNOUNCE) {
>>>>>> +            status = virtio_net_handle_announce(n, ctrl.cmd, iov, iov_cnt);
>>>>>>          } else if (ctrl.class == VIRTIO_NET_CTRL_MQ) {
>>>>>>              status = virtio_net_handle_mq(n, ctrl.cmd, iov, iov_cnt);
>>>>>>          } else if (ctrl.class == VIRTIO_NET_CTRL_GUEST_OFFLOADS) {
>>>>>> @@ -1451,6 +1494,7 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
>>>>>>          qemu_get_subqueue(n->nic, i)->link_down = link_down;
>>>>>>      }
>>>>>>  
>>>>>> +    n->announce = VIRTIO_NET_ANNOUNCE_ROUNDS;
>>>>> Well if virtio_net_handle_announce runs now it will start timer
>>>>> in the past, right?
>>>>> This seems wrong.
>>>> Not sure I get the case. When in virtio_net_load() the vm is not even
>>>> running so looks like virtio_net_handle_announce() could not run in the
>>>> same time.
>>> I see, this works because you decrement it when VM starts running.
>>> I think it's not a good idea to rely on this though,
>>> better do everything from timer, and handle
>>> the case of command arriving too early.
>>>
>> Right, if QEMU_CLOCK_VIRTUAL is fine, we can do everything in a timer.
>>
>> For the case of command arriving too early. Is there anything else we
>> need to do? Since we only start the next timer when we get ack command.
>>
>> Thanks
> I think we need to make sure we don't set the timer in the past or
> very far in the future.

n->announce is between 0 and VIRTIO_NET_ANNOUNCE_ROUNDS - 1, when doing
timer_mod(), so it looks ok. There's another case that if vm was stopped
for a long time, the timer will also be delayed, but it's still safe in
this case.
Michael S. Tsirkin May 19, 2014, 10:06 a.m. UTC | #7
On Mon, May 19, 2014 at 05:34:38PM +0800, Jason Wang wrote:
> On 05/18/2014 05:04 PM, Michael S. Tsirkin wrote:
> > On Fri, May 16, 2014 at 01:02:51PM +0800, Jason Wang wrote:
> >> On 05/15/2014 05:45 PM, Michael S. Tsirkin wrote:
> >>> On Thu, May 15, 2014 at 05:22:28PM +0800, Jason Wang wrote:
> >>>> On 05/15/2014 04:28 PM, Michael S. Tsirkin wrote:
> >>>>> Thanks, looks good.
> >>>>> Some minor comments below,
> >>>>>
> >>>>> On Thu, May 15, 2014 at 03:16:47PM +0800, Jason Wang wrote:
> >>>>>> It's hard to track all mac addresses and their configurations (e.g
> >>>>>> vlan or ipv6) in qemu. Without those informations, it's impossible to
> >>>>> s/those informations/this information/
> >>>>>
> >>>>>> build proper garp packet after migration. The only possible solution
> >>>>>> to this is let guest (who knew all configurations) to do this.
> >>>>> s/knew/knows/
> >>>>>
> >>>>>> So, this patch introduces a new readonly config status bit of virtio-net,
> >>>>>> VIRTIO_NET_S_ANNOUNCE which is used to notify guest to announce
> >>>>>> presence of its link through config update interrupt.When guest has
> >>>>>> done the announcement, it should ack the notification through
> >>>>>> VIRTIO_NET_CTRL_ANNOUNCE_ACK cmd. This feature is negotiated by a new
> >>>>>> feature bit VIRTIO_NET_F_ANNOUNCE (which has already been supported by
> >>>>>> Linux guest).
> >>>>>>
> >>>>>> During load, a counter of announcing rounds were set so that the after
> >>>>> s/were/is/
> >>>>> s/the after/after/
> >>>> Will correct those typos.
> >>>>>> the vm is running it can trigger rounds of config interrupts to notify
> >>>>>> the guest to build and send the correct garps.
> >>>>>>
> >>>>>> Tested with ping to guest with vlan during migration. Without the
> >>>>>> patch, lots of the packets were lost after migration. With the patch,
> >>>>>> could not notice packet loss after migration.
> >>>>> below changelog should go after ---, until the ack list.
> >>>>>
> >>>> Ok.
> >>>>>> Reference:
> >>>>>> RFC v2: https://lists.gnu.org/archive/html/qemu-devel/2014-04/msg01750.html
> >>>>>> RFC v1: https://lists.gnu.org/archive/html/qemu-devel/2014-03/msg02648.html
> >>>>>> V7:     https://lists.gnu.org/archive/html/qemu-devel/2013-03/msg01127.html
> >>>>>>
> >>>>>> Changes from RFC v2:
> >>>>>> - use QEMU_CLOCK_VIRTUAL instead of QEMU_CLOCK_REALTIME
> >>>>>> - compat self announce for 2.0 machine type
> >>>>>>
> >>>>>> Changes from RFC v1:
> >>>>>> - clean VIRTIO_NET_S_ANNOUNCE bit during reset
> >>>>>> - free announce timer during clean
> >>>>>> - make announce work for non-vhost case
> >>>>>>
> >>>>>> Changes from V7:
> >>>>>> - Instead of introducing a global method for each kind of nic, this
> >>>>>>   version limits the changes to virtio-net itself.
> >>>>>>
> >>>>>> Cc: Liuyongan <liuyongan@huawei.com>
> >>>>>> Cc: Amos Kong <akong@redhat.com>
> >>>>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
> >>>>>> ---
> >>>>>>  hw/net/virtio-net.c            |   48 ++++++++++++++++++++++++++++++++++++++++
> >>>>>>  include/hw/i386/pc.h           |    5 ++++
> >>>>>>  include/hw/virtio/virtio-net.h |   16 +++++++++++++
> >>>>>>  3 files changed, 69 insertions(+), 0 deletions(-)
> >>>>>>
> >>>>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> >>>>>> index 940a7cf..98d59e9 100644
> >>>>>> --- a/hw/net/virtio-net.c
> >>>>>> +++ b/hw/net/virtio-net.c
> >>>>>> @@ -25,6 +25,7 @@
> >>>>>>  #include "monitor/monitor.h"
> >>>>>>  
> >>>>>>  #define VIRTIO_NET_VM_VERSION    11
> >>>>>> +#define VIRTIO_NET_ANNOUNCE_ROUNDS    3
> >>>>>>  
> >>>>>>  #define MAC_TABLE_ENTRIES    64
> >>>>>>  #define MAX_VLAN    (1 << 12)   /* Per 802.1Q definition */
> >>>>> I would make it  5 to be consistent with SELF_ANNOUNCE_ROUNDS
> >>>>> in savevm.c
> >>>>>
> >>>>> in fact, let's move SELF_ANNOUNCE_ROUNDS to include/migration/vmstate.h
> >>>>> and reuse it.
> >>>> Ok.
> >>>>>> @@ -99,6 +100,25 @@ static bool virtio_net_started(VirtIONet *n, uint8_t status)
> >>>>>>          (n->status & VIRTIO_NET_S_LINK_UP) && vdev->vm_running;
> >>>>>>  }
> >>>>>>  
> >>>>>> +static void virtio_net_announce_timer(void *opaque)
> >>>>>> +{
> >>>>>> +    VirtIONet *n = opaque;
> >>>>>> +    VirtIODevice *vdev = VIRTIO_DEVICE(n);
> >>>>>> +
> >>>>>> +    if (n->announce &&
> >>>>> I would make it > 0 here, just in case it becomes negative as a result
> >>>>> of some bug.
> >>>> Sure.
> >>>>>> +        virtio_net_started(n, vdev->status) &&
> >>>>>> +        vdev->guest_features & (0x1 << VIRTIO_NET_F_GUEST_ANNOUNCE) &&
> >>>>>> +        vdev->guest_features & (0x1 << VIRTIO_NET_F_CTRL_VQ)) {
> >>>>>> +
> >>>>>> +        n->announce--;
> >>>>>> +        n->status |= VIRTIO_NET_S_ANNOUNCE;
> >>>>>> +        virtio_notify_config(vdev);
> >>>>>> +    } else {
> >>>>>> +        timer_del(n->announce_timer);
> >>>>> why do this here?
> >>>>>
> >>>>>> +        n->announce = 0;
> >>>>> why is this here?
> >>>>>
> >>>> If guest shuts down the device, there's no need to do the announcing.
> >>> It's still weird.
> >>> Either announce is 0 and then timer was not running
> >>> or it's > 0 and then this won't trigger.
> >> Right, the logic here is for QEMU_CLOCK_REALTIME. But there's another
> >> question, we use QEMU_CLOCK_VIRTUAL while qemu is using
> >> QEMU_CLOCK_REALTIME for its announcing. This looks fine but not sure
> >> whether this is safe.
> > Meaning QEMU_CLOCK_REALTIME that qemu uses?
> > Not sure either but it doesn't modify guest state so it seems safe.
> >
> >
> >> And if the link was down, it's better for us to
> >> stop the announcing?
> > I think that it doesn't matter: guest won't announce when
> > link is down, anyway.
> > Not worth it to write extra logic here in the host.
> 
> Ok.
> >>>>>> +    }
> >>>>>> +}
> >>>>>> +
> >>>>>>  static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
> >>>>>>  {
> >>>>>>      VirtIODevice *vdev = VIRTIO_DEVICE(n);
> >>>>>> @@ -147,6 +167,8 @@ static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t status)
> >>>>>>  
> >>>>>>      virtio_net_vhost_status(n, status);
> >>>>>>  
> >>>>>> +    virtio_net_announce_timer(n);
> >>>>>> +
> >>>>> why do this here? why not right after we set announce counter?
> >>>> The reasons are:
> >>>>
> >>>> - The counters were set in load, but the device is not running so we
> >>>> could not inject the interrupt at that time.
> >>> I see. This makes sense - but this isn't intuitive.
> >>> Why don't we simply start timer with current time?
> >>> Need to make sure it runs fine if time passes, but
> >>> I think it's fine.
> >> Not sure I get the point, I didn't see any differences except for an
> >> extra timer fire.
> > The only reason you call virtio_net_announce_timer from  set_status
> > is because it gets run on vm start/stop.
> > It's true but not intuitive.
> > Just run timer always from timer, it's clearer this way :)
> >
> 
> Sure.
> >>>> - We can stop the progress when guest is shutting down the device.
> >>> On shut down guest will reset device stopping timer - this seems enough.
> >> Yes, I see.
> >>>>>>      for (i = 0; i < n->max_queues; i++) {
> >>>>>>          q = &n->vqs[i];
> >>>>>>  
> >>>>>> @@ -322,6 +344,9 @@ static void virtio_net_reset(VirtIODevice *vdev)
> >>>>>>      n->nobcast = 0;
> >>>>>>      /* multiqueue is disabled by default */
> >>>>>>      n->curr_queues = 1;
> >>>>>> +    timer_del(n->announce_timer);
> >>>>>> +    n->announce = 0;
> >>>>>> +    n->status &= ~VIRTIO_NET_S_ANNOUNCE;
> >>>>>>  
> >>>>>>      /* Flush any MAC and VLAN filter table state */
> >>>>>>      n->mac_table.in_use = 0;
> >>>>>> @@ -731,6 +756,22 @@ static int virtio_net_handle_vlan_table(VirtIONet *n, uint8_t cmd,
> >>>>>>      return VIRTIO_NET_OK;
> >>>>>>  }
> >>>>>>  
> >>>>>> +static int virtio_net_handle_announce(VirtIONet *n, uint8_t cmd,
> >>>>>> +                                      struct iovec *iov, unsigned int iov_cnt)
> >>>>>> +{
> >>>>>> +    if (cmd == VIRTIO_NET_CTRL_ANNOUNCE_ACK) {
> >>>>>> +        n->status &= ~VIRTIO_NET_S_ANNOUNCE;
> >>>>>> +        if (n->announce) {
> >>>>> I would make it > 0 here, just in case it becomes negative as a result
> >>>>> of some bug.
> >>>> Ok.
> >>>>>> +            timer_mod(n->announce_timer,
> >>>>>> +                      qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 50 +
> >>>>>> +                      (VIRTIO_NET_ANNOUNCE_ROUNDS - n->announce - 1) * 100);
> >>>>> savevm.c has this code, and a comment:
> >>>>>         /* delay 50ms, 150ms, 250ms, ... */
> >>>>> 	50 + (SELF_ANNOUNCE_ROUNDS - count - 1) * 100
> >>>>>
> >>>>> how about an API in include/migration/vmstate.h
> >>>>>
> >>>>> static inline
> >>>>> int64_t self_announce_delay(int round)
> >>>>> {
> >>>>> 	assert(round < SELF_ANNOUNCE_ROUNDS && round > 0);
> >>>>>         /* delay 50ms, 150ms, 250ms, ... */
> >>>>> 	return 50 + (SELF_ANNOUNCE_ROUNDS - round - 1) * 100;
> >>>>> }
> >>>>>
> >>>>> or something to this end.
> >>>>>
> >>>> Good idea, will do this.
> >>>>>> +        }
> >>>>>> +        return VIRTIO_NET_OK;
> >>>>>> +    } else {
> >>>>>> +        return VIRTIO_NET_ERR;
> >>>>>> +    }
> >>>>>> +}
> >>>>>> +
> >>>>>>  static int virtio_net_handle_mq(VirtIONet *n, uint8_t cmd,
> >>>>>>                                  struct iovec *iov, unsigned int iov_cnt)
> >>>>>>  {
> >>>>>> @@ -794,6 +835,8 @@ static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
> >>>>>>              status = virtio_net_handle_mac(n, ctrl.cmd, iov, iov_cnt);
> >>>>>>          } else if (ctrl.class == VIRTIO_NET_CTRL_VLAN) {
> >>>>>>              status = virtio_net_handle_vlan_table(n, ctrl.cmd, iov, iov_cnt);
> >>>>>> +        } else if (ctrl.class == VIRTIO_NET_CTRL_ANNOUNCE) {
> >>>>>> +            status = virtio_net_handle_announce(n, ctrl.cmd, iov, iov_cnt);
> >>>>>>          } else if (ctrl.class == VIRTIO_NET_CTRL_MQ) {
> >>>>>>              status = virtio_net_handle_mq(n, ctrl.cmd, iov, iov_cnt);
> >>>>>>          } else if (ctrl.class == VIRTIO_NET_CTRL_GUEST_OFFLOADS) {
> >>>>>> @@ -1451,6 +1494,7 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
> >>>>>>          qemu_get_subqueue(n->nic, i)->link_down = link_down;
> >>>>>>      }
> >>>>>>  
> >>>>>> +    n->announce = VIRTIO_NET_ANNOUNCE_ROUNDS;
> >>>>> Well if virtio_net_handle_announce runs now it will start timer
> >>>>> in the past, right?
> >>>>> This seems wrong.
> >>>> Not sure I get the case. When in virtio_net_load() the vm is not even
> >>>> running so looks like virtio_net_handle_announce() could not run in the
> >>>> same time.
> >>> I see, this works because you decrement it when VM starts running.
> >>> I think it's not a good idea to rely on this though,
> >>> better do everything from timer, and handle
> >>> the case of command arriving too early.
> >>>
> >> Right, if QEMU_CLOCK_VIRTUAL is fine, we can do everything in a timer.
> >>
> >> For the case of command arriving too early. Is there anything else we
> >> need to do? Since we only start the next timer when we get ack command.
> >>
> >> Thanks
> > I think we need to make sure we don't set the timer in the past or
> > very far in the future.
> 
> n->announce is between 0 and VIRTIO_NET_ANNOUNCE_ROUNDS - 1, when doing
> timer_mod(), so it looks ok.

All I am saying, make sure it's within the range even if
handle announce is called before timer_mod.
Can not happen with your current patch, but must make sure
it's still correct after you make changes :)
Also, maybe add an assert there.


> There's another case that if vm was stopped
> for a long time, the timer will also be delayed, but it's still safe in
> this case.
Jason Wang May 20, 2014, 6:06 a.m. UTC | #8
On 05/19/2014 06:06 PM, Michael S. Tsirkin wrote:
> On Mon, May 19, 2014 at 05:34:38PM +0800, Jason Wang wrote:
>> > On 05/18/2014 05:04 PM, Michael S. Tsirkin wrote:
>>> > > On Fri, May 16, 2014 at 01:02:51PM +0800, Jason Wang wrote:
>>>> > >> On 05/15/2014 05:45 PM, Michael S. Tsirkin wrote:
>>>>> > >>> On Thu, May 15, 2014 at 05:22:28PM +0800, Jason Wang wrote:
>>>>>> > >>>> On 05/15/2014 04:28 PM, Michael S. Tsirkin wrote:
>>>>>>> > >>>>> Thanks, looks good.
>>>>>>> > >>>>> Some minor comments below,
>>>>>>> > >>>>>
>>>>>>> > >>>>> On Thu, May 15, 2014 at 03:16:47PM +0800, Jason Wang wrote:
>>>>>>>> > >>>>>> It's hard to track all mac addresses and their configurations (e.g
>>>>>>>> > >>>>>> vlan or ipv6) in qemu. Without those informations, it's impossible to
>>>>>>> > >>>>> s/those informations/this information/
>>>>>>> > >>>>>
>>>>>>>> > >>>>>> build proper garp packet after migration. The only possible solution
>>>>>>>> > >>>>>> to this is let guest (who knew all configurations) to do this.
>>>>>>> > >>>>> s/knew/knows/
>>>>>>> > >>>>>
>>>>>>>> > >>>>>> So, this patch introduces a new readonly config status bit of virtio-net,
>>>>>>>> > >>>>>> VIRTIO_NET_S_ANNOUNCE which is used to notify guest to announce
>>>>>>>> > >>>>>> presence of its link through config update interrupt.When guest has
>>>>>>>> > >>>>>> done the announcement, it should ack the notification through
>>>>>>>> > >>>>>> VIRTIO_NET_CTRL_ANNOUNCE_ACK cmd. This feature is negotiated by a new
>>>>>>>> > >>>>>> feature bit VIRTIO_NET_F_ANNOUNCE (which has already been supported by
>>>>>>>> > >>>>>> Linux guest).
>>>>>>>> > >>>>>>
>>>>>>>> > >>>>>> During load, a counter of announcing rounds were set so that the after
>>>>>>> > >>>>> s/were/is/
>>>>>>> > >>>>> s/the after/after/
>>>>>> > >>>> Will correct those typos.
>>>>>>>> > >>>>>> the vm is running it can trigger rounds of config interrupts to notify
>>>>>>>> > >>>>>> the guest to build and send the correct garps.
>>>>>>>> > >>>>>>
>>>>>>>> > >>>>>> Tested with ping to guest with vlan during migration. Without the
>>>>>>>> > >>>>>> patch, lots of the packets were lost after migration. With the patch,
>>>>>>>> > >>>>>> could not notice packet loss after migration.
>>>>>>> > >>>>> below changelog should go after ---, until the ack list.
>>>>>>> > >>>>>
>>>>>> > >>>> Ok.
>>>>>>>> > >>>>>> Reference:
>>>>>>>> > >>>>>> RFC v2: https://lists.gnu.org/archive/html/qemu-devel/2014-04/msg01750.html
>>>>>>>> > >>>>>> RFC v1: https://lists.gnu.org/archive/html/qemu-devel/2014-03/msg02648.html
>>>>>>>> > >>>>>> V7:     https://lists.gnu.org/archive/html/qemu-devel/2013-03/msg01127.html
>>>>>>>> > >>>>>>
>>>>>>>> > >>>>>> Changes from RFC v2:
>>>>>>>> > >>>>>> - use QEMU_CLOCK_VIRTUAL instead of QEMU_CLOCK_REALTIME
>>>>>>>> > >>>>>> - compat self announce for 2.0 machine type
>>>>>>>> > >>>>>>
>>>>>>>> > >>>>>> Changes from RFC v1:
>>>>>>>> > >>>>>> - clean VIRTIO_NET_S_ANNOUNCE bit during reset
>>>>>>>> > >>>>>> - free announce timer during clean
>>>>>>>> > >>>>>> - make announce work for non-vhost case
>>>>>>>> > >>>>>>
>>>>>>>> > >>>>>> Changes from V7:
>>>>>>>> > >>>>>> - Instead of introducing a global method for each kind of nic, this
>>>>>>>> > >>>>>>   version limits the changes to virtio-net itself.
>>>>>>>> > >>>>>>
>>>>>>>> > >>>>>> Cc: Liuyongan <liuyongan@huawei.com>
>>>>>>>> > >>>>>> Cc: Amos Kong <akong@redhat.com>
>>>>>>>> > >>>>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>>>>>>> > >>>>>> ---
>>>>>>>> > >>>>>>  hw/net/virtio-net.c            |   48 ++++++++++++++++++++++++++++++++++++++++
>>>>>>>> > >>>>>>  include/hw/i386/pc.h           |    5 ++++
>>>>>>>> > >>>>>>  include/hw/virtio/virtio-net.h |   16 +++++++++++++
>>>>>>>> > >>>>>>  3 files changed, 69 insertions(+), 0 deletions(-)
>>>>>>>> > >>>>>>
>>>>>>>> > >>>>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>>>>>>>> > >>>>>> index 940a7cf..98d59e9 100644
>>>>>>>> > >>>>>> --- a/hw/net/virtio-net.c
>>>>>>>> > >>>>>> +++ b/hw/net/virtio-net.c
>>>>>>>> > >>>>>> @@ -25,6 +25,7 @@
>>>>>>>> > >>>>>>  #include "monitor/monitor.h"
>>>>>>>> > >>>>>>  
>>>>>>>> > >>>>>>  #define VIRTIO_NET_VM_VERSION    11
>>>>>>>> > >>>>>> +#define VIRTIO_NET_ANNOUNCE_ROUNDS    3
>>>>>>>> > >>>>>>  
>>>>>>>> > >>>>>>  #define MAC_TABLE_ENTRIES    64
>>>>>>>> > >>>>>>  #define MAX_VLAN    (1 << 12)   /* Per 802.1Q definition */
>>>>>>> > >>>>> I would make it  5 to be consistent with SELF_ANNOUNCE_ROUNDS
>>>>>>> > >>>>> in savevm.c
>>>>>>> > >>>>>
>>>>>>> > >>>>> in fact, let's move SELF_ANNOUNCE_ROUNDS to include/migration/vmstate.h
>>>>>>> > >>>>> and reuse it.
>>>>>> > >>>> Ok.
>>>>>>>> > >>>>>> @@ -99,6 +100,25 @@ static bool virtio_net_started(VirtIONet *n, uint8_t status)
>>>>>>>> > >>>>>>          (n->status & VIRTIO_NET_S_LINK_UP) && vdev->vm_running;
>>>>>>>> > >>>>>>  }
>>>>>>>> > >>>>>>  
>>>>>>>> > >>>>>> +static void virtio_net_announce_timer(void *opaque)
>>>>>>>> > >>>>>> +{
>>>>>>>> > >>>>>> +    VirtIONet *n = opaque;
>>>>>>>> > >>>>>> +    VirtIODevice *vdev = VIRTIO_DEVICE(n);
>>>>>>>> > >>>>>> +
>>>>>>>> > >>>>>> +    if (n->announce &&
>>>>>>> > >>>>> I would make it > 0 here, just in case it becomes negative as a result
>>>>>>> > >>>>> of some bug.
>>>>>> > >>>> Sure.
>>>>>>>> > >>>>>> +        virtio_net_started(n, vdev->status) &&
>>>>>>>> > >>>>>> +        vdev->guest_features & (0x1 << VIRTIO_NET_F_GUEST_ANNOUNCE) &&
>>>>>>>> > >>>>>> +        vdev->guest_features & (0x1 << VIRTIO_NET_F_CTRL_VQ)) {
>>>>>>>> > >>>>>> +
>>>>>>>> > >>>>>> +        n->announce--;
>>>>>>>> > >>>>>> +        n->status |= VIRTIO_NET_S_ANNOUNCE;
>>>>>>>> > >>>>>> +        virtio_notify_config(vdev);
>>>>>>>> > >>>>>> +    } else {
>>>>>>>> > >>>>>> +        timer_del(n->announce_timer);
>>>>>>> > >>>>> why do this here?
>>>>>>> > >>>>>
>>>>>>>> > >>>>>> +        n->announce = 0;
>>>>>>> > >>>>> why is this here?
>>>>>>> > >>>>>
>>>>>> > >>>> If guest shuts down the device, there's no need to do the announcing.
>>>>> > >>> It's still weird.
>>>>> > >>> Either announce is 0 and then timer was not running
>>>>> > >>> or it's > 0 and then this won't trigger.
>>>> > >> Right, the logic here is for QEMU_CLOCK_REALTIME. But there's another
>>>> > >> question, we use QEMU_CLOCK_VIRTUAL while qemu is using
>>>> > >> QEMU_CLOCK_REALTIME for its announcing. This looks fine but not sure
>>>> > >> whether this is safe.
>>> > > Meaning QEMU_CLOCK_REALTIME that qemu uses?
>>> > > Not sure either but it doesn't modify guest state so it seems safe.
>>> > >
>>> > >
>>>> > >> And if the link was down, it's better for us to
>>>> > >> stop the announcing?
>>> > > I think that it doesn't matter: guest won't announce when
>>> > > link is down, anyway.
>>> > > Not worth it to write extra logic here in the host.
>> > 
>> > Ok.
>>>>>>>> > >>>>>> +    }
>>>>>>>> > >>>>>> +}
>>>>>>>> > >>>>>> +
>>>>>>>> > >>>>>>  static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
>>>>>>>> > >>>>>>  {
>>>>>>>> > >>>>>>      VirtIODevice *vdev = VIRTIO_DEVICE(n);
>>>>>>>> > >>>>>> @@ -147,6 +167,8 @@ static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t status)
>>>>>>>> > >>>>>>  
>>>>>>>> > >>>>>>      virtio_net_vhost_status(n, status);
>>>>>>>> > >>>>>>  
>>>>>>>> > >>>>>> +    virtio_net_announce_timer(n);
>>>>>>>> > >>>>>> +
>>>>>>> > >>>>> why do this here? why not right after we set announce counter?
>>>>>> > >>>> The reasons are:
>>>>>> > >>>>
>>>>>> > >>>> - The counters were set in load, but the device is not running so we
>>>>>> > >>>> could not inject the interrupt at that time.
>>>>> > >>> I see. This makes sense - but this isn't intuitive.
>>>>> > >>> Why don't we simply start timer with current time?
>>>>> > >>> Need to make sure it runs fine if time passes, but
>>>>> > >>> I think it's fine.
>>>> > >> Not sure I get the point, I didn't see any differences except for an
>>>> > >> extra timer fire.
>>> > > The only reason you call virtio_net_announce_timer from  set_status
>>> > > is because it gets run on vm start/stop.
>>> > > It's true but not intuitive.
>>> > > Just run timer always from timer, it's clearer this way :)
>>> > >
>> > 
>> > Sure.
>>>>>> > >>>> - We can stop the progress when guest is shutting down the device.
>>>>> > >>> On shut down guest will reset device stopping timer - this seems enough.
>>>> > >> Yes, I see.
>>>>>>>> > >>>>>>      for (i = 0; i < n->max_queues; i++) {
>>>>>>>> > >>>>>>          q = &n->vqs[i];
>>>>>>>> > >>>>>>  
>>>>>>>> > >>>>>> @@ -322,6 +344,9 @@ static void virtio_net_reset(VirtIODevice *vdev)
>>>>>>>> > >>>>>>      n->nobcast = 0;
>>>>>>>> > >>>>>>      /* multiqueue is disabled by default */
>>>>>>>> > >>>>>>      n->curr_queues = 1;
>>>>>>>> > >>>>>> +    timer_del(n->announce_timer);
>>>>>>>> > >>>>>> +    n->announce = 0;
>>>>>>>> > >>>>>> +    n->status &= ~VIRTIO_NET_S_ANNOUNCE;
>>>>>>>> > >>>>>>  
>>>>>>>> > >>>>>>      /* Flush any MAC and VLAN filter table state */
>>>>>>>> > >>>>>>      n->mac_table.in_use = 0;
>>>>>>>> > >>>>>> @@ -731,6 +756,22 @@ static int virtio_net_handle_vlan_table(VirtIONet *n, uint8_t cmd,
>>>>>>>> > >>>>>>      return VIRTIO_NET_OK;
>>>>>>>> > >>>>>>  }
>>>>>>>> > >>>>>>  
>>>>>>>> > >>>>>> +static int virtio_net_handle_announce(VirtIONet *n, uint8_t cmd,
>>>>>>>> > >>>>>> +                                      struct iovec *iov, unsigned int iov_cnt)
>>>>>>>> > >>>>>> +{
>>>>>>>> > >>>>>> +    if (cmd == VIRTIO_NET_CTRL_ANNOUNCE_ACK) {
>>>>>>>> > >>>>>> +        n->status &= ~VIRTIO_NET_S_ANNOUNCE;
>>>>>>>> > >>>>>> +        if (n->announce) {
>>>>>>> > >>>>> I would make it > 0 here, just in case it becomes negative as a result
>>>>>>> > >>>>> of some bug.
>>>>>> > >>>> Ok.
>>>>>>>> > >>>>>> +            timer_mod(n->announce_timer,
>>>>>>>> > >>>>>> +                      qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 50 +
>>>>>>>> > >>>>>> +                      (VIRTIO_NET_ANNOUNCE_ROUNDS - n->announce - 1) * 100);
>>>>>>> > >>>>> savevm.c has this code, and a comment:
>>>>>>> > >>>>>         /* delay 50ms, 150ms, 250ms, ... */
>>>>>>> > >>>>> 	50 + (SELF_ANNOUNCE_ROUNDS - count - 1) * 100
>>>>>>> > >>>>>
>>>>>>> > >>>>> how about an API in include/migration/vmstate.h
>>>>>>> > >>>>>
>>>>>>> > >>>>> static inline
>>>>>>> > >>>>> int64_t self_announce_delay(int round)
>>>>>>> > >>>>> {
>>>>>>> > >>>>> 	assert(round < SELF_ANNOUNCE_ROUNDS && round > 0);
>>>>>>> > >>>>>         /* delay 50ms, 150ms, 250ms, ... */
>>>>>>> > >>>>> 	return 50 + (SELF_ANNOUNCE_ROUNDS - round - 1) * 100;
>>>>>>> > >>>>> }
>>>>>>> > >>>>>
>>>>>>> > >>>>> or something to this end.
>>>>>>> > >>>>>
>>>>>> > >>>> Good idea, will do this.
>>>>>>>> > >>>>>> +        }
>>>>>>>> > >>>>>> +        return VIRTIO_NET_OK;
>>>>>>>> > >>>>>> +    } else {
>>>>>>>> > >>>>>> +        return VIRTIO_NET_ERR;
>>>>>>>> > >>>>>> +    }
>>>>>>>> > >>>>>> +}
>>>>>>>> > >>>>>> +
>>>>>>>> > >>>>>>  static int virtio_net_handle_mq(VirtIONet *n, uint8_t cmd,
>>>>>>>> > >>>>>>                                  struct iovec *iov, unsigned int iov_cnt)
>>>>>>>> > >>>>>>  {
>>>>>>>> > >>>>>> @@ -794,6 +835,8 @@ static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
>>>>>>>> > >>>>>>              status = virtio_net_handle_mac(n, ctrl.cmd, iov, iov_cnt);
>>>>>>>> > >>>>>>          } else if (ctrl.class == VIRTIO_NET_CTRL_VLAN) {
>>>>>>>> > >>>>>>              status = virtio_net_handle_vlan_table(n, ctrl.cmd, iov, iov_cnt);
>>>>>>>> > >>>>>> +        } else if (ctrl.class == VIRTIO_NET_CTRL_ANNOUNCE) {
>>>>>>>> > >>>>>> +            status = virtio_net_handle_announce(n, ctrl.cmd, iov, iov_cnt);
>>>>>>>> > >>>>>>          } else if (ctrl.class == VIRTIO_NET_CTRL_MQ) {
>>>>>>>> > >>>>>>              status = virtio_net_handle_mq(n, ctrl.cmd, iov, iov_cnt);
>>>>>>>> > >>>>>>          } else if (ctrl.class == VIRTIO_NET_CTRL_GUEST_OFFLOADS) {
>>>>>>>> > >>>>>> @@ -1451,6 +1494,7 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
>>>>>>>> > >>>>>>          qemu_get_subqueue(n->nic, i)->link_down = link_down;
>>>>>>>> > >>>>>>      }
>>>>>>>> > >>>>>>  
>>>>>>>> > >>>>>> +    n->announce = VIRTIO_NET_ANNOUNCE_ROUNDS;
>>>>>>> > >>>>> Well if virtio_net_handle_announce runs now it will start timer
>>>>>>> > >>>>> in the past, right?
>>>>>>> > >>>>> This seems wrong.
>>>>>> > >>>> Not sure I get the case. When in virtio_net_load() the vm is not even
>>>>>> > >>>> running so looks like virtio_net_handle_announce() could not run in the
>>>>>> > >>>> same time.
>>>>> > >>> I see, this works because you decrement it when VM starts running.
>>>>> > >>> I think it's not a good idea to rely on this though,
>>>>> > >>> better do everything from timer, and handle
>>>>> > >>> the case of command arriving too early.
>>>>> > >>>
>>>> > >> Right, if QEMU_CLOCK_VIRTUAL is fine, we can do everything in a timer.
>>>> > >>
>>>> > >> For the case of command arriving too early. Is there anything else we
>>>> > >> need to do? Since we only start the next timer when we get ack command.
>>>> > >>
>>>> > >> Thanks
>>> > > I think we need to make sure we don't set the timer in the past or
>>> > > very far in the future.
>> > 
>> > n->announce is between 0 and VIRTIO_NET_ANNOUNCE_ROUNDS - 1, when doing
>> > timer_mod(), so it looks ok.
> All I am saying, make sure it's within the range even if
> handle announce is called before timer_mod.
> Can not happen with your current patch, but must make sure
> it's still correct after you make changes :)
> Also, maybe add an assert there.
>

I get your meaning. Will post V2.

Thanks
diff mbox

Patch

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 940a7cf..98d59e9 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -25,6 +25,7 @@ 
 #include "monitor/monitor.h"
 
 #define VIRTIO_NET_VM_VERSION    11
+#define VIRTIO_NET_ANNOUNCE_ROUNDS    3
 
 #define MAC_TABLE_ENTRIES    64
 #define MAX_VLAN    (1 << 12)   /* Per 802.1Q definition */
@@ -99,6 +100,25 @@  static bool virtio_net_started(VirtIONet *n, uint8_t status)
         (n->status & VIRTIO_NET_S_LINK_UP) && vdev->vm_running;
 }
 
+static void virtio_net_announce_timer(void *opaque)
+{
+    VirtIONet *n = opaque;
+    VirtIODevice *vdev = VIRTIO_DEVICE(n);
+
+    if (n->announce &&
+        virtio_net_started(n, vdev->status) &&
+        vdev->guest_features & (0x1 << VIRTIO_NET_F_GUEST_ANNOUNCE) &&
+        vdev->guest_features & (0x1 << VIRTIO_NET_F_CTRL_VQ)) {
+
+        n->announce--;
+        n->status |= VIRTIO_NET_S_ANNOUNCE;
+        virtio_notify_config(vdev);
+    } else {
+        timer_del(n->announce_timer);
+        n->announce = 0;
+    }
+}
+
 static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
 {
     VirtIODevice *vdev = VIRTIO_DEVICE(n);
@@ -147,6 +167,8 @@  static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t status)
 
     virtio_net_vhost_status(n, status);
 
+    virtio_net_announce_timer(n);
+
     for (i = 0; i < n->max_queues; i++) {
         q = &n->vqs[i];
 
@@ -322,6 +344,9 @@  static void virtio_net_reset(VirtIODevice *vdev)
     n->nobcast = 0;
     /* multiqueue is disabled by default */
     n->curr_queues = 1;
+    timer_del(n->announce_timer);
+    n->announce = 0;
+    n->status &= ~VIRTIO_NET_S_ANNOUNCE;
 
     /* Flush any MAC and VLAN filter table state */
     n->mac_table.in_use = 0;
@@ -731,6 +756,22 @@  static int virtio_net_handle_vlan_table(VirtIONet *n, uint8_t cmd,
     return VIRTIO_NET_OK;
 }
 
+static int virtio_net_handle_announce(VirtIONet *n, uint8_t cmd,
+                                      struct iovec *iov, unsigned int iov_cnt)
+{
+    if (cmd == VIRTIO_NET_CTRL_ANNOUNCE_ACK) {
+        n->status &= ~VIRTIO_NET_S_ANNOUNCE;
+        if (n->announce) {
+            timer_mod(n->announce_timer,
+                      qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 50 +
+                      (VIRTIO_NET_ANNOUNCE_ROUNDS - n->announce - 1) * 100);
+        }
+        return VIRTIO_NET_OK;
+    } else {
+        return VIRTIO_NET_ERR;
+    }
+}
+
 static int virtio_net_handle_mq(VirtIONet *n, uint8_t cmd,
                                 struct iovec *iov, unsigned int iov_cnt)
 {
@@ -794,6 +835,8 @@  static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
             status = virtio_net_handle_mac(n, ctrl.cmd, iov, iov_cnt);
         } else if (ctrl.class == VIRTIO_NET_CTRL_VLAN) {
             status = virtio_net_handle_vlan_table(n, ctrl.cmd, iov, iov_cnt);
+        } else if (ctrl.class == VIRTIO_NET_CTRL_ANNOUNCE) {
+            status = virtio_net_handle_announce(n, ctrl.cmd, iov, iov_cnt);
         } else if (ctrl.class == VIRTIO_NET_CTRL_MQ) {
             status = virtio_net_handle_mq(n, ctrl.cmd, iov, iov_cnt);
         } else if (ctrl.class == VIRTIO_NET_CTRL_GUEST_OFFLOADS) {
@@ -1451,6 +1494,7 @@  static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
         qemu_get_subqueue(n->nic, i)->link_down = link_down;
     }
 
+    n->announce = VIRTIO_NET_ANNOUNCE_ROUNDS;
     return 0;
 }
 
@@ -1562,6 +1606,8 @@  static void virtio_net_device_realize(DeviceState *dev, Error **errp)
     qemu_macaddr_default_if_unset(&n->nic_conf.macaddr);
     memcpy(&n->mac[0], &n->nic_conf.macaddr, sizeof(n->mac));
     n->status = VIRTIO_NET_S_LINK_UP;
+    n->announce_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
+                                     virtio_net_announce_timer, n);
 
     if (n->netclient_type) {
         /*
@@ -1642,6 +1688,8 @@  static void virtio_net_device_unrealize(DeviceState *dev, Error **errp)
         }
     }
 
+    timer_del(n->announce_timer);
+    timer_free(n->announce_timer);
     g_free(n->vqs);
     qemu_del_nic(n->nic);
     virtio_cleanup(vdev);
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 32a7687..f93b427 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -271,6 +271,11 @@  bool e820_get_entry(int, uint32_t, uint64_t *, uint64_t *);
             .driver   = "apic",\
             .property = "version",\
             .value    = stringify(0x11),\
+        },\
+        {\
+            .driver   = "virtio-net-pci",\
+            .property = "guest_announce",\
+            .value    = "off",\
         }
 
 #define PC_COMPAT_1_7 \
diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index 4b32440..ca1a9c5 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -49,12 +49,14 @@ 
 #define VIRTIO_NET_F_CTRL_RX    18      /* Control channel RX mode support */
 #define VIRTIO_NET_F_CTRL_VLAN  19      /* Control channel VLAN filtering */
 #define VIRTIO_NET_F_CTRL_RX_EXTRA 20   /* Extra RX mode control support */
+#define VIRTIO_NET_F_GUEST_ANNOUNCE 21  /* Guest can announce itself */
 #define VIRTIO_NET_F_MQ         22      /* Device supports Receive Flow
                                          * Steering */
 
 #define VIRTIO_NET_F_CTRL_MAC_ADDR   23 /* Set MAC address */
 
 #define VIRTIO_NET_S_LINK_UP    1       /* Link is up */
+#define VIRTIO_NET_S_ANNOUNCE   2       /* Announcement is needed */
 
 #define TX_TIMER_INTERVAL 150000 /* 150 us */
 
@@ -193,6 +195,8 @@  typedef struct VirtIONet {
     char *netclient_name;
     char *netclient_type;
     uint64_t curr_guest_offloads;
+    QEMUTimer *announce_timer;
+    int announce;
 } VirtIONet;
 
 #define VIRTIO_NET_CTRL_MAC    1
@@ -213,6 +217,17 @@  typedef struct VirtIONet {
  #define VIRTIO_NET_CTRL_VLAN_DEL             1
 
 /*
+ * Control link announce acknowledgement
+ *
+ * The command VIRTIO_NET_CTRL_ANNOUNCE_ACK is used to indicate that
+ * driver has recevied the notification and device would clear the
+ * VIRTIO_NET_S_ANNOUNCE bit in the status filed after it received
+ * this command.
+ */
+#define VIRTIO_NET_CTRL_ANNOUNCE       3
+ #define VIRTIO_NET_CTRL_ANNOUNCE_ACK         0
+
+/*
  * Control Multiqueue
  *
  * The command VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET
@@ -251,6 +266,7 @@  struct virtio_net_ctrl_mq {
         DEFINE_PROP_BIT("guest_tso6", _state, _field, VIRTIO_NET_F_GUEST_TSO6, true), \
         DEFINE_PROP_BIT("guest_ecn", _state, _field, VIRTIO_NET_F_GUEST_ECN, true), \
         DEFINE_PROP_BIT("guest_ufo", _state, _field, VIRTIO_NET_F_GUEST_UFO, true), \
+        DEFINE_PROP_BIT("guest_announce", _state, _field, VIRTIO_NET_F_GUEST_ANNOUNCE, true), \
         DEFINE_PROP_BIT("host_tso4", _state, _field, VIRTIO_NET_F_HOST_TSO4, true), \
         DEFINE_PROP_BIT("host_tso6", _state, _field, VIRTIO_NET_F_HOST_TSO6, true), \
         DEFINE_PROP_BIT("host_ecn", _state, _field, VIRTIO_NET_F_HOST_ECN, true), \