diff mbox

[v6] net: add support of mac-programming over macvtap in QEMU side

Message ID 1371195952-13922-1-git-send-email-akong@redhat.com
State New
Headers show

Commit Message

Amos Kong June 14, 2013, 7:45 a.m. UTC
Currently macvtap based macvlan device is working in promiscuous
mode, we want to implement mac-programming over macvtap through
Libvirt for better performance.

Design:
 QEMU notifies Libvirt when rx-filter config is changed in guest,
 then Libvirt query the rx-filter information by a monitor command,
 and sync the change to macvtap device. Related rx-filter config
 of the nic contains main mac, rx-mode items and vlan table.

This patch adds a QMP event to notify management of rx-filter change,
and adds a monitor command for management to query rx-filter
information.

Test:
 If we repeatedly add/remove vlan, and change macaddr of vlan
 interfaces in guest by a loop script.

Result:
 The events will flood the QMP client(management), management takes
 too much resource to process the events.

 Event_throttle API (set rate to 1 ms) can avoid the events to flood
 QMP client, but it could cause an unexpected delay (~1ms), guests
 guests normally expect rx-filter updates immediately.

 So we use a flag for each nic to avoid events flooding, the event
 is emitted once until the query command is executed. The flag
 implementation could not introduce unexpected delay.

There maybe exist an uncontrollable delay if we let Libvirt do the
real change, guests normally expect rx-filter updates immediately.
But it's another separate issue, we can investigate it when the
work in Libvirt side is done.

Signed-off-by: Amos Kong <akong@redhat.com>
---
v2: add argument to filter mac-table info of single nic (Stefan)
    update the document, add event notification
v3: rename to rx-filter, add main mac, avoid events flooding (MST)
    fix error process (Stefan), fix qmp interface (Eric)
v4: process qerror in hmp, cleanup (Luiz)
    set flag for each device, add device path in event, add
    helper for g_strdup_printf (MST)
    fix qmp document (Eric)
v5: add path in doc, define notify flag to unsigned (Eric)
    add vlan table (Jason), drop monitor cmd
v6: return vids list, add test results/analysis to commitlog
---
 QMP/qmp-events.txt        |  17 ++++++++
 hw/net/virtio-net.c       | 108 ++++++++++++++++++++++++++++++++++++++++++++++
 include/monitor/monitor.h |   1 +
 include/net/net.h         |   3 ++
 monitor.c                 |   1 +
 net/net.c                 |  47 ++++++++++++++++++++
 qapi-schema.json          |  75 ++++++++++++++++++++++++++++++++
 qmp-commands.hx           |  63 +++++++++++++++++++++++++++
 8 files changed, 315 insertions(+)

Comments

Luiz Capitulino June 17, 2013, 1:11 p.m. UTC | #1
On Fri, 14 Jun 2013 15:45:52 +0800
Amos Kong <akong@redhat.com> wrote:

> Currently macvtap based macvlan device is working in promiscuous
> mode, we want to implement mac-programming over macvtap through
> Libvirt for better performance.
> 
> Design:
>  QEMU notifies Libvirt when rx-filter config is changed in guest,
>  then Libvirt query the rx-filter information by a monitor command,
>  and sync the change to macvtap device. Related rx-filter config
>  of the nic contains main mac, rx-mode items and vlan table.
> 
> This patch adds a QMP event to notify management of rx-filter change,
> and adds a monitor command for management to query rx-filter
> information.
> 
> Test:
>  If we repeatedly add/remove vlan, and change macaddr of vlan
>  interfaces in guest by a loop script.
> 
> Result:
>  The events will flood the QMP client(management), management takes
>  too much resource to process the events.
> 
>  Event_throttle API (set rate to 1 ms) can avoid the events to flood

I doubt this is a valid value. Today, the three events that use the event
throttle API set the delay rate to 1000 ms.

>  QMP client, but it could cause an unexpected delay (~1ms), guests
>  guests normally expect rx-filter updates immediately.

What you mean by "immediately"? There's a round-trip to the host plus
all the stuff QEMU will execute to fulfil the request. And how did you
measure this, btw?

What you have to do is is to measure your test-case in three different
scenarios:

 1. Against upstream QEMU (ie. no patches)
 2. With the event throttle API
 3. With this patch

Only then you'll be able which is better.

> 
>  So we use a flag for each nic to avoid events flooding, the event
>  is emitted once until the query command is executed. The flag
>  implementation could not introduce unexpected delay.
> 
> There maybe exist an uncontrollable delay if we let Libvirt do the
> real change, guests normally expect rx-filter updates immediately.
> But it's another separate issue, we can investigate it when the
> work in Libvirt side is done.
> 
> Signed-off-by: Amos Kong <akong@redhat.com>
> ---
> v2: add argument to filter mac-table info of single nic (Stefan)
>     update the document, add event notification
> v3: rename to rx-filter, add main mac, avoid events flooding (MST)
>     fix error process (Stefan), fix qmp interface (Eric)
> v4: process qerror in hmp, cleanup (Luiz)
>     set flag for each device, add device path in event, add
>     helper for g_strdup_printf (MST)
>     fix qmp document (Eric)
> v5: add path in doc, define notify flag to unsigned (Eric)
>     add vlan table (Jason), drop monitor cmd
> v6: return vids list, add test results/analysis to commitlog
> ---
>  QMP/qmp-events.txt        |  17 ++++++++
>  hw/net/virtio-net.c       | 108 ++++++++++++++++++++++++++++++++++++++++++++++
>  include/monitor/monitor.h |   1 +
>  include/net/net.h         |   3 ++
>  monitor.c                 |   1 +
>  net/net.c                 |  47 ++++++++++++++++++++
>  qapi-schema.json          |  75 ++++++++++++++++++++++++++++++++
>  qmp-commands.hx           |  63 +++++++++++++++++++++++++++
>  8 files changed, 315 insertions(+)
> 
> diff --git a/QMP/qmp-events.txt b/QMP/qmp-events.txt
> index 24e804e9..39b6016 100644
> --- a/QMP/qmp-events.txt
> +++ b/QMP/qmp-events.txt
> @@ -172,6 +172,23 @@ Data:
>    },
>    "timestamp": { "seconds": 1265044230, "microseconds": 450486 } }
>  
> +NIC_RX_FILTER_CHANGED
> +-----------------
> +
> +The event is emitted once until the query command is executed,
> +the first event will always be emitted.
> +
> +Data:
> +
> +- "name": net client name (json-string)
> +- "path": device path (json-string)
> +
> +{ "event": "NIC_RX_FILTER_CHANGED",
> +  "data": { "name": "vnet0",
> +            "path": "/machine/peripheral/vnet0/virtio-backend" },
> +  "timestamp": { "seconds": 1368697518, "microseconds": 326866 } }
> +}
> +
>  RESET
>  -----
>  
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 1ea9556..4137959 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -21,6 +21,8 @@
>  #include "hw/virtio/virtio-net.h"
>  #include "net/vhost_net.h"
>  #include "hw/virtio/virtio-bus.h"
> +#include "qapi/qmp/qjson.h"
> +#include "monitor/monitor.h"
>  
>  #define VIRTIO_NET_VM_VERSION    11
>  
> @@ -192,6 +194,100 @@ static void virtio_net_set_link_status(NetClientState *nc)
>      virtio_net_set_status(vdev, vdev->status);
>  }
>  
> +static void rxfilter_notify(NetClientState *nc)
> +{
> +    QObject *event_data;
> +    VirtIONet *n = qemu_get_nic_opaque(nc);
> +
> +    if (nc->rxfilter_notify_enabled) {
> +        event_data = qobject_from_jsonf("{ 'name': %s, 'path': %s }",
> +                           n->netclient_name,
> +                           object_get_canonical_path(OBJECT(n->qdev)));
> +        monitor_protocol_event(QEVENT_NIC_RX_FILTER_CHANGED, event_data);
> +        qobject_decref(event_data);
> +
> +        /* disable event notification to avoid events flooding */
> +        nc->rxfilter_notify_enabled = 0;
> +    }
> +}
> +
> +static char *mac_strdup_printf(const uint8_t *mac)
> +{
> +    return g_strdup_printf("%.2x:%.2x:%.2x:%.2x:%.2x:%.2x", mac[0],
> +                            mac[1], mac[2], mac[3], mac[4], mac[5]);
> +}
> +
> +static RxFilterInfo *virtio_net_query_rxfilter(NetClientState *nc)
> +{
> +    VirtIONet *n = qemu_get_nic_opaque(nc);
> +    RxFilterInfo *info;
> +    strList *str_list = NULL;
> +    strList *entry;
> +    intList *int_list = NULL;
> +    intList *int_entry;
> +    int i, j;
> +
> +    info = g_malloc0(sizeof(*info));
> +    info->name = g_strdup(nc->name);
> +    info->promiscuous = n->promisc;
> +
> +    if (n->nouni) {
> +        info->unicast = RX_STATE_NONE;
> +    } else if (n->alluni) {
> +        info->unicast = RX_STATE_ALL;
> +    } else {
> +        info->unicast = RX_STATE_NORMAL;
> +    }
> +
> +    if (n->nomulti) {
> +        info->multicast = RX_STATE_NONE;
> +    } else if (n->allmulti) {
> +        info->multicast = RX_STATE_ALL;
> +    } else {
> +        info->multicast = RX_STATE_NORMAL;
> +    }
> +
> +    info->broadcast_allowed = n->nobcast;
> +    info->multicast_overflow = n->mac_table.multi_overflow;
> +    info->unicast_overflow = n->mac_table.uni_overflow;
> +
> +    info->main_mac = mac_strdup_printf(n->mac);
> +
> +    for (i = 0; i < n->mac_table.first_multi; i++) {
> +        entry = g_malloc0(sizeof(*entry));
> +        entry->value = mac_strdup_printf(n->mac_table.macs + i * ETH_ALEN);
> +        entry->next = str_list;
> +        str_list = entry;
> +    }
> +    info->unicast_table = str_list;
> +
> +    str_list = NULL;
> +    for (i = n->mac_table.first_multi; i < n->mac_table.in_use; i++) {
> +        entry = g_malloc0(sizeof(*entry));
> +        entry->value = mac_strdup_printf(n->mac_table.macs + i * ETH_ALEN);
> +        entry->next = str_list;
> +        str_list = entry;
> +    }
> +    info->multicast_table = str_list;
> +
> +    for (i = 0; i < MAX_VLAN >> 5; i++) {
> +        for (j = 0; n->vlans[i] && j < 0x1f; j++) {
> +            if (n->vlans[i] & (1U << j)) {
> +                int_entry = g_malloc0(sizeof(*int_entry));
> +                int_entry->value = (i << 5) + j;
> +                int_entry->next = int_list;
> +                int_list = int_entry;
> +            }
> +        }
> +    }
> +    info->vlan_table = int_list;
> +
> +    /* enable event notification after query */
> +    nc->rxfilter_notify_enabled = 1;
> +
> +    return info;
> +}
> +
>  static void virtio_net_reset(VirtIODevice *vdev)
>  {
>      VirtIONet *n = VIRTIO_NET(vdev);
> @@ -420,6 +516,7 @@ static int virtio_net_handle_rx_mode(VirtIONet *n, uint8_t cmd,
>  {
>      uint8_t on;
>      size_t s;
> +    NetClientState *nc = qemu_get_queue(n->nic);
>  
>      s = iov_to_buf(iov, iov_cnt, 0, &on, sizeof(on));
>      if (s != sizeof(on)) {
> @@ -442,6 +539,8 @@ static int virtio_net_handle_rx_mode(VirtIONet *n, uint8_t cmd,
>          return VIRTIO_NET_ERR;
>      }
>  
> +    rxfilter_notify(nc);
> +
>      return VIRTIO_NET_OK;
>  }
>  
> @@ -487,6 +586,7 @@ static int virtio_net_handle_mac(VirtIONet *n, uint8_t cmd,
>  {
>      struct virtio_net_ctrl_mac mac_data;
>      size_t s;
> +    NetClientState *nc = qemu_get_queue(n->nic);
>  
>      if (cmd == VIRTIO_NET_CTRL_MAC_ADDR_SET) {
>          if (iov_size(iov, iov_cnt) != sizeof(n->mac)) {
> @@ -495,6 +595,8 @@ static int virtio_net_handle_mac(VirtIONet *n, uint8_t cmd,
>          s = iov_to_buf(iov, iov_cnt, 0, &n->mac, sizeof(n->mac));
>          assert(s == sizeof(n->mac));
>          qemu_format_nic_info_str(qemu_get_queue(n->nic), n->mac);
> +        rxfilter_notify(nc);
> +
>          return VIRTIO_NET_OK;
>      }
>  
> @@ -559,6 +661,8 @@ static int virtio_net_handle_mac(VirtIONet *n, uint8_t cmd,
>          n->mac_table.multi_overflow = 1;
>      }
>  
> +    rxfilter_notify(nc);
> +
>      return VIRTIO_NET_OK;
>  }
>  
> @@ -567,6 +671,7 @@ static int virtio_net_handle_vlan_table(VirtIONet *n, uint8_t cmd,
>  {
>      uint16_t vid;
>      size_t s;
> +    NetClientState *nc = qemu_get_queue(n->nic);
>  
>      s = iov_to_buf(iov, iov_cnt, 0, &vid, sizeof(vid));
>      vid = lduw_p(&vid);
> @@ -584,6 +689,8 @@ static int virtio_net_handle_vlan_table(VirtIONet *n, uint8_t cmd,
>      else
>          return VIRTIO_NET_ERR;
>  
> +    rxfilter_notify(nc);
> +
>      return VIRTIO_NET_OK;
>  }
>  
> @@ -1312,6 +1419,7 @@ static NetClientInfo net_virtio_info = {
>      .receive = virtio_net_receive,
>          .cleanup = virtio_net_cleanup,
>      .link_status_changed = virtio_net_set_link_status,
> +    .query_rx_filter = virtio_net_query_rxfilter,
>  };
>  
>  static bool virtio_net_guest_notifier_pending(VirtIODevice *vdev, int idx)
> diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
> index 1a6cfcf..1942cc4 100644
> --- a/include/monitor/monitor.h
> +++ b/include/monitor/monitor.h
> @@ -41,6 +41,7 @@ typedef enum MonitorEvent {
>      QEVENT_BLOCK_JOB_READY,
>      QEVENT_DEVICE_DELETED,
>      QEVENT_DEVICE_TRAY_MOVED,
> +    QEVENT_NIC_RX_FILTER_CHANGED,
>      QEVENT_SUSPEND,
>      QEVENT_SUSPEND_DISK,
>      QEVENT_WAKEUP,
> diff --git a/include/net/net.h b/include/net/net.h
> index 43d85a1..30e4b04 100644
> --- a/include/net/net.h
> +++ b/include/net/net.h
> @@ -49,6 +49,7 @@ typedef ssize_t (NetReceiveIOV)(NetClientState *, const struct iovec *, int);
>  typedef void (NetCleanup) (NetClientState *);
>  typedef void (LinkStatusChanged)(NetClientState *);
>  typedef void (NetClientDestructor)(NetClientState *);
> +typedef RxFilterInfo *(QueryRxFilter)(NetClientState *);
>  
>  typedef struct NetClientInfo {
>      NetClientOptionsKind type;
> @@ -59,6 +60,7 @@ typedef struct NetClientInfo {
>      NetCanReceive *can_receive;
>      NetCleanup *cleanup;
>      LinkStatusChanged *link_status_changed;
> +    QueryRxFilter *query_rx_filter;
>      NetPoll *poll;
>  } NetClientInfo;
>  
> @@ -74,6 +76,7 @@ struct NetClientState {
>      unsigned receive_disabled : 1;
>      NetClientDestructor *destructor;
>      unsigned int queue_index;
> +    unsigned rxfilter_notify_enabled:1;
>  };
>  
>  typedef struct NICState {
> diff --git a/monitor.c b/monitor.c
> index 017411f..3b8117c 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -490,6 +490,7 @@ static const char *monitor_event_names[] = {
>      [QEVENT_BLOCK_JOB_READY] = "BLOCK_JOB_READY",
>      [QEVENT_DEVICE_DELETED] = "DEVICE_DELETED",
>      [QEVENT_DEVICE_TRAY_MOVED] = "DEVICE_TRAY_MOVED",
> +    [QEVENT_NIC_RX_FILTER_CHANGED] = "NIC_RX_FILTER_CHANGED",
>      [QEVENT_SUSPEND] = "SUSPEND",
>      [QEVENT_SUSPEND_DISK] = "SUSPEND_DISK",
>      [QEVENT_WAKEUP] = "WAKEUP",
> diff --git a/net/net.c b/net/net.c
> index 43a74e4..33abffe 100644
> --- a/net/net.c
> +++ b/net/net.c
> @@ -961,6 +961,53 @@ void print_net_client(Monitor *mon, NetClientState *nc)
>                     nc->info_str);
>  }
>  
> +RxFilterInfoList *qmp_query_rx_filter(bool has_name, const char *name,
> +                                      Error **errp)
> +{
> +    NetClientState *nc;
> +    RxFilterInfoList *filter_list = NULL, *last_entry = NULL;
> +
> +    QTAILQ_FOREACH(nc, &net_clients, next) {
> +        RxFilterInfoList *entry;
> +        RxFilterInfo *info;
> +
> +        /* only query rx-filter information of nic */
> +        if (nc->info->type != NET_CLIENT_OPTIONS_KIND_NIC) {
> +            continue;
> +        }
> +        if (has_name && strcmp(nc->name, name) != 0) {
> +            continue;
> +        }
> +
> +        if (nc->info->query_rx_filter) {
> +            info = nc->info->query_rx_filter(nc);
> +            entry = g_malloc0(sizeof(*entry));
> +            entry->value = info;
> +
> +            if (!filter_list) {
> +                filter_list = entry;
> +            } else {
> +                last_entry->next = entry;
> +            }
> +            last_entry = entry;
> +        } else if (has_name) {
> +            error_setg(errp, "net client(%s) doesn't support"
> +                       " rx-filter querying", name);
> +            break;
> +        }
> +    }
> +
> +    if (filter_list == NULL && !error_is_set(errp)) {
> +        if (has_name) {
> +            error_setg(errp, "invalid net client name: %s", name);
> +        } else {
> +            error_setg(errp, "no net client supports rx-filter querying");
> +        }
> +    }
> +
> +    return filter_list;
> +}
> +
>  void do_info_network(Monitor *mon, const QDict *qdict)
>  {
>      NetClientState *nc, *peer;
> diff --git a/qapi-schema.json b/qapi-schema.json
> index 5ad6894..a69dc17 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -3624,3 +3624,78 @@
>              '*cpuid-input-ecx': 'int',
>              'cpuid-register': 'X86CPURegister32',
>              'features': 'int' } }
> +
> +##
> +# @RxState:
> +#
> +# Packets receiving state
> +#
> +# @normal: filter assigned packets according to the mac-table
> +#
> +# @none: don't receive any assigned packet
> +#
> +# @all: receive all assigned packets
> +#
> +##
> +{ 'enum': 'RxState', 'data': [ 'normal', 'none', 'all' ] }
> +
> +##
> +# @RxFilterInfo:
> +#
> +# Rx-filter information for a nic.
> +#
> +# @name: net client name
> +#
> +# @promiscuous: whether promiscuous mode is enabled
> +#
> +# @multicast: multicast receive state
> +#
> +# @unicast: unicast receive state
> +#
> +# @broadcast-allowed: whether to receive broadcast
> +#
> +# @multicast-overflow: multicast table is overflowed or not
> +#
> +# @unicast-overflow: unicast table is overflowed or not
> +#
> +# @main-mac: the main macaddr string
> +#
> +# @vlan-table: a list of active vlan id
> +#
> +# @unicast-table: a list of unicast macaddr string
> +#
> +# @multicast-table: a list of multicast macaddr string
> +#
> +# Since 1.6
> +##
> +
> +{ 'type': 'RxFilterInfo',
> +  'data': {
> +    'name':               'str',
> +    'promiscuous':        'bool',
> +    'multicast':          'RxState',
> +    'unicast':            'RxState',
> +    'broadcast-allowed':  'bool',
> +    'multicast-overflow': 'bool',
> +    'unicast-overflow':   'bool',
> +    'main-mac':           'str',
> +    'vlan-table':         ['int'],
> +    'unicast-table':      ['str'],
> +    'multicast-table':    ['str'] }}
> +
> +##
> +# @query-rx-filter:
> +#
> +# Return rx-filter information for all nics (or for the given nic).
> +#
> +# @name: #optional net client name
> +#
> +# Returns: list of @RxFilterInfo for all nics (or for the given nic).
> +#          Returns an error if the given @name doesn't exist, or given
> +#          nic doesn't support rx-filter querying, or no net client
> +#          supports rx-filter querying
> +#
> +# Since: 1.6
> +##
> +{ 'command': 'query-rx-filter', 'data': { '*name': 'str' },
> +  'returns': ['RxFilterInfo'] }
> diff --git a/qmp-commands.hx b/qmp-commands.hx
> index 8cea5e5..aa00a94 100644
> --- a/qmp-commands.hx
> +++ b/qmp-commands.hx
> @@ -2997,3 +2997,66 @@ Example:
>  <- { "return": {} }
>  
>  EQMP
> +    {
> +        .name       = "query-rx-filter",
> +        .args_type  = "name:s?",
> +        .mhandler.cmd_new = qmp_marshal_input_query_rx_filter,
> +    },
> +
> +SQMP
> +query-rx-filter
> +---------------
> +
> +Show rx-filter information.
> +
> +Returns a json-array of rx-filter information for all nics (or for the
> +given nic), returning an error if the given nic doesn't exist, or
> +given nic doesn't support rx-filter querying, or no net client
> +supports rx-filter querying.
> +
> +The query will clear the event notification flag of each nic, then qemu
> +will start to emit event to QMP monitor.
> +
> +Each array entry contains the following:
> +
> +- "name": net client name (json-string)
> +- "promiscuous": promiscuous mode is enabled (json-bool)
> +- "multicast": multicast receive state (one of 'normal', 'none', 'all')
> +- "unicast": unicast receive state  (one of 'normal', 'none', 'all')
> +- "broadcast-allowed": allow to receive broadcast (json-bool)
> +- "multicast-overflow": multicast table is overflowed (json-bool)
> +- "unicast-overflow": unicast table is overflowed (json-bool)
> +- "main-mac": main macaddr string (json-string)
> +- "vlan-table": a json-array of active vlan id
> +- "unicast-table": a json-array of unicast macaddr string
> +- "multicast-table": a json-array of multicast macaddr string
> +
> +Example:
> +
> +-> { "execute": "query-rx-filter", "arguments": { "name": "vnet0" } }
> +<- { "return": [
> +        {
> +            "promiscuous": true,
> +            "name": "vnet0",
> +            "main-mac": "52:54:00:12:34:56",
> +            "unicast": "normal",
> +            "vlan-table": [
> +                4,
> +                0
> +            ],
> +            "unicast-table": [
> +            ],
> +            "multicast": "normal",
> +            "multicast-overflow": false,
> +            "unicast-overflow": false,
> +            "multicast-table": [
> +                "01:00:5e:00:00:01",
> +                "33:33:00:00:00:01",
> +                "33:33:ff:12:34:56"
> +            ],
> +            "broadcast-allowed": false
> +        }
> +      ]
> +   }
> +
> +EQMP
Michael S. Tsirkin June 17, 2013, 1:21 p.m. UTC | #2
On Mon, Jun 17, 2013 at 09:11:27AM -0400, Luiz Capitulino wrote:
> On Fri, 14 Jun 2013 15:45:52 +0800
> Amos Kong <akong@redhat.com> wrote:
> 
> > Currently macvtap based macvlan device is working in promiscuous
> > mode, we want to implement mac-programming over macvtap through
> > Libvirt for better performance.
> > 
> > Design:
> >  QEMU notifies Libvirt when rx-filter config is changed in guest,
> >  then Libvirt query the rx-filter information by a monitor command,
> >  and sync the change to macvtap device. Related rx-filter config
> >  of the nic contains main mac, rx-mode items and vlan table.
> > 
> > This patch adds a QMP event to notify management of rx-filter change,
> > and adds a monitor command for management to query rx-filter
> > information.
> > 
> > Test:
> >  If we repeatedly add/remove vlan, and change macaddr of vlan
> >  interfaces in guest by a loop script.
> > 
> > Result:
> >  The events will flood the QMP client(management), management takes
> >  too much resource to process the events.
> > 
> >  Event_throttle API (set rate to 1 ms) can avoid the events to flood
> 
> I doubt this is a valid value. Today, the three events that use the event
> throttle API set the delay rate to 1000 ms.
> 
> >  QMP client, but it could cause an unexpected delay (~1ms), guests
> >  guests normally expect rx-filter updates immediately.
> 
> What you mean by "immediately"? There's a round-trip to the host plus
> all the stuff QEMU will execute to fulfil the request. And how did you
> measure this, btw?
> 
> What you have to do is is to measure your test-case in three different
> scenarios:
> 
>  1. Against upstream QEMU (ie. no patches)
>  2. With the event throttle API
>  3. With this patch
> 
> Only then you'll be able which is better.


I doubt there's a lot of value in splitting this patch in two and
testing whether we can reproduce any problems with a partial patch.  The
problem of management not keeping up with event flood triggered by mac
updates by guest might be purely theoretical, but a robust solution that
does not have theoretical holes makes me sleep better at night, and it's a
trivial amount of code.

So

Acked-by: Michael S. Tsirkin <mst@redhat.com>

Any enhancements can be implemented
by additional patches, to be applied on top.

> > 
> >  So we use a flag for each nic to avoid events flooding, the event
> >  is emitted once until the query command is executed. The flag
> >  implementation could not introduce unexpected delay.
> > 
> > There maybe exist an uncontrollable delay if we let Libvirt do the
> > real change, guests normally expect rx-filter updates immediately.
> > But it's another separate issue, we can investigate it when the
> > work in Libvirt side is done.
> > 
> > Signed-off-by: Amos Kong <akong@redhat.com>
> > ---
> > v2: add argument to filter mac-table info of single nic (Stefan)
> >     update the document, add event notification
> > v3: rename to rx-filter, add main mac, avoid events flooding (MST)
> >     fix error process (Stefan), fix qmp interface (Eric)
> > v4: process qerror in hmp, cleanup (Luiz)
> >     set flag for each device, add device path in event, add
> >     helper for g_strdup_printf (MST)
> >     fix qmp document (Eric)
> > v5: add path in doc, define notify flag to unsigned (Eric)
> >     add vlan table (Jason), drop monitor cmd
> > v6: return vids list, add test results/analysis to commitlog
> > ---
> >  QMP/qmp-events.txt        |  17 ++++++++
> >  hw/net/virtio-net.c       | 108 ++++++++++++++++++++++++++++++++++++++++++++++
> >  include/monitor/monitor.h |   1 +
> >  include/net/net.h         |   3 ++
> >  monitor.c                 |   1 +
> >  net/net.c                 |  47 ++++++++++++++++++++
> >  qapi-schema.json          |  75 ++++++++++++++++++++++++++++++++
> >  qmp-commands.hx           |  63 +++++++++++++++++++++++++++
> >  8 files changed, 315 insertions(+)
> > 
> > diff --git a/QMP/qmp-events.txt b/QMP/qmp-events.txt
> > index 24e804e9..39b6016 100644
> > --- a/QMP/qmp-events.txt
> > +++ b/QMP/qmp-events.txt
> > @@ -172,6 +172,23 @@ Data:
> >    },
> >    "timestamp": { "seconds": 1265044230, "microseconds": 450486 } }
> >  
> > +NIC_RX_FILTER_CHANGED
> > +-----------------
> > +
> > +The event is emitted once until the query command is executed,
> > +the first event will always be emitted.
> > +
> > +Data:
> > +
> > +- "name": net client name (json-string)
> > +- "path": device path (json-string)
> > +
> > +{ "event": "NIC_RX_FILTER_CHANGED",
> > +  "data": { "name": "vnet0",
> > +            "path": "/machine/peripheral/vnet0/virtio-backend" },
> > +  "timestamp": { "seconds": 1368697518, "microseconds": 326866 } }
> > +}
> > +
> >  RESET
> >  -----
> >  
> > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > index 1ea9556..4137959 100644
> > --- a/hw/net/virtio-net.c
> > +++ b/hw/net/virtio-net.c
> > @@ -21,6 +21,8 @@
> >  #include "hw/virtio/virtio-net.h"
> >  #include "net/vhost_net.h"
> >  #include "hw/virtio/virtio-bus.h"
> > +#include "qapi/qmp/qjson.h"
> > +#include "monitor/monitor.h"
> >  
> >  #define VIRTIO_NET_VM_VERSION    11
> >  
> > @@ -192,6 +194,100 @@ static void virtio_net_set_link_status(NetClientState *nc)
> >      virtio_net_set_status(vdev, vdev->status);
> >  }
> >  
> > +static void rxfilter_notify(NetClientState *nc)
> > +{
> > +    QObject *event_data;
> > +    VirtIONet *n = qemu_get_nic_opaque(nc);
> > +
> > +    if (nc->rxfilter_notify_enabled) {
> > +        event_data = qobject_from_jsonf("{ 'name': %s, 'path': %s }",
> > +                           n->netclient_name,
> > +                           object_get_canonical_path(OBJECT(n->qdev)));
> > +        monitor_protocol_event(QEVENT_NIC_RX_FILTER_CHANGED, event_data);
> > +        qobject_decref(event_data);
> > +
> > +        /* disable event notification to avoid events flooding */
> > +        nc->rxfilter_notify_enabled = 0;
> > +    }
> > +}
> > +
> > +static char *mac_strdup_printf(const uint8_t *mac)
> > +{
> > +    return g_strdup_printf("%.2x:%.2x:%.2x:%.2x:%.2x:%.2x", mac[0],
> > +                            mac[1], mac[2], mac[3], mac[4], mac[5]);
> > +}
> > +
> > +static RxFilterInfo *virtio_net_query_rxfilter(NetClientState *nc)
> > +{
> > +    VirtIONet *n = qemu_get_nic_opaque(nc);
> > +    RxFilterInfo *info;
> > +    strList *str_list = NULL;
> > +    strList *entry;
> > +    intList *int_list = NULL;
> > +    intList *int_entry;
> > +    int i, j;
> > +
> > +    info = g_malloc0(sizeof(*info));
> > +    info->name = g_strdup(nc->name);
> > +    info->promiscuous = n->promisc;
> > +
> > +    if (n->nouni) {
> > +        info->unicast = RX_STATE_NONE;
> > +    } else if (n->alluni) {
> > +        info->unicast = RX_STATE_ALL;
> > +    } else {
> > +        info->unicast = RX_STATE_NORMAL;
> > +    }
> > +
> > +    if (n->nomulti) {
> > +        info->multicast = RX_STATE_NONE;
> > +    } else if (n->allmulti) {
> > +        info->multicast = RX_STATE_ALL;
> > +    } else {
> > +        info->multicast = RX_STATE_NORMAL;
> > +    }
> > +
> > +    info->broadcast_allowed = n->nobcast;
> > +    info->multicast_overflow = n->mac_table.multi_overflow;
> > +    info->unicast_overflow = n->mac_table.uni_overflow;
> > +
> > +    info->main_mac = mac_strdup_printf(n->mac);
> > +
> > +    for (i = 0; i < n->mac_table.first_multi; i++) {
> > +        entry = g_malloc0(sizeof(*entry));
> > +        entry->value = mac_strdup_printf(n->mac_table.macs + i * ETH_ALEN);
> > +        entry->next = str_list;
> > +        str_list = entry;
> > +    }
> > +    info->unicast_table = str_list;
> > +
> > +    str_list = NULL;
> > +    for (i = n->mac_table.first_multi; i < n->mac_table.in_use; i++) {
> > +        entry = g_malloc0(sizeof(*entry));
> > +        entry->value = mac_strdup_printf(n->mac_table.macs + i * ETH_ALEN);
> > +        entry->next = str_list;
> > +        str_list = entry;
> > +    }
> > +    info->multicast_table = str_list;
> > +
> > +    for (i = 0; i < MAX_VLAN >> 5; i++) {
> > +        for (j = 0; n->vlans[i] && j < 0x1f; j++) {
> > +            if (n->vlans[i] & (1U << j)) {
> > +                int_entry = g_malloc0(sizeof(*int_entry));
> > +                int_entry->value = (i << 5) + j;
> > +                int_entry->next = int_list;
> > +                int_list = int_entry;
> > +            }
> > +        }
> > +    }
> > +    info->vlan_table = int_list;
> > +
> > +    /* enable event notification after query */
> > +    nc->rxfilter_notify_enabled = 1;
> > +
> > +    return info;
> > +}
> > +
> >  static void virtio_net_reset(VirtIODevice *vdev)
> >  {
> >      VirtIONet *n = VIRTIO_NET(vdev);
> > @@ -420,6 +516,7 @@ static int virtio_net_handle_rx_mode(VirtIONet *n, uint8_t cmd,
> >  {
> >      uint8_t on;
> >      size_t s;
> > +    NetClientState *nc = qemu_get_queue(n->nic);
> >  
> >      s = iov_to_buf(iov, iov_cnt, 0, &on, sizeof(on));
> >      if (s != sizeof(on)) {
> > @@ -442,6 +539,8 @@ static int virtio_net_handle_rx_mode(VirtIONet *n, uint8_t cmd,
> >          return VIRTIO_NET_ERR;
> >      }
> >  
> > +    rxfilter_notify(nc);
> > +
> >      return VIRTIO_NET_OK;
> >  }
> >  
> > @@ -487,6 +586,7 @@ static int virtio_net_handle_mac(VirtIONet *n, uint8_t cmd,
> >  {
> >      struct virtio_net_ctrl_mac mac_data;
> >      size_t s;
> > +    NetClientState *nc = qemu_get_queue(n->nic);
> >  
> >      if (cmd == VIRTIO_NET_CTRL_MAC_ADDR_SET) {
> >          if (iov_size(iov, iov_cnt) != sizeof(n->mac)) {
> > @@ -495,6 +595,8 @@ static int virtio_net_handle_mac(VirtIONet *n, uint8_t cmd,
> >          s = iov_to_buf(iov, iov_cnt, 0, &n->mac, sizeof(n->mac));
> >          assert(s == sizeof(n->mac));
> >          qemu_format_nic_info_str(qemu_get_queue(n->nic), n->mac);
> > +        rxfilter_notify(nc);
> > +
> >          return VIRTIO_NET_OK;
> >      }
> >  
> > @@ -559,6 +661,8 @@ static int virtio_net_handle_mac(VirtIONet *n, uint8_t cmd,
> >          n->mac_table.multi_overflow = 1;
> >      }
> >  
> > +    rxfilter_notify(nc);
> > +
> >      return VIRTIO_NET_OK;
> >  }
> >  
> > @@ -567,6 +671,7 @@ static int virtio_net_handle_vlan_table(VirtIONet *n, uint8_t cmd,
> >  {
> >      uint16_t vid;
> >      size_t s;
> > +    NetClientState *nc = qemu_get_queue(n->nic);
> >  
> >      s = iov_to_buf(iov, iov_cnt, 0, &vid, sizeof(vid));
> >      vid = lduw_p(&vid);
> > @@ -584,6 +689,8 @@ static int virtio_net_handle_vlan_table(VirtIONet *n, uint8_t cmd,
> >      else
> >          return VIRTIO_NET_ERR;
> >  
> > +    rxfilter_notify(nc);
> > +
> >      return VIRTIO_NET_OK;
> >  }
> >  
> > @@ -1312,6 +1419,7 @@ static NetClientInfo net_virtio_info = {
> >      .receive = virtio_net_receive,
> >          .cleanup = virtio_net_cleanup,
> >      .link_status_changed = virtio_net_set_link_status,
> > +    .query_rx_filter = virtio_net_query_rxfilter,
> >  };
> >  
> >  static bool virtio_net_guest_notifier_pending(VirtIODevice *vdev, int idx)
> > diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
> > index 1a6cfcf..1942cc4 100644
> > --- a/include/monitor/monitor.h
> > +++ b/include/monitor/monitor.h
> > @@ -41,6 +41,7 @@ typedef enum MonitorEvent {
> >      QEVENT_BLOCK_JOB_READY,
> >      QEVENT_DEVICE_DELETED,
> >      QEVENT_DEVICE_TRAY_MOVED,
> > +    QEVENT_NIC_RX_FILTER_CHANGED,
> >      QEVENT_SUSPEND,
> >      QEVENT_SUSPEND_DISK,
> >      QEVENT_WAKEUP,
> > diff --git a/include/net/net.h b/include/net/net.h
> > index 43d85a1..30e4b04 100644
> > --- a/include/net/net.h
> > +++ b/include/net/net.h
> > @@ -49,6 +49,7 @@ typedef ssize_t (NetReceiveIOV)(NetClientState *, const struct iovec *, int);
> >  typedef void (NetCleanup) (NetClientState *);
> >  typedef void (LinkStatusChanged)(NetClientState *);
> >  typedef void (NetClientDestructor)(NetClientState *);
> > +typedef RxFilterInfo *(QueryRxFilter)(NetClientState *);
> >  
> >  typedef struct NetClientInfo {
> >      NetClientOptionsKind type;
> > @@ -59,6 +60,7 @@ typedef struct NetClientInfo {
> >      NetCanReceive *can_receive;
> >      NetCleanup *cleanup;
> >      LinkStatusChanged *link_status_changed;
> > +    QueryRxFilter *query_rx_filter;
> >      NetPoll *poll;
> >  } NetClientInfo;
> >  
> > @@ -74,6 +76,7 @@ struct NetClientState {
> >      unsigned receive_disabled : 1;
> >      NetClientDestructor *destructor;
> >      unsigned int queue_index;
> > +    unsigned rxfilter_notify_enabled:1;
> >  };
> >  
> >  typedef struct NICState {
> > diff --git a/monitor.c b/monitor.c
> > index 017411f..3b8117c 100644
> > --- a/monitor.c
> > +++ b/monitor.c
> > @@ -490,6 +490,7 @@ static const char *monitor_event_names[] = {
> >      [QEVENT_BLOCK_JOB_READY] = "BLOCK_JOB_READY",
> >      [QEVENT_DEVICE_DELETED] = "DEVICE_DELETED",
> >      [QEVENT_DEVICE_TRAY_MOVED] = "DEVICE_TRAY_MOVED",
> > +    [QEVENT_NIC_RX_FILTER_CHANGED] = "NIC_RX_FILTER_CHANGED",
> >      [QEVENT_SUSPEND] = "SUSPEND",
> >      [QEVENT_SUSPEND_DISK] = "SUSPEND_DISK",
> >      [QEVENT_WAKEUP] = "WAKEUP",
> > diff --git a/net/net.c b/net/net.c
> > index 43a74e4..33abffe 100644
> > --- a/net/net.c
> > +++ b/net/net.c
> > @@ -961,6 +961,53 @@ void print_net_client(Monitor *mon, NetClientState *nc)
> >                     nc->info_str);
> >  }
> >  
> > +RxFilterInfoList *qmp_query_rx_filter(bool has_name, const char *name,
> > +                                      Error **errp)
> > +{
> > +    NetClientState *nc;
> > +    RxFilterInfoList *filter_list = NULL, *last_entry = NULL;
> > +
> > +    QTAILQ_FOREACH(nc, &net_clients, next) {
> > +        RxFilterInfoList *entry;
> > +        RxFilterInfo *info;
> > +
> > +        /* only query rx-filter information of nic */
> > +        if (nc->info->type != NET_CLIENT_OPTIONS_KIND_NIC) {
> > +            continue;
> > +        }
> > +        if (has_name && strcmp(nc->name, name) != 0) {
> > +            continue;
> > +        }
> > +
> > +        if (nc->info->query_rx_filter) {
> > +            info = nc->info->query_rx_filter(nc);
> > +            entry = g_malloc0(sizeof(*entry));
> > +            entry->value = info;
> > +
> > +            if (!filter_list) {
> > +                filter_list = entry;
> > +            } else {
> > +                last_entry->next = entry;
> > +            }
> > +            last_entry = entry;
> > +        } else if (has_name) {
> > +            error_setg(errp, "net client(%s) doesn't support"
> > +                       " rx-filter querying", name);
> > +            break;
> > +        }
> > +    }
> > +
> > +    if (filter_list == NULL && !error_is_set(errp)) {
> > +        if (has_name) {
> > +            error_setg(errp, "invalid net client name: %s", name);
> > +        } else {
> > +            error_setg(errp, "no net client supports rx-filter querying");
> > +        }
> > +    }
> > +
> > +    return filter_list;
> > +}
> > +
> >  void do_info_network(Monitor *mon, const QDict *qdict)
> >  {
> >      NetClientState *nc, *peer;
> > diff --git a/qapi-schema.json b/qapi-schema.json
> > index 5ad6894..a69dc17 100644
> > --- a/qapi-schema.json
> > +++ b/qapi-schema.json
> > @@ -3624,3 +3624,78 @@
> >              '*cpuid-input-ecx': 'int',
> >              'cpuid-register': 'X86CPURegister32',
> >              'features': 'int' } }
> > +
> > +##
> > +# @RxState:
> > +#
> > +# Packets receiving state
> > +#
> > +# @normal: filter assigned packets according to the mac-table
> > +#
> > +# @none: don't receive any assigned packet
> > +#
> > +# @all: receive all assigned packets
> > +#
> > +##
> > +{ 'enum': 'RxState', 'data': [ 'normal', 'none', 'all' ] }
> > +
> > +##
> > +# @RxFilterInfo:
> > +#
> > +# Rx-filter information for a nic.
> > +#
> > +# @name: net client name
> > +#
> > +# @promiscuous: whether promiscuous mode is enabled
> > +#
> > +# @multicast: multicast receive state
> > +#
> > +# @unicast: unicast receive state
> > +#
> > +# @broadcast-allowed: whether to receive broadcast
> > +#
> > +# @multicast-overflow: multicast table is overflowed or not
> > +#
> > +# @unicast-overflow: unicast table is overflowed or not
> > +#
> > +# @main-mac: the main macaddr string
> > +#
> > +# @vlan-table: a list of active vlan id
> > +#
> > +# @unicast-table: a list of unicast macaddr string
> > +#
> > +# @multicast-table: a list of multicast macaddr string
> > +#
> > +# Since 1.6
> > +##
> > +
> > +{ 'type': 'RxFilterInfo',
> > +  'data': {
> > +    'name':               'str',
> > +    'promiscuous':        'bool',
> > +    'multicast':          'RxState',
> > +    'unicast':            'RxState',
> > +    'broadcast-allowed':  'bool',
> > +    'multicast-overflow': 'bool',
> > +    'unicast-overflow':   'bool',
> > +    'main-mac':           'str',
> > +    'vlan-table':         ['int'],
> > +    'unicast-table':      ['str'],
> > +    'multicast-table':    ['str'] }}
> > +
> > +##
> > +# @query-rx-filter:
> > +#
> > +# Return rx-filter information for all nics (or for the given nic).
> > +#
> > +# @name: #optional net client name
> > +#
> > +# Returns: list of @RxFilterInfo for all nics (or for the given nic).
> > +#          Returns an error if the given @name doesn't exist, or given
> > +#          nic doesn't support rx-filter querying, or no net client
> > +#          supports rx-filter querying
> > +#
> > +# Since: 1.6
> > +##
> > +{ 'command': 'query-rx-filter', 'data': { '*name': 'str' },
> > +  'returns': ['RxFilterInfo'] }
> > diff --git a/qmp-commands.hx b/qmp-commands.hx
> > index 8cea5e5..aa00a94 100644
> > --- a/qmp-commands.hx
> > +++ b/qmp-commands.hx
> > @@ -2997,3 +2997,66 @@ Example:
> >  <- { "return": {} }
> >  
> >  EQMP
> > +    {
> > +        .name       = "query-rx-filter",
> > +        .args_type  = "name:s?",
> > +        .mhandler.cmd_new = qmp_marshal_input_query_rx_filter,
> > +    },
> > +
> > +SQMP
> > +query-rx-filter
> > +---------------
> > +
> > +Show rx-filter information.
> > +
> > +Returns a json-array of rx-filter information for all nics (or for the
> > +given nic), returning an error if the given nic doesn't exist, or
> > +given nic doesn't support rx-filter querying, or no net client
> > +supports rx-filter querying.
> > +
> > +The query will clear the event notification flag of each nic, then qemu
> > +will start to emit event to QMP monitor.
> > +
> > +Each array entry contains the following:
> > +
> > +- "name": net client name (json-string)
> > +- "promiscuous": promiscuous mode is enabled (json-bool)
> > +- "multicast": multicast receive state (one of 'normal', 'none', 'all')
> > +- "unicast": unicast receive state  (one of 'normal', 'none', 'all')
> > +- "broadcast-allowed": allow to receive broadcast (json-bool)
> > +- "multicast-overflow": multicast table is overflowed (json-bool)
> > +- "unicast-overflow": unicast table is overflowed (json-bool)
> > +- "main-mac": main macaddr string (json-string)
> > +- "vlan-table": a json-array of active vlan id
> > +- "unicast-table": a json-array of unicast macaddr string
> > +- "multicast-table": a json-array of multicast macaddr string
> > +
> > +Example:
> > +
> > +-> { "execute": "query-rx-filter", "arguments": { "name": "vnet0" } }
> > +<- { "return": [
> > +        {
> > +            "promiscuous": true,
> > +            "name": "vnet0",
> > +            "main-mac": "52:54:00:12:34:56",
> > +            "unicast": "normal",
> > +            "vlan-table": [
> > +                4,
> > +                0
> > +            ],
> > +            "unicast-table": [
> > +            ],
> > +            "multicast": "normal",
> > +            "multicast-overflow": false,
> > +            "unicast-overflow": false,
> > +            "multicast-table": [
> > +                "01:00:5e:00:00:01",
> > +                "33:33:00:00:00:01",
> > +                "33:33:ff:12:34:56"
> > +            ],
> > +            "broadcast-allowed": false
> > +        }
> > +      ]
> > +   }
> > +
> > +EQMP
Luiz Capitulino June 17, 2013, 1:30 p.m. UTC | #3
On Mon, 17 Jun 2013 16:21:31 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Mon, Jun 17, 2013 at 09:11:27AM -0400, Luiz Capitulino wrote:
> > On Fri, 14 Jun 2013 15:45:52 +0800
> > Amos Kong <akong@redhat.com> wrote:
> > 
> > > Currently macvtap based macvlan device is working in promiscuous
> > > mode, we want to implement mac-programming over macvtap through
> > > Libvirt for better performance.
> > > 
> > > Design:
> > >  QEMU notifies Libvirt when rx-filter config is changed in guest,
> > >  then Libvirt query the rx-filter information by a monitor command,
> > >  and sync the change to macvtap device. Related rx-filter config
> > >  of the nic contains main mac, rx-mode items and vlan table.
> > > 
> > > This patch adds a QMP event to notify management of rx-filter change,
> > > and adds a monitor command for management to query rx-filter
> > > information.
> > > 
> > > Test:
> > >  If we repeatedly add/remove vlan, and change macaddr of vlan
> > >  interfaces in guest by a loop script.
> > > 
> > > Result:
> > >  The events will flood the QMP client(management), management takes
> > >  too much resource to process the events.
> > > 
> > >  Event_throttle API (set rate to 1 ms) can avoid the events to flood
> > 
> > I doubt this is a valid value. Today, the three events that use the event
> > throttle API set the delay rate to 1000 ms.
> > 
> > >  QMP client, but it could cause an unexpected delay (~1ms), guests
> > >  guests normally expect rx-filter updates immediately.
> > 
> > What you mean by "immediately"? There's a round-trip to the host plus
> > all the stuff QEMU will execute to fulfil the request. And how did you
> > measure this, btw?
> > 
> > What you have to do is is to measure your test-case in three different
> > scenarios:
> > 
> >  1. Against upstream QEMU (ie. no patches)
> >  2. With the event throttle API
> >  3. With this patch
> > 
> > Only then you'll be able which is better.
> 
> 
> I doubt there's a lot of value in splitting this patch in two and
> testing whether we can reproduce any problems with a partial patch. 

I didn't suggest that. Actually, I didn't even talk about code.

> The
> problem of management not keeping up with event flood triggered by mac
> updates by guest might be purely theoretical,

That's not the problem the flag thing is trying to solve. We have the
event throttle API for that. What the flag is trying to solve is _guest_
latency due to event generation and it's not clear at all if this is a real
problem.

> but a robust solution that
> does not have theoretical holes makes me sleep better at night, and it's a
> trivial amount of code.

It has the drawback of coupling the query command with the event. IMO, such
a coupling has to be justified with real data.

> 
> So
> 
> Acked-by: Michael S. Tsirkin <mst@redhat.com>
> 
> Any enhancements can be implemented
> by additional patches, to be applied on top.
> 
> > > 
> > >  So we use a flag for each nic to avoid events flooding, the event
> > >  is emitted once until the query command is executed. The flag
> > >  implementation could not introduce unexpected delay.
> > > 
> > > There maybe exist an uncontrollable delay if we let Libvirt do the
> > > real change, guests normally expect rx-filter updates immediately.
> > > But it's another separate issue, we can investigate it when the
> > > work in Libvirt side is done.
> > > 
> > > Signed-off-by: Amos Kong <akong@redhat.com>
> > > ---
> > > v2: add argument to filter mac-table info of single nic (Stefan)
> > >     update the document, add event notification
> > > v3: rename to rx-filter, add main mac, avoid events flooding (MST)
> > >     fix error process (Stefan), fix qmp interface (Eric)
> > > v4: process qerror in hmp, cleanup (Luiz)
> > >     set flag for each device, add device path in event, add
> > >     helper for g_strdup_printf (MST)
> > >     fix qmp document (Eric)
> > > v5: add path in doc, define notify flag to unsigned (Eric)
> > >     add vlan table (Jason), drop monitor cmd
> > > v6: return vids list, add test results/analysis to commitlog
> > > ---
> > >  QMP/qmp-events.txt        |  17 ++++++++
> > >  hw/net/virtio-net.c       | 108 ++++++++++++++++++++++++++++++++++++++++++++++
> > >  include/monitor/monitor.h |   1 +
> > >  include/net/net.h         |   3 ++
> > >  monitor.c                 |   1 +
> > >  net/net.c                 |  47 ++++++++++++++++++++
> > >  qapi-schema.json          |  75 ++++++++++++++++++++++++++++++++
> > >  qmp-commands.hx           |  63 +++++++++++++++++++++++++++
> > >  8 files changed, 315 insertions(+)
> > > 
> > > diff --git a/QMP/qmp-events.txt b/QMP/qmp-events.txt
> > > index 24e804e9..39b6016 100644
> > > --- a/QMP/qmp-events.txt
> > > +++ b/QMP/qmp-events.txt
> > > @@ -172,6 +172,23 @@ Data:
> > >    },
> > >    "timestamp": { "seconds": 1265044230, "microseconds": 450486 } }
> > >  
> > > +NIC_RX_FILTER_CHANGED
> > > +-----------------
> > > +
> > > +The event is emitted once until the query command is executed,
> > > +the first event will always be emitted.
> > > +
> > > +Data:
> > > +
> > > +- "name": net client name (json-string)
> > > +- "path": device path (json-string)
> > > +
> > > +{ "event": "NIC_RX_FILTER_CHANGED",
> > > +  "data": { "name": "vnet0",
> > > +            "path": "/machine/peripheral/vnet0/virtio-backend" },
> > > +  "timestamp": { "seconds": 1368697518, "microseconds": 326866 } }
> > > +}
> > > +
> > >  RESET
> > >  -----
> > >  
> > > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > > index 1ea9556..4137959 100644
> > > --- a/hw/net/virtio-net.c
> > > +++ b/hw/net/virtio-net.c
> > > @@ -21,6 +21,8 @@
> > >  #include "hw/virtio/virtio-net.h"
> > >  #include "net/vhost_net.h"
> > >  #include "hw/virtio/virtio-bus.h"
> > > +#include "qapi/qmp/qjson.h"
> > > +#include "monitor/monitor.h"
> > >  
> > >  #define VIRTIO_NET_VM_VERSION    11
> > >  
> > > @@ -192,6 +194,100 @@ static void virtio_net_set_link_status(NetClientState *nc)
> > >      virtio_net_set_status(vdev, vdev->status);
> > >  }
> > >  
> > > +static void rxfilter_notify(NetClientState *nc)
> > > +{
> > > +    QObject *event_data;
> > > +    VirtIONet *n = qemu_get_nic_opaque(nc);
> > > +
> > > +    if (nc->rxfilter_notify_enabled) {
> > > +        event_data = qobject_from_jsonf("{ 'name': %s, 'path': %s }",
> > > +                           n->netclient_name,
> > > +                           object_get_canonical_path(OBJECT(n->qdev)));
> > > +        monitor_protocol_event(QEVENT_NIC_RX_FILTER_CHANGED, event_data);
> > > +        qobject_decref(event_data);
> > > +
> > > +        /* disable event notification to avoid events flooding */
> > > +        nc->rxfilter_notify_enabled = 0;
> > > +    }
> > > +}
> > > +
> > > +static char *mac_strdup_printf(const uint8_t *mac)
> > > +{
> > > +    return g_strdup_printf("%.2x:%.2x:%.2x:%.2x:%.2x:%.2x", mac[0],
> > > +                            mac[1], mac[2], mac[3], mac[4], mac[5]);
> > > +}
> > > +
> > > +static RxFilterInfo *virtio_net_query_rxfilter(NetClientState *nc)
> > > +{
> > > +    VirtIONet *n = qemu_get_nic_opaque(nc);
> > > +    RxFilterInfo *info;
> > > +    strList *str_list = NULL;
> > > +    strList *entry;
> > > +    intList *int_list = NULL;
> > > +    intList *int_entry;
> > > +    int i, j;
> > > +
> > > +    info = g_malloc0(sizeof(*info));
> > > +    info->name = g_strdup(nc->name);
> > > +    info->promiscuous = n->promisc;
> > > +
> > > +    if (n->nouni) {
> > > +        info->unicast = RX_STATE_NONE;
> > > +    } else if (n->alluni) {
> > > +        info->unicast = RX_STATE_ALL;
> > > +    } else {
> > > +        info->unicast = RX_STATE_NORMAL;
> > > +    }
> > > +
> > > +    if (n->nomulti) {
> > > +        info->multicast = RX_STATE_NONE;
> > > +    } else if (n->allmulti) {
> > > +        info->multicast = RX_STATE_ALL;
> > > +    } else {
> > > +        info->multicast = RX_STATE_NORMAL;
> > > +    }
> > > +
> > > +    info->broadcast_allowed = n->nobcast;
> > > +    info->multicast_overflow = n->mac_table.multi_overflow;
> > > +    info->unicast_overflow = n->mac_table.uni_overflow;
> > > +
> > > +    info->main_mac = mac_strdup_printf(n->mac);
> > > +
> > > +    for (i = 0; i < n->mac_table.first_multi; i++) {
> > > +        entry = g_malloc0(sizeof(*entry));
> > > +        entry->value = mac_strdup_printf(n->mac_table.macs + i * ETH_ALEN);
> > > +        entry->next = str_list;
> > > +        str_list = entry;
> > > +    }
> > > +    info->unicast_table = str_list;
> > > +
> > > +    str_list = NULL;
> > > +    for (i = n->mac_table.first_multi; i < n->mac_table.in_use; i++) {
> > > +        entry = g_malloc0(sizeof(*entry));
> > > +        entry->value = mac_strdup_printf(n->mac_table.macs + i * ETH_ALEN);
> > > +        entry->next = str_list;
> > > +        str_list = entry;
> > > +    }
> > > +    info->multicast_table = str_list;
> > > +
> > > +    for (i = 0; i < MAX_VLAN >> 5; i++) {
> > > +        for (j = 0; n->vlans[i] && j < 0x1f; j++) {
> > > +            if (n->vlans[i] & (1U << j)) {
> > > +                int_entry = g_malloc0(sizeof(*int_entry));
> > > +                int_entry->value = (i << 5) + j;
> > > +                int_entry->next = int_list;
> > > +                int_list = int_entry;
> > > +            }
> > > +        }
> > > +    }
> > > +    info->vlan_table = int_list;
> > > +
> > > +    /* enable event notification after query */
> > > +    nc->rxfilter_notify_enabled = 1;
> > > +
> > > +    return info;
> > > +}
> > > +
> > >  static void virtio_net_reset(VirtIODevice *vdev)
> > >  {
> > >      VirtIONet *n = VIRTIO_NET(vdev);
> > > @@ -420,6 +516,7 @@ static int virtio_net_handle_rx_mode(VirtIONet *n, uint8_t cmd,
> > >  {
> > >      uint8_t on;
> > >      size_t s;
> > > +    NetClientState *nc = qemu_get_queue(n->nic);
> > >  
> > >      s = iov_to_buf(iov, iov_cnt, 0, &on, sizeof(on));
> > >      if (s != sizeof(on)) {
> > > @@ -442,6 +539,8 @@ static int virtio_net_handle_rx_mode(VirtIONet *n, uint8_t cmd,
> > >          return VIRTIO_NET_ERR;
> > >      }
> > >  
> > > +    rxfilter_notify(nc);
> > > +
> > >      return VIRTIO_NET_OK;
> > >  }
> > >  
> > > @@ -487,6 +586,7 @@ static int virtio_net_handle_mac(VirtIONet *n, uint8_t cmd,
> > >  {
> > >      struct virtio_net_ctrl_mac mac_data;
> > >      size_t s;
> > > +    NetClientState *nc = qemu_get_queue(n->nic);
> > >  
> > >      if (cmd == VIRTIO_NET_CTRL_MAC_ADDR_SET) {
> > >          if (iov_size(iov, iov_cnt) != sizeof(n->mac)) {
> > > @@ -495,6 +595,8 @@ static int virtio_net_handle_mac(VirtIONet *n, uint8_t cmd,
> > >          s = iov_to_buf(iov, iov_cnt, 0, &n->mac, sizeof(n->mac));
> > >          assert(s == sizeof(n->mac));
> > >          qemu_format_nic_info_str(qemu_get_queue(n->nic), n->mac);
> > > +        rxfilter_notify(nc);
> > > +
> > >          return VIRTIO_NET_OK;
> > >      }
> > >  
> > > @@ -559,6 +661,8 @@ static int virtio_net_handle_mac(VirtIONet *n, uint8_t cmd,
> > >          n->mac_table.multi_overflow = 1;
> > >      }
> > >  
> > > +    rxfilter_notify(nc);
> > > +
> > >      return VIRTIO_NET_OK;
> > >  }
> > >  
> > > @@ -567,6 +671,7 @@ static int virtio_net_handle_vlan_table(VirtIONet *n, uint8_t cmd,
> > >  {
> > >      uint16_t vid;
> > >      size_t s;
> > > +    NetClientState *nc = qemu_get_queue(n->nic);
> > >  
> > >      s = iov_to_buf(iov, iov_cnt, 0, &vid, sizeof(vid));
> > >      vid = lduw_p(&vid);
> > > @@ -584,6 +689,8 @@ static int virtio_net_handle_vlan_table(VirtIONet *n, uint8_t cmd,
> > >      else
> > >          return VIRTIO_NET_ERR;
> > >  
> > > +    rxfilter_notify(nc);
> > > +
> > >      return VIRTIO_NET_OK;
> > >  }
> > >  
> > > @@ -1312,6 +1419,7 @@ static NetClientInfo net_virtio_info = {
> > >      .receive = virtio_net_receive,
> > >          .cleanup = virtio_net_cleanup,
> > >      .link_status_changed = virtio_net_set_link_status,
> > > +    .query_rx_filter = virtio_net_query_rxfilter,
> > >  };
> > >  
> > >  static bool virtio_net_guest_notifier_pending(VirtIODevice *vdev, int idx)
> > > diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
> > > index 1a6cfcf..1942cc4 100644
> > > --- a/include/monitor/monitor.h
> > > +++ b/include/monitor/monitor.h
> > > @@ -41,6 +41,7 @@ typedef enum MonitorEvent {
> > >      QEVENT_BLOCK_JOB_READY,
> > >      QEVENT_DEVICE_DELETED,
> > >      QEVENT_DEVICE_TRAY_MOVED,
> > > +    QEVENT_NIC_RX_FILTER_CHANGED,
> > >      QEVENT_SUSPEND,
> > >      QEVENT_SUSPEND_DISK,
> > >      QEVENT_WAKEUP,
> > > diff --git a/include/net/net.h b/include/net/net.h
> > > index 43d85a1..30e4b04 100644
> > > --- a/include/net/net.h
> > > +++ b/include/net/net.h
> > > @@ -49,6 +49,7 @@ typedef ssize_t (NetReceiveIOV)(NetClientState *, const struct iovec *, int);
> > >  typedef void (NetCleanup) (NetClientState *);
> > >  typedef void (LinkStatusChanged)(NetClientState *);
> > >  typedef void (NetClientDestructor)(NetClientState *);
> > > +typedef RxFilterInfo *(QueryRxFilter)(NetClientState *);
> > >  
> > >  typedef struct NetClientInfo {
> > >      NetClientOptionsKind type;
> > > @@ -59,6 +60,7 @@ typedef struct NetClientInfo {
> > >      NetCanReceive *can_receive;
> > >      NetCleanup *cleanup;
> > >      LinkStatusChanged *link_status_changed;
> > > +    QueryRxFilter *query_rx_filter;
> > >      NetPoll *poll;
> > >  } NetClientInfo;
> > >  
> > > @@ -74,6 +76,7 @@ struct NetClientState {
> > >      unsigned receive_disabled : 1;
> > >      NetClientDestructor *destructor;
> > >      unsigned int queue_index;
> > > +    unsigned rxfilter_notify_enabled:1;
> > >  };
> > >  
> > >  typedef struct NICState {
> > > diff --git a/monitor.c b/monitor.c
> > > index 017411f..3b8117c 100644
> > > --- a/monitor.c
> > > +++ b/monitor.c
> > > @@ -490,6 +490,7 @@ static const char *monitor_event_names[] = {
> > >      [QEVENT_BLOCK_JOB_READY] = "BLOCK_JOB_READY",
> > >      [QEVENT_DEVICE_DELETED] = "DEVICE_DELETED",
> > >      [QEVENT_DEVICE_TRAY_MOVED] = "DEVICE_TRAY_MOVED",
> > > +    [QEVENT_NIC_RX_FILTER_CHANGED] = "NIC_RX_FILTER_CHANGED",
> > >      [QEVENT_SUSPEND] = "SUSPEND",
> > >      [QEVENT_SUSPEND_DISK] = "SUSPEND_DISK",
> > >      [QEVENT_WAKEUP] = "WAKEUP",
> > > diff --git a/net/net.c b/net/net.c
> > > index 43a74e4..33abffe 100644
> > > --- a/net/net.c
> > > +++ b/net/net.c
> > > @@ -961,6 +961,53 @@ void print_net_client(Monitor *mon, NetClientState *nc)
> > >                     nc->info_str);
> > >  }
> > >  
> > > +RxFilterInfoList *qmp_query_rx_filter(bool has_name, const char *name,
> > > +                                      Error **errp)
> > > +{
> > > +    NetClientState *nc;
> > > +    RxFilterInfoList *filter_list = NULL, *last_entry = NULL;
> > > +
> > > +    QTAILQ_FOREACH(nc, &net_clients, next) {
> > > +        RxFilterInfoList *entry;
> > > +        RxFilterInfo *info;
> > > +
> > > +        /* only query rx-filter information of nic */
> > > +        if (nc->info->type != NET_CLIENT_OPTIONS_KIND_NIC) {
> > > +            continue;
> > > +        }
> > > +        if (has_name && strcmp(nc->name, name) != 0) {
> > > +            continue;
> > > +        }
> > > +
> > > +        if (nc->info->query_rx_filter) {
> > > +            info = nc->info->query_rx_filter(nc);
> > > +            entry = g_malloc0(sizeof(*entry));
> > > +            entry->value = info;
> > > +
> > > +            if (!filter_list) {
> > > +                filter_list = entry;
> > > +            } else {
> > > +                last_entry->next = entry;
> > > +            }
> > > +            last_entry = entry;
> > > +        } else if (has_name) {
> > > +            error_setg(errp, "net client(%s) doesn't support"
> > > +                       " rx-filter querying", name);
> > > +            break;
> > > +        }
> > > +    }
> > > +
> > > +    if (filter_list == NULL && !error_is_set(errp)) {
> > > +        if (has_name) {
> > > +            error_setg(errp, "invalid net client name: %s", name);
> > > +        } else {
> > > +            error_setg(errp, "no net client supports rx-filter querying");
> > > +        }
> > > +    }
> > > +
> > > +    return filter_list;
> > > +}
> > > +
> > >  void do_info_network(Monitor *mon, const QDict *qdict)
> > >  {
> > >      NetClientState *nc, *peer;
> > > diff --git a/qapi-schema.json b/qapi-schema.json
> > > index 5ad6894..a69dc17 100644
> > > --- a/qapi-schema.json
> > > +++ b/qapi-schema.json
> > > @@ -3624,3 +3624,78 @@
> > >              '*cpuid-input-ecx': 'int',
> > >              'cpuid-register': 'X86CPURegister32',
> > >              'features': 'int' } }
> > > +
> > > +##
> > > +# @RxState:
> > > +#
> > > +# Packets receiving state
> > > +#
> > > +# @normal: filter assigned packets according to the mac-table
> > > +#
> > > +# @none: don't receive any assigned packet
> > > +#
> > > +# @all: receive all assigned packets
> > > +#
> > > +##
> > > +{ 'enum': 'RxState', 'data': [ 'normal', 'none', 'all' ] }
> > > +
> > > +##
> > > +# @RxFilterInfo:
> > > +#
> > > +# Rx-filter information for a nic.
> > > +#
> > > +# @name: net client name
> > > +#
> > > +# @promiscuous: whether promiscuous mode is enabled
> > > +#
> > > +# @multicast: multicast receive state
> > > +#
> > > +# @unicast: unicast receive state
> > > +#
> > > +# @broadcast-allowed: whether to receive broadcast
> > > +#
> > > +# @multicast-overflow: multicast table is overflowed or not
> > > +#
> > > +# @unicast-overflow: unicast table is overflowed or not
> > > +#
> > > +# @main-mac: the main macaddr string
> > > +#
> > > +# @vlan-table: a list of active vlan id
> > > +#
> > > +# @unicast-table: a list of unicast macaddr string
> > > +#
> > > +# @multicast-table: a list of multicast macaddr string
> > > +#
> > > +# Since 1.6
> > > +##
> > > +
> > > +{ 'type': 'RxFilterInfo',
> > > +  'data': {
> > > +    'name':               'str',
> > > +    'promiscuous':        'bool',
> > > +    'multicast':          'RxState',
> > > +    'unicast':            'RxState',
> > > +    'broadcast-allowed':  'bool',
> > > +    'multicast-overflow': 'bool',
> > > +    'unicast-overflow':   'bool',
> > > +    'main-mac':           'str',
> > > +    'vlan-table':         ['int'],
> > > +    'unicast-table':      ['str'],
> > > +    'multicast-table':    ['str'] }}
> > > +
> > > +##
> > > +# @query-rx-filter:
> > > +#
> > > +# Return rx-filter information for all nics (or for the given nic).
> > > +#
> > > +# @name: #optional net client name
> > > +#
> > > +# Returns: list of @RxFilterInfo for all nics (or for the given nic).
> > > +#          Returns an error if the given @name doesn't exist, or given
> > > +#          nic doesn't support rx-filter querying, or no net client
> > > +#          supports rx-filter querying
> > > +#
> > > +# Since: 1.6
> > > +##
> > > +{ 'command': 'query-rx-filter', 'data': { '*name': 'str' },
> > > +  'returns': ['RxFilterInfo'] }
> > > diff --git a/qmp-commands.hx b/qmp-commands.hx
> > > index 8cea5e5..aa00a94 100644
> > > --- a/qmp-commands.hx
> > > +++ b/qmp-commands.hx
> > > @@ -2997,3 +2997,66 @@ Example:
> > >  <- { "return": {} }
> > >  
> > >  EQMP
> > > +    {
> > > +        .name       = "query-rx-filter",
> > > +        .args_type  = "name:s?",
> > > +        .mhandler.cmd_new = qmp_marshal_input_query_rx_filter,
> > > +    },
> > > +
> > > +SQMP
> > > +query-rx-filter
> > > +---------------
> > > +
> > > +Show rx-filter information.
> > > +
> > > +Returns a json-array of rx-filter information for all nics (or for the
> > > +given nic), returning an error if the given nic doesn't exist, or
> > > +given nic doesn't support rx-filter querying, or no net client
> > > +supports rx-filter querying.
> > > +
> > > +The query will clear the event notification flag of each nic, then qemu
> > > +will start to emit event to QMP monitor.
> > > +
> > > +Each array entry contains the following:
> > > +
> > > +- "name": net client name (json-string)
> > > +- "promiscuous": promiscuous mode is enabled (json-bool)
> > > +- "multicast": multicast receive state (one of 'normal', 'none', 'all')
> > > +- "unicast": unicast receive state  (one of 'normal', 'none', 'all')
> > > +- "broadcast-allowed": allow to receive broadcast (json-bool)
> > > +- "multicast-overflow": multicast table is overflowed (json-bool)
> > > +- "unicast-overflow": unicast table is overflowed (json-bool)
> > > +- "main-mac": main macaddr string (json-string)
> > > +- "vlan-table": a json-array of active vlan id
> > > +- "unicast-table": a json-array of unicast macaddr string
> > > +- "multicast-table": a json-array of multicast macaddr string
> > > +
> > > +Example:
> > > +
> > > +-> { "execute": "query-rx-filter", "arguments": { "name": "vnet0" } }
> > > +<- { "return": [
> > > +        {
> > > +            "promiscuous": true,
> > > +            "name": "vnet0",
> > > +            "main-mac": "52:54:00:12:34:56",
> > > +            "unicast": "normal",
> > > +            "vlan-table": [
> > > +                4,
> > > +                0
> > > +            ],
> > > +            "unicast-table": [
> > > +            ],
> > > +            "multicast": "normal",
> > > +            "multicast-overflow": false,
> > > +            "unicast-overflow": false,
> > > +            "multicast-table": [
> > > +                "01:00:5e:00:00:01",
> > > +                "33:33:00:00:00:01",
> > > +                "33:33:ff:12:34:56"
> > > +            ],
> > > +            "broadcast-allowed": false
> > > +        }
> > > +      ]
> > > +   }
> > > +
> > > +EQMP
>
Michael S. Tsirkin June 17, 2013, 2:20 p.m. UTC | #4
On Mon, Jun 17, 2013 at 09:30:42AM -0400, Luiz Capitulino wrote:
> On Mon, 17 Jun 2013 16:21:31 +0300
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Mon, Jun 17, 2013 at 09:11:27AM -0400, Luiz Capitulino wrote:
> > > On Fri, 14 Jun 2013 15:45:52 +0800
> > > Amos Kong <akong@redhat.com> wrote:
> > > 
> > > > Currently macvtap based macvlan device is working in promiscuous
> > > > mode, we want to implement mac-programming over macvtap through
> > > > Libvirt for better performance.
> > > > 
> > > > Design:
> > > >  QEMU notifies Libvirt when rx-filter config is changed in guest,
> > > >  then Libvirt query the rx-filter information by a monitor command,
> > > >  and sync the change to macvtap device. Related rx-filter config
> > > >  of the nic contains main mac, rx-mode items and vlan table.
> > > > 
> > > > This patch adds a QMP event to notify management of rx-filter change,
> > > > and adds a monitor command for management to query rx-filter
> > > > information.
> > > > 
> > > > Test:
> > > >  If we repeatedly add/remove vlan, and change macaddr of vlan
> > > >  interfaces in guest by a loop script.
> > > > 
> > > > Result:
> > > >  The events will flood the QMP client(management), management takes
> > > >  too much resource to process the events.
> > > > 
> > > >  Event_throttle API (set rate to 1 ms) can avoid the events to flood
> > > 
> > > I doubt this is a valid value. Today, the three events that use the event
> > > throttle API set the delay rate to 1000 ms.
> > > 
> > > >  QMP client, but it could cause an unexpected delay (~1ms), guests
> > > >  guests normally expect rx-filter updates immediately.
> > > 
> > > What you mean by "immediately"? There's a round-trip to the host plus
> > > all the stuff QEMU will execute to fulfil the request. And how did you
> > > measure this, btw?
> > > 
> > > What you have to do is is to measure your test-case in three different
> > > scenarios:
> > > 
> > >  1. Against upstream QEMU (ie. no patches)
> > >  2. With the event throttle API
> > >  3. With this patch
> > > 
> > > Only then you'll be able which is better.
> > 
> > 
> > I doubt there's a lot of value in splitting this patch in two and
> > testing whether we can reproduce any problems with a partial patch. 
> 
> I didn't suggest that. Actually, I didn't even talk about code.
>
> > The
> > problem of management not keeping up with event flood triggered by mac
> > updates by guest might be purely theoretical,
> 
> That's not the problem the flag thing is trying to solve. We have the
> event throttle API for that. What the flag is trying to solve is _guest_
> latency due to event generation and it's not clear at all if this is a real
> problem.
> 
> > but a robust solution that
> > does not have theoretical holes makes me sleep better at night, and it's a
> > trivial amount of code.
> 
> It has the drawback of coupling the query command with the event.

What specifically is the problem?

I only see a fix for a problem, that might be theoretical, but I'm
happier knowing I don't need to worry about it.

> IMO, such
> a coupling has to be justified with real data.

It's a quality of implementation issue.

No spec says that you should not delay RX filter updates for a very long
time (yes 1000ms is very long), but if a NIC does it it's a low quality
NIC, and it will cause trouble in the field.

I don't think we need to spend time proving that with real data, it's just
obvious.

> > 
> > So
> > 
> > Acked-by: Michael S. Tsirkin <mst@redhat.com>
> > 
> > Any enhancements can be implemented
> > by additional patches, to be applied on top.
> > 
> > > > 
> > > >  So we use a flag for each nic to avoid events flooding, the event
> > > >  is emitted once until the query command is executed. The flag
> > > >  implementation could not introduce unexpected delay.
> > > > 
> > > > There maybe exist an uncontrollable delay if we let Libvirt do the
> > > > real change, guests normally expect rx-filter updates immediately.
> > > > But it's another separate issue, we can investigate it when the
> > > > work in Libvirt side is done.
> > > > 
> > > > Signed-off-by: Amos Kong <akong@redhat.com>
> > > > ---
> > > > v2: add argument to filter mac-table info of single nic (Stefan)
> > > >     update the document, add event notification
> > > > v3: rename to rx-filter, add main mac, avoid events flooding (MST)
> > > >     fix error process (Stefan), fix qmp interface (Eric)
> > > > v4: process qerror in hmp, cleanup (Luiz)
> > > >     set flag for each device, add device path in event, add
> > > >     helper for g_strdup_printf (MST)
> > > >     fix qmp document (Eric)
> > > > v5: add path in doc, define notify flag to unsigned (Eric)
> > > >     add vlan table (Jason), drop monitor cmd
> > > > v6: return vids list, add test results/analysis to commitlog
> > > > ---
> > > >  QMP/qmp-events.txt        |  17 ++++++++
> > > >  hw/net/virtio-net.c       | 108 ++++++++++++++++++++++++++++++++++++++++++++++
> > > >  include/monitor/monitor.h |   1 +
> > > >  include/net/net.h         |   3 ++
> > > >  monitor.c                 |   1 +
> > > >  net/net.c                 |  47 ++++++++++++++++++++
> > > >  qapi-schema.json          |  75 ++++++++++++++++++++++++++++++++
> > > >  qmp-commands.hx           |  63 +++++++++++++++++++++++++++
> > > >  8 files changed, 315 insertions(+)
> > > > 
> > > > diff --git a/QMP/qmp-events.txt b/QMP/qmp-events.txt
> > > > index 24e804e9..39b6016 100644
> > > > --- a/QMP/qmp-events.txt
> > > > +++ b/QMP/qmp-events.txt
> > > > @@ -172,6 +172,23 @@ Data:
> > > >    },
> > > >    "timestamp": { "seconds": 1265044230, "microseconds": 450486 } }
> > > >  
> > > > +NIC_RX_FILTER_CHANGED
> > > > +-----------------
> > > > +
> > > > +The event is emitted once until the query command is executed,
> > > > +the first event will always be emitted.
> > > > +
> > > > +Data:
> > > > +
> > > > +- "name": net client name (json-string)
> > > > +- "path": device path (json-string)
> > > > +
> > > > +{ "event": "NIC_RX_FILTER_CHANGED",
> > > > +  "data": { "name": "vnet0",
> > > > +            "path": "/machine/peripheral/vnet0/virtio-backend" },
> > > > +  "timestamp": { "seconds": 1368697518, "microseconds": 326866 } }
> > > > +}
> > > > +
> > > >  RESET
> > > >  -----
> > > >  
> > > > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > > > index 1ea9556..4137959 100644
> > > > --- a/hw/net/virtio-net.c
> > > > +++ b/hw/net/virtio-net.c
> > > > @@ -21,6 +21,8 @@
> > > >  #include "hw/virtio/virtio-net.h"
> > > >  #include "net/vhost_net.h"
> > > >  #include "hw/virtio/virtio-bus.h"
> > > > +#include "qapi/qmp/qjson.h"
> > > > +#include "monitor/monitor.h"
> > > >  
> > > >  #define VIRTIO_NET_VM_VERSION    11
> > > >  
> > > > @@ -192,6 +194,100 @@ static void virtio_net_set_link_status(NetClientState *nc)
> > > >      virtio_net_set_status(vdev, vdev->status);
> > > >  }
> > > >  
> > > > +static void rxfilter_notify(NetClientState *nc)
> > > > +{
> > > > +    QObject *event_data;
> > > > +    VirtIONet *n = qemu_get_nic_opaque(nc);
> > > > +
> > > > +    if (nc->rxfilter_notify_enabled) {
> > > > +        event_data = qobject_from_jsonf("{ 'name': %s, 'path': %s }",
> > > > +                           n->netclient_name,
> > > > +                           object_get_canonical_path(OBJECT(n->qdev)));
> > > > +        monitor_protocol_event(QEVENT_NIC_RX_FILTER_CHANGED, event_data);
> > > > +        qobject_decref(event_data);
> > > > +
> > > > +        /* disable event notification to avoid events flooding */
> > > > +        nc->rxfilter_notify_enabled = 0;
> > > > +    }
> > > > +}
> > > > +
> > > > +static char *mac_strdup_printf(const uint8_t *mac)
> > > > +{
> > > > +    return g_strdup_printf("%.2x:%.2x:%.2x:%.2x:%.2x:%.2x", mac[0],
> > > > +                            mac[1], mac[2], mac[3], mac[4], mac[5]);
> > > > +}
> > > > +
> > > > +static RxFilterInfo *virtio_net_query_rxfilter(NetClientState *nc)
> > > > +{
> > > > +    VirtIONet *n = qemu_get_nic_opaque(nc);
> > > > +    RxFilterInfo *info;
> > > > +    strList *str_list = NULL;
> > > > +    strList *entry;
> > > > +    intList *int_list = NULL;
> > > > +    intList *int_entry;
> > > > +    int i, j;
> > > > +
> > > > +    info = g_malloc0(sizeof(*info));
> > > > +    info->name = g_strdup(nc->name);
> > > > +    info->promiscuous = n->promisc;
> > > > +
> > > > +    if (n->nouni) {
> > > > +        info->unicast = RX_STATE_NONE;
> > > > +    } else if (n->alluni) {
> > > > +        info->unicast = RX_STATE_ALL;
> > > > +    } else {
> > > > +        info->unicast = RX_STATE_NORMAL;
> > > > +    }
> > > > +
> > > > +    if (n->nomulti) {
> > > > +        info->multicast = RX_STATE_NONE;
> > > > +    } else if (n->allmulti) {
> > > > +        info->multicast = RX_STATE_ALL;
> > > > +    } else {
> > > > +        info->multicast = RX_STATE_NORMAL;
> > > > +    }
> > > > +
> > > > +    info->broadcast_allowed = n->nobcast;
> > > > +    info->multicast_overflow = n->mac_table.multi_overflow;
> > > > +    info->unicast_overflow = n->mac_table.uni_overflow;
> > > > +
> > > > +    info->main_mac = mac_strdup_printf(n->mac);
> > > > +
> > > > +    for (i = 0; i < n->mac_table.first_multi; i++) {
> > > > +        entry = g_malloc0(sizeof(*entry));
> > > > +        entry->value = mac_strdup_printf(n->mac_table.macs + i * ETH_ALEN);
> > > > +        entry->next = str_list;
> > > > +        str_list = entry;
> > > > +    }
> > > > +    info->unicast_table = str_list;
> > > > +
> > > > +    str_list = NULL;
> > > > +    for (i = n->mac_table.first_multi; i < n->mac_table.in_use; i++) {
> > > > +        entry = g_malloc0(sizeof(*entry));
> > > > +        entry->value = mac_strdup_printf(n->mac_table.macs + i * ETH_ALEN);
> > > > +        entry->next = str_list;
> > > > +        str_list = entry;
> > > > +    }
> > > > +    info->multicast_table = str_list;
> > > > +
> > > > +    for (i = 0; i < MAX_VLAN >> 5; i++) {
> > > > +        for (j = 0; n->vlans[i] && j < 0x1f; j++) {
> > > > +            if (n->vlans[i] & (1U << j)) {
> > > > +                int_entry = g_malloc0(sizeof(*int_entry));
> > > > +                int_entry->value = (i << 5) + j;
> > > > +                int_entry->next = int_list;
> > > > +                int_list = int_entry;
> > > > +            }
> > > > +        }
> > > > +    }
> > > > +    info->vlan_table = int_list;
> > > > +
> > > > +    /* enable event notification after query */
> > > > +    nc->rxfilter_notify_enabled = 1;
> > > > +
> > > > +    return info;
> > > > +}
> > > > +
> > > >  static void virtio_net_reset(VirtIODevice *vdev)
> > > >  {
> > > >      VirtIONet *n = VIRTIO_NET(vdev);
> > > > @@ -420,6 +516,7 @@ static int virtio_net_handle_rx_mode(VirtIONet *n, uint8_t cmd,
> > > >  {
> > > >      uint8_t on;
> > > >      size_t s;
> > > > +    NetClientState *nc = qemu_get_queue(n->nic);
> > > >  
> > > >      s = iov_to_buf(iov, iov_cnt, 0, &on, sizeof(on));
> > > >      if (s != sizeof(on)) {
> > > > @@ -442,6 +539,8 @@ static int virtio_net_handle_rx_mode(VirtIONet *n, uint8_t cmd,
> > > >          return VIRTIO_NET_ERR;
> > > >      }
> > > >  
> > > > +    rxfilter_notify(nc);
> > > > +
> > > >      return VIRTIO_NET_OK;
> > > >  }
> > > >  
> > > > @@ -487,6 +586,7 @@ static int virtio_net_handle_mac(VirtIONet *n, uint8_t cmd,
> > > >  {
> > > >      struct virtio_net_ctrl_mac mac_data;
> > > >      size_t s;
> > > > +    NetClientState *nc = qemu_get_queue(n->nic);
> > > >  
> > > >      if (cmd == VIRTIO_NET_CTRL_MAC_ADDR_SET) {
> > > >          if (iov_size(iov, iov_cnt) != sizeof(n->mac)) {
> > > > @@ -495,6 +595,8 @@ static int virtio_net_handle_mac(VirtIONet *n, uint8_t cmd,
> > > >          s = iov_to_buf(iov, iov_cnt, 0, &n->mac, sizeof(n->mac));
> > > >          assert(s == sizeof(n->mac));
> > > >          qemu_format_nic_info_str(qemu_get_queue(n->nic), n->mac);
> > > > +        rxfilter_notify(nc);
> > > > +
> > > >          return VIRTIO_NET_OK;
> > > >      }
> > > >  
> > > > @@ -559,6 +661,8 @@ static int virtio_net_handle_mac(VirtIONet *n, uint8_t cmd,
> > > >          n->mac_table.multi_overflow = 1;
> > > >      }
> > > >  
> > > > +    rxfilter_notify(nc);
> > > > +
> > > >      return VIRTIO_NET_OK;
> > > >  }
> > > >  
> > > > @@ -567,6 +671,7 @@ static int virtio_net_handle_vlan_table(VirtIONet *n, uint8_t cmd,
> > > >  {
> > > >      uint16_t vid;
> > > >      size_t s;
> > > > +    NetClientState *nc = qemu_get_queue(n->nic);
> > > >  
> > > >      s = iov_to_buf(iov, iov_cnt, 0, &vid, sizeof(vid));
> > > >      vid = lduw_p(&vid);
> > > > @@ -584,6 +689,8 @@ static int virtio_net_handle_vlan_table(VirtIONet *n, uint8_t cmd,
> > > >      else
> > > >          return VIRTIO_NET_ERR;
> > > >  
> > > > +    rxfilter_notify(nc);
> > > > +
> > > >      return VIRTIO_NET_OK;
> > > >  }
> > > >  
> > > > @@ -1312,6 +1419,7 @@ static NetClientInfo net_virtio_info = {
> > > >      .receive = virtio_net_receive,
> > > >          .cleanup = virtio_net_cleanup,
> > > >      .link_status_changed = virtio_net_set_link_status,
> > > > +    .query_rx_filter = virtio_net_query_rxfilter,
> > > >  };
> > > >  
> > > >  static bool virtio_net_guest_notifier_pending(VirtIODevice *vdev, int idx)
> > > > diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
> > > > index 1a6cfcf..1942cc4 100644
> > > > --- a/include/monitor/monitor.h
> > > > +++ b/include/monitor/monitor.h
> > > > @@ -41,6 +41,7 @@ typedef enum MonitorEvent {
> > > >      QEVENT_BLOCK_JOB_READY,
> > > >      QEVENT_DEVICE_DELETED,
> > > >      QEVENT_DEVICE_TRAY_MOVED,
> > > > +    QEVENT_NIC_RX_FILTER_CHANGED,
> > > >      QEVENT_SUSPEND,
> > > >      QEVENT_SUSPEND_DISK,
> > > >      QEVENT_WAKEUP,
> > > > diff --git a/include/net/net.h b/include/net/net.h
> > > > index 43d85a1..30e4b04 100644
> > > > --- a/include/net/net.h
> > > > +++ b/include/net/net.h
> > > > @@ -49,6 +49,7 @@ typedef ssize_t (NetReceiveIOV)(NetClientState *, const struct iovec *, int);
> > > >  typedef void (NetCleanup) (NetClientState *);
> > > >  typedef void (LinkStatusChanged)(NetClientState *);
> > > >  typedef void (NetClientDestructor)(NetClientState *);
> > > > +typedef RxFilterInfo *(QueryRxFilter)(NetClientState *);
> > > >  
> > > >  typedef struct NetClientInfo {
> > > >      NetClientOptionsKind type;
> > > > @@ -59,6 +60,7 @@ typedef struct NetClientInfo {
> > > >      NetCanReceive *can_receive;
> > > >      NetCleanup *cleanup;
> > > >      LinkStatusChanged *link_status_changed;
> > > > +    QueryRxFilter *query_rx_filter;
> > > >      NetPoll *poll;
> > > >  } NetClientInfo;
> > > >  
> > > > @@ -74,6 +76,7 @@ struct NetClientState {
> > > >      unsigned receive_disabled : 1;
> > > >      NetClientDestructor *destructor;
> > > >      unsigned int queue_index;
> > > > +    unsigned rxfilter_notify_enabled:1;
> > > >  };
> > > >  
> > > >  typedef struct NICState {
> > > > diff --git a/monitor.c b/monitor.c
> > > > index 017411f..3b8117c 100644
> > > > --- a/monitor.c
> > > > +++ b/monitor.c
> > > > @@ -490,6 +490,7 @@ static const char *monitor_event_names[] = {
> > > >      [QEVENT_BLOCK_JOB_READY] = "BLOCK_JOB_READY",
> > > >      [QEVENT_DEVICE_DELETED] = "DEVICE_DELETED",
> > > >      [QEVENT_DEVICE_TRAY_MOVED] = "DEVICE_TRAY_MOVED",
> > > > +    [QEVENT_NIC_RX_FILTER_CHANGED] = "NIC_RX_FILTER_CHANGED",
> > > >      [QEVENT_SUSPEND] = "SUSPEND",
> > > >      [QEVENT_SUSPEND_DISK] = "SUSPEND_DISK",
> > > >      [QEVENT_WAKEUP] = "WAKEUP",
> > > > diff --git a/net/net.c b/net/net.c
> > > > index 43a74e4..33abffe 100644
> > > > --- a/net/net.c
> > > > +++ b/net/net.c
> > > > @@ -961,6 +961,53 @@ void print_net_client(Monitor *mon, NetClientState *nc)
> > > >                     nc->info_str);
> > > >  }
> > > >  
> > > > +RxFilterInfoList *qmp_query_rx_filter(bool has_name, const char *name,
> > > > +                                      Error **errp)
> > > > +{
> > > > +    NetClientState *nc;
> > > > +    RxFilterInfoList *filter_list = NULL, *last_entry = NULL;
> > > > +
> > > > +    QTAILQ_FOREACH(nc, &net_clients, next) {
> > > > +        RxFilterInfoList *entry;
> > > > +        RxFilterInfo *info;
> > > > +
> > > > +        /* only query rx-filter information of nic */
> > > > +        if (nc->info->type != NET_CLIENT_OPTIONS_KIND_NIC) {
> > > > +            continue;
> > > > +        }
> > > > +        if (has_name && strcmp(nc->name, name) != 0) {
> > > > +            continue;
> > > > +        }
> > > > +
> > > > +        if (nc->info->query_rx_filter) {
> > > > +            info = nc->info->query_rx_filter(nc);
> > > > +            entry = g_malloc0(sizeof(*entry));
> > > > +            entry->value = info;
> > > > +
> > > > +            if (!filter_list) {
> > > > +                filter_list = entry;
> > > > +            } else {
> > > > +                last_entry->next = entry;
> > > > +            }
> > > > +            last_entry = entry;
> > > > +        } else if (has_name) {
> > > > +            error_setg(errp, "net client(%s) doesn't support"
> > > > +                       " rx-filter querying", name);
> > > > +            break;
> > > > +        }
> > > > +    }
> > > > +
> > > > +    if (filter_list == NULL && !error_is_set(errp)) {
> > > > +        if (has_name) {
> > > > +            error_setg(errp, "invalid net client name: %s", name);
> > > > +        } else {
> > > > +            error_setg(errp, "no net client supports rx-filter querying");
> > > > +        }
> > > > +    }
> > > > +
> > > > +    return filter_list;
> > > > +}
> > > > +
> > > >  void do_info_network(Monitor *mon, const QDict *qdict)
> > > >  {
> > > >      NetClientState *nc, *peer;
> > > > diff --git a/qapi-schema.json b/qapi-schema.json
> > > > index 5ad6894..a69dc17 100644
> > > > --- a/qapi-schema.json
> > > > +++ b/qapi-schema.json
> > > > @@ -3624,3 +3624,78 @@
> > > >              '*cpuid-input-ecx': 'int',
> > > >              'cpuid-register': 'X86CPURegister32',
> > > >              'features': 'int' } }
> > > > +
> > > > +##
> > > > +# @RxState:
> > > > +#
> > > > +# Packets receiving state
> > > > +#
> > > > +# @normal: filter assigned packets according to the mac-table
> > > > +#
> > > > +# @none: don't receive any assigned packet
> > > > +#
> > > > +# @all: receive all assigned packets
> > > > +#
> > > > +##
> > > > +{ 'enum': 'RxState', 'data': [ 'normal', 'none', 'all' ] }
> > > > +
> > > > +##
> > > > +# @RxFilterInfo:
> > > > +#
> > > > +# Rx-filter information for a nic.
> > > > +#
> > > > +# @name: net client name
> > > > +#
> > > > +# @promiscuous: whether promiscuous mode is enabled
> > > > +#
> > > > +# @multicast: multicast receive state
> > > > +#
> > > > +# @unicast: unicast receive state
> > > > +#
> > > > +# @broadcast-allowed: whether to receive broadcast
> > > > +#
> > > > +# @multicast-overflow: multicast table is overflowed or not
> > > > +#
> > > > +# @unicast-overflow: unicast table is overflowed or not
> > > > +#
> > > > +# @main-mac: the main macaddr string
> > > > +#
> > > > +# @vlan-table: a list of active vlan id
> > > > +#
> > > > +# @unicast-table: a list of unicast macaddr string
> > > > +#
> > > > +# @multicast-table: a list of multicast macaddr string
> > > > +#
> > > > +# Since 1.6
> > > > +##
> > > > +
> > > > +{ 'type': 'RxFilterInfo',
> > > > +  'data': {
> > > > +    'name':               'str',
> > > > +    'promiscuous':        'bool',
> > > > +    'multicast':          'RxState',
> > > > +    'unicast':            'RxState',
> > > > +    'broadcast-allowed':  'bool',
> > > > +    'multicast-overflow': 'bool',
> > > > +    'unicast-overflow':   'bool',
> > > > +    'main-mac':           'str',
> > > > +    'vlan-table':         ['int'],
> > > > +    'unicast-table':      ['str'],
> > > > +    'multicast-table':    ['str'] }}
> > > > +
> > > > +##
> > > > +# @query-rx-filter:
> > > > +#
> > > > +# Return rx-filter information for all nics (or for the given nic).
> > > > +#
> > > > +# @name: #optional net client name
> > > > +#
> > > > +# Returns: list of @RxFilterInfo for all nics (or for the given nic).
> > > > +#          Returns an error if the given @name doesn't exist, or given
> > > > +#          nic doesn't support rx-filter querying, or no net client
> > > > +#          supports rx-filter querying
> > > > +#
> > > > +# Since: 1.6
> > > > +##
> > > > +{ 'command': 'query-rx-filter', 'data': { '*name': 'str' },
> > > > +  'returns': ['RxFilterInfo'] }
> > > > diff --git a/qmp-commands.hx b/qmp-commands.hx
> > > > index 8cea5e5..aa00a94 100644
> > > > --- a/qmp-commands.hx
> > > > +++ b/qmp-commands.hx
> > > > @@ -2997,3 +2997,66 @@ Example:
> > > >  <- { "return": {} }
> > > >  
> > > >  EQMP
> > > > +    {
> > > > +        .name       = "query-rx-filter",
> > > > +        .args_type  = "name:s?",
> > > > +        .mhandler.cmd_new = qmp_marshal_input_query_rx_filter,
> > > > +    },
> > > > +
> > > > +SQMP
> > > > +query-rx-filter
> > > > +---------------
> > > > +
> > > > +Show rx-filter information.
> > > > +
> > > > +Returns a json-array of rx-filter information for all nics (or for the
> > > > +given nic), returning an error if the given nic doesn't exist, or
> > > > +given nic doesn't support rx-filter querying, or no net client
> > > > +supports rx-filter querying.
> > > > +
> > > > +The query will clear the event notification flag of each nic, then qemu
> > > > +will start to emit event to QMP monitor.
> > > > +
> > > > +Each array entry contains the following:
> > > > +
> > > > +- "name": net client name (json-string)
> > > > +- "promiscuous": promiscuous mode is enabled (json-bool)
> > > > +- "multicast": multicast receive state (one of 'normal', 'none', 'all')
> > > > +- "unicast": unicast receive state  (one of 'normal', 'none', 'all')
> > > > +- "broadcast-allowed": allow to receive broadcast (json-bool)
> > > > +- "multicast-overflow": multicast table is overflowed (json-bool)
> > > > +- "unicast-overflow": unicast table is overflowed (json-bool)
> > > > +- "main-mac": main macaddr string (json-string)
> > > > +- "vlan-table": a json-array of active vlan id
> > > > +- "unicast-table": a json-array of unicast macaddr string
> > > > +- "multicast-table": a json-array of multicast macaddr string
> > > > +
> > > > +Example:
> > > > +
> > > > +-> { "execute": "query-rx-filter", "arguments": { "name": "vnet0" } }
> > > > +<- { "return": [
> > > > +        {
> > > > +            "promiscuous": true,
> > > > +            "name": "vnet0",
> > > > +            "main-mac": "52:54:00:12:34:56",
> > > > +            "unicast": "normal",
> > > > +            "vlan-table": [
> > > > +                4,
> > > > +                0
> > > > +            ],
> > > > +            "unicast-table": [
> > > > +            ],
> > > > +            "multicast": "normal",
> > > > +            "multicast-overflow": false,
> > > > +            "unicast-overflow": false,
> > > > +            "multicast-table": [
> > > > +                "01:00:5e:00:00:01",
> > > > +                "33:33:00:00:00:01",
> > > > +                "33:33:ff:12:34:56"
> > > > +            ],
> > > > +            "broadcast-allowed": false
> > > > +        }
> > > > +      ]
> > > > +   }
> > > > +
> > > > +EQMP
> >
Luiz Capitulino June 17, 2013, 2:34 p.m. UTC | #5
On Mon, 17 Jun 2013 17:20:13 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Mon, Jun 17, 2013 at 09:30:42AM -0400, Luiz Capitulino wrote:
> > On Mon, 17 Jun 2013 16:21:31 +0300
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > 
> > > On Mon, Jun 17, 2013 at 09:11:27AM -0400, Luiz Capitulino wrote:
> > > > On Fri, 14 Jun 2013 15:45:52 +0800
> > > > Amos Kong <akong@redhat.com> wrote:
> > > > 
> > > > > Currently macvtap based macvlan device is working in promiscuous
> > > > > mode, we want to implement mac-programming over macvtap through
> > > > > Libvirt for better performance.
> > > > > 
> > > > > Design:
> > > > >  QEMU notifies Libvirt when rx-filter config is changed in guest,
> > > > >  then Libvirt query the rx-filter information by a monitor command,
> > > > >  and sync the change to macvtap device. Related rx-filter config
> > > > >  of the nic contains main mac, rx-mode items and vlan table.
> > > > > 
> > > > > This patch adds a QMP event to notify management of rx-filter change,
> > > > > and adds a monitor command for management to query rx-filter
> > > > > information.
> > > > > 
> > > > > Test:
> > > > >  If we repeatedly add/remove vlan, and change macaddr of vlan
> > > > >  interfaces in guest by a loop script.
> > > > > 
> > > > > Result:
> > > > >  The events will flood the QMP client(management), management takes
> > > > >  too much resource to process the events.
> > > > > 
> > > > >  Event_throttle API (set rate to 1 ms) can avoid the events to flood
> > > > 
> > > > I doubt this is a valid value. Today, the three events that use the event
> > > > throttle API set the delay rate to 1000 ms.
> > > > 
> > > > >  QMP client, but it could cause an unexpected delay (~1ms), guests
> > > > >  guests normally expect rx-filter updates immediately.
> > > > 
> > > > What you mean by "immediately"? There's a round-trip to the host plus
> > > > all the stuff QEMU will execute to fulfil the request. And how did you
> > > > measure this, btw?
> > > > 
> > > > What you have to do is is to measure your test-case in three different
> > > > scenarios:
> > > > 
> > > >  1. Against upstream QEMU (ie. no patches)
> > > >  2. With the event throttle API
> > > >  3. With this patch
> > > > 
> > > > Only then you'll be able which is better.
> > > 
> > > 
> > > I doubt there's a lot of value in splitting this patch in two and
> > > testing whether we can reproduce any problems with a partial patch. 
> > 
> > I didn't suggest that. Actually, I didn't even talk about code.
> >
> > > The
> > > problem of management not keeping up with event flood triggered by mac
> > > updates by guest might be purely theoretical,
> > 
> > That's not the problem the flag thing is trying to solve. We have the
> > event throttle API for that. What the flag is trying to solve is _guest_
> > latency due to event generation and it's not clear at all if this is a real
> > problem.
> > 
> > > but a robust solution that
> > > does not have theoretical holes makes me sleep better at night, and it's a
> > > trivial amount of code.
> > 
> > It has the drawback of coupling the query command with the event.
> 
> What specifically is the problem?

Events and commands are independent entities. Coupling them can have bad
side effects, like we may have the ability to disable commands in
the future. In this case this event would only be sent once!

We could, of course, disable the event along, but that's an unclear side
effect too.

> I only see a fix for a problem, that might be theoretical, but I'm
> happier knowing I don't need to worry about it.
> 
> > IMO, such
> > a coupling has to be justified with real data.
> 
> It's a quality of implementation issue.
> 
> No spec says that you should not delay RX filter updates for a very long
> time (yes 1000ms is very long), but if a NIC does it it's a low quality
> NIC, and it will cause trouble in the field.

The 1000ms I talked about is *not* what the guest will see. If there are
events pending, the throttle API just queues the event and returns right
away. I'd even _guess_ that this is faster then emitting the event.

The timeout you have to specify in the throttle API is what *mngt* will
see if events are flooded.

> I don't think we need to spend time proving that with real data, it's just
> obvious.

Why is it so difficult to show if it's that obvious?

Let me clarify that I don't oppose to the implementation, as long as it's
shown that it's really needed.
Michael S. Tsirkin June 17, 2013, 2:42 p.m. UTC | #6
On Mon, Jun 17, 2013 at 10:34:28AM -0400, Luiz Capitulino wrote:
> On Mon, 17 Jun 2013 17:20:13 +0300
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Mon, Jun 17, 2013 at 09:30:42AM -0400, Luiz Capitulino wrote:
> > > On Mon, 17 Jun 2013 16:21:31 +0300
> > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > 
> > > > On Mon, Jun 17, 2013 at 09:11:27AM -0400, Luiz Capitulino wrote:
> > > > > On Fri, 14 Jun 2013 15:45:52 +0800
> > > > > Amos Kong <akong@redhat.com> wrote:
> > > > > 
> > > > > > Currently macvtap based macvlan device is working in promiscuous
> > > > > > mode, we want to implement mac-programming over macvtap through
> > > > > > Libvirt for better performance.
> > > > > > 
> > > > > > Design:
> > > > > >  QEMU notifies Libvirt when rx-filter config is changed in guest,
> > > > > >  then Libvirt query the rx-filter information by a monitor command,
> > > > > >  and sync the change to macvtap device. Related rx-filter config
> > > > > >  of the nic contains main mac, rx-mode items and vlan table.
> > > > > > 
> > > > > > This patch adds a QMP event to notify management of rx-filter change,
> > > > > > and adds a monitor command for management to query rx-filter
> > > > > > information.
> > > > > > 
> > > > > > Test:
> > > > > >  If we repeatedly add/remove vlan, and change macaddr of vlan
> > > > > >  interfaces in guest by a loop script.
> > > > > > 
> > > > > > Result:
> > > > > >  The events will flood the QMP client(management), management takes
> > > > > >  too much resource to process the events.
> > > > > > 
> > > > > >  Event_throttle API (set rate to 1 ms) can avoid the events to flood
> > > > > 
> > > > > I doubt this is a valid value. Today, the three events that use the event
> > > > > throttle API set the delay rate to 1000 ms.
> > > > > 
> > > > > >  QMP client, but it could cause an unexpected delay (~1ms), guests
> > > > > >  guests normally expect rx-filter updates immediately.
> > > > > 
> > > > > What you mean by "immediately"? There's a round-trip to the host plus
> > > > > all the stuff QEMU will execute to fulfil the request. And how did you
> > > > > measure this, btw?
> > > > > 
> > > > > What you have to do is is to measure your test-case in three different
> > > > > scenarios:
> > > > > 
> > > > >  1. Against upstream QEMU (ie. no patches)
> > > > >  2. With the event throttle API
> > > > >  3. With this patch
> > > > > 
> > > > > Only then you'll be able which is better.
> > > > 
> > > > 
> > > > I doubt there's a lot of value in splitting this patch in two and
> > > > testing whether we can reproduce any problems with a partial patch. 
> > > 
> > > I didn't suggest that. Actually, I didn't even talk about code.
> > >
> > > > The
> > > > problem of management not keeping up with event flood triggered by mac
> > > > updates by guest might be purely theoretical,
> > > 
> > > That's not the problem the flag thing is trying to solve. We have the
> > > event throttle API for that. What the flag is trying to solve is _guest_
> > > latency due to event generation and it's not clear at all if this is a real
> > > problem.
> > > 
> > > > but a robust solution that
> > > > does not have theoretical holes makes me sleep better at night, and it's a
> > > > trivial amount of code.
> > > 
> > > It has the drawback of coupling the query command with the event.
> > 
> > What specifically is the problem?
> 
> Events and commands are independent entities. Coupling them can have bad
> side effects, like we may have the ability to disable commands in
> the future. In this case this event would only be sent once!
> We could, of course, disable the event along, but that's an unclear side
> effect too.

If query command is disabled, this event is useless.

> 
> > I only see a fix for a problem, that might be theoretical, but I'm
> > happier knowing I don't need to worry about it.
> > 
> > > IMO, such
> > > a coupling has to be justified with real data.
> > 
> > It's a quality of implementation issue.
> > 
> > No spec says that you should not delay RX filter updates for a very long
> > time (yes 1000ms is very long), but if a NIC does it it's a low quality
> > NIC, and it will cause trouble in the field.
> 
> The 1000ms I talked about is *not* what the guest will see. If there are
> events pending, the throttle API just queues the event and returns right
> away. I'd even _guess_ that this is faster then emitting the event.

If the filter is not updated for 1000ms then that is guest visible:
it is not getting packets with the new MAC.

> 
> The timeout you have to specify in the throttle API is what *mngt* will
> see if events are flooded.

That's the point. It's a good fit for events which are not guest
visible. This event should lead to a guest visible effect so it
should not be delayed.

> > I don't think we need to spend time proving that with real data, it's just
> > obvious.
> 
> Why is it so difficult to show if it's that obvious?
>
> Let me clarify that I don't oppose to the implementation, as long as it's
> shown that it's really needed.
Amos Kong June 17, 2013, 2:50 p.m. UTC | #7
On Mon, Jun 17, 2013 at 09:11:27AM -0400, Luiz Capitulino wrote:
> On Fri, 14 Jun 2013 15:45:52 +0800
> Amos Kong <akong@redhat.com> wrote:
> 
> > Currently macvtap based macvlan device is working in promiscuous
> > mode, we want to implement mac-programming over macvtap through
> > Libvirt for better performance.
> > 
> > Design:
> >  QEMU notifies Libvirt when rx-filter config is changed in guest,
> >  then Libvirt query the rx-filter information by a monitor command,
> >  and sync the change to macvtap device. Related rx-filter config
> >  of the nic contains main mac, rx-mode items and vlan table.
> > 
> > This patch adds a QMP event to notify management of rx-filter change,
> > and adds a monitor command for management to query rx-filter
> > information.
> > 
> > Test:
> >  If we repeatedly add/remove vlan, and change macaddr of vlan
> >  interfaces in guest by a loop script.
> > 
> > Result:
> >  The events will flood the QMP client(management), management takes
> >  too much resource to process the events.
> > 
> >  Event_throttle API (set rate to 1 ms) can avoid the events to flood
> 
> I doubt this is a valid value. Today, the three events that use the event
> throttle API set the delay rate to 1000 ms.

We even could not bear the 1ms delay, 1000 is too large.
 
> >  QMP client, but it could cause an unexpected delay (~1ms), guests
> >  guests normally expect rx-filter updates immediately.
> 
> What you mean by "immediately"? There's a round-trip to the host plus
> all the stuff QEMU will execute to fulfil the request. And how did you
> measure this, btw?

'Immediately' means ASAP here.

We could not prevent qemu to execute. But we replace throttle API by
a smart flag.

Throttle API might be a good design, but it's redundant in this case.
 
> What you have to do is is to measure your test-case in three different
> scenarios:
> 
>  1. Against upstream QEMU (ie. no patches)
>  2. With the event throttle API
>  3. With this patch

1. use flag: the time used to check the flag, we can ignore the delay.
2. use throttle API: delay is 1ms or 1000ms.

~0ms  VS. more than 1000ms

> Only then you'll be able which is better.

Amos.
Luiz Capitulino June 17, 2013, 2:54 p.m. UTC | #8
On Mon, 17 Jun 2013 17:42:50 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> > The 1000ms I talked about is *not* what the guest will see. If there are
> > events pending, the throttle API just queues the event and returns right
> > away. I'd even _guess_ that this is faster then emitting the event.
> 
> If the filter is not updated for 1000ms then that is guest visible:
> it is not getting packets with the new MAC.

Let me understand this better: the filter is going to be updated by
libvirt, is that correctly?

If this is right, then I can understand where you came from, but how
can we possibly control how long it will take for libvirt to take
action?
Amos Kong June 17, 2013, 3:03 p.m. UTC | #9
On Mon, Jun 17, 2013 at 10:54:39AM -0400, Luiz Capitulino wrote:
> On Mon, 17 Jun 2013 17:42:50 +0300
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > > The 1000ms I talked about is *not* what the guest will see. If there are
> > > events pending, the throttle API just queues the event and returns right
> > > away. I'd even _guess_ that this is faster then emitting the event.
> > 
> > If the filter is not updated for 1000ms then that is guest visible:
> > it is not getting packets with the new MAC.
> 
> Let me understand this better: the filter is going to be updated by
> libvirt, is that correctly?

Yes.
 
> If this is right, then I can understand where you came from, but how
> can we possibly control how long it will take for libvirt to take
> action?

Quote from my commitlog

|> There maybe exist an uncontrollable delay if we let Libvirt do the
|> real change, guests normally expect rx-filter updates immediately.
|> But it's another separate issue, we can investigate it when the
|> work in Libvirt side is done.
Michael S. Tsirkin June 17, 2013, 3:10 p.m. UTC | #10
On Mon, Jun 17, 2013 at 10:54:39AM -0400, Luiz Capitulino wrote:
> On Mon, 17 Jun 2013 17:42:50 +0300
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > > The 1000ms I talked about is *not* what the guest will see. If there are
> > > events pending, the throttle API just queues the event and returns right
> > > away. I'd even _guess_ that this is faster then emitting the event.
> > 
> > If the filter is not updated for 1000ms then that is guest visible:
> > it is not getting packets with the new MAC.
> 
> Let me understand this better: the filter is going to be updated by
> libvirt, is that correctly?

Exactly.

> If this is right, then I can understand where you came from, but how
> can we possibly control how long it will take for libvirt to take
> action?

We can't, it's a best effort, but that's also true for some NICs.
At least let's not introduce an artificial delay there.
Luiz Capitulino June 17, 2013, 3:15 p.m. UTC | #11
On Mon, 17 Jun 2013 18:10:23 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Mon, Jun 17, 2013 at 10:54:39AM -0400, Luiz Capitulino wrote:
> > On Mon, 17 Jun 2013 17:42:50 +0300
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > 
> > > > The 1000ms I talked about is *not* what the guest will see. If there are
> > > > events pending, the throttle API just queues the event and returns right
> > > > away. I'd even _guess_ that this is faster then emitting the event.
> > > 
> > > If the filter is not updated for 1000ms then that is guest visible:
> > > it is not getting packets with the new MAC.
> > 
> > Let me understand this better: the filter is going to be updated by
> > libvirt, is that correctly?
> 
> Exactly.
> 
> > If this is right, then I can understand where you came from, but how
> > can we possibly control how long it will take for libvirt to take
> > action?
> 
> We can't, it's a best effort, but that's also true for some NICs.
> At least let's not introduce an artificial delay there.

Got it. Okay, I think the flag is fine and won't ask for more evidence then.

Although I still wonder if we would do alright w/o the flag and w/o using
the throttle API...
Wayne Xia June 18, 2013, 1:59 a.m. UTC | #12
于 2013-6-14 15:45, Amos Kong 写道:
> Currently macvtap based macvlan device is working in promiscuous
> mode, we want to implement mac-programming over macvtap through
> Libvirt for better performance.
> 
> Design:
>   QEMU notifies Libvirt when rx-filter config is changed in guest,
>   then Libvirt query the rx-filter information by a monitor command,
>   and sync the change to macvtap device. Related rx-filter config
>   of the nic contains main mac, rx-mode items and vlan table.
> 
> This patch adds a QMP event to notify management of rx-filter change,
> and adds a monitor command for management to query rx-filter
> information.
> 
> Test:
>   If we repeatedly add/remove vlan, and change macaddr of vlan
>   interfaces in guest by a loop script.
> 
> Result:
>   The events will flood the QMP client(management), management takes
>   too much resource to process the events.
> 
>   Event_throttle API (set rate to 1 ms) can avoid the events to flood
>   QMP client, but it could cause an unexpected delay (~1ms), guests
>   guests normally expect rx-filter updates immediately.
> 
>   So we use a flag for each nic to avoid events flooding, the event
>   is emitted once until the query command is executed. The flag
>   implementation could not introduce unexpected delay.
> 
> There maybe exist an uncontrollable delay if we let Libvirt do the
> real change, guests normally expect rx-filter updates immediately.
> But it's another separate issue, we can investigate it when the
> work in Libvirt side is done.
> 
> Signed-off-by: Amos Kong <akong@redhat.com>
> ---
> v2: add argument to filter mac-table info of single nic (Stefan)
>      update the document, add event notification
> v3: rename to rx-filter, add main mac, avoid events flooding (MST)
>      fix error process (Stefan), fix qmp interface (Eric)
> v4: process qerror in hmp, cleanup (Luiz)
>      set flag for each device, add device path in event, add
>      helper for g_strdup_printf (MST)
>      fix qmp document (Eric)
> v5: add path in doc, define notify flag to unsigned (Eric)
>      add vlan table (Jason), drop monitor cmd
> v6: return vids list, add test results/analysis to commitlog
> ---
>   QMP/qmp-events.txt        |  17 ++++++++
>   hw/net/virtio-net.c       | 108 ++++++++++++++++++++++++++++++++++++++++++++++
>   include/monitor/monitor.h |   1 +
>   include/net/net.h         |   3 ++
>   monitor.c                 |   1 +
>   net/net.c                 |  47 ++++++++++++++++++++
>   qapi-schema.json          |  75 ++++++++++++++++++++++++++++++++
>   qmp-commands.hx           |  63 +++++++++++++++++++++++++++
>   8 files changed, 315 insertions(+)
> 
> diff --git a/QMP/qmp-events.txt b/QMP/qmp-events.txt
> index 24e804e9..39b6016 100644
> --- a/QMP/qmp-events.txt
> +++ b/QMP/qmp-events.txt
> @@ -172,6 +172,23 @@ Data:
>     },
>     "timestamp": { "seconds": 1265044230, "microseconds": 450486 } }
> 
> +NIC_RX_FILTER_CHANGED
> +-----------------
> +
> +The event is emitted once until the query command is executed,
> +the first event will always be emitted.
> +
> +Data:
> +
> +- "name": net client name (json-string)
> +- "path": device path (json-string)
> +
> +{ "event": "NIC_RX_FILTER_CHANGED",
> +  "data": { "name": "vnet0",
> +            "path": "/machine/peripheral/vnet0/virtio-backend" },
> +  "timestamp": { "seconds": 1368697518, "microseconds": 326866 } }
> +}
> +
>   RESET
>   -----
> 
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 1ea9556..4137959 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -21,6 +21,8 @@
>   #include "hw/virtio/virtio-net.h"
>   #include "net/vhost_net.h"
>   #include "hw/virtio/virtio-bus.h"
> +#include "qapi/qmp/qjson.h"
> +#include "monitor/monitor.h"
> 
>   #define VIRTIO_NET_VM_VERSION    11
> 
> @@ -192,6 +194,100 @@ static void virtio_net_set_link_status(NetClientState *nc)
>       virtio_net_set_status(vdev, vdev->status);
>   }
> 
> +static void rxfilter_notify(NetClientState *nc)
> +{
> +    QObject *event_data;
> +    VirtIONet *n = qemu_get_nic_opaque(nc);
> +
> +    if (nc->rxfilter_notify_enabled) {
> +        event_data = qobject_from_jsonf("{ 'name': %s, 'path': %s }",
> +                           n->netclient_name,
> +                           object_get_canonical_path(OBJECT(n->qdev)));
> +        monitor_protocol_event(QEVENT_NIC_RX_FILTER_CHANGED, event_data);
> +        qobject_decref(event_data);
> +
> +        /* disable event notification to avoid events flooding */
> +        nc->rxfilter_notify_enabled = 0;
> +    }
> +}
> +
> +static char *mac_strdup_printf(const uint8_t *mac)
> +{
> +    return g_strdup_printf("%.2x:%.2x:%.2x:%.2x:%.2x:%.2x", mac[0],
> +                            mac[1], mac[2], mac[3], mac[4], mac[5]);
> +}
> +
> +static RxFilterInfo *virtio_net_query_rxfilter(NetClientState *nc)
> +{
> +    VirtIONet *n = qemu_get_nic_opaque(nc);
> +    RxFilterInfo *info;
> +    strList *str_list = NULL;
> +    strList *entry;
> +    intList *int_list = NULL;
> +    intList *int_entry;
> +    int i, j;
> +
> +    info = g_malloc0(sizeof(*info));
> +    info->name = g_strdup(nc->name);
> +    info->promiscuous = n->promisc;
> +
> +    if (n->nouni) {
> +        info->unicast = RX_STATE_NONE;
> +    } else if (n->alluni) {
> +        info->unicast = RX_STATE_ALL;
> +    } else {
> +        info->unicast = RX_STATE_NORMAL;
> +    }
> +
> +    if (n->nomulti) {
> +        info->multicast = RX_STATE_NONE;
> +    } else if (n->allmulti) {
> +        info->multicast = RX_STATE_ALL;
> +    } else {
> +        info->multicast = RX_STATE_NORMAL;
> +    }
> +
> +    info->broadcast_allowed = n->nobcast;
> +    info->multicast_overflow = n->mac_table.multi_overflow;
> +    info->unicast_overflow = n->mac_table.uni_overflow;
> +
> +    info->main_mac = mac_strdup_printf(n->mac);
> +
> +    for (i = 0; i < n->mac_table.first_multi; i++) {
> +        entry = g_malloc0(sizeof(*entry));
> +        entry->value = mac_strdup_printf(n->mac_table.macs + i * ETH_ALEN);
> +        entry->next = str_list;
> +        str_list = entry;
> +    }
> +    info->unicast_table = str_list;
> +
> +    str_list = NULL;
> +    for (i = n->mac_table.first_multi; i < n->mac_table.in_use; i++) {
> +        entry = g_malloc0(sizeof(*entry));
> +        entry->value = mac_strdup_printf(n->mac_table.macs + i * ETH_ALEN);
> +        entry->next = str_list;
> +        str_list = entry;
> +    }
> +    info->multicast_table = str_list;
> +
> +    for (i = 0; i < MAX_VLAN >> 5; i++) {
> +        for (j = 0; n->vlans[i] && j < 0x1f; j++) {
> +            if (n->vlans[i] & (1U << j)) {
> +                int_entry = g_malloc0(sizeof(*int_entry));
> +                int_entry->value = (i << 5) + j;
> +                int_entry->next = int_list;
> +                int_list = int_entry;
> +            }
> +        }
> +    }
> +    info->vlan_table = int_list;
> +
> +    /* enable event notification after query */
> +    nc->rxfilter_notify_enabled = 1;
> +
> +    return info;
> +}
> +
>   static void virtio_net_reset(VirtIODevice *vdev)
>   {
>       VirtIONet *n = VIRTIO_NET(vdev);
> @@ -420,6 +516,7 @@ static int virtio_net_handle_rx_mode(VirtIONet *n, uint8_t cmd,
>   {
>       uint8_t on;
>       size_t s;
> +    NetClientState *nc = qemu_get_queue(n->nic);
> 
>       s = iov_to_buf(iov, iov_cnt, 0, &on, sizeof(on));
>       if (s != sizeof(on)) {
> @@ -442,6 +539,8 @@ static int virtio_net_handle_rx_mode(VirtIONet *n, uint8_t cmd,
>           return VIRTIO_NET_ERR;
>       }
> 
> +    rxfilter_notify(nc);
> +
>       return VIRTIO_NET_OK;
>   }
> 
> @@ -487,6 +586,7 @@ static int virtio_net_handle_mac(VirtIONet *n, uint8_t cmd,
>   {
>       struct virtio_net_ctrl_mac mac_data;
>       size_t s;
> +    NetClientState *nc = qemu_get_queue(n->nic);
> 
>       if (cmd == VIRTIO_NET_CTRL_MAC_ADDR_SET) {
>           if (iov_size(iov, iov_cnt) != sizeof(n->mac)) {
> @@ -495,6 +595,8 @@ static int virtio_net_handle_mac(VirtIONet *n, uint8_t cmd,
>           s = iov_to_buf(iov, iov_cnt, 0, &n->mac, sizeof(n->mac));
>           assert(s == sizeof(n->mac));
>           qemu_format_nic_info_str(qemu_get_queue(n->nic), n->mac);
> +        rxfilter_notify(nc);
> +
>           return VIRTIO_NET_OK;
>       }
> 
> @@ -559,6 +661,8 @@ static int virtio_net_handle_mac(VirtIONet *n, uint8_t cmd,
>           n->mac_table.multi_overflow = 1;
>       }
> 
> +    rxfilter_notify(nc);
> +
>       return VIRTIO_NET_OK;
>   }
> 
> @@ -567,6 +671,7 @@ static int virtio_net_handle_vlan_table(VirtIONet *n, uint8_t cmd,
>   {
>       uint16_t vid;
>       size_t s;
> +    NetClientState *nc = qemu_get_queue(n->nic);
> 
>       s = iov_to_buf(iov, iov_cnt, 0, &vid, sizeof(vid));
>       vid = lduw_p(&vid);
> @@ -584,6 +689,8 @@ static int virtio_net_handle_vlan_table(VirtIONet *n, uint8_t cmd,
>       else
>           return VIRTIO_NET_ERR;
> 
> +    rxfilter_notify(nc);
> +
>       return VIRTIO_NET_OK;
>   }
> 
> @@ -1312,6 +1419,7 @@ static NetClientInfo net_virtio_info = {
>       .receive = virtio_net_receive,
>           .cleanup = virtio_net_cleanup,
>       .link_status_changed = virtio_net_set_link_status,
> +    .query_rx_filter = virtio_net_query_rxfilter,
>   };
> 
>   static bool virtio_net_guest_notifier_pending(VirtIODevice *vdev, int idx)
> diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
> index 1a6cfcf..1942cc4 100644
> --- a/include/monitor/monitor.h
> +++ b/include/monitor/monitor.h
> @@ -41,6 +41,7 @@ typedef enum MonitorEvent {
>       QEVENT_BLOCK_JOB_READY,
>       QEVENT_DEVICE_DELETED,
>       QEVENT_DEVICE_TRAY_MOVED,
> +    QEVENT_NIC_RX_FILTER_CHANGED,
>       QEVENT_SUSPEND,
>       QEVENT_SUSPEND_DISK,
>       QEVENT_WAKEUP,
> diff --git a/include/net/net.h b/include/net/net.h
> index 43d85a1..30e4b04 100644
> --- a/include/net/net.h
> +++ b/include/net/net.h
> @@ -49,6 +49,7 @@ typedef ssize_t (NetReceiveIOV)(NetClientState *, const struct iovec *, int);
>   typedef void (NetCleanup) (NetClientState *);
>   typedef void (LinkStatusChanged)(NetClientState *);
>   typedef void (NetClientDestructor)(NetClientState *);
> +typedef RxFilterInfo *(QueryRxFilter)(NetClientState *);
> 
>   typedef struct NetClientInfo {
>       NetClientOptionsKind type;
> @@ -59,6 +60,7 @@ typedef struct NetClientInfo {
>       NetCanReceive *can_receive;
>       NetCleanup *cleanup;
>       LinkStatusChanged *link_status_changed;
> +    QueryRxFilter *query_rx_filter;
>       NetPoll *poll;
>   } NetClientInfo;
> 
> @@ -74,6 +76,7 @@ struct NetClientState {
>       unsigned receive_disabled : 1;
>       NetClientDestructor *destructor;
>       unsigned int queue_index;
> +    unsigned rxfilter_notify_enabled:1;
>   };
> 
>   typedef struct NICState {
> diff --git a/monitor.c b/monitor.c
> index 017411f..3b8117c 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -490,6 +490,7 @@ static const char *monitor_event_names[] = {
>       [QEVENT_BLOCK_JOB_READY] = "BLOCK_JOB_READY",
>       [QEVENT_DEVICE_DELETED] = "DEVICE_DELETED",
>       [QEVENT_DEVICE_TRAY_MOVED] = "DEVICE_TRAY_MOVED",
> +    [QEVENT_NIC_RX_FILTER_CHANGED] = "NIC_RX_FILTER_CHANGED",
>       [QEVENT_SUSPEND] = "SUSPEND",
>       [QEVENT_SUSPEND_DISK] = "SUSPEND_DISK",
>       [QEVENT_WAKEUP] = "WAKEUP",
> diff --git a/net/net.c b/net/net.c
> index 43a74e4..33abffe 100644
> --- a/net/net.c
> +++ b/net/net.c
> @@ -961,6 +961,53 @@ void print_net_client(Monitor *mon, NetClientState *nc)
>                      nc->info_str);
>   }
> 
> +RxFilterInfoList *qmp_query_rx_filter(bool has_name, const char *name,
> +                                      Error **errp)
> +{
> +    NetClientState *nc;
> +    RxFilterInfoList *filter_list = NULL, *last_entry = NULL;
> +
> +    QTAILQ_FOREACH(nc, &net_clients, next) {
> +        RxFilterInfoList *entry;
> +        RxFilterInfo *info;
> +
> +        /* only query rx-filter information of nic */
> +        if (nc->info->type != NET_CLIENT_OPTIONS_KIND_NIC) {
> +            continue;
> +        }
> +        if (has_name && strcmp(nc->name, name) != 0) {
> +            continue;
> +        }
> +
> +        if (nc->info->query_rx_filter) {
> +            info = nc->info->query_rx_filter(nc);
> +            entry = g_malloc0(sizeof(*entry));
> +            entry->value = info;
> +
> +            if (!filter_list) {
> +                filter_list = entry;
> +            } else {
> +                last_entry->next = entry;
> +            }
> +            last_entry = entry;
> +        } else if (has_name) {
> +            error_setg(errp, "net client(%s) doesn't support"
> +                       " rx-filter querying", name);
> +            break;
> +        }
> +    }
> +
> +    if (filter_list == NULL && !error_is_set(errp)) {
> +        if (has_name) {
> +            error_setg(errp, "invalid net client name: %s", name);
> +        } else {
> +            error_setg(errp, "no net client supports rx-filter querying");
> +        }
> +    }
> +
> +    return filter_list;
> +}
> +
>   void do_info_network(Monitor *mon, const QDict *qdict)
>   {
>       NetClientState *nc, *peer;
> diff --git a/qapi-schema.json b/qapi-schema.json
> index 5ad6894..a69dc17 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -3624,3 +3624,78 @@
>               '*cpuid-input-ecx': 'int',
>               'cpuid-register': 'X86CPURegister32',
>               'features': 'int' } }
> +
> +##
> +# @RxState:
> +#
> +# Packets receiving state
> +#
> +# @normal: filter assigned packets according to the mac-table
> +#
> +# @none: don't receive any assigned packet
> +#
> +# @all: receive all assigned packets
> +#
> +##
> +{ 'enum': 'RxState', 'data': [ 'normal', 'none', 'all' ] }
> +
> +##
> +# @RxFilterInfo:
> +#
> +# Rx-filter information for a nic.
> +#
> +# @name: net client name
> +#
> +# @promiscuous: whether promiscuous mode is enabled
> +#
> +# @multicast: multicast receive state
> +#
> +# @unicast: unicast receive state
> +#
> +# @broadcast-allowed: whether to receive broadcast
> +#
> +# @multicast-overflow: multicast table is overflowed or not
> +#
> +# @unicast-overflow: unicast table is overflowed or not
> +#
> +# @main-mac: the main macaddr string
> +#
> +# @vlan-table: a list of active vlan id
> +#
> +# @unicast-table: a list of unicast macaddr string
> +#
> +# @multicast-table: a list of multicast macaddr string
> +#
> +# Since 1.6
> +##
> +
> +{ 'type': 'RxFilterInfo',
> +  'data': {
> +    'name':               'str',
> +    'promiscuous':        'bool',
> +    'multicast':          'RxState',
> +    'unicast':            'RxState',
> +    'broadcast-allowed':  'bool',
> +    'multicast-overflow': 'bool',
> +    'unicast-overflow':   'bool',
> +    'main-mac':           'str',
> +    'vlan-table':         ['int'],
> +    'unicast-table':      ['str'],
> +    'multicast-table':    ['str'] }}
> +
> +##
> +# @query-rx-filter:
> +#
> +# Return rx-filter information for all nics (or for the given nic).
> +#
> +# @name: #optional net client name
> +#
> +# Returns: list of @RxFilterInfo for all nics (or for the given nic).
> +#          Returns an error if the given @name doesn't exist, or given
> +#          nic doesn't support rx-filter querying, or no net client
> +#          supports rx-filter querying
> +#
> +# Since: 1.6
> +##
> +{ 'command': 'query-rx-filter', 'data': { '*name': 'str' },
> +  'returns': ['RxFilterInfo'] }
> diff --git a/qmp-commands.hx b/qmp-commands.hx
> index 8cea5e5..aa00a94 100644
> --- a/qmp-commands.hx
> +++ b/qmp-commands.hx
> @@ -2997,3 +2997,66 @@ Example:
>   <- { "return": {} }
> 
>   EQMP
> +    {
> +        .name       = "query-rx-filter",
> +        .args_type  = "name:s?",
> +        .mhandler.cmd_new = qmp_marshal_input_query_rx_filter,
> +    },
> +
> +SQMP
> +query-rx-filter
> +---------------
> +
> +Show rx-filter information.
> +
> +Returns a json-array of rx-filter information for all nics (or for the
> +given nic), returning an error if the given nic doesn't exist, or
> +given nic doesn't support rx-filter querying, or no net client
> +supports rx-filter querying.
> +
> +The query will clear the event notification flag of each nic, then qemu
> +will start to emit event to QMP monitor.
> +
> +Each array entry contains the following:
> +
> +- "name": net client name (json-string)
> +- "promiscuous": promiscuous mode is enabled (json-bool)
> +- "multicast": multicast receive state (one of 'normal', 'none', 'all')
> +- "unicast": unicast receive state  (one of 'normal', 'none', 'all')
> +- "broadcast-allowed": allow to receive broadcast (json-bool)
> +- "multicast-overflow": multicast table is overflowed (json-bool)
> +- "unicast-overflow": unicast table is overflowed (json-bool)
> +- "main-mac": main macaddr string (json-string)
> +- "vlan-table": a json-array of active vlan id
> +- "unicast-table": a json-array of unicast macaddr string
> +- "multicast-table": a json-array of multicast macaddr string
> +
> +Example:
> +
> +-> { "execute": "query-rx-filter", "arguments": { "name": "vnet0" } }
> +<- { "return": [
> +        {
> +            "promiscuous": true,
> +            "name": "vnet0",
> +            "main-mac": "52:54:00:12:34:56",
> +            "unicast": "normal",
> +            "vlan-table": [
> +                4,
> +                0
> +            ],
> +            "unicast-table": [
> +            ],
> +            "multicast": "normal",
> +            "multicast-overflow": false,
> +            "unicast-overflow": false,
> +            "multicast-table": [
> +                "01:00:5e:00:00:01",
> +                "33:33:00:00:00:01",
> +                "33:33:ff:12:34:56"
> +            ],
> +            "broadcast-allowed": false
> +        }
> +      ]
> +   }
> +
> +EQMP
> 
Hi, Amos
  This patch is for the user case that guest use a macvtap device right?
could u add some document in commit message, so reader can understand
what it trying to do?
Amos Kong June 18, 2013, 3 a.m. UTC | #13
On Tue, Jun 18, 2013 at 09:59:10AM +0800, Wenchao Xia wrote:
> 于 2013-6-14 15:45, Amos Kong 写道:
> > Currently macvtap based macvlan device is working in promiscuous
> > mode, we want to implement mac-programming over macvtap through
> > Libvirt for better performance.
> > 
> > Design:
> >   QEMU notifies Libvirt when rx-filter config is changed in guest,
> >   then Libvirt query the rx-filter information by a monitor command,
> >   and sync the change to macvtap device. Related rx-filter config
> >   of the nic contains main mac, rx-mode items and vlan table.
> > 
> > This patch adds a QMP event to notify management of rx-filter change,
> > and adds a monitor command for management to query rx-filter
> > information.
> > 
> > Test:
> >   If we repeatedly add/remove vlan, and change macaddr of vlan
> >   interfaces in guest by a loop script.
> > 
> > Result:
> >   The events will flood the QMP client(management), management takes
> >   too much resource to process the events.
> > 
> >   Event_throttle API (set rate to 1 ms) can avoid the events to flood
> >   QMP client, but it could cause an unexpected delay (~1ms), guests
> >   guests normally expect rx-filter updates immediately.
> > 
> >   So we use a flag for each nic to avoid events flooding, the event
> >   is emitted once until the query command is executed. The flag
> >   implementation could not introduce unexpected delay.
> > 
> > There maybe exist an uncontrollable delay if we let Libvirt do the
> > real change, guests normally expect rx-filter updates immediately.
> > But it's another separate issue, we can investigate it when the
> > work in Libvirt side is done.


> > 
> Hi, Amos
>   This patch is for the user case that guest use a macvtap device right?
> could u add some document in commit message, so reader can understand
> what it trying to do?

I will add a single document later for it.
 docs/macprogramming-over-macvtap.txt

Thanks.
 
> Wenchao Xia
diff mbox

Patch

diff --git a/QMP/qmp-events.txt b/QMP/qmp-events.txt
index 24e804e9..39b6016 100644
--- a/QMP/qmp-events.txt
+++ b/QMP/qmp-events.txt
@@ -172,6 +172,23 @@  Data:
   },
   "timestamp": { "seconds": 1265044230, "microseconds": 450486 } }
 
+NIC_RX_FILTER_CHANGED
+-----------------
+
+The event is emitted once until the query command is executed,
+the first event will always be emitted.
+
+Data:
+
+- "name": net client name (json-string)
+- "path": device path (json-string)
+
+{ "event": "NIC_RX_FILTER_CHANGED",
+  "data": { "name": "vnet0",
+            "path": "/machine/peripheral/vnet0/virtio-backend" },
+  "timestamp": { "seconds": 1368697518, "microseconds": 326866 } }
+}
+
 RESET
 -----
 
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 1ea9556..4137959 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -21,6 +21,8 @@ 
 #include "hw/virtio/virtio-net.h"
 #include "net/vhost_net.h"
 #include "hw/virtio/virtio-bus.h"
+#include "qapi/qmp/qjson.h"
+#include "monitor/monitor.h"
 
 #define VIRTIO_NET_VM_VERSION    11
 
@@ -192,6 +194,100 @@  static void virtio_net_set_link_status(NetClientState *nc)
     virtio_net_set_status(vdev, vdev->status);
 }
 
+static void rxfilter_notify(NetClientState *nc)
+{
+    QObject *event_data;
+    VirtIONet *n = qemu_get_nic_opaque(nc);
+
+    if (nc->rxfilter_notify_enabled) {
+        event_data = qobject_from_jsonf("{ 'name': %s, 'path': %s }",
+                           n->netclient_name,
+                           object_get_canonical_path(OBJECT(n->qdev)));
+        monitor_protocol_event(QEVENT_NIC_RX_FILTER_CHANGED, event_data);
+        qobject_decref(event_data);
+
+        /* disable event notification to avoid events flooding */
+        nc->rxfilter_notify_enabled = 0;
+    }
+}
+
+static char *mac_strdup_printf(const uint8_t *mac)
+{
+    return g_strdup_printf("%.2x:%.2x:%.2x:%.2x:%.2x:%.2x", mac[0],
+                            mac[1], mac[2], mac[3], mac[4], mac[5]);
+}
+
+static RxFilterInfo *virtio_net_query_rxfilter(NetClientState *nc)
+{
+    VirtIONet *n = qemu_get_nic_opaque(nc);
+    RxFilterInfo *info;
+    strList *str_list = NULL;
+    strList *entry;
+    intList *int_list = NULL;
+    intList *int_entry;
+    int i, j;
+
+    info = g_malloc0(sizeof(*info));
+    info->name = g_strdup(nc->name);
+    info->promiscuous = n->promisc;
+
+    if (n->nouni) {
+        info->unicast = RX_STATE_NONE;
+    } else if (n->alluni) {
+        info->unicast = RX_STATE_ALL;
+    } else {
+        info->unicast = RX_STATE_NORMAL;
+    }
+
+    if (n->nomulti) {
+        info->multicast = RX_STATE_NONE;
+    } else if (n->allmulti) {
+        info->multicast = RX_STATE_ALL;
+    } else {
+        info->multicast = RX_STATE_NORMAL;
+    }
+
+    info->broadcast_allowed = n->nobcast;
+    info->multicast_overflow = n->mac_table.multi_overflow;
+    info->unicast_overflow = n->mac_table.uni_overflow;
+
+    info->main_mac = mac_strdup_printf(n->mac);
+
+    for (i = 0; i < n->mac_table.first_multi; i++) {
+        entry = g_malloc0(sizeof(*entry));
+        entry->value = mac_strdup_printf(n->mac_table.macs + i * ETH_ALEN);
+        entry->next = str_list;
+        str_list = entry;
+    }
+    info->unicast_table = str_list;
+
+    str_list = NULL;
+    for (i = n->mac_table.first_multi; i < n->mac_table.in_use; i++) {
+        entry = g_malloc0(sizeof(*entry));
+        entry->value = mac_strdup_printf(n->mac_table.macs + i * ETH_ALEN);
+        entry->next = str_list;
+        str_list = entry;
+    }
+    info->multicast_table = str_list;
+
+    for (i = 0; i < MAX_VLAN >> 5; i++) {
+        for (j = 0; n->vlans[i] && j < 0x1f; j++) {
+            if (n->vlans[i] & (1U << j)) {
+                int_entry = g_malloc0(sizeof(*int_entry));
+                int_entry->value = (i << 5) + j;
+                int_entry->next = int_list;
+                int_list = int_entry;
+            }
+        }
+    }
+    info->vlan_table = int_list;
+
+    /* enable event notification after query */
+    nc->rxfilter_notify_enabled = 1;
+
+    return info;
+}
+
 static void virtio_net_reset(VirtIODevice *vdev)
 {
     VirtIONet *n = VIRTIO_NET(vdev);
@@ -420,6 +516,7 @@  static int virtio_net_handle_rx_mode(VirtIONet *n, uint8_t cmd,
 {
     uint8_t on;
     size_t s;
+    NetClientState *nc = qemu_get_queue(n->nic);
 
     s = iov_to_buf(iov, iov_cnt, 0, &on, sizeof(on));
     if (s != sizeof(on)) {
@@ -442,6 +539,8 @@  static int virtio_net_handle_rx_mode(VirtIONet *n, uint8_t cmd,
         return VIRTIO_NET_ERR;
     }
 
+    rxfilter_notify(nc);
+
     return VIRTIO_NET_OK;
 }
 
@@ -487,6 +586,7 @@  static int virtio_net_handle_mac(VirtIONet *n, uint8_t cmd,
 {
     struct virtio_net_ctrl_mac mac_data;
     size_t s;
+    NetClientState *nc = qemu_get_queue(n->nic);
 
     if (cmd == VIRTIO_NET_CTRL_MAC_ADDR_SET) {
         if (iov_size(iov, iov_cnt) != sizeof(n->mac)) {
@@ -495,6 +595,8 @@  static int virtio_net_handle_mac(VirtIONet *n, uint8_t cmd,
         s = iov_to_buf(iov, iov_cnt, 0, &n->mac, sizeof(n->mac));
         assert(s == sizeof(n->mac));
         qemu_format_nic_info_str(qemu_get_queue(n->nic), n->mac);
+        rxfilter_notify(nc);
+
         return VIRTIO_NET_OK;
     }
 
@@ -559,6 +661,8 @@  static int virtio_net_handle_mac(VirtIONet *n, uint8_t cmd,
         n->mac_table.multi_overflow = 1;
     }
 
+    rxfilter_notify(nc);
+
     return VIRTIO_NET_OK;
 }
 
@@ -567,6 +671,7 @@  static int virtio_net_handle_vlan_table(VirtIONet *n, uint8_t cmd,
 {
     uint16_t vid;
     size_t s;
+    NetClientState *nc = qemu_get_queue(n->nic);
 
     s = iov_to_buf(iov, iov_cnt, 0, &vid, sizeof(vid));
     vid = lduw_p(&vid);
@@ -584,6 +689,8 @@  static int virtio_net_handle_vlan_table(VirtIONet *n, uint8_t cmd,
     else
         return VIRTIO_NET_ERR;
 
+    rxfilter_notify(nc);
+
     return VIRTIO_NET_OK;
 }
 
@@ -1312,6 +1419,7 @@  static NetClientInfo net_virtio_info = {
     .receive = virtio_net_receive,
         .cleanup = virtio_net_cleanup,
     .link_status_changed = virtio_net_set_link_status,
+    .query_rx_filter = virtio_net_query_rxfilter,
 };
 
 static bool virtio_net_guest_notifier_pending(VirtIODevice *vdev, int idx)
diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
index 1a6cfcf..1942cc4 100644
--- a/include/monitor/monitor.h
+++ b/include/monitor/monitor.h
@@ -41,6 +41,7 @@  typedef enum MonitorEvent {
     QEVENT_BLOCK_JOB_READY,
     QEVENT_DEVICE_DELETED,
     QEVENT_DEVICE_TRAY_MOVED,
+    QEVENT_NIC_RX_FILTER_CHANGED,
     QEVENT_SUSPEND,
     QEVENT_SUSPEND_DISK,
     QEVENT_WAKEUP,
diff --git a/include/net/net.h b/include/net/net.h
index 43d85a1..30e4b04 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -49,6 +49,7 @@  typedef ssize_t (NetReceiveIOV)(NetClientState *, const struct iovec *, int);
 typedef void (NetCleanup) (NetClientState *);
 typedef void (LinkStatusChanged)(NetClientState *);
 typedef void (NetClientDestructor)(NetClientState *);
+typedef RxFilterInfo *(QueryRxFilter)(NetClientState *);
 
 typedef struct NetClientInfo {
     NetClientOptionsKind type;
@@ -59,6 +60,7 @@  typedef struct NetClientInfo {
     NetCanReceive *can_receive;
     NetCleanup *cleanup;
     LinkStatusChanged *link_status_changed;
+    QueryRxFilter *query_rx_filter;
     NetPoll *poll;
 } NetClientInfo;
 
@@ -74,6 +76,7 @@  struct NetClientState {
     unsigned receive_disabled : 1;
     NetClientDestructor *destructor;
     unsigned int queue_index;
+    unsigned rxfilter_notify_enabled:1;
 };
 
 typedef struct NICState {
diff --git a/monitor.c b/monitor.c
index 017411f..3b8117c 100644
--- a/monitor.c
+++ b/monitor.c
@@ -490,6 +490,7 @@  static const char *monitor_event_names[] = {
     [QEVENT_BLOCK_JOB_READY] = "BLOCK_JOB_READY",
     [QEVENT_DEVICE_DELETED] = "DEVICE_DELETED",
     [QEVENT_DEVICE_TRAY_MOVED] = "DEVICE_TRAY_MOVED",
+    [QEVENT_NIC_RX_FILTER_CHANGED] = "NIC_RX_FILTER_CHANGED",
     [QEVENT_SUSPEND] = "SUSPEND",
     [QEVENT_SUSPEND_DISK] = "SUSPEND_DISK",
     [QEVENT_WAKEUP] = "WAKEUP",
diff --git a/net/net.c b/net/net.c
index 43a74e4..33abffe 100644
--- a/net/net.c
+++ b/net/net.c
@@ -961,6 +961,53 @@  void print_net_client(Monitor *mon, NetClientState *nc)
                    nc->info_str);
 }
 
+RxFilterInfoList *qmp_query_rx_filter(bool has_name, const char *name,
+                                      Error **errp)
+{
+    NetClientState *nc;
+    RxFilterInfoList *filter_list = NULL, *last_entry = NULL;
+
+    QTAILQ_FOREACH(nc, &net_clients, next) {
+        RxFilterInfoList *entry;
+        RxFilterInfo *info;
+
+        /* only query rx-filter information of nic */
+        if (nc->info->type != NET_CLIENT_OPTIONS_KIND_NIC) {
+            continue;
+        }
+        if (has_name && strcmp(nc->name, name) != 0) {
+            continue;
+        }
+
+        if (nc->info->query_rx_filter) {
+            info = nc->info->query_rx_filter(nc);
+            entry = g_malloc0(sizeof(*entry));
+            entry->value = info;
+
+            if (!filter_list) {
+                filter_list = entry;
+            } else {
+                last_entry->next = entry;
+            }
+            last_entry = entry;
+        } else if (has_name) {
+            error_setg(errp, "net client(%s) doesn't support"
+                       " rx-filter querying", name);
+            break;
+        }
+    }
+
+    if (filter_list == NULL && !error_is_set(errp)) {
+        if (has_name) {
+            error_setg(errp, "invalid net client name: %s", name);
+        } else {
+            error_setg(errp, "no net client supports rx-filter querying");
+        }
+    }
+
+    return filter_list;
+}
+
 void do_info_network(Monitor *mon, const QDict *qdict)
 {
     NetClientState *nc, *peer;
diff --git a/qapi-schema.json b/qapi-schema.json
index 5ad6894..a69dc17 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -3624,3 +3624,78 @@ 
             '*cpuid-input-ecx': 'int',
             'cpuid-register': 'X86CPURegister32',
             'features': 'int' } }
+
+##
+# @RxState:
+#
+# Packets receiving state
+#
+# @normal: filter assigned packets according to the mac-table
+#
+# @none: don't receive any assigned packet
+#
+# @all: receive all assigned packets
+#
+##
+{ 'enum': 'RxState', 'data': [ 'normal', 'none', 'all' ] }
+
+##
+# @RxFilterInfo:
+#
+# Rx-filter information for a nic.
+#
+# @name: net client name
+#
+# @promiscuous: whether promiscuous mode is enabled
+#
+# @multicast: multicast receive state
+#
+# @unicast: unicast receive state
+#
+# @broadcast-allowed: whether to receive broadcast
+#
+# @multicast-overflow: multicast table is overflowed or not
+#
+# @unicast-overflow: unicast table is overflowed or not
+#
+# @main-mac: the main macaddr string
+#
+# @vlan-table: a list of active vlan id
+#
+# @unicast-table: a list of unicast macaddr string
+#
+# @multicast-table: a list of multicast macaddr string
+#
+# Since 1.6
+##
+
+{ 'type': 'RxFilterInfo',
+  'data': {
+    'name':               'str',
+    'promiscuous':        'bool',
+    'multicast':          'RxState',
+    'unicast':            'RxState',
+    'broadcast-allowed':  'bool',
+    'multicast-overflow': 'bool',
+    'unicast-overflow':   'bool',
+    'main-mac':           'str',
+    'vlan-table':         ['int'],
+    'unicast-table':      ['str'],
+    'multicast-table':    ['str'] }}
+
+##
+# @query-rx-filter:
+#
+# Return rx-filter information for all nics (or for the given nic).
+#
+# @name: #optional net client name
+#
+# Returns: list of @RxFilterInfo for all nics (or for the given nic).
+#          Returns an error if the given @name doesn't exist, or given
+#          nic doesn't support rx-filter querying, or no net client
+#          supports rx-filter querying
+#
+# Since: 1.6
+##
+{ 'command': 'query-rx-filter', 'data': { '*name': 'str' },
+  'returns': ['RxFilterInfo'] }
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 8cea5e5..aa00a94 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -2997,3 +2997,66 @@  Example:
 <- { "return": {} }
 
 EQMP
+    {
+        .name       = "query-rx-filter",
+        .args_type  = "name:s?",
+        .mhandler.cmd_new = qmp_marshal_input_query_rx_filter,
+    },
+
+SQMP
+query-rx-filter
+---------------
+
+Show rx-filter information.
+
+Returns a json-array of rx-filter information for all nics (or for the
+given nic), returning an error if the given nic doesn't exist, or
+given nic doesn't support rx-filter querying, or no net client
+supports rx-filter querying.
+
+The query will clear the event notification flag of each nic, then qemu
+will start to emit event to QMP monitor.
+
+Each array entry contains the following:
+
+- "name": net client name (json-string)
+- "promiscuous": promiscuous mode is enabled (json-bool)
+- "multicast": multicast receive state (one of 'normal', 'none', 'all')
+- "unicast": unicast receive state  (one of 'normal', 'none', 'all')
+- "broadcast-allowed": allow to receive broadcast (json-bool)
+- "multicast-overflow": multicast table is overflowed (json-bool)
+- "unicast-overflow": unicast table is overflowed (json-bool)
+- "main-mac": main macaddr string (json-string)
+- "vlan-table": a json-array of active vlan id
+- "unicast-table": a json-array of unicast macaddr string
+- "multicast-table": a json-array of multicast macaddr string
+
+Example:
+
+-> { "execute": "query-rx-filter", "arguments": { "name": "vnet0" } }
+<- { "return": [
+        {
+            "promiscuous": true,
+            "name": "vnet0",
+            "main-mac": "52:54:00:12:34:56",
+            "unicast": "normal",
+            "vlan-table": [
+                4,
+                0
+            ],
+            "unicast-table": [
+            ],
+            "multicast": "normal",
+            "multicast-overflow": false,
+            "unicast-overflow": false,
+            "multicast-table": [
+                "01:00:5e:00:00:01",
+                "33:33:00:00:00:01",
+                "33:33:ff:12:34:56"
+            ],
+            "broadcast-allowed": false
+        }
+      ]
+   }
+
+EQMP