[ovs-dev] Extends the existing mirror configuration parameters

Message ID 20210510160045.49434-2-timothy.miskell@intel.com
State Changes Requested

Commit Message

Miskell, Timothy May 10, 2021, 4 p.m. UTC
From: Liang-min Wang <liang-min.wang@intel.com>

The following parameters are added:
 - mirror-offload: turns mirror offloading on or off.
 - output-port-name: specifies, by name string, a port that is on a different
   bridge.
 - output-src-vlan: output port VLAN for each select-src-port.
 - output-dst-vlan: output port VLAN for each select-dst-port.
 - flow-src-mac: use the source MAC address of each select-dst-port for the
   header scan.
 - flow-dst-mac: use the destination MAC address of each select-src-port for
   the header scan.
 - mirror-tunnel-addr: BDF string of the tunnel device.

The ovs-vsctl test is updated because this patch introduces new mirroring
parameters.

Create a deferred procedure call thread to handle all mirror offload requests.
This is a lightweight thread that stays asleep when there are no new requests.
It sits between ovs-vsctl and the mirror offloading back end.

Implement DPDK tx-burst (VIRTIO ingress traffic mirror) and rx-burst
(VIRTIO egress traffic mirror) callbacks.
Each callback function performs the following tasks:
 1. Enable per-packet VLAN insertion.
   - for port mirroring, VLAN insertion is enabled on all packets.
   - for flow mirroring, it is enabled only on packets whose header matches
     the required MAC address.
 2. Send the packets to the specified transport port (output-port in the
    mirror offload configuration).
   - for port mirroring, all packets are sent to the transport port.
   - for flow mirroring, only matched packets are sent.
 3. Restore each packet's attributes (clear the DPDK per-packet offload flag).

Signed-off-by: Liang-min Wang <liang-min.wang@intel.com>
Tested-by: Timothy Miskell <Timothy.Miskell@intel.com>
Suggested-by: Munish Mehan <mm6021@att.com>
---
 lib/automake.mk            |   2 +
 lib/netdev-dpdk-mirror.c   | 516 +++++++++++++++++++++++++++++++++++++
 lib/netdev-dpdk-mirror.h   |  83 ++++++
 lib/netdev-dpdk.c          | 397 ++++++++++++++++++++++++++++
 lib/netdev-provider.h      |  16 ++
 lib/netdev.c               | 386 +++++++++++++++++++++++++++
 lib/netdev.h               |  16 ++
 tests/ovs-vsctl.at         |   2 +
 vswitchd/bridge.c          | 271 ++++++++++++++++++-
 vswitchd/vswitch.ovsschema |  24 +-
 vswitchd/vswitch.xml       |  50 ++++
 11 files changed, 1759 insertions(+), 4 deletions(-)
 create mode 100644 lib/netdev-dpdk-mirror.c
 create mode 100644 lib/netdev-dpdk-mirror.h

Comments

0-day Robot May 10, 2021, 5:12 p.m. UTC | #1
Bleep bloop.  Greetings Timothy Miskell, I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: Line lacks whitespace around operator
#1224 FILE: lib/netdev.c:2331:
netdev_mirror_db_resize(struct netdev_mirror_offload_item ***old_db,

Lines checked: 2067, Warnings: 1, Errors: 0


Please check this out.  If you feel there has been an error, please email aconole@redhat.com

Thanks,
0-day Robot
Maxime Coquelin May 18, 2021, 4:15 p.m. UTC | #2
Hi Timothy, Liang-min,

Thanks for rebasing the patch.
A list of changes since the first RFC would help the reviewers.
One change I notice in the right direction is the conversion to the Vhost
API datapath instead of the Vhost PMD.

Also, I would suggest splitting the patch into several incremental
patches to ease the review.

On 5/10/21 6:00 PM, Timothy Miskell wrote:
> From: Liang-min Wang <liang-min.wang@intel.com>
> 
> The following parameters are added:
>  - mirror-offload: to turn on/off mirror offloading.
>  - output-port-name: specify a port, using name string, that is on a different
>    bridge
>  - output-src-vlan: output port vlan for each select-src-port.
>  - output-dst-vlan: output port vlan for each select-dst-port.
>  - flow-src-mac: use src mac address of each select-dst-port for the header
>    scan.
>  - flow-dst-mac: use dst mac address of each select-src-port for the header
>    scan.
>  - mirror-tunnel-addr: BDF string of the tunnel device.
> 
> ovs-vsctl test change because new mirroring parameters are introduced in this patch

It would help to provide examples of usage of these new parameters.

> Create a defer procedure call thread to handle all mirror offload requests.
> This is a light-weight thread which remains in sleep-state when there is no new request.
> This is created between ovs-vsctl and mirror offloading back end
> 
> Implementing DPDK tx-burst (VIRTIO ingress traffic
> mirror) and rx-burst (VIRTIO egress traffic mirror) callbacks.
> Each callback  functions implement the following tasks:
>  1. Enable per-packet VLAN insertion
>    - for port mirroring, all packets are enabled per-packet VLAN insertion.
>    - for flow mirroring, only packet header matches the required mac address
>      are enabled.
>  2. Sending the packets to the specified transport port (output-port in
>     mirror offload configuration)
>    - for port mirroring, all packets are sent to the transport port.
>    - for flow mirroring, only matched packets are sent.
>  3. Restore each packet attributes (remove DPDK per-packet offload flag)
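
The flow-mirroring match described in step 1 can be sketched as below. This is a minimal illustration, not the patch's code: the patch compares a masked 64-bit load, whereas this sketch swaps in a plain `memcmp` over the six MAC bytes. The helper name `mac_field_matches` and the offset macros are hypothetical; the offsets 0 and 6 correspond to the destination and source MAC fields of an Ethernet header, which are the values the patch's rx and tx flow callbacks pass.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN        6
#define DST_MAC_OFFSET 0   /* destination MAC comes first in the header */
#define SRC_MAC_OFFSET 6   /* source MAC follows the destination MAC */

/* Hypothetical helper: returns true when the 6-byte MAC field found at
 * 'offset' in 'frame' equals 'target'.  Frames too short to contain the
 * field never match. */
static bool
mac_field_matches(const uint8_t *frame, size_t len, size_t offset,
                  const uint8_t target[MAC_LEN])
{
    if (len < offset + MAC_LEN) {
        return false;
    }
    return memcmp(frame + offset, target, MAC_LEN) == 0;
}
```

Only packets for which such a check succeeds would be tagged and forwarded in flow-mirroring mode; port mirroring skips the check entirely.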

I will certainly have more questions later, but please find a few
comments/questions below:

> Signed-off-by: Liang-min Wang <liang-min.wang@intel.com>
> Tested-by: Timothy Miskell <Timothy.Miskell@intel.com>
> Suggested-by: Munish Mehan <mm6021@att.com>
> ---
>  lib/automake.mk            |   2 +
>  lib/netdev-dpdk-mirror.c   | 516 +++++++++++++++++++++++++++++++++++++
>  lib/netdev-dpdk-mirror.h   |  83 ++++++
>  lib/netdev-dpdk.c          | 397 ++++++++++++++++++++++++++++
>  lib/netdev-provider.h      |  16 ++
>  lib/netdev.c               | 386 +++++++++++++++++++++++++++
>  lib/netdev.h               |  16 ++
>  tests/ovs-vsctl.at         |   2 +
>  vswitchd/bridge.c          | 271 ++++++++++++++++++-
>  vswitchd/vswitch.ovsschema |  24 +-
>  vswitchd/vswitch.xml       |  50 ++++
>  11 files changed, 1759 insertions(+), 4 deletions(-)
>  create mode 100644 lib/netdev-dpdk-mirror.c
>  create mode 100644 lib/netdev-dpdk-mirror.h
> 
> diff --git a/lib/automake.mk b/lib/automake.mk
> index 39901bd6d..dcafbfaca 100644
> --- a/lib/automake.mk
> +++ b/lib/automake.mk
> @@ -170,6 +170,7 @@ lib_libopenvswitch_la_SOURCES = \
>  	lib/multipath.h \
>  	lib/namemap.c \
>  	lib/netdev-dpdk.h \
> +	lib/netdev-dpdk-mirror.h \
>  	lib/netdev-dummy.c \
>  	lib/netdev-offload.c \
>  	lib/netdev-offload.h \
> @@ -460,6 +461,7 @@ if DPDK_NETDEV
>  lib_libopenvswitch_la_SOURCES += \
>  	lib/dpdk.c \
>  	lib/netdev-dpdk.c \
> +	lib/netdev-dpdk-mirror.c \
>  	lib/netdev-offload-dpdk.c
>  else
>  lib_libopenvswitch_la_SOURCES += \
> diff --git a/lib/netdev-dpdk-mirror.c b/lib/netdev-dpdk-mirror.c
> new file mode 100644
> index 000000000..ff2701660
> --- /dev/null
> +++ b/lib/netdev-dpdk-mirror.c
> @@ -0,0 +1,516 @@
> +/*
> + * Copyright (c) 2014, 2015, 2016, 2017 Nicira, Inc.
> + * Copyright (c) 2019 Mellanox Technologies, Ltd.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + *     http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +#include <config.h>
> +#include <rte_ethdev.h>
> +
> +#include "netdev-dpdk-mirror.h"
> +#include "openvswitch/vlog.h"
> +#include "openvswitch/dynamic-string.h"
> +#include "util.h"
> +
> +#define MAC_ADDR_MAP           0x0000FFFFFFFFFFFFULL
> +#define is_mac_addr_match(a,b) (((a^b)&MAC_ADDR_MAP) == 0)
> +#define INIT_MIRROR_DB_SIZE    8
> +#define INVALID_DEVICE_ID      0xFFFFFFFF
> +
> +VLOG_DEFINE_THIS_MODULE(netdev_dpdk_mirror);
> +
> +/* port/flow mirror database management routines */
> +/*
> + * The below API is for port/flow mirror offloading which uses a different DPDK
> + * interface as rte-flow.
> + */
> +static int mirror_port_db_size = 0;
> +static int mirror_port_used = 0;
> +static struct mirror_offload_port *mirror_port_db = NULL;
> +
> +static void
> +netdev_mirror_db_init(struct mirror_offload_port *db, int size)
> +{
> +    int i;
> +
> +    for (i = 0; i < size; i++) {
> +        db[i].dev_id = INVALID_DEVICE_ID;
> +        memset(&db[i].rx, 0, sizeof(struct mirror_param));
> +        memset(&db[i].tx, 0, sizeof(struct mirror_param));
> +    }
> +}
> +
> +/* Double the db size when it runs out of space */
> +static int
> +netdev_mirror_db_resize(void)
> +{
> +    int new_size = mirror_port_db_size << 1;
> +    struct mirror_offload_port *new_db = xmalloc(
> +        sizeof(struct mirror_offload_port)*new_size);
> +
> +    memcpy(new_db, mirror_port_db, sizeof(struct mirror_offload_port)
> +        *mirror_port_db_size);
> +    netdev_mirror_db_init(&new_db[mirror_port_db_size], mirror_port_db_size);
> +    mirror_port_db_size = new_size;
> +    mirror_port_db = new_db;
> +
> +    return 0;
> +}
> +
> +
> +static struct mirror_offload_port*
> +netdev_mirror_data_find(uint32_t dev_id)
> +{
> +    int i;
> +
> +    if (mirror_port_db == NULL) {
> +        return NULL;
> +    }
> +
> +    for (i = 0; i < mirror_port_db_size; i++) {
> +        if (dev_id == mirror_port_db[i].dev_id) {
> +            return &mirror_port_db[i];
> +        }
> +    }
> +    return NULL;
> +}
> +
> +static struct mirror_offload_port*
> +netdev_mirror_data_add(uint32_t dev_id, int tx,
> +    struct mirror_param *new_param)
> +{
> +    struct mirror_offload_port *target = NULL;
> +    int i;
> +
> +    if (!mirror_port_db) {
> +        mirror_port_db_size = INIT_MIRROR_DB_SIZE;
> +        mirror_port_db = xmalloc(sizeof(struct mirror_offload_port)*
> +            mirror_port_db_size);
> +        netdev_mirror_db_init(mirror_port_db, mirror_port_db_size);
> +    }
> +    target = netdev_mirror_data_find(dev_id);
> +    if (target) {
> +        if (tx) {
> +            if (target->tx.mirror_cb) {
> +                VLOG_ERR("Attempt to add ingress mirror offloading"
> +                    " on port, %d, while one is outstanding\n", dev_id);
> +                return target;
> +            }
> +
> +            memcpy(&target->tx, new_param, sizeof(*new_param));
> +        } else {
> +            if (target->rx.mirror_cb) {
> +                VLOG_ERR("Attempt to add egress mirror offloading"
> +                    " on port, %d, while one is outstanding\n", dev_id);
> +                return target;
> +            }
> +
> +            memcpy(&target->rx, new_param, sizeof(struct mirror_param));
> +        }
> +    } else {
> +        struct mirror_param *param;
> +        /* find an unused spot on db */
> +        for (i = 0; i < mirror_port_db_size; i++) {
> +            if (mirror_port_db[i].dev_id == INVALID_DEVICE_ID) {
> +                break;
> +            }
> +        }
> +        if (i == mirror_port_db_size && netdev_mirror_db_resize()) {
> +                return NULL;
> +        }
> +
> +        param = tx ? &mirror_port_db[i].tx : &mirror_port_db[i].rx;
> +        memcpy(param, new_param, sizeof(struct mirror_param));
> +
> +        target = &mirror_port_db[i];
> +        target->dev_id = dev_id;
> +        mirror_port_used ++;
> +    }
> +    return target;
> +}
> +
> +static void
> +netdev_mirror_data_remove(uint32_t dev_id, int tx) {
> +    struct mirror_offload_port *target = netdev_mirror_data_find(dev_id);
> +
> +    if (!target) {
> +        VLOG_ERR("Attempt to remove unsaved port, %d, %s callback\n",
> +        dev_id, tx?"tx": "rx");
> +    }
> +
> +    if (tx) {
> +        memset(&target->tx, 0, sizeof(struct mirror_param));
> +    } else {
> +        memset(&target->rx, 0, sizeof(struct mirror_param));
> +    }
> +
> +    if ((target->rx.mirror_cb == NULL) &&
> +        (target->tx.mirror_cb == NULL)) {
> +        target->dev_id = INVALID_DEVICE_ID;
> +        mirror_port_used --;
> +        /* release port mirror db memory when there
> +         * is no outstanding port mirror offloading
> +         * configuration
> +         */
> +        if (mirror_port_used == 0) {
> +            free(mirror_port_db);
> +            mirror_port_db = NULL;
> +            mirror_port_db_size = 0;
> +        }
> +    }
> +}
> +
> +void
> +netdev_mirror_data_proc(uint32_t dev_id, mirror_data_op op,
> +    int tx, struct mirror_param *in_param,
> +    struct mirror_offload_port **out_param)
> +{
> +    switch (op) {
> +    case mirror_data_find:
> +        *out_param = netdev_mirror_data_find(dev_id);
> +        break;
> +    case mirror_data_add:
> +        *out_param = netdev_mirror_data_add(dev_id, tx, in_param);
> +        break;
> +    case mirror_data_rem:
> +        netdev_mirror_data_remove(dev_id, tx);
> +        break;
> +    }
> +}
> +
> +/* port/flow mirror traffic processors */
> +static inline uint16_t
> +netdev_custom_mirror_offload_cb(uint16_t qidx, struct rte_mbuf **pkts,
> +    uint16_t nb_pkts, void *user_params)
> +{
> +    struct mirror_param *data = user_params;
> +    uint16_t i, dst_qidx, match_count = 0;
> +    uint16_t pkt_trans;
> +    uint16_t dst_port_id = data->dst_port_id;
> +    uint16_t dst_vlan_id = data->dst_vlan_id;
> +    struct rte_mbuf **pkt_buf = &data->pkt_buf[qidx * data->max_burst_size];
> +
> +    if (nb_pkts == 0) {
> +        return 0;
> +    }
> +
> +    if (nb_pkts > data->max_burst_size) {
> +        VLOG_ERR("Per-flow batch size, %d, exceeds maximum limit\n", nb_pkts);
> +        return 0;
> +    }
> +
> +    for (i = 0; i < nb_pkts; i++) {
> +        if (data->custom_scan(pkts[i], user_params)) {
> +            pkt_buf[match_count] = pkts[i];
> +            pkt_buf[match_count]->ol_flags |= PKT_TX_VLAN_PKT;

Does it work if the packet already has a VLAN inserted?
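
To make the concern concrete: one way to detect an already-tagged frame is to inspect the EtherType field before requesting insertion. The sketch below is only an assumption about how such a check could look. `frame_has_vlan_tag` is a hypothetical helper that does not exist in the patch or in DPDK, and it only covers tags carried in the packet data, not tags held in mbuf metadata.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETHERTYPE_OFFSET 12      /* after 6-byte dst MAC and 6-byte src MAC */
#define ETHERTYPE_VLAN   0x8100  /* IEEE 802.1Q tag protocol identifier */
#define ETHERTYPE_QINQ   0x88A8  /* IEEE 802.1ad service tag identifier */

/* Hypothetical helper: returns true when the frame's EtherType field shows
 * that a VLAN tag is already present in the packet data. */
static bool
frame_has_vlan_tag(const uint8_t *frame, size_t len)
{
    uint16_t ethertype;

    if (len < ETHERTYPE_OFFSET + 2) {
        return false;
    }
    /* EtherType is stored in network byte order (big endian). */
    ethertype = (uint16_t) ((frame[ETHERTYPE_OFFSET] << 8)
                            | frame[ETHERTYPE_OFFSET + 1]);
    return ethertype == ETHERTYPE_VLAN || ethertype == ETHERTYPE_QINQ;
}
```

If such a frame is sent with insertion requested, the mirrored copy would presumably end up double tagged, which the receiving side may or may not expect.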

> +            pkt_buf[match_count]->vlan_tci = dst_vlan_id;
> +            rte_mbuf_refcnt_update(pkt_buf[match_count], 1);



> +            match_count++;
> +        }
> +    }
> +
> +    dst_qidx = (data->n_dst_queue > qidx)?qidx:(data->n_dst_queue -1);

Wouldn't it scale better with:
dst_qidx = qidx % data->n_dst_queue
?
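
For illustration, the two mappings can be compared side by side in a small standalone sketch. The function names are hypothetical; the first mirrors the expression in the patch, the second the modulo suggestion, and the modulo variant assumes `n_dst_queue` is non-zero.

```c
#include <stdint.h>

/* Mapping as written in the patch: every source queue index at or beyond
 * n_dst_queue is sent to the last destination queue. */
static uint16_t
map_queue_clamp(uint16_t qidx, uint16_t n_dst_queue)
{
    return n_dst_queue > qidx ? qidx : (uint16_t) (n_dst_queue - 1);
}

/* Suggested mapping: wrap around so the extra source queues are spread
 * over all destination queues.  n_dst_queue must be non-zero. */
static uint16_t
map_queue_modulo(uint16_t qidx, uint16_t n_dst_queue)
{
    return (uint16_t) (qidx % n_dst_queue);
}
```

With 8 source queues and 4 destination queues, the clamp variant maps queues 3 through 7 onto destination queue 3, so five source queues contend for one spinlock, while the modulo variant puts exactly two source queues on each destination queue.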

> +
> +    rte_spinlock_lock(&data->locks[dst_qidx]);
> +    pkt_trans = rte_eth_tx_burst(dst_port_id, dst_qidx, pkt_buf, match_count);
> +    rte_spinlock_unlock(&data->locks[dst_qidx]);
> +
> +    for (i = 0; i < match_count; i++) {
> +        pkt_buf[i]->ol_flags &= ~PKT_TX_VLAN_PKT;
> +    }

To further reduce the performance impact of mirroring, have you
considered offloading it to dedicated PMD threads?

> +
> +    while (unlikely (pkt_trans < match_count)) {
> +        rte_pktmbuf_free(pkt_buf[pkt_trans]);
> +        pkt_trans++;
> +    }
> +
> +    return nb_pkts;
> +}
> +
> +static inline uint16_t
> +netdev_flow_mirror_offload_cb(uint16_t qidx, struct rte_mbuf **pkts,
> +    uint16_t nb_pkts, void *user_params, uint32_t offset)
> +{
> +    struct mirror_param *data = user_params;
> +    uint16_t i, dst_qidx, match_count = 0;
> +    uint16_t pkt_trans;
> +    uint16_t dst_port_id = data->dst_port_id;
> +    uint16_t dst_vlan_id = data->dst_vlan_id;
> +    uint64_t target_addr = *(uint64_t *) data->extra_data;
> +    struct rte_mbuf **pkt_buf = &data->pkt_buf[qidx * data->max_burst_size];
> +
> +    if (nb_pkts == 0) {
> +        return 0;
> +    }
> +
> +    if (nb_pkts > data->max_burst_size) {
> +        VLOG_ERR("Per-flow batch size, %d, exceeds maximum limit\n", nb_pkts);
> +        return 0;
> +    }
> +
> +    for (i = 0; i < nb_pkts; i++) {
> +        uint64_t *dst_mac_addr =
> +            rte_pktmbuf_mtod_offset(pkts[i], void *, offset);
> +        if (is_mac_addr_match(target_addr, (*dst_mac_addr))) {
> +            pkt_buf[match_count] = pkts[i];
> +            pkt_buf[match_count]->ol_flags |= PKT_TX_VLAN_PKT;
> +            pkt_buf[match_count]->vlan_tci = dst_vlan_id;
> +            rte_mbuf_refcnt_update(pkt_buf[match_count], 1);
> +            match_count ++;
> +        }
> +    }
> +
> +    dst_qidx = (data->n_dst_queue > qidx) ? qidx : (data->n_dst_queue -1);
> +
> +    rte_spinlock_lock(&data->locks[dst_qidx]);
> +    pkt_trans = rte_eth_tx_burst(dst_port_id, dst_qidx, pkt_buf, match_count);
> +    rte_spinlock_unlock(&data->locks[dst_qidx]);
> +
> +    for (i = 0; i < match_count; i++) {
> +        pkt_buf[i]->ol_flags &= ~PKT_TX_VLAN_PKT;
> +    }
> +
> +    while (unlikely (pkt_trans < match_count)) {
> +        rte_pktmbuf_free(pkt_buf[pkt_trans]);
> +        pkt_trans++;
> +    }
> +
> +    return nb_pkts;
> +}
> +
> +static inline uint16_t
> +netdev_port_mirror_offload_cb(uint16_t qidx, struct rte_mbuf **pkts,
> +    uint16_t nb_pkts, void *user_params)
> +{
> +    struct mirror_param *data = user_params;
> +    uint16_t i, dst_qidx;
> +    uint16_t pkt_trans;
> +    uint16_t dst_port_id = data->dst_port_id;
> +    uint16_t dst_vlan_id = data->dst_vlan_id;
> +
> +    if (nb_pkts == 0) {
> +        return 0;
> +    }
> +
> +    for (i = 0; i < nb_pkts; i++) {
> +        pkts[i]->ol_flags |= PKT_TX_VLAN_PKT;
> +        pkts[i]->vlan_tci = dst_vlan_id;
> +        rte_mbuf_refcnt_update(pkts[i], 1);
> +    }
> +
> +    dst_qidx = (data->n_dst_queue > qidx) ? qidx : (data->n_dst_queue -1);
> +
> +    rte_spinlock_lock(&data->locks[dst_qidx]);
> +    pkt_trans = rte_eth_tx_burst(dst_port_id, dst_qidx, pkts, nb_pkts);
> +    rte_spinlock_unlock(&data->locks[dst_qidx]);
> +
> +    for (i = 0; i < nb_pkts; i++) {
> +        pkts[i]->ol_flags &= ~PKT_TX_VLAN_PKT;
> +    }
> +
> +    while (unlikely (pkt_trans < nb_pkts)) {
> +        rte_pktmbuf_free(pkts[pkt_trans]);
> +        pkt_trans++;
> +    }
> +
> +    return nb_pkts;
> +}
> +
> +static inline uint16_t
> +netdev_rx_custom_mirror_offload_cb(uint16_t port_id OVS_UNUSED,
> +    uint16_t qidx, struct rte_mbuf **pkts, uint16_t nb_pkts,
> +    uint16_t maxi_pkts OVS_UNUSED, void *user_params)
> +{
> +    return netdev_custom_mirror_offload_cb(qidx, pkts, nb_pkts, user_params);
> +}
> +
> +static inline uint16_t
> +netdev_tx_custom_mirror_offload_cb(uint16_t port_id OVS_UNUSED,
> +    uint16_t qidx, struct rte_mbuf **pkts, uint16_t nb_pkts,
> +    void *user_params)
> +{
> +    return netdev_custom_mirror_offload_cb(qidx, pkts, nb_pkts, user_params);
> +}
> +
> +static inline uint16_t
> +netdev_rx_flow_mirror_offload_cb(uint16_t port_id OVS_UNUSED,
> +    uint16_t qidx, struct rte_mbuf **pkts, uint16_t nb_pkts,
> +    uint16_t maxi_pkts OVS_UNUSED, void *user_params)
> +{
> +    return netdev_flow_mirror_offload_cb(qidx, pkts, nb_pkts, user_params, 0);
> +}
> +
> +static inline uint16_t
> +netdev_tx_flow_mirror_offload_cb(uint16_t port_id OVS_UNUSED,
> +    uint16_t qidx, struct rte_mbuf **pkts, uint16_t nb_pkts,
> +    void *user_params)
> +{
> +    return netdev_flow_mirror_offload_cb(qidx, pkts, nb_pkts, user_params, 6);
> +}
> +
> +static inline uint16_t
> +netdev_rx_port_mirror_offload_cb(uint16_t port_id OVS_UNUSED,
> +    uint16_t qidx, struct rte_mbuf **pkts, uint16_t nb_pkts,
> +    uint16_t max_pkts OVS_UNUSED, void *user_params)
> +{
> +    return netdev_port_mirror_offload_cb(qidx, pkts, nb_pkts, user_params);
> +}
> +
> +static inline uint16_t
> +netdev_tx_port_mirror_offload_cb(uint16_t port_id OVS_UNUSED,
> +    uint16_t qidx, struct rte_mbuf **pkts, uint16_t nb_pkts,
> +    void *user_params)
> +{
> +    return netdev_port_mirror_offload_cb(qidx, pkts, nb_pkts, user_params);
> +}
> +
> +static rte_rx_callback_fn
> +netdev_mirror_rx_cb(rte_mirror_type mirror_type)
> +{
> +    switch (mirror_type) {
> +    case mirror_port:
> +        return netdev_rx_port_mirror_offload_cb;
> +    case mirror_flow_mac:
> +        return netdev_rx_flow_mirror_offload_cb;
> +    case mirror_flow_custom:
> +        return netdev_rx_custom_mirror_offload_cb;
> +    case mirror_invalid:
> +        return NULL;
> +    }
> +    VLOG_ERR("Un-supported mirror type\n");
> +    return NULL;
> +}
> +
> +static rte_tx_callback_fn
> +netdev_mirror_tx_cb(rte_mirror_type mirror_type)
> +{
> +    switch (mirror_type) {
> +    case mirror_port:
> +        return netdev_tx_port_mirror_offload_cb;
> +    case mirror_flow_mac:
> +        return netdev_tx_flow_mirror_offload_cb;
> +        break;
> +    case mirror_flow_custom:
> +        return netdev_tx_custom_mirror_offload_cb;
> +    case mirror_invalid:
> +        return NULL;
> +    }
> +    VLOG_ERR("Un-supported mirror type\n");
> +    return NULL;
> +}
> +
> +void
> +netdev_mirror_cb_set(struct mirror_param *data, uint16_t port_id,
> +    int pmd_cb, int tx)
> +{
> +    unsigned int qid;
> +
> +    data->pkt_buf = NULL;
> +    if (data->extra_data_size) {
> +        data->pkt_buf = xmalloc(sizeof(mirror_fn_cb)*data->max_burst_size *
> +            data->n_src_queue);
> +    }
> +
> +    data->mirror_cb = xmalloc(sizeof(struct rte_eth_rxtx_callback *)
> +        * data->n_src_queue);
> +    for (qid = 0; qid < data->n_src_queue; qid++) {
> +        if (pmd_cb) {
> +            if (tx) {
> +                data->mirror_cb[qid].pmd = rte_eth_add_tx_callback(port_id,
> +                    qid, netdev_mirror_tx_cb(data->mirror_type), data);
> +            } else {
> +                data->mirror_cb[qid].pmd = rte_eth_add_rx_callback(port_id,
> +                    qid, netdev_mirror_rx_cb(data->mirror_type), data);
> +            }
> +        } else {
> +            struct rte_eth_rxtx_callback *rxtx_cb =
> +                xmalloc(sizeof(struct rte_eth_rxtx_callback));
> +
> +            data->mirror_cb[qid].direct = rxtx_cb;
> +            rxtx_cb->next = NULL;
> +            rxtx_cb->param = data;
> +
> +            if (tx) {
> +                rxtx_cb->fn.tx = netdev_mirror_tx_cb(data->mirror_type);
> +            } else {
> +                rxtx_cb->fn.rx = netdev_mirror_rx_cb(data->mirror_type);
> +            }
> +        }
> +    }
> +}
> +
> +/* port/flow mirroring device (port) register/un-registe routines */
> +int
> +netdev_eth_register_mirror(uint16_t src_port, struct mirror_param *param,
> +    int tx_cb)
> +{
> +    struct mirror_offload_port *port_info = NULL;
> +    struct mirror_param *data;
> +
> +    netdev_mirror_data_proc(src_port, mirror_data_add, tx_cb, param,
> +        &port_info);
> +    if (!port_info) {
> +        return -1;
> +    }
> +
> +    data = tx_cb ? &port_info->tx : &port_info->rx;
> +    netdev_mirror_cb_set(data, src_port, 1, tx_cb);
> +
> +    return 0;
> +}
> +
> +int
> +netdev_eth_unregister_mirror(uint16_t src_port, int tx_cb)
> +{
> +    /* release both cb and pkt_buf */
> +    unsigned int i;
> +    struct mirror_offload_port *port_info = NULL;
> +    struct mirror_param *data;
> +
> +    netdev_mirror_data_proc(src_port, mirror_data_find, tx_cb, NULL,
> +        &port_info);
> +    if (port_info == NULL) {
> +        VLOG_ERR("Source port %d is not on outstanding port mirror db\n",
> +            src_port);
> +        return -1;
> +    }
> +    data = tx_cb ? &port_info->tx : &port_info->rx;
> +
> +    for (i = 0; i < data->n_src_queue; i++) {
> +        if (data->mirror_cb[i].pmd) {
> +            if (tx_cb) {
> +                rte_eth_remove_tx_callback(src_port, i,
> +                    data->mirror_cb[i].pmd);
> +            } else {
> +                rte_eth_remove_rx_callback(src_port, i,
> +                    data->mirror_cb[i].pmd);
> +            }
> +        }
> +        data->mirror_cb[i].pmd = NULL;
> +    }
> +    free(data->mirror_cb);
> +
> +    if (data->pkt_buf) {
> +        free(data->pkt_buf);
> +        data->pkt_buf = NULL;
> +    }
> +
> +    if (data->extra_data) {
> +        free(data->extra_data);
> +        data->extra_data = NULL;
> +        data->extra_data_size = 0;
> +    }
> +
> +    netdev_mirror_data_proc(src_port, mirror_data_rem, tx_cb, NULL, NULL);
> +    return 0;
> +}
> diff --git a/lib/netdev-dpdk-mirror.h b/lib/netdev-dpdk-mirror.h
> new file mode 100644
> index 000000000..ee4b933ba
> --- /dev/null
> +++ b/lib/netdev-dpdk-mirror.h
> @@ -0,0 +1,83 @@
> +/*
> + * Copyright (c) 2014, 2015, 2016 Nicira, Inc.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + *     http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +
> +#ifndef NETDEV_DPDK_MIRROR_H
> +#define NETDEV_DPDK_MIRROR_H
> +
> +#include "openvswitch/types.h"
> +
> +#ifdef  __cplusplus
> +extern "C" {
> +#endif
> +
> +typedef enum {
> +    mirror_data_find, /* find the mirror-data allocated */
> +    mirror_data_add, /* add a new mirror_param data int DB */
> +    mirror_data_rem, /* remove a mirror_param from the DB */
> +} mirror_data_op;
> +
> +typedef int (*rte_mirror_scan_fn)(struct rte_mbuf *pkt, void *user_param);
> +typedef enum {
> +    mirror_port, /* port mirror */
> +    mirror_flow_mac, /* flow mirror according to source mac */
> +    mirror_flow_custom,  /* flow mirror according to a callback scn */
> +    mirror_invalid,      /* invalid mirror_type */
> +} rte_mirror_type;
> +
> +typedef union {
> +    const struct rte_eth_rxtx_callback *pmd;
> +    struct rte_eth_rxtx_callback *direct;
> +} mirror_fn_cb;
> +
> +struct mirror_param {
> +    uint16_t dst_port_id;
> +    uint16_t dst_vlan_id;
> +    rte_spinlock_t *locks;
> +    int n_src_queue;
> +    int n_dst_queue;
> +    struct rte_mbuf **pkt_buf;
> +    mirror_fn_cb *mirror_cb;
> +    unsigned int max_burst_size;
> +    rte_mirror_scan_fn custom_scan;
> +    rte_mirror_type mirror_type;
> +    unsigned int extra_data_size;
> +    void *extra_data; /* extra mirror parameter */
> +};
> +
> +struct mirror_offload_port {
> +    uint32_t dev_id;
> +    struct mirror_param rx;
> +    struct mirror_param tx;
> +};
> +
> +bool netdev_port_started(uint16_t port_id, uint32_t *num_tx_queue);
> +int netdev_get_portid_from_addr(const char *pci_addr_str, uint16_t *port_id);
> +int netdev_tunnel_port_setup(uint16_t portid, uint32_t *num_queue);
> +
> +void netdev_mirror_data_proc(uint32_t dev_id, mirror_data_op op,
> +    int tx, struct mirror_param *in_param,
> +    struct mirror_offload_port **out_param);
> +void netdev_mirror_cb_set(struct mirror_param *data, uint16_t port_id,
> +    int pmd, int tx);
> +int netdev_eth_register_mirror(uint16_t src_port,
> +    struct mirror_param *param, int tx_cb);
> +int netdev_eth_unregister_mirror(uint16_t src_port, int tx_cb);
> +
> +#ifdef  __cplusplus
> +}
> +#endif
> +
> +#endif /* netdev-dpdk-mirror.h */
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 9d8096668..eb6644333 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -48,6 +48,7 @@
>  #include "fatal-signal.h"
>  #include "if-notifier.h"
>  #include "netdev-provider.h"
> +#include "netdev-dpdk-mirror.h"
>  #include "netdev-vport.h"
>  #include "odp-util.h"
>  #include "openvswitch/dynamic-string.h"
> @@ -171,6 +172,16 @@ static const struct rte_eth_conf port_conf = {
>      },
>  };
>  
> +struct mirror_tunnel_port_info {
> +    uint16_t port_id;
> +    rte_spinlock_t *locks;
> +    uint32_t share_count;
> +    uint32_t num_queue;
> +    bool port_started;
> +    struct mirror_tunnel_port_info *next;
> +};
> +static struct mirror_tunnel_port_info *mirror_tunnel_head = NULL;
> +
>  /*
>   * These callbacks allow virtio-net devices to be added to vhost ports when
>   * configuration has been fully completed.
> @@ -443,6 +454,8 @@ struct netdev_dpdk {
>          };
>          struct dpdk_tx_queue *tx_q;
>          struct rte_eth_link link;
> +        mirror_fn_cb *rx_cb; /* shared pointer */
> +        mirror_fn_cb *tx_cb;
>      );
>  
>      PADDED_MEMBERS_CACHELINE_MARKER(CACHE_LINE_SIZE, cacheline1,
> @@ -2417,6 +2430,13 @@ netdev_dpdk_vhost_rxq_recv(struct netdev_rxq *rxq,
>      nb_rx = rte_vhost_dequeue_burst(vid, qid, dev->dpdk_mp->mp,
>                                      (struct rte_mbuf **) batch->packets,
>                                      NETDEV_MAX_BURST);
> +
> +    if (dev->rx_cb && dev->rx_cb[qid].direct->fn.rx) {
> +        dev->rx_cb[qid].direct->fn.rx((uint16_t) vid, qid,
> +        (struct rte_mbuf **) batch->packets, nb_rx,
> +        NETDEV_MAX_BURST, dev->rx_cb[qid].direct->param);
> +    }
> +
>      if (!nb_rx) {
>          return EAGAIN;
>      }
> @@ -2634,6 +2654,10 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
>          int vhost_qid = qid * VIRTIO_QNUM + VIRTIO_RXQ;
>          unsigned int tx_pkts;
>  
> +        if (dev->tx_cb && dev->tx_cb[qid].direct->fn.tx) {
> +            dev->tx_cb[qid].direct->fn.tx((uint16_t) vid, qid, cur_pkts, cnt,
> +                dev->tx_cb[qid].direct->param);
> +        }
>          tx_pkts = rte_vhost_enqueue_burst(vid, vhost_qid, cur_pkts, cnt);
>          if (OVS_LIKELY(tx_pkts)) {
>              /* Packets have been sent.*/
> @@ -5291,6 +5315,376 @@ netdev_dpdk_rte_flow_query_count(struct netdev *netdev,
>      return ret;
>  }
>  
> +/*
> + * mirror tunnel device management routines
> + * mirror tunnel devices are devices reserved solely for
> + * traffic mirroring
> + */
> +static void
> +netdev_dpdk_update_mt_list(struct mirror_tunnel_port_info *mt_port_info,
> +                           bool add_port)
> +{
> +    struct mirror_tunnel_port_info *ptr = mirror_tunnel_head;
> +
> +    if (add_port) {
> +        if (!ptr) {
> +            mirror_tunnel_head = mt_port_info;
> +            return;
> +        }
> +        while (ptr->next) {
> +            ptr = ptr->next;
> +        }
> +        ptr->next = mt_port_info;
> +    } else {
> +        while (ptr->next &&
> +            ptr->next->port_id != mt_port_info->port_id) {
> +            ptr = ptr->next;
> +        }
> +
> +        if (ptr->next) {
> +            ptr->next = ptr->next->next;
> +            free(mt_port_info);
> +        } else {
> +            if (ptr->port_id == mt_port_info->port_id) {
> +                mirror_tunnel_head = NULL;
> +                free(mt_port_info);
> +            } else {
> +                VLOG_ERR("Fail to find %s mirror port (%d) info\n",
> +                 add_port?"add":"remove", mt_port_info->port_id);
> +            }
> +        }
> +    }
> +}
> +
> +static struct mirror_tunnel_port_info*
> +netdev_dpdk_get_mt_port_info(uint16_t port_id)
> +{
> +    struct mirror_tunnel_port_info *mt_port_info;
> +
> +    if (mirror_tunnel_head) {
> +        mt_port_info = mirror_tunnel_head;
> +        while (mt_port_info) {
> +            if (mt_port_info->port_id == port_id) {
> +                return mt_port_info;
> +            }
> +            mt_port_info = mt_port_info->next;
> +        }
> +        VLOG_ERR("Could not tunnel port with port-id %d\n",
> +            port_id);
> +    }
> +
> +    mt_port_info = xmalloc(sizeof(struct mirror_tunnel_port_info));
> +    memset(mt_port_info, 0, sizeof(*mt_port_info));
> +    mt_port_info->port_id = port_id;
> +    mt_port_info->next = NULL;
> +
> +    return mt_port_info;
> +}
> +
> +static int
> +netdev_dpdk_addr_to_portid(const char *pci_addr_str, uint16_t *port_id)
> +{
> +    struct rte_pci_device *pci_dev;
> +    struct rte_pci_addr pci_addr;
> +    int i;
> +
> +    if (rte_pci_addr_parse(pci_addr_str, &pci_addr)) {
> +        VLOG_ERR("Incorrect pci address %s\n", pci_addr_str);
> +        return -1;
> +    }
> +
> +    for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
> +        struct rte_pci_addr *eth_pci_addr;
> +
> +        if (!rte_eth_devices[i].device) {
> +            continue;
> +        }
> +
> +        pci_dev = RTE_ETH_DEV_TO_PCI(&rte_eth_devices[i]);
> +        if (!pci_dev) {
> +            continue;
> +        }
> +
> +        eth_pci_addr = &pci_dev->addr;
> +
> +        if (pci_addr.bus == eth_pci_addr->bus &&
> +            pci_addr.devid == eth_pci_addr->devid &&
> +            pci_addr.domain == eth_pci_addr->domain &&
> +            pci_addr.function == eth_pci_addr->function) {
> +            *port_id = i;
> +
> +            return 0;
> +        }
> +    }
> +
> +    return -1;
> +}
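
The lookup above parses a BDF string and then compares domain, bus, device and function field by field. A minimal standalone sketch of that idea, assuming only the full "domain:bus:device.function" form (rte_pci_addr_parse() also accepts the short form); struct demo_bdf and both helpers are hypothetical names for illustration, not DPDK API:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for struct rte_pci_addr. */
struct demo_bdf {
    uint32_t domain;
    uint8_t bus;
    uint8_t devid;
    uint8_t function;
};

/* Parses "dddd:bb:dd.f" (hex fields).  Returns 0 on success, -1 on error. */
static int
demo_bdf_parse(const char *s, struct demo_bdf *out)
{
    unsigned int domain, bus, devid, function;
    char trailing;

    /* Exactly four fields must convert; a fifth means trailing garbage. */
    if (sscanf(s, "%x:%x:%x.%x%c",
               &domain, &bus, &devid, &function, &trailing) != 4) {
        return -1;
    }
    /* PCI limits: 16-bit domain, 8-bit bus, 5-bit device, 3-bit function. */
    if (domain > 0xffff || bus > 0xff || devid > 0x1f || function > 0x7) {
        return -1;
    }

    out->domain = domain;
    out->bus = (uint8_t) bus;
    out->devid = (uint8_t) devid;
    out->function = (uint8_t) function;
    return 0;
}

/* Field-by-field comparison, as done in the port scan loop. */
static int
demo_bdf_equal(const struct demo_bdf *a, const struct demo_bdf *b)
{
    return a->domain == b->domain && a->bus == b->bus
           && a->devid == b->devid && a->function == b->function;
}
```

Comparing parsed fields rather than strings means "0000:3b:00.1" and "0000:3B:00.1" resolve to the same device.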
> +
> +static int
> +netdev_dpdk_mt_open(uint16_t port_id, struct mirror_param *param)
> +{
> +    struct rte_eth_dev_info dev_info;
> +    struct rte_eth_txconf txq_conf;
> +    struct rte_eth_rxconf rxq_conf;
> +    struct rte_mempool *pktbuf;
> +
> +    struct mirror_tunnel_port_info *mt_info;
> +
> +    uint16_t nb_rxd = NIC_PORT_DEFAULT_RXQ_SIZE;
> +    uint16_t nb_txd = NIC_PORT_DEFAULT_TXQ_SIZE;
> +    unsigned int i, num_queue;
> +
> +    struct rte_eth_conf mt_port_conf = {
> +        .rxmode = {
> +            .split_hdr_size = 0,
> +        },
> +        .txmode = {
> +            .mq_mode = ETH_MQ_TX_NONE,
> +        },
> +    };
> +
> +    mt_info = netdev_dpdk_get_mt_port_info(port_id);
> +    if (!mt_info) {
> +        return -1;
> +    }
> +
> +    if (mt_info->port_started) {
> +        param->n_dst_queue = mt_info->num_queue;
> +        param->dst_port_id = port_id;
> +        param->locks = mt_info->locks;
> +        mt_info->share_count++;
> +
> +        return 0;
> +    }
> +
> +    rte_eth_dev_info_get(port_id, &dev_info);
> +    num_queue = param->n_src_queue;
> +
> +    /* A tunnel device doesn't require mbuf. It's used as
> +     * hardware channel, transmit packets with
> +     * mbuf provided by source. Need this mbuf creation
> +     * to finish port initialization
> +     */
> +    pktbuf = rte_pktmbuf_pool_create(
> +            "tunnel-port",
> +            (dev_info.rx_desc_lim.nb_max + dev_info.tx_desc_lim.nb_max),
> +            RTE_MEMPOOL_CACHE_MAX_SIZE, 0, RTE_MBUF_DEFAULT_BUF_SIZE,
> +            rte_eth_dev_socket_id(port_id));
> +
> +    mt_port_conf.txmode.offloads |= DEV_TX_OFFLOAD_VLAN_INSERT;
> +    if (dev_info.tx_offload_capa & DEV_TX_OFFLOAD_MBUF_FAST_FREE) {
> +        mt_port_conf.txmode.offloads |= DEV_TX_OFFLOAD_MBUF_FAST_FREE;
> +    }
> +    rte_eth_dev_configure(port_id, 1, num_queue, &mt_port_conf);
> +
> +    /* init one Rx queue */
> +    rxq_conf = dev_info.default_rxconf;
> +    rxq_conf.offloads = mt_port_conf.rxmode.offloads;
> +    if (rte_eth_rx_queue_setup(port_id, 0, nb_rxd,
> +        rte_eth_dev_socket_id(port_id), &rxq_conf, pktbuf) < 0) {
> +        VLOG_ERR("fail to setup tunnel port (%d) rx-queue\n", port_id);
> +        return -1;
> +    }
> +
> +    /* init # of Tx queue as part of mirror-tunnel setup */
> +    txq_conf = dev_info.default_txconf;
> +    txq_conf.offloads |= mt_port_conf.txmode.offloads;
> +    for (i = 0; i < num_queue; i++) {
> +        if (rte_eth_tx_queue_setup(port_id,
> +            i, nb_txd,
> +            rte_eth_dev_socket_id(port_id),
> +            &txq_conf) < 0) {
> +            VLOG_ERR("fail to setup tunnel port (%d) tx queue #%u\n",
> +                port_id, i);
> +            return -1;
> +        }
> +    }
> +
> +    if (rte_eth_dev_start(port_id) < 0) {
> +        VLOG_ERR("fail to start tunnel port %d\n", port_id);
> +        return -1;
> +    }
> +
> +    /* xmalloc() aborts on failure, so no NULL check is needed.  Iterate
> +     * over 'num_queue': 'mt_info->num_queue' is not assigned until below. */
> +    mt_info->locks = xmalloc(num_queue * sizeof(rte_spinlock_t));
> +    for (i = 0; i < num_queue; i++) {
> +        rte_spinlock_init(&mt_info->locks[i]);
> +    }
> +    mt_info->share_count = 1;
> +    mt_info->port_started = true;
> +    mt_info->num_queue = num_queue;
> +
> +    param->n_dst_queue = mt_info->num_queue;
> +    param->dst_port_id = port_id;
> +    param->locks = mt_info->locks;
> +
> +    netdev_dpdk_update_mt_list(mt_info, true);
> +    return 0;
> +}
> +
> +static void
> +netdev_dpdk_mt_close(uint16_t mirror_port_id)
> +{
> +    struct mirror_tunnel_port_info *mt_port_info =
> +        netdev_dpdk_get_mt_port_info(mirror_port_id);
> +
> +    if (mt_port_info) {
> +        mt_port_info->share_count--;
> +        if (!mt_port_info->share_count) {
> +            netdev_dpdk_update_mt_list(mt_port_info, false);
> +            rte_eth_dev_stop(mirror_port_id);
> +            rte_eth_dev_close(mirror_port_id);
> +        }
> +    }
> +}
> +
> +/* vhost device mirror registration and un-registration routines */
> +static int
> +netdev_vhost_register_mirror(struct netdev_dpdk *dev,
> +    struct mirror_param *param, int tx_cb)
> +{
> +    uint32_t vid = netdev_dpdk_get_vid(dev);
> +    struct mirror_offload_port *port_info = NULL;
> +    struct mirror_param *data;
> +
> +    netdev_mirror_data_proc(vid, mirror_data_add, tx_cb, param, &port_info);
> +    if (!port_info) {
> +        return -1;
> +    }
> +
> +    data = tx_cb ? &port_info->tx : &port_info->rx;
> +    netdev_mirror_cb_set(data, (uint16_t) vid, 0, tx_cb);
> +
> +    if (tx_cb) {
> +        dev->tx_cb = data->mirror_cb;
> +    } else {
> +        dev->rx_cb = data->mirror_cb;
> +    }
> +
> +    return 0;
> +}
> +
> +static int
> +netdev_vhost_unregister_mirror(struct netdev_dpdk *dev, int tx_cb)
> +{
> +    /* release both cb and pkt_buf */
> +    unsigned int i;
> +    uint32_t vid = netdev_dpdk_get_vid(dev);
> +    struct mirror_offload_port *port_info = NULL;
> +    struct mirror_param *data;
> +
> +    netdev_mirror_data_proc(vid, mirror_data_find, tx_cb, NULL, &port_info);
> +    if (port_info == NULL) {
> +        VLOG_ERR("Source port %d is not on outstanding port mirror db\n", vid);
> +        return -1;
> +    }
> +    data = tx_cb ? &port_info->tx : &port_info->rx;
> +
> +    if (tx_cb) {
> +        dev->tx_cb = NULL;
> +    } else {
> +        dev->rx_cb = NULL;
> +    }
> +
> +    for (i = 0; i < data->n_src_queue; i++) {
> +        free(data->mirror_cb[i].direct);
> +    }
> +
> +    free(data->mirror_cb);
> +
> +    if (data->pkt_buf) {
> +        free(data->pkt_buf);
> +        data->pkt_buf = NULL;
> +    }
> +
> +    if (data->extra_data) {
> +        free(data->extra_data);
> +        data->extra_data = NULL;
> +        data->extra_data_size = 0;
> +    }
> +
> +    netdev_mirror_data_proc(vid,  mirror_data_rem, tx_cb, NULL, NULL);
> +    return 0;
> +}
> +
> +static int
> +netdev_dpdk_mirror_offload(struct netdev *src, struct eth_addr *flow_addr,
> +                           uint16_t vlan_id, char *mirror_tunnel_addr,
> +                           bool add_mirror, bool tx_cb) {
> +    struct netdev_dpdk *src_dev = netdev_dpdk_cast(src);
> +    bool eth_dev = src_dev->type == DPDK_DEV_ETH;
> +    uint16_t mirror_port_id;
> +    int status = 0;
> +
> +    if (netdev_dpdk_addr_to_portid(mirror_tunnel_addr, &mirror_port_id)) {
> +        VLOG_ERR("Could not find tunnel port with BDF addr %s\n",
> +            mirror_tunnel_addr);
> +        return -1;
> +    }
> +    if (add_mirror) {
> +        uint32_t i;
> +        struct mirror_param data;
> +        uint64_t mac_addr = 0;
> +
> +        memset(&data, 0, sizeof(struct mirror_param));
> +        data.extra_data_size = 0;
> +        data.extra_data = NULL;
> +        data.mirror_type = mirror_port;
> +        for (i = 0; i < 6; i++) {
> +            mac_addr <<= 8;
> +            mac_addr |= flow_addr->ea[6 - i - 1];
> +        }
> +        if (mac_addr) {
> +            data.mirror_type = mirror_flow_mac;
> +            data.extra_data_size = sizeof(uint64_t);
> +            data.extra_data = xmalloc(sizeof(uint64_t));
> +            memcpy(data.extra_data, &mac_addr, sizeof(uint64_t));
> +        }
> +        data.dst_vlan_id = vlan_id;
> +        data.n_src_queue = tx_cb?src->n_txq:src->n_rxq;
> +        data.max_burst_size = NETDEV_MAX_BURST;
> +
> +        if (netdev_dpdk_mt_open(mirror_port_id, &data)) {
> +            VLOG_ERR("Fail to initialize mirror tunnel port %d\n",
> +                mirror_port_id);
> +            free(data.extra_data);
> +            return -1;
> +        }
> +
> +        VLOG_INFO("register %s device with %s mirror-offload with"
> +            "src-port:%d (%s) and output-port:%d (%s) vlan-id=%d flow-mac="
> +            "0x%" PRIx64 "\n",
> +            eth_dev?"ethdev":"vhost",
> +            tx_cb?"ingress":"egress", src_dev->port_id,
> +            src->name, mirror_port_id, mirror_tunnel_addr, vlan_id,
> +            (uint64_t)__builtin_bswap64(mac_addr));
> +
> +        if (eth_dev) {
> +            status = netdev_eth_register_mirror(src_dev->port_id, &data,
> +                tx_cb);
> +        } else {
> +            status = netdev_vhost_register_mirror(src_dev, &data, tx_cb);
> +        }
> +    } else {
> +        VLOG_INFO("unregister %s device with %s mirror-offload with"
> +            " src-port:%d(%s)\n",
> +            eth_dev?"ethdev":"vhost",
> +            tx_cb?"ingress":"egress", src_dev->port_id,
> +            src->name);
> +
> +        if (eth_dev) {
> +            status = netdev_eth_unregister_mirror(src_dev->port_id, tx_cb);
> +        } else {
> +            status = netdev_vhost_unregister_mirror(src_dev, tx_cb);
> +        }
> +
> +        netdev_dpdk_mt_close(mirror_port_id);
> +    }
> +
> +    return status;
> +}
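
The add path above chooses between port mirroring and flow mirroring by packing 'flow_addr' into a 64-bit value and testing it for zero. A small standalone sketch of that packing, mirroring the loop in the patch; demo_mac_to_u64() and DEMO_ETH_ALEN are hypothetical names used only for illustration:

```c
#include <stdint.h>

#define DEMO_ETH_ALEN 6

/* Packs a MAC address so that ea[0] lands in the least significant byte and
 * ea[5] in bits 40..47, which is what the loop in the patch produces. */
static uint64_t
demo_mac_to_u64(const uint8_t ea[DEMO_ETH_ALEN])
{
    uint64_t mac = 0;

    for (int i = DEMO_ETH_ALEN - 1; i >= 0; i--) {
        mac = (mac << 8) | ea[i];
    }
    return mac;
}

/* An all-zero MAC yields 0, which the patch treats as "mirror the whole
 * port"; any other value selects MAC-based flow mirroring. */
static int
demo_is_flow_mirror(const uint8_t ea[DEMO_ETH_ALEN])
{
    return demo_mac_to_u64(ea) != 0;
}
```

One consequence worth noting: with this encoding there is no way to request flow mirroring for the address 00:00:00:00:00:00 itself.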
> +
>  #define NETDEV_DPDK_CLASS_COMMON                            \
>      .is_pmd = true,                                         \
>      .alloc = netdev_dpdk_alloc,                             \
> @@ -5340,6 +5734,7 @@ static const struct netdev_class dpdk_class = {
>      .construct = netdev_dpdk_construct,
>      .set_config = netdev_dpdk_set_config,
>      .send = netdev_dpdk_eth_send,
> +    .mirror_offload = netdev_dpdk_mirror_offload,
>  };
>  
>  static const struct netdev_class dpdk_vhost_class = {
> @@ -5355,6 +5750,7 @@ static const struct netdev_class dpdk_vhost_class = {
>      .reconfigure = netdev_dpdk_vhost_reconfigure,
>      .rxq_recv = netdev_dpdk_vhost_rxq_recv,
>      .rxq_enabled = netdev_dpdk_vhost_rxq_enabled,
> +    .mirror_offload = netdev_dpdk_mirror_offload,
>  };
>  
>  static const struct netdev_class dpdk_vhost_client_class = {
> @@ -5371,6 +5767,7 @@ static const struct netdev_class dpdk_vhost_client_class = {
>      .reconfigure = netdev_dpdk_vhost_client_reconfigure,
>      .rxq_recv = netdev_dpdk_vhost_rxq_recv,
>      .rxq_enabled = netdev_dpdk_vhost_rxq_enabled,
> +    .mirror_offload = netdev_dpdk_mirror_offload,
>  };
>  
>  void
> diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h
> index 73dce2fca..dab278dcd 100644
> --- a/lib/netdev-provider.h
> +++ b/lib/netdev-provider.h
> @@ -834,6 +834,22 @@ struct netdev_class {
>      /* Get a block_id from the netdev.
>       * Returns the block_id or 0 if none exists for netdev. */
>      uint32_t (*get_block_id)(struct netdev *);
> +
> +    /* Configures a mirror offload setting on a netdev.
> +     * 'src': netdev whose traffic is to be mirrored.
> +     * 'flow_addr': MAC address matched against the header of the source
> +     *  traffic to select the packets to mirror.
> +     * 'vlan_id': VLAN tag to be added to the mirrored packets.
> +     * 'mt_pci_addr': PCIe address (BDF string) of the mirror tunnel device.
> +     * 'add_mirror': true: configure the mirror; false: remove the mirror.
> +     * 'ingress': true: mirror 'src' netdev Rx traffic; false: mirror
> +     *  'src' netdev Tx traffic.
> +     */
> +    int (*mirror_offload)(struct netdev *src, struct eth_addr *flow_addr,
> +                          uint16_t vlan_id, char *mt_pci_addr,
> +                          bool add_mirror, bool ingress);
> +
>  };
>  
>  int netdev_register_provider(const struct netdev_class *);
> diff --git a/lib/netdev.c b/lib/netdev.c
> index 91e91955c..464c2f8fe 100644
> --- a/lib/netdev.c
> +++ b/lib/netdev.c
> @@ -69,6 +69,8 @@ COVERAGE_DEFINE(netdev_get_stats);
>  COVERAGE_DEFINE(netdev_send_prepare_drops);
>  COVERAGE_DEFINE(netdev_push_header_drops);
>  
> +#define MIRROR_DB_INIT_SIZE 8
> +
>  struct netdev_saved_flags {
>      struct netdev *netdev;
>      struct ovs_list node;           /* In struct netdev's saved_flags_list. */
> @@ -2297,3 +2299,387 @@ netdev_free_custom_stats_counters(struct netdev_custom_stats *custom_stats)
>          }
>      }
>  }
> +
> +
> +struct netdev_mirror_offload_item {
> +    struct mirror_offload_info info;
> +
> +    struct ovs_list node;
> +};
> +
> +struct netdev_mirror_offload {
> +    struct ovs_mutex mutex;
> +    struct ovs_list list;
> +    pthread_cond_t cond;
> +};
> +
> +static struct netdev_mirror_offload netdev_mirror_offload = {
> +    .mutex = OVS_MUTEX_INITIALIZER,
> +    .list  = OVS_LIST_INITIALIZER(&netdev_mirror_offload.list),
> +};
> +
> +static struct ovsthread_once offload_thread_once
> +    = OVSTHREAD_ONCE_INITIALIZER;
> +
> +static void *netdev_mirror_offload_main(void *data);
> +
> +/*
> + * Re-size mirror_db when it's out of space.
> + * Always double the buffer when it's needed
> + */
> +static int
> +netdev_mirror_db_resize(struct netdev_mirror_offload_item ***old_db,
> +    int *old_db_size)
> +{
> +    struct netdev_mirror_offload_item **new_db;
> +    int cur_size = *old_db_size;
> +    int new_size;
> +
> +    if (!cur_size) {
> +        new_size = MIRROR_DB_INIT_SIZE;
> +    } else {
> +        new_size = 2 * cur_size;
> +    }
> +
> +    new_db = xzalloc(sizeof(struct netdev_mirror_offload_item *) * new_size);
> +
> +    if (!new_db) {
> +        VLOG_ERR("Out of memory!!!");
> +        return -1;
> +    }
> +    memset(new_db, 0, sizeof(struct netdev_mirror_offload_item *) * new_size);
> +
> +    if (cur_size) {
> +        int i;
> +
> +        for (i = 0; i < cur_size; i++) {
> +            new_db[i] = (*old_db)[i];
> +        }
> +        free(*old_db);
> +    }
> +
> +    *old_db = new_db;
> +    *old_db_size = new_size;
> +
> +    return 0;
> +}
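
For reference, the doubling policy described above can be written compactly. This is only a sketch using plain calloc()/free() and hypothetical names (demo_db_grow, DEMO_DB_INIT_SIZE); in the patch itself the allocation goes through xzalloc(), which as far as I know aborts on failure and already zeroes the buffer, so the NULL check and the extra memset() there look redundant:

```c
#include <stdlib.h>
#include <string.h>

#define DEMO_DB_INIT_SIZE 8

/* Grows an array of pointers: 0 -> DEMO_DB_INIT_SIZE, otherwise doubles.
 * Existing entries are preserved and new slots are NULL.
 * Returns 0 on success, -1 if the allocation fails (array left untouched). */
static int
demo_db_grow(void ***db, int *size)
{
    int new_size = *size ? 2 * *size : DEMO_DB_INIT_SIZE;
    void **new_db = calloc((size_t) new_size, sizeof *new_db);

    if (!new_db) {
        return -1;
    }
    if (*size) {
        memcpy(new_db, *db, (size_t) *size * sizeof *new_db);
    }
    free(*db);              /* free(NULL) is a no-op for the first call. */

    *db = new_db;
    *size = new_size;
    return 0;
}
```

Doubling keeps the number of reallocations logarithmic in the number of mirrors, at the cost of up to half the slots being unused.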
> +
> +static void
> +netdev_free_mirror_offload(struct netdev_mirror_offload_item *offload)
> +{
> +    if (!offload) {
> +        return;
> +    }
> +
> +    if (offload->info.src) {
> +        free(offload->info.src);
> +    }
> +    if (offload->info.dst) {
> +        free(offload->info.dst);
> +    }
> +    if (offload->info.flow_dst_mac) {
> +        free(offload->info.flow_dst_mac);
> +    }
> +    if (offload->info.flow_src_mac) {
> +        free(offload->info.flow_src_mac);
> +    }
> +    if (offload->info.output_src_tags) {
> +        free(offload->info.output_src_tags);
> +    }
> +    if (offload->info.output_dst_tags) {
> +        free(offload->info.output_dst_tags);
> +    }
> +    if (offload->info.name) {
> +        free(offload->info.name);
> +    }
> +    if (offload->info.mirror_tunnel_addr) {
> +        free(offload->info.mirror_tunnel_addr);
> +    }
> +
> +    free(offload);
> +}
> +
> +static struct
> +netdev_mirror_offload_item *
> +netdev_alloc_mirror_offload(struct mirror_offload_info *info)
> +{
> +    struct netdev_mirror_offload_item *offload;
> +    int i;
> +
> +    offload = xzalloc(sizeof(*offload));
> +    memcpy(&offload->info, info, sizeof(struct mirror_offload_info));
> +
> +    /* ovs_strzcpy() copies at most 'size - 1' bytes, so passing strlen()
> +     * as the size drops the last character.  xstrdup() avoids that. */
> +    if (info->name) {
> +        offload->info.name = xstrdup(info->name);
> +    }
> +
> +    if (info->mirror_tunnel_addr) {
> +        offload->info.mirror_tunnel_addr = xstrdup(info->mirror_tunnel_addr);
> +    }
> +
> +    /* only add_mirror request include valid configuration */
> +    if (info->n_src_port) {
> +        offload->info.src = xzalloc(sizeof(struct netdev *)*info->n_src_port);
> +        offload->info.flow_dst_mac = xzalloc(sizeof(struct eth_addr)*
> +            info->n_src_port);
> +        offload->info.output_src_tags = xzalloc(sizeof(uint16_t)*
> +            info->n_src_port);
> +        if (!offload->info.src || !offload->info.flow_dst_mac ||
> +            !offload->info.output_src_tags) {
> +            VLOG_ERR("Out of memory!!!");
> +            netdev_free_mirror_offload(offload);
> +            return NULL;
> +        }
> +
> +        for (i = 0; i < info->n_src_port; i++) {
> +            offload->info.src[i] = info->src[i];
> +            offload->info.output_src_tags[i] = info->output_src_tags[i];
> +            memcpy(&offload->info.flow_dst_mac[i], &info->flow_dst_mac[i],
> +                sizeof(struct eth_addr));
> +        }
> +    }
> +
> +    if (info->n_dst_port) {
> +        offload->info.dst = xzalloc(sizeof(struct netdev *)*info->n_dst_port);
> +        offload->info.flow_src_mac = xzalloc(sizeof(struct eth_addr)*
> +            info->n_dst_port);
> +        offload->info.output_dst_tags = xzalloc(sizeof(uint16_t)*
> +            info->n_dst_port);
> +        if (!offload->info.dst || !offload->info.flow_src_mac ||
> +            !offload->info.output_dst_tags) {
> +            VLOG_ERR("Out of memory!!!");
> +            netdev_free_mirror_offload(offload);
> +            return NULL;
> +        }
> +
> +        for (i = 0; i < info->n_dst_port; i++) {
> +            offload->info.dst[i] = info->dst[i];
> +            offload->info.output_dst_tags[i] = info->output_dst_tags[i];
> +            memcpy(&offload->info.flow_src_mac[i], &info->flow_src_mac[i],
> +                sizeof(struct eth_addr));
> +        }
> +    }
> +
> +    return offload;
> +}
> +
> +static void
> +netdev_append_mirror_offload(struct netdev_mirror_offload_item *offload)
> +{
> +    ovs_mutex_lock(&netdev_mirror_offload.mutex);
> +    ovs_list_push_back(&netdev_mirror_offload.list, &offload->node);
> +    xpthread_cond_signal(&netdev_mirror_offload.cond);
> +    ovs_mutex_unlock(&netdev_mirror_offload.mutex);
> +}
> +
> +void
> +netdev_mirror_offload_put(struct mirror_offload_info *info)
> +{
> +    struct netdev_mirror_offload_item *offload;
> +    /* only support tunnel port for traffic mirroring */
> +    if (info->add_mirror && !info->mirror_tunnel_addr) {
> +        return;
> +    }
> +
> +    if (ovsthread_once_start(&offload_thread_once)) {
> +        xpthread_cond_init(&netdev_mirror_offload.cond, NULL);
> +        ovs_thread_create("netdev_mirror_offload",
> +                          netdev_mirror_offload_main, NULL);
> +        ovsthread_once_done(&offload_thread_once);
> +    }
> +
> +    offload = netdev_alloc_mirror_offload(info);
> +    netdev_append_mirror_offload(offload);
> +}
> +
> +static int
> +netdev_mirror_offload_configue(struct mirror_offload_info *info,
> +    bool add_mirror)
> +{
> +    int un_support_count = 0;
> +    int ret;
> +
> +    if (info->n_src_port) {
> +        for (int i = 0; i < info->n_src_port; i++) {
> +            const struct netdev_class *class =
> +                info->src[i]->netdev_class;
> +            if (!class) {
> +                return -1;
> +            }
> +            if (class->mirror_offload) {
> +                ret = class->mirror_offload(
> +                    info->src[i],
> +                    &info->flow_dst_mac[i],
> +                    info->output_src_tags[i],
> +                    info->mirror_tunnel_addr,
> +                    add_mirror, false);
> +                if (ret) {
> +                    VLOG_ERR("Fail to %s mirror-offload"
> +                        " configuration %s\n",
> +                        add_mirror ? "add" : "remove",
> +                        info->name);
> +                    return ret;
> +                }
> +            } else {
> +                un_support_count++;
> +            }
> +        }
> +    }
> +
> +    if (info->n_dst_port) {
> +        for (int i = 0; i < info->n_dst_port; i++) {
> +            const struct netdev_class *class =
> +                info->dst[i]->netdev_class;
> +            if (!class) {
> +                return -1;
> +            }
> +            if (class->mirror_offload) {
> +                ret = class->mirror_offload(
> +                    info->dst[i],
> +                    &info->flow_src_mac[i],
> +                    info->output_dst_tags[i],
> +                    info->mirror_tunnel_addr,
> +                    add_mirror, true);
> +                if (ret) {
> +                    VLOG_ERR("Fail to %s mirror-offload"
> +                        " configuration %s\n",
> +                        add_mirror ? "add" : "remove",
> +                        info->name);
> +                    return ret;
> +                }
> +            } else {
> +                un_support_count++;
> +            }
> +        }
> +    }
> +
> +    return un_support_count;
> +}
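
The return convention of the function above is easy to miss: a negative value means a provider callback failed, a positive value is the number of ports whose class has no mirror_offload callback, and 0 means every port was offloaded. A reduced sketch of that dispatch, with hypothetical stand-in types (demo_class, demo_port) instead of struct netdev_class and struct netdev:

```c
#include <stddef.h>

/* Hypothetical stand-in for a provider class with an optional callback. */
struct demo_class {
    int (*mirror_offload)(int port_id, int add);    /* NULL: unsupported. */
};

struct demo_port {
    const struct demo_class *cls;
    int id;
};

static int
demo_offload_ok(int port_id, int add)
{
    (void) port_id;
    (void) add;
    return 0;
}

static int
demo_offload_fail(int port_id, int add)
{
    (void) port_id;
    (void) add;
    return -1;
}

/* Returns the first nonzero callback result, or else the number of ports
 * that lack the callback (0 when all ports were configured). */
static int
demo_configure(const struct demo_port *ports, size_t n, int add)
{
    int unsupported = 0;

    for (size_t i = 0; i < n; i++) {
        const struct demo_class *cls = ports[i].cls;

        if (!cls) {
            return -1;
        }
        if (!cls->mirror_offload) {
            unsupported++;
            continue;
        }

        int ret = cls->mirror_offload(ports[i].id, add);
        if (ret) {
            return ret;
        }
    }
    return unsupported;
}
```

Note that an early return leaves the ports configured before the failing one in place; whether the caller should roll those back is a separate question for the patch.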
> +
> +static void *
> +netdev_mirror_offload_main(void *data OVS_UNUSED)
> +{
> +    struct netdev_mirror_offload_item *offload;
> +    struct mirror_offload_info *info;
> +    struct ovs_list *list;
> +    struct netdev_mirror_offload_item **offload_db = NULL;
> +    int offload_used_count = 0;
> +    int offload_db_size = 0;
> +    int ret, i, ind;
> +
> +    /* continue polling to check if there is an outstanding request */
> +    for (;;) {
> +        ovs_mutex_lock(&netdev_mirror_offload.mutex);
> +        while (ovs_list_is_empty(&netdev_mirror_offload.list)) {
> +            ovsrcu_quiesce_start();
> +            ovs_mutex_cond_wait(&netdev_mirror_offload.cond,
> +                                &netdev_mirror_offload.mutex);
> +            ovsrcu_quiesce_end();
> +        }
> +        list = ovs_list_pop_front(&netdev_mirror_offload.list);
> +        offload = CONTAINER_OF(list, struct netdev_mirror_offload_item,
> +            node);
> +        ovs_mutex_unlock(&netdev_mirror_offload.mutex);
> +
> +        if (!offload_db_size &&
> +            netdev_mirror_db_resize(&offload_db, &offload_db_size)){
> +            return NULL;
> +        }
> +
> +        ind = offload_db_size;
> +        for (i = 0; i < offload_db_size; i++) {
> +            if (offload_db[i] &&
> +                !strncmp(offload_db[i]->info.name, offload->info.name,
> +                strlen(offload->info.name) + 1)) {
> +                ind = i;
> +                break;
> +            }
> +        }
> +
> +        if (!offload->info.add_mirror) {
> +            /* remove mirror offload setup */
> +            if (ind == offload_db_size) {
> +                VLOG_WARN("Mirror offload remove configuration, %s, "
> +                    "not found; clear mirror offload operation"
> +                    " aborted\n", offload->info.name);
> +                netdev_free_mirror_offload(offload);
> +                continue;
> +            }
> +        } else {
> +            /* add mirror offload */
> +            if (ind < offload_db_size) {
> +                netdev_free_mirror_offload(offload);
> +                VLOG_WARN("Attempt adding an existing mirror-offload "
> +                    "configuration; request aborted\n");
> +                continue;
> +            }
> +
> +            if (offload_used_count == offload_db_size &&
> +                netdev_mirror_db_resize(&offload_db, &offload_db_size)) {
> +                return NULL;
> +            }
> +        }
> +
> +        info = offload->info.add_mirror ? &offload->info :
> +            &offload_db[ind]->info;
> +        ret = netdev_mirror_offload_configue(info, offload->info.add_mirror);
> +
> +        if (ret) {
> +            VLOG_ERR("%s mirror configuration failed: %s\n",
> +                offload->info.add_mirror ? "Add" : "Remove",
> +                ret > 0 ? "unsupported source traffic type" :
> +                "device is not ready");
> +            netdev_free_mirror_offload(offload);
> +            continue;
> +        } else {
> +            VLOG_INFO("Succeed %s mirror-offload configuration: %s",
> +                offload->info.add_mirror ? "adding" : "removing",
> +                offload->info.name);
> +        }
> +
> +        if (offload->info.add_mirror) {
> +            for (i = 0; i < offload_db_size; i++) {
> +                if (offload_db[i] == NULL) {
> +                    offload_db[i] = offload;
> +                    offload_used_count++;
> +                    break;
> +                }
> +            }
> +        } else {
> +            /* remove the prior "add" request */
> +            netdev_free_mirror_offload(offload_db[ind]);
> +            offload_db[ind] = NULL;
> +
> +            /* remove the current("remove") request */
> +            netdev_free_mirror_offload(offload);
> +            offload_used_count--;
> +        }
> +
> +        /* free db when the used count drop to 0 */
> +        if (!offload_used_count) {
> +            free(offload_db);
> +            offload_db = NULL;
> +            offload_db_size = 0;
> +        }
> +    }
> +
> +    /* clean up memory */
> +    for (i = 0; i < offload_db_size; i++) {
> +        if (offload_db[i]) {
> +            netdev_free_mirror_offload(offload_db[i]);
> +        }
> +    }
> +    if (offload_db) {
> +        free(offload_db);
> +    }
> +
> +    return NULL;
> +}
> diff --git a/lib/netdev.h b/lib/netdev.h
> index b705a9e56..cce042fc7 100644
> --- a/lib/netdev.h
> +++ b/lib/netdev.h
> @@ -201,6 +201,22 @@ int netdev_send(struct netdev *, int qid, struct dp_packet_batch *,
>                  bool concurrent_txq);
>  void netdev_send_wait(struct netdev *, int qid);
>  
> +/* Hardware-assisted mirror offloading. */
> +struct mirror_offload_info {
> +    struct netdev **src;
> +    struct netdev **dst;
> +    int n_src_port;
> +    int n_dst_port;
> +    struct eth_addr *flow_src_mac;
> +    struct eth_addr *flow_dst_mac;
> +    uint16_t *output_src_tags;
> +    uint16_t *output_dst_tags;
> +    bool add_mirror;
> +    char *mirror_tunnel_addr;
> +    char *name;
> +};
> +void netdev_mirror_offload_put(struct mirror_offload_info *);
> +
>  /* native tunnel APIs */
>  /* Structure to pass parameters required to build a tunnel header. */
>  struct netdev_tnl_build_header_params {
> diff --git a/tests/ovs-vsctl.at b/tests/ovs-vsctl.at
> index dccb11741..ff6e9e625 100644
> --- a/tests/ovs-vsctl.at
> +++ b/tests/ovs-vsctl.at
> @@ -1364,7 +1364,9 @@ _uuid               : <1>
>  name                : eth1
>  _uuid               : <2>
>  name                : mymirror
> +output_dst_vlan     : []
>  output_port         : <1>
> +output_src_vlan     : []
>  output_vlan         : []
>  select_all          : false
>  select_dst_port     : [<0>]
> diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
> index 5ed7e8234..7b7603513 100644
> --- a/vswitchd/bridge.c
> +++ b/vswitchd/bridge.c
> @@ -38,6 +38,7 @@
>  #include "mac-learning.h"
>  #include "mcast-snooping.h"
>  #include "netdev.h"
> +#include "netdev-provider.h"
>  #include "netdev-offload.h"
>  #include "nx-match.h"
>  #include "ofproto/bond.h"
> @@ -330,6 +331,9 @@ static void mirror_destroy(struct mirror *);
>  static bool mirror_configure(struct mirror *);
>  static void mirror_refresh_stats(struct mirror *);
>  
> +static void mirror_offload_destroy(struct mirror *);
> +static bool mirror_offload_configure(struct mirror *);
> +
>  static void iface_configure_lacp(struct iface *,
>                                   struct lacp_member_settings *);
>  static bool iface_create(struct bridge *, const struct ovsrec_interface *,
> @@ -423,6 +427,35 @@ if_notifier_changed(struct if_notifier *notifier OVS_UNUSED)
>      seq_wait(ifaces_changed, last_ifaces_changed);
>      return changed;
>  }
> +
> +static struct port *
> +port_lookup_all(const char *port_name)
> +{
> +    struct bridge *br;
> +    struct port *port = NULL;
> +    int found = 0;
> +
> +    HMAP_FOR_EACH (br, node, &all_bridges) {
> +        struct port *temp_port = NULL;
> +        temp_port = port_lookup(br, port_name);
> +        if (temp_port) {
> +            if (!port) {
> +                port = temp_port;
> +            }
> +            found++;
> +        }
> +    }
> +
> +    if (found) {
> +        if (found > 1) {
> +            VLOG_INFO("More than one bridge owns port with name:%s\n",
> +                port_name);
> +        }
> +        return port;
> +    }
> +    return NULL;
> +}
> +
>  
>  /* Public functions. */
>  
> @@ -5055,14 +5088,228 @@ mirror_create(struct bridge *br, const struct ovsrec_mirror *cfg)
>      return m;
>  }
>  
> +static struct netdev *get_netdev_from_port(struct mirror *m,
> +    struct port **port, const char *name)
> +{
> +    struct port *temp_port;
> +    struct iface *iface;
> +
> +    *port = NULL;
> +    temp_port = port_lookup(m->bridge, name);
> +    if (temp_port) {
> +        LIST_FOR_EACH (iface, port_elem, &temp_port->ifaces) {
> +            if (iface) {
> +                *port = temp_port;
> +                return iface->netdev;
> +            }
> +        }
> +    }
> +    /* try different bridges */
> +    temp_port = port_lookup_all(name);
> +    if (temp_port) {
> +        LIST_FOR_EACH (iface, port_elem, &temp_port->ifaces) {
> +            if (iface) {
> +                *port = temp_port;
> +                return iface->netdev;
> +            }
> +        }
> +    }
> +    return NULL;
> +}
> +
> +static void
> +release_mirror_offload_info(struct mirror_offload_info *info)
> +{
> +    if (info->src) {
> +        free(info->src);
> +    }
> +    if (info->dst) {
> +        free(info->dst);
> +    }
> +    if (info->flow_dst_mac) {
> +        free(info->flow_dst_mac);
> +    }
> +    if (info->flow_src_mac) {
> +        free(info->flow_src_mac);
> +    }
> +    if (info->output_src_tags) {
> +        free(info->output_src_tags);
> +    }
> +    if (info->output_dst_tags) {
> +        free(info->output_dst_tags);
> +    }
> +    if (info->name) {
> +        free(info->name);
> +    }
> +    if (info->mirror_tunnel_addr) {
> +        free(info->mirror_tunnel_addr);
> +    }
> +}
> +
> +static int
> +set_mirror_offload_info(struct mirror *m, struct mirror_offload_info *info)
> +{
> +    const struct ovsrec_mirror *cfg = m->cfg;
> +    struct port *port = NULL;
> +    int i;
> +
> +    /* ovs_strzcpy() copies at most 'size - 1' bytes, so passing strlen()
> +     * as the size drops the last character.  xstrdup() avoids that. */
> +    if (m->name) {
> +        info->name = xstrdup(m->name);
> +    }
> +
> +    if (cfg->mirror_tunnel_addr) {
> +        info->mirror_tunnel_addr = xstrdup(cfg->mirror_tunnel_addr);
> +    } else {
> +        VLOG_ERR("mirror-offload configuration fails because"
> +            " lack of tunnel device\n");
> +        return -1;
> +    }
> +
> +    /* source port */
> +    info->n_src_port = cfg->n_select_src_port;
> +    if (info->n_src_port) {
> +        info->src = xmalloc(sizeof(struct netdev *)*info->n_src_port);
> +        info->flow_dst_mac = xmalloc(sizeof(struct eth_addr)*
> +            info->n_src_port);
> +        if (info->n_src_port != cfg->n_output_src_vlan) {
> +            VLOG_ERR("src port count:%d output src vlan count:%lu",
> +                info->n_src_port, (unsigned long) cfg->n_output_src_vlan);
> +            return -1;
> +        }
> +        info->output_src_tags = xmalloc(sizeof(uint16_t)*info->n_src_port);
> +    }
> +
> +    if (info->n_src_port) {
> +        /* find netdev instance for each port */
> +        for (i = 0; i < info->n_src_port; i++) {
> +            info->src[i] = get_netdev_from_port(m, &port,
> +                cfg->select_src_port[i]->name);
> +            if (!info->src[i]) {
> +                VLOG_ERR("src-port: %s is not a netdev device\n",
> +                    cfg->select_src_port[i]->name);
> +                return -1;
> +            }
> +        }
> +        memset(info->flow_dst_mac, 0, sizeof(struct eth_addr)*
> +            info->n_src_port);
> +
> +        /*
> +         * for source port, flow is separated by
> +         * different dst mac addr
> +         */
> +        if (cfg->n_flow_dst_mac) {
> +            int dst_count = (info->n_src_port > cfg->n_flow_dst_mac)?
> +                cfg->n_flow_dst_mac:info->n_src_port;
> +            for (i = 0; i < dst_count; i++) {
> +                eth_addr_from_string(cfg->flow_dst_mac[i],
> +                    &info->flow_dst_mac[i]);
> +            }
> +        }
> +
> +        if (cfg->n_output_src_vlan) {
> +            int count = (cfg->n_output_src_vlan > info->n_src_port)?
> +                info->n_src_port:cfg->n_output_src_vlan;
> +            for (i = 0; i < count; i++) {
> +                info->output_src_tags[i] = cfg->output_src_vlan[i] & 0xFFF;
> +            }
> +        }
> +    }
> +
> +    /* dst ports */
> +    info->n_dst_port = cfg->n_select_dst_port;
> +    if (info->n_dst_port) {
> +        info->dst = xmalloc(sizeof(struct netdev *)*info->n_dst_port);
> +        info->flow_src_mac = xmalloc(sizeof(struct eth_addr)*
> +            info->n_dst_port);
> +        if (info->n_dst_port != cfg->n_output_dst_vlan) {
> +            VLOG_ERR("dst port count:%d output dst vlan count:%lu\n",
> +                info->n_dst_port, (unsigned long) cfg->n_output_dst_vlan);
> +            return -1;
> +        }
> +        info->output_dst_tags = xmalloc(sizeof(uint16_t)*info->n_dst_port);
> +    }
> +
> +    if (info->n_dst_port) {
> +        for (i = 0; i < info->n_dst_port; i++) {
> +            info->dst[i] = get_netdev_from_port(m, &port,
> +                cfg->select_dst_port[i]->name);
> +            if (!info->dst[i]) {
> +                VLOG_ERR("dst-port: %s is not a netdev device\n",
> +                    cfg->select_dst_port[i]->name);
> +                return -1;
> +            }
> +        }
> +        memset(info->flow_src_mac, 0, sizeof(struct eth_addr)*
> +            info->n_dst_port);
> +
> +        /*
> +         * for destination port, flow is separated by
> +         * different src mac addr
> +         */
> +        if (cfg->n_flow_src_mac) {
> +            int src_count = (info->n_dst_port > cfg->n_flow_src_mac)?
> +                cfg->n_flow_src_mac:info->n_dst_port;
> +            for (i = 0; i < src_count; i++) {
> +                eth_addr_from_string(cfg->flow_src_mac[i],
> +                    &info->flow_src_mac[i]);
> +            }
> +        }
> +
> +        if (cfg->n_output_dst_vlan) {
> +            int count = (cfg->n_output_dst_vlan > info->n_dst_port)?
> +                info->n_dst_port:cfg->n_output_dst_vlan;
> +            for (i = 0; i < count; i++) {
> +                info->output_dst_tags[i] = cfg->output_dst_vlan[i] & 0xFFF;
> +            }
> +        }
> +    }
> +
> +    VLOG_INFO("success creating mirror-offload(%s): with %d src-port"
> +        " streams %d dst-port streams to tunnel %s\n",
> +        cfg->name, info->n_src_port, info->n_dst_port,
> +        info->mirror_tunnel_addr?info->mirror_tunnel_addr:"none");
> +    return 0;
> +}
> +
> +static void
> +mirror_offload_destroy(struct mirror *m)
> +{
> +    struct mirror_offload_info info;
> +
> +    memset(&info, 0, sizeof(struct mirror_offload_info));
> +    info.add_mirror = false;
> +    if (m->name) {
> +        info.name = xmalloc(strlen(m->name) + 1);
> +        if (info.name) {
> +            ovs_strzcpy(info.name, m->name, strlen(m->name) + 1);
> +        }
> +    }
> +
> +    netdev_mirror_offload_put(&info);
> +    if (info.name) {
> +        free(info.name);
> +    }
> +    if (info.mirror_tunnel_addr) {
> +        free(info.mirror_tunnel_addr);
> +    }
> +}
> +
>  static void
>  mirror_destroy(struct mirror *m)
>  {
>      if (m) {
>          struct bridge *br = m->bridge;
>  
> -        if (br->ofproto) {
> -            ofproto_mirror_unregister(br->ofproto, m);
> +        if (m->cfg && m->cfg->mirror_offload) {
> +            mirror_offload_destroy(m);
> +        } else {
> +            if (br->ofproto) {
> +                ofproto_mirror_unregister(br->ofproto, m);
> +            }
>          }
>  
>          hmap_remove(&br->mirrors, &m->hmap_node);
> @@ -5094,12 +5341,32 @@ mirror_collect_ports(struct mirror *m,
>      *n_out_portsp = n_out_ports;
>  }
>  
> +static bool
> +mirror_offload_configure(struct mirror *m)
> +{
> +    struct mirror_offload_info info;
> +
> +    memset(&info, 0, sizeof(struct mirror_offload_info));
> +    info.add_mirror = true;
> +    if (set_mirror_offload_info(m, &info)) {
> +        release_mirror_offload_info(&info);
> +        return false;
> +    }
> +
> +    netdev_mirror_offload_put(&info);
> +    release_mirror_offload_info(&info);
> +    return true;
> +}
> +
>  static bool
>  mirror_configure(struct mirror *m)
>  {
>      const struct ovsrec_mirror *cfg = m->cfg;
>      struct ofproto_mirror_settings s;
>  
> +    if (cfg->mirror_offload) {
> +        return mirror_offload_configure(m);
> +    }
>      /* Set name. */
>      if (strcmp(cfg->name, m->name)) {
>          free(m->name);
> diff --git a/vswitchd/vswitch.ovsschema b/vswitchd/vswitch.ovsschema
> index 0666c8c76..4a1a34a1f 100644
> --- a/vswitchd/vswitch.ovsschema
> +++ b/vswitchd/vswitch.ovsschema
> @@ -1,6 +1,6 @@
>  {"name": "Open_vSwitch",
> - "version": "8.2.0",
> - "cksum": "1076640191 26427",
> + "version": "8.2.1",
> + "cksum": "4051567316 27206",
>   "tables": {
>     "Open_vSwitch": {
>       "columns": {
> @@ -418,8 +418,18 @@
>       "columns": {
>         "name": {
>           "type": "string"},
> +       "mirror_tunnel_addr": {
> +         "type": "string"},
>         "select_all": {
>           "type": "boolean"},
> +       "mirror_offload": {
> +         "type": "boolean"},
> +       "flow_src_mac": {
> +         "type": {"key": {"type": "string"},
> +                  "min": 0, "max": "unlimited"}},
> +       "flow_dst_mac": {
> +         "type": {"key": {"type": "string"},
> +                  "min": 0, "max": "unlimited"}},
>         "select_src_port": {
>           "type": {"key": {"type": "uuid",
>                            "refTable": "Port",
> @@ -440,6 +450,16 @@
>                            "refTable": "Port",
>                            "refType": "weak"},
>                    "min": 0, "max": 1}},
> +       "output_src_vlan": {
> +         "type": {"key": {"type": "integer",
> +                          "minInteger": 0,
> +                          "maxInteger": 4294967295},
> +                  "min": 0, "max": 4096}},
> +       "output_dst_vlan": {
> +         "type": {"key": {"type": "integer",
> +                          "minInteger": 0,
> +                          "maxInteger": 4294967295},
> +                  "min": 0, "max": 4096}},
>         "output_vlan": {
>           "type": {"key": {"type": "integer",
>                            "minInteger": 1,
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index 4597a215d..fd2049a7f 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -4869,11 +4869,35 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
>          selected VLANs.
>        </p>
>  
> +      <column name="mirror_tunnel_addr">
> +        BDF string of the tunnel device on which mirrored traffic will be
> +        transmitted.
> +      </column>
> +
>        <column name="select_all">
>          If true, every packet arriving or departing on any port is
>          selected for mirroring.
>        </column>
>  
> +      <column name="mirror_offload">
> +        If true, hardware-assisted port mirroring is configured instead
> +        of the default mirroring.
> +      </column>
> +
> +      <column name="flow_src_mac">
> +        The source MAC address(es) for per-flow mirroring. MAC addresses
> +        are separated by ','. This parameter is paired with
> +        select_dst_port. A '0' MAC address indicates that the requested
> +        mirror is per-port; otherwise it is per-flow.
> +      </column>
> +
> +      <column name="flow_dst_mac">
> +        The destination MAC address(es) for per-flow mirroring. MAC addresses
> +        are separated by ','. This parameter is paired with
> +        select_src_port. A '0' MAC address indicates that the requested
> +        mirror is per-port; otherwise it is per-flow.
> +      </column>
> +
>        <column name="select_dst_port">
>          Ports on which departing packets are selected for mirroring.
>        </column>
> @@ -4955,6 +4979,32 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
>          </p>
>        </column>
>  
> +      <column name="output_src_vlan">
> +        <p>Output VLAN for selected source port packets, if nonempty.</p>
> +        <p>
> +          <em>Please note:</em> This is different from
> +          <ref column="output_vlan"/>. This VLAN adds an additional VLAN
> +          tag to the mirrored traffic, whether or not it already carries one.
> +          The receiving end can choose to filter out this additional VLAN.
> +          This option lets the mirrored traffic keep its original VLAN
> +          information, and this mirror can be used to filter out
> +          unwanted traffic, such as with <ref column="mirror_offload"/>.
> +        </p>
> +      </column>
> +
> +      <column name="output_dst_vlan">
> +        <p>Output VLAN for selected destination port packets, if nonempty.</p>
> +        <p>
> +          <em>Please note:</em> This is different from
> +          <ref column="output_vlan"/>. This VLAN adds an additional VLAN
> +          tag to the mirrored traffic, whether or not it already carries one.
> +          The receiving end can choose to filter out this additional VLAN.
> +          This option lets the mirrored traffic keep its original VLAN
> +          information, and this mirror can be used to filter out
> +          unwanted traffic, such as with <ref column="mirror_offload"/>.
> +        </p>
> +      </column>
> +
>        <column name="snaplen">
>          <p>Maximum per-packet number of bytes to mirror.</p>
>          <p>A mirrored packet with size larger than <ref column="snaplen"/>
>
Wang, Liang-min May 18, 2021, 6 p.m. UTC | #3
> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Tuesday, May 18, 2021 12:15 PM
> To: Miskell, Timothy <timothy.miskell@intel.com>; dev@openvswitch.org
> Cc: Wang, Liang-min <liang-min.wang@intel.com>
> Subject: Re: [PATCH] Extends the existing mirror configuration parameters
> 
> Hi Timothy, Liang-min,
> 
> Thanks for rebasing the patch.
> A list of delta against the first RFC could help the reviewers.
> I notice one change in the right direction is the conversion to Vhost
> API datapath instead of Vhost PMD.
> 
>>In this patch we support both Vhost API and Vhost PMD because OVS supports both
>>VhostUser and Vdev ports.
>>
> Also, I would suggest to have the patch split in several incremental
> patches to ease the review.
> 
>> Thank you for the suggestion. We will provide incremental patches in the next submission.
>>
> On 5/10/21 6:00 PM, Timothy Miskell wrote:
> > From: Liang-min Wang <liang-min.wang@intel.com>
> >
> > The following parameters are added:
> >  - mirror-offload: to turn on/off mirror offloading.
> >  - output-port-name: specify a port, using name string, that is on a different
> >    bridge
> >  - output-src-vlan: output port vlan for each select-src-port.
> >  - output-dst-vlan: output port vlan for each select-dst-port.
> >  - flow-src-mac: use src mac address of each select-dst-port for the header
> >    scan.
> >  - flow-dst-mac: use dst mac address of each select-src-port for the header
> >    scan.
> >  - mirror-tunnel-addr: BDF string of the tunnel device.
> >
> > ovs-vsctl test change because new mirroring parameters are introduced in this patch
> 
> It would help to provide examples of usage of these new parameters.
> 
>> Will add examples in the new patches
>>
> > Create a defer procedure call thread to handle all mirror offload requests.
> > This is a light-weight thread which remains in sleep-state when there is no new request.
> > This is created between ovs-vsctl and mirror offloading back end
> >
> > Implementing DPDK tx-burst (VIRTIO ingress traffic
> > mirror) and rx-burst (VIRTIO egress traffic mirror) callbacks.
> > Each callback  functions implement the following tasks:
> >  1. Enable per-packet VLAN insertion
> >    - for port mirroring, all packets are enabled per-packet VLAN insertion.
> >    - for flow mirroring, only packet header matches the required mac address
> >      are enabled.
> >  2. Sending the packets to the specified transport port (output-port in
> >     mirror offload configuration)
> >    - for port mirroring, all packets are sent to the transport port.
> >    - for flow mirroring, only matched packets are sent.
> >  3. Restore each packet attributes (remove DPDK per-packet offload flag)
> 
> I will for sure have more questions later, but please find a few
> comments/questions below:
> 
> > Signed-off-by: Liang-min Wang <liang-min.wang@intel.com>
> > Tested-by: Timothy Miskell <Timothy.Miskell@intel.com>
> > Suggested-by: Munish Mehan <mm6021@att.com>
> > ---
> >  lib/automake.mk            |   2 +
> >  lib/netdev-dpdk-mirror.c   | 516 +++++++++++++++++++++++++++++++++++++
> >  lib/netdev-dpdk-mirror.h   |  83 ++++++
> >  lib/netdev-dpdk.c          | 397 ++++++++++++++++++++++++++++
> >  lib/netdev-provider.h      |  16 ++
> >  lib/netdev.c               | 386 +++++++++++++++++++++++++++
> >  lib/netdev.h               |  16 ++
> >  tests/ovs-vsctl.at         |   2 +
> >  vswitchd/bridge.c          | 271 ++++++++++++++++++-
> >  vswitchd/vswitch.ovsschema |  24 +-
> >  vswitchd/vswitch.xml       |  50 ++++
> >  11 files changed, 1759 insertions(+), 4 deletions(-)
> >  create mode 100644 lib/netdev-dpdk-mirror.c
> >  create mode 100644 lib/netdev-dpdk-mirror.h
> >
> > diff --git a/lib/automake.mk b/lib/automake.mk
> > index 39901bd6d..dcafbfaca 100644
> > --- a/lib/automake.mk
> > +++ b/lib/automake.mk
> > @@ -170,6 +170,7 @@ lib_libopenvswitch_la_SOURCES = \
> >  	lib/multipath.h \
> >  	lib/namemap.c \
> >  	lib/netdev-dpdk.h \
> > +	lib/netdev-dpdk-mirror.h \
> >  	lib/netdev-dummy.c \
> >  	lib/netdev-offload.c \
> >  	lib/netdev-offload.h \
> > @@ -460,6 +461,7 @@ if DPDK_NETDEV
> >  lib_libopenvswitch_la_SOURCES += \
> >  	lib/dpdk.c \
> >  	lib/netdev-dpdk.c \
> > +	lib/netdev-dpdk-mirror.c \
> >  	lib/netdev-offload-dpdk.c
> >  else
> >  lib_libopenvswitch_la_SOURCES += \
> > diff --git a/lib/netdev-dpdk-mirror.c b/lib/netdev-dpdk-mirror.c
> > new file mode 100644
> > index 000000000..ff2701660
> > --- /dev/null
> > +++ b/lib/netdev-dpdk-mirror.c
> > @@ -0,0 +1,516 @@
> > +/*
> > + * Copyright (c) 2014, 2015, 2016, 2017 Nicira, Inc.
> > + * Copyright (c) 2019 Mellanox Technologies, Ltd.
> > + *
> > + * Licensed under the Apache License, Version 2.0 (the "License");
> > + * you may not use this file except in compliance with the License.
> > + * You may obtain a copy of the License at:
> > + *
> > + *     http://www.apache.org/licenses/LICENSE-2.0
> > + *
> > + * Unless required by applicable law or agreed to in writing, software
> > + * distributed under the License is distributed on an "AS IS" BASIS,
> > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> > + * See the License for the specific language governing permissions and
> > + * limitations under the License.
> > + */
> > +#include <config.h>
> > +#include <rte_ethdev.h>
> > +
> > +#include "netdev-dpdk-mirror.h"
> > +#include "openvswitch/vlog.h"
> > +#include "openvswitch/dynamic-string.h"
> > +#include "util.h"
> > +
> > +#define MAC_ADDR_MAP           0x0000FFFFFFFFFFFFULL
> > +#define is_mac_addr_match(a,b) (((a^b)&MAC_ADDR_MAP) == 0)
> > +#define INIT_MIRROR_DB_SIZE    8
> > +#define INVALID_DEVICE_ID      0xFFFFFFFF
> > +
> > +VLOG_DEFINE_THIS_MODULE(netdev_dpdk_mirror);
> > +
> > +/* port/flow mirror database management routines */
> > +/*
> > + * The below API is for port/flow mirror offloading which uses a different DPDK
> > + * interface as rte-flow.
> > + */
> > +static int mirror_port_db_size = 0;
> > +static int mirror_port_used = 0;
> > +static struct mirror_offload_port *mirror_port_db = NULL;
> > +
> > +static void
> > +netdev_mirror_db_init(struct mirror_offload_port *db, int size)
> > +{
> > +    int i;
> > +
> > +    for (i = 0; i < size; i++) {
> > +        db[i].dev_id = INVALID_DEVICE_ID;
> > +        memset(&db[i].rx, 0, sizeof(struct mirror_param));
> > +        memset(&db[i].tx, 0, sizeof(struct mirror_param));
> > +    }
> > +}
> > +
> > +/* Double the db size when it runs out of space */
> > +static int
> > +netdev_mirror_db_resize(void)
> > +{
> > +    int new_size = mirror_port_db_size << 1;
> > +    struct mirror_offload_port *new_db = xmalloc(
> > +        sizeof(struct mirror_offload_port)*new_size);
> > +
> > +    memcpy(new_db, mirror_port_db, sizeof(struct mirror_offload_port)
> > +        *mirror_port_db_size);
> > +    netdev_mirror_db_init(&new_db[mirror_port_db_size], mirror_port_db_size);
> > +    mirror_port_db_size = new_size;
> > +    mirror_port_db = new_db;
> > +
> > +    return 0;
> > +}
> > +
> > +
> > +static struct mirror_offload_port*
> > +netdev_mirror_data_find(uint32_t dev_id)
> > +{
> > +    int i;
> > +
> > +    if (mirror_port_db == NULL) {
> > +        return NULL;
> > +    }
> > +
> > +    for (i = 0; i < mirror_port_db_size; i++) {
> > +        if (dev_id == mirror_port_db[i].dev_id) {
> > +            return &mirror_port_db[i];
> > +        }
> > +    }
> > +    return NULL;
> > +}
> > +
> > +static struct mirror_offload_port*
> > +netdev_mirror_data_add(uint32_t dev_id, int tx,
> > +    struct mirror_param *new_param)
> > +{
> > +    struct mirror_offload_port *target = NULL;
> > +    int i;
> > +
> > +    if (!mirror_port_db) {
> > +        mirror_port_db_size = INIT_MIRROR_DB_SIZE;
> > +        mirror_port_db = xmalloc(sizeof(struct mirror_offload_port)*
> > +            mirror_port_db_size);
> > +        netdev_mirror_db_init(mirror_port_db, mirror_port_db_size);
> > +    }
> > +    target = netdev_mirror_data_find(dev_id);
> > +    if (target) {
> > +        if (tx) {
> > +            if (target->tx.mirror_cb) {
> > +                VLOG_ERR("Attempt to add ingress mirror offloading"
> > +                    " on port, %d, while one is outstanding\n", dev_id);
> > +                return target;
> > +            }
> > +
> > +            memcpy(&target->tx, new_param, sizeof(*new_param));
> > +        } else {
> > +            if (target->rx.mirror_cb) {
> > +                VLOG_ERR("Attempt to add egress mirror offloading"
> > +                    " on port, %d, while one is outstanding\n", dev_id);
> > +                return target;
> > +            }
> > +
> > +            memcpy(&target->rx, new_param, sizeof(struct mirror_param));
> > +        }
> > +    } else {
> > +        struct mirror_param *param;
> > +        /* find an unused spot on db */
> > +        for (i = 0; i < mirror_port_db_size; i++) {
> > +            if (mirror_port_db[i].dev_id == INVALID_DEVICE_ID) {
> > +                break;
> > +            }
> > +        }
> > +        if (i == mirror_port_db_size && netdev_mirror_db_resize()) {
> > +                return NULL;
> > +        }
> > +
> > +        param = tx ? &mirror_port_db[i].tx : &mirror_port_db[i].rx;
> > +        memcpy(param, new_param, sizeof(struct mirror_param));
> > +
> > +        target = &mirror_port_db[i];
> > +        target->dev_id = dev_id;
> > +        mirror_port_used ++;
> > +    }
> > +    return target;
> > +}
> > +
> > +static void
> > +netdev_mirror_data_remove(uint32_t dev_id, int tx) {
> > +    struct mirror_offload_port *target = netdev_mirror_data_find(dev_id);
> > +
> > +    if (!target) {
> > +        VLOG_ERR("Attempt to remove unsaved port, %d, %s callback\n",
> > +        dev_id, tx?"tx": "rx");
> > +    }
> > +
> > +    if (tx) {
> > +        memset(&target->tx, 0, sizeof(struct mirror_param));
> > +    } else {
> > +        memset(&target->rx, 0, sizeof(struct mirror_param));
> > +    }
> > +
> > +    if ((target->rx.mirror_cb == NULL) &&
> > +        (target->tx.mirror_cb == NULL)) {
> > +        target->dev_id = INVALID_DEVICE_ID;
> > +        mirror_port_used --;
> > +        /* release port mirror db memory when there
> > +         * is no outstanding port mirror offloading
> > +         * configuration
> > +         */
> > +        if (mirror_port_used == 0) {
> > +            free(mirror_port_db);
> > +            mirror_port_db = NULL;
> > +            mirror_port_db_size = 0;
> > +        }
> > +    }
> > +}
> > +
> > +void
> > +netdev_mirror_data_proc(uint32_t dev_id, mirror_data_op op,
> > +    int tx, struct mirror_param *in_param,
> > +    struct mirror_offload_port **out_param)
> > +{
> > +    switch (op) {
> > +    case mirror_data_find:
> > +        *out_param = netdev_mirror_data_find(dev_id);
> > +        break;
> > +    case mirror_data_add:
> > +        *out_param = netdev_mirror_data_add(dev_id, tx, in_param);
> > +        break;
> > +    case mirror_data_rem:
> > +        netdev_mirror_data_remove(dev_id, tx);
> > +        break;
> > +    }
> > +}
> > +
> > +/* port/flow mirror traffic processors */
> > +static inline uint16_t
> > +netdev_custom_mirror_offload_cb(uint16_t qidx, struct rte_mbuf **pkts,
> > +    uint16_t nb_pkts, void *user_params)
> > +{
> > +    struct mirror_param *data = user_params;
> > +    uint16_t i, dst_qidx, match_count = 0;
> > +    uint16_t pkt_trans;
> > +    uint16_t dst_port_id = data->dst_port_id;
> > +    uint16_t dst_vlan_id = data->dst_vlan_id;
> > +    struct rte_mbuf **pkt_buf = &data->pkt_buf[qidx * data->max_burst_size];
> > +
> > +    if (nb_pkts == 0) {
> > +        return 0;
> > +    }
> > +
> > +    if (nb_pkts > data->max_burst_size) {
> > +        VLOG_ERR("Per-flow batch size, %d, exceeds maximum limit\n", nb_pkts);
> > +        return 0;
> > +    }
> > +
> > +    for (i = 0; i < nb_pkts; i++) {
> > +        if (data->custom_scan(pkts[i], user_params)) {
> > +            pkt_buf[match_count] = pkts[i];
> > +            pkt_buf[match_count]->ol_flags |= PKT_TX_VLAN_PKT;
> 
> Does it work if the packet already has a VLAN inserted?
> 
>> Good catch. The design assumes that no VLAN insertion offload is applied to the source traffic.
>>
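One way the callback could tolerate a source packet that already requests VLAN insertion offload is to save the flag and TCI before tagging and put them back afterwards, instead of clearing the flag unconditionally. The following is only a minimal sketch under stated assumptions: `struct fake_mbuf`, `FAKE_TX_VLAN`, `struct saved_vlan`, `mirror_tag_apply()` and `mirror_tag_restore()` are hypothetical stand-ins for illustration, not DPDK definitions and not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the two rte_mbuf fields the callbacks touch (hypothetical,
 * not the real DPDK struct). */
struct fake_mbuf {
    uint64_t ol_flags;
    uint16_t vlan_tci;
};

/* Hypothetical bit standing in for the per-packet VLAN insertion flag. */
#define FAKE_TX_VLAN (1ULL << 57)

/* VLAN offload state of the source packet before mirroring. */
struct saved_vlan {
    uint64_t had_flag;   /* FAKE_TX_VLAN if it was already set, else 0 */
    uint16_t vlan_tci;   /* original TCI */
};

/* Tag the packet for the mirror tunnel and return the previous state. */
static inline struct saved_vlan
mirror_tag_apply(struct fake_mbuf *m, uint16_t mirror_vlan)
{
    struct saved_vlan s = { m->ol_flags & FAKE_TX_VLAN, m->vlan_tci };

    m->ol_flags |= FAKE_TX_VLAN;
    m->vlan_tci = mirror_vlan;
    return s;
}

/* Restore the source packet's own VLAN offload state after the tx burst. */
static inline void
mirror_tag_restore(struct fake_mbuf *m, struct saved_vlan s)
{
    m->ol_flags = (m->ol_flags & ~FAKE_TX_VLAN) | s.had_flag;
    m->vlan_tci = s.vlan_tci;
}
```

This only keeps the source packet intact. The mirrored copy would still carry the mirror VLAN in place of the original offloaded tag, so stacking both tags would need a separate mechanism.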
> > +            pkt_buf[match_count]->vlan_tci = dst_vlan_id;
> > +            rte_mbuf_refcnt_update(pkt_buf[match_count], 1);
> 
> 
> 
> > +            match_count++;
> > +        }
> > +    }
> > +
> > +    dst_qidx = (data->n_dst_queue > qidx)?qidx:(data->n_dst_queue -1);
> 
> Wouldn't it scale better with:
> dst_qidx = qidx % data->n_dst_queue
> ?
> 
>> We tried to avoid the "%" operator. We could improve this by wrapping the check in "unlikely" and using the suggested "%".
>>
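The two mappings discussed here can be compared side by side. This is a minimal sketch, assuming a nonzero destination queue count; `map_queue_clamp()`, `map_queue_modulo()` and the local `UNLIKELY` macro are hypothetical names for illustration and are not in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Local branch hint (hypothetical); falls back to a no-op elsewhere. */
#if defined(__GNUC__) || defined(__clang__)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define UNLIKELY(x) (x)
#endif

/* Mapping used in the patch: every source queue index at or beyond the
 * destination queue count lands on the last destination queue. */
static inline uint16_t
map_queue_clamp(uint16_t qidx, uint16_t n_dst_queue)
{
    assert(n_dst_queue > 0);
    return (n_dst_queue > qidx) ? qidx : (uint16_t) (n_dst_queue - 1);
}

/* Reviewer's suggestion with the "%" kept off the common path: excess
 * source queues are spread round-robin over all destination queues. */
static inline uint16_t
map_queue_modulo(uint16_t qidx, uint16_t n_dst_queue)
{
    assert(n_dst_queue > 0);
    if (UNLIKELY(qidx >= n_dst_queue)) {
        return (uint16_t) (qidx % n_dst_queue);
    }
    return qidx;
}
```

With 8 source queues and 3 destination queues, the clamp mapping sends source queues 2 through 7 to destination queue 2, while the modulo mapping distributes them over queues 0, 1 and 2, which reduces contention on a single per-queue spinlock.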
> > +
> > +    rte_spinlock_lock(&data->locks[dst_qidx]);
> > +    pkt_trans = rte_eth_tx_burst(dst_port_id, dst_qidx, pkt_buf, match_count);
> > +    rte_spinlock_unlock(&data->locks[dst_qidx]);
> > +
> > +    for (i = 0; i < match_count; i++) {
> > +        pkt_buf[i]->ol_flags &= ~PKT_TX_VLAN_PKT;
> > +    }
> 
> In order to further reduce the performance impact of mirroring, have you
> envisaged to offload it to dedicated PMD threads?
> 
>> The mirror-tunnel design is a compromise between a hardware TAP and a software TAP.
>> The tunnel itself is designed to have very little impact on the core that processes the source traffic. In
>> our benchmark with SR-IOV, L2 forwarding, and mirroring, we observed only a 10-20% impact on 64-byte packets,
>> and no impact when running traffic with packet sizes of 128 bytes or above.
>>
> > +
> > +    while (unlikely (pkt_trans < match_count)) {
> > +        rte_pktmbuf_free(pkt_buf[pkt_trans]);
> > +        pkt_trans++;
> > +    }
> > +
> > +    return nb_pkts;
> > +}
> > +
> > +static inline uint16_t
> > +netdev_flow_mirror_offload_cb(uint16_t qidx, struct rte_mbuf **pkts,
> > +    uint16_t nb_pkts, void *user_params, uint32_t offset)
> > +{
> > +    struct mirror_param *data = user_params;
> > +    uint16_t i, dst_qidx, match_count = 0;
> > +    uint16_t pkt_trans;
> > +    uint16_t dst_port_id = data->dst_port_id;
> > +    uint16_t dst_vlan_id = data->dst_vlan_id;
> > +    uint64_t target_addr = *(uint64_t *) data->extra_data;
> > +    struct rte_mbuf **pkt_buf = &data->pkt_buf[qidx * data->max_burst_size];
> > +
> > +    if (nb_pkts == 0) {
> > +        return 0;
> > +    }
> > +
> > +    if (nb_pkts > data->max_burst_size) {
> > +        VLOG_ERR("Per-flow batch size, %d, exceeds maximum limit\n", nb_pkts);
> > +        return 0;
> > +    }
> > +
> > +    for (i = 0; i < nb_pkts; i++) {
> > +        uint64_t *dst_mac_addr =
> > +            rte_pktmbuf_mtod_offset(pkts[i], void *, offset);
> > +        if (is_mac_addr_match(target_addr, (*dst_mac_addr))) {
> > +            pkt_buf[match_count] = pkts[i];
> > +            pkt_buf[match_count]->ol_flags |= PKT_TX_VLAN_PKT;
> > +            pkt_buf[match_count]->vlan_tci = dst_vlan_id;
> > +            rte_mbuf_refcnt_update(pkt_buf[match_count], 1);
> > +            match_count ++;
> > +        }
> > +    }
> > +
> > +    dst_qidx = (data->n_dst_queue > qidx) ? qidx : (data->n_dst_queue - 1);
> > +
> > +    rte_spinlock_lock(&data->locks[dst_qidx]);
> > +    pkt_trans = rte_eth_tx_burst(dst_port_id, dst_qidx, pkt_buf, match_count);
> > +    rte_spinlock_unlock(&data->locks[dst_qidx]);
> > +
> > +    for (i = 0; i < match_count; i++) {
> > +        pkt_buf[i]->ol_flags &= ~PKT_TX_VLAN_PKT;
> > +    }
> > +
> > +    while (unlikely (pkt_trans < match_count)) {
> > +        rte_pktmbuf_free(pkt_buf[pkt_trans]);
> > +        pkt_trans++;
> > +    }
> > +
> > +    return nb_pkts;
> > +}
> > +
> > +static inline uint16_t
> > +netdev_port_mirror_offload_cb(uint16_t qidx, struct rte_mbuf **pkts,
> > +    uint16_t nb_pkts, void *user_params)
> > +{
> > +    struct mirror_param *data = user_params;
> > +    uint16_t i, dst_qidx;
> > +    uint16_t pkt_trans;
> > +    uint16_t dst_port_id = data->dst_port_id;
> > +    uint16_t dst_vlan_id = data->dst_vlan_id;
> > +
> > +    if (nb_pkts == 0) {
> > +        return 0;
> > +    }
> > +
> > +    for (i = 0; i < nb_pkts; i++) {
> > +        pkts[i]->ol_flags |= PKT_TX_VLAN_PKT;
> > +        pkts[i]->vlan_tci = dst_vlan_id;
> > +        rte_mbuf_refcnt_update(pkts[i], 1);
> > +    }
> > +
> > +    dst_qidx = (data->n_dst_queue > qidx) ? qidx : (data->n_dst_queue - 1);
> > +
> > +    rte_spinlock_lock(&data->locks[dst_qidx]);
> > +    pkt_trans = rte_eth_tx_burst(dst_port_id, dst_qidx, pkts, nb_pkts);
> > +    rte_spinlock_unlock(&data->locks[dst_qidx]);
> > +
> > +    for (i = 0; i < nb_pkts; i++) {
> > +        pkts[i]->ol_flags &= ~PKT_TX_VLAN_PKT;
> > +    }
> > +
> > +    while (unlikely (pkt_trans < nb_pkts)) {
> > +        rte_pktmbuf_free(pkts[pkt_trans]);
> > +        pkt_trans++;
> > +    }
> > +
> > +    return nb_pkts;
> > +}
> > +
> > +static inline uint16_t
> > +netdev_rx_custom_mirror_offload_cb(uint16_t port_id OVS_UNUSED,
> > +    uint16_t qidx, struct rte_mbuf **pkts, uint16_t nb_pkts,
> > +    uint16_t maxi_pkts OVS_UNUSED, void *user_params)
> > +{
> > +    return netdev_custom_mirror_offload_cb(qidx, pkts, nb_pkts, user_params);
> > +}
> > +
> > +static inline uint16_t
> > +netdev_tx_custom_mirror_offload_cb(uint16_t port_id OVS_UNUSED,
> > +    uint16_t qidx, struct rte_mbuf **pkts, uint16_t nb_pkts,
> > +    void *user_params)
> > +{
> > +    return netdev_custom_mirror_offload_cb(qidx, pkts, nb_pkts, user_params);
> > +}
> > +
> > +static inline uint16_t
> > +netdev_rx_flow_mirror_offload_cb(uint16_t port_id OVS_UNUSED,
> > +    uint16_t qidx, struct rte_mbuf **pkts, uint16_t nb_pkts,
> > +    uint16_t maxi_pkts OVS_UNUSED, void *user_params)
> > +{
> > +    return netdev_flow_mirror_offload_cb(qidx, pkts, nb_pkts, user_params, 0);
> > +}
> > +
> > +static inline uint16_t
> > +netdev_tx_flow_mirror_offload_cb(uint16_t port_id OVS_UNUSED,
> > +    uint16_t qidx, struct rte_mbuf **pkts, uint16_t nb_pkts,
> > +    void *user_params)
> > +{
> > +    return netdev_flow_mirror_offload_cb(qidx, pkts, nb_pkts, user_params, 6);
> > +}
> > +
> > +static inline uint16_t
> > +netdev_rx_port_mirror_offload_cb(uint16_t port_id OVS_UNUSED,
> > +    uint16_t qidx, struct rte_mbuf **pkts, uint16_t nb_pkts,
> > +    uint16_t max_pkts OVS_UNUSED, void *user_params)
> > +{
> > +    return netdev_port_mirror_offload_cb(qidx, pkts, nb_pkts, user_params);
> > +}
> > +
> > +static inline uint16_t
> > +netdev_tx_port_mirror_offload_cb(uint16_t port_id OVS_UNUSED,
> > +    uint16_t qidx, struct rte_mbuf **pkts, uint16_t nb_pkts,
> > +    void *user_params)
> > +{
> > +    return netdev_port_mirror_offload_cb(qidx, pkts, nb_pkts, user_params);
> > +}
> > +
> > +static rte_rx_callback_fn
> > +netdev_mirror_rx_cb(rte_mirror_type mirror_type)
> > +{
> > +    switch (mirror_type) {
> > +    case mirror_port:
> > +        return netdev_rx_port_mirror_offload_cb;
> > +    case mirror_flow_mac:
> > +        return netdev_rx_flow_mirror_offload_cb;
> > +    case mirror_flow_custom:
> > +        return netdev_rx_custom_mirror_offload_cb;
> > +    case mirror_invalid:
> > +        return NULL;
> > +    }
> > +    VLOG_ERR("Un-supported mirror type\n");
> > +    return NULL;
> > +}
> > +
> > +static rte_tx_callback_fn
> > +netdev_mirror_tx_cb(rte_mirror_type mirror_type)
> > +{
> > +    switch (mirror_type) {
> > +    case mirror_port:
> > +        return netdev_tx_port_mirror_offload_cb;
> > +    case mirror_flow_mac:
> > +        return netdev_tx_flow_mirror_offload_cb;
> > +        break;
> > +    case mirror_flow_custom:
> > +        return netdev_tx_custom_mirror_offload_cb;
> > +    case mirror_invalid:
> > +        return NULL;
> > +    }
> > +    VLOG_ERR("Un-supported mirror type\n");
> > +    return NULL;
> > +}
> > +
> > +void
> > +netdev_mirror_cb_set(struct mirror_param *data, uint16_t port_id,
> > +    int pmd_cb, int tx)
> > +{
> > +    unsigned int qid;
> > +
> > +    data->pkt_buf = NULL;
> > +    if (data->extra_data_size) {
> > +        data->pkt_buf = xmalloc(sizeof(mirror_fn_cb)*data->max_burst_size *
> > +            data->n_src_queue);
> > +    }
> > +
> > +    data->mirror_cb = xmalloc(sizeof(struct rte_eth_rxtx_callback *)
> > +        * data->n_src_queue);
> > +    for (qid = 0; qid < data->n_src_queue; qid++) {
> > +        if (pmd_cb) {
> > +            if (tx) {
> > +                data->mirror_cb[qid].pmd = rte_eth_add_tx_callback(port_id,
> > +                    qid, netdev_mirror_tx_cb(data->mirror_type), data);
> > +            } else {
> > +                data->mirror_cb[qid].pmd = rte_eth_add_rx_callback(port_id,
> > +                    qid, netdev_mirror_rx_cb(data->mirror_type), data);
> > +            }
> > +        } else {
> > +            struct rte_eth_rxtx_callback *rxtx_cb =
> > +                xmalloc(sizeof(struct rte_eth_rxtx_callback));
> > +
> > +            data->mirror_cb[qid].direct = rxtx_cb;
> > +            rxtx_cb->next = NULL;
> > +            rxtx_cb->param = data;
> > +
> > +            if (tx) {
> > +                rxtx_cb->fn.tx = netdev_mirror_tx_cb(data->mirror_type);
> > +            } else {
> > +                rxtx_cb->fn.rx = netdev_mirror_rx_cb(data->mirror_type);
> > +            }
> > +        }
> > +    }
> > +}
> > +
> > +/* port/flow mirroring device (port) register/un-register routines */
> > +int
> > +netdev_eth_register_mirror(uint16_t src_port, struct mirror_param *param,
> > +    int tx_cb)
> > +{
> > +    struct mirror_offload_port *port_info = NULL;
> > +    struct mirror_param *data;
> > +
> > +    netdev_mirror_data_proc(src_port, mirror_data_add, tx_cb, param,
> > +        &port_info);
> > +    if (!port_info) {
> > +        return -1;
> > +    }
> > +
> > +    data = tx_cb ? &port_info->tx : &port_info->rx;
> > +    netdev_mirror_cb_set(data, src_port, 1, tx_cb);
> > +
> > +    return 0;
> > +}
> > +
> > +int
> > +netdev_eth_unregister_mirror(uint16_t src_port, int tx_cb)
> > +{
> > +    /* release both cb and pkt_buf */
> > +    unsigned int i;
> > +    struct mirror_offload_port *port_info = NULL;
> > +    struct mirror_param *data;
> > +
> > +    netdev_mirror_data_proc(src_port, mirror_data_find, tx_cb, NULL,
> > +        &port_info);
> > +    if (port_info == NULL) {
> > +        VLOG_ERR("Source port %d is not on outstanding port mirror db\n",
> > +            src_port);
> > +        return -1;
> > +    }
> > +    data = tx_cb ? &port_info->tx : &port_info->rx;
> > +
> > +    for (i = 0; i < data->n_src_queue; i++) {
> > +        if (data->mirror_cb[i].pmd) {
> > +            if (tx_cb) {
> > +                rte_eth_remove_tx_callback(src_port, i,
> > +                    data->mirror_cb[i].pmd);
> > +            } else {
> > +                rte_eth_remove_rx_callback(src_port, i,
> > +                    data->mirror_cb[i].pmd);
> > +            }
> > +        }
> > +        data->mirror_cb[i].pmd = NULL;
> > +    }
> > +    free(data->mirror_cb);
> > +
> > +    if (data->pkt_buf) {
> > +        free(data->pkt_buf);
> > +        data->pkt_buf = NULL;
> > +    }
> > +
> > +    if (data->extra_data) {
> > +        free(data->extra_data);
> > +        data->extra_data = NULL;
> > +        data->extra_data_size = 0;
> > +    }
> > +
> > +    netdev_mirror_data_proc(src_port, mirror_data_rem, tx_cb, NULL, NULL);
> > +    return 0;
> > +}
> > diff --git a/lib/netdev-dpdk-mirror.h b/lib/netdev-dpdk-mirror.h
> > new file mode 100644
> > index 000000000..ee4b933ba
> > --- /dev/null
> > +++ b/lib/netdev-dpdk-mirror.h
> > @@ -0,0 +1,83 @@
> > +/*
> > + * Copyright (c) 2014, 2015, 2016 Nicira, Inc.
> > + *
> > + * Licensed under the Apache License, Version 2.0 (the "License");
> > + * you may not use this file except in compliance with the License.
> > + * You may obtain a copy of the License at:
> > + *
> > + *     http://www.apache.org/licenses/LICENSE-2.0
> > + *
> > + * Unless required by applicable law or agreed to in writing, software
> > + * distributed under the License is distributed on an "AS IS" BASIS,
> > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> > + * See the License for the specific language governing permissions and
> > + * limitations under the License.
> > + */
> > +
> > +#ifndef NETDEV_DPDK_MIRROR_H
> > +#define NETDEV_DPDK_MIRROR_H
> > +
> > +#include "openvswitch/types.h"
> > +
> > +#ifdef  __cplusplus
> > +extern "C" {
> > +#endif
> > +
> > +typedef enum {
> > +    mirror_data_find, /* find the mirror-data allocated */
> > +    mirror_data_add, /* add a new mirror_param data into the DB */
> > +    mirror_data_rem, /* remove a mirror_param from the DB */
> > +} mirror_data_op;
> > +
> > +typedef int (*rte_mirror_scan_fn)(struct rte_mbuf *pkt, void *user_param);
> > +typedef enum {
> > +    mirror_port, /* port mirror */
> > +    mirror_flow_mac, /* flow mirror according to source mac */
> > +    mirror_flow_custom,  /* flow mirror according to a callback scan */
> > +    mirror_invalid,      /* invalid mirror_type */
> > +} rte_mirror_type;
> > +
> > +typedef union {
> > +    const struct rte_eth_rxtx_callback *pmd;
> > +    struct rte_eth_rxtx_callback *direct;
> > +} mirror_fn_cb;
> > +
> > +struct mirror_param {
> > +    uint16_t dst_port_id;
> > +    uint16_t dst_vlan_id;
> > +    rte_spinlock_t *locks;
> > +    int n_src_queue;
> > +    int n_dst_queue;
> > +    struct rte_mbuf **pkt_buf;
> > +    mirror_fn_cb *mirror_cb;
> > +    unsigned int max_burst_size;
> > +    rte_mirror_scan_fn custom_scan;
> > +    rte_mirror_type mirror_type;
> > +    unsigned int extra_data_size;
> > +    void *extra_data; /* extra mirror parameter */
> > +};
> > +
> > +struct mirror_offload_port {
> > +    uint32_t dev_id;
> > +    struct mirror_param rx;
> > +    struct mirror_param tx;
> > +};
> > +
> > +bool netdev_port_started(uint16_t port_id, uint32_t *num_tx_queue);
> > +int netdev_get_portid_from_addr(const char *pci_addr_str, uint16_t *port_id);
> > +int netdev_tunnel_port_setup(uint16_t portid, uint32_t *num_queue);
> > +
> > +void netdev_mirror_data_proc(uint32_t dev_id, mirror_data_op op,
> > +    int tx, struct mirror_param *in_param,
> > +    struct mirror_offload_port **out_param);
> > +void netdev_mirror_cb_set(struct mirror_param *data, uint16_t port_id,
> > +    int pmd, int tx);
> > +int netdev_eth_register_mirror(uint16_t src_port,
> > +    struct mirror_param *param, int tx_cb);
> > +int netdev_eth_unregister_mirror(uint16_t src_port, int tx_cb);
> > +
> > +#ifdef  __cplusplus
> > +}
> > +#endif
> > +
> > +#endif /* netdev-dpdk-mirror.h */
> > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> > index 9d8096668..eb6644333 100644
> > --- a/lib/netdev-dpdk.c
> > +++ b/lib/netdev-dpdk.c
> > @@ -48,6 +48,7 @@
> >  #include "fatal-signal.h"
> >  #include "if-notifier.h"
> >  #include "netdev-provider.h"
> > +#include "netdev-dpdk-mirror.h"
> >  #include "netdev-vport.h"
> >  #include "odp-util.h"
> >  #include "openvswitch/dynamic-string.h"
> > @@ -171,6 +172,16 @@ static const struct rte_eth_conf port_conf = {
> >      },
> >  };
> >
> > +struct mirror_tunnel_port_info {
> > +    uint16_t port_id;
> > +    rte_spinlock_t *locks;
> > +    uint32_t share_count;
> > +    uint32_t num_queue;
> > +    bool port_started;
> > +    struct mirror_tunnel_port_info *next;
> > +};
> > +static struct mirror_tunnel_port_info *mirror_tunnel_head = NULL;
> > +
> >  /*
> >   * These callbacks allow virtio-net devices to be added to vhost ports when
> >   * configuration has been fully completed.
> > @@ -443,6 +454,8 @@ struct netdev_dpdk {
> >          };
> >          struct dpdk_tx_queue *tx_q;
> >          struct rte_eth_link link;
> > +        mirror_fn_cb *rx_cb; /* shared pointer */
> > +        mirror_fn_cb *tx_cb;
> >      );
> >
> >      PADDED_MEMBERS_CACHELINE_MARKER(CACHE_LINE_SIZE, cacheline1,
> > @@ -2417,6 +2430,13 @@ netdev_dpdk_vhost_rxq_recv(struct netdev_rxq *rxq,
> >      nb_rx = rte_vhost_dequeue_burst(vid, qid, dev->dpdk_mp->mp,
> >                                      (struct rte_mbuf **) batch->packets,
> >                                      NETDEV_MAX_BURST);
> > +
> > +    if (dev->rx_cb && dev->rx_cb[qid].direct->fn.rx) {
> > +        dev->rx_cb[qid].direct->fn.rx((uint16_t) vid, qid,
> > +        (struct rte_mbuf **) batch->packets, nb_rx,
> > +        NETDEV_MAX_BURST, dev->rx_cb[qid].direct->param);
> > +    }
> > +
> >      if (!nb_rx) {
> >          return EAGAIN;
> >      }
> > @@ -2634,6 +2654,10 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
> >          int vhost_qid = qid * VIRTIO_QNUM + VIRTIO_RXQ;
> >          unsigned int tx_pkts;
> >
> > +        if (dev->tx_cb && dev->tx_cb[qid].direct->fn.tx) {
> > +            dev->tx_cb[qid].direct->fn.tx((uint16_t) vid, qid, cur_pkts, cnt,
> > +                dev->tx_cb[qid].direct->param);
> > +        }
> >          tx_pkts = rte_vhost_enqueue_burst(vid, vhost_qid, cur_pkts, cnt);
> >          if (OVS_LIKELY(tx_pkts)) {
> >              /* Packets have been sent.*/
> > @@ -5291,6 +5315,376 @@ netdev_dpdk_rte_flow_query_count(struct netdev *netdev,
> >      return ret;
> >  }
> >
> > +/*
> > + * mirror tunnel device management routines
> > + * mirror tunnel devices are devices reserved solely for
> > + * traffic mirroring
> > + */
> > +static void
> > +netdev_dpdk_update_mt_list(struct mirror_tunnel_port_info *mt_port_info,
> > +                           bool add_port)
> > +{
> > +    struct mirror_tunnel_port_info *ptr = mirror_tunnel_head;
> > +
> > +    if (add_port) {
> > +        if (!ptr) {
> > +            mirror_tunnel_head = mt_port_info;
> > +            return;
> > +        }
> > +        while (ptr->next) {
> > +            ptr = ptr->next;
> > +        }
> > +        ptr->next = mt_port_info;
> > +    } else {
> > +        while (ptr->next &&
> > +            ptr->next->port_id != mt_port_info->port_id) {
> > +            ptr = ptr->next;
> > +        }
> > +
> > +        if (ptr->next) {
> > +            ptr->next = ptr->next->next;
> > +            free(mt_port_info);
> > +        } else {
> > +            if (ptr->port_id == mt_port_info->port_id) {
> > +                mirror_tunnel_head = NULL;
> > +                free(mt_port_info);
> > +            } else {
> > +                VLOG_ERR("Fail to find %s mirror port (%d) info\n",
> > +                 add_port?"add":"remove", mt_port_info->port_id);
> > +            }
> > +        }
> > +    }
> > +}
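
The remove branch above appears to mishandle two cases: it dereferences ptr
when the list is empty, and when the head entry is the one being removed but
other entries follow it, the loop walks past the head and ends in the error
path. A minimal sketch of the usual pointer-to-pointer unlink, which treats
the head like any other node, is below. The names struct mt_node,
mt_list_add() and mt_list_remove() are hypothetical stand-ins for
illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct mirror_tunnel_port_info. */
struct mt_node {
    uint16_t port_id;
    struct mt_node *next;
};

/* Append a new node for 'port_id' to the tail of the list at '*head'. */
struct mt_node *
mt_list_add(struct mt_node **head, uint16_t port_id)
{
    struct mt_node *node = calloc(1, sizeof *node);

    assert(node != NULL);
    node->port_id = port_id;
    while (*head) {
        head = &(*head)->next;
    }
    *head = node;
    return node;
}

/* Unlink and free the node for 'port_id'.
 * Returns 0 on success, -1 if the port is not on the list
 * (which includes the empty-list case). */
int
mt_list_remove(struct mt_node **head, uint16_t port_id)
{
    struct mt_node **pp;

    for (pp = head; *pp; pp = &(*pp)->next) {
        if ((*pp)->port_id == port_id) {
            struct mt_node *victim = *pp;

            *pp = victim->next;   /* works for head, middle and tail */
            free(victim);
            return 0;
        }
    }
    return -1;
}
```

With this shape the caller no longer needs a separate "list has one entry"
branch, and a failed lookup is reported through the return value instead of
being discovered after walking to the tail.
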
> > +
> > +static struct mirror_tunnel_port_info*
> > +netdev_dpdk_get_mt_port_info(uint16_t port_id)
> > +{
> > +    struct mirror_tunnel_port_info *mt_port_info;
> > +
> > +    if (mirror_tunnel_head) {
> > +        mt_port_info = mirror_tunnel_head;
> > +        while (mt_port_info) {
> > +            if (mt_port_info->port_id == port_id) {
> > +                return mt_port_info;
> > +            }
> > +            mt_port_info = mt_port_info->next;
> > +        }
> > +        VLOG_ERR("Could not find tunnel port with port-id %d\n",
> > +            port_id);
> > +    }
> > +
> > +    mt_port_info = xmalloc(sizeof(struct mirror_tunnel_port_info));
> > +    memset(mt_port_info, 0, sizeof(*mt_port_info));
> > +    mt_port_info->port_id = port_id;
> > +    mt_port_info->next = NULL;
> > +
> > +    return mt_port_info;
> > +}
> > +
> > +static int
> > +netdev_dpdk_addr_to_portid(const char *pci_addr_str, uint16_t *port_id)
> > +{
> > +    struct rte_pci_device *pci_dev;
> > +    struct rte_pci_addr pci_addr;
> > +    int i;
> > +
> > +    if (rte_pci_addr_parse(pci_addr_str, &pci_addr)) {
> > +        VLOG_ERR("Incorrect pci address %s\n", pci_addr_str);
> > +        return -1;
> > +    }
> > +
> > +    for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
> > +        struct rte_pci_addr *eth_pci_addr;
> > +
> > +        if (!rte_eth_devices[i].device) {
> > +            continue;
> > +        }
> > +
> > +        pci_dev = RTE_ETH_DEV_TO_PCI(&rte_eth_devices[i]);
> > +        if (!pci_dev) {
> > +            continue;
> > +        }
> > +
> > +        eth_pci_addr = &pci_dev->addr;
> > +
> > +        if (pci_addr.bus == eth_pci_addr->bus &&
> > +            pci_addr.devid == eth_pci_addr->devid &&
> > +            pci_addr.domain == eth_pci_addr->domain &&
> > +            pci_addr.function == eth_pci_addr->function) {
> > +            *port_id = i;
> > +
> > +            return 0;
> > +        }
> > +    }
> > +
> > +    return -1;
> > +}
> > +
> > +static int
> > +netdev_dpdk_mt_open(uint16_t port_id, struct mirror_param *param)
> > +{
> > +    struct rte_eth_dev_info dev_info;
> > +    struct rte_eth_txconf txq_conf;
> > +    struct rte_eth_rxconf rxq_conf;
> > +    struct rte_mempool *pktbuf;
> > +
> > +    struct mirror_tunnel_port_info *mt_info;
> > +
> > +    uint16_t nb_rxd = NIC_PORT_DEFAULT_RXQ_SIZE;
> > +    uint16_t nb_txd = NIC_PORT_DEFAULT_TXQ_SIZE;
> > +    unsigned int i, num_queue;
> > +
> > +    struct rte_eth_conf mt_port_conf = {
> > +        .rxmode = {
> > +            .split_hdr_size = 0,
> > +        },
> > +        .txmode = {
> > +            .mq_mode = ETH_MQ_TX_NONE,
> > +        },
> > +    };
> > +
> > +    mt_info = netdev_dpdk_get_mt_port_info(port_id);
> > +    if (!mt_info) {
> > +        return -1;
> > +    }
> > +
> > +    if (mt_info->port_started) {
> > +        param->n_dst_queue = mt_info->num_queue;
> > +        param->dst_port_id = port_id;
> > +        param->locks = mt_info->locks;
> > +        mt_info->share_count++;
> > +
> > +        return 0;
> > +    }
> > +
> > +    rte_eth_dev_info_get(port_id, &dev_info);
> > +    num_queue = param->n_src_queue;
> > +
> > +    /* A tunnel device doesn't require mbuf. It's used as
> > +     * hardware channel, transmit packets with
> > +     * mbuf provided by source. Need this mbuf creation
> > +     * to finish port initialization
> > +     */
> > +    pktbuf = rte_pktmbuf_pool_create(
> > +            "tunnel-port",
> > +            (dev_info.rx_desc_lim.nb_max + dev_info.tx_desc_lim.nb_max),
> > +            RTE_MEMPOOL_CACHE_MAX_SIZE, 0, RTE_MBUF_DEFAULT_BUF_SIZE,
> > +            rte_eth_dev_socket_id(port_id));
> > +
> > +    mt_port_conf.txmode.offloads |= DEV_TX_OFFLOAD_VLAN_INSERT;
> > +    if (dev_info.tx_offload_capa & DEV_TX_OFFLOAD_MBUF_FAST_FREE) {
> > +        mt_port_conf.txmode.offloads |= DEV_TX_OFFLOAD_MBUF_FAST_FREE;
> > +    }
> > +    rte_eth_dev_configure(port_id, 1, num_queue, &mt_port_conf);
> > +
> > +    /* init one Rx queue */
> > +    rxq_conf = dev_info.default_rxconf;
> > +    rxq_conf.offloads = mt_port_conf.rxmode.offloads;
> > +    if (rte_eth_rx_queue_setup(port_id, 0, nb_rxd,
> > +        rte_eth_dev_socket_id(port_id), &rxq_conf, pktbuf) < 0)
> > +        VLOG_ERR("fail to setup tunnel port (%d) rx-queue\n", port_id);
> > +
> > +    /* init # of Tx queue as part of mirror-tunnel setup */
> > +    txq_conf = dev_info.default_txconf;
> > +    txq_conf.offloads |= mt_port_conf.txmode.offloads;
> > +    for (i = 0; i < num_queue; i++) {
> > +        if (rte_eth_tx_queue_setup(port_id,
> > +            i, nb_txd,
> > +            rte_eth_dev_socket_id(port_id),
> > +            &txq_conf) < 0) {
> > +            VLOG_ERR("fail to setup tunnel port (%d) tx queue #%u\n",
> > +                port_id, i);
> > +            return -1;
> > +        }
> > +    }
> > +
> > +    if (rte_eth_dev_start(port_id) < 0) {
> > +        VLOG_ERR("fail to start tunnel port %d\n", port_id);
> > +        return -1;
> > +    }
> > +
> > +    mt_info->locks = xmalloc(num_queue * sizeof(rte_spinlock_t));
> > +    if (mt_info->locks) {
> > +        for (i = 0; i < mt_info->num_queue; i++) {
> > +            rte_spinlock_init(&mt_info->locks[i]);
> > +        }
> > +    } else {
> > +        return -1;
> > +    }
> > +    mt_info->share_count = 1;
> > +    mt_info->port_started = true;
> > +    mt_info->num_queue = num_queue;
> > +
> > +    param->n_dst_queue = mt_info->num_queue;
> > +    param->dst_port_id = port_id;
> > +    param->locks = mt_info->locks;
> > +
> > +    netdev_dpdk_update_mt_list(mt_info, true);
> > +    return 0;
> > +}
> > +
> > +static void
> > +netdev_dpdk_mt_close(uint16_t mirror_port_id)
> > +{
> > +    struct mirror_tunnel_port_info *mt_port_info =
> > +        netdev_dpdk_get_mt_port_info(mirror_port_id);
> > +
> > +    if (mt_port_info) {
> > +        mt_port_info->share_count--;
> > +        if (!mt_port_info->share_count) {
> > +            netdev_dpdk_update_mt_list(mt_port_info, false);
> > +            rte_eth_dev_stop(mirror_port_id);
> > +            rte_eth_dev_close(mirror_port_id);
> > +        }
> > +    }
> > +}
> > +
> > +/* vhost device mirror registration and un-registration routines */
> > +static int
> > +netdev_vhost_register_mirror(struct netdev_dpdk *dev,
> > +    struct mirror_param *param, int tx_cb)
> > +{
> > +    uint32_t vid = netdev_dpdk_get_vid(dev);
> > +    struct mirror_offload_port *port_info = NULL;
> > +    struct mirror_param *data;
> > +
> > +    netdev_mirror_data_proc(vid, mirror_data_add, tx_cb, param, &port_info);
> > +    if (!port_info) {
> > +        return -1;
> > +    }
> > +
> > +    data = tx_cb ? &port_info->tx : &port_info->rx;
> > +    netdev_mirror_cb_set(data, (uint16_t) vid, 0, tx_cb);
> > +
> > +    if (tx_cb) {
> > +        dev->tx_cb = data->mirror_cb;
> > +    } else {
> > +        dev->rx_cb = data->mirror_cb;
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> > +static int
> > +netdev_vhost_unregister_mirror(struct netdev_dpdk *dev, int tx_cb)
> > +{
> > +    /* release both cb and pkt_buf */
> > +    unsigned int i;
> > +    uint32_t vid = netdev_dpdk_get_vid(dev);
> > +    struct mirror_offload_port *port_info = NULL;
> > +    struct mirror_param *data;
> > +
> > +    netdev_mirror_data_proc(vid, mirror_data_find, tx_cb, NULL, &port_info);
> > +    if (port_info == NULL) {
> > +        VLOG_ERR("Source port %d is not on outstanding port mirror db\n", vid);
> > +        return -1;
> > +    }
> > +    data = tx_cb ? &port_info->tx : &port_info->rx;
> > +
> > +    if (tx_cb) {
> > +        dev->tx_cb = NULL;
> > +    } else {
> > +        dev->rx_cb = NULL;
> > +    }
> > +
> > +    for (i = 0; i < data->n_src_queue; i++) {
> > +        free(data->mirror_cb[i].direct);
> > +    }
> > +
> > +    free(data->mirror_cb);
> > +
> > +    if (data->pkt_buf) {
> > +        free(data->pkt_buf);
> > +        data->pkt_buf = NULL;
> > +    }
> > +
> > +    if (data->extra_data) {
> > +        free(data->extra_data);
> > +        data->extra_data = NULL;
> > +        data->extra_data_size = 0;
> > +    }
> > +
> > +    netdev_mirror_data_proc(vid,  mirror_data_rem, tx_cb, NULL, NULL);
> > +    return 0;
> > +}
> > +
> > +static int
> > +netdev_dpdk_mirror_offload(struct netdev *src, struct eth_addr *flow_addr,
> > +                           uint16_t vlan_id, char *mirror_tunnel_addr,
> > +                           bool add_mirror, bool tx_cb) {
> > +    struct netdev_dpdk *src_dev = netdev_dpdk_cast(src);
> > +    bool eth_dev = src_dev->type == DPDK_DEV_ETH;
> > +    uint16_t mirror_port_id;
> > +    int status = 0;
> > +
> > +    if (netdev_dpdk_addr_to_portid(mirror_tunnel_addr, &mirror_port_id)) {
> > +        VLOG_ERR("Could not find tunnel port with BDF addr %s\n",
> > +            mirror_tunnel_addr);
> > +        return -1;
> > +    }
> > +    if (add_mirror) {
> > +        uint32_t i;
> > +        struct mirror_param data;
> > +        uint64_t mac_addr = 0;
> > +
> > +        memset(&data, 0, sizeof(struct mirror_param));
> > +        data.extra_data_size = 0;
> > +        data.extra_data = NULL;
> > +        data.mirror_type = mirror_port;
> > +        for (i = 0; i < 6; i++) {
> > +            mac_addr <<= 8;
> > +            mac_addr |= flow_addr->ea[6 - i - 1];
> > +        }
> > +        if (mac_addr) {
> > +            data.mirror_type = mirror_flow_mac;
> > +            data.extra_data_size = sizeof(uint64_t);
> > +            data.extra_data = xmalloc(sizeof(uint64_t));
> > +            memcpy(data.extra_data, &mac_addr, sizeof(uint64_t));
> > +        }
> > +        data.dst_vlan_id = vlan_id;
> > +        data.n_src_queue = tx_cb?src->n_txq:src->n_rxq;
> > +        data.max_burst_size = NETDEV_MAX_BURST;
> > +
> > +        if (netdev_dpdk_mt_open(mirror_port_id, &data)) {
> > +            VLOG_ERR("Fail to initialize mirror tunnel port %d\n",
> > +                mirror_port_id);
> > +            return -1;
> > +        }
> > +
> > +        VLOG_INFO("register %s device with %s mirror-offload with"
> > +            "src-port:%d (%s) and output-port:%d (%s) vlan-id=%d flow-mac="
> > +            "0x%" PRIx64 "\n",
> > +            eth_dev?"ethdev":"vhost",
> > +            tx_cb?"ingress":"egress", src_dev->port_id,
> > +            src->name, mirror_port_id, mirror_tunnel_addr, vlan_id,
> > +            (uint64_t)__builtin_bswap64(mac_addr));
> > +
> > +        if (eth_dev) {
> > +            status = netdev_eth_register_mirror(src_dev->port_id, &data,
> > +                tx_cb);
> > +        } else {
> > +            status = netdev_vhost_register_mirror(src_dev, &data, tx_cb);
> > +        }
> > +    } else {
> > +        VLOG_INFO("unregister %s device with %s mirror-offload with"
> > +            " src-port:%d(%s)\n",
> > +            eth_dev?"ethdev":"vhost",
> > +            tx_cb?"ingress":"egress", src_dev->port_id,
> > +            src->name);
> > +
> > +        if (eth_dev) {
> > +            status = netdev_eth_unregister_mirror(src_dev->port_id, tx_cb);
> > +        } else {
> > +            status = netdev_vhost_unregister_mirror(src_dev, tx_cb);
> > +        }
> > +
> > +        netdev_dpdk_mt_close(mirror_port_id);
> > +    }
> > +
> > +    return status;
> > +}
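
On the flow-mac handling above: the packing loop puts ea[0] in the least
significant byte of the 64-bit value, so byte-swapping the full 64 bits for
the log line would leave the address in the upper six bytes with two zero
bytes trailing it. The sketch below reproduces the packing and shows one way
to get the conventional reading order for display. mac_pack(), swap64() and
mac_display() are hypothetical helper names used only for illustration, and
swap64() is a portable stand-in for __builtin_bswap64().

```c
#include <stdint.h>

/* Pack six MAC bytes so that ea[0] lands in the least significant byte,
 * following the same loop as the patch. */
uint64_t
mac_pack(const uint8_t ea[6])
{
    uint64_t mac = 0;

    for (int i = 0; i < 6; i++) {
        mac <<= 8;
        mac |= ea[6 - i - 1];
    }
    return mac;
}

/* Portable 64-bit byte swap. */
uint64_t
swap64(uint64_t v)
{
    uint64_t out = 0;

    for (int i = 0; i < 8; i++) {
        out = (out << 8) | (v & 0xff);
        v >>= 8;
    }
    return out;
}

/* Value in conventional reading order: ea[0] is the most significant of
 * the 48 address bits, with no padding bytes below the address. */
uint64_t
mac_display(uint64_t packed)
{
    return swap64(packed) >> 16;
}
```

For 00:11:22:33:44:55 the packed value is 0x554433221100, the full swap gives
0x0011223344550000, and the shifted form gives 0x001122334455.
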
> > +
> >  #define NETDEV_DPDK_CLASS_COMMON                            \
> >      .is_pmd = true,                                         \
> >      .alloc = netdev_dpdk_alloc,                             \
> > @@ -5340,6 +5734,7 @@ static const struct netdev_class dpdk_class = {
> >      .construct = netdev_dpdk_construct,
> >      .set_config = netdev_dpdk_set_config,
> >      .send = netdev_dpdk_eth_send,
> > +    .mirror_offload = netdev_dpdk_mirror_offload,
> >  };
> >
> >  static const struct netdev_class dpdk_vhost_class = {
> > @@ -5355,6 +5750,7 @@ static const struct netdev_class dpdk_vhost_class = {
> >      .reconfigure = netdev_dpdk_vhost_reconfigure,
> >      .rxq_recv = netdev_dpdk_vhost_rxq_recv,
> >      .rxq_enabled = netdev_dpdk_vhost_rxq_enabled,
> > +    .mirror_offload = netdev_dpdk_mirror_offload,
> >  };
> >
> >  static const struct netdev_class dpdk_vhost_client_class = {
> > @@ -5371,6 +5767,7 @@ static const struct netdev_class dpdk_vhost_client_class = {
> >      .reconfigure = netdev_dpdk_vhost_client_reconfigure,
> >      .rxq_recv = netdev_dpdk_vhost_rxq_recv,
> >      .rxq_enabled = netdev_dpdk_vhost_rxq_enabled,
> > +    .mirror_offload = netdev_dpdk_mirror_offload,
> >  };
> >
> >  void
> > diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h
> > index 73dce2fca..dab278dcd 100644
> > --- a/lib/netdev-provider.h
> > +++ b/lib/netdev-provider.h
> > @@ -834,6 +834,22 @@ struct netdev_class {
> >      /* Get a block_id from the netdev.
> >       * Returns the block_id or 0 if none exists for netdev. */
> >      uint32_t (*get_block_id)(struct netdev *);
> > +
> > +    /* Configure a mirror offload setting on a netdev.
> > +     * 'src': netdev traffic to be mirrored
> > +     * 'flow_addr': the destination mac address is of source traffic for
> > +     *  inspection.
> > +     * 'dst': netdev where mirror traffic is transmitted.
> > +     * 'vlan_id': vlan to be added to the mirrored packets.
> > +     * 'mt_pci_addr': mirror tunnel pcie address.
> > +     * 'add_mirror': true: configure a mirror traffic; false: remove mirror
> > +     * 'ingress': true: mirror 'src' netdev Rx traffic; false: mirror
> > +     *  'src' netdev Tx traffic.
> > +     */
> > +    int (*mirror_offload)(struct netdev *src, struct eth_addr *flow_addr,
> > +                          uint16_t vlan_id, char *mt_pci_addr,
> > +                          bool add_mirror, bool ingress);
> > +
> >  };
> >
> >  int netdev_register_provider(const struct netdev_class *);
> > diff --git a/lib/netdev.c b/lib/netdev.c
> > index 91e91955c..464c2f8fe 100644
> > --- a/lib/netdev.c
> > +++ b/lib/netdev.c
> > @@ -69,6 +69,8 @@ COVERAGE_DEFINE(netdev_get_stats);
> >  COVERAGE_DEFINE(netdev_send_prepare_drops);
> >  COVERAGE_DEFINE(netdev_push_header_drops);
> >
> > +#define MIRROR_DB_INIT_SIZE 8
> > +
> >  struct netdev_saved_flags {
> >      struct netdev *netdev;
> >      struct ovs_list node;           /* In struct netdev's saved_flags_list. */
> > @@ -2297,3 +2299,387 @@ netdev_free_custom_stats_counters(struct netdev_custom_stats *custom_stats)
> >          }
> >      }
> >  }
> > +
> > +
> > +struct netdev_mirror_offload_item {
> > +    struct mirror_offload_info info;
> > +
> > +    struct ovs_list node;
> > +};
> > +
> > +struct netdev_mirror_offload {
> > +    struct ovs_mutex mutex;
> > +    struct ovs_list list;
> > +    pthread_cond_t cond;
> > +};
> > +
> > +static struct netdev_mirror_offload netdev_mirror_offload = {
> > +    .mutex = OVS_MUTEX_INITIALIZER,
> > +    .list  = OVS_LIST_INITIALIZER(&netdev_mirror_offload.list),
> > +};
> > +
> > +static struct ovsthread_once offload_thread_once
> > +    = OVSTHREAD_ONCE_INITIALIZER;
> > +
> > +static void *netdev_mirror_offload_main(void *data);
> > +
> > +/*
> > + * Re-size mirror_db when it's out of space.
> > + * Always double the buffer when it's needed
> > + */
> > +static int
> > +netdev_mirror_db_resize(struct netdev_mirror_offload_item ***old_db,
> > +    int *old_db_size)
> > +{
> > +    struct netdev_mirror_offload_item **new_db;
> > +    int cur_size = *old_db_size;
> > +    int new_size;
> > +
> > +    if (!cur_size) {
> > +        new_size = MIRROR_DB_INIT_SIZE;
> > +    } else {
> > +        new_size = 2 * cur_size;
> > +    }
> > +
> > +    new_db = xzalloc(sizeof(struct netdev_mirror_offload_item *) * new_size);
> > +
> > +    if (!new_db) {
> > +        VLOG_ERR("Out of memory!!!");
> > +        return -1;
> > +    }
> > +    memset(new_db, 0, sizeof(struct netdev_mirror_offload_item *) * new_size);
> > +
> > +    if (cur_size) {
> > +        int i;
> > +
> > +        for (i = 0; i < cur_size; i++) {
> > +            new_db[i] = (*old_db)[i];
> > +        }
> > +        free(*old_db);
> > +    }
> > +
> > +    *old_db = new_db;
> > +    *old_db_size = new_size;
> > +
> > +    return 0;
> > +}
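
A note on the resize routine above: as far as I know the OVS x*alloc()
helpers abort on allocation failure rather than returning NULL, and xzalloc()
already returns zeroed memory, so the NULL check and the memset() look
redundant. The doubling itself can be written more compactly. The sketch
below uses plain realloc() so that it stands alone; db_grow() and
DB_INIT_SIZE are hypothetical names, and inside OVS a helper such as
x2nrealloc() may be a better fit.

```c
#include <stdlib.h>
#include <string.h>

#define DB_INIT_SIZE 8   /* same value as MIRROR_DB_INIT_SIZE in the patch */

/* Grow the pointer table at '*db' from '*size' slots to DB_INIT_SIZE slots
 * (when empty) or to twice its current size, zeroing only the new slots.
 * Returns 0 on success, or -1 on allocation failure, in which case '*db'
 * and '*size' are left unchanged. */
int
db_grow(void ***db, int *size)
{
    int new_size = *size ? 2 * *size : DB_INIT_SIZE;
    void **new_db = realloc(*db, sizeof **db * new_size);

    if (!new_db) {
        return -1;
    }
    /* Existing entries are preserved by realloc(); clear the added tail. */
    memset(new_db + *size, 0, sizeof *new_db * (new_size - *size));
    *db = new_db;
    *size = new_size;
    return 0;
}
```

This keeps the "double when full" policy described in the comment while
avoiding the manual element-by-element copy.
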
> > +
> > +static void
> > +netdev_free_mirror_offload(struct netdev_mirror_offload_item *offload)
> > +{
> > +    if (!offload) {
> > +        return;
> > +    }
> > +
> > +    if (offload->info.src) {
> > +        free(offload->info.src);
> > +    }
> > +    if (offload->info.dst) {
> > +        free(offload->info.dst);
> > +    }
> > +    if (offload->info.flow_dst_mac) {
> > +        free(offload->info.flow_dst_mac);
> > +    }
> > +    if (offload->info.flow_src_mac) {
> > +        free(offload->info.flow_src_mac);
> > +    }
> > +    if (offload->info.output_src_tags) {
> > +        free(offload->info.output_src_tags);
> > +    }
> > +    if (offload->info.output_dst_tags) {
> > +        free(offload->info.output_dst_tags);
> > +    }
> > +    if (offload->info.name) {
> > +        free(offload->info.name);
> > +    }
> > +    if (offload->info.mirror_tunnel_addr) {
> > +        free(offload->info.mirror_tunnel_addr);
> > +    }
> > +
> > +    free(offload);
> > +}
> > +
> > +static struct
> > +netdev_mirror_offload_item *
> > +netdev_alloc_mirror_offload(struct mirror_offload_info *info)
> > +{
> > +    struct netdev_mirror_offload_item *offload;
> > +    int i;
> > +
> > +    offload = xzalloc(sizeof(*offload));
> > +    memcpy(&offload->info, info, sizeof(struct mirror_offload_info));
> > +
> > +    if (info->name) {
> > +        offload->info.name = xzalloc(strlen(info->name) + 1);
> > +        if (offload->info.name) {
> > +            ovs_strzcpy(offload->info.name, info->name, strlen(info->name));
> > +        }
> > +    }
> > +
> > +    if (info->mirror_tunnel_addr) {
> > +        offload->info.mirror_tunnel_addr =
> > +            xzalloc(strlen(info->mirror_tunnel_addr) + 1);
> > +        if (offload->info.mirror_tunnel_addr) {
> > +            ovs_strzcpy(offload->info.mirror_tunnel_addr,
> > +                        info->mirror_tunnel_addr,
> > +                        strlen(info->mirror_tunnel_addr));
> > +        }
> > +    }
> > +
> > +    /* only add_mirror request include valid configuration */
> > +    if (info->n_src_port) {
> > +        offload->info.src = xzalloc(sizeof(struct netdev *)*info->n_src_port);
> > +        offload->info.flow_dst_mac = xzalloc(sizeof(struct eth_addr)*
> > +            info->n_src_port);
> > +        offload->info.output_src_tags = xzalloc(sizeof(uint16_t)*
> > +            info->n_src_port);
> > +        if (!offload->info.src || !offload->info.flow_dst_mac ||
> > +            !offload->info.output_src_tags) {
> > +            VLOG_ERR("Out of memory!!!");
> > +            netdev_free_mirror_offload(offload);
> > +            return NULL;
> > +        }
> > +
> > +        for (i = 0; i < info->n_src_port; i++) {
> > +            offload->info.src[i] = info->src[i];
> > +            offload->info.output_src_tags[i] = info->output_src_tags[i];
> > +            memcpy(&offload->info.flow_dst_mac[i], &info->flow_dst_mac[i],
> > +                sizeof(struct eth_addr));
> > +        }
> > +    }
> > +
> > +    if (info->n_dst_port) {
> > +        offload->info.dst = xzalloc(sizeof(struct netdev *)*info->n_dst_port);
> > +        offload->info.flow_src_mac = xzalloc(sizeof(struct eth_addr)*
> > +            info->n_dst_port);
> > +        offload->info.output_dst_tags = xzalloc(sizeof(uint16_t)*
> > +            info->n_dst_port);
> > +        if (!offload->info.dst || !offload->info.flow_src_mac ||
> > +            !offload->info.output_dst_tags) {
> > +            VLOG_ERR("Out of memory!!!");
> > +            netdev_free_mirror_offload(offload);
> > +            return NULL;
> > +        }
> > +
> > +        for (i = 0; i < info->n_dst_port; i++) {
> > +            offload->info.dst[i] = info->dst[i];
> > +            offload->info.output_dst_tags[i] = info->output_dst_tags[i];
> > +            memcpy(&offload->info.flow_src_mac[i], &info->flow_src_mac[i],
> > +                sizeof(struct eth_addr));
> > +        }
> > +    }
> > +
> > +    return offload;
> > +}
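
One thing to double-check in the string copies above: if ovs_strzcpy() reads
at most 'size - 1' bytes and always NUL-terminates, as I understand its
comment in lib/util.c to say, then passing strlen(src) as the size would drop
the last character of the name and of the tunnel address. The sketch below
models that behaviour with a hypothetical bounded_copy() stand-in (it is not
the OVS function) and contrasts it with a plain full-length duplicate,
dup_string(), also a hypothetical name. Inside OVS, xstrdup() would probably
be the natural replacement.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a size-bounded copy: reads at most 'size - 1'
 * bytes from 'src', always NUL-terminates 'dst' when 'size' is nonzero, and
 * zero-fills the rest of 'dst'. */
void
bounded_copy(char *dst, const char *src, size_t size)
{
    if (size > 0) {
        size_t len = 0;

        while (len < size - 1 && src[len] != '\0') {
            len++;
        }
        memcpy(dst, src, len);
        memset(dst + len, 0, size - len);
    }
}

/* Duplicate 'src' in full. Returns NULL on allocation failure. */
char *
dup_string(const char *src)
{
    size_t size = strlen(src) + 1;   /* include the terminating NUL */
    char *dst = malloc(size);

    if (dst) {
        memcpy(dst, src, size);
    }
    return dst;
}
```

Under that model, copying "mirror0" with a size of strlen("mirror0") yields
"mirror", while a size of strlen("mirror0") + 1 yields the full string.
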
> > +
> > +static void
> > +netdev_append_mirror_offload(struct netdev_mirror_offload_item *offload)
> > +{
> > +    ovs_mutex_lock(&netdev_mirror_offload.mutex);
> > +    ovs_list_push_back(&netdev_mirror_offload.list, &offload->node);
> > +    xpthread_cond_signal(&netdev_mirror_offload.cond);
> > +    ovs_mutex_unlock(&netdev_mirror_offload.mutex);
> > +}
> > +
> > +void
> > +netdev_mirror_offload_put(struct mirror_offload_info *info)
> > +{
> > +    struct netdev_mirror_offload_item *offload;
> > +    /* only support tunnel port for traffic mirroring */
> > +    if (info->add_mirror && !info->mirror_tunnel_addr) {
> > +        return;
> > +    }
> > +
> > +    if (ovsthread_once_start(&offload_thread_once)) {
> > +        xpthread_cond_init(&netdev_mirror_offload.cond, NULL);
> > +        ovs_thread_create("netdev_mirror_offload",
> > +                          netdev_mirror_offload_main, NULL);
> > +        ovsthread_once_done(&offload_thread_once);
> > +    }
> > +
> > +    offload = netdev_alloc_mirror_offload(info);
> > +    netdev_append_mirror_offload(offload);
> > +}
> > +
> > +static int
> > +netdev_mirror_offload_configue(struct mirror_offload_info *info,
> > +    bool add_mirror)
> > +{
> > +    int un_support_count = 0;
> > +    int ret;
> > +
> > +    if (info->n_src_port) {
> > +        for (int i = 0; i < info->n_src_port; i++) {
> > +            const struct netdev_class *class =
> > +                info->src[i]->netdev_class;
> > +            if (!class) {
> > +                return -1;
> > +            }
> > +            if (class->mirror_offload) {
> > +                ret = class->mirror_offload(
> > +                    info->src[i],
> > +                    &info->flow_dst_mac[i],
> > +                    info->output_src_tags[i],
> > +                    info->mirror_tunnel_addr,
> > +                    add_mirror, false);
> > +                if (ret) {
> > +                    VLOG_ERR("Fail to %s mirror-offload"
> > +                        " configuration %s\n",
> > +                        add_mirror ? "add" : "remove",
> > +                        info->name);
> > +                    return ret;
> > +                }
> > +            } else {
> > +                un_support_count++;
> > +            }
> > +        }
> > +    }
> > +
> > +    if (info->n_dst_port) {
> > +        for (int i = 0; i < info->n_dst_port; i++) {
> > +            const struct netdev_class *class =
> > +                info->dst[i]->netdev_class;
> > +            if (!class) {
> > +                return -1;
> > +            }
> > +            if (class->mirror_offload) {
> > +                ret = class->mirror_offload(
> > +                    info->dst[i],
> > +                    &info->flow_src_mac[i],
> > +                    info->output_dst_tags[i],
> > +                    info->mirror_tunnel_addr,
> > +                    add_mirror, true);
> > +                if (ret) {
> > +                    VLOG_ERR("Fail to %s mirror-offload"
> > +                        " configuration %s\n",
> > +                        add_mirror ? "add" : "remove",
> > +                        info->name);
> > +                    return ret;
> > +                }
> > +            } else {
> > +                un_support_count++;
> > +            }
> > +        }
> > +    }
> > +
> > +    return un_support_count;
> > +}
> > +
> > +static void *
> > +netdev_mirror_offload_main(void *data OVS_UNUSED)
> > +{
> > +    struct netdev_mirror_offload_item *offload;
> > +    struct mirror_offload_info *info;
> > +    struct ovs_list *list;
> > +    struct netdev_mirror_offload_item **offload_db = NULL;
> > +    int offload_used_count = 0;
> > +    int offload_db_size = 0;
> > +    int ret, i, ind;
> > +
> > +    /* continue polling to check if there is an outstanding request */
> > +    for (;;) {
> > +        ovs_mutex_lock(&netdev_mirror_offload.mutex);
> > +        if (ovs_list_is_empty(&netdev_mirror_offload.list)) {
> > +            ovsrcu_quiesce_start();
> > +            ovs_mutex_cond_wait(&netdev_mirror_offload.cond,
> > +                                &netdev_mirror_offload.mutex);
> > +            ovsrcu_quiesce_end();
> > +        }
> > +        list = ovs_list_pop_front(&netdev_mirror_offload.list);
> > +        offload = CONTAINER_OF(list, struct netdev_mirror_offload_item,
> > +            node);
> > +        ovs_mutex_unlock(&netdev_mirror_offload.mutex);
> > +
> > +        if (!offload_db_size &&
> > +            netdev_mirror_db_resize(&offload_db, &offload_db_size)){
> > +            return NULL;
> > +        }
> > +
> > +        ind = offload_db_size;
> > +        for (i = 0; i < offload_db_size; i++) {
> > +            if (offload_db[i] &&
> > +                !strncmp(offload_db[i]->info.name, offload->info.name,
> > +                strlen(offload->info.name) + 1)) {
> > +                ind = i;
> > +                break;
> > +            }
> > +        }
> > +
> > +        if (!offload->info.add_mirror) {
> > +            /* remove mirror offload setup */
> > +            if (ind == offload_db_size) {
> > +                VLOG_WARN("Mirror offload remove configuration, %s, "
> > +                    "not found; clear mirror offload operation"
> > +                    " aborted\n", offload->info.name);
> > +                continue;
> > +            }
> > +        } else {
> > +            /* add mirror offload */
> > +            if (ind < offload_db_size) {
> > +                netdev_free_mirror_offload(offload);
> > +                VLOG_WARN("Attempt adding an existing mirror-offload "
> > +                    "configuration; request aborted\n");
> > +                continue;
> > +            }
> > +
> > +            if (offload_used_count == offload_db_size &&
> > +                netdev_mirror_db_resize(&offload_db, &offload_db_size)) {
> > +                return NULL;
> > +            }
> > +        }
> > +
> > +        info = offload->info.add_mirror ? &offload->info :
> > +            &offload_db[ind]->info;
> > +        ret = netdev_mirror_offload_configue(info,
> > +            offload->info.add_mirror);
> > +
> > +        if (ret) {
> > +            VLOG_ERR("%s mirror configuration fails due to %s\n",
> > +                offload->info.add_mirror ? "Add" : "Remove",
> > +                ret > 0 ? "unsupported source traffic type" :
> > +                "device is not ready");
> > +            netdev_free_mirror_offload(offload);
> > +            continue;
> > +        } else {
> > +            VLOG_INFO("Succeed %s mirror-offload configuration: %s",
> > +                offload->info.add_mirror ? "adding" : "removing",
> > +                offload->info.name);
> > +        }
> > +
> > +        if (offload->info.add_mirror) {
> > +            for (i = 0; i < offload_db_size; i++) {
> > +                if (offload_db[i] == NULL) {
> > +                    offload_db[i] = offload;
> > +                    offload_used_count++;
> > +                    break;
> > +                }
> > +            }
> > +        } else {
> > +            /* remove the prior "add" request */
> > +            netdev_free_mirror_offload(offload_db[ind]);
> > +            offload_db[ind] = NULL;
> > +
> > +            /* remove the current("remove") request */
> > +            netdev_free_mirror_offload(offload);
> > +            offload_used_count--;
> > +        }
> > +
> > +        /* free db when the used count drop to 0 */
> > +        if (!offload_used_count) {
> > +            free(offload_db);
> > +            offload_db = NULL;
> > +            offload_db_size = 0;
> > +        }
> > +    }
> > +
> > +    /* clean up memory */
> > +    for (i = 0; i < offload_db_size; i++) {
> > +        if (offload_db[i]) {
> > +            netdev_free_mirror_offload(offload_db[i]);
> > +        }
> > +    }
> > +    if (offload_db) {
> > +        free(offload_db);
> > +    }
> > +
> > +    return NULL;
> > +}
> > diff --git a/lib/netdev.h b/lib/netdev.h
> > index b705a9e56..cce042fc7 100644
> > --- a/lib/netdev.h
> > +++ b/lib/netdev.h
> > @@ -201,6 +201,22 @@ int netdev_send(struct netdev *, int qid, struct dp_packet_batch *,
> >                  bool concurrent_txq);
> >  void netdev_send_wait(struct netdev *, int qid);
> >
> > +/* Hardware assisted mirror offloading*/
> > +struct mirror_offload_info {
> > +    struct netdev **src;
> > +    struct netdev **dst;
> > +    int n_src_port;
> > +    int n_dst_port;
> > +    struct eth_addr *flow_src_mac;
> > +    struct eth_addr *flow_dst_mac;
> > +    uint16_t *output_src_tags;
> > +    uint16_t *output_dst_tags;
> > +    bool add_mirror;
> > +    char *mirror_tunnel_addr;
> > +    char *name;
> > +};
> > +void netdev_mirror_offload_put(struct mirror_offload_info *);
> > +
> >  /* native tunnel APIs */
> >  /* Structure to pass parameters required to build a tunnel header. */
> >  struct netdev_tnl_build_header_params {
> > diff --git a/tests/ovs-vsctl.at b/tests/ovs-vsctl.at
> > index dccb11741..ff6e9e625 100644
> > --- a/tests/ovs-vsctl.at
> > +++ b/tests/ovs-vsctl.at
> > @@ -1364,7 +1364,9 @@ _uuid               : <1>
> >  name                : eth1
> >  _uuid               : <2>
> >  name                : mymirror
> > +output_dst_vlan     : []
> >  output_port         : <1>
> > +output_src_vlan     : []
> >  output_vlan         : []
> >  select_all          : false
> >  select_dst_port     : [<0>]
> > diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
> > index 5ed7e8234..7b7603513 100644
> > --- a/vswitchd/bridge.c
> > +++ b/vswitchd/bridge.c
> > @@ -38,6 +38,7 @@
> >  #include "mac-learning.h"
> >  #include "mcast-snooping.h"
> >  #include "netdev.h"
> > +#include "netdev-provider.h"
> >  #include "netdev-offload.h"
> >  #include "nx-match.h"
> >  #include "ofproto/bond.h"
> > @@ -330,6 +331,9 @@ static void mirror_destroy(struct mirror *);
> >  static bool mirror_configure(struct mirror *);
> >  static void mirror_refresh_stats(struct mirror *);
> >
> > +static void mirror_offload_destroy(struct mirror *);
> > +static bool mirror_offload_configure(struct mirror *);
> > +
> >  static void iface_configure_lacp(struct iface *,
> >                                   struct lacp_member_settings *);
> >  static bool iface_create(struct bridge *, const struct ovsrec_interface *,
> > @@ -423,6 +427,35 @@ if_notifier_changed(struct if_notifier *notifier OVS_UNUSED)
> >      seq_wait(ifaces_changed, last_ifaces_changed);
> >      return changed;
> >  }
> > +
> > +static struct port *
> > +port_lookup_all(const char *port_name)
> > +{
> > +    struct bridge *br;
> > +    struct port *port = NULL;
> > +    int found = 0;
> > +
> > +    HMAP_FOR_EACH (br, node, &all_bridges) {
> > +        struct port *temp_port = NULL;
> > +        temp_port = port_lookup(br, port_name);
> > +        if (temp_port) {
> > +            if (!port) {
> > +                port = temp_port;
> > +            }
> > +            found++;
> > +        }
> > +    }
> > +
> > +    if (found) {
> > +        if (found > 1) {
> > +            VLOG_INFO("More than one bridge owns port with name:%s\n",
> > +                port_name);
> > +        }
> > +        return port;
> > +    }
> > +    return NULL;
> > +}
> > +
> >
> 
> 
> >  /* Public functions. */
> >
> > @@ -5055,14 +5088,228 @@ mirror_create(struct bridge *br, const struct ovsrec_mirror *cfg)
> >      return m;
> >  }
> >
> > +static struct netdev *get_netdev_from_port(struct mirror *m,
> > +    struct port **port, const char *name)
> > +{
> > +    struct port *temp_port;
> > +    struct iface *iface;
> > +
> > +    *port = NULL;
> > +    temp_port = port_lookup(m->bridge, name);
> > +    if (temp_port) {
> > +        LIST_FOR_EACH (iface, port_elem, &temp_port->ifaces) {
> > +            if (iface) {
> > +                *port = temp_port;
> > +                return iface->netdev;
> > +            }
> > +        }
> > +    }
> > +    /* try different bridges */
> > +    temp_port = port_lookup_all(name);
> > +    if (temp_port) {
> > +        LIST_FOR_EACH (iface, port_elem, &temp_port->ifaces) {
> > +            if (iface) {
> > +                *port = temp_port;
> > +                return iface->netdev;
> > +            }
> > +        }
> > +    }
> > +    return NULL;
> > +}
> > +
> > +static void
> > +release_mirror_offload_info(struct mirror_offload_info *info)
> > +{
> > +    if (info->src) {
> > +        free(info->src);
> > +    }
> > +    if (info->dst) {
> > +        free(info->dst);
> > +    }
> > +    if (info->flow_dst_mac) {
> > +        free(info->flow_dst_mac);
> > +    }
> > +    if (info->flow_src_mac) {
> > +        free(info->flow_src_mac);
> > +    }
> > +    if (info->output_src_tags) {
> > +        free(info->output_src_tags);
> > +    }
> > +    if (info->output_dst_tags) {
> > +        free(info->output_dst_tags);
> > +    }
> > +    if (info->name) {
> > +        free(info->name);
> > +    }
> > +    if (info->mirror_tunnel_addr) {
> > +        free(info->mirror_tunnel_addr);
> > +    }
> > +}
> > +
> > +static int
> > +set_mirror_offload_info(struct mirror *m, struct mirror_offload_info *info)
> > +{
> > +    const struct ovsrec_mirror *cfg = m->cfg;
> > +    struct port *port = NULL;
> > +    int i;
> > +
> > +    if (m->name) {
> > +        info->name = xmalloc(strlen(m->name) + 1);
> > +        ovs_strzcpy(info->name, m->name, strlen(m->name));
> > +    }
> > +
> > +    if (cfg->mirror_tunnel_addr) {
> > +        info->mirror_tunnel_addr = xmalloc(strlen(cfg->mirror_tunnel_addr)
> > +            + 1);
> > +        ovs_strzcpy(info->mirror_tunnel_addr, cfg->mirror_tunnel_addr,
> > +                    strlen(cfg->mirror_tunnel_addr));
> > +    } else {
> > +        VLOG_ERR("mirror-offload configuration fails because"
> > +            " lack of tunnel device\n");
> > +        return -1;
> > +    }
> > +
> > +    /* source port */
> > +    info->n_src_port = cfg->n_select_src_port;
> > +    if (info->n_src_port) {
> > +        info->src = xmalloc(sizeof(struct netdev *)*info->n_src_port);
> > +        info->flow_dst_mac = xmalloc(sizeof(struct eth_addr)*
> > +            info->n_src_port);
> > +        if (info->n_src_port != cfg->n_output_src_vlan) {
> > +            VLOG_ERR("src port count:%d output src vlan count:%lu",
> > +                info->n_src_port, (unsigned long) cfg->n_output_src_vlan);
> > +            return -1;
> > +        }
> > +        info->output_src_tags = xmalloc(sizeof(uint16_t)*info->n_src_port);
> > +    }
> > +
> > +    if (info->n_src_port) {
> > +        /* find netdev instance for each port */
> > +        for (i = 0; i < info->n_src_port; i++) {
> > +            info->src[i] = get_netdev_from_port(m, &port,
> > +                cfg->select_src_port[i]->name);
> > +            if (!info->src[i]) {
> > +                VLOG_ERR("src-port: %s is not a netdev device\n",
> > +                    cfg->select_src_port[i]->name);
> > +                return -1;
> > +            }
> > +        }
> > +        memset(info->flow_dst_mac, 0, sizeof(struct eth_addr)*
> > +            info->n_src_port);
> > +
> > +        /*
> > +         * for source port, flow is separated by
> > +         * different dst mac addr
> > +         */
> > +        if (cfg->n_flow_dst_mac) {
> > +            int dst_count = (info->n_src_port > cfg->n_flow_dst_mac)?
> > +                cfg->n_flow_dst_mac:info->n_src_port;
> > +            for (i = 0; i < dst_count; i++) {
> > +                eth_addr_from_string(cfg->flow_dst_mac[i],
> > +                    &info->flow_dst_mac[i]);
> > +            }
> > +        }
> > +
> > +        if (cfg->n_output_src_vlan) {
> > +            int count = (cfg->n_output_src_vlan > info->n_src_port)?
> > +                info->n_src_port:cfg->n_output_src_vlan;
> > +            for (i = 0; i < count; i++) {
> > +                info->output_src_tags[i] = cfg->output_src_vlan[i] & 0xFFF;
> > +            }
> > +        }
> > +    }
> > +
> > +    /* dst ports */
> > +    info->n_dst_port = cfg->n_select_dst_port;
> > +    if (info->n_dst_port) {
> > +        info->dst = xmalloc(sizeof(struct netdev *)*info->n_dst_port);
> > +        info->flow_src_mac = xmalloc(sizeof(struct eth_addr)*
> > +            info->n_dst_port);
> > +        if (info->n_dst_port != cfg->n_output_dst_vlan) {
> > +            VLOG_ERR("dst port count:%d output dst vlan count:%lu\n",
> > +                info->n_dst_port, (unsigned long) cfg->n_output_dst_vlan);
> > +            return -1;
> > +        }
> > +        info->output_dst_tags = xmalloc(sizeof(uint16_t)*info->n_dst_port);
> > +    }
> > +
> > +    if (info->n_dst_port) {
> > +        for (i = 0; i < info->n_dst_port; i++) {
> > +            info->dst[i] = get_netdev_from_port(m, &port,
> > +                cfg->select_dst_port[i]->name);
> > +            if (!info->dst[i]) {
> > +                VLOG_ERR("dst-port: %s is not a netdev device\n",
> > +                    cfg->select_dst_port[i]->name);
> > +                return -1;
> > +            }
> > +        }
> > +        memset(info->flow_src_mac, 0, sizeof(struct eth_addr)*
> > +            info->n_dst_port);
> > +
> > +        /*
> > +         * for destination port, flow is separated by
> > +         * different src mac addr
> > +         */
> > +        if (cfg->n_flow_src_mac) {
> > +            int src_count = (info->n_dst_port > cfg->n_flow_src_mac)?
> > +                cfg->n_flow_src_mac:info->n_dst_port;
> > +            for (i = 0; i < src_count; i++) {
> > +                eth_addr_from_string(cfg->flow_src_mac[i],
> > +                    &info->flow_src_mac[i]);
> > +            }
> > +        }
> > +
> > +        if (cfg->n_output_dst_vlan) {
> > +            int count = (cfg->n_output_dst_vlan > info->n_dst_port)?
> > +                info->n_dst_port:cfg->n_output_dst_vlan;
> > +            for (i = 0; i < count; i++) {
> > +                info->output_dst_tags[i] = cfg->output_dst_vlan[i] & 0xFFF;
> > +            }
> > +        }
> > +    }
> > +
> > +    VLOG_INFO("success creating mirror-offload(%s): with %d src-port"
> > +        " streams %d dst-port streams to tunnel %s\n",
> > +        cfg->name, info->n_src_port, info->n_dst_port,
> > +        info->mirror_tunnel_addr?info->mirror_tunnel_addr:"none");
> > +    return 0;
> > +}
> > +
> > +static void
> > +mirror_offload_destroy(struct mirror *m)
> > +{
> > +    struct mirror_offload_info info;
> > +
> > +    memset(&info, 0, sizeof(struct mirror_offload_info));
> > +    info.add_mirror = false;
> > +    if (m->name) {
> > +        info.name = xmalloc(strlen(m->name) + 1);
> > +        if (info.name) {
> > +            ovs_strzcpy(info.name, m->name, strlen(m->name));
> > +        }
> > +    }
> > +
> > +    netdev_mirror_offload_put(&info);
> > +    if (info.name) {
> > +        free(info.name);
> > +    }
> > +    if (info.mirror_tunnel_addr) {
> > +        free(info.mirror_tunnel_addr);
> > +    }
> > +}
> > +
> >  static void
> >  mirror_destroy(struct mirror *m)
> >  {
> >      if (m) {
> >          struct bridge *br = m->bridge;
> >
> > -        if (br->ofproto) {
> > -            ofproto_mirror_unregister(br->ofproto, m);
> > +        if (m->cfg && m->cfg->mirror_offload) {
> > +            mirror_offload_destroy(m);
> > +        } else {
> > +            if (br->ofproto) {
> > +                ofproto_mirror_unregister(br->ofproto, m);
> > +            }
> >          }
> >
> >          hmap_remove(&br->mirrors, &m->hmap_node);
> > @@ -5094,12 +5341,32 @@ mirror_collect_ports(struct mirror *m,
> >      *n_out_portsp = n_out_ports;
> >  }
> >
> > +static bool
> > +mirror_offload_configure(struct mirror *m)
> > +{
> > +    struct mirror_offload_info info;
> > +
> > +    memset(&info, 0, sizeof(struct mirror_offload_info));
> > +    info.add_mirror = true;
> > +    if (set_mirror_offload_info(m, &info)) {
> > +        release_mirror_offload_info(&info);
> > +        return false;
> > +    }
> > +
> > +    netdev_mirror_offload_put(&info);
> > +    release_mirror_offload_info(&info);
> > +    return true;
> > +}
> > +
> >  static bool
> >  mirror_configure(struct mirror *m)
> >  {
> >      const struct ovsrec_mirror *cfg = m->cfg;
> >      struct ofproto_mirror_settings s;
> >
> > +    if (cfg->mirror_offload) {
> > +        return mirror_offload_configure(m);
> > +    }
> >      /* Set name. */
> >      if (strcmp(cfg->name, m->name)) {
> >          free(m->name);
> > diff --git a/vswitchd/vswitch.ovsschema b/vswitchd/vswitch.ovsschema
> > index 0666c8c76..4a1a34a1f 100644
> > --- a/vswitchd/vswitch.ovsschema
> > +++ b/vswitchd/vswitch.ovsschema
> > @@ -1,6 +1,6 @@
> >  {"name": "Open_vSwitch",
> > - "version": "8.2.0",
> > - "cksum": "1076640191 26427",
> > + "version": "8.2.1",
> > + "cksum": "4051567316 27206",
> >   "tables": {
> >     "Open_vSwitch": {
> >       "columns": {
> > @@ -418,8 +418,18 @@
> >       "columns": {
> >         "name": {
> >           "type": "string"},
> > +       "mirror_tunnel_addr": {
> > +         "type": "string"},
> >         "select_all": {
> >           "type": "boolean"},
> > +       "mirror_offload": {
> > +         "type": "boolean"},
> > +       "flow_src_mac": {
> > +         "type": {"key": {"type": "string"},
> > +                  "min": 0, "max": "unlimited"}},
> > +       "flow_dst_mac": {
> > +         "type": {"key": {"type": "string"},
> > +                  "min": 0, "max": "unlimited"}},
> >         "select_src_port": {
> >           "type": {"key": {"type": "uuid",
> >                            "refTable": "Port",
> > @@ -440,6 +450,16 @@
> >                            "refTable": "Port",
> >                            "refType": "weak"},
> >                    "min": 0, "max": 1}},
> > +       "output_src_vlan": {
> > +         "type": {"key": {"type": "integer",
> > +                          "minInteger": 0,
> > +                          "maxInteger": 4294967295},
> > +                  "min": 0, "max": 4096}},
> > +       "output_dst_vlan": {
> > +         "type": {"key": {"type": "integer",
> > +                          "minInteger": 0,
> > +                          "maxInteger": 4294967295},
> > +                  "min": 0, "max": 4096}},
> >         "output_vlan": {
> >           "type": {"key": {"type": "integer",
> >                            "minInteger": 1,
> > diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> > index 4597a215d..fd2049a7f 100644
> > --- a/vswitchd/vswitch.xml
> > +++ b/vswitchd/vswitch.xml
> > @@ -4869,11 +4869,35 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
> >          selected VLANs.
> >        </p>
> >
> > +      <column name="mirror_tunnel_addr">
> > +        BDF string of the tunnel device on which mirrored traffic will be
> > +        transmitted.
> > +      </column>
> > +
> >        <column name="select_all">
> >          If true, every packet arriving or departing on any port is
> >          selected for mirroring.
> >        </column>
> >
> > +      <column name="mirror_offload">
> > +        If true, hardware-assisted port mirroring is configured instead
> > +        of the default mirroring.
> > +      </column>
> > +
> > +      <column name="flow_src_mac">
> > +        The source MAC address(es) for per-flow mirroring. MAC addresses
> > +        are separated by ','. This parameter is paired with
> > +        select_dst_port. A '0' MAC address indicates that the requested
> > +        mirror is per-port; otherwise it is per-flow.
> > +      </column>
> > +
> > +      <column name="flow_dst_mac">
> > +        The destination MAC address(es) for per-flow mirroring. MAC
> > +        addresses are separated by ','. This parameter is paired with
> > +        select_src_port. A '0' MAC address indicates that the requested
> > +        mirror is per-port; otherwise it is per-flow.
> > +      </column>
> > +
> >        <column name="select_dst_port">
> >          Ports on which departing packets are selected for mirroring.
> >        </column>
> > @@ -4955,6 +4979,32 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
> >          </p>
> >        </column>
> >
> > +      <column name="output_src_vlan">
> > +        <p>Output VLAN for selected source port packets, if nonempty.</p>
> > +        <p>
> > +          <em>Please note:</em> This is different from
> > +          <ref column="output_vlan"/>. This VLAN is added as an additional
> > +          tag on the mirrored traffic, whether or not it already carries one.
> > +          The receiving end can choose to filter out this additional VLAN.
> > +          This option is provided so the mirrored traffic can keep its
> > +          original VLAN information, and this mirror can be used to filter
> > +          out unwanted traffic such as in <ref column="mirror_offload"/>.
> > +        </p>
> > +      </column>
> > +
> > +      <column name="output_dst_vlan">
> > +        <p>Output VLAN for selected destination port packets, if nonempty.</p>
> > +        <p>
> > +          <em>Please note:</em> This is different from
> > +          <ref column="output_vlan"/>. This VLAN is added as an additional
> > +          tag on the mirrored traffic, whether or not it already carries one.
> > +          The receiving end can choose to filter out this additional VLAN.
> > +          This option is provided so the mirrored traffic can keep its
> > +          original VLAN information, and this mirror can be used to filter
> > +          out unwanted traffic such as in <ref column="mirror_offload"/>.
> > +        </p>
> > +      </column>
> > +
> >        <column name="snaplen">
> >          <p>Maximum per-packet number of bytes to mirror.</p>
> >          <p>A mirrored packet with size larger than <ref column="snaplen"/>
> >
Maxime Coquelin May 19, 2021, 7:55 a.m. UTC | #4
Hi Liang-min,

When replying inline, please do not prefix with ">>" as it is handled as
quoted text. There is no need to prefix.

On 5/18/21 8:00 PM, Wang, Liang-min wrote:
>> -----Original Message-----
>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>> Sent: Tuesday, May 18, 2021 12:15 PM
>> To: Miskell, Timothy <timothy.miskell@intel.com>; dev@openvswitch.org
>> Cc: Wang, Liang-min <liang-min.wang@intel.com>
>> Subject: Re: [PATCH] Extends the existing mirror configuration parameters
>>
>> Hi Timothy, Liang-min,
>>
>> Thanks for rebasing the patch.
>> A list of delta against the first RFC could help the reviewers.
>> I notice one change in the right direction is the conversion to Vhost
>> API datapath instead of Vhost PMD.
>>
>>> In this patch we support both Vhost API and Vhost PMD because OVS supports both
>>> VhostUser and Vdev ports.
>>>
>> Also, I would suggest to have the patch split in several incremental
>> patches to ease the review.
>>
>>> Thank you for the suggestion. We will provide incremental patches in the next submission.
>>>
>> On 5/10/21 6:00 PM, Timothy Miskell wrote:
>>> From: Liang-min Wang <liang-min.wang@intel.com>
>>>
>>> The following parameters are added:
>>>  - mirror-offload: to turn on/off mirror offloading.
>>>  - output-port-name: specify a port, using name string, that is on a different
>>>    bridge
>>>  - output-src-vlan: output port vlan for each select-src-port.
>>>  - output-dst-vlan: output port vlan for each select-dst-port.
>>>  - flow-src-mac: use src mac address of each select-dst-port for the header
>>>    scan.
>>>  - flow-dst-mac: use dst mac address of each select-src-port for the header
>>>    scan.
>>>  - mirror-tunnel-addr: BDF string of the tunnel device.
>>>
>>> ovs-vsctl test change because new mirroring parameters are introduced in this patch
>>
>> It would help to provide examples of usage of these new parameters.
>>
>>> Will add examples in the new patches
>>>
>>> Create a defer procedure call thread to handle all mirror offload requests.
>>> This is a light-weight thread which remains in sleep-state when there is no new request.
>>> This is created between ovs-vsctl and mirror offloading back end
>>>
>>> Implementing DPDK tx-burst (VIRTIO ingress traffic
>>> mirror) and rx-burst (VIRTIO egress traffic mirror) callbacks.
>>> Each callback  functions implement the following tasks:
>>>  1. Enable per-packet VLAN insertion
>>>    - for port mirroring, all packets are enabled per-packet VLAN insertion.
>>>    - for flow mirroring, only packet header matches the required mac address
>>>      are enabled.
>>>  2. Sending the packets to the specified transport port (output-port in
>>>     mirror offload configuration)
>>>    - for port mirroring, all packets are sent to the transport port.
>>>    - for flow mirroring, only matched packets are sent.
>>>  3. Restore each packet attributes (remove DPDK per-packet offload flag)
>>
>> I will for sure have more questions later, but please find a few
>> comments/questions below:
>>
>>> Signed-off-by: Liang-min Wang <liang-min.wang@intel.com>
>>> Tested-by: Timothy Miskell <Timothy.Miskell@intel.com>
>>> Suggested-by: Munish Mehan <mm6021@att.com>
>>> ---
>>>  lib/automake.mk            |   2 +
>>>  lib/netdev-dpdk-mirror.c   | 516 +++++++++++++++++++++++++++++++++++++
>>>  lib/netdev-dpdk-mirror.h   |  83 ++++++
>>>  lib/netdev-dpdk.c          | 397 ++++++++++++++++++++++++++++
>>>  lib/netdev-provider.h      |  16 ++
>>>  lib/netdev.c               | 386 +++++++++++++++++++++++++++
>>>  lib/netdev.h               |  16 ++
>>>  tests/ovs-vsctl.at         |   2 +
>>>  vswitchd/bridge.c          | 271 ++++++++++++++++++-
>>>  vswitchd/vswitch.ovsschema |  24 +-
>>>  vswitchd/vswitch.xml       |  50 ++++
>>>  11 files changed, 1759 insertions(+), 4 deletions(-)
>>>  create mode 100644 lib/netdev-dpdk-mirror.c
>>>  create mode 100644 lib/netdev-dpdk-mirror.h
>>>

...

>>> +
>>> +/* port/flow mirror traffic processors */
>>> +static inline uint16_t
>>> +netdev_custom_mirror_offload_cb(uint16_t qidx, struct rte_mbuf **pkts,
>>> +    uint16_t nb_pkts, void *user_params)
>>> +{
>>> +    struct mirror_param *data = user_params;
>>> +    uint16_t i, dst_qidx, match_count = 0;
>>> +    uint16_t pkt_trans;
>>> +    uint16_t dst_port_id = data->dst_port_id;
>>> +    uint16_t dst_vlan_id = data->dst_vlan_id;
>>> +    struct rte_mbuf **pkt_buf = &data->pkt_buf[qidx * data->max_burst_size];
>>> +
>>> +    if (nb_pkts == 0) {
>>> +        return 0;
>>> +    }
>>> +
>>> +    if (nb_pkts > data->max_burst_size) {
>>> +        VLOG_ERR("Per-flow batch size, %d, exceeds maximum limit\n", nb_pkts);
>>> +        return 0;
>>> +    }
>>> +
>>> +    for (i = 0; i < nb_pkts; i++) {
>>> +        if (data->custom_scan(pkts[i], user_params)) {
>>> +            pkt_buf[match_count] = pkts[i];
>>> +            pkt_buf[match_count]->ol_flags |= PKT_TX_VLAN_PKT;
>>
>> Does it work if the packet already has a VLAN inserted?
>>
>>> Good catch. The design assumes that no VLAN insertion offload is applied to the source traffic.

That is a significant limitation, and I have not seen it documented in
the series. Also, the IEEE paper mentions using QinQ when the packet
already has a VLAN inserted, but that is not in the series. Is there a
reason QinQ is not possible?

IIUC, that means the TAP VM would see the traffic as having no VLAN,
whereas in reality it has one?
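
To make the QinQ question concrete, here is a minimal sketch of pushing
the mirror tag in software. It assumes a plain byte buffer instead of an
rte_mbuf, and mirror_push_vlan() is a hypothetical helper that is not
part of the series. The outer TPID 0x88a8 is the 802.1ad value; whether
the NIC offload path can do the equivalent is exactly what I am asking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ADDRS_LEN 12        /* Destination MAC + source MAC. */
#define VLAN_TAG_LEN 4          /* TPID + TCI. */
#define TPID_8021Q 0x8100
#define TPID_8021AD 0x88a8

/* Hypothetical helper: pushes a mirror VLAN tag onto the Ethernet frame in
 * 'frame' ('len' bytes used, 'cap' bytes available).  If the frame is
 * already 802.1Q tagged, the new tag goes outside the existing one with the
 * 802.1ad TPID, so the original tag is preserved.  Returns the new length,
 * or 0 if the frame is too short or the buffer is too small. */
size_t
mirror_push_vlan(uint8_t *frame, size_t len, size_t cap, uint16_t vid)
{
    uint16_t ethertype, tpid;

    assert(frame != NULL);
    if (len < ETH_ADDRS_LEN + 2 || cap < len + VLAN_TAG_LEN) {
        return 0;
    }

    ethertype = (uint16_t) ((frame[12] << 8) | frame[13]);
    tpid = (ethertype == TPID_8021Q) ? TPID_8021AD : TPID_8021Q;
    vid &= 0xfff;               /* Keep the 12-bit VLAN ID only. */

    /* Make room for the new tag right after the MAC addresses. */
    memmove(frame + ETH_ADDRS_LEN + VLAN_TAG_LEN, frame + ETH_ADDRS_LEN,
            len - ETH_ADDRS_LEN);
    frame[12] = (uint8_t) (tpid >> 8);
    frame[13] = (uint8_t) (tpid & 0xff);
    frame[14] = (uint8_t) (vid >> 8);
    frame[15] = (uint8_t) (vid & 0xff);

    return len + VLAN_TAG_LEN;
}
```

With something along these lines the receiving end could strip the outer
tag and still see the original VLAN of the source traffic.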

>>> +            pkt_buf[match_count]->vlan_tci = dst_vlan_id;
>>> +            rte_mbuf_refcnt_update(pkt_buf[match_count], 1);
>>
>>
>>
>>> +            match_count++;
>>> +        }
>>> +    }
>>> +
>>> +    dst_qidx = (data->n_dst_queue > qidx)?qidx:(data->n_dst_queue -1);
>>
>> Wouldn't it scale better with:
>> dst_qidx = qidx % data->n_dst_queue
>> ?
>>
>>> We tried to avoid using the "%" operator. We could add "unlikely" and use the suggested "%" as an improvement.

I am not sure adding 'unlikely' is really necessary. The cost of the
modulo operation is negligible compared to everything else we do in this
path.
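
To illustrate the scaling difference, here is a small standalone sketch
comparing the two mappings. map_clamp() mirrors the expression in the
patch and map_modulo() is the suggested one; count_load() and the queue
counts in the note below are hypothetical and only there for the
comparison.

```c
#include <assert.h>
#include <stdint.h>

/* Mapping used in the patch: source queues at or beyond 'n_dst_queue' all
 * share the last destination queue. */
uint16_t
map_clamp(uint16_t qidx, uint16_t n_dst_queue)
{
    assert(n_dst_queue > 0);
    return (n_dst_queue > qidx) ? qidx : (uint16_t) (n_dst_queue - 1);
}

/* Suggested mapping: wrap around so the excess source queues are spread
 * over all destination queues. */
uint16_t
map_modulo(uint16_t qidx, uint16_t n_dst_queue)
{
    assert(n_dst_queue > 0);
    return (uint16_t) (qidx % n_dst_queue);
}

/* Hypothetical helper: counts how many of 'n_src' source queues land on
 * each destination queue under 'map'.  'load' must hold 'n_dst' entries. */
void
count_load(uint16_t (*map)(uint16_t, uint16_t), uint16_t n_src,
           uint16_t n_dst, unsigned int *load)
{
    for (uint16_t d = 0; d < n_dst; d++) {
        load[d] = 0;
    }
    for (uint16_t q = 0; q < n_src; q++) {
        load[map(q, n_dst)]++;
    }
}
```

With, say, 8 source queues and 3 destination queues, the clamped mapping
puts 6 source queues on the last destination queue (loads 1, 1, 6), so
they all contend on the same spinlock, while the modulo spreads them as
3, 3, 2.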

>>>
>>> +
>>> +    rte_spinlock_lock(&data->locks[dst_qidx]);
>>> +    pkt_trans = rte_eth_tx_burst(dst_port_id, dst_qidx, pkt_buf, match_count);
>>> +    rte_spinlock_unlock(&data->locks[dst_qidx]);
>>> +
>>> +    for (i = 0; i < match_count; i++) {
>>> +        pkt_buf[i]->ol_flags &= ~PKT_TX_VLAN_PKT;
>>> +    }
>>
>> In order to further reduce the performance impact of mirroring, have you
>> envisaged to offload it to dedicated PMD threads?
>>
>>> The mirror-tunnel design is a compromise between hardware TAP and software TAP.
>>> The tunnel itself is designed to have very little impact on the core processing the source traffic. In
>>> our benchmark of SR-IOV L2 forwarding with mirroring, we only observed a 10-20% impact on 64-byte packets, and
>>> we did not observe any impact when running traffic with packet sizes of 128 bytes or above.

The cover letter mentions 20-30% for 64B packets, and the IEEE paper
seems to indicate a significant impact up to 512B packets (~25%?).

Maybe the workload is different in these tests, which could explain why
you don't see an impact starting at 128B?

>>>
>>> +
>>> +    while (unlikely (pkt_trans < match_count)) {
>>> +        rte_pktmbuf_free(pkt_buf[pkt_trans]);
>>> +        pkt_trans++;
>>> +    }
>>> +
>>> +    return nb_pkts;
>>> +}
>>> +

...

>>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
>>> index 4597a215d..fd2049a7f 100644
>>> --- a/vswitchd/vswitch.xml
>>> +++ b/vswitchd/vswitch.xml
>>> @@ -4869,11 +4869,35 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
>>>          selected VLANs.
>>>        </p>
>>>
>>> +      <column name="mirror_tunnel_addr">
>>> +        BDF string of the tunnel device on which mirrored traffic will be
>>> +        transmitted.
>>> +      </column>
>>> +
>>>        <column name="select_all">
>>>          If true, every packet arriving or departing on any port is
>>>          selected for mirroring.
>>>        </column>
>>>
>>> +      <column name="mirror_offload">
>>> +        If true, hardware-assisted port mirroring is configured instead
>>> +        of the default mirroring.
>>> +      </column>
>>> +
>>> +      <column name="flow_src_mac">
>>> +        The source MAC address(es) for per-flow mirroring. MAC addresses
>>> +        are separated by ','. This parameter is paired with
>>> +        select_dst_port. A '0' MAC address indicates that the requested
>>> +        mirror is per-port; otherwise it is per-flow.
>>> +      </column>
>>> +
>>> +      <column name="flow_dst_mac">
>>> +        The destination MAC address(es) for per-flow mirroring. MAC
>>> +        addresses are separated by ','. This parameter is paired with
>>> +        select_src_port. A '0' MAC address indicates that the requested
>>> +        mirror is per-port; otherwise it is per-flow.
>>> +      </column>
>>> +
>>>        <column name="select_dst_port">
>>>          Ports on which departing packets are selected for mirroring.
>>>        </column>
>>> @@ -4955,6 +4979,32 @@ ovs-vsctl add-port br0 p0 -- set Interface p0
>> type=patch options:peer=p1 \
>>>          </p>
>>>        </column>
>>>
>>> +      <column name="output_src_vlan">
>>> +        <p>Output VLAN for selected source port packets, if nonempty.</p>
>>> +        <p>
>>> +          <em>Please note:</em> This is different than
>>> +          <ref column="output-vlan"/> This vlan is used to add an additional
>>> +          vlan tag on the mirror traffic, regardless it contains vlan or not.
>>> +          The receive end could choose to filter out this additional vlan.
>>> +          This option is provided so the mirrored traffic could maintain its
>>> +          original vlan informaiton, and this mirror can be used to filter
>>> +          out un-wanted traffic such as in <ref column="mirror_offload"/>.
>>> +        </p>
>>> +      </column>
>>> +
>>> +      <column name="output_dst_vlan">
>>> +        <p>Output VLAN for selected destination port packets, if
>> nonempty.</p>
>>> +        <p>
>>> +          <em>Please note:</em> This is different than
>>> +          <ref column="output-vlan"/> This vlan is used to add an additional
>>> +          vlan tag on the mirror traffic, regardless it contains vlan or not.
>>> +          The receive end could choose to filter out this additional vlan.
>>> +          This option is provided so the mirrored traffic could maintain its
>>> +          original vlan informaiton, and this mirror cab be used to filter
>>> +          out un-wanted traffic such as in <ref column="mirror_offload"/>.
>>> +        </p>
>>> +      </column>
>>> +
>>>        <column name="snaplen">
>>>          <p>Maximum per-packet number of bytes to mirror.</p>
>>>          <p>A mirrored packet with size larger than <ref column="snaplen"/>
>>>
> 

These parameters are DPDK-specific, but nothing mentions it. This would
confuse OVS-Kernel users.

Regards,
Maxime
Wang, Liang-min May 19, 2021, 11:53 a.m. UTC | #5
> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Wednesday, May 19, 2021 3:56 AM
> To: Wang, Liang-min <liang-min.wang@intel.com>; Miskell, Timothy
> <timothy.miskell@intel.com>; dev@openvswitch.org
> Subject: Re: [PATCH] Extends the existing mirror configuration parameters
> 
> Hi Liang-min,
> 
> When replying inline, please do not prefix with ">>" as it is handled as
> quoted text. There is no need to prefix.
> 
> On 5/18/21 8:00 PM, Wang, Liang-min wrote:
> >> -----Original Message-----
> >> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> >> Sent: Tuesday, May 18, 2021 12:15 PM
> >> To: Miskell, Timothy <timothy.miskell@intel.com>; dev@openvswitch.org
> >> Cc: Wang, Liang-min <liang-min.wang@intel.com>
> >> Subject: Re: [PATCH] Extends the existing mirror configuration
> parameters
> >>
> >> Hi Timothy, Liang-min,
> >>
> >> Thanks for rebasing the patch.
> >> A list of delta against the first RFC could help the reviewers.
> >> I notice one change in the right direction is the conversion to Vhost
> >> API datapath instead of Vhost PMD.
> >>
> >>> In this patch we support both Vhost API and Vhost PMD because OVS
> supports both
> >>> VhostUser and Vdev ports.
> >>>
> >> Also, I would suggest to have the patch split in several incremental
> >> patches to ease the review.
> >>
> >>> Thank you for suggestion. We will provide incremental patches on next
> submission
> >>>
> >> On 5/10/21 6:00 PM, Timothy Miskell wrote:
> >>> From: Liang-min Wang <liang-min.wang@intel.com>
> >>>
> >>> The following parameters are added:
> >>>  - mirror-offload: to turn on/off mirror offloading.
> >>>  - output-port-name: specify a port, using name string, that is on a
> different
> >>>    bridge
> >>>  - output-src-vlan: output port vlan for each select-src-port.
> >>>  - output-dst-vlan: output port vlan for each select-dst-port.
> >>>  - flow-src-mac: use src mac address of each select-dst-port for the
> header
> >>>    scan.
> >>>  - flow-dst-mac: use dst mac address of each select-src-port for the
> header
> >>>    scan.
> >>>  - mirror-tunnel-addr: BDF string of the tunnel device.
> >>>
> >>> ovs-vsctl test change because new mirroring parameters are introduced
> in
> >> this patch
> >>
> >> It would help to provide examples of usage of these new parameters.
> >>
> >>> Will add examples in the new patches
> >>>
> >>> Create a defer procedure call thread to handle all mirror offload
> requests.
> >>> This is a light-weight thread which remains in sleep-state when there is
> no
> >> new request.
> >>> This is created between ovs-vsctl and mirror offloading back end
> >>>
> >>> Implementing DPDK tx-burst (VIRTIO ingress traffic
> >>> mirror) and rx-burst (VIRTIO egress traffic mirror) callbacks.
> >>> Each callback  functions implement the following tasks:
> >>>  1. Enable per-packet VLAN insertion
> >>>    - for port mirroring, all packets are enabled per-packet VLAN insertion.
> >>>    - for flow mirroring, only packet header matches the required mac
> >> address
> >>>      are enabled.
> >>>  2. Sending the packets to the specified transport port (output-port in
> >>>     mirror offload configuration)
> >>>    - for port mirroring, all packets are sent to the transport port.
> >>>    - for flow mirroring, only matched packets are sent.
> >>>  3. Restore each packet attributes (remove DPDK per-packet offload
> flag)
> >>
> >> I will for sure have more questions later, but please find a few
> >> comments/questions below:
> >>
> >>> Signed-off-by: Liang-min Wang <liang-min.wang@intel.com>
> >>> Tested-by: Timothy Miskell <Timothy.Miskell@intel.com>
> >>> Suggested-by: Munish Mehan <mm6021@att.com>
> >>> ---
> >>>  lib/automake.mk            |   2 +
> >>>  lib/netdev-dpdk-mirror.c   | 516
> >> +++++++++++++++++++++++++++++++++++++
> >>>  lib/netdev-dpdk-mirror.h   |  83 ++++++
> >>>  lib/netdev-dpdk.c          | 397 ++++++++++++++++++++++++++++
> >>>  lib/netdev-provider.h      |  16 ++
> >>>  lib/netdev.c               | 386 +++++++++++++++++++++++++++
> >>>  lib/netdev.h               |  16 ++
> >>>  tests/ovs-vsctl.at         |   2 +
> >>>  vswitchd/bridge.c          | 271 ++++++++++++++++++-
> >>>  vswitchd/vswitch.ovsschema |  24 +-
> >>>  vswitchd/vswitch.xml       |  50 ++++
> >>>  11 files changed, 1759 insertions(+), 4 deletions(-)
> >>>  create mode 100644 lib/netdev-dpdk-mirror.c
> >>>  create mode 100644 lib/netdev-dpdk-mirror.h
> >>>
> 
> ...
> 
> >>> +
> >>> +/* port/flow mirror traffic processors */
> >>> +static inline uint16_t
> >>> +netdev_custom_mirror_offload_cb(uint16_t qidx, struct rte_mbuf
> >> **pkts,
> >>> +    uint16_t nb_pkts, void *user_params)
> >>> +{
> >>> +    struct mirror_param *data = user_params;
> >>> +    uint16_t i, dst_qidx, match_count = 0;
> >>> +    uint16_t pkt_trans;
> >>> +    uint16_t dst_port_id = data->dst_port_id;
> >>> +    uint16_t dst_vlan_id = data->dst_vlan_id;
> >>> +    struct rte_mbuf **pkt_buf = &data->pkt_buf[qidx * data-
> >>> max_burst_size];
> >>> +
> >>> +    if (nb_pkts == 0) {
> >>> +        return 0;
> >>> +    }
> >>> +
> >>> +    if (nb_pkts > data->max_burst_size) {
> >>> +        VLOG_ERR("Per-flow batch size, %d, exceeds maximum limit\n",
> >> nb_pkts);
> >>> +        return 0;
> >>> +    }
> >>> +
> >>> +    for (i = 0; i < nb_pkts; i++) {
> >>> +        if (data->custom_scan(pkts[i], user_params)) {
> >>> +            pkt_buf[match_count] = pkts[i];
> >>> +            pkt_buf[match_count]->ol_flags |= PKT_TX_VLAN_PKT;
> >>
> >> Does it work if the packet already has a VLAN inserted?
> >>
> >>> Good catch. The design is based upon no VLAN insertion offloading is
> applied on source traffic.
> 
> That is a significant limitation, I haven not seen it documented in the
> series. Also, the IEEE paper mentions using QinQ when the packet has
> already a VLAN inserted, but it is not in the series. Is there a reason
> QinQ is not possible?
> 
> IIUC, that means the TAP VM would see traffic having no VLAN, whereas it
> has in reality?
> 

It does support 802.1Q (single VLAN); I misunderstood your last email.
This design assumes the PKT_TX_VLAN_PKT flag is not set on the source mbuf.
That assumption rests on the VIRTIO spec not supporting per-packet VLAN
insertion offloading.

> >>> +            pkt_buf[match_count]->vlan_tci = dst_vlan_id;
> >>> +            rte_mbuf_refcnt_update(pkt_buf[match_count], 1);
> >>
> >>
> >>
> >>> +            match_count++;
> >>> +        }
> >>> +    }
> >>> +
> >>> +    dst_qidx = (data->n_dst_queue > qidx)?qidx:(data->n_dst_queue -
> 1);
> >>
> >> Wouldn't it scale better with:
> >> dst_qidx = qidx % data->n_dst_queue
> >> ?
> >>
> >>> We tried to avoid using "%" operator. We could add "unlikely" and the
> suggested "%" to make improvement
> 
> Not sure adding 'unlikely' is really necessary. The cost of the modulo
> operation is nothing compared to all we do in this path.
> 
> >>>
> >>> +
> >>> +    rte_spinlock_lock(&data->locks[dst_qidx]);
> >>> +    pkt_trans = rte_eth_tx_burst(dst_port_id, dst_qidx, pkt_buf,
> >> match_count);
> >>> +    rte_spinlock_unlock(&data->locks[dst_qidx]);
> >>> +
> >>> +    for (i = 0; i < match_count; i++) {
> >>> +        pkt_buf[i]->ol_flags &= ~PKT_TX_VLAN_PKT;
> >>> +    }
> >>
> >> In order to further reduce the performance impact of mirroring, have you
> >> envisaged to offload it to dedicated PMD threads?
> >>
> >>> The mirror-tunnel design is a comprised approach between hardware
> TAP and software TAP.
> >>> The tunnel itself is designed to have very little impact on source traffic
> processing core. From
> >>> our benchmark on SR-IOV, L2 forwarding, mirroring, we only observed
> 10-20% impact on 64-byte packets, and
> >>> we did not observe impact when running traffic with packet size with
> 128-byte or above.
> 
> The cover letter mentions 20-30% for 64B packets, and the IEEE paper
> seems to indicate a significant impact up to 512B packets (~25%?).
> 
> Maybe the workload is different in these tests, that could explain why
> you don't see an impact starting 128B?
> 

I believe the 20-30% drop is for VIRTIO port mirroring, not SR-IOV mirroring.
The 512B figure (~25%?) is for default VIRTIO mirroring (the OVS default
implementation), not this design.

> >>>
> >>> +
> >>> +    while (unlikely (pkt_trans < match_count)) {
> >>> +        rte_pktmbuf_free(pkt_buf[pkt_trans]);
> >>> +        pkt_trans++;
> >>> +    }
> >>> +
> >>> +    return nb_pkts;
> >>> +}
> >>> +
> 
> ...
> 
> >>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> >>> index 4597a215d..fd2049a7f 100644
> >>> --- a/vswitchd/vswitch.xml
> >>> +++ b/vswitchd/vswitch.xml
> >>> @@ -4869,11 +4869,35 @@ ovs-vsctl add-port br0 p0 -- set Interface p0
> >> type=patch options:peer=p1 \
> >>>          selected VLANs.
> >>>        </p>
> >>>
> >>> +      <column name="mirror_tunnel_addr">
> >>> +        BDF string of the tunnel device on which mirrored traffic will be
> >>> +        transmitted.
> >>> +      </column>
> >>> +
> >>>        <column name="select_all">
> >>>          If true, every packet arriving or departing on any port is
> >>>          selected for mirroring.
> >>>        </column>
> >>>
> >>> +      <column name="mirror_offload">
> >>> +        If true, a hw-assisted port mirroring is configured instead
> >>> +        default mirroring.
> >>> +      </column>
> >>> +
> >>> +      <column name="flow_src_mac">
> >>> +        The source MAC address(es) for per-flow mirroring. Each MAC
> >>> +        address is separate by ','. This parametr is paired with
> >>> +        select_dst_port. A '0' MAC address indicates the requested mirror
> >>> +        is a per-port mirroring, otherwise it's a per-flow mirroring
> >>> +      </column>
> >>> +
> >>> +      <column name="flow_dst_mac">
> >>> +        The destination MAC address(es) for per-flow mirroring. Each MAC
> >>> +        address is separate by ','. This parametr is paired with
> >>> +        select_src_port. A '0' MAC address indicates the requested mirror
> >>> +        is a per-port mirroring, otherwise it's a per-flow mirroring
> >>> +      </column>
> >>> +
> >>>        <column name="select_dst_port">
> >>>          Ports on which departing packets are selected for mirroring.
> >>>        </column>
> >>> @@ -4955,6 +4979,32 @@ ovs-vsctl add-port br0 p0 -- set Interface p0
> >> type=patch options:peer=p1 \
> >>>          </p>
> >>>        </column>
> >>>
> >>> +      <column name="output_src_vlan">
> >>> +        <p>Output VLAN for selected source port packets, if
> nonempty.</p>
> >>> +        <p>
> >>> +          <em>Please note:</em> This is different than
> >>> +          <ref column="output-vlan"/> This vlan is used to add an additional
> >>> +          vlan tag on the mirror traffic, regardless it contains vlan or not.
> >>> +          The receive end could choose to filter out this additional vlan.
> >>> +          This option is provided so the mirrored traffic could maintain its
> >>> +          original vlan informaiton, and this mirror can be used to filter
> >>> +          out un-wanted traffic such as in <ref column="mirror_offload"/>.
> >>> +        </p>
> >>> +      </column>
> >>> +
> >>> +      <column name="output_dst_vlan">
> >>> +        <p>Output VLAN for selected destination port packets, if
> >> nonempty.</p>
> >>> +        <p>
> >>> +          <em>Please note:</em> This is different than
> >>> +          <ref column="output-vlan"/> This vlan is used to add an additional
> >>> +          vlan tag on the mirror traffic, regardless it contains vlan or not.
> >>> +          The receive end could choose to filter out this additional vlan.
> >>> +          This option is provided so the mirrored traffic could maintain its
> >>> +          original vlan informaiton, and this mirror cab be used to filter
> >>> +          out un-wanted traffic such as in <ref column="mirror_offload"/>.
> >>> +        </p>
> >>> +      </column>
> >>> +
> >>>        <column name="snaplen">
> >>>          <p>Maximum per-packet number of bytes to mirror.</p>
> >>>          <p>A mirrored packet with size larger than <ref
> column="snaplen"/>
> >>>
> >
> 
> These parameters are DPDK specific, but nothing mentions it. It would
> confuse the OVS-Kernel users.
> 

The current implementation only supports OVS-DPDK.

> Regards,
> Maxime
Maxime Coquelin May 19, 2021, 12:49 p.m. UTC | #6
On 5/19/21 1:53 PM, Wang, Liang-min wrote:
>> -----Original Message-----
>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>> Sent: Wednesday, May 19, 2021 3:56 AM
>> To: Wang, Liang-min <liang-min.wang@intel.com>; Miskell, Timothy
>> <timothy.miskell@intel.com>; dev@openvswitch.org
>> Subject: Re: [PATCH] Extends the existing mirror configuration parameters
>>
>> Hi Liang-min,
>>
>> When replying inline, please do not prefix with ">>" as it is handled as
>> quoted text. There is no need to prefix.
>>
>> On 5/18/21 8:00 PM, Wang, Liang-min wrote:
>>>> -----Original Message-----
>>>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>> Sent: Tuesday, May 18, 2021 12:15 PM
>>>> To: Miskell, Timothy <timothy.miskell@intel.com>; dev@openvswitch.org
>>>> Cc: Wang, Liang-min <liang-min.wang@intel.com>
>>>> Subject: Re: [PATCH] Extends the existing mirror configuration
>> parameters
>>>>
>>>> Hi Timothy, Liang-min,
>>>>
>>>> Thanks for rebasing the patch.
>>>> A list of delta against the first RFC could help the reviewers.
>>>> I notice one change in the right direction is the conversion to Vhost
>>>> API datapath instead of Vhost PMD.
>>>>
>>>>> In this patch we support both Vhost API and Vhost PMD because OVS
>> supports both
>>>>> VhostUser and Vdev ports.
>>>>>
>>>> Also, I would suggest to have the patch split in several incremental
>>>> patches to ease the review.
>>>>
>>>>> Thank you for suggestion. We will provide incremental patches on next
>> submission
>>>>>
>>>> On 5/10/21 6:00 PM, Timothy Miskell wrote:
>>>>> From: Liang-min Wang <liang-min.wang@intel.com>
>>>>>
>>>>> The following parameters are added:
>>>>>  - mirror-offload: to turn on/off mirror offloading.
>>>>>  - output-port-name: specify a port, using name string, that is on a
>> different
>>>>>    bridge
>>>>>  - output-src-vlan: output port vlan for each select-src-port.
>>>>>  - output-dst-vlan: output port vlan for each select-dst-port.
>>>>>  - flow-src-mac: use src mac address of each select-dst-port for the
>> header
>>>>>    scan.
>>>>>  - flow-dst-mac: use dst mac address of each select-src-port for the
>> header
>>>>>    scan.
>>>>>  - mirror-tunnel-addr: BDF string of the tunnel device.
>>>>>
>>>>> ovs-vsctl test change because new mirroring parameters are introduced
>> in
>>>> this patch
>>>>
>>>> It would help to provide examples of usage of these new parameters.
>>>>
>>>>> Will add examples in the new patches
>>>>>
>>>>> Create a defer procedure call thread to handle all mirror offload
>> requests.
>>>>> This is a light-weight thread which remains in sleep-state when there is
>> no
>>>> new request.
>>>>> This is created between ovs-vsctl and mirror offloading back end
>>>>>
>>>>> Implementing DPDK tx-burst (VIRTIO ingress traffic
>>>>> mirror) and rx-burst (VIRTIO egress traffic mirror) callbacks.
>>>>> Each callback  functions implement the following tasks:
>>>>>  1. Enable per-packet VLAN insertion
>>>>>    - for port mirroring, all packets are enabled per-packet VLAN insertion.
>>>>>    - for flow mirroring, only packet header matches the required mac
>>>> address
>>>>>      are enabled.
>>>>>  2. Sending the packets to the specified transport port (output-port in
>>>>>     mirror offload configuration)
>>>>>    - for port mirroring, all packets are sent to the transport port.
>>>>>    - for flow mirroring, only matched packets are sent.
>>>>>  3. Restore each packet attributes (remove DPDK per-packet offload
>> flag)
>>>>
>>>> I will for sure have more questions later, but please find a few
>>>> comments/questions below:
>>>>
>>>>> Signed-off-by: Liang-min Wang <liang-min.wang@intel.com>
>>>>> Tested-by: Timothy Miskell <Timothy.Miskell@intel.com>
>>>>> Suggested-by: Munish Mehan <mm6021@att.com>
>>>>> ---
>>>>>  lib/automake.mk            |   2 +
>>>>>  lib/netdev-dpdk-mirror.c   | 516
>>>> +++++++++++++++++++++++++++++++++++++
>>>>>  lib/netdev-dpdk-mirror.h   |  83 ++++++
>>>>>  lib/netdev-dpdk.c          | 397 ++++++++++++++++++++++++++++
>>>>>  lib/netdev-provider.h      |  16 ++
>>>>>  lib/netdev.c               | 386 +++++++++++++++++++++++++++
>>>>>  lib/netdev.h               |  16 ++
>>>>>  tests/ovs-vsctl.at         |   2 +
>>>>>  vswitchd/bridge.c          | 271 ++++++++++++++++++-
>>>>>  vswitchd/vswitch.ovsschema |  24 +-
>>>>>  vswitchd/vswitch.xml       |  50 ++++
>>>>>  11 files changed, 1759 insertions(+), 4 deletions(-)
>>>>>  create mode 100644 lib/netdev-dpdk-mirror.c
>>>>>  create mode 100644 lib/netdev-dpdk-mirror.h
>>>>>
>>
>> ...
>>
>>>>> +
>>>>> +/* port/flow mirror traffic processors */
>>>>> +static inline uint16_t
>>>>> +netdev_custom_mirror_offload_cb(uint16_t qidx, struct rte_mbuf
>>>> **pkts,
>>>>> +    uint16_t nb_pkts, void *user_params)
>>>>> +{
>>>>> +    struct mirror_param *data = user_params;
>>>>> +    uint16_t i, dst_qidx, match_count = 0;
>>>>> +    uint16_t pkt_trans;
>>>>> +    uint16_t dst_port_id = data->dst_port_id;
>>>>> +    uint16_t dst_vlan_id = data->dst_vlan_id;
>>>>> +    struct rte_mbuf **pkt_buf = &data->pkt_buf[qidx * data-
>>>>> max_burst_size];
>>>>> +
>>>>> +    if (nb_pkts == 0) {
>>>>> +        return 0;
>>>>> +    }
>>>>> +
>>>>> +    if (nb_pkts > data->max_burst_size) {
>>>>> +        VLOG_ERR("Per-flow batch size, %d, exceeds maximum limit\n",
>>>> nb_pkts);
>>>>> +        return 0;
>>>>> +    }
>>>>> +
>>>>> +    for (i = 0; i < nb_pkts; i++) {
>>>>> +        if (data->custom_scan(pkts[i], user_params)) {
>>>>> +            pkt_buf[match_count] = pkts[i];
>>>>> +            pkt_buf[match_count]->ol_flags |= PKT_TX_VLAN_PKT;
>>>>
>>>> Does it work if the packet already has a VLAN inserted?
>>>>
>>>>> Good catch. The design is based upon no VLAN insertion offloading is
>> applied on source traffic.
>>
>> That is a significant limitation, I haven not seen it documented in the
>> series. Also, the IEEE paper mentions using QinQ when the packet has
>> already a VLAN inserted, but it is not in the series. Is there a reason
>> QinQ is not possible?
>>
>> IIUC, that means the TAP VM would see traffic having no VLAN, whereas it
>> has in reality?
>>
> 
> It does support 802.1q (single VLAN). I mis-understood your last email.
> This design assumes there is no PKT_TX_VLAN_PKT flag set on the source mbuf.

I understood that assumption, and since OVS does not seem to use VLAN
offload currently, it should be OK. I would still suggest adding a check
that the flag isn't already set, so that regular traffic is not messed up
if VLAN offload is added one day.
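
Such a guard could look roughly like the sketch below. It is only an
illustration: struct sketch_mbuf, SKETCH_TX_VLAN_FLAG and sketch_mirror_tag()
are hypothetical stand-ins so that it builds without DPDK, and the flag value
is not the real PKT_TX_VLAN_PKT bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the DPDK offload flag; the bit position is illustrative
 * only and is NOT the real PKT_TX_VLAN_PKT value. */
#define SKETCH_TX_VLAN_FLAG (1ULL << 57)

/* Stand-in for the two rte_mbuf fields the mirror callback touches. */
struct sketch_mbuf {
    uint64_t ol_flags;
    uint16_t vlan_tci;
};

/* Hypothetical helper: request VLAN insertion for the mirror copy only if
 * the source packet does not already carry the offload flag.
 * Returns 1 if the packet was tagged for mirroring, 0 if it was skipped. */
static int
sketch_mirror_tag(struct sketch_mbuf *m, uint16_t mirror_vlan)
{
    assert(m != NULL);

    if (m->ol_flags & SKETCH_TX_VLAN_FLAG) {
        /* Regular traffic already relies on the flag and on vlan_tci;
         * overwriting them (and clearing the flag afterwards) would
         * corrupt it, so leave the packet alone. */
        return 0;
    }

    m->ol_flags |= SKETCH_TX_VLAN_FLAG;
    m->vlan_tci = mirror_vlan;
    return 1;
}
```

Whether a skipped packet should be dropped from the mirror, logged, or
handled some other way is a design decision left open here.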

My question was:
If your packet has a VLAN tag *already* inserted (i.e. not to be
inserted), won't the mirrored packet lose that information? I don't
expect QinQ will be set up by the VF driver as

> This assumption is based upon that VIRTIO spec. doesn't support
> per-packet VLAN insertion offloading.

I am not sure I understand this comment, but for Vhost PMD we do the
VLAN insertion in SW:
https://elixir.bootlin.com/dpdk/latest/source/drivers/net/vhost/rte_eth_vhost.c#L447

I think this VLAN insertion in Vhost PMD is problematic in two ways with
this series:
1. The offloaded VLAN insertion would be lost.
2. rte_vlan_insert() will fail as it needs the mbuf refcount to be 1. It
means the packet would be mirrored but would never reach its expected
destination, as it would be dropped by the Vhost PMD.
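
The refcount issue in point 2 can be illustrated with a small model. This is
a sketch under assumptions, not DPDK code: struct sketch_pkt and both helpers
are hypothetical stand-ins, and the -EINVAL return only models a software
insert that refuses to rewrite a shared buffer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct rte_mbuf; only the reference count and a marker for
 * the software-inserted VLAN header are modelled. */
struct sketch_pkt {
    uint16_t refcnt;
    int has_vlan_hdr;
};

/* Models the precondition of a software VLAN insert: the header cannot be
 * rewritten in place while another owner still references the buffer. */
static int
sketch_sw_vlan_insert(struct sketch_pkt *p)
{
    assert(p != NULL);

    if (p->refcnt > 1) {
        return -EINVAL;     /* Shared buffer: insertion is refused. */
    }
    p->has_vlan_hdr = 1;
    return 0;
}

/* Models the mirror callback taking an extra reference on a matched packet
 * before handing it to the transport port. */
static void
sketch_mirror_take_ref(struct sketch_pkt *p)
{
    assert(p != NULL);
    p->refcnt++;
}
```

With a fresh packet (refcnt 1) the insert succeeds; once the mirror callback
has taken its extra reference, the same insert is refused, which is the case
where the original packet would be dropped.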

> 
>>>>> +            pkt_buf[match_count]->vlan_tci = dst_vlan_id;
>>>>> +            rte_mbuf_refcnt_update(pkt_buf[match_count], 1);
>>>>
>>>>
>>>>
>>>>> +            match_count++;
>>>>> +        }
>>>>> +    }
>>>>> +
>>>>> +    dst_qidx = (data->n_dst_queue > qidx)?qidx:(data->n_dst_queue -
>> 1);
>>>>
>>>> Wouldn't it scale better with:
>>>> dst_qidx = qidx % data->n_dst_queue
>>>> ?
>>>>
>>>>> We tried to avoid using "%" operator. We could add "unlikely" and the
>> suggested "%" to make improvement
>>
>> Not sure adding 'unlikely' is really necessary. The cost of the modulo
>> operation is nothing compared to all we do in this path.
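
To make the difference between the two mappings concrete, here is a minimal
sketch. The function names are hypothetical; only the two expressions come
from the patch and from the suggestion above.

```c
#include <assert.h>
#include <stdint.h>

/* Mapping used in the patch: every source queue index at or beyond the
 * destination queue count lands on the last destination queue. */
static uint16_t
sketch_dst_queue_clamp(uint16_t qidx, uint16_t n_dst_queue)
{
    assert(n_dst_queue > 0);
    return (n_dst_queue > qidx) ? qidx : (uint16_t) (n_dst_queue - 1);
}

/* Suggested mapping: excess source queues wrap around, so the load is
 * spread over all destination queues. */
static uint16_t
sketch_dst_queue_modulo(uint16_t qidx, uint16_t n_dst_queue)
{
    assert(n_dst_queue > 0);
    return (uint16_t) (qidx % n_dst_queue);
}
```

For example, with 4 source queues and 2 destination queues, the clamp maps
source queues 0..3 to 0, 1, 1, 1, whereas the modulo maps them to 0, 1, 0, 1.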
>>
>>>>>
>>>>> +
>>>>> +    rte_spinlock_lock(&data->locks[dst_qidx]);
>>>>> +    pkt_trans = rte_eth_tx_burst(dst_port_id, dst_qidx, pkt_buf,
>>>> match_count);
>>>>> +    rte_spinlock_unlock(&data->locks[dst_qidx]);
>>>>> +
>>>>> +    for (i = 0; i < match_count; i++) {
>>>>> +        pkt_buf[i]->ol_flags &= ~PKT_TX_VLAN_PKT;
>>>>> +    }
>>>>
>>>> In order to further reduce the performance impact of mirroring, have you
>>>> envisaged to offload it to dedicated PMD threads?
>>>>
>>>>> The mirror-tunnel design is a comprised approach between hardware
>> TAP and software TAP.
>>>>> The tunnel itself is designed to have very little impact on source traffic
>> processing core. From
>>>>> our benchmark on SR-IOV, L2 forwarding, mirroring, we only observed
>> 10-20% impact on 64-byte packets, and
>>>>> we did not observe impact when running traffic with packet size with
>> 128-byte or above.
>>
>> The cover letter mentions 20-30% for 64B packets, and the IEEE paper
>> seems to indicate a significant impact up to 512B packets (~25%?).
>>
>> Maybe the workload is different in these tests, that could explain why
>> you don't see an impact starting 128B?
>>
> 
> I believe the 20-30% drop is for VIRTIO port mirroring not SR-IOV mirroring.

By SR-IOV mirroring, do you mean that ingress traffic on one VF is
mirrored to another VF? Can't that be done directly in HW?

Regarding Virtio port mirroring, we can see in the benchmark results you
presented at OVSCon 20 a significant impact for ingress mirroring for
all packet sizes with your VLAN offload solution. I agree this is much
better than with the current mirroring solution, but there should be room
for improvement.

> The 512B packet (~25%?) is for default VIRTIO mirroring (OVS default
> Implementation) not this design.

No, this is with your mirroring solution. See the green bar in the
benchmark results in the paper.

>>>>>
>>>>> +
>>>>> +    while (unlikely (pkt_trans < match_count)) {
>>>>> +        rte_pktmbuf_free(pkt_buf[pkt_trans]);
>>>>> +        pkt_trans++;
>>>>> +    }
>>>>> +
>>>>> +    return nb_pkts;
>>>>> +}
>>>>> +
>>
>> ...
>>
>>>>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
>>>>> index 4597a215d..fd2049a7f 100644
>>>>> --- a/vswitchd/vswitch.xml
>>>>> +++ b/vswitchd/vswitch.xml
>>>>> @@ -4869,11 +4869,35 @@ ovs-vsctl add-port br0 p0 -- set Interface p0
>>>> type=patch options:peer=p1 \
>>>>>          selected VLANs.
>>>>>        </p>
>>>>>
>>>>> +      <column name="mirror_tunnel_addr">
>>>>> +        BDF string of the tunnel device on which mirrored traffic will be
>>>>> +        transmitted.
>>>>> +      </column>
>>>>> +
>>>>>        <column name="select_all">
>>>>>          If true, every packet arriving or departing on any port is
>>>>>          selected for mirroring.
>>>>>        </column>
>>>>>
>>>>> +      <column name="mirror_offload">
>>>>> +        If true, a hw-assisted port mirroring is configured instead
>>>>> +        default mirroring.
>>>>> +      </column>
>>>>> +
>>>>> +      <column name="flow_src_mac">
>>>>> +        The source MAC address(es) for per-flow mirroring. Each MAC
>>>>> +        address is separate by ','. This parametr is paired with
>>>>> +        select_dst_port. A '0' MAC address indicates the requested mirror
>>>>> +        is a per-port mirroring, otherwise it's a per-flow mirroring
>>>>> +      </column>
>>>>> +
>>>>> +      <column name="flow_dst_mac">
>>>>> +        The destination MAC address(es) for per-flow mirroring. Each MAC
>>>>> +        address is separate by ','. This parametr is paired with
>>>>> +        select_src_port. A '0' MAC address indicates the requested mirror
>>>>> +        is a per-port mirroring, otherwise it's a per-flow mirroring
>>>>> +      </column>
>>>>> +
>>>>>        <column name="select_dst_port">
>>>>>          Ports on which departing packets are selected for mirroring.
>>>>>        </column>
>>>>> @@ -4955,6 +4979,32 @@ ovs-vsctl add-port br0 p0 -- set Interface p0
>>>> type=patch options:peer=p1 \
>>>>>          </p>
>>>>>        </column>
>>>>>
>>>>> +      <column name="output_src_vlan">
>>>>> +        <p>Output VLAN for selected source port packets, if
>> nonempty.</p>
>>>>> +        <p>
>>>>> +          <em>Please note:</em> This is different than
>>>>> +          <ref column="output-vlan"/> This vlan is used to add an additional
>>>>> +          vlan tag on the mirror traffic, regardless it contains vlan or not.
>>>>> +          The receive end could choose to filter out this additional vlan.
>>>>> +          This option is provided so the mirrored traffic could maintain its
>>>>> +          original vlan informaiton, and this mirror can be used to filter
>>>>> +          out un-wanted traffic such as in <ref column="mirror_offload"/>.
>>>>> +        </p>
>>>>> +      </column>
>>>>> +
>>>>> +      <column name="output_dst_vlan">
>>>>> +        <p>Output VLAN for selected destination port packets, if
>>>> nonempty.</p>
>>>>> +        <p>
>>>>> +          <em>Please note:</em> This is different than
>>>>> +          <ref column="output-vlan"/> This vlan is used to add an additional
>>>>> +          vlan tag on the mirror traffic, regardless it contains vlan or not.
>>>>> +          The receive end could choose to filter out this additional vlan.
>>>>> +          This option is provided so the mirrored traffic could maintain its
>>>>> +          original vlan informaiton, and this mirror cab be used to filter
>>>>> +          out un-wanted traffic such as in <ref column="mirror_offload"/>.
>>>>> +        </p>
>>>>> +      </column>
>>>>> +
>>>>>        <column name="snaplen">
>>>>>          <p>Maximum per-packet number of bytes to mirror.</p>
>>>>>          <p>A mirrored packet with size larger than <ref
>> column="snaplen"/>
>>>>>
>>>
>>
>> These parameters are DPDK specific, but nothing mentions it. It would
>> confuse the OVS-Kernel users.
>>
> 
> The current implementation only supports OVS-DPDK.

Yes, that is my point. It is not mentioned in the documentation that it
is DPDK specific, and we should not expect the user to dive into the
code to understand it is not supported with OVS Kernel.

>> Regards,
>> Maxime
>
Wang, Liang-min May 19, 2021, 2:17 p.m. UTC | #7
> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Wednesday, May 19, 2021 8:50 AM
> To: Wang, Liang-min <liang-min.wang@intel.com>; Miskell, Timothy
> <timothy.miskell@intel.com>; dev@openvswitch.org
> Subject: Re: [PATCH] Extends the existing mirror configuration parameters
> 
> 
> 
> On 5/19/21 1:53 PM, Wang, Liang-min wrote:
> >> -----Original Message-----
> >> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> >> Sent: Wednesday, May 19, 2021 3:56 AM
> >> To: Wang, Liang-min <liang-min.wang@intel.com>; Miskell, Timothy
> >> <timothy.miskell@intel.com>; dev@openvswitch.org
> >> Subject: Re: [PATCH] Extends the existing mirror configuration
> parameters
> >>
> >> Hi Liang-min,
> >>
> >> When replying inline, please do not prefix with ">>" as it is handled as
> >> quoted text. There is no need to prefix.
> >>
> >> On 5/18/21 8:00 PM, Wang, Liang-min wrote:
> >>>> -----Original Message-----
> >>>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> >>>> Sent: Tuesday, May 18, 2021 12:15 PM
> >>>> To: Miskell, Timothy <timothy.miskell@intel.com>;
> dev@openvswitch.org
> >>>> Cc: Wang, Liang-min <liang-min.wang@intel.com>
> >>>> Subject: Re: [PATCH] Extends the existing mirror configuration
> >> parameters
> >>>>
> >>>> Hi Timothy, Liang-min,
> >>>>
> >>>> Thanks for rebasing the patch.
> >>>> A list of delta against the first RFC could help the reviewers.
> >>>> I notice one change in the right direction is the conversion to Vhost
> >>>> API datapath instead of Vhost PMD.
> >>>>
> >>>>> In this patch we support both Vhost API and Vhost PMD because OVS
> >> supports both
> >>>>> VhostUser and Vdev ports.
> >>>>>
> >>>> Also, I would suggest to have the patch split in several incremental
> >>>> patches to ease the review.
> >>>>
> >>>>> Thank you for suggestion. We will provide incremental patches on
> next
> >> submission
> >>>>>
> >>>> On 5/10/21 6:00 PM, Timothy Miskell wrote:
> >>>>> From: Liang-min Wang <liang-min.wang@intel.com>
> >>>>>
> >>>>> The following parameters are added:
> >>>>>  - mirror-offload: to turn on/off mirror offloading.
> >>>>>  - output-port-name: specify a port, using name string, that is on a
> >> different
> >>>>>    bridge
> >>>>>  - output-src-vlan: output port vlan for each select-src-port.
> >>>>>  - output-dst-vlan: output port vlan for each select-dst-port.
> >>>>>  - flow-src-mac: use src mac address of each select-dst-port for the
> >> header
> >>>>>    scan.
> >>>>>  - flow-dst-mac: use dst mac address of each select-src-port for the
> >> header
> >>>>>    scan.
> >>>>>  - mirror-tunnel-addr: BDF string of the tunnel device.
> >>>>>
> >>>>> ovs-vsctl test change because new mirroring parameters are
> introduced
> >> in
> >>>> this patch
> >>>>
> >>>> It would help to provide examples of usage of these new parameters.
> >>>>
> >>>>> Will add examples in the new patches
> >>>>>
> >>>>> Create a defer procedure call thread to handle all mirror offload
> >> requests.
> >>>>> This is a light-weight thread which remains in sleep-state when there
> is
> >> no
> >>>> new request.
> >>>>> This is created between ovs-vsctl and mirror offloading back end
> >>>>>
> >>>>> Implementing DPDK tx-burst (VIRTIO ingress traffic
> >>>>> mirror) and rx-burst (VIRTIO egress traffic mirror) callbacks.
> >>>>> Each callback  functions implement the following tasks:
> >>>>>  1. Enable per-packet VLAN insertion
> >>>>>    - for port mirroring, all packets are enabled per-packet VLAN
> insertion.
> >>>>>    - for flow mirroring, only packet header matches the required mac
> >>>> address
> >>>>>      are enabled.
> >>>>>  2. Sending the packets to the specified transport port (output-port in
> >>>>>     mirror offload configuration)
> >>>>>    - for port mirroring, all packets are sent to the transport port.
> >>>>>    - for flow mirroring, only matched packets are sent.
> >>>>>  3. Restore each packet attributes (remove DPDK per-packet offload
> >> flag)
> >>>>
> >>>> I will for sure have more questions later, but please find a few
> >>>> comments/questions below:
> >>>>
> >>>>> Signed-off-by: Liang-min Wang <liang-min.wang@intel.com>
> >>>>> Tested-by: Timothy Miskell <Timothy.Miskell@intel.com>
> >>>>> Suggested-by: Munish Mehan <mm6021@att.com>
> >>>>> ---
> >>>>>  lib/automake.mk            |   2 +
> >>>>>  lib/netdev-dpdk-mirror.c   | 516
> >>>> +++++++++++++++++++++++++++++++++++++
> >>>>>  lib/netdev-dpdk-mirror.h   |  83 ++++++
> >>>>>  lib/netdev-dpdk.c          | 397 ++++++++++++++++++++++++++++
> >>>>>  lib/netdev-provider.h      |  16 ++
> >>>>>  lib/netdev.c               | 386 +++++++++++++++++++++++++++
> >>>>>  lib/netdev.h               |  16 ++
> >>>>>  tests/ovs-vsctl.at         |   2 +
> >>>>>  vswitchd/bridge.c          | 271 ++++++++++++++++++-
> >>>>>  vswitchd/vswitch.ovsschema |  24 +-
> >>>>>  vswitchd/vswitch.xml       |  50 ++++
> >>>>>  11 files changed, 1759 insertions(+), 4 deletions(-)
> >>>>>  create mode 100644 lib/netdev-dpdk-mirror.c
> >>>>>  create mode 100644 lib/netdev-dpdk-mirror.h
> >>>>>
> >>
> >> ...
> >>
> >>>>> +
> >>>>> +/* port/flow mirror traffic processors */
> >>>>> +static inline uint16_t
> >>>>> +netdev_custom_mirror_offload_cb(uint16_t qidx, struct rte_mbuf
> >>>> **pkts,
> >>>>> +    uint16_t nb_pkts, void *user_params)
> >>>>> +{
> >>>>> +    struct mirror_param *data = user_params;
> >>>>> +    uint16_t i, dst_qidx, match_count = 0;
> >>>>> +    uint16_t pkt_trans;
> >>>>> +    uint16_t dst_port_id = data->dst_port_id;
> >>>>> +    uint16_t dst_vlan_id = data->dst_vlan_id;
> >>>>> +    struct rte_mbuf **pkt_buf = &data->pkt_buf[qidx * data->max_burst_size];
> >>>>> +
> >>>>> +    if (nb_pkts == 0) {
> >>>>> +        return 0;
> >>>>> +    }
> >>>>> +
> >>>>> +    if (nb_pkts > data->max_burst_size) {
> >>>>> +        VLOG_ERR("Per-flow batch size, %d, exceeds maximum limit\n",
> >>>> nb_pkts);
> >>>>> +        return 0;
> >>>>> +    }
> >>>>> +
> >>>>> +    for (i = 0; i < nb_pkts; i++) {
> >>>>> +        if (data->custom_scan(pkts[i], user_params)) {
> >>>>> +            pkt_buf[match_count] = pkts[i];
> >>>>> +            pkt_buf[match_count]->ol_flags |= PKT_TX_VLAN_PKT;
> >>>>
> >>>> Does it work if the packet already has a VLAN inserted?
> >>>>
> >>>>> Good catch. The design is based upon no VLAN insertion offloading is
> >> applied on source traffic.
> >>
> >> That is a significant limitation, I haven not seen it documented in the
> >> series. Also, the IEEE paper mentions using QinQ when the packet has
> >> already a VLAN inserted, but it is not in the series. Is there a reason
> >> QinQ is not possible?
> >>
> >> IIUC, that means the TAP VM would see traffic having no VLAN, whereas
> it
> >> has in reality?
> >>
> >
> > It does support 802.1q (single VLAN). I mis-understood your last email.
> > This design assumes there is no PKT_TX_VLAN_PKT flag set on the source
> mbuf.
> 
> I understood that assumption, and it seems OVS does not use VLAN offload
> currently so it should be OK (even though I would suggest adding a check
> that the flag isn't already set not to mess up with the regular traffic
> if one day it is added).
> 
> My question was:
> If you packet already has a VLAN tag *already* inserted (i.e. not to be
> inserted), won't the mirrored packet lose that information? I don't
> expect QinQ will be setup by the VF driver as
> 

With this design the mirrored traffic won't lose its VLAN tag.
For single-VLAN packets, this design adds an outer VLAN tag.
The outer VLAN can be stripped if the destination VNF (vProbe) turns on VLAN stripping.
The existing DPDK library does support QinQ offload. BTW, this change is not visible to
the source VNF (no need to change the source VNF settings) because VLAN insertion
happens at the mirror tunnel device.
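
To illustrate the frame layout described above, here is a minimal sketch, assuming the mirror tag ends up as the outermost tag in the frame data once the tunnel device has inserted it. The names (vlan_view, vlan_view_parse) are hypothetical and not part of the patch, and accepting TPID 0x88A8 for the outer tag is an assumption, since some QinQ setups use it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TPID_8021Q  0x8100u  /* 802.1Q tag */
#define TPID_8021AD 0x88A8u  /* 802.1ad service tag (assumed possible) */

/* Hypothetical helper type: what a vProbe would see on a mirrored frame. */
struct vlan_view {
    int n_tags;          /* 0, 1 or 2 tags found after the MAC addresses */
    uint16_t outer_tci;  /* valid when n_tags >= 1 (the mirror tag) */
    uint16_t inner_tci;  /* valid when n_tags == 2 (the original tag) */
};

static uint16_t
read_be16(const uint8_t *p)
{
    return (uint16_t) ((p[0] << 8) | p[1]);
}

static int
is_vlan_tpid(uint16_t tpid)
{
    return tpid == TPID_8021Q || tpid == TPID_8021AD;
}

/* Inspect up to two VLAN tags at the start of an Ethernet frame. */
static struct vlan_view
vlan_view_parse(const uint8_t *frame, size_t len)
{
    struct vlan_view v = { 0, 0, 0 };
    size_t off = 12;  /* skip destination and source MAC */

    assert(frame != NULL);
    while (v.n_tags < 2 && off + 4 <= len
           && is_vlan_tpid(read_be16(frame + off))) {
        uint16_t tci = read_be16(frame + off + 2);

        if (v.n_tags == 0) {
            v.outer_tci = tci;
        } else {
            v.inner_tci = tci;
        }
        v.n_tags++;
        off += 4;
    }
    return v;
}
```

A source frame that already carries VLAN 10 and is mirrored with output VLAN 100 would then parse as n_tags == 2, outer 100, inner 10; stripping the outer tag at the destination leaves the original single-tagged frame.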

> > This assumption is based upon that VIRTIO spec. doesn't support
> > per-packet VLAN insertion offloading.
> 
> I am not sure to understand this comment, but for Vhost PMD we do the
> VLAN insertion in SW:
> https://elixir.bootlin.com/dpdk/latest/source/drivers/net/vhost/rte_eth_vh
> ost.c#L447
> 

> I think this VLAN insertion in Vhost PMD is problematic twice with this
> series:
> 1. The offloaded VLAN insertion would be lost
> 2. rte_vlan_insert() will fail as it needs the mbuf refcount to be 1. It
> means the packet would be mirrored but would never reach its expected
> destination as dropped by the Vhost PMD.
> 

I see your points now. I agree with your assessment. We could either
1. not support Vhost PMD, or 2. check the offload flag at run time.

> >
> >>>>> +            pkt_buf[match_count]->vlan_tci = dst_vlan_id;
> >>>>> +            rte_mbuf_refcnt_update(pkt_buf[match_count], 1);
> >>>>
> >>>>
> >>>>
> >>>>> +            match_count++;
> >>>>> +        }
> >>>>> +    }
> >>>>> +
> >>>>> +    dst_qidx = (data->n_dst_queue > qidx)?qidx:(data->n_dst_queue - 1);
> >>>>
> >>>> Wouldn't it scale better with:
> >>>> dst_qidx = qidx % data->n_dst_queue
> >>>> ?
> >>>>
> >>>>> We tried to avoid using "%" operator. We could add "unlikely" and the
> >> suggested "%" to make improvement
> >>
> >> Not sure adding 'unlikely' is really necessary. The cost of the modulo
> >> operation is nothing compared to all we do in this path.
> >>
> >>>>>
> >>>>> +
> >>>>> +    rte_spinlock_lock(&data->locks[dst_qidx]);
> >>>>> +    pkt_trans = rte_eth_tx_burst(dst_port_id, dst_qidx, pkt_buf,
> >>>> match_count);
> >>>>> +    rte_spinlock_unlock(&data->locks[dst_qidx]);
> >>>>> +
> >>>>> +    for (i = 0; i < match_count; i++) {
> >>>>> +        pkt_buf[i]->ol_flags &= ~PKT_TX_VLAN_PKT;
> >>>>> +    }
> >>>>
> >>>> In order to further reduce the performance impact of mirroring, have
> you
> >>>> envisaged to offload it to dedicated PMD threads?
> >>>>
> >>>>> The mirror-tunnel design is a comprised approach between hardware
> >> TAP and software TAP.
> >>>>> The tunnel itself is designed to have very little impact on source traffic
> >> processing core. From
> >>>>> our benchmark on SR-IOV, L2 forwarding, mirroring, we only observed
> >> 10-20% impact on 64-byte packets, and
> >>>>> we did not observe impact when running traffic with packet size with
> >> 128-byte or above.
> >>
> >> The cover letter mentions 20-30% for 64B packets, and the IEEE paper
> >> seems to indicate a significant impact up to 512B packets (~25%?).
> >>
> >> Maybe the workload is different in these tests, that could explain why
> >> you don't see an impact starting 128B?
> >>
> >
> > I believe the 20-30% drop is for VIRTIO port mirroring not SR-IOV mirroring.
> 
> What you mean by SRIOV mirroring is ingress traffic on one VF is
> mirrored to another VF? Cannot it be done directly in HW?
> 

The SR-IOV performance number is provided here because of a prior request for an
additional PMD core for scaling.

> Regarding Virtio port mirroring, we can see in the benchmark results you
> presented at OVSCon 20 a significant impact for ingress mirroring for
> all packet size with your VLAN offload solution. I agree this is much
> better than with current mirroring solution, but there should be room
> for improvement.
> 
> > The 512B packet (~25%?) is for default VIRTIO mirroring (OVS default
> > Implementation) not this design.
> 
> No, this is with your mirroring solution. See the green bar in the
> benchmark results in the paper.
> 

I see the disparity here. The 25% is from the 1st IEEE paper, where no mirror tunnel
mechanism was devised. Our 2nd IEEE paper (expected to be published in June,
https://im2021.ieee-im.org/) shows no visible degradation on 512B packets.

> >>>>>
> >>>>> +
> >>>>> +    while (unlikely (pkt_trans < match_count)) {
> >>>>> +        rte_pktmbuf_free(pkt_buf[pkt_trans]);
> >>>>> +        pkt_trans++;
> >>>>> +    }
> >>>>> +
> >>>>> +    return nb_pkts;
> >>>>> +}
> >>>>> +
> >>
> >> ...
> >>
> >>>>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> >>>>> index 4597a215d..fd2049a7f 100644
> >>>>> --- a/vswitchd/vswitch.xml
> >>>>> +++ b/vswitchd/vswitch.xml
> >>>>> @@ -4869,11 +4869,35 @@ ovs-vsctl add-port br0 p0 -- set Interface
> p0
> >>>> type=patch options:peer=p1 \
> >>>>>          selected VLANs.
> >>>>>        </p>
> >>>>>
> >>>>> +      <column name="mirror_tunnel_addr">
> >>>>> +        BDF string of the tunnel device on which mirrored traffic will be
> >>>>> +        transmitted.
> >>>>> +      </column>
> >>>>> +
> >>>>>        <column name="select_all">
> >>>>>          If true, every packet arriving or departing on any port is
> >>>>>          selected for mirroring.
> >>>>>        </column>
> >>>>>
> >>>>> +      <column name="mirror_offload">
> >>>>> +        If true, a hw-assisted port mirroring is configured instead
> >>>>> +        default mirroring.
> >>>>> +      </column>
> >>>>> +
> >>>>> +      <column name="flow_src_mac">
> >>>>> +        The source MAC address(es) for per-flow mirroring. Each MAC
> >>>>> +        address is separate by ','. This parametr is paired with
> >>>>> +        select_dst_port. A '0' MAC address indicates the requested
> mirror
> >>>>> +        is a per-port mirroring, otherwise it's a per-flow mirroring
> >>>>> +      </column>
> >>>>> +
> >>>>> +      <column name="flow_dst_mac">
> >>>>> +        The destination MAC address(es) for per-flow mirroring. Each
> MAC
> >>>>> +        address is separate by ','. This parametr is paired with
> >>>>> +        select_src_port. A '0' MAC address indicates the requested
> mirror
> >>>>> +        is a per-port mirroring, otherwise it's a per-flow mirroring
> >>>>> +      </column>
> >>>>> +
> >>>>>        <column name="select_dst_port">
> >>>>>          Ports on which departing packets are selected for mirroring.
> >>>>>        </column>
> >>>>> @@ -4955,6 +4979,32 @@ ovs-vsctl add-port br0 p0 -- set Interface p0
> >>>> type=patch options:peer=p1 \
> >>>>>          </p>
> >>>>>        </column>
> >>>>>
> >>>>> +      <column name="output_src_vlan">
> >>>>> +        <p>Output VLAN for selected source port packets, if
> >> nonempty.</p>
> >>>>> +        <p>
> >>>>> +          <em>Please note:</em> This is different than
> >>>>> +          <ref column="output-vlan"/> This vlan is used to add an
> additional
> >>>>> +          vlan tag on the mirror traffic, regardless it contains vlan or not.
> >>>>> +          The receive end could choose to filter out this additional vlan.
> >>>>> +          This option is provided so the mirrored traffic could maintain its
> >>>>> +          original vlan informaiton, and this mirror can be used to filter
> >>>>> +          out un-wanted traffic such as in <ref
> column="mirror_offload"/>.
> >>>>> +        </p>
> >>>>> +      </column>
> >>>>> +
> >>>>> +      <column name="output_dst_vlan">
> >>>>> +        <p>Output VLAN for selected destination port packets, if
> >>>> nonempty.</p>
> >>>>> +        <p>
> >>>>> +          <em>Please note:</em> This is different than
> >>>>> +          <ref column="output-vlan"/> This vlan is used to add an
> additional
> >>>>> +          vlan tag on the mirror traffic, regardless it contains vlan or not.
> >>>>> +          The receive end could choose to filter out this additional vlan.
> >>>>> +          This option is provided so the mirrored traffic could maintain its
> >>>>> +          original vlan informaiton, and this mirror cab be used to filter
> >>>>> +          out un-wanted traffic such as in <ref
> column="mirror_offload"/>.
> >>>>> +        </p>
> >>>>> +      </column>
> >>>>> +
> >>>>>        <column name="snaplen">
> >>>>>          <p>Maximum per-packet number of bytes to mirror.</p>
> >>>>>          <p>A mirrored packet with size larger than <ref
> >> column="snaplen"/>
> >>>>>
> >>>
> >>
> >> These parameters are DPDK specific, but nothing mentions it. It would
> >> confuse the OVS-Kernel users.
> >>
> >
> > The current implementation only supports OVS-DPDK.
> 
> Yes, that is my point. It is not mentioned in the documentation that it
> is DPDK specific, and we should not expect the user to dive into the
> code to understand it is not supported with OVS Kernel.
> 

Got it. We will add this caveat to the documentation. Thanks.

> >> Regards,
> >> Maxime
> >
Maxime Coquelin May 19, 2021, 8:46 p.m. UTC | #8
On 5/19/21 4:17 PM, Wang, Liang-min wrote:
>> -----Original Message-----
>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>> Sent: Wednesday, May 19, 2021 8:50 AM
>> To: Wang, Liang-min <liang-min.wang@intel.com>; Miskell, Timothy
>> <timothy.miskell@intel.com>; dev@openvswitch.org
>> Subject: Re: [PATCH] Extends the existing mirror configuration parameters
>>
>>
>>
>> On 5/19/21 1:53 PM, Wang, Liang-min wrote:
>>>> -----Original Message-----
>>>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>> Sent: Wednesday, May 19, 2021 3:56 AM
>>>> To: Wang, Liang-min <liang-min.wang@intel.com>; Miskell, Timothy
>>>> <timothy.miskell@intel.com>; dev@openvswitch.org
>>>> Subject: Re: [PATCH] Extends the existing mirror configuration
>> parameters
>>>>
>>>> Hi Liang-min,
>>>>
>>>> When replying inline, please do not prefix with ">>" as it is handled as
>>>> quoted text. There is no need to prefix.
>>>>
>>>> On 5/18/21 8:00 PM, Wang, Liang-min wrote:
>>>>>> -----Original Message-----
>>>>>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>>>> Sent: Tuesday, May 18, 2021 12:15 PM
>>>>>> To: Miskell, Timothy <timothy.miskell@intel.com>;
>> dev@openvswitch.org
>>>>>> Cc: Wang, Liang-min <liang-min.wang@intel.com>
>>>>>> Subject: Re: [PATCH] Extends the existing mirror configuration
>>>> parameters
>>>>>>
>>>>>> Hi Timothy, Liang-min,
>>>>>>
>>>>>> Thanks for rebasing the patch.
>>>>>> A list of delta against the first RFC could help the reviewers.
>>>>>> I notice one change in the right direction is the conversion to Vhost
>>>>>> API datapath instead of Vhost PMD.
>>>>>>
>>>>>>> In this patch we support both Vhost API and Vhost PMD because OVS
>>>> supports both
>>>>>>> VhostUser and Vdev ports.
>>>>>>>
>>>>>> Also, I would suggest to have the patch split in several incremental
>>>>>> patches to ease the review.
>>>>>>
>>>>>>> Thank you for suggestion. We will provide incremental patches on
>> next
>>>> submission
>>>>>>>
>>>>>> On 5/10/21 6:00 PM, Timothy Miskell wrote:
>>>>>>> From: Liang-min Wang <liang-min.wang@intel.com>
>>>>>>>
>>>>>>> The following parameters are added:
>>>>>>>  - mirror-offload: to turn on/off mirror offloading.
>>>>>>>  - output-port-name: specify a port, using name string, that is on a
>>>> different
>>>>>>>    bridge
>>>>>>>  - output-src-vlan: output port vlan for each select-src-port.
>>>>>>>  - output-dst-vlan: output port vlan for each select-dst-port.
>>>>>>>  - flow-src-mac: use src mac address of each select-dst-port for the
>>>> header
>>>>>>>    scan.
>>>>>>>  - flow-dst-mac: use dst mac address of each select-src-port for the
>>>> header
>>>>>>>    scan.
>>>>>>>  - mirror-tunnel-addr: BDF string of the tunnel device.
>>>>>>>
>>>>>>> ovs-vsctl test change because new mirroring parameters are
>> introduced
>>>> in
>>>>>> this patch
>>>>>>
>>>>>> It would help to provide examples of usage of these new parameters.
>>>>>>
>>>>>>> Will add examples in the new patches
>>>>>>>
>>>>>>> Create a defer procedure call thread to handle all mirror offload
>>>> requests.
>>>>>>> This is a light-weight thread which remains in sleep-state when there
>> is
>>>> no
>>>>>> new request.
>>>>>>> This is created between ovs-vsctl and mirror offloading back end
>>>>>>>
>>>>>>> Implementing DPDK tx-burst (VIRTIO ingress traffic
>>>>>>> mirror) and rx-burst (VIRTIO egress traffic mirror) callbacks.
>>>>>>> Each callback  functions implement the following tasks:
>>>>>>>  1. Enable per-packet VLAN insertion
>>>>>>>    - for port mirroring, all packets are enabled per-packet VLAN
>> insertion.
>>>>>>>    - for flow mirroring, only packet header matches the required mac
>>>>>> address
>>>>>>>      are enabled.
>>>>>>>  2. Sending the packets to the specified transport port (output-port in
>>>>>>>     mirror offload configuration)
>>>>>>>    - for port mirroring, all packets are sent to the transport port.
>>>>>>>    - for flow mirroring, only matched packets are sent.
>>>>>>>  3. Restore each packet attributes (remove DPDK per-packet offload
>>>> flag)
>>>>>>
>>>>>> I will for sure have more questions later, but please find a few
>>>>>> comments/questions below:
>>>>>>
>>>>>>> Signed-off-by: Liang-min Wang <liang-min.wang@intel.com>
>>>>>>> Tested-by: Timothy Miskell <Timothy.Miskell@intel.com>
>>>>>>> Suggested-by: Munish Mehan <mm6021@att.com>
>>>>>>> ---
>>>>>>>  lib/automake.mk            |   2 +
>>>>>>>  lib/netdev-dpdk-mirror.c   | 516
>>>>>> +++++++++++++++++++++++++++++++++++++
>>>>>>>  lib/netdev-dpdk-mirror.h   |  83 ++++++
>>>>>>>  lib/netdev-dpdk.c          | 397 ++++++++++++++++++++++++++++
>>>>>>>  lib/netdev-provider.h      |  16 ++
>>>>>>>  lib/netdev.c               | 386 +++++++++++++++++++++++++++
>>>>>>>  lib/netdev.h               |  16 ++
>>>>>>>  tests/ovs-vsctl.at         |   2 +
>>>>>>>  vswitchd/bridge.c          | 271 ++++++++++++++++++-
>>>>>>>  vswitchd/vswitch.ovsschema |  24 +-
>>>>>>>  vswitchd/vswitch.xml       |  50 ++++
>>>>>>>  11 files changed, 1759 insertions(+), 4 deletions(-)
>>>>>>>  create mode 100644 lib/netdev-dpdk-mirror.c
>>>>>>>  create mode 100644 lib/netdev-dpdk-mirror.h
>>>>>>>
>>>>
>>>> ...
>>>>
>>>>>>> +
>>>>>>> +/* port/flow mirror traffic processors */
>>>>>>> +static inline uint16_t
>>>>>>> +netdev_custom_mirror_offload_cb(uint16_t qidx, struct rte_mbuf
>>>>>> **pkts,
>>>>>>> +    uint16_t nb_pkts, void *user_params)
>>>>>>> +{
>>>>>>> +    struct mirror_param *data = user_params;
>>>>>>> +    uint16_t i, dst_qidx, match_count = 0;
>>>>>>> +    uint16_t pkt_trans;
>>>>>>> +    uint16_t dst_port_id = data->dst_port_id;
>>>>>>> +    uint16_t dst_vlan_id = data->dst_vlan_id;
>>>>>>> +    struct rte_mbuf **pkt_buf = &data->pkt_buf[qidx * data->max_burst_size];
>>>>>>> +
>>>>>>> +    if (nb_pkts == 0) {
>>>>>>> +        return 0;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    if (nb_pkts > data->max_burst_size) {
>>>>>>> +        VLOG_ERR("Per-flow batch size, %d, exceeds maximum limit\n",
>>>>>> nb_pkts);
>>>>>>> +        return 0;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    for (i = 0; i < nb_pkts; i++) {
>>>>>>> +        if (data->custom_scan(pkts[i], user_params)) {
>>>>>>> +            pkt_buf[match_count] = pkts[i];
>>>>>>> +            pkt_buf[match_count]->ol_flags |= PKT_TX_VLAN_PKT;
>>>>>>
>>>>>> Does it work if the packet already has a VLAN inserted?
>>>>>>
>>>>>>> Good catch. The design is based upon no VLAN insertion offloading is
>>>> applied on source traffic.
>>>>
>>>> That is a significant limitation, I haven not seen it documented in the
>>>> series. Also, the IEEE paper mentions using QinQ when the packet has
>>>> already a VLAN inserted, but it is not in the series. Is there a reason
>>>> QinQ is not possible?
>>>>
>>>> IIUC, that means the TAP VM would see traffic having no VLAN, whereas
>> it
>>>> has in reality?
>>>>
>>>
>>> It does support 802.1q (single VLAN). I mis-understood your last email.
>>> This design assumes there is no PKT_TX_VLAN_PKT flag set on the source
>> mbuf.
>>
>> I understood that assumption, and it seems OVS does not use VLAN offload
>> currently so it should be OK (even though I would suggest adding a check
>> that the flag isn't already set not to mess up with the regular traffic
>> if one day it is added).
>>
>> My question was:
>> If you packet already has a VLAN tag *already* inserted (i.e. not to be
>> inserted), won't the mirrored packet lose that information? I don't
>> expect QinQ will be setup by the VF driver as
>>
> 
> With this design the mirrored traffic won't lose VLAN tag. 
> For single VLAN packets, this design will add an outer VLAN tag.
> The outer VLAN can be stripped if the destination VNF (vProbe) turn on VLAN stripping.
> The existing DPDK library does support QinQ offload. BTW, this change is not visible in
> source VNF (no need to change source VNF setting) because VLAN insertion
> happens at mirror tunnel device.

OK, I would have thought that setting PKT_TX_QINQ_PKT would be needed to
achieve that and also ensuring that the mirror port supports
DEV_TX_OFFLOAD_QINQ_INSERT.
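
To make that concrete, here is a rough sketch of the kind of check I have in mind. It is not DPDK code: struct sk_pkt and the SK_* constants are hypothetical stand-ins for the mbuf fields, the PKT_TX_VLAN_PKT / PKT_TX_QINQ_PKT flags and the DEV_TX_OFFLOAD_* capabilities, so that the logic can be read on its own. Whether a given NIC accepts this combination is an assumption, not something established in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are NOT the DPDK definitions. */
#define SK_TX_VLAN          (1ULL << 0)  /* role of PKT_TX_VLAN_PKT */
#define SK_TX_QINQ          (1ULL << 1)  /* role of PKT_TX_QINQ_PKT */
#define SK_CAP_VLAN_INSERT  (1ULL << 0)  /* role of DEV_TX_OFFLOAD_VLAN_INSERT */
#define SK_CAP_QINQ_INSERT  (1ULL << 1)  /* role of DEV_TX_OFFLOAD_QINQ_INSERT */

struct sk_pkt {
    uint64_t ol_flags;
    uint16_t vlan_tci;        /* single or inner tag to insert */
    uint16_t vlan_tci_outer;  /* outer tag to insert */
};

/* Request insertion of the mirror VLAN without clobbering an offload
 * request already present on the packet.
 *
 * Returns the flag that was added, so the caller can clear exactly that
 * flag after transmission, or 0 if the packet must not be mirrored via
 * offload (flag already in use, or capability missing on the mirror port). */
static uint64_t
sk_mark_for_mirror(struct sk_pkt *pkt, uint64_t tx_capa, uint16_t mirror_vlan)
{
    assert(pkt != NULL);

    if (pkt->ol_flags & SK_TX_QINQ) {
        return 0;  /* both insertion slots are already requested */
    }
    if (pkt->ol_flags & SK_TX_VLAN) {
        /* A VLAN insertion is already pending: the mirror tag can only
         * go in the outer slot, which needs QinQ support. */
        if (!(tx_capa & SK_CAP_QINQ_INSERT)) {
            return 0;
        }
        pkt->vlan_tci_outer = mirror_vlan;
        pkt->ol_flags |= SK_TX_QINQ;
        return SK_TX_QINQ;
    }
    if (!(tx_capa & SK_CAP_VLAN_INSERT)) {
        return 0;
    }
    pkt->vlan_tci = mirror_vlan;
    pkt->ol_flags |= SK_TX_VLAN;
    return SK_TX_VLAN;
}
```

Returning the added flag lets the restore loop clear only what the mirror path set, instead of unconditionally clearing the VLAN flag on packets that may have had it set for regular traffic.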

> 
>>> This assumption is based upon that VIRTIO spec. doesn't support
>>> per-packet VLAN insertion offloading.
>>
>> I am not sure to understand this comment, but for Vhost PMD we do the
>> VLAN insertion in SW:
>> https://elixir.bootlin.com/dpdk/latest/source/drivers/net/vhost/rte_eth_vh
>> ost.c#L447
>>
> 
>> I think this VLAN insertion in Vhost PMD is problematic twice with this
>> series:
>> 1. The offloaded VLAN insertion would be lost
>> 2. rte_vlan_insert() will fail as it needs the mbuf refcount to be 1. It
>> means the packet would be mirrored but would never reach its expected
>> destination as dropped by the Vhost PMD.
>>
> 
> I see your pointes now. I agree with your assessment. We could either 
> 1.  support no Vhost PMD, or 2. check the offload flag at the run-time.

For 1, how would you differentiate between a Vhost PMD port and any
other PMD? Also, I think Vhost is not the only PMD doing VLAN insert in
SW.

I'm not sure I understand what you mean by checking the offload flag at
runtime. What would you do if the flag is set? The only solution I see would
be to do the insert in SW before doing the mirroring, but it would mean
an overhead for packets transmitted to a device that supports VLAN
offloading in HW.
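
For reference, here is a simplified sketch of what a software insert involves and why it cannot be done on a shared buffer. This is not the rte_vlan_insert() implementation: struct sk_buf and sk_vlan_insert() are hypothetical, and the buffer is a plain byte array rather than an mbuf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a packet buffer with a reference count. */
struct sk_buf {
    uint8_t data[64];
    size_t len;
    unsigned refcnt;
};

/* Insert an 802.1Q tag after the MAC addresses, in place.
 * Returns 0 on success, -1 if the buffer is shared, too short or full. */
static int
sk_vlan_insert(struct sk_buf *b, uint16_t tci)
{
    assert(b != NULL);

    /* Rewriting the frame would also change what every other holder of
     * the buffer sees, so a shared buffer is refused. */
    if (b->refcnt != 1) {
        return -1;
    }
    if (b->len < 14 || b->len + 4 > sizeof b->data) {
        return -1;
    }

    /* Make room for 4 bytes between the source MAC and the EtherType. */
    memmove(b->data + 16, b->data + 12, b->len - 12);
    b->data[12] = 0x81;
    b->data[13] = 0x00;
    b->data[14] = (uint8_t) (tci >> 8);
    b->data[15] = (uint8_t) (tci & 0xff);
    b->len += 4;
    return 0;
}
```

With the mirror callback taking an extra reference on each matched packet, the refcnt check is exactly where a software-inserting PMD would bail out, and the copy cost of the memmove is the overhead mentioned above for devices that could have done the insertion in HW.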

> 
>>>
>>>>>>> +            pkt_buf[match_count]->vlan_tci = dst_vlan_id;
>>>>>>> +            rte_mbuf_refcnt_update(pkt_buf[match_count], 1);
>>>>>>
>>>>>>
>>>>>>
>>>>>>> +            match_count++;
>>>>>>> +        }
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    dst_qidx = (data->n_dst_queue > qidx)?qidx:(data->n_dst_queue - 1);
>>>>>>
>>>>>> Wouldn't it scale better with:
>>>>>> dst_qidx = qidx % data->n_dst_queue
>>>>>> ?
>>>>>>
>>>>>>> We tried to avoid using "%" operator. We could add "unlikely" and the
>>>> suggested "%" to make improvement
>>>>
>>>> Not sure adding 'unlikely' is really necessary. The cost of the modulo
>>>> operation is nothing compared to all we do in this path.
>>>>
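
To show the scaling difference between the two mappings, here is a small standalone sketch. The function names are hypothetical; only the two expressions come from the patch and from my suggestion.

```c
#include <assert.h>
#include <stdint.h>

/* Mapping used in the patch: source queues beyond the last destination
 * queue all collapse onto the last destination queue. */
static uint16_t
dst_queue_clamp(uint16_t qidx, uint16_t n_dst_queue)
{
    assert(n_dst_queue > 0);
    return n_dst_queue > qidx ? qidx : (uint16_t) (n_dst_queue - 1);
}

/* Suggested mapping: source queues wrap around, so the extra queues are
 * spread over all destination queues. */
static uint16_t
dst_queue_modulo(uint16_t qidx, uint16_t n_dst_queue)
{
    assert(n_dst_queue > 0);
    return (uint16_t) (qidx % n_dst_queue);
}
```

With 8 source queues and 4 destination queues, the clamp sends source queues 3 to 7 to destination queue 3, so five queues contend on one spinlock, whereas the modulo puts exactly two source queues on each destination queue.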
>>>>>>>
>>>>>>> +
>>>>>>> +    rte_spinlock_lock(&data->locks[dst_qidx]);
>>>>>>> +    pkt_trans = rte_eth_tx_burst(dst_port_id, dst_qidx, pkt_buf,
>>>>>> match_count);
>>>>>>> +    rte_spinlock_unlock(&data->locks[dst_qidx]);
>>>>>>> +
>>>>>>> +    for (i = 0; i < match_count; i++) {
>>>>>>> +        pkt_buf[i]->ol_flags &= ~PKT_TX_VLAN_PKT;
>>>>>>> +    }
>>>>>>
>>>>>> In order to further reduce the performance impact of mirroring, have
>> you
>>>>>> envisaged to offload it to dedicated PMD threads?
>>>>>>
>>>>>>> The mirror-tunnel design is a comprised approach between hardware
>>>> TAP and software TAP.
>>>>>>> The tunnel itself is designed to have very little impact on source traffic
>>>> processing core. From
>>>>>>> our benchmark on SR-IOV, L2 forwarding, mirroring, we only observed
>>>> 10-20% impact on 64-byte packets, and
>>>>>>> we did not observe impact when running traffic with packet size with
>>>> 128-byte or above.
>>>>
>>>> The cover letter mentions 20-30% for 64B packets, and the IEEE paper
>>>> seems to indicate a significant impact up to 512B packets (~25%?).
>>>>
>>>> Maybe the workload is different in these tests, that could explain why
>>>> you don't see an impact starting 128B?
>>>>
>>>
>>> I believe the 20-30% drop is for VIRTIO port mirroring not SR-IOV mirroring.
>>
>> What you mean by SRIOV mirroring is ingress traffic on one VF is
>> mirrored to another VF? Cannot it be done directly in HW?
>>
> 
> The SR-IOV performance number provided here because of prior ask for additional
> PMD core for scaling.

OK

>> Regarding Virtio port mirroring, we can see in the benchmark results you
>> presented at OVSCon 20 a significant impact for ingress mirroring for
>> all packet size with your VLAN offload solution. I agree this is much
>> better than with current mirroring solution, but there should be room
>> for improvement.
>>
>>> The 512B packet (~25%?) is for default VIRTIO mirroring (OVS default
>>> Implementation) not this design.
>>
>> No, this is with your mirroring solution. See the green bar in the
>> benchmark results in the paper.
>>
> 
> I see the disparity here. The 25% is from the 1st IEEE paper where no mirror tunnel
> mechanism is devised and there is no visible degradation on 512B packets over
> our 2nd IEEE paper (expected to be published on June, https://im2021.ieee-im.org/)

Now I'm curious about what has changed in the design between the two
papers that would explain that. :) What the first paper presents
seems quite similar to what is implemented here.

>>>>>>>
>>>>>>> +
>>>>>>> +    while (unlikely (pkt_trans < match_count)) {
>>>>>>> +        rte_pktmbuf_free(pkt_buf[pkt_trans]);
>>>>>>> +        pkt_trans++;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    return nb_pkts;
>>>>>>> +}
>>>>>>> +
>>>>
>>>> ...
>>>>
>>>>>>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
>>>>>>> index 4597a215d..fd2049a7f 100644
>>>>>>> --- a/vswitchd/vswitch.xml
>>>>>>> +++ b/vswitchd/vswitch.xml
>>>>>>> @@ -4869,11 +4869,35 @@ ovs-vsctl add-port br0 p0 -- set Interface
>> p0
>>>>>> type=patch options:peer=p1 \
>>>>>>>          selected VLANs.
>>>>>>>        </p>
>>>>>>>
>>>>>>> +      <column name="mirror_tunnel_addr">
>>>>>>> +        BDF string of the tunnel device on which mirrored traffic will be
>>>>>>> +        transmitted.
>>>>>>> +      </column>
>>>>>>> +
>>>>>>>        <column name="select_all">
>>>>>>>          If true, every packet arriving or departing on any port is
>>>>>>>          selected for mirroring.
>>>>>>>        </column>
>>>>>>>
>>>>>>> +      <column name="mirror_offload">
>>>>>>> +        If true, a hw-assisted port mirroring is configured instead
>>>>>>> +        default mirroring.
>>>>>>> +      </column>
>>>>>>> +
>>>>>>> +      <column name="flow_src_mac">
>>>>>>> +        The source MAC address(es) for per-flow mirroring. Each MAC
>>>>>>> +        address is separate by ','. This parametr is paired with
>>>>>>> +        select_dst_port. A '0' MAC address indicates the requested
>> mirror
>>>>>>> +        is a per-port mirroring, otherwise it's a per-flow mirroring
>>>>>>> +      </column>
>>>>>>> +
>>>>>>> +      <column name="flow_dst_mac">
>>>>>>> +        The destination MAC address(es) for per-flow mirroring. Each
>> MAC
>>>>>>> +        address is separate by ','. This parametr is paired with
>>>>>>> +        select_src_port. A '0' MAC address indicates the requested
>> mirror
>>>>>>> +        is a per-port mirroring, otherwise it's a per-flow mirroring
>>>>>>> +      </column>
>>>>>>> +
>>>>>>>        <column name="select_dst_port">
>>>>>>>          Ports on which departing packets are selected for mirroring.
>>>>>>>        </column>
>>>>>>> @@ -4955,6 +4979,32 @@ ovs-vsctl add-port br0 p0 -- set Interface p0
>>>>>> type=patch options:peer=p1 \
>>>>>>>          </p>
>>>>>>>        </column>
>>>>>>>
>>>>>>> +      <column name="output_src_vlan">
>>>>>>> +        <p>Output VLAN for selected source port packets, if
>>>> nonempty.</p>
>>>>>>> +        <p>
>>>>>>> +          <em>Please note:</em> This is different than
>>>>>>> +          <ref column="output-vlan"/> This vlan is used to add an
>> additional
>>>>>>> +          vlan tag on the mirror traffic, regardless it contains vlan or not.
>>>>>>> +          The receive end could choose to filter out this additional vlan.
>>>>>>> +          This option is provided so the mirrored traffic could maintain its
>>>>>>> +          original vlan informaiton, and this mirror can be used to filter
>>>>>>> +          out un-wanted traffic such as in <ref
>> column="mirror_offload"/>.
>>>>>>> +        </p>
>>>>>>> +      </column>
>>>>>>> +
>>>>>>> +      <column name="output_dst_vlan">
>>>>>>> +        <p>Output VLAN for selected destination port packets, if
>>>>>> nonempty.</p>
>>>>>>> +        <p>
>>>>>>> +          <em>Please note:</em> This is different than
>>>>>>> +          <ref column="output-vlan"/> This vlan is used to add an
>> additional
>>>>>>> +          vlan tag on the mirror traffic, regardless it contains vlan or not.
>>>>>>> +          The receive end could choose to filter out this additional vlan.
>>>>>>> +          This option is provided so the mirrored traffic could maintain its
>>>>>>> +          original vlan informaiton, and this mirror cab be used to filter
>>>>>>> +          out un-wanted traffic such as in <ref
>> column="mirror_offload"/>.
>>>>>>> +        </p>
>>>>>>> +      </column>
>>>>>>> +
>>>>>>>        <column name="snaplen">
>>>>>>>          <p>Maximum per-packet number of bytes to mirror.</p>
>>>>>>>          <p>A mirrored packet with size larger than <ref
>>>> column="snaplen"/>
>>>>>>>
>>>>>
>>>>
>>>> These parameters are DPDK specific, but nothing mentions it. It would
>>>> confuse the OVS-Kernel users.
>>>>
>>>
>>> The current implementation only supports OVS-DPDK.
>>
>> Yes, that is my point. It is not mentioned in the documentation that it
>> is DPDK specific, and we should not expect the user to dive into the
>> code to understand it is not supported with OVS Kernel.
>>
> 
> Got it. Will add this caveat in the document. Thanks.

Thanks!
Maxime
Wang, Liang-min May 19, 2021, 9:16 p.m. UTC | #9
> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Wednesday, May 19, 2021 4:47 PM
> To: Wang, Liang-min <liang-min.wang@intel.com>; Miskell, Timothy
> <timothy.miskell@intel.com>; dev@openvswitch.org
> Subject: Re: [PATCH] Extends the existing mirror configuration parameters
> 
> 
> 
> On 5/19/21 4:17 PM, Wang, Liang-min wrote:
> >> -----Original Message-----
> >> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> >> Sent: Wednesday, May 19, 2021 8:50 AM
> >> To: Wang, Liang-min <liang-min.wang@intel.com>; Miskell, Timothy
> >> <timothy.miskell@intel.com>; dev@openvswitch.org
> >> Subject: Re: [PATCH] Extends the existing mirror configuration
> parameters
> >>
> >>
> >>
> >> On 5/19/21 1:53 PM, Wang, Liang-min wrote:
> >>>> -----Original Message-----
> >>>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> >>>> Sent: Wednesday, May 19, 2021 3:56 AM
> >>>> To: Wang, Liang-min <liang-min.wang@intel.com>; Miskell, Timothy
> >>>> <timothy.miskell@intel.com>; dev@openvswitch.org
> >>>> Subject: Re: [PATCH] Extends the existing mirror configuration
> >> parameters
> >>>>
> >>>> Hi Liang-min,
> >>>>
> >>>> When replying inline, please do not prefix with ">>" as it is handled as
> >>>> quoted text. There is no need to prefix.
> >>>>
> >>>> On 5/18/21 8:00 PM, Wang, Liang-min wrote:
> >>>>>> -----Original Message-----
> >>>>>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> >>>>>> Sent: Tuesday, May 18, 2021 12:15 PM
> >>>>>> To: Miskell, Timothy <timothy.miskell@intel.com>;
> >> dev@openvswitch.org
> >>>>>> Cc: Wang, Liang-min <liang-min.wang@intel.com>
> >>>>>> Subject: Re: [PATCH] Extends the existing mirror configuration
> >>>> parameters
> >>>>>>
> >>>>>> Hi Timothy, Liang-min,
> >>>>>>
> >>>>>> Thanks for rebasing the patch.
> >>>>>> A list of delta against the first RFC could help the reviewers.
> >>>>>> I notice one change in the right direction is the conversion to Vhost
> >>>>>> API datapath instead of Vhost PMD.
> >>>>>>
> >>>>>>> In this patch we support both Vhost API and Vhost PMD because
> OVS
> >>>> supports both
> >>>>>>> VhostUser and Vdev ports.
> >>>>>>>
> >>>>>> Also, I would suggest to have the patch split in several incremental
> >>>>>> patches to ease the review.
> >>>>>>
> >>>>>>> Thank you for suggestion. We will provide incremental patches on
> >> next
> >>>> submission
> >>>>>>>
> >>>>>> On 5/10/21 6:00 PM, Timothy Miskell wrote:
> >>>>>>> From: Liang-min Wang <liang-min.wang@intel.com>
> >>>>>>>
> >>>>>>> The following parameters are added:
> >>>>>>>  - mirror-offload: to turn on/off mirror offloading.
> >>>>>>>  - output-port-name: specify a port, using name string, that is on a
> >>>> different
> >>>>>>>    bridge
> >>>>>>>  - output-src-vlan: output port vlan for each select-src-port.
> >>>>>>>  - output-dst-vlan: output port vlan for each select-dst-port.
> >>>>>>>  - flow-src-mac: use src mac address of each select-dst-port for the
> >>>> header
> >>>>>>>    scan.
> >>>>>>>  - flow-dst-mac: use dst mac address of each select-src-port for the
> >>>> header
> >>>>>>>    scan.
> >>>>>>>  - mirror-tunnel-addr: BDF string of the tunnel device.
> >>>>>>>
> >>>>>>> ovs-vsctl test change because new mirroring parameters are
> >> introduced
> >>>> in
> >>>>>> this patch
> >>>>>>
> >>>>>> It would help to provide examples of usage of these new
> parameters.
> >>>>>>
> >>>>>>> Will add examples in the new patches
> >>>>>>>
> >>>>>>> Create a defer procedure call thread to handle all mirror offload
> >>>> requests.
> >>>>>>> This is a light-weight thread which remains in sleep-state when
> there
> >> is
> >>>> no
> >>>>>> new request.
> >>>>>>> This is created between ovs-vsctl and mirror offloading back end
> >>>>>>>
> >>>>>>> Implementing DPDK tx-burst (VIRTIO ingress traffic
> >>>>>>> mirror) and rx-burst (VIRTIO egress traffic mirror) callbacks.
> >>>>>>> Each callback  functions implement the following tasks:
> >>>>>>>  1. Enable per-packet VLAN insertion
> >>>>>>>    - for port mirroring, all packets are enabled per-packet VLAN
> >> insertion.
> >>>>>>>    - for flow mirroring, only packet header matches the required mac
> >>>>>> address
> >>>>>>>      are enabled.
> >>>>>>>  2. Sending the packets to the specified transport port (output-port
> in
> >>>>>>>     mirror offload configuration)
> >>>>>>>    - for port mirroring, all packets are sent to the transport port.
> >>>>>>>    - for flow mirroring, only matched packets are sent.
> >>>>>>>  3. Restore each packet attributes (remove DPDK per-packet
> offload
> >>>> flag)
> >>>>>>
> >>>>>> I will for sure have more questions later, but please find a few
> >>>>>> comments/questions below:
> >>>>>>
> >>>>>>> Signed-off-by: Liang-min Wang <liang-min.wang@intel.com>
> >>>>>>> Tested-by: Timothy Miskell <Timothy.Miskell@intel.com>
> >>>>>>> Suggested-by: Munish Mehan <mm6021@att.com>
> >>>>>>> ---
> >>>>>>>  lib/automake.mk            |   2 +
> >>>>>>>  lib/netdev-dpdk-mirror.c   | 516
> >>>>>> +++++++++++++++++++++++++++++++++++++
> >>>>>>>  lib/netdev-dpdk-mirror.h   |  83 ++++++
> >>>>>>>  lib/netdev-dpdk.c          | 397 ++++++++++++++++++++++++++++
> >>>>>>>  lib/netdev-provider.h      |  16 ++
> >>>>>>>  lib/netdev.c               | 386 +++++++++++++++++++++++++++
> >>>>>>>  lib/netdev.h               |  16 ++
> >>>>>>>  tests/ovs-vsctl.at         |   2 +
> >>>>>>>  vswitchd/bridge.c          | 271 ++++++++++++++++++-
> >>>>>>>  vswitchd/vswitch.ovsschema |  24 +-
> >>>>>>>  vswitchd/vswitch.xml       |  50 ++++
> >>>>>>>  11 files changed, 1759 insertions(+), 4 deletions(-)
> >>>>>>>  create mode 100644 lib/netdev-dpdk-mirror.c
> >>>>>>>  create mode 100644 lib/netdev-dpdk-mirror.h
> >>>>>>>
> >>>>
> >>>> ...
> >>>>
> >>>>>>> +
> >>>>>>> +/* port/flow mirror traffic processors */
> >>>>>>> +static inline uint16_t
> >>>>>>> +netdev_custom_mirror_offload_cb(uint16_t qidx, struct
> rte_mbuf
> >>>>>> **pkts,
> >>>>>>> +    uint16_t nb_pkts, void *user_params)
> >>>>>>> +{
> >>>>>>> +    struct mirror_param *data = user_params;
> >>>>>>> +    uint16_t i, dst_qidx, match_count = 0;
> >>>>>>> +    uint16_t pkt_trans;
> >>>>>>> +    uint16_t dst_port_id = data->dst_port_id;
> >>>>>>> +    uint16_t dst_vlan_id = data->dst_vlan_id;
> >>>>>>> +    struct rte_mbuf **pkt_buf = &data->pkt_buf[qidx * data-
> >>>>>>> max_burst_size];
> >>>>>>> +
> >>>>>>> +    if (nb_pkts == 0) {
> >>>>>>> +        return 0;
> >>>>>>> +    }
> >>>>>>> +
> >>>>>>> +    if (nb_pkts > data->max_burst_size) {
> >>>>>>> +        VLOG_ERR("Per-flow batch size, %d, exceeds maximum
> limit\n",
> >>>>>> nb_pkts);
> >>>>>>> +        return 0;
> >>>>>>> +    }
> >>>>>>> +
> >>>>>>> +    for (i = 0; i < nb_pkts; i++) {
> >>>>>>> +        if (data->custom_scan(pkts[i], user_params)) {
> >>>>>>> +            pkt_buf[match_count] = pkts[i];
> >>>>>>> +            pkt_buf[match_count]->ol_flags |= PKT_TX_VLAN_PKT;
> >>>>>>
> >>>>>> Does it work if the packet already has a VLAN inserted?
> >>>>>>
> >>>>>>> Good catch. The design is based upon no VLAN insertion offloading
> is
> >>>> applied on source traffic.
> >>>>
> >>>> That is a significant limitation, I haven not seen it documented in the
> >>>> series. Also, the IEEE paper mentions using QinQ when the packet has
> >>>> already a VLAN inserted, but it is not in the series. Is there a reason
> >>>> QinQ is not possible?
> >>>>
> >>>> IIUC, that means the TAP VM would see traffic having no VLAN,
> whereas
> >> it
> >>>> has in reality?
> >>>>
> >>>
> >>> It does support 802.1q (single VLAN). I mis-understood your last email.
> >>> This design assumes there is no PKT_TX_VLAN_PKT flag set on the
> source
> >> mbuf.
> >>
> >> I understood that assumption, and it seems OVS does not use VLAN
> offload
> >> currently so it should be OK (even though I would suggest adding a check
> >> that the flag isn't already set not to mess up with the regular traffic
> >> if one day it is added).
> >>
> >> My question was:
> >> If you packet already has a VLAN tag *already* inserted (i.e. not to be
> >> inserted), won't the mirrored packet lose that information? I don't
> >> expect QinQ will be setup by the VF driver as
> >>
> >
> > With this design the mirrored traffic won't lose VLAN tag.
> > For single VLAN packets, this design will add an outer VLAN tag.
> > The outer VLAN can be stripped if the destination VNF (vProbe) turn on
> VLAN stripping.
> > The existing DPDK library does support QinQ offload. BTW, this change is
> not visible in
> > source VNF (no need to change source VNF setting) because VLAN
> insertion
> > happens at mirror tunnel device.
> 
> OK, I would have thought that setting PKT_TX_QINQ_PKT would be needed
> to
> achieve that and also ensuring that the mirror port supports
> DEV_TX_OFFLOAD_QINQ_INSERT.
> 
> >
> >>> This assumption is based upon that VIRTIO spec. doesn't support
> >>> per-packet VLAN insertion offloading.
> >>
> >> I am not sure to understand this comment, but for Vhost PMD we do the
> >> VLAN insertion in SW:
> >>
> https://elixir.bootlin.com/dpdk/latest/source/drivers/net/vhost/rte_eth_vh
> >> ost.c#L447
> >>
> >
> >> I think this VLAN insertion in Vhost PMD is problematic twice with this
> >> series:
> >> 1. The offloaded VLAN insertion would be lost
> >> 2. rte_vlan_insert() will fail as it needs the mbuf refcount to be 1. It
> >> means the packet would be mirrored but would never reach its expected
> >> destination as dropped by the Vhost PMD.
> >>
> >
> > I see your pointes now. I agree with your assessment. We could either
> > 1.  support no Vhost PMD, or 2. check the offload flag at the run-time.
> 
> For 1, how would you differentiate between a Vhost PMD port and any
> other PMD? Also, I think Vhost is not the only PMD doing VLAN insert in
> SW.
> 
> I'm not sure to understand what you mean by checking the offload flag at
> runtime. What would you do if the flag is set? Only solution I see would
> be to do the insert in SW before doing the mirroring, but it would mean
> an overhead for packets transmitted to a device that supports VLAN
> offloading in HW.
> 
Below is the new change to the "software tapping" path. It uses a pre-allocated VLAN buffer to save
the VLAN ID of each packet that already has PKT_TX_VLAN_PKT set in the source traffic, and restores
the overwritten metadata (vlan_tci) after transmission when necessary.
...
    memset(data->tag_buf, 0, nb_pkts * sizeof(uint16_t));
    for (i = 0; i < nb_pkts; i++) {
        if (unlikely(pkts[i]->ol_flags & PKT_TX_VLAN_PKT)) {
            data->tag_buf[i] = pkts[i]->vlan_tci;
        }
        pkts[i]->ol_flags |= PKT_TX_VLAN_PKT;
        pkts[i]->vlan_tci = dst_vlan_id;
        rte_mbuf_refcnt_update(pkts[i], 1);
    }

    dst_qidx = data->queue_map[qidx];

    rte_spinlock_lock(&data->locks[dst_qidx]);
    pkt_trans = rte_eth_tx_burst(dst_port_id, dst_qidx, pkts, nb_pkts);
    rte_spinlock_unlock(&data->locks[dst_qidx]);

    for (i = 0; i < nb_pkts; i++) {
        if (unlikely(data->tag_buf[i])) {
            pkts[i]->vlan_tci = data->tag_buf[i];
        } else {
            pkts[i]->ol_flags &= ~PKT_TX_VLAN_PKT;
            pkts[i]->vlan_tci = 0;
        }
    }

> >
> >>>
> >>>>>>> +            pkt_buf[match_count]->vlan_tci = dst_vlan_id;
> >>>>>>> +            rte_mbuf_refcnt_update(pkt_buf[match_count], 1);
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>> +            match_count++;
> >>>>>>> +        }
> >>>>>>> +    }
> >>>>>>> +
> >>>>>>> +    dst_qidx = (data->n_dst_queue > qidx)?qidx:(data-
> >n_dst_queue
> >> -
> >>>> 1);
> >>>>>>
> >>>>>> Wouldn't it scale better with:
> >>>>>> dst_qidx = qidx % data->n_dst_queue
> >>>>>> ?
> >>>>>>
> >>>>>>> We tried to avoid using "%" operator. We could add "unlikely" and
> the
> >>>> suggested "%" to make improvement
> >>>>
> >>>> Not sure adding 'unlikely' is really necessary. The cost of the modulo
> >>>> operation is nothing compared to all we do in this path.
> >>>>
> >>>>>>>
> >>>>>>> +
> >>>>>>> +    rte_spinlock_lock(&data->locks[dst_qidx]);
> >>>>>>> +    pkt_trans = rte_eth_tx_burst(dst_port_id, dst_qidx, pkt_buf,
> >>>>>> match_count);
> >>>>>>> +    rte_spinlock_unlock(&data->locks[dst_qidx]);
> >>>>>>> +
> >>>>>>> +    for (i = 0; i < match_count; i++) {
> >>>>>>> +        pkt_buf[i]->ol_flags &= ~PKT_TX_VLAN_PKT;
> >>>>>>> +    }
> >>>>>>
> >>>>>> In order to further reduce the performance impact of mirroring,
> have
> >> you
> >>>>>> envisaged to offload it to dedicated PMD threads?
> >>>>>>
> >>>>>>> The mirror-tunnel design is a comprised approach between
> hardware
> >>>> TAP and software TAP.
> >>>>>>> The tunnel itself is designed to have very little impact on source
> traffic
> >>>> processing core. From
> >>>>>>> our benchmark on SR-IOV, L2 forwarding, mirroring, we only
> observed
> >>>> 10-20% impact on 64-byte packets, and
> >>>>>>> we did not observe impact when running traffic with packet size
> with
> >>>> 128-byte or above.
> >>>>
> >>>> The cover letter mentions 20-30% for 64B packets, and the IEEE paper
> >>>> seems to indicate a significant impact up to 512B packets (~25%?).
> >>>>
> >>>> Maybe the workload is different in these tests, that could explain why
> >>>> you don't see an impact starting 128B?
> >>>>
> >>>
> >>> I believe the 20-30% drop is for VIRTIO port mirroring not SR-IOV
> mirroring.
> >>
> >> What you mean by SRIOV mirroring is ingress traffic on one VF is
> >> mirrored to another VF? Cannot it be done directly in HW?
> >>
> >
> > The SR-IOV performance number provided here because of prior ask for
> additional
> > PMD core for scaling.
> 
> OK
> 
> >> Regarding Virtio port mirroring, we can see in the benchmark results you
> >> presented at OVSCon 20 a significant impact for ingress mirroring for
> >> all packet size with your VLAN offload solution. I agree this is much
> >> better than with current mirroring solution, but there should be room
> >> for improvement.
> >>
> >>> The 512B packet (~25%?) is for default VIRTIO mirroring (OVS default
> >>> Implementation) not this design.
> >>
> >> No, this is with your mirroring solution. See the green bar in the
> >> benchmark results in the paper.
> >>
> >
> > I see the disparity here. The 25% is from the 1st IEEE paper where no mirror
> tunnel
> > mechanism is devised and there is no visible degradation on 512B packets
> over
> > our 2nd IEEE paper (expected to be published on June,
> https://im2021.ieee-im.org/)
> 
> Now; I'm curious about what has changed in the design between the two
> papers that would explain that? :) Because what the first paper presents
> seems quite similar to what is implemented here.
> 

The major difference between this submission and the previous one (RFC) is
the new mirror tunnel apparatus. With the mirror tunnel design:
1. The source traffic device can use the vector PMD instead of the scalar
PMD, because VLAN insertion offloading is turned on only on the mirror
tunnel device and not on the source device.
2. The mirror tunnel is a dedicated device used only for transmitting
mirrored traffic (Tx only). We therefore enable Tx loopback on the
mirror tunnel device, which saves PCIe bandwidth and improves throughput
when benchmarking with high-throughput traffic.

> >>>>>>>
> >>>>>>> +
> >>>>>>> +    while (unlikely (pkt_trans < match_count)) {
> >>>>>>> +        rte_pktmbuf_free(pkt_buf[pkt_trans]);
> >>>>>>> +        pkt_trans++;
> >>>>>>> +    }
> >>>>>>> +
> >>>>>>> +    return nb_pkts;
> >>>>>>> +}
> >>>>>>> +
> >>>>
> >>>> ...
> >>>>
> >>>>>>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> >>>>>>> index 4597a215d..fd2049a7f 100644
> >>>>>>> --- a/vswitchd/vswitch.xml
> >>>>>>> +++ b/vswitchd/vswitch.xml
> >>>>>>> @@ -4869,11 +4869,35 @@ ovs-vsctl add-port br0 p0 -- set
> Interface
> >> p0
> >>>>>> type=patch options:peer=p1 \
> >>>>>>>          selected VLANs.
> >>>>>>>        </p>
> >>>>>>>
> >>>>>>> +      <column name="mirror_tunnel_addr">
> >>>>>>> +        BDF string of the tunnel device on which mirrored traffic will
> be
> >>>>>>> +        transmitted.
> >>>>>>> +      </column>
> >>>>>>> +
> >>>>>>>        <column name="select_all">
> >>>>>>>          If true, every packet arriving or departing on any port is
> >>>>>>>          selected for mirroring.
> >>>>>>>        </column>
> >>>>>>>
> >>>>>>> +      <column name="mirror_offload">
> >>>>>>> +        If true, a hw-assisted port mirroring is configured instead
> >>>>>>> +        default mirroring.
> >>>>>>> +      </column>
> >>>>>>> +
> >>>>>>> +      <column name="flow_src_mac">
> >>>>>>> +        The source MAC address(es) for per-flow mirroring. Each MAC
> >>>>>>> +        address is separate by ','. This parametr is paired with
> >>>>>>> +        select_dst_port. A '0' MAC address indicates the requested
> >> mirror
> >>>>>>> +        is a per-port mirroring, otherwise it's a per-flow mirroring
> >>>>>>> +      </column>
> >>>>>>> +
> >>>>>>> +      <column name="flow_dst_mac">
> >>>>>>> +        The destination MAC address(es) for per-flow mirroring. Each
> >> MAC
> >>>>>>> +        address is separate by ','. This parametr is paired with
> >>>>>>> +        select_src_port. A '0' MAC address indicates the requested
> >> mirror
> >>>>>>> +        is a per-port mirroring, otherwise it's a per-flow mirroring
> >>>>>>> +      </column>
> >>>>>>> +
> >>>>>>>        <column name="select_dst_port">
> >>>>>>>          Ports on which departing packets are selected for mirroring.
> >>>>>>>        </column>
> >>>>>>> @@ -4955,6 +4979,32 @@ ovs-vsctl add-port br0 p0 -- set Interface
> p0
> >>>>>> type=patch options:peer=p1 \
> >>>>>>>          </p>
> >>>>>>>        </column>
> >>>>>>>
> >>>>>>> +      <column name="output_src_vlan">
> >>>>>>> +        <p>Output VLAN for selected source port packets, if
> >>>> nonempty.</p>
> >>>>>>> +        <p>
> >>>>>>> +          <em>Please note:</em> This is different than
> >>>>>>> +          <ref column="output-vlan"/> This vlan is used to add an
> >> additional
> >>>>>>> +          vlan tag on the mirror traffic, regardless it contains vlan or
> not.
> >>>>>>> +          The receive end could choose to filter out this additional vlan.
> >>>>>>> +          This option is provided so the mirrored traffic could maintain
> its
> >>>>>>> +          original vlan informaiton, and this mirror can be used to filter
> >>>>>>> +          out un-wanted traffic such as in <ref
> >> column="mirror_offload"/>.
> >>>>>>> +        </p>
> >>>>>>> +      </column>
> >>>>>>> +
> >>>>>>> +      <column name="output_dst_vlan">
> >>>>>>> +        <p>Output VLAN for selected destination port packets, if
> >>>>>> nonempty.</p>
> >>>>>>> +        <p>
> >>>>>>> +          <em>Please note:</em> This is different than
> >>>>>>> +          <ref column="output-vlan"/> This vlan is used to add an
> >> additional
> >>>>>>> +          vlan tag on the mirror traffic, regardless it contains vlan or
> not.
> >>>>>>> +          The receive end could choose to filter out this additional vlan.
> >>>>>>> +          This option is provided so the mirrored traffic could maintain
> its
> >>>>>>> +          original vlan informaiton, and this mirror cab be used to filter
> >>>>>>> +          out un-wanted traffic such as in <ref
> >> column="mirror_offload"/>.
> >>>>>>> +        </p>
> >>>>>>> +      </column>
> >>>>>>> +
> >>>>>>>        <column name="snaplen">
> >>>>>>>          <p>Maximum per-packet number of bytes to mirror.</p>
> >>>>>>>          <p>A mirrored packet with size larger than <ref
> >>>> column="snaplen"/>
> >>>>>>>
> >>>>>
> >>>>
> >>>> These parameters are DPDK specific, but nothing mentions it. It would
> >>>> confuse the OVS-Kernel users.
> >>>>
> >>>
> >>> The current implementation only supports OVS-DPDK.
> >>
> >> Yes, that is my point. It is not mentioned in the documentation that it
> >> is DPDK specific, and we should not expect the user to dive into the
> >> code to understand it is not supported with OVS Kernel.
> >>
> >
> > Got it. Will add this caveat in the document. Thanks.
> 
> Thanks!
> Maxime
Gaetan Rivet May 23, 2021, 11:26 p.m. UTC | #10
On Wed, May 19, 2021, at 09:55, Maxime Coquelin wrote:
> Hi Liang-min,
> 
> When replying inline, please do not prefix with ">>" as it is handled as
> quoted text. There is no need to prefix.
> 
> On 5/18/21 8:00 PM, Wang, Liang-min wrote:
> >> -----Original Message-----
> >> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> >> Sent: Tuesday, May 18, 2021 12:15 PM
> >> To: Miskell, Timothy <timothy.miskell@intel.com>; dev@openvswitch.org
> >> Cc: Wang, Liang-min <liang-min.wang@intel.com>
> >> Subject: Re: [PATCH] Extends the existing mirror configuration parameters
> >>
> [...]
> >>> +            pkt_buf[match_count]->vlan_tci = dst_vlan_id;
> >>> +            rte_mbuf_refcnt_update(pkt_buf[match_count], 1);
> >>
> >>
> >>
> >>> +            match_count++;
> >>> +        }
> >>> +    }
> >>> +
> >>> +    dst_qidx = (data->n_dst_queue > qidx)?qidx:(data->n_dst_queue -1);
> >>
> >> Wouldn't it scale better with:
> >> dst_qidx = qidx % data->n_dst_queue
> >> ?
> >>
> >>> We tried to avoid using "%" operator. We could add "unlikely" and the suggested "%" to make improvement
> 
> Not sure adding 'unlikely' is really necessary. The cost of the modulo
> operation is nothing compared to all we do in this path.
> 

Hi,

Although the modulo might well be nothing compared to the rest,
an alternative is to use Lemire's fastrange: https://github.com/lemire/fastrange
Here is the uint32_t version:

/*
* Given a value "word", produces an integer in [0,p) without division.
* The function is as fair as possible in the sense that if you iterate
* through all possible values of "word", then you will generate all
* possible outputs as uniformly as possible.
*/
static inline uint32_t
fastrange32(uint32_t word, uint32_t p) {
    return (uint32_t)(((uint64_t)word * (uint64_t)p) >> 32);
}

It should be as fair as the modulo.
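
One caveat when applying this to the queue mapping discussed above: fastrange
is fair when "word" is spread over the whole 32-bit range (a hash, for
example). A raw queue index is a small integer, so word * p never reaches 2^32
and every index lands on queue 0. A minimal sketch that shows the difference
next to the plain modulo follows; sketch_queue_mod() is a hypothetical helper
name used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Lemire's fastrange, as quoted above: maps word into [0, p). */
static inline uint32_t
fastrange32(uint32_t word, uint32_t p)
{
    return (uint32_t)(((uint64_t)word * (uint64_t)p) >> 32);
}

/* Hypothetical helper: the modulo mapping suggested earlier in the thread.
 * It works directly on small indices such as a queue id. */
static inline uint16_t
sketch_queue_mod(uint16_t qidx, uint16_t n_dst_queue)
{
    assert(n_dst_queue != 0);
    return (uint16_t)(qidx % n_dst_queue);
}

/* Count how many distinct destination queues the source queue ids
 * 0..n_src-1 reach through fastrange32 when used without hashing.
 * n_dst must be at most 64 so that one bit per queue fits in "seen". */
static inline unsigned
sketch_fastrange_queues_hit(uint16_t n_src, uint16_t n_dst)
{
    uint64_t seen = 0;
    unsigned count = 0;

    assert(n_dst != 0 && n_dst <= 64);
    for (uint16_t q = 0; q < n_src; q++) {
        seen |= 1ULL << fastrange32(q, n_dst);
    }
    for (; seen != 0; seen >>= 1) {
        count += (unsigned)(seen & 1);
    }
    return count;
}
```

With 8 source queues and 4 destination queues, the modulo spreads the indices
over all 4 destinations, while unhashed fastrange32 reaches only one. The queue
index would need to be hashed first for fastrange to be a drop-in replacement.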

Kind regards,
Maxime Coquelin May 25, 2021, 8:28 a.m. UTC | #11
Hi Gaetan,

On 5/24/21 1:26 AM, Gaëtan Rivet wrote:
> On Wed, May 19, 2021, at 09:55, Maxime Coquelin wrote:
>> Hi Liang-min,
>>
>> When replying inline, please do not prefix with ">>" as it is handled as
>> quoted text. There is no need to prefix.
>>
>> On 5/18/21 8:00 PM, Wang, Liang-min wrote:
>>>> -----Original Message-----
>>>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>> Sent: Tuesday, May 18, 2021 12:15 PM
>>>> To: Miskell, Timothy <timothy.miskell@intel.com>; dev@openvswitch.org
>>>> Cc: Wang, Liang-min <liang-min.wang@intel.com>
>>>> Subject: Re: [PATCH] Extends the existing mirror configuration parameters
>>>>
>> [...]
>>>>> +            pkt_buf[match_count]->vlan_tci = dst_vlan_id;
>>>>> +            rte_mbuf_refcnt_update(pkt_buf[match_count], 1);
>>>>
>>>>
>>>>
>>>>> +            match_count++;
>>>>> +        }
>>>>> +    }
>>>>> +
>>>>> +    dst_qidx = (data->n_dst_queue > qidx)?qidx:(data->n_dst_queue -1);
>>>>
>>>> Wouldn't it scale better with:
>>>> dst_qidx = qidx % data->n_dst_queue
>>>> ?
>>>>
>>>>> We tried to avoid using "%" operator. We could add "unlikely" and the suggested "%" to make improvement
>>
>> Not sure adding 'unlikely' is really necessary. The cost of the modulo
>> operation is nothing compared to all we do in this path.
>>
> 
> Hi,
> 
> Although the modulo might well be nothing compared to the rest,
> an alternative is to use Lemire's fastrange: https://github.com/lemire/fastrange
> Here is the uint32_t version:
> 
> /*
> * Given a value "word", produces an integer in [0,p) without division.
> * The function is as fair as possible in the sense that if you iterate
> * through all possible values of "word", then you will generate all
> * possible outputs as uniformly as possible.
> */
> static inline uint32_t
> fastrange32(uint32_t word, uint32_t p) {
>     return (uint32_t)(((uint64_t)word * (uint64_t)p) >> 32);
> }
> 
> It should be as fair as the modulo.

Interesting; note that modulo is already used to distribute packets across Vhost Tx queues:
https://github.com/openvswitch/ovs/blob/13c0eaa7b4fc2694a8c6cc8e6487ec6538c607e4/lib/netdev-dpdk.c#L2601

Maxime

> Kind regards,
>

Patch

diff --git a/lib/automake.mk b/lib/automake.mk
index 39901bd6d..dcafbfaca 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -170,6 +170,7 @@  lib_libopenvswitch_la_SOURCES = \
 	lib/multipath.h \
 	lib/namemap.c \
 	lib/netdev-dpdk.h \
+	lib/netdev-dpdk-mirror.h \
 	lib/netdev-dummy.c \
 	lib/netdev-offload.c \
 	lib/netdev-offload.h \
@@ -460,6 +461,7 @@  if DPDK_NETDEV
 lib_libopenvswitch_la_SOURCES += \
 	lib/dpdk.c \
 	lib/netdev-dpdk.c \
+	lib/netdev-dpdk-mirror.c \
 	lib/netdev-offload-dpdk.c
 else
 lib_libopenvswitch_la_SOURCES += \
diff --git a/lib/netdev-dpdk-mirror.c b/lib/netdev-dpdk-mirror.c
new file mode 100644
index 000000000..ff2701660
--- /dev/null
+++ b/lib/netdev-dpdk-mirror.c
@@ -0,0 +1,516 @@ 
+/*
+ * Copyright (c) 2014, 2015, 2016, 2017 Nicira, Inc.
+ * Copyright (c) 2019 Mellanox Technologies, Ltd.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+#include <config.h>
+#include <rte_ethdev.h>
+
+#include "netdev-dpdk-mirror.h"
+#include "openvswitch/vlog.h"
+#include "openvswitch/dynamic-string.h"
+#include "util.h"
+
+#define MAC_ADDR_MAP           0x0000FFFFFFFFFFFFULL
+#define is_mac_addr_match(a,b) (((a^b)&MAC_ADDR_MAP) == 0)
+#define INIT_MIRROR_DB_SIZE    8
+#define INVALID_DEVICE_ID      0xFFFFFFFF
+
+VLOG_DEFINE_THIS_MODULE(netdev_dpdk_mirror);
+
+/* port/flow mirror database management routines */
+/*
+ * The below API is for port/flow mirror offloading which uses a different DPDK
+ * interface as rte-flow.
+ */
+static int mirror_port_db_size = 0;
+static int mirror_port_used = 0;
+static struct mirror_offload_port *mirror_port_db = NULL;
+
+static void
+netdev_mirror_db_init(struct mirror_offload_port *db, int size)
+{
+    int i;
+
+    for (i = 0; i < size; i++) {
+        db[i].dev_id = INVALID_DEVICE_ID;
+        memset(&db[i].rx, 0, sizeof(struct mirror_param));
+        memset(&db[i].tx, 0, sizeof(struct mirror_param));
+    }
+}
+
+/* Double the db size when it runs out of space */
+static int
+netdev_mirror_db_resize(void)
+{
+    int new_size = mirror_port_db_size << 1;
+    struct mirror_offload_port *new_db = xmalloc(
+        sizeof(struct mirror_offload_port)*new_size);
+
+    memcpy(new_db, mirror_port_db, sizeof(struct mirror_offload_port)
+        *mirror_port_db_size);
+    netdev_mirror_db_init(&new_db[mirror_port_db_size], mirror_port_db_size);
+    mirror_port_db_size = new_size;
+    mirror_port_db = new_db;
+
+    return 0;
+}
+
+
+static struct mirror_offload_port*
+netdev_mirror_data_find(uint32_t dev_id)
+{
+    int i;
+
+    if (mirror_port_db == NULL) {
+        return NULL;
+    }
+
+    for (i = 0; i < mirror_port_db_size; i++) {
+        if (dev_id == mirror_port_db[i].dev_id) {
+            return &mirror_port_db[i];
+        }
+    }
+    return NULL;
+}
+
+static struct mirror_offload_port*
+netdev_mirror_data_add(uint32_t dev_id, int tx,
+    struct mirror_param *new_param)
+{
+    struct mirror_offload_port *target = NULL;
+    int i;
+
+    if (!mirror_port_db) {
+        mirror_port_db_size = INIT_MIRROR_DB_SIZE;
+        mirror_port_db = xmalloc(sizeof(struct mirror_offload_port)*
+            mirror_port_db_size);
+        netdev_mirror_db_init(mirror_port_db, mirror_port_db_size);
+    }
+    target = netdev_mirror_data_find(dev_id);
+    if (target) {
+        if (tx) {
+            if (target->tx.mirror_cb) {
+                VLOG_ERR("Attempt to add ingress mirror offloading"
+                    " on port, %d, while one is outstanding\n", dev_id);
+                return target;
+            }
+
+            memcpy(&target->tx, new_param, sizeof(*new_param));
+        } else {
+            if (target->rx.mirror_cb) {
+                VLOG_ERR("Attempt to add egress mirror offloading"
+                    " on port, %d, while one is outstanding\n", dev_id);
+                return target;
+            }
+
+            memcpy(&target->rx, new_param, sizeof(struct mirror_param));
+        }
+    } else {
+        struct mirror_param *param;
+        /* find an unused spot on db */
+        for (i = 0; i < mirror_port_db_size; i++) {
+            if (mirror_port_db[i].dev_id == INVALID_DEVICE_ID) {
+                break;
+            }
+        }
+        if (i == mirror_port_db_size && netdev_mirror_db_resize()) {
+                return NULL;
+        }
+
+        param = tx ? &mirror_port_db[i].tx : &mirror_port_db[i].rx;
+        memcpy(param, new_param, sizeof(struct mirror_param));
+
+        target = &mirror_port_db[i];
+        target->dev_id = dev_id;
+        mirror_port_used ++;
+    }
+    return target;
+}
+
+static void
+netdev_mirror_data_remove(uint32_t dev_id, int tx) {
+    struct mirror_offload_port *target = netdev_mirror_data_find(dev_id);
+
+    if (!target) {
+        VLOG_ERR("Attempt to remove unsaved port, %d, %s callback\n",
+        dev_id, tx?"tx": "rx");
+    }
+
+    if (tx) {
+        memset(&target->tx, 0, sizeof(struct mirror_param));
+    } else {
+        memset(&target->rx, 0, sizeof(struct mirror_param));
+    }
+
+    if ((target->rx.mirror_cb == NULL) &&
+        (target->tx.mirror_cb == NULL)) {
+        target->dev_id = INVALID_DEVICE_ID;
+        mirror_port_used --;
+        /* release port mirror db memory when there
+         * is no outstanding port mirror offloading
+         * configuration
+         */
+        if (mirror_port_used == 0) {
+            free(mirror_port_db);
+            mirror_port_db = NULL;
+            mirror_port_db_size = 0;
+        }
+    }
+}
+
+void
+netdev_mirror_data_proc(uint32_t dev_id, mirror_data_op op,
+    int tx, struct mirror_param *in_param,
+    struct mirror_offload_port **out_param)
+{
+    switch (op) {
+    case mirror_data_find:
+        *out_param = netdev_mirror_data_find(dev_id);
+        break;
+    case mirror_data_add:
+        *out_param = netdev_mirror_data_add(dev_id, tx, in_param);
+        break;
+    case mirror_data_rem:
+        netdev_mirror_data_remove(dev_id, tx);
+        break;
+    }
+}
+
+/* port/flow mirror traffic processors */
+static inline uint16_t
+netdev_custom_mirror_offload_cb(uint16_t qidx, struct rte_mbuf **pkts,
+    uint16_t nb_pkts, void *user_params)
+{
+    struct mirror_param *data = user_params;
+    uint16_t i, dst_qidx, match_count = 0;
+    uint16_t pkt_trans;
+    uint16_t dst_port_id = data->dst_port_id;
+    uint16_t dst_vlan_id = data->dst_vlan_id;
+    struct rte_mbuf **pkt_buf = &data->pkt_buf[qidx * data->max_burst_size];
+
+    if (nb_pkts == 0) {
+        return 0;
+    }
+
+    if (nb_pkts > data->max_burst_size) {
+        VLOG_ERR("Per-flow batch size, %d, exceeds maximum limit\n", nb_pkts);
+        return 0;
+    }
+
+    for (i = 0; i < nb_pkts; i++) {
+        if (data->custom_scan(pkts[i], user_params)) {
+            pkt_buf[match_count] = pkts[i];
+            pkt_buf[match_count]->ol_flags |= PKT_TX_VLAN_PKT;
+            pkt_buf[match_count]->vlan_tci = dst_vlan_id;
+            rte_mbuf_refcnt_update(pkt_buf[match_count], 1);
+            match_count++;
+        }
+    }
+
+    dst_qidx = (data->n_dst_queue > qidx)?qidx:(data->n_dst_queue -1);
+
+    rte_spinlock_lock(&data->locks[dst_qidx]);
+    pkt_trans = rte_eth_tx_burst(dst_port_id, dst_qidx, pkt_buf, match_count);
+    rte_spinlock_unlock(&data->locks[dst_qidx]);
+
+    for (i = 0; i < match_count; i++) {
+        pkt_buf[i]->ol_flags &= ~PKT_TX_VLAN_PKT;
+    }
+
+    while (unlikely (pkt_trans < match_count)) {
+        rte_pktmbuf_free(pkt_buf[pkt_trans]);
+        pkt_trans++;
+    }
+
+    return nb_pkts;
+}
+
+static inline uint16_t
+netdev_flow_mirror_offload_cb(uint16_t qidx, struct rte_mbuf **pkts,
+    uint16_t nb_pkts, void *user_params, uint32_t offset)
+{
+    struct mirror_param *data = user_params;
+    uint16_t i, dst_qidx, match_count = 0;
+    uint16_t pkt_trans;
+    uint16_t dst_port_id = data->dst_port_id;
+    uint16_t dst_vlan_id = data->dst_vlan_id;
+    uint64_t target_addr = *(uint64_t *) data->extra_data;
+    struct rte_mbuf **pkt_buf = &data->pkt_buf[qidx * data->max_burst_size];
+
+    if (nb_pkts == 0) {
+        return 0;
+    }
+
+    if (nb_pkts > data->max_burst_size) {
+        VLOG_ERR("Per-flow batch size, %d, exceeds maximum limit\n", nb_pkts);
+        return 0;
+    }
+
+    for (i = 0; i < nb_pkts; i++) {
+        uint64_t *dst_mac_addr =
+            rte_pktmbuf_mtod_offset(pkts[i], void *, offset);
+        if (is_mac_addr_match(target_addr, (*dst_mac_addr))) {
+            pkt_buf[match_count] = pkts[i];
+            pkt_buf[match_count]->ol_flags |= PKT_TX_VLAN_PKT;
+            pkt_buf[match_count]->vlan_tci = dst_vlan_id;
+            rte_mbuf_refcnt_update(pkt_buf[match_count], 1);
+            match_count ++;
+        }
+    }
+
+    dst_qidx = (data->n_dst_queue > qidx) ? qidx : (data->n_dst_queue -1);
+
+    rte_spinlock_lock(&data->locks[dst_qidx]);
+    pkt_trans = rte_eth_tx_burst(dst_port_id, dst_qidx, pkt_buf, match_count);
+    rte_spinlock_unlock(&data->locks[dst_qidx]);
+
+    for (i = 0; i < match_count; i++) {
+        pkt_buf[i]->ol_flags &= ~PKT_TX_VLAN_PKT;
+    }
+
+    while (unlikely (pkt_trans < match_count)) {
+        rte_pktmbuf_free(pkt_buf[pkt_trans]);
+        pkt_trans++;
+    }
+
+    return nb_pkts;
+}
+
+static inline uint16_t
+netdev_port_mirror_offload_cb(uint16_t qidx, struct rte_mbuf **pkts,
+    uint16_t nb_pkts, void *user_params)
+{
+    struct mirror_param *data = user_params;
+    uint16_t i, dst_qidx;
+    uint16_t pkt_trans;
+    uint16_t dst_port_id = data->dst_port_id;
+    uint16_t dst_vlan_id = data->dst_vlan_id;
+
+    if (nb_pkts == 0) {
+        return 0;
+    }
+
+    for (i = 0; i < nb_pkts; i++) {
+        pkts[i]->ol_flags |= PKT_TX_VLAN_PKT;
+        pkts[i]->vlan_tci = dst_vlan_id;
+        rte_mbuf_refcnt_update(pkts[i], 1);
+    }
+
+    dst_qidx = (data->n_dst_queue > qidx) ? qidx : (data->n_dst_queue - 1);
+
+    rte_spinlock_lock(&data->locks[dst_qidx]);
+    pkt_trans = rte_eth_tx_burst(dst_port_id, dst_qidx, pkts, nb_pkts);
+    rte_spinlock_unlock(&data->locks[dst_qidx]);
+
+    for (i = 0; i < nb_pkts; i++) {
+        pkts[i]->ol_flags &= ~PKT_TX_VLAN_PKT;
+    }
+
+    while (unlikely (pkt_trans < nb_pkts)) {
+        rte_pktmbuf_free(pkts[pkt_trans]);
+        pkt_trans++;
+    }
+
+    return nb_pkts;
+}
+
+static inline uint16_t
+netdev_rx_custom_mirror_offload_cb(uint16_t port_id OVS_UNUSED,
+    uint16_t qidx, struct rte_mbuf **pkts, uint16_t nb_pkts,
+    uint16_t maxi_pkts OVS_UNUSED, void *user_params)
+{
+    return netdev_custom_mirror_offload_cb(qidx, pkts, nb_pkts, user_params);
+}
+
+static inline uint16_t
+netdev_tx_custom_mirror_offload_cb(uint16_t port_id OVS_UNUSED,
+    uint16_t qidx, struct rte_mbuf **pkts, uint16_t nb_pkts,
+    void *user_params)
+{
+    return netdev_custom_mirror_offload_cb(qidx, pkts, nb_pkts, user_params);
+}
+
+static inline uint16_t
+netdev_rx_flow_mirror_offload_cb(uint16_t port_id OVS_UNUSED,
+    uint16_t qidx, struct rte_mbuf **pkts, uint16_t nb_pkts,
+    uint16_t maxi_pkts OVS_UNUSED, void *user_params)
+{
+    return netdev_flow_mirror_offload_cb(qidx, pkts, nb_pkts, user_params, 0);
+}
+
+static inline uint16_t
+netdev_tx_flow_mirror_offload_cb(uint16_t port_id OVS_UNUSED,
+    uint16_t qidx, struct rte_mbuf **pkts, uint16_t nb_pkts,
+    void *user_params)
+{
+    return netdev_flow_mirror_offload_cb(qidx, pkts, nb_pkts, user_params, 6);
+}
+
+static inline uint16_t
+netdev_rx_port_mirror_offload_cb(uint16_t port_id OVS_UNUSED,
+    uint16_t qidx, struct rte_mbuf **pkts, uint16_t nb_pkts,
+    uint16_t max_pkts OVS_UNUSED, void *user_params)
+{
+    return netdev_port_mirror_offload_cb(qidx, pkts, nb_pkts, user_params);
+}
+
+static inline uint16_t
+netdev_tx_port_mirror_offload_cb(uint16_t port_id OVS_UNUSED,
+    uint16_t qidx, struct rte_mbuf **pkts, uint16_t nb_pkts,
+    void *user_params)
+{
+    return netdev_port_mirror_offload_cb(qidx, pkts, nb_pkts, user_params);
+}
+
+static rte_rx_callback_fn
+netdev_mirror_rx_cb(rte_mirror_type mirror_type)
+{
+    switch (mirror_type) {
+    case mirror_port:
+        return netdev_rx_port_mirror_offload_cb;
+    case mirror_flow_mac:
+        return netdev_rx_flow_mirror_offload_cb;
+    case mirror_flow_custom:
+        return netdev_rx_custom_mirror_offload_cb;
+    case mirror_invalid:
+        return NULL;
+    }
+    VLOG_ERR("Unsupported mirror type");
+    return NULL;
+}
+
+static rte_tx_callback_fn
+netdev_mirror_tx_cb(rte_mirror_type mirror_type)
+{
+    switch (mirror_type) {
+    case mirror_port:
+        return netdev_tx_port_mirror_offload_cb;
+    case mirror_flow_mac:
+        return netdev_tx_flow_mirror_offload_cb;
+        break;
+    case mirror_flow_custom:
+        return netdev_tx_custom_mirror_offload_cb;
+    case mirror_invalid:
+        return NULL;
+    }
+    VLOG_ERR("Unsupported mirror type");
+    return NULL;
+}
+
+void
+netdev_mirror_cb_set(struct mirror_param *data, uint16_t port_id,
+    int pmd_cb, int tx)
+{
+    unsigned int qid;
+
+    data->pkt_buf = NULL;
+    if (data->extra_data_size) {
+        data->pkt_buf = xmalloc(sizeof *data->pkt_buf * data->max_burst_size *
+            data->n_src_queue);
+    }
+
+    data->mirror_cb = xmalloc(sizeof *data->mirror_cb
+        * data->n_src_queue);
+    for (qid = 0; qid < data->n_src_queue; qid++) {
+        if (pmd_cb) {
+            if (tx) {
+                data->mirror_cb[qid].pmd = rte_eth_add_tx_callback(port_id,
+                    qid, netdev_mirror_tx_cb(data->mirror_type), data);
+            } else {
+                data->mirror_cb[qid].pmd = rte_eth_add_rx_callback(port_id,
+                    qid, netdev_mirror_rx_cb(data->mirror_type), data);
+            }
+        } else {
+            struct rte_eth_rxtx_callback *rxtx_cb =
+                xmalloc(sizeof(struct rte_eth_rxtx_callback));
+
+            data->mirror_cb[qid].direct = rxtx_cb;
+            rxtx_cb->next = NULL;
+            rxtx_cb->param = data;
+
+            if (tx) {
+                rxtx_cb->fn.tx = netdev_mirror_tx_cb(data->mirror_type);
+            } else {
+                rxtx_cb->fn.rx = netdev_mirror_rx_cb(data->mirror_type);
+            }
+        }
+    }
+}
+
+/* Port/flow mirroring device (port) register/unregister routines. */
+int
+netdev_eth_register_mirror(uint16_t src_port, struct mirror_param *param,
+    int tx_cb)
+{
+    struct mirror_offload_port *port_info = NULL;
+    struct mirror_param *data;
+
+    netdev_mirror_data_proc(src_port, mirror_data_add, tx_cb, param,
+        &port_info);
+    if (!port_info) {
+        return -1;
+    }
+
+    data = tx_cb ? &port_info->tx : &port_info->rx;
+    netdev_mirror_cb_set(data, src_port, 1, tx_cb);
+
+    return 0;
+}
+
+int
+netdev_eth_unregister_mirror(uint16_t src_port, int tx_cb)
+{
+    /* release both cb and pkt_buf */
+    unsigned int i;
+    struct mirror_offload_port *port_info = NULL;
+    struct mirror_param *data;
+
+    netdev_mirror_data_proc(src_port, mirror_data_find, tx_cb, NULL,
+        &port_info);
+    if (port_info == NULL) {
+        VLOG_ERR("Source port %d is not on outstanding port mirror db\n",
+            src_port);
+        return -1;
+    }
+    data = tx_cb ? &port_info->tx : &port_info->rx;
+
+    for (i = 0; i < data->n_src_queue; i++) {
+        if (data->mirror_cb[i].pmd) {
+            if (tx_cb) {
+                rte_eth_remove_tx_callback(src_port, i,
+                    data->mirror_cb[i].pmd);
+            } else {
+                rte_eth_remove_rx_callback(src_port, i,
+                    data->mirror_cb[i].pmd);
+            }
+        }
+        data->mirror_cb[i].pmd = NULL;
+    }
+    free(data->mirror_cb);
+
+    if (data->pkt_buf) {
+        free(data->pkt_buf);
+        data->pkt_buf = NULL;
+    }
+
+    if (data->extra_data) {
+        free(data->extra_data);
+        data->extra_data = NULL;
+        data->extra_data_size = 0;
+    }
+
+    netdev_mirror_data_proc(src_port, mirror_data_rem, tx_cb, NULL, NULL);
+    return 0;
+}
diff --git a/lib/netdev-dpdk-mirror.h b/lib/netdev-dpdk-mirror.h
new file mode 100644
index 000000000..ee4b933ba
--- /dev/null
+++ b/lib/netdev-dpdk-mirror.h
@@ -0,0 +1,83 @@ 
+/*
+ * Copyright (c) 2014, 2015, 2016 Nicira, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef NETDEV_DPDK_MIRROR_H
+#define NETDEV_DPDK_MIRROR_H
+
+#include "openvswitch/types.h"
+
+#ifdef  __cplusplus
+extern "C" {
+#endif
+
+typedef enum {
+    mirror_data_find, /* find an allocated mirror_param in the DB */
+    mirror_data_add, /* add a new mirror_param into the DB */
+    mirror_data_rem, /* remove a mirror_param from the DB */
+} mirror_data_op;
+
+typedef int (*rte_mirror_scan_fn)(struct rte_mbuf *pkt, void *user_param);
+typedef enum {
+    mirror_port, /* port mirror */
+    mirror_flow_mac, /* flow mirror according to a src or dst mac */
+    mirror_flow_custom,  /* flow mirror according to a callback scan */
+    mirror_invalid,      /* invalid mirror_type */
+} rte_mirror_type;
+
+typedef union {
+    const struct rte_eth_rxtx_callback *pmd;
+    struct rte_eth_rxtx_callback *direct;
+} mirror_fn_cb;
+
+struct mirror_param {
+    uint16_t dst_port_id;
+    uint16_t dst_vlan_id;
+    rte_spinlock_t *locks;
+    int n_src_queue;
+    int n_dst_queue;
+    struct rte_mbuf **pkt_buf;
+    mirror_fn_cb *mirror_cb;
+    unsigned int max_burst_size;
+    rte_mirror_scan_fn custom_scan;
+    rte_mirror_type mirror_type;
+    unsigned int extra_data_size;
+    void *extra_data; /* extra mirror parameter */
+};
+
+struct mirror_offload_port {
+    uint32_t dev_id;
+    struct mirror_param rx;
+    struct mirror_param tx;
+};
+
+bool netdev_port_started(uint16_t port_id, uint32_t *num_tx_queue);
+int netdev_get_portid_from_addr(const char *pci_addr_str, uint16_t *port_id);
+int netdev_tunnel_port_setup(uint16_t portid, uint32_t *num_queue);
+
+void netdev_mirror_data_proc(uint32_t dev_id, mirror_data_op op,
+    int tx, struct mirror_param *in_param,
+    struct mirror_offload_port **out_param);
+void netdev_mirror_cb_set(struct mirror_param *data, uint16_t port_id,
+    int pmd, int tx);
+int netdev_eth_register_mirror(uint16_t src_port,
+    struct mirror_param *param, int tx_cb);
+int netdev_eth_unregister_mirror(uint16_t src_port, int tx_cb);
+
+#ifdef  __cplusplus
+}
+#endif
+
+#endif /* netdev-dpdk-mirror.h */
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 9d8096668..eb6644333 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -48,6 +48,7 @@ 
 #include "fatal-signal.h"
 #include "if-notifier.h"
 #include "netdev-provider.h"
+#include "netdev-dpdk-mirror.h"
 #include "netdev-vport.h"
 #include "odp-util.h"
 #include "openvswitch/dynamic-string.h"
@@ -171,6 +172,16 @@  static const struct rte_eth_conf port_conf = {
     },
 };
 
+struct mirror_tunnel_port_info {
+    uint16_t port_id;
+    rte_spinlock_t *locks;
+    uint32_t share_count;
+    uint32_t num_queue;
+    bool port_started;
+    struct mirror_tunnel_port_info *next;
+};
+static struct mirror_tunnel_port_info *mirror_tunnel_head = NULL;
+
 /*
  * These callbacks allow virtio-net devices to be added to vhost ports when
  * configuration has been fully completed.
@@ -443,6 +454,8 @@  struct netdev_dpdk {
         };
         struct dpdk_tx_queue *tx_q;
         struct rte_eth_link link;
+        mirror_fn_cb *rx_cb; /* shared pointer */
+        mirror_fn_cb *tx_cb;
     );
 
     PADDED_MEMBERS_CACHELINE_MARKER(CACHE_LINE_SIZE, cacheline1,
@@ -2417,6 +2430,13 @@  netdev_dpdk_vhost_rxq_recv(struct netdev_rxq *rxq,
     nb_rx = rte_vhost_dequeue_burst(vid, qid, dev->dpdk_mp->mp,
                                     (struct rte_mbuf **) batch->packets,
                                     NETDEV_MAX_BURST);
+
+    if (dev->rx_cb && dev->rx_cb[rxq->queue_id].direct->fn.rx) {
+        dev->rx_cb[rxq->queue_id].direct->fn.rx((uint16_t) vid,
+            rxq->queue_id, (struct rte_mbuf **) batch->packets, nb_rx,
+            NETDEV_MAX_BURST, dev->rx_cb[rxq->queue_id].direct->param);
+    }
+
     if (!nb_rx) {
         return EAGAIN;
     }
@@ -2634,6 +2654,10 @@  __netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
         int vhost_qid = qid * VIRTIO_QNUM + VIRTIO_RXQ;
         unsigned int tx_pkts;
 
+        if (dev->tx_cb && dev->tx_cb[qid].direct->fn.tx) {
+            dev->tx_cb[qid].direct->fn.tx((uint16_t) vid, qid, cur_pkts, cnt,
+                dev->tx_cb[qid].direct->param);
+        }
         tx_pkts = rte_vhost_enqueue_burst(vid, vhost_qid, cur_pkts, cnt);
         if (OVS_LIKELY(tx_pkts)) {
             /* Packets have been sent.*/
@@ -5291,6 +5315,376 @@  netdev_dpdk_rte_flow_query_count(struct netdev *netdev,
     return ret;
 }
 
+/*
+ * Mirror tunnel device management routines.
+ * Mirror tunnel devices are devices reserved solely for
+ * traffic mirroring.
+ */
+static void
+netdev_dpdk_update_mt_list(struct mirror_tunnel_port_info *mt_port_info,
+                           bool add_port)
+{
+    struct mirror_tunnel_port_info *ptr = mirror_tunnel_head;
+
+    if (add_port) {
+        if (!ptr) {
+            mirror_tunnel_head = mt_port_info;
+            return;
+        }
+        while (ptr->next) {
+            ptr = ptr->next;
+        }
+        ptr->next = mt_port_info;
+    } else {
+        struct mirror_tunnel_port_info **pp = &mirror_tunnel_head;
+
+        /* Walk the 'next' links rather than the nodes so that removing
+         * the head needs no special case and an empty list is safe. */
+        while (*pp && (*pp)->port_id != mt_port_info->port_id) {
+            pp = &(*pp)->next;
+        }
+
+        if (*pp) {
+            struct mirror_tunnel_port_info *found = *pp;
+
+            *pp = found->next;
+            free(found);
+        } else {
+            VLOG_ERR("Fail to find mirror port (%d) info to remove",
+                     mt_port_info->port_id);
+        }
+    }
+}
+
+static struct mirror_tunnel_port_info*
+netdev_dpdk_get_mt_port_info(uint16_t port_id)
+{
+    struct mirror_tunnel_port_info *mt_port_info;
+
+    if (mirror_tunnel_head) {
+        mt_port_info = mirror_tunnel_head;
+        while (mt_port_info) {
+            if (mt_port_info->port_id == port_id) {
+                return mt_port_info;
+            }
+            mt_port_info = mt_port_info->next;
+        }
+        VLOG_ERR("Could not find tunnel port with port-id %d",
+            port_id);
+    }
+
+    mt_port_info = xmalloc(sizeof(struct mirror_tunnel_port_info));
+    memset(mt_port_info, 0, sizeof(*mt_port_info));
+    mt_port_info->port_id = port_id;
+    mt_port_info->next = NULL;
+
+    return mt_port_info;
+}
+
+static int
+netdev_dpdk_addr_to_portid(const char *pci_addr_str, uint16_t *port_id)
+{
+    struct rte_pci_device *pci_dev;
+    struct rte_pci_addr pci_addr;
+    int i;
+
+    if (rte_pci_addr_parse(pci_addr_str, &pci_addr)) {
+        VLOG_ERR("Incorrect pci address %s\n", pci_addr_str);
+        return -1;
+    }
+
+    for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
+        struct rte_pci_addr *eth_pci_addr;
+
+        if (!rte_eth_devices[i].device) {
+            continue;
+        }
+
+        pci_dev = RTE_ETH_DEV_TO_PCI(&rte_eth_devices[i]);
+        if (!pci_dev) {
+            continue;
+        }
+
+        eth_pci_addr = &pci_dev->addr;
+
+        if (pci_addr.bus == eth_pci_addr->bus &&
+            pci_addr.devid == eth_pci_addr->devid &&
+            pci_addr.domain == eth_pci_addr->domain &&
+            pci_addr.function == eth_pci_addr->function) {
+            *port_id = i;
+
+            return 0;
+        }
+    }
+
+    return -1;
+}
+
+static int
+netdev_dpdk_mt_open(uint16_t port_id, struct mirror_param *param)
+{
+    struct rte_eth_dev_info dev_info;
+    struct rte_eth_txconf txq_conf;
+    struct rte_eth_rxconf rxq_conf;
+    struct rte_mempool *pktbuf;
+
+    struct mirror_tunnel_port_info *mt_info;
+
+    uint16_t nb_rxd = NIC_PORT_DEFAULT_RXQ_SIZE;
+    uint16_t nb_txd = NIC_PORT_DEFAULT_TXQ_SIZE;
+    unsigned int i, num_queue;
+
+    struct rte_eth_conf mt_port_conf = {
+        .rxmode = {
+            .split_hdr_size = 0,
+        },
+        .txmode = {
+            .mq_mode = ETH_MQ_TX_NONE,
+        },
+    };
+
+    mt_info = netdev_dpdk_get_mt_port_info(port_id);
+    if (!mt_info) {
+        return -1;
+    }
+
+    if (mt_info->port_started) {
+        param->n_dst_queue = mt_info->num_queue;
+        param->dst_port_id = port_id;
+        param->locks = mt_info->locks;
+        mt_info->share_count++;
+
+        return 0;
+    }
+
+    rte_eth_dev_info_get(port_id, &dev_info);
+    num_queue = param->n_src_queue;
+
+    /* A tunnel device does not need its own mbufs: it serves as a
+     * hardware channel that transmits packets using mbufs provided by
+     * the source port.  The mbuf pool is created only because Rx queue
+     * setup, and therefore port initialization, requires one.
+     */
+    pktbuf = rte_pktmbuf_pool_create(
+            "tunnel-port",
+            (dev_info.rx_desc_lim.nb_max + dev_info.tx_desc_lim.nb_max),
+            RTE_MEMPOOL_CACHE_MAX_SIZE, 0, RTE_MBUF_DEFAULT_BUF_SIZE,
+            rte_eth_dev_socket_id(port_id));
+
+    mt_port_conf.txmode.offloads |= DEV_TX_OFFLOAD_VLAN_INSERT;
+    if (dev_info.tx_offload_capa & DEV_TX_OFFLOAD_MBUF_FAST_FREE) {
+        mt_port_conf.txmode.offloads |= DEV_TX_OFFLOAD_MBUF_FAST_FREE;
+    }
+    rte_eth_dev_configure(port_id, 1, num_queue, &mt_port_conf);
+
+    /* init one Rx queue */
+    rxq_conf = dev_info.default_rxconf;
+    rxq_conf.offloads = mt_port_conf.rxmode.offloads;
+    if (rte_eth_rx_queue_setup(port_id, 0, nb_rxd,
+        rte_eth_dev_socket_id(port_id), &rxq_conf, pktbuf) < 0) {
+        VLOG_ERR("fail to setup tunnel port (%d) rx-queue", port_id);
+        return -1;
+    }
+
+    /* init one Tx queue per source queue as part of mirror-tunnel setup */
+    txq_conf = dev_info.default_txconf;
+    txq_conf.offloads |= mt_port_conf.txmode.offloads;
+    for (i = 0; i < num_queue; i++) {
+        if (rte_eth_tx_queue_setup(port_id, i, nb_txd,
+            rte_eth_dev_socket_id(port_id), &txq_conf) < 0) {
+            VLOG_ERR("fail to setup tunnel port (%d) tx queue #%u",
+                port_id, i);
+            return -1;
+        }
+    }
+
+    if (rte_eth_dev_start(port_id) < 0) {
+        VLOG_ERR("fail to start tunnel port %d\n", port_id);
+        return -1;
+    }
+
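+    /* xmalloc() aborts on failure, so no NULL check is needed.  Use the
+     * local 'num_queue' here: 'mt_info->num_queue' is only assigned below
+     * and is still zero at this point. */
+    mt_info->locks = xmalloc(num_queue * sizeof *mt_info->locks);
+    for (i = 0; i < num_queue; i++) {
+        rte_spinlock_init(&mt_info->locks[i]);
+    }
+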
+    mt_info->share_count = 1;
+    mt_info->port_started = true;
+    mt_info->num_queue = num_queue;
+
+    param->n_dst_queue = mt_info->num_queue;
+    param->dst_port_id = port_id;
+    param->locks = mt_info->locks;
+
+    netdev_dpdk_update_mt_list(mt_info, true);
+    return 0;
+}
+
+static void
+netdev_dpdk_mt_close(uint16_t mirror_port_id)
+{
+    struct mirror_tunnel_port_info *mt_port_info =
+        netdev_dpdk_get_mt_port_info(mirror_port_id);
+
+    if (mt_port_info) {
+        mt_port_info->share_count--;
+        if (!mt_port_info->share_count) {
+            netdev_dpdk_update_mt_list(mt_port_info, false);
+            rte_eth_dev_stop(mirror_port_id);
+            rte_eth_dev_close(mirror_port_id);
+        }
+    }
+}
+
+/* vhost device mirror registration and un-registration routines */
+static int
+netdev_vhost_register_mirror(struct netdev_dpdk *dev,
+    struct mirror_param *param, int tx_cb)
+{
+    uint32_t vid = netdev_dpdk_get_vid(dev);
+    struct mirror_offload_port *port_info = NULL;
+    struct mirror_param *data;
+
+    netdev_mirror_data_proc(vid, mirror_data_add, tx_cb, param, &port_info);
+    if (!port_info) {
+        return -1;
+    }
+
+    data = tx_cb ? &port_info->tx : &port_info->rx;
+    netdev_mirror_cb_set(data, (uint16_t) vid, 0, tx_cb);
+
+    if (tx_cb) {
+        dev->tx_cb = data->mirror_cb;
+    } else {
+        dev->rx_cb = data->mirror_cb;
+    }
+
+    return 0;
+}
+
+static int
+netdev_vhost_unregister_mirror(struct netdev_dpdk *dev, int tx_cb)
+{
+    /* release both cb and pkt_buf */
+    unsigned int i;
+    uint32_t vid = netdev_dpdk_get_vid(dev);
+    struct mirror_offload_port *port_info = NULL;
+    struct mirror_param *data;
+
+    netdev_mirror_data_proc(vid, mirror_data_find, tx_cb, NULL, &port_info);
+    if (port_info == NULL) {
+        VLOG_ERR("Source port %d is not on outstanding port mirror db\n", vid);
+        return -1;
+    }
+    data = tx_cb ? &port_info->tx : &port_info->rx;
+
+    if (tx_cb) {
+        dev->tx_cb = NULL;
+    } else {
+        dev->rx_cb = NULL;
+    }
+
+    for (i = 0; i < data->n_src_queue; i++) {
+        free(data->mirror_cb[i].direct);
+    }
+
+    free(data->mirror_cb);
+
+    if (data->pkt_buf) {
+        free(data->pkt_buf);
+        data->pkt_buf = NULL;
+    }
+
+    if (data->extra_data) {
+        free(data->extra_data);
+        data->extra_data = NULL;
+        data->extra_data_size = 0;
+    }
+
+    netdev_mirror_data_proc(vid,  mirror_data_rem, tx_cb, NULL, NULL);
+    return 0;
+}
+
+static int
+netdev_dpdk_mirror_offload(struct netdev *src, struct eth_addr *flow_addr,
+                           uint16_t vlan_id, char *mirror_tunnel_addr,
+                           bool add_mirror, bool tx_cb) {
+    struct netdev_dpdk *src_dev = netdev_dpdk_cast(src);
+    bool eth_dev = src_dev->type == DPDK_DEV_ETH;
+    uint16_t mirror_port_id;
+    int status = 0;
+
+    if (netdev_dpdk_addr_to_portid(mirror_tunnel_addr, &mirror_port_id)) {
+        VLOG_ERR("Could not find tunnel port with BDF addr %s\n",
+            mirror_tunnel_addr);
+        return -1;
+    }
+
+    if (add_mirror) {
+        uint32_t i;
+        struct mirror_param data;
+        uint64_t mac_addr = 0;
+
+        memset(&data, 0, sizeof(struct mirror_param));
+        data.extra_data_size = 0;
+        data.extra_data = NULL;
+        data.mirror_type = mirror_port;
+        for (i = 0; i < 6; i++) {
+            mac_addr <<= 8;
+            mac_addr |= flow_addr->ea[6 - i - 1];
+        }
+        if (mac_addr) {
+            data.mirror_type = mirror_flow_mac;
+            data.extra_data_size = sizeof(uint64_t);
+            data.extra_data = xmalloc(sizeof(uint64_t));
+            memcpy(data.extra_data, &mac_addr, sizeof(uint64_t));
+        }
+        data.dst_vlan_id = vlan_id;
+        data.n_src_queue = tx_cb ? src->n_txq : src->n_rxq;
+        data.max_burst_size = NETDEV_MAX_BURST;
+
+        if (netdev_dpdk_mt_open(mirror_port_id, &data)) {
+            VLOG_ERR("Fail to initialize mirror tunnel port %d\n",
+                mirror_port_id);
+            return -1;
+        }
+
+        VLOG_INFO("register %s device with %s mirror-offload with "
+            "src-port:%d (%s) and output-port:%d (%s) vlan-id=%d flow-mac="
+            "0x%" PRIx64,
+            eth_dev ? "ethdev" : "vhost",
+            tx_cb ? "ingress" : "egress", src_dev->port_id,
+            src->name, mirror_port_id, mirror_tunnel_addr, vlan_id,
+            (uint64_t)__builtin_bswap64(mac_addr));
+
+        if (eth_dev) {
+            status = netdev_eth_register_mirror(src_dev->port_id, &data,
+                tx_cb);
+        } else {
+            status = netdev_vhost_register_mirror(src_dev, &data, tx_cb);
+        }
+    } else {
+        VLOG_INFO("unregister %s device with %s mirror-offload with"
+            " src-port:%d(%s)",
+            eth_dev ? "ethdev" : "vhost",
+            tx_cb ? "ingress" : "egress", src_dev->port_id,
+            src->name);
+
+        if (eth_dev) {
+            status = netdev_eth_unregister_mirror(src_dev->port_id, tx_cb);
+        } else {
+            status = netdev_vhost_unregister_mirror(src_dev, tx_cb);
+        }
+
+        netdev_dpdk_mt_close(mirror_port_id);
+    }
+
+    return status;
+}
+
 #define NETDEV_DPDK_CLASS_COMMON                            \
     .is_pmd = true,                                         \
     .alloc = netdev_dpdk_alloc,                             \
@@ -5340,6 +5734,7 @@  static const struct netdev_class dpdk_class = {
     .construct = netdev_dpdk_construct,
     .set_config = netdev_dpdk_set_config,
     .send = netdev_dpdk_eth_send,
+    .mirror_offload = netdev_dpdk_mirror_offload,
 };
 
 static const struct netdev_class dpdk_vhost_class = {
@@ -5355,6 +5750,7 @@  static const struct netdev_class dpdk_vhost_class = {
     .reconfigure = netdev_dpdk_vhost_reconfigure,
     .rxq_recv = netdev_dpdk_vhost_rxq_recv,
     .rxq_enabled = netdev_dpdk_vhost_rxq_enabled,
+    .mirror_offload = netdev_dpdk_mirror_offload,
 };
 
 static const struct netdev_class dpdk_vhost_client_class = {
@@ -5371,6 +5767,7 @@  static const struct netdev_class dpdk_vhost_client_class = {
     .reconfigure = netdev_dpdk_vhost_client_reconfigure,
     .rxq_recv = netdev_dpdk_vhost_rxq_recv,
     .rxq_enabled = netdev_dpdk_vhost_rxq_enabled,
+    .mirror_offload = netdev_dpdk_mirror_offload,
 };
 
 void
diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h
index 73dce2fca..dab278dcd 100644
--- a/lib/netdev-provider.h
+++ b/lib/netdev-provider.h
@@ -834,6 +834,22 @@  struct netdev_class {
     /* Get a block_id from the netdev.
      * Returns the block_id or 0 if none exists for netdev. */
     uint32_t (*get_block_id)(struct netdev *);
+
+    /* Configure a mirror offload setting on a netdev.
+     * 'src': netdev whose traffic is to be mirrored.
+     * 'flow_addr': MAC address to match against each packet header for
+     *  flow mirroring; an all-zero address selects port mirroring.
+     * 'vlan_id': VLAN tag to be added to the mirrored packets.
+     * 'mt_pci_addr': PCI (BDF) address of the mirror tunnel device, where
+     *  the mirrored traffic is transmitted.
+     * 'add_mirror': true: configure a mirror; false: remove the mirror.
+     * 'ingress': true: mirror 'src' netdev Tx traffic; false: mirror
+     *  'src' netdev Rx traffic.
+     */
+    int (*mirror_offload)(struct netdev *src, struct eth_addr *flow_addr,
+                          uint16_t vlan_id, char *mt_pci_addr,
+                          bool add_mirror, bool ingress);
+
 };
 
 int netdev_register_provider(const struct netdev_class *);
diff --git a/lib/netdev.c b/lib/netdev.c
index 91e91955c..464c2f8fe 100644
--- a/lib/netdev.c
+++ b/lib/netdev.c
@@ -69,6 +69,8 @@  COVERAGE_DEFINE(netdev_get_stats);
 COVERAGE_DEFINE(netdev_send_prepare_drops);
 COVERAGE_DEFINE(netdev_push_header_drops);
 
+#define MIRROR_DB_INIT_SIZE 8
+
 struct netdev_saved_flags {
     struct netdev *netdev;
     struct ovs_list node;           /* In struct netdev's saved_flags_list. */
@@ -2297,3 +2299,387 @@  netdev_free_custom_stats_counters(struct netdev_custom_stats *custom_stats)
         }
     }
 }
+
+
+struct netdev_mirror_offload_item {
+    struct mirror_offload_info info;
+
+    struct ovs_list node;
+};
+
+struct netdev_mirror_offload {
+    struct ovs_mutex mutex;
+    struct ovs_list list;
+    pthread_cond_t cond;
+};
+
+static struct netdev_mirror_offload netdev_mirror_offload = {
+    .mutex = OVS_MUTEX_INITIALIZER,
+    .list  = OVS_LIST_INITIALIZER(&netdev_mirror_offload.list),
+};
+
+static struct ovsthread_once offload_thread_once
+    = OVSTHREAD_ONCE_INITIALIZER;
+
+static void *netdev_mirror_offload_main(void *data);
+
+/*
+ * Re-size mirror_db when it's out of space.
+ * Always double the buffer when it's needed
+ */
+static int
+netdev_mirror_db_resize(struct netdev_mirror_offload_item ***old_db,
+    int *old_db_size)
+{
+    struct netdev_mirror_offload_item **new_db;
+    int cur_size = *old_db_size;
+    int new_size;
+
+    if (!cur_size) {
+        new_size = MIRROR_DB_INIT_SIZE;
+    } else {
+        new_size = 2 * cur_size;
+    }
+
+    new_db = xzalloc(sizeof(struct netdev_mirror_offload_item *) * new_size);
+
+    if (!new_db) {
+        VLOG_ERR("Out of memory!!!");
+        return -1;
+    }
+    memset(new_db, 0, sizeof(struct netdev_mirror_offload_item *) * new_size);
+
+    if (cur_size) {
+        int i;
+
+        for (i = 0; i < cur_size; i++) {
+            new_db[i] = (*old_db)[i];
+        }
+        free(*old_db);
+    }
+
+    *old_db = new_db;
+    *old_db_size = new_size;
+
+    return 0;
+}
+
+static void
+netdev_free_mirror_offload(struct netdev_mirror_offload_item *offload)
+{
+    if (!offload) {
+        return;
+    }
+
+    if (offload->info.src) {
+        free(offload->info.src);
+    }
+    if (offload->info.dst) {
+        free(offload->info.dst);
+    }
+    if (offload->info.flow_dst_mac) {
+        free(offload->info.flow_dst_mac);
+    }
+    if (offload->info.flow_src_mac) {
+        free(offload->info.flow_src_mac);
+    }
+    if (offload->info.output_src_tags) {
+        free(offload->info.output_src_tags);
+    }
+    if (offload->info.output_dst_tags) {
+        free(offload->info.output_dst_tags);
+    }
+    if (offload->info.name) {
+        free(offload->info.name);
+    }
+    if (offload->info.mirror_tunnel_addr) {
+        free(offload->info.mirror_tunnel_addr);
+    }
+
+    free(offload);
+}
+
+static struct
+netdev_mirror_offload_item *
+netdev_alloc_mirror_offload(struct mirror_offload_info *info)
+{
+    struct netdev_mirror_offload_item *offload;
+    int i;
+
+    offload = xzalloc(sizeof(*offload));
+    memcpy(&offload->info, info, sizeof(struct mirror_offload_info));
+
+    if (info->name) {
+        /* xstrdup() also copies the terminating NUL.  ovs_strzcpy() reads
+         * at most 'size - 1' bytes, so passing strlen() as the size would
+         * drop the last character of the name. */
+        offload->info.name = xstrdup(info->name);
+    }
+
+    if (info->mirror_tunnel_addr) {
+        size_t size = strlen(info->mirror_tunnel_addr) + 1;
+
+        /* Pass the full buffer size: ovs_strzcpy() copies at most
+         * 'size - 1' bytes and then NUL-pads the rest of the buffer. */
+        offload->info.mirror_tunnel_addr = xzalloc(size);
+        ovs_strzcpy(offload->info.mirror_tunnel_addr,
+                    info->mirror_tunnel_addr, size);
+    }
+
+    /* Only add_mirror requests include a valid configuration. */
+    if (info->n_src_port) {
+        offload->info.src = xzalloc(sizeof(struct netdev *)*info->n_src_port);
+        offload->info.flow_dst_mac = xzalloc(sizeof(struct eth_addr)*
+            info->n_src_port);
+        offload->info.output_src_tags = xzalloc(sizeof(uint16_t)*
+            info->n_src_port);
+        if (!offload->info.src || !offload->info.flow_dst_mac ||
+            !offload->info.output_src_tags) {
+            VLOG_ERR("Out of memory!!!");
+            netdev_free_mirror_offload(offload);
+            return NULL;
+        }
+
+        for (i = 0; i < info->n_src_port; i++) {
+            offload->info.src[i] = info->src[i];
+            offload->info.output_src_tags[i] = info->output_src_tags[i];
+            memcpy(&offload->info.flow_dst_mac[i], &info->flow_dst_mac[i],
+                sizeof(struct eth_addr));
+        }
+    }
+
+    if (info->n_dst_port) {
+        offload->info.dst = xzalloc(sizeof(struct netdev *)*info->n_dst_port);
+        offload->info.flow_src_mac = xzalloc(sizeof(struct eth_addr)*
+            info->n_dst_port);
+        offload->info.output_dst_tags = xzalloc(sizeof(uint16_t)*
+            info->n_dst_port);
+        if (!offload->info.dst || !offload->info.flow_src_mac ||
+            !offload->info.output_dst_tags) {
+            VLOG_ERR("Out of memory!!!");
+            netdev_free_mirror_offload(offload);
+            return NULL;
+        }
+
+        for (i = 0; i < info->n_dst_port; i++) {
+            offload->info.dst[i] = info->dst[i];
+            offload->info.output_dst_tags[i] = info->output_dst_tags[i];
+            memcpy(&offload->info.flow_src_mac[i], &info->flow_src_mac[i],
+                sizeof(struct eth_addr));
+        }
+    }
+
+    return offload;
+}
+
+static void
+netdev_append_mirror_offload(struct netdev_mirror_offload_item *offload)
+{
+    ovs_mutex_lock(&netdev_mirror_offload.mutex);
+    ovs_list_push_back(&netdev_mirror_offload.list, &offload->node);
+    xpthread_cond_signal(&netdev_mirror_offload.cond);
+    ovs_mutex_unlock(&netdev_mirror_offload.mutex);
+}
+
+void
+netdev_mirror_offload_put(struct mirror_offload_info *info)
+{
+    struct netdev_mirror_offload_item *offload;
+    /* only support tunnel port for traffic mirroring */
+    if (info->add_mirror && !info->mirror_tunnel_addr) {
+        return;
+    }
+
+    if (ovsthread_once_start(&offload_thread_once)) {
+        xpthread_cond_init(&netdev_mirror_offload.cond, NULL);
+        ovs_thread_create("netdev_mirror_offload",
+                          netdev_mirror_offload_main, NULL);
+        ovsthread_once_done(&offload_thread_once);
+    }
+
+    offload = netdev_alloc_mirror_offload(info);
+    netdev_append_mirror_offload(offload);
+}
+
+static int
+netdev_mirror_offload_configue(struct mirror_offload_info *info,
+    bool add_mirror)
+{
+    int un_support_count = 0;
+    int ret;
+
+    if (info->n_src_port) {
+        for (int i = 0; i < info->n_src_port; i++) {
+            const struct netdev_class *class =
+                info->src[i]->netdev_class;
+            if (!class) {
+                return -1;
+            }
+            if (class->mirror_offload) {
+                ret = class->mirror_offload(
+                    info->src[i],
+                    &info->flow_dst_mac[i],
+                    info->output_src_tags[i],
+                    info->mirror_tunnel_addr,
+                    add_mirror, false);
+                if (ret) {
+                    VLOG_ERR("Failed to %s mirror-offload"
+                        " configuration %s",
+                        add_mirror ? "add" : "remove",
+                        info->name);
+                    return ret;
+                }
+            } else {
+                un_support_count++;
+            }
+        }
+    }
+
+    if (info->n_dst_port) {
+        for (int i = 0; i < info->n_dst_port; i++) {
+            const struct netdev_class *class =
+                info->dst[i]->netdev_class;
+            if (!class) {
+                return -1;
+            }
+            if (class->mirror_offload) {
+                ret = class->mirror_offload(
+                    info->dst[i],
+                    &info->flow_src_mac[i],
+                    info->output_dst_tags[i],
+                    info->mirror_tunnel_addr,
+                    add_mirror, true);
+                if (ret) {
+                    VLOG_ERR("Failed to %s mirror-offload"
+                        " configuration %s",
+                        add_mirror ? "add" : "remove",
+                        info->name);
+                    return ret;
+                }
+            } else {
+                un_support_count++;
+            }
+        }
+    }
+
+    return un_support_count;
+}
+
+static void *
+netdev_mirror_offload_main(void *data OVS_UNUSED)
+{
+    struct netdev_mirror_offload_item *offload;
+    struct mirror_offload_info *info;
+    struct ovs_list *list;
+    struct netdev_mirror_offload_item **offload_db = NULL;
+    int offload_used_count = 0;
+    int offload_db_size = 0;
+    int ret, i, ind;
+
+    /* Serve requests forever, re-checking the list after spurious wakeups. */
+    for (;;) {
+        ovs_mutex_lock(&netdev_mirror_offload.mutex);
+        while (ovs_list_is_empty(&netdev_mirror_offload.list)) {
+            ovsrcu_quiesce_start();
+            ovs_mutex_cond_wait(&netdev_mirror_offload.cond,
+                                &netdev_mirror_offload.mutex);
+            ovsrcu_quiesce_end();
+        }
+        list = ovs_list_pop_front(&netdev_mirror_offload.list);
+        offload = CONTAINER_OF(list, struct netdev_mirror_offload_item,
+            node);
+        ovs_mutex_unlock(&netdev_mirror_offload.mutex);
+
+        if (!offload_db_size &&
+            netdev_mirror_db_resize(&offload_db, &offload_db_size)){
+            return NULL;
+        }
+
+        ind = offload_db_size;
+        for (i = 0; i < offload_db_size; i++) {
+            if (offload_db[i] &&
+                !strncmp(offload_db[i]->info.name, offload->info.name,
+                strlen(offload->info.name) + 1)) {
+                ind = i;
+                break;
+            }
+        }
+
+        if (!offload->info.add_mirror) {
+            /* remove mirror offload setup */
+            if (ind == offload_db_size) {
+                VLOG_WARN("Mirror offload configuration %s not found; "
+                    "remove request aborted", offload->info.name);
+                netdev_free_mirror_offload(offload);
+                continue;
+            }
+        } else {
+            /* add mirror offload */
+            if (ind < offload_db_size) {
+                netdev_free_mirror_offload(offload);
+                VLOG_WARN("Attempt to add an existing mirror-offload "
+                    "configuration; request aborted");
+                continue;
+            }
+
+            if (offload_used_count == offload_db_size &&
+                netdev_mirror_db_resize(&offload_db, &offload_db_size)) {
+                return NULL;
+            }
+        }
+
+        info = offload->info.add_mirror ? &offload->info :
+            &offload_db[ind]->info;
+        ret = netdev_mirror_offload_configue(info, offload->info.add_mirror);
+
+        if (ret) {
+            VLOG_ERR("%s mirror configuration failed: %s",
+                offload->info.add_mirror ? "Add" : "Remove",
+                ret > 0 ? "unsupported source traffic type" :
+                "device is not ready");
+            netdev_free_mirror_offload(offload);
+            continue;
+        } else {
+            VLOG_INFO("Succeeded in %s mirror-offload configuration: %s",
+                offload->info.add_mirror ? "adding" : "removing",
+                offload->info.name);
+        }
+
+        if (offload->info.add_mirror) {
+            for (i = 0; i < offload_db_size; i++) {
+                if (offload_db[i] == NULL) {
+                    offload_db[i] = offload;
+                    offload_used_count++;
+                    break;
+                }
+            }
+        } else {
+            /* remove the prior "add" request */
+            netdev_free_mirror_offload(offload_db[ind]);
+            offload_db[ind] = NULL;
+
+            /* remove the current("remove") request */
+            netdev_free_mirror_offload(offload);
+            offload_used_count--;
+        }
+
+        /* Free the db when the used count drops to 0. */
+        if (!offload_used_count) {
+            free(offload_db);
+            offload_db = NULL;
+            offload_db_size = 0;
+        }
+    }
+
+    /* clean up memory */
+    for (i = 0; i < offload_db_size; i++) {
+        if (offload_db[i]) {
+            netdev_free_mirror_offload(offload_db[i]);
+        }
+    }
+    if (offload_db) {
+        free(offload_db);
+    }
+
+    return NULL;
+}
diff --git a/lib/netdev.h b/lib/netdev.h
index b705a9e56..cce042fc7 100644
--- a/lib/netdev.h
+++ b/lib/netdev.h
@@ -201,6 +201,22 @@  int netdev_send(struct netdev *, int qid, struct dp_packet_batch *,
                 bool concurrent_txq);
 void netdev_send_wait(struct netdev *, int qid);
 
+/* Hardware-assisted mirror offloading. */
+struct mirror_offload_info {
+    struct netdev **src;
+    struct netdev **dst;
+    int n_src_port;
+    int n_dst_port;
+    struct eth_addr *flow_src_mac;
+    struct eth_addr *flow_dst_mac;
+    uint16_t *output_src_tags;
+    uint16_t *output_dst_tags;
+    bool add_mirror;
+    char *mirror_tunnel_addr;
+    char *name;
+};
+void netdev_mirror_offload_put(struct mirror_offload_info *);
+
 /* native tunnel APIs */
 /* Structure to pass parameters required to build a tunnel header. */
 struct netdev_tnl_build_header_params {
diff --git a/tests/ovs-vsctl.at b/tests/ovs-vsctl.at
index dccb11741..ff6e9e625 100644
--- a/tests/ovs-vsctl.at
+++ b/tests/ovs-vsctl.at
@@ -1364,7 +1364,9 @@  _uuid               : <1>
 name                : eth1
 _uuid               : <2>
 name                : mymirror
+output_dst_vlan     : []
 output_port         : <1>
+output_src_vlan     : []
 output_vlan         : []
 select_all          : false
 select_dst_port     : [<0>]
diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
index 5ed7e8234..7b7603513 100644
--- a/vswitchd/bridge.c
+++ b/vswitchd/bridge.c
@@ -38,6 +38,7 @@ 
 #include "mac-learning.h"
 #include "mcast-snooping.h"
 #include "netdev.h"
+#include "netdev-provider.h"
 #include "netdev-offload.h"
 #include "nx-match.h"
 #include "ofproto/bond.h"
@@ -330,6 +331,9 @@  static void mirror_destroy(struct mirror *);
 static bool mirror_configure(struct mirror *);
 static void mirror_refresh_stats(struct mirror *);
 
+static void mirror_offload_destroy(struct mirror *);
+static bool mirror_offload_configure(struct mirror *);
+
 static void iface_configure_lacp(struct iface *,
                                  struct lacp_member_settings *);
 static bool iface_create(struct bridge *, const struct ovsrec_interface *,
@@ -423,6 +427,35 @@  if_notifier_changed(struct if_notifier *notifier OVS_UNUSED)
     seq_wait(ifaces_changed, last_ifaces_changed);
     return changed;
 }
+
+static struct port *
+port_lookup_all(const char *port_name)
+{
+    struct bridge *br;
+    struct port *port = NULL;
+    int found = 0;
+
+    HMAP_FOR_EACH (br, node, &all_bridges) {
+        struct port *temp_port = NULL;
+        temp_port = port_lookup(br, port_name);
+        if (temp_port) {
+            if (!port) {
+                port = temp_port;
+            }
+            found++;
+        }
+    }
+
+    if (found) {
+        if (found > 1) {
+            VLOG_INFO("More than one bridge owns a port named %s",
+                port_name);
+        }
+        return port;
+    }
+    return NULL;
+}
+
 
 /* Public functions. */
 
@@ -5055,14 +5088,228 @@  mirror_create(struct bridge *br, const struct ovsrec_mirror *cfg)
     return m;
 }
 
+static struct netdev *get_netdev_from_port(struct mirror *m,
+    struct port **port, const char *name)
+{
+    struct port *temp_port;
+    struct iface *iface;
+
+    *port = NULL;
+    temp_port = port_lookup(m->bridge, name);
+    if (temp_port) {
+        LIST_FOR_EACH (iface, port_elem, &temp_port->ifaces) {
+            if (iface) {
+                *port = temp_port;
+                return iface->netdev;
+            }
+        }
+    }
+    /* try different bridges */
+    temp_port = port_lookup_all(name);
+    if (temp_port) {
+        LIST_FOR_EACH (iface, port_elem, &temp_port->ifaces) {
+            if (iface) {
+                *port = temp_port;
+                return iface->netdev;
+            }
+        }
+    }
+    return NULL;
+}
+
+static void
+release_mirror_offload_info(struct mirror_offload_info *info)
+{
+    if (info->src) {
+        free(info->src);
+    }
+    if (info->dst) {
+        free(info->dst);
+    }
+    if (info->flow_dst_mac) {
+        free(info->flow_dst_mac);
+    }
+    if (info->flow_src_mac) {
+        free(info->flow_src_mac);
+    }
+    if (info->output_src_tags) {
+        free(info->output_src_tags);
+    }
+    if (info->output_dst_tags) {
+        free(info->output_dst_tags);
+    }
+    if (info->name) {
+        free(info->name);
+    }
+    if (info->mirror_tunnel_addr) {
+        free(info->mirror_tunnel_addr);
+    }
+}
+
+static int
+set_mirror_offload_info(struct mirror *m, struct mirror_offload_info *info)
+{
+    const struct ovsrec_mirror *cfg = m->cfg;
+    struct port *port = NULL;
+    int i;
+
+    if (m->name) {
+        info->name = xmalloc(strlen(m->name) + 1);
+        ovs_strzcpy(info->name, m->name, strlen(m->name));
+    }
+
+    if (cfg->mirror_tunnel_addr) {
+        /* Duplicate the whole string; passing strlen() as the size to
+         * ovs_strzcpy() would drop the last character, because it copies
+         * at most size - 1 bytes. */
+        info->mirror_tunnel_addr = xstrdup(cfg->mirror_tunnel_addr);
+    } else {
+        VLOG_ERR("mirror-offload configuration failed:"
+            " no tunnel device specified");
+        return -1;
+    }
+
+    /* source port */
+    info->n_src_port = cfg->n_select_src_port;
+    if (info->n_src_port) {
+        info->src = xmalloc(sizeof(struct netdev *)*info->n_src_port);
+        info->flow_dst_mac = xmalloc(sizeof(struct eth_addr)*
+            info->n_src_port);
+        if (info->n_src_port != cfg->n_output_src_vlan) {
+            VLOG_ERR("src port count:%d output src vlan count:%lu",
+                info->n_src_port, (unsigned long) cfg->n_output_src_vlan);
+            return -1;
+        }
+        info->output_src_tags = xmalloc(sizeof(uint16_t)*info->n_src_port);
+    }
+
+    if (info->n_src_port) {
+        /* find netdev instance for each port */
+        for (i = 0; i < info->n_src_port; i++) {
+            info->src[i] = get_netdev_from_port(m, &port,
+                cfg->select_src_port[i]->name);
+            if (!info->src[i]) {
+                VLOG_ERR("src-port: %s is not a netdev device",
+                    cfg->select_src_port[i]->name);
+                return -1;
+            }
+        }
+        memset(info->flow_dst_mac, 0, sizeof(struct eth_addr)*
+            info->n_src_port);
+
+        /*
+         * for source port, flow is separated by
+         * different dst mac addr
+         */
+        if (cfg->n_flow_dst_mac) {
+            int dst_count = (info->n_src_port > cfg->n_flow_dst_mac)?
+                cfg->n_flow_dst_mac:info->n_src_port;
+            for (i = 0; i < dst_count; i++) {
+                eth_addr_from_string(cfg->flow_dst_mac[i],
+                    &info->flow_dst_mac[i]);
+            }
+        }
+
+        if (cfg->n_output_src_vlan) {
+            int count = (cfg->n_output_src_vlan > info->n_src_port)?
+                info->n_src_port:cfg->n_output_src_vlan;
+            for (i = 0; i < count; i++) {
+                info->output_src_tags[i] = cfg->output_src_vlan[i] & 0xFFF;
+            }
+        }
+    }
+
+    /* dst ports */
+    info->n_dst_port = cfg->n_select_dst_port;
+    if (info->n_dst_port) {
+        info->dst = xmalloc(sizeof(struct netdev *)*info->n_dst_port);
+        info->flow_src_mac = xmalloc(sizeof(struct eth_addr)*
+            info->n_dst_port);
+        if (info->n_dst_port != cfg->n_output_dst_vlan) {
+            VLOG_ERR("dst port count:%d output dst vlan count:%lu",
+                info->n_dst_port, (unsigned long) cfg->n_output_dst_vlan);
+            return -1;
+        }
+        info->output_dst_tags = xmalloc(sizeof(uint16_t)*info->n_dst_port);
+    }
+
+    if (info->n_dst_port) {
+        for (i = 0; i < info->n_dst_port; i++) {
+            info->dst[i] = get_netdev_from_port(m, &port,
+                cfg->select_dst_port[i]->name);
+            if (!info->dst[i]) {
+                VLOG_ERR("dst-port: %s is not a netdev device",
+                    cfg->select_dst_port[i]->name);
+                return -1;
+            }
+        }
+        memset(info->flow_src_mac, 0, sizeof(struct eth_addr)*
+            info->n_dst_port);
+
+        /*
+         * for destination port, flow is separated by
+         * different src mac addr
+         */
+        if (cfg->n_flow_src_mac) {
+            int src_count = (info->n_dst_port > cfg->n_flow_src_mac)?
+                cfg->n_flow_src_mac:info->n_dst_port;
+            for (i = 0; i < src_count; i++) {
+                eth_addr_from_string(cfg->flow_src_mac[i],
+                    &info->flow_src_mac[i]);
+            }
+        }
+
+        if (cfg->n_output_dst_vlan) {
+            int count = (cfg->n_output_dst_vlan > info->n_dst_port)?
+                info->n_dst_port:cfg->n_output_dst_vlan;
+            for (i = 0; i < count; i++) {
+                info->output_dst_tags[i] = cfg->output_dst_vlan[i] & 0xFFF;
+            }
+        }
+    }
+
+    VLOG_INFO("Successfully created mirror-offload(%s) with %d src-port"
+        " streams and %d dst-port streams to tunnel %s",
+        cfg->name, info->n_src_port, info->n_dst_port,
+        info->mirror_tunnel_addr?info->mirror_tunnel_addr:"none");
+    return 0;
+}
+
+static void
+mirror_offload_destroy(struct mirror *m)
+{
+    struct mirror_offload_info info;
+
+    memset(&info, 0, sizeof(struct mirror_offload_info));
+    info.add_mirror = false;
+    if (m->name) {
+        info.name = xmalloc(strlen(m->name) + 1);
+        if (info.name) {
+            ovs_strzcpy(info.name, m->name, strlen(m->name));
+        }
+    }
+
+    netdev_mirror_offload_put(&info);
+    if (info.name) {
+        free(info.name);
+    }
+    if (info.mirror_tunnel_addr) {
+        free(info.mirror_tunnel_addr);
+    }
+}
+
 static void
 mirror_destroy(struct mirror *m)
 {
     if (m) {
         struct bridge *br = m->bridge;
 
-        if (br->ofproto) {
-            ofproto_mirror_unregister(br->ofproto, m);
+        if (m->cfg && m->cfg->mirror_offload) {
+            mirror_offload_destroy(m);
+        } else {
+            if (br->ofproto) {
+                ofproto_mirror_unregister(br->ofproto, m);
+            }
         }
 
         hmap_remove(&br->mirrors, &m->hmap_node);
@@ -5094,12 +5341,32 @@  mirror_collect_ports(struct mirror *m,
     *n_out_portsp = n_out_ports;
 }
 
+static bool
+mirror_offload_configure(struct mirror *m)
+{
+    struct mirror_offload_info info;
+
+    memset(&info, 0, sizeof(struct mirror_offload_info));
+    info.add_mirror = true;
+    if (set_mirror_offload_info(m, &info)) {
+        release_mirror_offload_info(&info);
+        return false;
+    }
+
+    netdev_mirror_offload_put(&info);
+    release_mirror_offload_info(&info);
+    return true;
+}
+
 static bool
 mirror_configure(struct mirror *m)
 {
     const struct ovsrec_mirror *cfg = m->cfg;
     struct ofproto_mirror_settings s;
 
+    if (cfg->mirror_offload) {
+        return mirror_offload_configure(m);
+    }
     /* Set name. */
     if (strcmp(cfg->name, m->name)) {
         free(m->name);
diff --git a/vswitchd/vswitch.ovsschema b/vswitchd/vswitch.ovsschema
index 0666c8c76..4a1a34a1f 100644
--- a/vswitchd/vswitch.ovsschema
+++ b/vswitchd/vswitch.ovsschema
@@ -1,6 +1,6 @@ 
 {"name": "Open_vSwitch",
- "version": "8.2.0",
- "cksum": "1076640191 26427",
+ "version": "8.2.1",
+ "cksum": "4051567316 27206",
  "tables": {
    "Open_vSwitch": {
      "columns": {
@@ -418,8 +418,18 @@ 
      "columns": {
        "name": {
          "type": "string"},
+       "mirror_tunnel_addr": {
+         "type": "string"},
        "select_all": {
          "type": "boolean"},
+       "mirror_offload": {
+         "type": "boolean"},
+       "flow_src_mac": {
+         "type": {"key": {"type": "string"},
+                  "min": 0, "max": "unlimited"}},
+       "flow_dst_mac": {
+         "type": {"key": {"type": "string"},
+                  "min": 0, "max": "unlimited"}},
        "select_src_port": {
          "type": {"key": {"type": "uuid",
                           "refTable": "Port",
@@ -440,6 +450,16 @@ 
                           "refTable": "Port",
                           "refType": "weak"},
                   "min": 0, "max": 1}},
+       "output_src_vlan": {
+         "type": {"key": {"type": "integer",
+                          "minInteger": 0,
+                          "maxInteger": 4294967295},
+                  "min": 0, "max": 4096}},
+       "output_dst_vlan": {
+         "type": {"key": {"type": "integer",
+                          "minInteger": 0,
+                          "maxInteger": 4294967295},
+                  "min": 0, "max": 4096}},
        "output_vlan": {
          "type": {"key": {"type": "integer",
                           "minInteger": 1,
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index 4597a215d..fd2049a7f 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -4869,11 +4869,35 @@  ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
         selected VLANs.
       </p>
 
+      <column name="mirror_tunnel_addr">
+        PCI BDF (bus:device.function) string of the tunnel device on which
+        mirrored traffic will be transmitted.
+      </column>
+
       <column name="select_all">
         If true, every packet arriving or departing on any port is
         selected for mirroring.
       </column>
 
+      <column name="mirror_offload">
+        If true, hardware-assisted port mirroring is configured instead of
+        the default mirroring.
+      </column>
+
+      <column name="flow_src_mac">
+        The source MAC address(es) for per-flow mirroring. MAC addresses
+        are separated by ','. This parameter is paired with
+        select_dst_port. A '0' MAC address requests per-port mirroring;
+        any other address requests per-flow mirroring.
+      </column>
+
+      <column name="flow_dst_mac">
+        The destination MAC address(es) for per-flow mirroring. MAC
+        addresses are separated by ','. This parameter is paired with
+        select_src_port. A '0' MAC address requests per-port mirroring;
+        any other address requests per-flow mirroring.
+      </column>
+
       <column name="select_dst_port">
         Ports on which departing packets are selected for mirroring.
       </column>
@@ -4955,6 +4979,32 @@  ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
         </p>
       </column>
 
+      <column name="output_src_vlan">
+        <p>Output VLAN for selected source port packets, if nonempty.</p>
+        <p>
+          <em>Please note:</em> This is different from
+          <ref column="output_vlan"/>. This VLAN is added as an additional
+          VLAN tag on the mirrored traffic, whether or not it is already
+          tagged. The receiving end may choose to filter out this extra tag.
+          This option is provided so the mirrored traffic can keep its
+          original VLAN information, and this mirror can be used to filter
+          out unwanted traffic, such as in <ref column="mirror_offload"/>.
+        </p>
+      </column>
+
+      <column name="output_dst_vlan">
+        <p>Output VLAN for selected destination port packets, if nonempty.</p>
+        <p>
+          <em>Please note:</em> This is different from
+          <ref column="output_vlan"/>. This VLAN is added as an additional
+          VLAN tag on the mirrored traffic, whether or not it is already
+          tagged. The receiving end may choose to filter out this extra tag.
+          This option is provided so the mirrored traffic can keep its
+          original VLAN information, and this mirror can be used to filter
+          out unwanted traffic, such as in <ref column="mirror_offload"/>.
+        </p>
+      </column>
+
       <column name="snaplen">
         <p>Maximum per-packet number of bytes to mirror.</p>
         <p>A mirrored packet with size larger than <ref column="snaplen"/>