[ovs-dev,v4,2/2] netdev-dpdk: Add new DPDK RFC 4115 egress policer
diff mbox series

Message ID 20200113155630.7981.32847.stgit@netdev64
State Changes Requested
Headers show
Series
  • netdev-dpdk: Add new DPDK RFC 4115 egress policer
Related show

Commit Message

Eelco Chaudron Jan. 13, 2020, 3:56 p.m. UTC
This patch adds a new policer to the DPDK datapath based on RFC 4115's
Two-Rate, Three-Color marker. It's a two-level hierarchical policer
which first does a color-blind marking of the traffic at the queue
level, followed by a color-aware marking at the port level. At the end
traffic marked as Green or Yellow is forwarded, Red is dropped. For
details on how traffic is marked, see RFC 4115.

This egress policer can be used to limit traffic at different rated
based on the queues the traffic is in. In addition, it can also be used
to prioritize certain traffic over others at a port level.

For example, the following configuration will limit the traffic rate at a
port level to a maximum of 2000 packets a second (64 bytes IPv4 packets).
100pps as CIR (Committed Information Rate) and 1000pps as EIR (Excess
Information Rate). High priority traffic is routed to queue 10, which marks
all traffic as CIR, i.e. Green. All low priority traffic, queue 20, is
marked as EIR, i.e. Yellow.

ovs-vsctl --timeout=5 set port dpdk1 qos=@myqos -- \
  --id=@myqos create qos type=trtcm-policer \
  other-config:cir=52000 other-config:cbs=2048 \
  other-config:eir=52000 other-config:ebs=2048  \
  queues:10=@dpdk1Q10 queues:20=@dpdk1Q20 -- \
  --id=@dpdk1Q10 create queue \
    other-config:cir=41600000 other-config:cbs=2048 \
    other-config:eir=0 other-config:ebs=0 -- \
  --id=@dpdk1Q20 create queue \
    other-config:cir=0 other-config:cbs=0 \
    other-config:eir=41600000 other-config:ebs=2048 \

This configuration accomplishes that the high priority traffic has a
guaranteed bandwidth egressing the ports at CIR (1000pps), but it can also
use the EIR, so a total of 2000pps at max. These additional 1000pps is
shared with the low priority traffic. The low priority traffic can use at
maximum 1000pps.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
---
 Documentation/topics/dpdk/qos.rst |   43 ++++
 lib/netdev-dpdk.c                 |  358 +++++++++++++++++++++++++++++++++++--
 vswitchd/vswitch.xml              |   34 ++++
 3 files changed, 420 insertions(+), 15 deletions(-)

Comments

Stokes, Ian Jan. 13, 2020, 8:32 p.m. UTC | #1
On 1/13/2020 3:56 PM, Eelco Chaudron wrote:
> This patch adds a new policer to the DPDK datapath based on RFC 4115's
> Two-Rate, Three-Color marker. It's a two-level hierarchical policer
> which first does a color-blind marking of the traffic at the queue
> level, followed by a color-aware marking at the port level. At the end
> traffic marked as Green or Yellow is forwarded, Red is dropped. For
> details on how traffic is marked, see RFC 4115.
> 
> This egress policer can be used to limit traffic at different rated
> based on the queues the traffic is in. In addition, it can also be used
> to prioritize certain traffic over others at a port level.
> 
> For example, the following configuration will limit the traffic rate at a
> port level to a maximum of 2000 packets a second (64 bytes IPv4 packets).
> 100pps as CIR (Committed Information Rate) and 1000pps as EIR (Excess
> Information Rate). High priority traffic is routed to queue 10, which marks
> all traffic as CIR, i.e. Green. All low priority traffic, queue 20, is
> marked as EIR, i.e. Yellow.
> 
> ovs-vsctl --timeout=5 set port dpdk1 qos=@myqos -- \
>    --id=@myqos create qos type=trtcm-policer \
>    other-config:cir=52000 other-config:cbs=2048 \
>    other-config:eir=52000 other-config:ebs=2048  \
>    queues:10=@dpdk1Q10 queues:20=@dpdk1Q20 -- \
>    --id=@dpdk1Q10 create queue \
>      other-config:cir=41600000 other-config:cbs=2048 \
>      other-config:eir=0 other-config:ebs=0 -- \
>    --id=@dpdk1Q20 create queue \
>      other-config:cir=0 other-config:cbs=0 \
>      other-config:eir=41600000 other-config:ebs=2048 \
> 
> This configuration accomplishes that the high priority traffic has a
> guaranteed bandwidth egressing the ports at CIR (1000pps), but it can also
> use the EIR, so a total of 2000pps at max. These additional 1000pps is
> shared with the low priority traffic. The low priority traffic can use at
> maximum 1000pps.

Thanks for the patch Eelco, LGTM, I just want to finish a little more 
testing but I intend to merge this tomorrow to master if nothing else 
comes up/is raised.

Regards
Ian

> 
> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
> ---
>   Documentation/topics/dpdk/qos.rst |   43 ++++
>   lib/netdev-dpdk.c                 |  358 +++++++++++++++++++++++++++++++++++--
>   vswitchd/vswitch.xml              |   34 ++++
>   3 files changed, 420 insertions(+), 15 deletions(-)
> 
> diff --git a/Documentation/topics/dpdk/qos.rst b/Documentation/topics/dpdk/qos.rst
> index c0aec5d..1034954 100644
> --- a/Documentation/topics/dpdk/qos.rst
> +++ b/Documentation/topics/dpdk/qos.rst
> @@ -33,6 +33,9 @@ datapath. These are referred to as *QoS* and *Rate Limiting*, respectively.
>   QoS (Egress Policing)
>   ---------------------
>   
> +Single Queue Policer
> +~~~~~~~~~~~~~~~~~~~~
> +
>   Assuming you have a :doc:`vhost-user port <vhost-user>` transmitting traffic
>   consisting of packets of size 64 bytes, the following command would limit the
>   egress transmission rate of the port to ~1,000,000 packets per second::
> @@ -49,6 +52,46 @@ To clear the QoS configuration from the port and ovsdb, run::
>   
>       $ ovs-vsctl destroy QoS vhost-user0 -- clear Port vhost-user0 qos
>   
> +
> +Multi Queue Policer
> +~~~~~~~~~~~~~~~~~~~
> +
> +In addition to the egress-policer OVS-DPDK also has support for a RFC
> +4115's Two-Rate, Three-Color marker meter. It's a two-level hierarchical
> +policer which first does a color-blind marking of the traffic at the queue
> +level, followed by a color-aware marking at the port level. At the end
> +traffic marked as Green or Yellow is forwarded, Red is dropped. For
> +details on how traffic is marked, see RFC 4115.
> +
> +This egress policer can be used to limit traffic at different rated
> +based on the queues the traffic is in. In addition, it can also be used
> +to prioritize certain traffic over others at a port level.
> +
> +For example, the following configuration will limit the traffic rate at a
> +port level to a maximum of 2000 packets a second (64 bytes IPv4 packets).
> +100pps as CIR (Committed Information Rate) and 1000pps as EIR (Excess
> +Information Rate). High priority traffic is routed to queue 10, which marks
> +all traffic as CIR, i.e. Green. All low priority traffic, queue 20, is
> +marked as EIR, i.e. Yellow::
> +
> +    $ ovs-vsctl --timeout=5 set port dpdk1 qos=@myqos -- \
> +        --id=@myqos create qos type=trtcm-policer \
> +        other-config:cir=52000 other-config:cbs=2048 \
> +        other-config:eir=52000 other-config:ebs=2048  \
> +        queues:10=@dpdk1Q10 queues:20=@dpdk1Q20 -- \
> +         --id=@dpdk1Q10 create queue \
> +          other-config:cir=41600000 other-config:cbs=2048 \
> +          other-config:eir=0 other-config:ebs=0 -- \
> +         --id=@dpdk1Q20 create queue \
> +           other-config:cir=0 other-config:cbs=0 \
> +           other-config:eir=41600000 other-config:ebs=2048 \
> +
> +This configuration accomplishes that the high priority traffic has a
> +guaranteed bandwidth egressing the ports at CIR (1000pps), but it can also
> +use the EIR, so a total of 2000pps at max. These additional 1000pps is
> +shared with the low priority traffic. The low priority traffic can use at
> +maximum 1000pps.
> +
>   Refer to ``vswitch.xml`` for more details on egress policer.
>   
>   Rate Limiting (Ingress Policing)
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 128963f..3f164c8 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -26,7 +26,10 @@
>   #include <sys/socket.h>
>   #include <linux/if.h>
>   
> +/* The below is needed as rte_meter.h gets included trough rte_bus_pci.h. */
> +#define ALLOW_EXPERIMENTAL_API
>   #include <rte_bus_pci.h>
> +#undef ALLOW_EXPERIMENTAL_API
>   #include <rte_config.h>
>   #include <rte_cycles.h>
>   #include <rte_errno.h>
> @@ -35,7 +38,9 @@
>   #include <rte_flow.h>
>   #include <rte_malloc.h>
>   #include <rte_mbuf.h>
> +#define ALLOW_EXPERIMENTAL_API
>   #include <rte_meter.h>
> +#undef ALLOW_EXPERIMENTAL_API
>   #include <rte_pci.h>
>   #include <rte_version.h>
>   #include <rte_vhost.h>
> @@ -329,8 +334,9 @@ struct dpdk_qos_ops {
>                                        struct netdev_dpdk_queue_state *state);
>   };
>   
> -/* dpdk_qos_ops for each type of user space QoS implementation */
> +/* dpdk_qos_ops for each type of user space QoS implementation. */
>   static const struct dpdk_qos_ops egress_policer_ops;
> +static const struct dpdk_qos_ops trtcm_policer_ops;
>   
>   /*
>    * Array of dpdk_qos_ops, contains pointer to all supported QoS
> @@ -338,6 +344,7 @@ static const struct dpdk_qos_ops egress_policer_ops;
>    */
>   static const struct dpdk_qos_ops *const qos_confs[] = {
>       &egress_policer_ops,
> +    &trtcm_policer_ops,
>       NULL
>   };
>   
> @@ -2162,9 +2169,9 @@ netdev_dpdk_eth_tx_burst(struct netdev_dpdk *dev, int qid,
>   }
>   
>   static inline bool
> -netdev_dpdk_policer_pkt_handle(struct rte_meter_srtcm *meter,
> -                               struct rte_meter_srtcm_profile *profile,
> -                               struct rte_mbuf *pkt, uint64_t time)
> +netdev_dpdk_srtcm_policer_pkt_handle(struct rte_meter_srtcm *meter,
> +                                     struct rte_meter_srtcm_profile *profile,
> +                                     struct rte_mbuf *pkt, uint64_t time)
>   {
>       uint32_t pkt_len = rte_pktmbuf_pkt_len(pkt) - sizeof(struct rte_ether_hdr);
>   
> @@ -2173,10 +2180,10 @@ netdev_dpdk_policer_pkt_handle(struct rte_meter_srtcm *meter,
>   }
>   
>   static int
> -netdev_dpdk_policer_run(struct rte_meter_srtcm *meter,
> -                        struct rte_meter_srtcm_profile *profile,
> -                        struct rte_mbuf **pkts, int pkt_cnt,
> -                        bool should_steal)
> +srtcm_policer_run_single_packet(struct rte_meter_srtcm *meter,
> +                                struct rte_meter_srtcm_profile *profile,
> +                                struct rte_mbuf **pkts, int pkt_cnt,
> +                                bool should_steal)
>   {
>       int i = 0;
>       int cnt = 0;
> @@ -2186,8 +2193,8 @@ netdev_dpdk_policer_run(struct rte_meter_srtcm *meter,
>       for (i = 0; i < pkt_cnt; i++) {
>           pkt = pkts[i];
>           /* Handle current packet */
> -        if (netdev_dpdk_policer_pkt_handle(meter, profile,
> -                                           pkt, current_time)) {
> +        if (netdev_dpdk_srtcm_policer_pkt_handle(meter, profile,
> +                                                 pkt, current_time)) {
>               if (cnt != i) {
>                   pkts[cnt] = pkt;
>               }
> @@ -2209,8 +2216,9 @@ ingress_policer_run(struct ingress_policer *policer, struct rte_mbuf **pkts,
>       int cnt = 0;
>   
>       rte_spinlock_lock(&policer->policer_lock);
> -    cnt = netdev_dpdk_policer_run(&policer->in_policer, &policer->in_prof,
> -                                  pkts, pkt_cnt, should_steal);
> +    cnt = srtcm_policer_run_single_packet(&policer->in_policer,
> +                                          &policer->in_prof,
> +                                          pkts, pkt_cnt, should_steal);
>       rte_spinlock_unlock(&policer->policer_lock);
>   
>       return cnt;
> @@ -4480,9 +4488,9 @@ egress_policer_run(struct qos_conf *conf, struct rte_mbuf **pkts, int pkt_cnt,
>       struct egress_policer *policer =
>           CONTAINER_OF(conf, struct egress_policer, qos_conf);
>   
> -    cnt = netdev_dpdk_policer_run(&policer->egress_meter,
> -                                  &policer->egress_prof, pkts,
> -                                  pkt_cnt, should_steal);
> +    cnt = srtcm_policer_run_single_packet(&policer->egress_meter,
> +                                          &policer->egress_prof, pkts,
> +                                          pkt_cnt, should_steal);
>   
>       return cnt;
>   }
> @@ -4496,6 +4504,326 @@ static const struct dpdk_qos_ops egress_policer_ops = {
>       .qos_run = egress_policer_run
>   };
>   
> +/* trtcm-policer details */
> +
> +struct trtcm_policer {
> +    struct qos_conf qos_conf;
> +    struct rte_meter_trtcm_rfc4115_params meter_params;
> +    struct rte_meter_trtcm_rfc4115_profile meter_profile;
> +    struct rte_meter_trtcm_rfc4115 meter;
> +    struct netdev_queue_stats stats;
> +    struct hmap queues;
> +};
> +
> +struct trtcm_policer_queue {
> +    struct hmap_node hmap_node;
> +    uint32_t queue_id;
> +    struct rte_meter_trtcm_rfc4115_params meter_params;
> +    struct rte_meter_trtcm_rfc4115_profile meter_profile;
> +    struct rte_meter_trtcm_rfc4115 meter;
> +    struct netdev_queue_stats stats;
> +};
> +
> +static void
> +trtcm_policer_details_to_param(const struct smap *details,
> +                               struct rte_meter_trtcm_rfc4115_params *params)
> +{
> +    memset(params, 0, sizeof *params);
> +    params->cir = smap_get_ullong(details, "cir", 0);
> +    params->eir = smap_get_ullong(details, "eir", 0);
> +    params->cbs = smap_get_ullong(details, "cbs", 0);
> +    params->ebs = smap_get_ullong(details, "ebs", 0);
> +}
> +
> +static void
> +trtcm_policer_param_to_detail(
> +    const struct rte_meter_trtcm_rfc4115_params *params,
> +    struct smap *details)
> +{
> +    smap_add_format(details, "cir", "%"PRIu64, params->cir);
> +    smap_add_format(details, "eir", "%"PRIu64, params->eir);
> +    smap_add_format(details, "cbs", "%"PRIu64, params->cbs);
> +    smap_add_format(details, "ebs", "%"PRIu64, params->ebs);
> +}
> +
> +
> +static int
> +trtcm_policer_qos_construct(const struct smap *details,
> +                            struct qos_conf **conf)
> +{
> +    struct trtcm_policer *policer;
> +    int err = 0;
> +
> +    policer = xmalloc(sizeof *policer);
> +    qos_conf_init(&policer->qos_conf, &trtcm_policer_ops);
> +    trtcm_policer_details_to_param(details, &policer->meter_params);
> +    err = rte_meter_trtcm_rfc4115_profile_config(&policer->meter_profile,
> +                                                 &policer->meter_params);
> +    if (!err) {
> +        err = rte_meter_trtcm_rfc4115_config(&policer->meter,
> +                                             &policer->meter_profile);
> +    }
> +
> +    if (!err) {
> +        *conf = &policer->qos_conf;
> +        memset(&policer->stats, 0, sizeof policer->stats);
> +        hmap_init(&policer->queues);
> +    } else {
> +        free(policer);
> +        *conf = NULL;
> +        err = -err;
> +    }
> +
> +    return err;
> +}
> +
> +static void
> +trtcm_policer_qos_destruct(struct qos_conf *conf)
> +{
> +    struct trtcm_policer_queue *queue, *next_queue;
> +    struct trtcm_policer *policer = CONTAINER_OF(conf, struct trtcm_policer,
> +                                                 qos_conf);
> +
> +    HMAP_FOR_EACH_SAFE (queue, next_queue, hmap_node, &policer->queues) {
> +        hmap_remove(&policer->queues, &queue->hmap_node);
> +        free(queue);
> +    }
> +    hmap_destroy(&policer->queues);
> +    free(policer);
> +}
> +
> +static int
> +trtcm_policer_qos_get(const struct qos_conf *conf, struct smap *details)
> +{
> +    struct trtcm_policer *policer = CONTAINER_OF(conf, struct trtcm_policer,
> +                                                 qos_conf);
> +
> +    trtcm_policer_param_to_detail(&policer->meter_params, details);
> +    return 0;
> +}
> +
> +static bool
> +trtcm_policer_qos_is_equal(const struct qos_conf *conf,
> +                           const struct smap *details)
> +{
> +    struct trtcm_policer *policer = CONTAINER_OF(conf, struct trtcm_policer,
> +                                                 qos_conf);
> +    struct rte_meter_trtcm_rfc4115_params params;
> +
> +    trtcm_policer_details_to_param(details, &params);
> +
> +    return !memcmp(&params, &policer->meter_params, sizeof params);
> +}
> +
> +static struct trtcm_policer_queue *
> +trtcm_policer_qos_find_queue(struct trtcm_policer *policer, uint32_t queue_id)
> +{
> +    struct trtcm_policer_queue *queue;
> +    HMAP_FOR_EACH_WITH_HASH (queue, hmap_node, hash_2words(queue_id, 0),
> +                             &policer->queues) {
> +        if (queue->queue_id == queue_id) {
> +            return queue;
> +        }
> +    }
> +    return NULL;
> +}
> +
> +static inline bool
> +trtcm_policer_run_single_packet(struct trtcm_policer *policer,
> +                                struct rte_mbuf *pkt, uint64_t time)
> +{
> +    enum rte_color pkt_color;
> +    struct trtcm_policer_queue *queue;
> +    uint32_t pkt_len = rte_pktmbuf_pkt_len(pkt) - sizeof(struct rte_ether_hdr);
> +    struct dp_packet *dpkt = CONTAINER_OF(pkt, struct dp_packet, mbuf);
> +
> +    queue = trtcm_policer_qos_find_queue(policer, dpkt->md.skb_priority);
> +    if (!queue) {
> +        /* If no queue is found, use the default queue, which MUST exist. */
> +        queue = trtcm_policer_qos_find_queue(policer, 0);
> +        if (!queue) {
> +            return false;
> +        }
> +    }
> +
> +    pkt_color = rte_meter_trtcm_rfc4115_color_blind_check(&queue->meter,
> +                                                          &queue->meter_profile,
> +                                                          time,
> +                                                          pkt_len);
> +
> +    if (pkt_color == RTE_COLOR_RED) {
> +        queue->stats.tx_errors++;
> +    } else {
> +        queue->stats.tx_bytes += pkt_len;
> +        queue->stats.tx_packets++;
> +    }
> +
> +    pkt_color = rte_meter_trtcm_rfc4115_color_aware_check(&policer->meter,
> +                                                     &policer->meter_profile,
> +                                                     time, pkt_len,
> +                                                     pkt_color);
> +
> +    if (pkt_color == RTE_COLOR_RED) {
> +        policer->stats.tx_errors++;
> +        return false;
> +    }
> +
> +    policer->stats.tx_bytes += pkt_len;
> +    policer->stats.tx_packets++;
> +    return true;
> +}
> +
> +static int
> +trtcm_policer_run(struct qos_conf *conf, struct rte_mbuf **pkts, int pkt_cnt,
> +                  bool should_steal)
> +{
> +    int i = 0;
> +    int cnt = 0;
> +    struct rte_mbuf *pkt = NULL;
> +    uint64_t current_time = rte_rdtsc();
> +
> +    struct trtcm_policer *policer = CONTAINER_OF(conf, struct trtcm_policer,
> +                                                 qos_conf);
> +
> +    for (i = 0; i < pkt_cnt; i++) {
> +        pkt = pkts[i];
> +
> +        if (trtcm_policer_run_single_packet(policer, pkt, current_time)) {
> +            if (cnt != i) {
> +                pkts[cnt] = pkt;
> +            }
> +            cnt++;
> +        } else {
> +            if (should_steal) {
> +                rte_pktmbuf_free(pkt);
> +            }
> +        }
> +    }
> +    return cnt;
> +}
> +
> +static int
> +trtcm_policer_qos_queue_construct(const struct smap *details,
> +                                  uint32_t queue_id, struct qos_conf *conf)
> +{
> +    int err = 0;
> +    struct trtcm_policer_queue *queue;
> +    struct trtcm_policer *policer = CONTAINER_OF(conf, struct trtcm_policer,
> +                                                 qos_conf);
> +
> +    queue = trtcm_policer_qos_find_queue(policer, queue_id);
> +    if (!queue) {
> +        queue = xmalloc(sizeof *queue);
> +        queue->queue_id = queue_id;
> +        memset(&queue->stats, 0, sizeof queue->stats);
> +        queue->stats.created = time_msec();
> +        hmap_insert(&policer->queues, &queue->hmap_node,
> +                    hash_2words(queue_id, 0));
> +    }
> +    if (queue_id == 0 && smap_is_empty(details)) {
> +        /* No default queue configured, use port values */
> +        memcpy(&queue->meter_params, &policer->meter_params,
> +               sizeof queue->meter_params);
> +    } else {
> +        trtcm_policer_details_to_param(details, &queue->meter_params);
> +    }
> +
> +    err = rte_meter_trtcm_rfc4115_profile_config(&queue->meter_profile,
> +                                                 &queue->meter_params);
> +
> +    if (!err) {
> +        err = rte_meter_trtcm_rfc4115_config(&queue->meter,
> +                                             &queue->meter_profile);
> +    }
> +    if (err) {
> +        hmap_remove(&policer->queues, &queue->hmap_node);
> +        free(queue);
> +        err = -err;
> +    }
> +    return err;
> +}
> +
> +static void
> +trtcm_policer_qos_queue_destruct(struct qos_conf *conf, uint32_t queue_id)
> +{
> +    struct trtcm_policer_queue *queue;
> +    struct trtcm_policer *policer = CONTAINER_OF(conf, struct trtcm_policer,
> +                                                 qos_conf);
> +
> +    queue = trtcm_policer_qos_find_queue(policer, queue_id);
> +    if (queue) {
> +        hmap_remove(&policer->queues, &queue->hmap_node);
> +        free(queue);
> +    }
> +}
> +
> +static int
> +trtcm_policer_qos_queue_get(struct smap *details, uint32_t queue_id,
> +                            const struct qos_conf *conf)
> +{
> +    struct trtcm_policer_queue *queue;
> +    struct trtcm_policer *policer = CONTAINER_OF(conf, struct trtcm_policer,
> +                                                 qos_conf);
> +
> +    queue = trtcm_policer_qos_find_queue(policer, queue_id);
> +    if (!queue) {
> +        return EINVAL;
> +    }
> +
> +    trtcm_policer_param_to_detail(&queue->meter_params, details);
> +    return 0;
> +}
> +
> +static int
> +trtcm_policer_qos_queue_get_stats(const struct qos_conf *conf,
> +                                  uint32_t queue_id,
> +                                  struct netdev_queue_stats *stats)
> +{
> +    struct trtcm_policer_queue *queue;
> +    struct trtcm_policer *policer = CONTAINER_OF(conf, struct trtcm_policer,
> +                                                 qos_conf);
> +
> +    queue = trtcm_policer_qos_find_queue(policer, queue_id);
> +    if (!queue) {
> +        return EINVAL;
> +    }
> +    memcpy(stats, &queue->stats, sizeof *stats);
> +    return 0;
> +}
> +
> +static int
> +trtcm_policer_qos_queue_dump_state_init(const struct qos_conf *conf,
> +                                        struct netdev_dpdk_queue_state *state)
> +{
> +    uint32_t i = 0;
> +    struct trtcm_policer_queue *queue;
> +    struct trtcm_policer *policer = CONTAINER_OF(conf, struct trtcm_policer,
> +                                                 qos_conf);
> +
> +    state->n_queues = hmap_count(&policer->queues);
> +    state->cur_queue = 0;
> +    state->queues = xmalloc(state->n_queues * sizeof *state->queues);
> +
> +    HMAP_FOR_EACH (queue, hmap_node, &policer->queues) {
> +        state->queues[i++] = queue->queue_id;
> +    }
> +    return 0;
> +}
> +
> +static const struct dpdk_qos_ops trtcm_policer_ops = {
> +    .qos_name = "trtcm-policer",
> +    .qos_construct = trtcm_policer_qos_construct,
> +    .qos_destruct = trtcm_policer_qos_destruct,
> +    .qos_get = trtcm_policer_qos_get,
> +    .qos_is_equal = trtcm_policer_qos_is_equal,
> +    .qos_run = trtcm_policer_run,
> +    .qos_queue_construct = trtcm_policer_qos_queue_construct,
> +    .qos_queue_destruct = trtcm_policer_qos_queue_destruct,
> +    .qos_queue_get = trtcm_policer_qos_queue_get,
> +    .qos_queue_get_stats = trtcm_policer_qos_queue_get_stats,
> +    .qos_queue_dump_state_init = trtcm_policer_qos_queue_dump_state_init
> +};
> +
>   static int
>   netdev_dpdk_reconfigure(struct netdev *netdev)
>   {
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index 0ec726c..c43cb1a 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -4441,6 +4441,19 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
>             in performance will be noticed in terms of overall aggregate traffic
>             throughput.
>           </dd>
> +        <dt><code>trtcm-policer</code></dt>
> +        <dd>
> +          A DPDK egress policer algorithm using RFC 4115's Two-Rate,
> +          Three-Color marker. It's a two-level hierarchical policer
> +          which first does a color-blind marking of the traffic at the queue
> +          level, followed by a color-aware marking at the port level. At the
> +          end traffic marked as Green or Yellow is forwarded, Red is dropped.
> +          For details on how traffic is marked, see RFC 4115.
> +
> +          If the ``default queue'', 0, is not configured it's automatically
> +          created with the same <code>other_config</code> values as the
> +          physical port.
> +        </dd>
>         </dl>
>       </column>
>   
> @@ -4508,6 +4521,27 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
>           bytes/tokens of the packet. If there are not enough tokens in the cbs
>           bucket the packet will be dropped.
>         </column>
> +      <column name="other_config" key="eir" type='{"type": "integer"}'>
> +        The Excess Information Rate (EIR) is measured in bytes of IP
> +        packets per second, i.e. it includes the IP header, but not link
> +        specific (e.g. Ethernet) headers. This represents the bytes per second
> +        rate at which the token bucket will be updated. The eir value is
> +        calculated by (pps x packet data size).  For example assuming a user
> +        wishes to limit a stream consisting of 64 byte packets to 1 million
> +        packets per second the EIR would be set to to to 46000000. This value
> +        can be broken into '1,000,000 x 46'. Where 1,000,000 is the policing
> +        rate for the number of packets per second and 46 represents the size
> +        of the packet data for a 64 byte ip packet.
> +      </column>
> +      <column name="other_config" key="ebs" type='{"type": "integer"}'>
> +        The Excess Burst Size (EBS) is measured in bytes and represents a
> +        token bucket. At a minimum this value should be be set to the expected
> +        largest size packet in the traffic stream. In practice larger values
> +        may be used to increase the size of the token bucket. If a packet can
> +        be transmitted then the ebs will be decremented by the number of
> +        bytes/tokens of the packet. If there are not enough tokens in the cbs
> +        bucket the packet might be dropped.
> +      </column>
>       </group>
>   
>       <group title="Configuration for linux-sfq">
>
David Marchand Jan. 14, 2020, 8:35 a.m. UTC | #2
Hey Eelco,

On Mon, Jan 13, 2020 at 4:57 PM Eelco Chaudron <echaudro@redhat.com> wrote:
[snip]
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 128963f..3f164c8 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -26,7 +26,10 @@
>  #include <sys/socket.h>
>  #include <linux/if.h>
>
> +/* The below is needed as rte_meter.h gets included trough rte_bus_pci.h. */
> +#define ALLOW_EXPERIMENTAL_API
>  #include <rte_bus_pci.h>
> +#undef ALLOW_EXPERIMENTAL_API

__rte_experimental is getting defined at the first inclusion of
rte_compat.h, so here rte_bus_pci.h is the first one.
__rte_experimental ends up being a big "all or nothing" switch, so we
don't need to #undef.

Cc: Neil if he has a better idea/comments.

>  #include <rte_config.h>
>  #include <rte_cycles.h>
>  #include <rte_errno.h>
> @@ -35,7 +38,9 @@
>  #include <rte_flow.h>
>  #include <rte_malloc.h>
>  #include <rte_mbuf.h>
> +#define ALLOW_EXPERIMENTAL_API
>  #include <rte_meter.h>
> +#undef ALLOW_EXPERIMENTAL_API
>  #include <rte_pci.h>
>  #include <rte_version.h>
>  #include <rte_vhost.h>
Stokes, Ian Jan. 14, 2020, 11:23 a.m. UTC | #3
On 1/13/2020 8:32 PM, Stokes, Ian wrote:
> 
> 
> On 1/13/2020 3:56 PM, Eelco Chaudron wrote:
>> This patch adds a new policer to the DPDK datapath based on RFC 4115's
>> Two-Rate, Three-Color marker. It's a two-level hierarchical policer
>> which first does a color-blind marking of the traffic at the queue
>> level, followed by a color-aware marking at the port level. At the end
>> traffic marked as Green or Yellow is forwarded, Red is dropped. For
>> details on how traffic is marked, see RFC 4115.
>>
>> This egress policer can be used to limit traffic at different rated
>> based on the queues the traffic is in. In addition, it can also be used
>> to prioritize certain traffic over others at a port level.
>>
>> For example, the following configuration will limit the traffic rate at a
>> port level to a maximum of 2000 packets a second (64 bytes IPv4 packets).
>> 100pps as CIR (Committed Information Rate) and 1000pps as EIR (Excess
>> Information Rate). High priority traffic is routed to queue 10, which 
>> marks
>> all traffic as CIR, i.e. Green. All low priority traffic, queue 20, is
>> marked as EIR, i.e. Yellow.
>>
>> ovs-vsctl --timeout=5 set port dpdk1 qos=@myqos -- \
>>    --id=@myqos create qos type=trtcm-policer \
>>    other-config:cir=52000 other-config:cbs=2048 \
>>    other-config:eir=52000 other-config:ebs=2048  \
>>    queues:10=@dpdk1Q10 queues:20=@dpdk1Q20 -- \
>>    --id=@dpdk1Q10 create queue \
>>      other-config:cir=41600000 other-config:cbs=2048 \
>>      other-config:eir=0 other-config:ebs=0 -- \
>>    --id=@dpdk1Q20 create queue \
>>      other-config:cir=0 other-config:cbs=0 \
>>      other-config:eir=41600000 other-config:ebs=2048 \
>>
>> This configuration accomplishes that the high priority traffic has a
>> guaranteed bandwidth egressing the ports at CIR (1000pps), but it can 
>> also
>> use the EIR, so a total of 2000pps at max. These additional 1000pps is
>> shared with the low priority traffic. The low priority traffic can use at
>> maximum 1000pps.
> 
> Thanks for the patch Eelco, LGTM, I just want to finish a little more 
> testing but I intend to merge this tomorrow to master if nothing else 
> comes up/is raised.
> 

Hi Eelco, I'm seeing a crash in OVS while running this with just a port 
and a default queue 0 (phy to phy setup). It seems related to the call 
to rte_meter_trtcm_rfc4115_color_blind_check. I've provided more detail 
below in the trtcm_policer_run_single_packet function, just wondering if 
you've come across it?

Heres the output for the qos configuration I'm using

-bash-4.4$ovs-appctl -t ovs-vswitchd qos/show dpdk1
QoS: dpdk1 trtcm-policer
eir: 52000
cbs: 2048
ebs: 2048
cir: 52000

Default:
   eir: 52000
   cbs: 2048
   ebs: 2048
   cir: 52000
   tx_packets: 672150
   tx_bytes: 30918900
   tx_errors: 489562233

I'll try to investigate further with DPDK and GDB also.

> Regards
> Ian
> 
>>
>> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
>> ---
>>   Documentation/topics/dpdk/qos.rst |   43 ++++
>>   lib/netdev-dpdk.c                 |  358 
>> +++++++++++++++++++++++++++++++++++--
>>   vswitchd/vswitch.xml              |   34 ++++
>>   3 files changed, 420 insertions(+), 15 deletions(-)
>>
>> diff --git a/Documentation/topics/dpdk/qos.rst 
>> b/Documentation/topics/dpdk/qos.rst
>> index c0aec5d..1034954 100644
>> --- a/Documentation/topics/dpdk/qos.rst
>> +++ b/Documentation/topics/dpdk/qos.rst
>> @@ -33,6 +33,9 @@ datapath. These are referred to as *QoS* and *Rate 
>> Limiting*, respectively.
>>   QoS (Egress Policing)
>>   ---------------------
>> +Single Queue Policer
>> +~~~~~~~~~~~~~~~~~~~~
>> +
>>   Assuming you have a :doc:`vhost-user port <vhost-user>` transmitting 
>> traffic
>>   consisting of packets of size 64 bytes, the following command would 
>> limit the
>>   egress transmission rate of the port to ~1,000,000 packets per second::
>> @@ -49,6 +52,46 @@ To clear the QoS configuration from the port and 
>> ovsdb, run::
>>       $ ovs-vsctl destroy QoS vhost-user0 -- clear Port vhost-user0 qos
>> +
>> +Multi Queue Policer
>> +~~~~~~~~~~~~~~~~~~~
>> +
>> +In addition to the egress-policer OVS-DPDK also has support for a RFC
>> +4115's Two-Rate, Three-Color marker meter. It's a two-level hierarchical
>> +policer which first does a color-blind marking of the traffic at the 
>> queue
>> +level, followed by a color-aware marking at the port level. At the end
>> +traffic marked as Green or Yellow is forwarded, Red is dropped. For
>> +details on how traffic is marked, see RFC 4115.
>> +
>> +This egress policer can be used to limit traffic at different rated
>> +based on the queues the traffic is in. In addition, it can also be used
>> +to prioritize certain traffic over others at a port level.
>> +
>> +For example, the following configuration will limit the traffic rate 
>> at a
>> +port level to a maximum of 2000 packets a second (64 bytes IPv4 
>> packets).
>> +100pps as CIR (Committed Information Rate) and 1000pps as EIR (Excess
>> +Information Rate). High priority traffic is routed to queue 10, which 
>> marks
>> +all traffic as CIR, i.e. Green. All low priority traffic, queue 20, is
>> +marked as EIR, i.e. Yellow::
>> +
>> +    $ ovs-vsctl --timeout=5 set port dpdk1 qos=@myqos -- \
>> +        --id=@myqos create qos type=trtcm-policer \
>> +        other-config:cir=52000 other-config:cbs=2048 \
>> +        other-config:eir=52000 other-config:ebs=2048  \
>> +        queues:10=@dpdk1Q10 queues:20=@dpdk1Q20 -- \
>> +         --id=@dpdk1Q10 create queue \
>> +          other-config:cir=41600000 other-config:cbs=2048 \
>> +          other-config:eir=0 other-config:ebs=0 -- \
>> +         --id=@dpdk1Q20 create queue \
>> +           other-config:cir=0 other-config:cbs=0 \
>> +           other-config:eir=41600000 other-config:ebs=2048 \
>> +
>> +This configuration accomplishes that the high priority traffic has a
>> +guaranteed bandwidth egressing the ports at CIR (1000pps), but it can 
>> also
>> +use the EIR, so a total of 2000pps at max. These additional 1000pps is
>> +shared with the low priority traffic. The low priority traffic can 
>> use at
>> +maximum 1000pps.
>> +
>>   Refer to ``vswitch.xml`` for more details on egress policer.
>>   Rate Limiting (Ingress Policing)
>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>> index 128963f..3f164c8 100644
>> --- a/lib/netdev-dpdk.c
>> +++ b/lib/netdev-dpdk.c
>> @@ -26,7 +26,10 @@
>>   #include <sys/socket.h>
>>   #include <linux/if.h>
>> +/* The below is needed as rte_meter.h gets included trough 
>> rte_bus_pci.h. */
>> +#define ALLOW_EXPERIMENTAL_API
>>   #include <rte_bus_pci.h>
>> +#undef ALLOW_EXPERIMENTAL_API
>>   #include <rte_config.h>
>>   #include <rte_cycles.h>
>>   #include <rte_errno.h>
>> @@ -35,7 +38,9 @@
>>   #include <rte_flow.h>
>>   #include <rte_malloc.h>
>>   #include <rte_mbuf.h>
>> +#define ALLOW_EXPERIMENTAL_API
>>   #include <rte_meter.h>
>> +#undef ALLOW_EXPERIMENTAL_API
>>   #include <rte_pci.h>
>>   #include <rte_version.h>
>>   #include <rte_vhost.h>
>> @@ -329,8 +334,9 @@ struct dpdk_qos_ops {
>>                                        struct netdev_dpdk_queue_state 
>> *state);
>>   };
>> -/* dpdk_qos_ops for each type of user space QoS implementation */
>> +/* dpdk_qos_ops for each type of user space QoS implementation. */
>>   static const struct dpdk_qos_ops egress_policer_ops;
>> +static const struct dpdk_qos_ops trtcm_policer_ops;
>>   /*
>>    * Array of dpdk_qos_ops, contains pointer to all supported QoS
>> @@ -338,6 +344,7 @@ static const struct dpdk_qos_ops egress_policer_ops;
>>    */
>>   static const struct dpdk_qos_ops *const qos_confs[] = {
>>       &egress_policer_ops,
>> +    &trtcm_policer_ops,
>>       NULL
>>   };
>> @@ -2162,9 +2169,9 @@ netdev_dpdk_eth_tx_burst(struct netdev_dpdk 
>> *dev, int qid,
>>   }
>>   static inline bool
>> -netdev_dpdk_policer_pkt_handle(struct rte_meter_srtcm *meter,
>> -                               struct rte_meter_srtcm_profile *profile,
>> -                               struct rte_mbuf *pkt, uint64_t time)
>> +netdev_dpdk_srtcm_policer_pkt_handle(struct rte_meter_srtcm *meter,
>> +                                     struct rte_meter_srtcm_profile 
>> *profile,
>> +                                     struct rte_mbuf *pkt, uint64_t 
>> time)
>>   {
>>       uint32_t pkt_len = rte_pktmbuf_pkt_len(pkt) - sizeof(struct 
>> rte_ether_hdr);
>> @@ -2173,10 +2180,10 @@ netdev_dpdk_policer_pkt_handle(struct 
>> rte_meter_srtcm *meter,
>>   }
>>   static int
>> -netdev_dpdk_policer_run(struct rte_meter_srtcm *meter,
>> -                        struct rte_meter_srtcm_profile *profile,
>> -                        struct rte_mbuf **pkts, int pkt_cnt,
>> -                        bool should_steal)
>> +srtcm_policer_run_single_packet(struct rte_meter_srtcm *meter,
>> +                                struct rte_meter_srtcm_profile *profile,
>> +                                struct rte_mbuf **pkts, int pkt_cnt,
>> +                                bool should_steal)
>>   {
>>       int i = 0;
>>       int cnt = 0;
>> @@ -2186,8 +2193,8 @@ netdev_dpdk_policer_run(struct rte_meter_srtcm 
>> *meter,
>>       for (i = 0; i < pkt_cnt; i++) {
>>           pkt = pkts[i];
>>           /* Handle current packet */
>> -        if (netdev_dpdk_policer_pkt_handle(meter, profile,
>> -                                           pkt, current_time)) {
>> +        if (netdev_dpdk_srtcm_policer_pkt_handle(meter, profile,
>> +                                                 pkt, current_time)) {
>>               if (cnt != i) {
>>                   pkts[cnt] = pkt;
>>               }
>> @@ -2209,8 +2216,9 @@ ingress_policer_run(struct ingress_policer 
>> *policer, struct rte_mbuf **pkts,
>>       int cnt = 0;
>>       rte_spinlock_lock(&policer->policer_lock);
>> -    cnt = netdev_dpdk_policer_run(&policer->in_policer, 
>> &policer->in_prof,
>> -                                  pkts, pkt_cnt, should_steal);
>> +    cnt = srtcm_policer_run_single_packet(&policer->in_policer,
>> +                                          &policer->in_prof,
>> +                                          pkts, pkt_cnt, should_steal);
>>       rte_spinlock_unlock(&policer->policer_lock);
>>       return cnt;
>> @@ -4480,9 +4488,9 @@ egress_policer_run(struct qos_conf *conf, struct 
>> rte_mbuf **pkts, int pkt_cnt,
>>       struct egress_policer *policer =
>>           CONTAINER_OF(conf, struct egress_policer, qos_conf);
>> -    cnt = netdev_dpdk_policer_run(&policer->egress_meter,
>> -                                  &policer->egress_prof, pkts,
>> -                                  pkt_cnt, should_steal);
>> +    cnt = srtcm_policer_run_single_packet(&policer->egress_meter,
>> +                                          &policer->egress_prof, pkts,
>> +                                          pkt_cnt, should_steal);
>>       return cnt;
>>   }
>> @@ -4496,6 +4504,326 @@ static const struct dpdk_qos_ops 
>> egress_policer_ops = {
>>       .qos_run = egress_policer_run
>>   };
>> +/* trtcm-policer details */
>> +
>> +struct trtcm_policer {
>> +    struct qos_conf qos_conf;
>> +    struct rte_meter_trtcm_rfc4115_params meter_params;
>> +    struct rte_meter_trtcm_rfc4115_profile meter_profile;
>> +    struct rte_meter_trtcm_rfc4115 meter;
>> +    struct netdev_queue_stats stats;
>> +    struct hmap queues;
>> +};
>> +
>> +struct trtcm_policer_queue {
>> +    struct hmap_node hmap_node;
>> +    uint32_t queue_id;
>> +    struct rte_meter_trtcm_rfc4115_params meter_params;
>> +    struct rte_meter_trtcm_rfc4115_profile meter_profile;
>> +    struct rte_meter_trtcm_rfc4115 meter;
>> +    struct netdev_queue_stats stats;
>> +};
>> +
>> +static void
>> +trtcm_policer_details_to_param(const struct smap *details,
>> +                               struct rte_meter_trtcm_rfc4115_params 
>> *params)
>> +{
>> +    memset(params, 0, sizeof *params);
>> +    params->cir = smap_get_ullong(details, "cir", 0);
>> +    params->eir = smap_get_ullong(details, "eir", 0);
>> +    params->cbs = smap_get_ullong(details, "cbs", 0);
>> +    params->ebs = smap_get_ullong(details, "ebs", 0);
>> +}
>> +
>> +static void
>> +trtcm_policer_param_to_detail(
>> +    const struct rte_meter_trtcm_rfc4115_params *params,
>> +    struct smap *details)
>> +{
>> +    smap_add_format(details, "cir", "%"PRIu64, params->cir);
>> +    smap_add_format(details, "eir", "%"PRIu64, params->eir);
>> +    smap_add_format(details, "cbs", "%"PRIu64, params->cbs);
>> +    smap_add_format(details, "ebs", "%"PRIu64, params->ebs);
>> +}
>> +
>> +
>> +static int
>> +trtcm_policer_qos_construct(const struct smap *details,
>> +                            struct qos_conf **conf)
>> +{
>> +    struct trtcm_policer *policer;
>> +    int err = 0;
>> +
>> +    policer = xmalloc(sizeof *policer);
>> +    qos_conf_init(&policer->qos_conf, &trtcm_policer_ops);
>> +    trtcm_policer_details_to_param(details, &policer->meter_params);
>> +    err = 
>> rte_meter_trtcm_rfc4115_profile_config(&policer->meter_profile,
>> +                                                 
>> &policer->meter_params);
>> +    if (!err) {
>> +        err = rte_meter_trtcm_rfc4115_config(&policer->meter,
>> +                                             &policer->meter_profile);
>> +    }
>> +
>> +    if (!err) {
>> +        *conf = &policer->qos_conf;
>> +        memset(&policer->stats, 0, sizeof policer->stats);
>> +        hmap_init(&policer->queues);
>> +    } else {
>> +        free(policer);
>> +        *conf = NULL;
>> +        err = -err;
>> +    }
>> +
>> +    return err;
>> +}
>> +
>> +static void
>> +trtcm_policer_qos_destruct(struct qos_conf *conf)
>> +{
>> +    struct trtcm_policer_queue *queue, *next_queue;
>> +    struct trtcm_policer *policer = CONTAINER_OF(conf, struct 
>> trtcm_policer,
>> +                                                 qos_conf);
>> +
>> +    HMAP_FOR_EACH_SAFE (queue, next_queue, hmap_node, 
>> &policer->queues) {
>> +        hmap_remove(&policer->queues, &queue->hmap_node);
>> +        free(queue);
>> +    }
>> +    hmap_destroy(&policer->queues);
>> +    free(policer);
>> +}
>> +
>> +static int
>> +trtcm_policer_qos_get(const struct qos_conf *conf, struct smap *details)
>> +{
>> +    struct trtcm_policer *policer = CONTAINER_OF(conf, struct 
>> trtcm_policer,
>> +                                                 qos_conf);
>> +
>> +    trtcm_policer_param_to_detail(&policer->meter_params, details);
>> +    return 0;
>> +}
>> +
>> +static bool
>> +trtcm_policer_qos_is_equal(const struct qos_conf *conf,
>> +                           const struct smap *details)
>> +{
>> +    struct trtcm_policer *policer = CONTAINER_OF(conf, struct 
>> trtcm_policer,
>> +                                                 qos_conf);
>> +    struct rte_meter_trtcm_rfc4115_params params;
>> +
>> +    trtcm_policer_details_to_param(details, &params);
>> +
>> +    return !memcmp(&params, &policer->meter_params, sizeof params);
>> +}
>> +
>> +static struct trtcm_policer_queue *
>> +trtcm_policer_qos_find_queue(struct trtcm_policer *policer, uint32_t 
>> queue_id)
>> +{
>> +    struct trtcm_policer_queue *queue;
>> +    HMAP_FOR_EACH_WITH_HASH (queue, hmap_node, hash_2words(queue_id, 0),
>> +                             &policer->queues) {
>> +        if (queue->queue_id == queue_id) {
>> +            return queue;
>> +        }
>> +    }
>> +    return NULL;
>> +}
>> +
>> +static inline bool
>> +trtcm_policer_run_single_packet(struct trtcm_policer *policer,
>> +                                struct rte_mbuf *pkt, uint64_t time)
>> +{
>> +    enum rte_color pkt_color;
>> +    struct trtcm_policer_queue *queue;
>> +    uint32_t pkt_len = rte_pktmbuf_pkt_len(pkt) - sizeof(struct 
>> rte_ether_hdr);
>> +    struct dp_packet *dpkt = CONTAINER_OF(pkt, struct dp_packet, mbuf);
>> +
>> +    queue = trtcm_policer_qos_find_queue(policer, 
>> dpkt->md.skb_priority);
>> +    if (!queue) {
>> +        /* If no queue is found, use the default queue, which MUST 
>> exist. */
>> +        queue = trtcm_policer_qos_find_queue(policer, 0);
>> +        if (!queue) {
>> +            return false;
>> +        }
>> +    }
>> +
>> +    pkt_color = rte_meter_trtcm_rfc4115_color_blind_check(&queue->meter,

A few times during testing I have seen OVS crash with the following

./launch_vswitch.sh: line 66: 11694 Floating point exception sudo 
$OVS_DIR/vswitchd/ovs-vswitchd unix:$DB_SOCK --pidfile

Looking into it with GDB it sems related to the 
rte_meter_trtcm_rfc4115_color_blind_check above. See GDB output below.

Thread 12 "pmd-c03/id:9" received signal SIGFPE, Arithmetic exception.
[Switching to Thread 0x7f3dce734700 (LWP 26465)]
0x0000000000d4b92d in rte_meter_trtcm_rfc4115_color_blind_check 
(m=0x328e178, p=0x328e148, time=29107058565136113, pkt_len=46) at 
/opt/istokes/dpdk-19.11//x86_64-native-linuxapp-gcc/include/rte_meter.h:599
599             n_periods_te = time_diff_te / p->eir_period;
(gdb) bt
#0  0x0000000000d4b92d in rte_meter_trtcm_rfc4115_color_blind_check 
(m=0x328e178, p=0x328e148, time=29107058565136113, pkt_len=46) at 
/opt/istokes/dpdk-19.11//x86_64-native-linuxapp-gcc/include/rte_meter.h:599
#1  0x0000000000d4abc1 in trtcm_policer_run_single_packet 
(policer=0x2774200, pkt=0x1508fbd40, time=29107058565136113) at 
lib/netdev-dpdk.c:4649
#2  0x0000000000d4ad24 in trtcm_policer_run (conf=0x2774200, 
pkts=0x7f3db8005100, pkt_cnt=32, should_steal=true) at 
lib/netdev-dpdk.c:4691
#3  0x0000000000d45299 in netdev_dpdk_qos_run (dev=0x17fd68840, 
pkts=0x7f3db8005100, cnt=32, should_steal=true) at lib/netdev-dpdk.c:2421
#4  0x0000000000d45db0 in netdev_dpdk_send__ (dev=0x17fd68840, qid=1, 
batch=0x7f3db80050f0, concurrent_txq=false) at lib/netdev-dpdk.c:2683
#5  0x0000000000d45ee9 in netdev_dpdk_eth_send (netdev=0x17fd688c0, 
qid=1, batch=0x7f3db80050f0, concurrent_txq=false) at lib/netdev-dpdk.c:2710
#6  0x0000000000c342ba in netdev_send (netdev=0x17fd688c0, qid=1, 
batch=0x7f3db80050f0, concurrent_txq=false) at lib/netdev.c:814
#7  0x0000000000beb3de in dp_netdev_pmd_flush_output_on_port 
(pmd=0x7f3dce735010, p=0x7f3db80050c0) at lib/dpif-netdev.c:4224
#8  0x0000000000beb5c4 in dp_netdev_pmd_flush_output_packets 
(pmd=0x7f3dce735010, force=false) at lib/dpif-netdev.c:4264
#9  0x0000000000beb814 in dp_netdev_process_rxq_port 
(pmd=0x7f3dce735010, rxq=0x328d930, port_no=2) at lib/dpif-netdev.c:4319
#10 0x0000000000bef432 in pmd_thread_main (f_=0x7f3dce735010) at 
lib/dpif-netdev.c:5556
#11 0x0000000000cb24e5 in ovsthread_wrapper (aux_=0x326d220) at 
lib/ovs-thread.c:383
#12 0x00007f3de11d236d in start_thread (arg=0x7f3dce734700) at 
pthread_create.c:456
#13 0x00007f3de06bab4f in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:97

Regards
Ian
>> +                                                          
>> &queue->meter_profile,
>> +                                                          time,
>> +                                                          pkt_len);
>> +
>> +    if (pkt_color == RTE_COLOR_RED) {
>> +        queue->stats.tx_errors++;
>> +    } else {
>> +        queue->stats.tx_bytes += pkt_len;
>> +        queue->stats.tx_packets++;
>> +    }
>> +
>> +    pkt_color = 
>> rte_meter_trtcm_rfc4115_color_aware_check(&policer->meter,
>> +                                                     
>> &policer->meter_profile,
>> +                                                     time, pkt_len,
>> +                                                     pkt_color);
>> +
>> +    if (pkt_color == RTE_COLOR_RED) {
>> +        policer->stats.tx_errors++;
>> +        return false;
>> +    }
>> +
>> +    policer->stats.tx_bytes += pkt_len;
>> +    policer->stats.tx_packets++;
>> +    return true;
>> +}
>> +
>> +static int
>> +trtcm_policer_run(struct qos_conf *conf, struct rte_mbuf **pkts, int 
>> pkt_cnt,
>> +                  bool should_steal)
>> +{
>> +    int i = 0;
>> +    int cnt = 0;
>> +    struct rte_mbuf *pkt = NULL;
>> +    uint64_t current_time = rte_rdtsc();
>> +
>> +    struct trtcm_policer *policer = CONTAINER_OF(conf, struct 
>> trtcm_policer,
>> +                                                 qos_conf);
>> +
>> +    for (i = 0; i < pkt_cnt; i++) {
>> +        pkt = pkts[i];
>> +
>> +        if (trtcm_policer_run_single_packet(policer, pkt, 
>> current_time)) {
>> +            if (cnt != i) {
>> +                pkts[cnt] = pkt;
>> +            }
>> +            cnt++;
>> +        } else {
>> +            if (should_steal) {
>> +                rte_pktmbuf_free(pkt);
>> +            }
>> +        }
>> +    }
>> +    return cnt;
>> +}
>> +
>> +static int
>> +trtcm_policer_qos_queue_construct(const struct smap *details,
>> +                                  uint32_t queue_id, struct qos_conf 
>> *conf)
>> +{
>> +    int err = 0;
>> +    struct trtcm_policer_queue *queue;
>> +    struct trtcm_policer *policer = CONTAINER_OF(conf, struct 
>> trtcm_policer,
>> +                                                 qos_conf);
>> +
>> +    queue = trtcm_policer_qos_find_queue(policer, queue_id);
>> +    if (!queue) {
>> +        queue = xmalloc(sizeof *queue);
>> +        queue->queue_id = queue_id;
>> +        memset(&queue->stats, 0, sizeof queue->stats);
>> +        queue->stats.created = time_msec();
>> +        hmap_insert(&policer->queues, &queue->hmap_node,
>> +                    hash_2words(queue_id, 0));
>> +    }
>> +    if (queue_id == 0 && smap_is_empty(details)) {
>> +        /* No default queue configured, use port values */
>> +        memcpy(&queue->meter_params, &policer->meter_params,
>> +               sizeof queue->meter_params);
>> +    } else {
>> +        trtcm_policer_details_to_param(details, &queue->meter_params);
>> +    }
>> +
>> +    err = rte_meter_trtcm_rfc4115_profile_config(&queue->meter_profile,
>> +                                                 &queue->meter_params);
>> +
>> +    if (!err) {
>> +        err = rte_meter_trtcm_rfc4115_config(&queue->meter,
>> +                                             &queue->meter_profile);
>> +    }
>> +    if (err) {
>> +        hmap_remove(&policer->queues, &queue->hmap_node);
>> +        free(queue);
>> +        err = -err;
>> +    }
>> +    return err;
>> +}
>> +
>> +static void
>> +trtcm_policer_qos_queue_destruct(struct qos_conf *conf, uint32_t 
>> queue_id)
>> +{
>> +    struct trtcm_policer_queue *queue;
>> +    struct trtcm_policer *policer = CONTAINER_OF(conf, struct 
>> trtcm_policer,
>> +                                                 qos_conf);
>> +
>> +    queue = trtcm_policer_qos_find_queue(policer, queue_id);
>> +    if (queue) {
>> +        hmap_remove(&policer->queues, &queue->hmap_node);
>> +        free(queue);
>> +    }
>> +}
>> +
>> +static int
>> +trtcm_policer_qos_queue_get(struct smap *details, uint32_t queue_id,
>> +                            const struct qos_conf *conf)
>> +{
>> +    struct trtcm_policer_queue *queue;
>> +    struct trtcm_policer *policer = CONTAINER_OF(conf, struct 
>> trtcm_policer,
>> +                                                 qos_conf);
>> +
>> +    queue = trtcm_policer_qos_find_queue(policer, queue_id);
>> +    if (!queue) {
>> +        return EINVAL;
>> +    }
>> +
>> +    trtcm_policer_param_to_detail(&queue->meter_params, details);
>> +    return 0;
>> +}
>> +
>> +static int
>> +trtcm_policer_qos_queue_get_stats(const struct qos_conf *conf,
>> +                                  uint32_t queue_id,
>> +                                  struct netdev_queue_stats *stats)
>> +{
>> +    struct trtcm_policer_queue *queue;
>> +    struct trtcm_policer *policer = CONTAINER_OF(conf, struct 
>> trtcm_policer,
>> +                                                 qos_conf);
>> +
>> +    queue = trtcm_policer_qos_find_queue(policer, queue_id);
>> +    if (!queue) {
>> +        return EINVAL;
>> +    }
>> +    memcpy(stats, &queue->stats, sizeof *stats);
>> +    return 0;
>> +}
>> +
>> +static int
>> +trtcm_policer_qos_queue_dump_state_init(const struct qos_conf *conf,
>> +                                        struct 
>> netdev_dpdk_queue_state *state)
>> +{
>> +    uint32_t i = 0;
>> +    struct trtcm_policer_queue *queue;
>> +    struct trtcm_policer *policer = CONTAINER_OF(conf, struct 
>> trtcm_policer,
>> +                                                 qos_conf);
>> +
>> +    state->n_queues = hmap_count(&policer->queues);
>> +    state->cur_queue = 0;
>> +    state->queues = xmalloc(state->n_queues * sizeof *state->queues);
>> +
>> +    HMAP_FOR_EACH (queue, hmap_node, &policer->queues) {
>> +        state->queues[i++] = queue->queue_id;
>> +    }
>> +    return 0;
>> +}
>> +
>> +static const struct dpdk_qos_ops trtcm_policer_ops = {
>> +    .qos_name = "trtcm-policer",
>> +    .qos_construct = trtcm_policer_qos_construct,
>> +    .qos_destruct = trtcm_policer_qos_destruct,
>> +    .qos_get = trtcm_policer_qos_get,
>> +    .qos_is_equal = trtcm_policer_qos_is_equal,
>> +    .qos_run = trtcm_policer_run,
>> +    .qos_queue_construct = trtcm_policer_qos_queue_construct,
>> +    .qos_queue_destruct = trtcm_policer_qos_queue_destruct,
>> +    .qos_queue_get = trtcm_policer_qos_queue_get,
>> +    .qos_queue_get_stats = trtcm_policer_qos_queue_get_stats,
>> +    .qos_queue_dump_state_init = trtcm_policer_qos_queue_dump_state_init
>> +};
>> +
>>   static int
>>   netdev_dpdk_reconfigure(struct netdev *netdev)
>>   {
>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
>> index 0ec726c..c43cb1a 100644
>> --- a/vswitchd/vswitch.xml
>> +++ b/vswitchd/vswitch.xml
>> @@ -4441,6 +4441,19 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 
>> type=patch options:peer=p1 \
>>             in performance will be noticed in terms of overall 
>> aggregate traffic
>>             throughput.
>>           </dd>
>> +        <dt><code>trtcm-policer</code></dt>
>> +        <dd>
>> +          A DPDK egress policer algorithm using RFC 4115's Two-Rate,
>> +          Three-Color marker. It's a two-level hierarchical policer
>> +          which first does a color-blind marking of the traffic at 
>> the queue
>> +          level, followed by a color-aware marking at the port level. 
>> At the
>> +          end traffic marked as Green or Yellow is forwarded, Red is 
>> dropped.
>> +          For details on how traffic is marked, see RFC 4115.
>> +
>> +          If the ``default queue'', 0, is not configured it's 
>> automatically
>> +          created with the same <code>other_config</code> values as the
>> +          physical port.
>> +        </dd>
>>         </dl>
>>       </column>
>> @@ -4508,6 +4521,27 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 
>> type=patch options:peer=p1 \
>>           bytes/tokens of the packet. If there are not enough tokens 
>> in the cbs
>>           bucket the packet will be dropped.
>>         </column>
>> +      <column name="other_config" key="eir" type='{"type": "integer"}'>
>> +        The Excess Information Rate (EIR) is measured in bytes of IP
>> +        packets per second, i.e. it includes the IP header, but not link
>> +        specific (e.g. Ethernet) headers. This represents the bytes 
>> per second
>> +        rate at which the token bucket will be updated. The eir value is
>> +        calculated by (pps x packet data size).  For example assuming 
>> a user
>> +        wishes to limit a stream consisting of 64 byte packets to 1 
>> million
>> +        packets per second the EIR would be set to to to 46000000. 
>> This value
>> +        can be broken into '1,000,000 x 46'. Where 1,000,000 is the 
>> policing
>> +        rate for the number of packets per second and 46 represents 
>> the size
>> +        of the packet data for a 64 byte ip packet.
>> +      </column>
>> +      <column name="other_config" key="ebs" type='{"type": "integer"}'>
>> +        The Excess Burst Size (EBS) is measured in bytes and 
>> represents a
>> +        token bucket. At a minimum this value should be be set to the 
>> expected
>> +        largest size packet in the traffic stream. In practice larger 
>> values
>> +        may be used to increase the size of the token bucket. If a 
>> packet can
>> +        be transmitted then the ebs will be decremented by the number of
>> +        bytes/tokens of the packet. If there are not enough tokens in 
>> the cbs
>> +        bucket the packet might be dropped.
>> +      </column>
>>       </group>
>>       <group title="Configuration for linux-sfq">
>>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Eelco Chaudron Jan. 14, 2020, 11:25 a.m. UTC | #4
On 14 Jan 2020, at 12:23, Stokes, Ian wrote:

> On 1/13/2020 8:32 PM, Stokes, Ian wrote:
>>
>>
>> On 1/13/2020 3:56 PM, Eelco Chaudron wrote:
>>> This patch adds a new policer to the DPDK datapath based on RFC 
>>> 4115's
>>> Two-Rate, Three-Color marker. It's a two-level hierarchical policer
>>> which first does a color-blind marking of the traffic at the queue
>>> level, followed by a color-aware marking at the port level. At the 
>>> end
>>> traffic marked as Green or Yellow is forwarded, Red is dropped. For
>>> details on how traffic is marked, see RFC 4115.
>>>
>>> This egress policer can be used to limit traffic at different rated
>>> based on the queues the traffic is in. In addition, it can also be 
>>> used
>>> to prioritize certain traffic over others at a port level.
>>>
>>> For example, the following configuration will limit the traffic rate 
>>> at a
>>> port level to a maximum of 2000 packets a second (64 bytes IPv4 
>>> packets).
>>> 100pps as CIR (Committed Information Rate) and 1000pps as EIR 
>>> (Excess
>>> Information Rate). High priority traffic is routed to queue 10, 
>>> which marks
>>> all traffic as CIR, i.e. Green. All low priority traffic, queue 20, 
>>> is
>>> marked as EIR, i.e. Yellow.
>>>
>>> ovs-vsctl --timeout=5 set port dpdk1 qos=@myqos -- \
>>>    --id=@myqos create qos type=trtcm-policer \
>>>    other-config:cir=52000 other-config:cbs=2048 \
>>>    other-config:eir=52000 other-config:ebs=2048  \
>>>    queues:10=@dpdk1Q10 queues:20=@dpdk1Q20 -- \
>>>    --id=@dpdk1Q10 create queue \
>>>      other-config:cir=41600000 other-config:cbs=2048 \
>>>      other-config:eir=0 other-config:ebs=0 -- \
>>>    --id=@dpdk1Q20 create queue \
>>>      other-config:cir=0 other-config:cbs=0 \
>>>      other-config:eir=41600000 other-config:ebs=2048 \
>>>
>>> This configuration accomplishes that the high priority traffic has a
>>> guaranteed bandwidth egressing the ports at CIR (1000pps), but it 
>>> can also
>>> use the EIR, so a total of 2000pps at max. These additional 1000pps 
>>> is
>>> shared with the low priority traffic. The low priority traffic can 
>>> use at
>>> maximum 1000pps.
>>
>> Thanks for the patch Eelco, LGTM, I just want to finish a little more 
>> testing but I intend to merge this tomorrow to master if nothing else 
>> comes up/is raised.
>>
>
> Hi Eelco, I'm seeing a crash in OVS while running this with just a 
> port and a default queue 0 (phy to phy setup). It seems related to the 
> call to rte_meter_trtcm_rfc4115_color_blind_check. I've provided more 
> detail below in the trtcm_policer_run_single_packet function, just 
> wondering if you've come across it?

Looks like something that was fixed before in DPDK, I’ll take a look a 
bit later…

//Eelco

> Heres the output for the qos configuration I'm using
>
> -bash-4.4$ovs-appctl -t ovs-vswitchd qos/show dpdk1
> QoS: dpdk1 trtcm-policer
> eir: 52000
> cbs: 2048
> ebs: 2048
> cir: 52000
>
> Default:
>   eir: 52000
>   cbs: 2048
>   ebs: 2048
>   cir: 52000
>   tx_packets: 672150
>   tx_bytes: 30918900
>   tx_errors: 489562233
>
> I'll try to investigate further with DPDK and GDB also.
>
>> Regards
>> Ian
>>
>>>
>>> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
>>> ---
>>>   Documentation/topics/dpdk/qos.rst |   43 ++++
>>>   lib/netdev-dpdk.c                 |  358 
>>> +++++++++++++++++++++++++++++++++++--
>>>   vswitchd/vswitch.xml              |   34 ++++
>>>   3 files changed, 420 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/Documentation/topics/dpdk/qos.rst 
>>> b/Documentation/topics/dpdk/qos.rst
>>> index c0aec5d..1034954 100644
>>> --- a/Documentation/topics/dpdk/qos.rst
>>> +++ b/Documentation/topics/dpdk/qos.rst
>>> @@ -33,6 +33,9 @@ datapath. These are referred to as *QoS* and *Rate 
>>> Limiting*, respectively.
>>>   QoS (Egress Policing)
>>>   ---------------------
>>> +Single Queue Policer
>>> +~~~~~~~~~~~~~~~~~~~~
>>> +
>>>   Assuming you have a :doc:`vhost-user port <vhost-user>` 
>>> transmitting traffic
>>>   consisting of packets of size 64 bytes, the following command 
>>> would limit the
>>>   egress transmission rate of the port to ~1,000,000 packets per 
>>> second::
>>> @@ -49,6 +52,46 @@ To clear the QoS configuration from the port and 
>>> ovsdb, run::
>>>       $ ovs-vsctl destroy QoS vhost-user0 -- clear Port 
>>> vhost-user0 qos
>>> +
>>> +Multi Queue Policer
>>> +~~~~~~~~~~~~~~~~~~~
>>> +
>>> +In addition to the egress-policer OVS-DPDK also has support for a 
>>> RFC
>>> +4115's Two-Rate, Three-Color marker meter. It's a two-level 
>>> hierarchical
>>> +policer which first does a color-blind marking of the traffic at 
>>> the queue
>>> +level, followed by a color-aware marking at the port level. At the 
>>> end
>>> +traffic marked as Green or Yellow is forwarded, Red is dropped. For
>>> +details on how traffic is marked, see RFC 4115.
>>> +
>>> +This egress policer can be used to limit traffic at different rated
>>> +based on the queues the traffic is in. In addition, it can also be 
>>> used
>>> +to prioritize certain traffic over others at a port level.
>>> +
>>> +For example, the following configuration will limit the traffic 
>>> rate at a
>>> +port level to a maximum of 2000 packets a second (64 bytes IPv4 
>>> packets).
>>> +100pps as CIR (Committed Information Rate) and 1000pps as EIR 
>>> (Excess
>>> +Information Rate). High priority traffic is routed to queue 10, 
>>> which marks
>>> +all traffic as CIR, i.e. Green. All low priority traffic, queue 20, 
>>> is
>>> +marked as EIR, i.e. Yellow::
>>> +
>>> +    $ ovs-vsctl --timeout=5 set port dpdk1 qos=@myqos -- \
>>> +        --id=@myqos create qos type=trtcm-policer \
>>> +        other-config:cir=52000 other-config:cbs=2048 \
>>> +        other-config:eir=52000 other-config:ebs=2048  \
>>> +        queues:10=@dpdk1Q10 queues:20=@dpdk1Q20 -- \
>>> +         --id=@dpdk1Q10 create queue \
>>> +          other-config:cir=41600000 other-config:cbs=2048 
>>> \
>>> +          other-config:eir=0 other-config:ebs=0 -- \
>>> +         --id=@dpdk1Q20 create queue \
>>> +           other-config:cir=0 other-config:cbs=0 \
>>> +           other-config:eir=41600000 
>>> other-config:ebs=2048 \
>>> +
>>> +This configuration accomplishes that the high priority traffic has 
>>> a
>>> +guaranteed bandwidth egressing the ports at CIR (1000pps), but it 
>>> can also
>>> +use the EIR, so a total of 2000pps at max. These additional 1000pps 
>>> is
>>> +shared with the low priority traffic. The low priority traffic can 
>>> use at
>>> +maximum 1000pps.
>>> +
>>>   Refer to ``vswitch.xml`` for more details on egress policer.
>>>   Rate Limiting (Ingress Policing)
>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>>> index 128963f..3f164c8 100644
>>> --- a/lib/netdev-dpdk.c
>>> +++ b/lib/netdev-dpdk.c
>>> @@ -26,7 +26,10 @@
>>>   #include <sys/socket.h>
>>>   #include <linux/if.h>
>>> +/* The below is needed as rte_meter.h gets included trough 
>>> rte_bus_pci.h. */
>>> +#define ALLOW_EXPERIMENTAL_API
>>>   #include <rte_bus_pci.h>
>>> +#undef ALLOW_EXPERIMENTAL_API
>>>   #include <rte_config.h>
>>>   #include <rte_cycles.h>
>>>   #include <rte_errno.h>
>>> @@ -35,7 +38,9 @@
>>>   #include <rte_flow.h>
>>>   #include <rte_malloc.h>
>>>   #include <rte_mbuf.h>
>>> +#define ALLOW_EXPERIMENTAL_API
>>>   #include <rte_meter.h>
>>> +#undef ALLOW_EXPERIMENTAL_API
>>>   #include <rte_pci.h>
>>>   #include <rte_version.h>
>>>   #include <rte_vhost.h>
>>> @@ -329,8 +334,9 @@ struct dpdk_qos_ops {
>>>                                        
>>> struct netdev_dpdk_queue_state *state);
>>>   };
>>> -/* dpdk_qos_ops for each type of user space QoS implementation */
>>> +/* dpdk_qos_ops for each type of user space QoS implementation. */
>>>   static const struct dpdk_qos_ops egress_policer_ops;
>>> +static const struct dpdk_qos_ops trtcm_policer_ops;
>>>   /*
>>>    * Array of dpdk_qos_ops, contains pointer to all supported QoS
>>> @@ -338,6 +344,7 @@ static const struct dpdk_qos_ops 
>>> egress_policer_ops;
>>>    */
>>>   static const struct dpdk_qos_ops *const qos_confs[] = {
>>>       &egress_policer_ops,
>>> +    &trtcm_policer_ops,
>>>       NULL
>>>   };
>>> @@ -2162,9 +2169,9 @@ netdev_dpdk_eth_tx_burst(struct netdev_dpdk 
>>> *dev, int qid,
>>>   }
>>>   static inline bool
>>> -netdev_dpdk_policer_pkt_handle(struct rte_meter_srtcm *meter,
>>> -                               struct 
>>> rte_meter_srtcm_profile *profile,
>>> -                               struct 
>>> rte_mbuf *pkt, uint64_t time)
>>> +netdev_dpdk_srtcm_policer_pkt_handle(struct rte_meter_srtcm *meter,
>>> +                                     
>>> struct rte_meter_srtcm_profile *profile,
>>> +                                     
>>> struct rte_mbuf *pkt, uint64_t time)
>>>   {
>>>       uint32_t pkt_len = rte_pktmbuf_pkt_len(pkt) - 
>>> sizeof(struct rte_ether_hdr);
>>> @@ -2173,10 +2180,10 @@ netdev_dpdk_policer_pkt_handle(struct 
>>> rte_meter_srtcm *meter,
>>>   }
>>>   static int
>>> -netdev_dpdk_policer_run(struct rte_meter_srtcm *meter,
>>> -                        struct 
>>> rte_meter_srtcm_profile *profile,
>>> -                        struct rte_mbuf 
>>> **pkts, int pkt_cnt,
>>> -                        bool should_steal)
>>> +srtcm_policer_run_single_packet(struct rte_meter_srtcm *meter,
>>> +                                
>>> struct rte_meter_srtcm_profile *profile,
>>> +                                
>>> struct rte_mbuf **pkts, int pkt_cnt,
>>> +                                bool 
>>> should_steal)
>>>   {
>>>       int i = 0;
>>>       int cnt = 0;
>>> @@ -2186,8 +2193,8 @@ netdev_dpdk_policer_run(struct rte_meter_srtcm 
>>> *meter,
>>>       for (i = 0; i < pkt_cnt; i++) {
>>>           pkt = pkts[i];
>>>           /* Handle current packet */
>>> -        if (netdev_dpdk_policer_pkt_handle(meter, profile,
>>> -                                           
>>> pkt, current_time)) {
>>> +        if (netdev_dpdk_srtcm_policer_pkt_handle(meter, 
>>> profile,
>>> +                                                 
>>> pkt, current_time)) {
>>>               if (cnt != i) {
>>>                   pkts[cnt] = pkt;
>>>               }
>>> @@ -2209,8 +2216,9 @@ ingress_policer_run(struct ingress_policer 
>>> *policer, struct rte_mbuf **pkts,
>>>       int cnt = 0;
>>>       rte_spinlock_lock(&policer->policer_lock);
>>> -    cnt = netdev_dpdk_policer_run(&policer->in_policer, 
>>> &policer->in_prof,
>>> -                                  
>>> pkts, pkt_cnt, should_steal);
>>> +    cnt = srtcm_policer_run_single_packet(&policer->in_policer,
>>> +                                          
>>> &policer->in_prof,
>>> +                                          
>>> pkts, pkt_cnt, should_steal);
>>>       rte_spinlock_unlock(&policer->policer_lock);
>>>       return cnt;
>>> @@ -4480,9 +4488,9 @@ egress_policer_run(struct qos_conf *conf, 
>>> struct rte_mbuf **pkts, int pkt_cnt,
>>>       struct egress_policer *policer =
>>>           CONTAINER_OF(conf, struct egress_policer, 
>>> qos_conf);
>>> -    cnt = netdev_dpdk_policer_run(&policer->egress_meter,
>>> -                                  
>>> &policer->egress_prof, pkts,
>>> -                                  
>>> pkt_cnt, should_steal);
>>> +    cnt = 
>>> srtcm_policer_run_single_packet(&policer->egress_meter,
>>> +                                          
>>> &policer->egress_prof, pkts,
>>> +                                          
>>> pkt_cnt, should_steal);
>>>       return cnt;
>>>   }
>>> @@ -4496,6 +4504,326 @@ static const struct dpdk_qos_ops 
>>> egress_policer_ops = {
>>>       .qos_run = egress_policer_run
>>>   };
>>> +/* trtcm-policer details */
>>> +
>>> +struct trtcm_policer {
>>> +    struct qos_conf qos_conf;
>>> +    struct rte_meter_trtcm_rfc4115_params meter_params;
>>> +    struct rte_meter_trtcm_rfc4115_profile meter_profile;
>>> +    struct rte_meter_trtcm_rfc4115 meter;
>>> +    struct netdev_queue_stats stats;
>>> +    struct hmap queues;
>>> +};
>>> +
>>> +struct trtcm_policer_queue {
>>> +    struct hmap_node hmap_node;
>>> +    uint32_t queue_id;
>>> +    struct rte_meter_trtcm_rfc4115_params meter_params;
>>> +    struct rte_meter_trtcm_rfc4115_profile meter_profile;
>>> +    struct rte_meter_trtcm_rfc4115 meter;
>>> +    struct netdev_queue_stats stats;
>>> +};
>>> +
>>> +static void
>>> +trtcm_policer_details_to_param(const struct smap *details,
>>> +                               struct 
>>> rte_meter_trtcm_rfc4115_params *params)
>>> +{
>>> +    memset(params, 0, sizeof *params);
>>> +    params->cir = smap_get_ullong(details, "cir", 0);
>>> +    params->eir = smap_get_ullong(details, "eir", 0);
>>> +    params->cbs = smap_get_ullong(details, "cbs", 0);
>>> +    params->ebs = smap_get_ullong(details, "ebs", 0);
>>> +}
>>> +
>>> +static void
>>> +trtcm_policer_param_to_detail(
>>> +    const struct rte_meter_trtcm_rfc4115_params *params,
>>> +    struct smap *details)
>>> +{
>>> +    smap_add_format(details, "cir", "%"PRIu64, params->cir);
>>> +    smap_add_format(details, "eir", "%"PRIu64, params->eir);
>>> +    smap_add_format(details, "cbs", "%"PRIu64, params->cbs);
>>> +    smap_add_format(details, "ebs", "%"PRIu64, params->ebs);
>>> +}
>>> +
>>> +
>>> +static int
>>> +trtcm_policer_qos_construct(const struct smap *details,
>>> +                            struct 
>>> qos_conf **conf)
>>> +{
>>> +    struct trtcm_policer *policer;
>>> +    int err = 0;
>>> +
>>> +    policer = xmalloc(sizeof *policer);
>>> +    qos_conf_init(&policer->qos_conf, &trtcm_policer_ops);
>>> +    trtcm_policer_details_to_param(details, 
>>> &policer->meter_params);
>>> +    err = 
>>> rte_meter_trtcm_rfc4115_profile_config(&policer->meter_profile,
>>> +                                                 
>>> &policer->meter_params);
>>> +    if (!err) {
>>> +        err = 
>>> rte_meter_trtcm_rfc4115_config(&policer->meter,
>>> +                                             
>>> &policer->meter_profile);
>>> +    }
>>> +
>>> +    if (!err) {
>>> +        *conf = &policer->qos_conf;
>>> +        memset(&policer->stats, 0, sizeof policer->stats);
>>> +        hmap_init(&policer->queues);
>>> +    } else {
>>> +        free(policer);
>>> +        *conf = NULL;
>>> +        err = -err;
>>> +    }
>>> +
>>> +    return err;
>>> +}
>>> +
>>> +static void
>>> +trtcm_policer_qos_destruct(struct qos_conf *conf)
>>> +{
>>> +    struct trtcm_policer_queue *queue, *next_queue;
>>> +    struct trtcm_policer *policer = CONTAINER_OF(conf, struct 
>>> trtcm_policer,
>>> +                                                 
>>> qos_conf);
>>> +
>>> +    HMAP_FOR_EACH_SAFE (queue, next_queue, hmap_node, 
>>> &policer->queues) {
>>> +        hmap_remove(&policer->queues, &queue->hmap_node);
>>> +        free(queue);
>>> +    }
>>> +    hmap_destroy(&policer->queues);
>>> +    free(policer);
>>> +}
>>> +
>>> +static int
>>> +trtcm_policer_qos_get(const struct qos_conf *conf, struct smap 
>>> *details)
>>> +{
>>> +    struct trtcm_policer *policer = CONTAINER_OF(conf, struct 
>>> trtcm_policer,
>>> +                                                 
>>> qos_conf);
>>> +
>>> +    trtcm_policer_param_to_detail(&policer->meter_params, 
>>> details);
>>> +    return 0;
>>> +}
>>> +
>>> +static bool
>>> +trtcm_policer_qos_is_equal(const struct qos_conf *conf,
>>> +                           const struct 
>>> smap *details)
>>> +{
>>> +    struct trtcm_policer *policer = CONTAINER_OF(conf, struct 
>>> trtcm_policer,
>>> +                                                 
>>> qos_conf);
>>> +    struct rte_meter_trtcm_rfc4115_params params;
>>> +
>>> +    trtcm_policer_details_to_param(details, &params);
>>> +
>>> +    return !memcmp(&params, &policer->meter_params, sizeof 
>>> params);
>>> +}
>>> +
>>> +static struct trtcm_policer_queue *
>>> +trtcm_policer_qos_find_queue(struct trtcm_policer *policer, 
>>> uint32_t queue_id)
>>> +{
>>> +    struct trtcm_policer_queue *queue;
>>> +    HMAP_FOR_EACH_WITH_HASH (queue, hmap_node, 
>>> hash_2words(queue_id, 0),
>>> +                             
>>> &policer->queues) {
>>> +        if (queue->queue_id == queue_id) {
>>> +            return queue;
>>> +        }
>>> +    }
>>> +    return NULL;
>>> +}
>>> +
>>> +static inline bool
>>> +trtcm_policer_run_single_packet(struct trtcm_policer *policer,
>>> +                                
>>> struct rte_mbuf *pkt, uint64_t time)
>>> +{
>>> +    enum rte_color pkt_color;
>>> +    struct trtcm_policer_queue *queue;
>>> +    uint32_t pkt_len = rte_pktmbuf_pkt_len(pkt) - sizeof(struct 
>>> rte_ether_hdr);
>>> +    struct dp_packet *dpkt = CONTAINER_OF(pkt, struct dp_packet, 
>>> mbuf);
>>> +
>>> +    queue = trtcm_policer_qos_find_queue(policer, 
>>> dpkt->md.skb_priority);
>>> +    if (!queue) {
>>> +        /* If no queue is found, use the default queue, 
>>> which MUST exist. */
>>> +        queue = trtcm_policer_qos_find_queue(policer, 0);
>>> +        if (!queue) {
>>> +            return false;
>>> +        }
>>> +    }
>>> +
>>> +    pkt_color = 
>>> rte_meter_trtcm_rfc4115_color_blind_check(&queue->meter,
>
> A few times during testing I have seen OVS crash with the following
>
> ./launch_vswitch.sh: line 66: 11694 Floating point exception sudo 
> $OVS_DIR/vswitchd/ovs-vswitchd unix:$DB_SOCK --pidfile
>
> Looking into it with GDB it sems related to the 
> rte_meter_trtcm_rfc4115_color_blind_check above. See GDB output below.
>
> Thread 12 "pmd-c03/id:9" received signal SIGFPE, Arithmetic exception.
> [Switching to Thread 0x7f3dce734700 (LWP 26465)]
> 0x0000000000d4b92d in rte_meter_trtcm_rfc4115_color_blind_check 
> (m=0x328e178, p=0x328e148, time=29107058565136113, pkt_len=46) at 
> /opt/istokes/dpdk-19.11//x86_64-native-linuxapp-gcc/include/rte_meter.h:599
> 599             n_periods_te = time_diff_te / p->eir_period;
> (gdb) bt
> #0  0x0000000000d4b92d in rte_meter_trtcm_rfc4115_color_blind_check 
> (m=0x328e178, p=0x328e148, time=29107058565136113, pkt_len=46) at 
> /opt/istokes/dpdk-19.11//x86_64-native-linuxapp-gcc/include/rte_meter.h:599
> #1  0x0000000000d4abc1 in trtcm_policer_run_single_packet 
> (policer=0x2774200, pkt=0x1508fbd40, time=29107058565136113) at 
> lib/netdev-dpdk.c:4649
> #2  0x0000000000d4ad24 in trtcm_policer_run (conf=0x2774200, 
> pkts=0x7f3db8005100, pkt_cnt=32, should_steal=true) at 
> lib/netdev-dpdk.c:4691
> #3  0x0000000000d45299 in netdev_dpdk_qos_run (dev=0x17fd68840, 
> pkts=0x7f3db8005100, cnt=32, should_steal=true) at 
> lib/netdev-dpdk.c:2421
> #4  0x0000000000d45db0 in netdev_dpdk_send__ (dev=0x17fd68840, qid=1, 
> batch=0x7f3db80050f0, concurrent_txq=false) at lib/netdev-dpdk.c:2683
> #5  0x0000000000d45ee9 in netdev_dpdk_eth_send (netdev=0x17fd688c0, 
> qid=1, batch=0x7f3db80050f0, concurrent_txq=false) at 
> lib/netdev-dpdk.c:2710
> #6  0x0000000000c342ba in netdev_send (netdev=0x17fd688c0, qid=1, 
> batch=0x7f3db80050f0, concurrent_txq=false) at lib/netdev.c:814
> #7  0x0000000000beb3de in dp_netdev_pmd_flush_output_on_port 
> (pmd=0x7f3dce735010, p=0x7f3db80050c0) at lib/dpif-netdev.c:4224
> #8  0x0000000000beb5c4 in dp_netdev_pmd_flush_output_packets 
> (pmd=0x7f3dce735010, force=false) at lib/dpif-netdev.c:4264
> #9  0x0000000000beb814 in dp_netdev_process_rxq_port 
> (pmd=0x7f3dce735010, rxq=0x328d930, port_no=2) at 
> lib/dpif-netdev.c:4319
> #10 0x0000000000bef432 in pmd_thread_main (f_=0x7f3dce735010) at 
> lib/dpif-netdev.c:5556
> #11 0x0000000000cb24e5 in ovsthread_wrapper (aux_=0x326d220) at 
> lib/ovs-thread.c:383
> #12 0x00007f3de11d236d in start_thread (arg=0x7f3dce734700) at 
> pthread_create.c:456
> #13 0x00007f3de06bab4f in clone () at 
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
>
> Regards
> Ian
>>> +                                                          
>>> &queue->meter_profile,
>>> +                                                          
>>> time,
>>> +                                                          
>>> pkt_len);
>>> +
>>> +    if (pkt_color == RTE_COLOR_RED) {
>>> +        queue->stats.tx_errors++;
>>> +    } else {
>>> +        queue->stats.tx_bytes += pkt_len;
>>> +        queue->stats.tx_packets++;
>>> +    }
>>> +
>>> +    pkt_color = 
>>> rte_meter_trtcm_rfc4115_color_aware_check(&policer->meter,
>>> +                                                     
>>> &policer->meter_profile,
>>> +                                                     
>>> time, pkt_len,
>>> +                                                     
>>> pkt_color);
>>> +
>>> +    if (pkt_color == RTE_COLOR_RED) {
>>> +        policer->stats.tx_errors++;
>>> +        return false;
>>> +    }
>>> +
>>> +    policer->stats.tx_bytes += pkt_len;
>>> +    policer->stats.tx_packets++;
>>> +    return true;
>>> +}
>>> +
>>> +static int
>>> +trtcm_policer_run(struct qos_conf *conf, struct rte_mbuf **pkts, 
>>> int pkt_cnt,
>>> +                  bool should_steal)
>>> +{
>>> +    int i = 0;
>>> +    int cnt = 0;
>>> +    struct rte_mbuf *pkt = NULL;
>>> +    uint64_t current_time = rte_rdtsc();
>>> +
>>> +    struct trtcm_policer *policer = CONTAINER_OF(conf, struct 
>>> trtcm_policer,
>>> +                                                 
>>> qos_conf);
>>> +
>>> +    for (i = 0; i < pkt_cnt; i++) {
>>> +        pkt = pkts[i];
>>> +
>>> +        if (trtcm_policer_run_single_packet(policer, pkt, 
>>> current_time)) {
>>> +            if (cnt != i) {
>>> +                pkts[cnt] = pkt;
>>> +            }
>>> +            cnt++;
>>> +        } else {
>>> +            if (should_steal) {
>>> +                rte_pktmbuf_free(pkt);
>>> +            }
>>> +        }
>>> +    }
>>> +    return cnt;
>>> +}
>>> +
>>> +static int
>>> +trtcm_policer_qos_queue_construct(const struct smap *details,
>>> +                                  
>>> uint32_t queue_id, struct qos_conf *conf)
>>> +{
>>> +    int err = 0;
>>> +    struct trtcm_policer_queue *queue;
>>> +    struct trtcm_policer *policer = CONTAINER_OF(conf, struct 
>>> trtcm_policer,
>>> +                                                 
>>> qos_conf);
>>> +
>>> +    queue = trtcm_policer_qos_find_queue(policer, queue_id);
>>> +    if (!queue) {
>>> +        queue = xmalloc(sizeof *queue);
>>> +        queue->queue_id = queue_id;
>>> +        memset(&queue->stats, 0, sizeof queue->stats);
>>> +        queue->stats.created = time_msec();
>>> +        hmap_insert(&policer->queues, &queue->hmap_node,
>>> +                    hash_2words(queue_id, 0));
>>> +    }
>>> +    if (queue_id == 0 && smap_is_empty(details)) {
>>> +        /* No default queue configured, use port values */
>>> +        memcpy(&queue->meter_params, &policer->meter_params,
>>> +               sizeof queue->meter_params);
>>> +    } else {
>>> +        trtcm_policer_details_to_param(details, 
>>> &queue->meter_params);
>>> +    }
>>> +
>>> +    err = 
>>> rte_meter_trtcm_rfc4115_profile_config(&queue->meter_profile,
>>> +                                                 
>>> &queue->meter_params);
>>> +
>>> +    if (!err) {
>>> +        err = rte_meter_trtcm_rfc4115_config(&queue->meter,
>>> +                                             
>>> &queue->meter_profile);
>>> +    }
>>> +    if (err) {
>>> +        hmap_remove(&policer->queues, &queue->hmap_node);
>>> +        free(queue);
>>> +        err = -err;
>>> +    }
>>> +    return err;
>>> +}
>>> +
>>> +static void
>>> +trtcm_policer_qos_queue_destruct(struct qos_conf *conf, uint32_t 
>>> queue_id)
>>> +{
>>> +    struct trtcm_policer_queue *queue;
>>> +    struct trtcm_policer *policer = CONTAINER_OF(conf, struct 
>>> trtcm_policer,
>>> +                                                 
>>> qos_conf);
>>> +
>>> +    queue = trtcm_policer_qos_find_queue(policer, queue_id);
>>> +    if (queue) {
>>> +        hmap_remove(&policer->queues, &queue->hmap_node);
>>> +        free(queue);
>>> +    }
>>> +}
>>> +
>>> +static int
>>> +trtcm_policer_qos_queue_get(struct smap *details, uint32_t 
>>> queue_id,
>>> +                            const struct 
>>> qos_conf *conf)
>>> +{
>>> +    struct trtcm_policer_queue *queue;
>>> +    struct trtcm_policer *policer = CONTAINER_OF(conf, struct 
>>> trtcm_policer,
>>> +                                                 
>>> qos_conf);
>>> +
>>> +    queue = trtcm_policer_qos_find_queue(policer, queue_id);
>>> +    if (!queue) {
>>> +        return EINVAL;
>>> +    }
>>> +
>>> +    trtcm_policer_param_to_detail(&queue->meter_params, 
>>> details);
>>> +    return 0;
>>> +}
>>> +
>>> +static int
>>> +trtcm_policer_qos_queue_get_stats(const struct qos_conf *conf,
>>> +                                  
>>> uint32_t queue_id,
>>> +                                  
>>> struct netdev_queue_stats *stats)
>>> +{
>>> +    struct trtcm_policer_queue *queue;
>>> +    struct trtcm_policer *policer = CONTAINER_OF(conf, struct 
>>> trtcm_policer,
>>> +                                                 
>>> qos_conf);
>>> +
>>> +    queue = trtcm_policer_qos_find_queue(policer, queue_id);
>>> +    if (!queue) {
>>> +        return EINVAL;
>>> +    }
>>> +    memcpy(stats, &queue->stats, sizeof *stats);
>>> +    return 0;
>>> +}
>>> +
>>> +static int
>>> +trtcm_policer_qos_queue_dump_state_init(const struct qos_conf 
>>> *conf,
>>> +                                        
>>> struct netdev_dpdk_queue_state *state)
>>> +{
>>> +    uint32_t i = 0;
>>> +    struct trtcm_policer_queue *queue;
>>> +    struct trtcm_policer *policer = CONTAINER_OF(conf, struct 
>>> trtcm_policer,
>>> +                                                 
>>> qos_conf);
>>> +
>>> +    state->n_queues = hmap_count(&policer->queues);
>>> +    state->cur_queue = 0;
>>> +    state->queues = xmalloc(state->n_queues * sizeof 
>>> *state->queues);
>>> +
>>> +    HMAP_FOR_EACH (queue, hmap_node, &policer->queues) {
>>> +        state->queues[i++] = queue->queue_id;
>>> +    }
>>> +    return 0;
>>> +}
>>> +
>>> +static const struct dpdk_qos_ops trtcm_policer_ops = {
>>> +    .qos_name = "trtcm-policer",
>>> +    .qos_construct = trtcm_policer_qos_construct,
>>> +    .qos_destruct = trtcm_policer_qos_destruct,
>>> +    .qos_get = trtcm_policer_qos_get,
>>> +    .qos_is_equal = trtcm_policer_qos_is_equal,
>>> +    .qos_run = trtcm_policer_run,
>>> +    .qos_queue_construct = trtcm_policer_qos_queue_construct,
>>> +    .qos_queue_destruct = trtcm_policer_qos_queue_destruct,
>>> +    .qos_queue_get = trtcm_policer_qos_queue_get,
>>> +    .qos_queue_get_stats = trtcm_policer_qos_queue_get_stats,
>>> +    .qos_queue_dump_state_init = 
>>> trtcm_policer_qos_queue_dump_state_init
>>> +};
>>> +
>>>   static int
>>>   netdev_dpdk_reconfigure(struct netdev *netdev)
>>>   {
>>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
>>> index 0ec726c..c43cb1a 100644
>>> --- a/vswitchd/vswitch.xml
>>> +++ b/vswitchd/vswitch.xml
>>> @@ -4441,6 +4441,19 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 
>>> type=patch options:peer=p1 \
>>>             in performance will be noticed in terms of 
>>> overall aggregate traffic
>>>             throughput.
>>>           </dd>
>>> +        <dt><code>trtcm-policer</code></dt>
>>> +        <dd>
>>> +          A DPDK egress policer algorithm using RFC 4115's 
>>> Two-Rate,
>>> +          Three-Color marker. It's a two-level 
>>> hierarchical policer
>>> +          which first does a color-blind marking of the 
>>> traffic at the queue
>>> +          level, followed by a color-aware marking at the 
>>> port level. At the
>>> +          end traffic marked as Green or Yellow is 
>>> forwarded, Red is dropped.
>>> +          For details on how traffic is marked, see RFC 
>>> 4115.
>>> +
>>> +          If the ``default queue'', 0, is not configured 
>>> it's automatically
>>> +          created with the same <code>other_config</code> 
>>> values as the
>>> +          physical port.
>>> +        </dd>
>>>         </dl>
>>>       </column>
>>> @@ -4508,6 +4521,27 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 
>>> type=patch options:peer=p1 \
>>>           bytes/tokens of the packet. If there are not 
>>> enough tokens in the cbs
>>>           bucket the packet will be dropped.
>>>         </column>
>>> +      <column name="other_config" key="eir" type='{"type": 
>>> "integer"}'>
>>> +        The Excess Information Rate (EIR) is measured in 
>>> bytes of IP
>>> +        packets per second, i.e. it includes the IP header, 
>>> but not link
>>> +        specific (e.g. Ethernet) headers. This represents 
>>> the bytes per second
>>> +        rate at which the token bucket will be updated. The 
>>> eir value is
>>> +        calculated by (pps x packet data size).  For 
>>> example assuming a user
>>> +        wishes to limit a stream consisting of 64 byte 
>>> packets to 1 million
>>> +        packets per second the EIR would be set to to to 
>>> 46000000. This value
>>> +        can be broken into '1,000,000 x 46'. Where 1,000,000 
>>> is the policing
>>> +        rate for the number of packets per second and 46 
>>> represents the size
>>> +        of the packet data for a 64 byte ip packet.
>>> +      </column>
>>> +      <column name="other_config" key="ebs" type='{"type": 
>>> "integer"}'>
>>> +        The Excess Burst Size (EBS) is measured in bytes and 
>>> represents a
>>> +        token bucket. At a minimum this value should be be 
>>> set to the expected
>>> +        largest size packet in the traffic stream. In 
>>> practice larger values
>>> +        may be used to increase the size of the token 
>>> bucket. If a packet can
>>> +        be transmitted then the ebs will be decremented by 
>>> the number of
>>> +        bytes/tokens of the packet. If there are not enough 
>>> tokens in the cbs
>>> +        bucket the packet might be dropped.
>>> +      </column>
>>>       </group>
>>>       <group title="Configuration for linux-sfq">
>>>
>> _______________________________________________
>> dev mailing list
>> dev@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Neil Horman Jan. 14, 2020, 11:57 a.m. UTC | #5
On Tue, Jan 14, 2020 at 09:35:33AM +0100, David Marchand wrote:
> Hey Eelco,
> 
> On Mon, Jan 13, 2020 at 4:57 PM Eelco Chaudron <echaudro@redhat.com> wrote:
> [snip]
> > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> > index 128963f..3f164c8 100644
> > --- a/lib/netdev-dpdk.c
> > +++ b/lib/netdev-dpdk.c
> > @@ -26,7 +26,10 @@
> >  #include <sys/socket.h>
> >  #include <linux/if.h>
> >
> > +/* The below is needed as rte_meter.h gets included trough rte_bus_pci.h. */
> > +#define ALLOW_EXPERIMENTAL_API
> >  #include <rte_bus_pci.h>
> > +#undef ALLOW_EXPERIMENTAL_API
> 
> __rte_experimental is getting defined at the first inclusion of
> rte_compat.h, so here rte_bus_pci.h is the first one.
> __rte_experimental ends up being a big "all or nothing" switch, so we
> don't need to #undef.
> 
> Cc: Neil if he has a better idea/comments.
> 
If I'm reading this properly,  I think the request here is to have finer grained
control over the use of experimental API's?

If so, What fine grained control are you looking for? Do you just want to not
get the warning about a symbol being experimental only for the API exported in
rte_bus_pci.h, but not others?  If thats the case, I would make the argument
that the right (or perhaps just the more prduent) thing to do would be to accept
that you're getting warnings during the build, and that they are expected for
those particular API calls.  If you want to record that fact, I would do so at
the call site, using whatever syntax covscan or other linters use to record the
fact that you expect the warning there.  Thats actual documentation, and saves
you the odd ifdeffery above, and below

Neil

> >  #include <rte_config.h>
> >  #include <rte_cycles.h>
> >  #include <rte_errno.h>
> > @@ -35,7 +38,9 @@
> >  #include <rte_flow.h>
> >  #include <rte_malloc.h>
> >  #include <rte_mbuf.h>
> > +#define ALLOW_EXPERIMENTAL_API
> >  #include <rte_meter.h>
> > +#undef ALLOW_EXPERIMENTAL_API
> >  #include <rte_pci.h>
> >  #include <rte_version.h>
> >  #include <rte_vhost.h>
> 
> 
> -- 
> David Marchand
> 
>
Eelco Chaudron Jan. 14, 2020, 2:13 p.m. UTC | #6
On 14 Jan 2020, at 12:23, Stokes, Ian wrote:

> On 1/13/2020 8:32 PM, Stokes, Ian wrote:

<SNIP>

> Hi Eelco, I'm seeing a crash in OVS while running this with just a 
> port and a default queue 0 (phy to phy setup). It seems related to the 
> call to rte_meter_trtcm_rfc4115_color_blind_check. I've provided more 
> detail below in the trtcm_policer_run_single_packet function, just 
> wondering if you've come across it?
>
> Heres the output for the qos configuration I'm using
>
> -bash-4.4$ovs-appctl -t ovs-vswitchd qos/show dpdk1
> QoS: dpdk1 trtcm-policer
> eir: 52000
> cbs: 2048
> ebs: 2048
> cir: 52000
>
> Default:
>   eir: 52000
>   cbs: 2048
>   ebs: 2048
>   cir: 52000
>   tx_packets: 672150
>   tx_bytes: 30918900
>   tx_errors: 489562233
>
> I'll try to investigate further with DPDK and GDB also.

I tried to replicate this, but I’m not able to do so. How did you 
test? Reconfiguring it and start, etc. etc.?

<SNIP>

>
> A few times during testing I have seen OVS crash with the following
>
> ./launch_vswitch.sh: line 66: 11694 Floating point exception sudo 
> $OVS_DIR/vswitchd/ovs-vswitchd unix:$DB_SOCK --pidfile
>
> Looking into it with GDB it sems related to the 
> rte_meter_trtcm_rfc4115_color_blind_check above. See GDB output below.
>
> Thread 12 "pmd-c03/id:9" received signal SIGFPE, Arithmetic exception.
> [Switching to Thread 0x7f3dce734700 (LWP 26465)]
> 0x0000000000d4b92d in rte_meter_trtcm_rfc4115_color_blind_check 
> (m=0x328e178, p=0x328e148, time=29107058565136113, pkt_len=46) at 
> /opt/istokes/dpdk-19.11//x86_64-native-linuxapp-gcc/include/rte_meter.h:599
> 599             n_periods_te = time_diff_te / p->eir_period;
> (gdb) bt
> #0  0x0000000000d4b92d in rte_meter_trtcm_rfc4115_color_blind_check 
> (m=0x328e178, p=0x328e148, time=29107058565136113, pkt_len=46) at 
> /opt/istokes/dpdk-19.11//x86_64-native-linuxapp-gcc/include/rte_meter.h:599
> #1  0x0000000000d4abc1 in trtcm_policer_run_single_packet 
> (policer=0x2774200, pkt=0x1508fbd40, time=29107058565136113) at 
> lib/netdev-dpdk.c:4649
> #2  0x0000000000d4ad24 in trtcm_policer_run (conf=0x2774200, 
> pkts=0x7f3db8005100, pkt_cnt=32, should_steal=true) at 
> lib/netdev-dpdk.c:4691
> #3  0x0000000000d45299 in netdev_dpdk_qos_run (dev=0x17fd68840, 
> pkts=0x7f3db8005100, cnt=32, should_steal=true) at 
> lib/netdev-dpdk.c:2421
> #4  0x0000000000d45db0 in netdev_dpdk_send__ (dev=0x17fd68840, qid=1, 
> batch=0x7f3db80050f0, concurrent_txq=false) at lib/netdev-dpdk.c:2683
> #5  0x0000000000d45ee9 in netdev_dpdk_eth_send (netdev=0x17fd688c0, 
> qid=1, batch=0x7f3db80050f0, concurrent_txq=false) at 
> lib/netdev-dpdk.c:2710
> #6  0x0000000000c342ba in netdev_send (netdev=0x17fd688c0, qid=1, 
> batch=0x7f3db80050f0, concurrent_txq=false) at lib/netdev.c:814
> #7  0x0000000000beb3de in dp_netdev_pmd_flush_output_on_port 
> (pmd=0x7f3dce735010, p=0x7f3db80050c0) at lib/dpif-netdev.c:4224
> #8  0x0000000000beb5c4 in dp_netdev_pmd_flush_output_packets 
> (pmd=0x7f3dce735010, force=false) at lib/dpif-netdev.c:4264
> #9  0x0000000000beb814 in dp_netdev_process_rxq_port 
> (pmd=0x7f3dce735010, rxq=0x328d930, port_no=2) at 
> lib/dpif-netdev.c:4319
> #10 0x0000000000bef432 in pmd_thread_main (f_=0x7f3dce735010) at 
> lib/dpif-netdev.c:5556
> #11 0x0000000000cb24e5 in ovsthread_wrapper (aux_=0x326d220) at 
> lib/ovs-thread.c:383
> #12 0x00007f3de11d236d in start_thread (arg=0x7f3dce734700) at 
> pthread_create.c:456
> #13 0x00007f3de06bab4f in clone () at 
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:97

Looks like a divide by zero, but that was fixed and seems to be in DPDK 
v19.11, ebe3a769911071450acb808153ec2a2496726906

So for some reason rte_meter_get_tb_params() might return 0 in 
eir_period. Looking at the code, I would say this could only really 
happen if rte_get_tsc_hz() returns 0, which seems odd… Could this 
happen in your system for some reason?

//Eelco


<SNIP>
Stokes, Ian Jan. 14, 2020, 3:21 p.m. UTC | #7
On 1/14/2020 2:13 PM, Eelco Chaudron wrote:
> 
> 
> On 14 Jan 2020, at 12:23, Stokes, Ian wrote:
> 
>> On 1/13/2020 8:32 PM, Stokes, Ian wrote:
> 
> <SNIP>
> 
>> Hi Eelco, I'm seeing a crash in OVS while running this with just a 
>> port and a default queue 0 (phy to phy setup). It seems related to the 
>> call to rte_meter_trtcm_rfc4115_color_blind_check. I've provided more 
>> detail below in the trtcm_policer_run_single_packet function, just 
>> wondering if you've come across it?
>>
>> Heres the output for the qos configuration I'm using
>>
>> -bash-4.4$ovs-appctl -t ovs-vswitchd qos/show dpdk1
>> QoS: dpdk1 trtcm-policer
>> eir: 52000
>> cbs: 2048
>> ebs: 2048
>> cir: 52000
>>
>> Default:
>>   eir: 52000
>>   cbs: 2048
>>   ebs: 2048
>>   cir: 52000
>>   tx_packets: 672150
>>   tx_bytes: 30918900
>>   tx_errors: 489562233
>>
>> I'll try to investigate further with DPDK and GDB also.
> 
> I tried to replicate this, but I’m not able to do so. How did you test? 
> Reconfiguring it and start, etc. etc.?

Starting a fresh instance of OVS (cleared previous OVSDB etc.).

dpdk-socket-mem="1024,0"
dpdk-lcore-mask="0x2"
pmd-cpu-mask="0xC"

2 phy ports only, 1 rxq per phy port.

Flow rules are basic (in port 1 out port 2)

Traffic profile is IPv4 UDP 64 byte packets at line rate (10G)

QoS Setup with the following

sudo $OVS_DIR/utilities/ovs-vsctl --timeout=5 set port dpdk1 qos=@myqos -- \
--id=@myqos create qos type=trtcm-policer \
other-config:cir=52000 other-config:cbs=2048 \
other-config:eir=52000 other-config:ebs=2048

 From there it's a case of leaving traffic run (between 10 to 15 mins) 
before the segfault occurs.


> 
> <SNIP>
> 
>>
>> A few times during testing I have seen OVS crash with the following
>>
>> ./launch_vswitch.sh: line 66: 11694 Floating point exception sudo 
>> $OVS_DIR/vswitchd/ovs-vswitchd unix:$DB_SOCK --pidfile
>>
>> Looking into it with GDB it sems related to the 
>> rte_meter_trtcm_rfc4115_color_blind_check above. See GDB output below.
>>
>> Thread 12 "pmd-c03/id:9" received signal SIGFPE, Arithmetic exception.
>> [Switching to Thread 0x7f3dce734700 (LWP 26465)]
>> 0x0000000000d4b92d in rte_meter_trtcm_rfc4115_color_blind_check 
>> (m=0x328e178, p=0x328e148, time=29107058565136113, pkt_len=46) at 
>> /opt/istokes/dpdk-19.11//x86_64-native-linuxapp-gcc/include/rte_meter.h:599 
>>
>> 599             n_periods_te = time_diff_te / p->eir_period;
>> (gdb) bt
>> #0  0x0000000000d4b92d in rte_meter_trtcm_rfc4115_color_blind_check 
>> (m=0x328e178, p=0x328e148, time=29107058565136113, pkt_len=46) at 
>> /opt/istokes/dpdk-19.11//x86_64-native-linuxapp-gcc/include/rte_meter.h:599 
>>
>> #1  0x0000000000d4abc1 in trtcm_policer_run_single_packet 
>> (policer=0x2774200, pkt=0x1508fbd40, time=29107058565136113) at 
>> lib/netdev-dpdk.c:4649
>> #2  0x0000000000d4ad24 in trtcm_policer_run (conf=0x2774200, 
>> pkts=0x7f3db8005100, pkt_cnt=32, should_steal=true) at 
>> lib/netdev-dpdk.c:4691
>> #3  0x0000000000d45299 in netdev_dpdk_qos_run (dev=0x17fd68840, 
>> pkts=0x7f3db8005100, cnt=32, should_steal=true) at lib/netdev-dpdk.c:2421
>> #4  0x0000000000d45db0 in netdev_dpdk_send__ (dev=0x17fd68840, qid=1, 
>> batch=0x7f3db80050f0, concurrent_txq=false) at lib/netdev-dpdk.c:2683
>> #5  0x0000000000d45ee9 in netdev_dpdk_eth_send (netdev=0x17fd688c0, 
>> qid=1, batch=0x7f3db80050f0, concurrent_txq=false) at 
>> lib/netdev-dpdk.c:2710
>> #6  0x0000000000c342ba in netdev_send (netdev=0x17fd688c0, qid=1, 
>> batch=0x7f3db80050f0, concurrent_txq=false) at lib/netdev.c:814
>> #7  0x0000000000beb3de in dp_netdev_pmd_flush_output_on_port 
>> (pmd=0x7f3dce735010, p=0x7f3db80050c0) at lib/dpif-netdev.c:4224
>> #8  0x0000000000beb5c4 in dp_netdev_pmd_flush_output_packets 
>> (pmd=0x7f3dce735010, force=false) at lib/dpif-netdev.c:4264
>> #9  0x0000000000beb814 in dp_netdev_process_rxq_port 
>> (pmd=0x7f3dce735010, rxq=0x328d930, port_no=2) at lib/dpif-netdev.c:4319
>> #10 0x0000000000bef432 in pmd_thread_main (f_=0x7f3dce735010) at 
>> lib/dpif-netdev.c:5556
>> #11 0x0000000000cb24e5 in ovsthread_wrapper (aux_=0x326d220) at 
>> lib/ovs-thread.c:383
>> #12 0x00007f3de11d236d in start_thread (arg=0x7f3dce734700) at 
>> pthread_create.c:456
>> #13 0x00007f3de06bab4f in clone () at 
>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
> 
> Looks like a divide by zero, but that was fixed and seems to be in DPDK 
> v19.11, ebe3a769911071450acb808153ec2a2496726906

I've confirmed I'm testing against 19.11.0 and that commit is present.


> 
> So for some reason rte_meter_get_tb_params() might return 0 in 
> eir_period. Looking at the code, I would say this could only really 
> happen if rte_get_tsc_hz() returns 0, which seems odd… Could this happen 
> in your system for some reason?

I don't think rte_get_tsc_hz is returning 0, at east the values for the 
function call in GDB don't seem to suggest this. Snippet below from 
rte_meter_trtcm_rfc4115_color_blind_check that I'm checking with GDB.

rte_meter_trtcm_rfc4115_color_blind_check(struct rte_meter_trtcm_rfc4115 
*m, struct rte_meter_trtcm_rfc4115_profile *p, uint64_t time, uint32_t 
pkt_len) 
 

{ 
 
 


     uint64_t time_diff_tc, time_diff_te, n_periods_tc, n_periods_te,
     tc, te;
 

 
 
 

     /* Bucket update */
     time_diff_tc = time - m->time_tc;
     time_diff_te = time - m->time_te;
 
 

     n_periods_tc = time_diff_tc / p->cir_period;
     n_periods_te = time_diff_te / p->eir_period;

Looking at values with GDB gives the following

(gdb) p *p
$1 = {cbs = 2048, ebs = 2048, cir_period = 44230, cir_bytes_per_period = 
1, eir_period = 44230, eir_bytes_per_period = 1}
(gdb) p time_diff_tc
$2 = 29137292739849292
(gdb) p n_periods_tc
$3 = 13937399
(gdb) p time
$4 = 29137292739849292
(gdb) p m->time_tc
$5 = 29137292739858117
(gdb) p *m
$6 = {time_tc = 29137292739858117, time_te = 29137292739858117, tc = 
2048, te = 2048}
(gdb) p time_diff_te
$7 = 29137292739849289
(gdb) p p->eir_period
$8 = 44230
(gdb) p n_periods_te
$9 = 140260075883992

I don't have another board to test on at the moment but will try.

Ian

> 
> //Eelco
> 
> 
> <SNIP>
>
Eelco Chaudron Jan. 14, 2020, 4:05 p.m. UTC | #8
On 14 Jan 2020, at 12:57, Neil Horman wrote:

> On Tue, Jan 14, 2020 at 09:35:33AM +0100, David Marchand wrote:
>> Hey Eelco,
>>
>> On Mon, Jan 13, 2020 at 4:57 PM Eelco Chaudron <echaudro@redhat.com> 
>> wrote:
>> [snip]
>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>>> index 128963f..3f164c8 100644
>>> --- a/lib/netdev-dpdk.c
>>> +++ b/lib/netdev-dpdk.c
>>> @@ -26,7 +26,10 @@
>>>  #include <sys/socket.h>
>>>  #include <linux/if.h>
>>>
>>> +/* The below is needed as rte_meter.h gets included trough 
>>> rte_bus_pci.h. */
>>> +#define ALLOW_EXPERIMENTAL_API
>>>  #include <rte_bus_pci.h>
>>> +#undef ALLOW_EXPERIMENTAL_API
>>
>> __rte_experimental is getting defined at the first inclusion of
>> rte_compat.h, so here rte_bus_pci.h is the first one.
>> __rte_experimental ends up being a big "all or nothing" switch, so we
>> don't need to #undef.
>>
>> Cc: Neil if he has a better idea/comments.
>>
> If I'm reading this properly,  I think the request here is to have 
> finer grained
> control over the use of experimental API's?
>
> If so, What fine grained control are you looking for? Do you just want 
> to not
> get the warning about a symbol being experimental only for the API 
> exported in
> rte_bus_pci.h, but not others?  If thats the case, I would make the 
> argument
> that the right (or perhaps just the more prduent) thing to do would be 
> to accept
> that you're getting warnings during the build, and that they are 
> expected for
> those particular API calls.

That was the idea, however just ignore the warnings will not work as OVS 
gets compiled with -Wall.

For now I will sent a patch as suggested by David…

> If you want to record that fact, I would do so at
> the call site, using whatever syntax covscan or other linters use to 
> record the
> fact that you expect the warning there.  Thats actual documentation, 
> and saves
> you the odd ifdeffery above, and below
>
> Neil
>
>>>  #include <rte_config.h>
>>>  #include <rte_cycles.h>
>>>  #include <rte_errno.h>
>>> @@ -35,7 +38,9 @@
>>>  #include <rte_flow.h>
>>>  #include <rte_malloc.h>
>>>  #include <rte_mbuf.h>
>>> +#define ALLOW_EXPERIMENTAL_API
>>>  #include <rte_meter.h>
>>> +#undef ALLOW_EXPERIMENTAL_API
>>>  #include <rte_pci.h>
>>>  #include <rte_version.h>
>>>  #include <rte_vhost.h>
>>
>>
>> -- 
>> David Marchand
>>
>>
Eelco Chaudron Jan. 14, 2020, 4:21 p.m. UTC | #9
On 14 Jan 2020, at 16:21, Stokes, Ian wrote:

> On 1/14/2020 2:13 PM, Eelco Chaudron wrote:
>>
>>
>> On 14 Jan 2020, at 12:23, Stokes, Ian wrote:
>>
>>> On 1/13/2020 8:32 PM, Stokes, Ian wrote:
>>
>> <SNIP>
>>
>>> Hi Eelco, I'm seeing a crash in OVS while running this with just a 
>>> port and a default queue 0 (phy to phy setup). It seems related to 
>>> the call to rte_meter_trtcm_rfc4115_color_blind_check. I've provided 
>>> more detail below in the trtcm_policer_run_single_packet function, 
>>> just wondering if you've come across it?
>>>
>>> Heres the output for the qos configuration I'm using
>>>
>>> -bash-4.4$ovs-appctl -t ovs-vswitchd qos/show dpdk1
>>> QoS: dpdk1 trtcm-policer
>>> eir: 52000
>>> cbs: 2048
>>> ebs: 2048
>>> cir: 52000
>>>
>>> Default:
>>>   eir: 52000
>>>   cbs: 2048
>>>   ebs: 2048
>>>   cir: 52000
>>>   tx_packets: 672150
>>>   tx_bytes: 30918900
>>>   tx_errors: 489562233
>>>
>>> I'll try to investigate further with DPDK and GDB also.
>>
>> I tried to replicate this, but I’m not able to do so. How did you 
>> test? Reconfiguring it and start, etc. etc.?
>
> Starting a fresh instance of OVS (cleared previous OVSDB etc.).
>
> dpdk-socket-mem="1024,0"
> dpdk-lcore-mask="0x2"
> pmd-cpu-mask="0xC"
>
> 2 phy ports only, 1 rxq per phy port.
>
> Flow rules are basic (in port 1 out port 2)
>
> Traffic profile is IPv4 UDP 64 byte packets at line rate (10G)
>
> QoS Setup with the following
>
> sudo $OVS_DIR/utilities/ovs-vsctl --timeout=5 set port dpdk1 
> qos=@myqos -- \
> --id=@myqos create qos type=trtcm-policer \
> other-config:cir=52000 other-config:cbs=2048 \
> other-config:eir=52000 other-config:ebs=2048
>
> From there it's a case of leaving traffic run (between 10 to 15 mins) 
> before the segfault occurs.

Tried a couple of runs, but no luck…

>>
>> <SNIP>
>>
>>>
>>> A few times during testing I have seen OVS crash with the following
>>>
>>> ./launch_vswitch.sh: line 66: 11694 Floating point exception sudo 
>>> $OVS_DIR/vswitchd/ovs-vswitchd unix:$DB_SOCK --pidfile
>>>
>>> Looking into it with GDB it sems related to the 
>>> rte_meter_trtcm_rfc4115_color_blind_check above. See GDB output 
>>> below.
>>>
>>> Thread 12 "pmd-c03/id:9" received signal SIGFPE, Arithmetic 
>>> exception.
>>> [Switching to Thread 0x7f3dce734700 (LWP 26465)]
>>> 0x0000000000d4b92d in rte_meter_trtcm_rfc4115_color_blind_check 
>>> (m=0x328e178, p=0x328e148, time=29107058565136113, pkt_len=46) at 
>>> /opt/istokes/dpdk-19.11//x86_64-native-linuxapp-gcc/include/rte_meter.h:599
>>> 599             n_periods_te = time_diff_te / 
>>> p->eir_period;
>>> (gdb) bt
>>> #0  0x0000000000d4b92d in rte_meter_trtcm_rfc4115_color_blind_check 
>>> (m=0x328e178, p=0x328e148, time=29107058565136113, pkt_len=46) at 
>>> /opt/istokes/dpdk-19.11//x86_64-native-linuxapp-gcc/include/rte_meter.h:599
>>> #1  0x0000000000d4abc1 in trtcm_policer_run_single_packet 
>>> (policer=0x2774200, pkt=0x1508fbd40, time=29107058565136113) at 
>>> lib/netdev-dpdk.c:4649
>>> #2  0x0000000000d4ad24 in trtcm_policer_run (conf=0x2774200, 
>>> pkts=0x7f3db8005100, pkt_cnt=32, should_steal=true) at 
>>> lib/netdev-dpdk.c:4691
>>> #3  0x0000000000d45299 in netdev_dpdk_qos_run (dev=0x17fd68840, 
>>> pkts=0x7f3db8005100, cnt=32, should_steal=true) at 
>>> lib/netdev-dpdk.c:2421
>>> #4  0x0000000000d45db0 in netdev_dpdk_send__ (dev=0x17fd68840, 
>>> qid=1, batch=0x7f3db80050f0, concurrent_txq=false) at 
>>> lib/netdev-dpdk.c:2683
>>> #5  0x0000000000d45ee9 in netdev_dpdk_eth_send (netdev=0x17fd688c0, 
>>> qid=1, batch=0x7f3db80050f0, concurrent_txq=false) at 
>>> lib/netdev-dpdk.c:2710
>>> #6  0x0000000000c342ba in netdev_send (netdev=0x17fd688c0, qid=1, 
>>> batch=0x7f3db80050f0, concurrent_txq=false) at lib/netdev.c:814
>>> #7  0x0000000000beb3de in dp_netdev_pmd_flush_output_on_port 
>>> (pmd=0x7f3dce735010, p=0x7f3db80050c0) at lib/dpif-netdev.c:4224
>>> #8  0x0000000000beb5c4 in dp_netdev_pmd_flush_output_packets 
>>> (pmd=0x7f3dce735010, force=false) at lib/dpif-netdev.c:4264
>>> #9  0x0000000000beb814 in dp_netdev_process_rxq_port 
>>> (pmd=0x7f3dce735010, rxq=0x328d930, port_no=2) at 
>>> lib/dpif-netdev.c:4319
>>> #10 0x0000000000bef432 in pmd_thread_main (f_=0x7f3dce735010) at 
>>> lib/dpif-netdev.c:5556
>>> #11 0x0000000000cb24e5 in ovsthread_wrapper (aux_=0x326d220) at 
>>> lib/ovs-thread.c:383
>>> #12 0x00007f3de11d236d in start_thread (arg=0x7f3dce734700) at 
>>> pthread_create.c:456
>>> #13 0x00007f3de06bab4f in clone () at 
>>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
>>
>> Looks like a divide by zero, but that was fixed and seems to be in 
>> DPDK v19.11, ebe3a769911071450acb808153ec2a2496726906
>
> I've confirmed I'm testing against 19.11.0 and that commit is present.
>
>
>>
>> So for some reason rte_meter_get_tb_params() might return 0 in 
>> eir_period. Looking at the code, I would say this could only really 
>> happen if rte_get_tsc_hz() returns 0, which seems odd… Could this 
>> happen in your system for some reason?
>
> I don't think rte_get_tsc_hz is returning 0, at east the values for 
> the function call in GDB don't seem to suggest this. Snippet below 
> from rte_meter_trtcm_rfc4115_color_blind_check that I'm checking with 
> GDB.
>
> rte_meter_trtcm_rfc4115_color_blind_check(struct 
> rte_meter_trtcm_rfc4115 *m, struct rte_meter_trtcm_rfc4115_profile *p, 
> uint64_t time, uint32_t pkt_len)
>
> {
>
>
>
>     uint64_t time_diff_tc, time_diff_te, n_periods_tc, n_periods_te,
>     tc, te;
>
>
>
>
>
>
>     /* Bucket update */
>     time_diff_tc = time - m->time_tc;
>     time_diff_te = time - m->time_te;
>
>
>
>     n_periods_tc = time_diff_tc / p->cir_period;
>     n_periods_te = time_diff_te / p->eir_period;
>
> Looking at values with GDB gives the following
>
> (gdb) p *p
> $1 = {cbs = 2048, ebs = 2048, cir_period = 44230, cir_bytes_per_period 
> = 1, eir_period = 44230, eir_bytes_per_period = 1}
> (gdb) p time_diff_tc
> $2 = 29137292739849292
> (gdb) p n_periods_tc
> $3 = 13937399
> (gdb) p time
> $4 = 29137292739849292
> (gdb) p m->time_tc
> $5 = 29137292739858117
> (gdb) p *m
> $6 = {time_tc = 29137292739858117, time_te = 29137292739858117, tc = 
> 2048, te = 2048}
> (gdb) p time_diff_te
> $7 = 29137292739849289
> (gdb) p p->eir_period
> $8 = 44230
> (gdb) p n_periods_te
> $9 = 140260075883992
>
> I don't have another board to test on at the moment but will try.

Odd, do not see how this would create a SIGFPE…
Stokes, Ian Jan. 14, 2020, 4:50 p.m. UTC | #10
On 1/14/2020 4:21 PM, Eelco Chaudron wrote:
> 
> 
> On 14 Jan 2020, at 16:21, Stokes, Ian wrote:
> 
>> On 1/14/2020 2:13 PM, Eelco Chaudron wrote:
>>>
>>>
>>> On 14 Jan 2020, at 12:23, Stokes, Ian wrote:
>>>
>>>> On 1/13/2020 8:32 PM, Stokes, Ian wrote:
>>>
>>> <SNIP>
>>>
>>>> Hi Eelco, I'm seeing a crash in OVS while running this with just a 
>>>> port and a default queue 0 (phy to phy setup). It seems related to 
>>>> the call to rte_meter_trtcm_rfc4115_color_blind_check. I've provided 
>>>> more detail below in the trtcm_policer_run_single_packet function, 
>>>> just wondering if you've come across it?
>>>>
>>>> Heres the output for the qos configuration I'm using
>>>>
>>>> -bash-4.4$ovs-appctl -t ovs-vswitchd qos/show dpdk1
>>>> QoS: dpdk1 trtcm-policer
>>>> eir: 52000
>>>> cbs: 2048
>>>> ebs: 2048
>>>> cir: 52000
>>>>
>>>> Default:
>>>>   eir: 52000
>>>>   cbs: 2048
>>>>   ebs: 2048
>>>>   cir: 52000
>>>>   tx_packets: 672150
>>>>   tx_bytes: 30918900
>>>>   tx_errors: 489562233
>>>>
>>>> I'll try to investigate further with DPDK and GDB also.
>>>
>>> I tried to replicate this, but I’m not able to do so. How did you 
>>> test? Reconfiguring it and start, etc. etc.?
>>
>> Starting a fresh instance of OVS (cleared previous OVSDB etc.).
>>
>> dpdk-socket-mem="1024,0"
>> dpdk-lcore-mask="0x2"
>> pmd-cpu-mask="0xC"
>>
>> 2 phy ports only, 1 rxq per phy port.
>>
>> Flow rules are basic (in port 1 out port 2)
>>
>> Traffic profile is IPv4 UDP 64 byte packets at line rate (10G)
>>
>> QoS Setup with the following
>>
>> sudo $OVS_DIR/utilities/ovs-vsctl --timeout=5 set port dpdk1 
>> qos=@myqos -- \
>> --id=@myqos create qos type=trtcm-policer \
>> other-config:cir=52000 other-config:cbs=2048 \
>> other-config:eir=52000 other-config:ebs=2048
>>
>> From there it's a case of leaving traffic run (between 10 to 15 mins) 
>> before the segfault occurs.
> 
> Tried a couple of runs, but no luck…

We've tried on a second system as well, but were not able to reproduce 
it, it may be specific to the first test board I've used in this case. 
Shouldn't be a blocker. I'll look at the v5 but think we're close to 
merging.

Regards
Ian

> 
>>>
>>> <SNIP>
>>>
>>>>
>>>> A few times during testing I have seen OVS crash with the following
>>>>
>>>> ./launch_vswitch.sh: line 66: 11694 Floating point exception sudo 
>>>> $OVS_DIR/vswitchd/ovs-vswitchd unix:$DB_SOCK --pidfile
>>>>
>>>> Looking into it with GDB it sems related to the 
>>>> rte_meter_trtcm_rfc4115_color_blind_check above. See GDB output below.
>>>>
>>>> Thread 12 "pmd-c03/id:9" received signal SIGFPE, Arithmetic exception.
>>>> [Switching to Thread 0x7f3dce734700 (LWP 26465)]
>>>> 0x0000000000d4b92d in rte_meter_trtcm_rfc4115_color_blind_check 
>>>> (m=0x328e178, p=0x328e148, time=29107058565136113, pkt_len=46) at 
>>>> /opt/istokes/dpdk-19.11//x86_64-native-linuxapp-gcc/include/rte_meter.h:599 
>>>>
>>>> 599             n_periods_te = time_diff_te / p->eir_period;
>>>> (gdb) bt
>>>> #0  0x0000000000d4b92d in rte_meter_trtcm_rfc4115_color_blind_check 
>>>> (m=0x328e178, p=0x328e148, time=29107058565136113, pkt_len=46) at 
>>>> /opt/istokes/dpdk-19.11//x86_64-native-linuxapp-gcc/include/rte_meter.h:599 
>>>>
>>>> #1  0x0000000000d4abc1 in trtcm_policer_run_single_packet 
>>>> (policer=0x2774200, pkt=0x1508fbd40, time=29107058565136113) at 
>>>> lib/netdev-dpdk.c:4649
>>>> #2  0x0000000000d4ad24 in trtcm_policer_run (conf=0x2774200, 
>>>> pkts=0x7f3db8005100, pkt_cnt=32, should_steal=true) at 
>>>> lib/netdev-dpdk.c:4691
>>>> #3  0x0000000000d45299 in netdev_dpdk_qos_run (dev=0x17fd68840, 
>>>> pkts=0x7f3db8005100, cnt=32, should_steal=true) at 
>>>> lib/netdev-dpdk.c:2421
>>>> #4  0x0000000000d45db0 in netdev_dpdk_send__ (dev=0x17fd68840, 
>>>> qid=1, batch=0x7f3db80050f0, concurrent_txq=false) at 
>>>> lib/netdev-dpdk.c:2683
>>>> #5  0x0000000000d45ee9 in netdev_dpdk_eth_send (netdev=0x17fd688c0, 
>>>> qid=1, batch=0x7f3db80050f0, concurrent_txq=false) at 
>>>> lib/netdev-dpdk.c:2710
>>>> #6  0x0000000000c342ba in netdev_send (netdev=0x17fd688c0, qid=1, 
>>>> batch=0x7f3db80050f0, concurrent_txq=false) at lib/netdev.c:814
>>>> #7  0x0000000000beb3de in dp_netdev_pmd_flush_output_on_port 
>>>> (pmd=0x7f3dce735010, p=0x7f3db80050c0) at lib/dpif-netdev.c:4224
>>>> #8  0x0000000000beb5c4 in dp_netdev_pmd_flush_output_packets 
>>>> (pmd=0x7f3dce735010, force=false) at lib/dpif-netdev.c:4264
>>>> #9  0x0000000000beb814 in dp_netdev_process_rxq_port 
>>>> (pmd=0x7f3dce735010, rxq=0x328d930, port_no=2) at 
>>>> lib/dpif-netdev.c:4319
>>>> #10 0x0000000000bef432 in pmd_thread_main (f_=0x7f3dce735010) at 
>>>> lib/dpif-netdev.c:5556
>>>> #11 0x0000000000cb24e5 in ovsthread_wrapper (aux_=0x326d220) at 
>>>> lib/ovs-thread.c:383
>>>> #12 0x00007f3de11d236d in start_thread (arg=0x7f3dce734700) at 
>>>> pthread_create.c:456
>>>> #13 0x00007f3de06bab4f in clone () at 
>>>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
>>>
>>> Looks like a divide by zero, but that was fixed and seems to be in 
>>> DPDK v19.11, ebe3a769911071450acb808153ec2a2496726906
>>
>> I've confirmed I'm testing against 19.11.0 and that commit is present.
>>
>>
>>>
>>> So for some reason rte_meter_get_tb_params() might return 0 in 
>>> eir_period. Looking at the code, I would say this could only really 
>>> happen if rte_get_tsc_hz() returns 0, which seems odd… Could this 
>>> happen in your system for some reason?
>>
>> I don't think rte_get_tsc_hz is returning 0, at east the values for 
>> the function call in GDB don't seem to suggest this. Snippet below 
>> from rte_meter_trtcm_rfc4115_color_blind_check that I'm checking with 
>> GDB.
>>
>> rte_meter_trtcm_rfc4115_color_blind_check(struct 
>> rte_meter_trtcm_rfc4115 *m, struct rte_meter_trtcm_rfc4115_profile *p, 
>> uint64_t time, uint32_t pkt_len)
>>
>> {
>>
>>
>>
>>     uint64_t time_diff_tc, time_diff_te, n_periods_tc, n_periods_te,
>>     tc, te;
>>
>>
>>
>>
>>
>>
>>     /* Bucket update */
>>     time_diff_tc = time - m->time_tc;
>>     time_diff_te = time - m->time_te;
>>
>>
>>
>>     n_periods_tc = time_diff_tc / p->cir_period;
>>     n_periods_te = time_diff_te / p->eir_period;
>>
>> Looking at values with GDB gives the following
>>
>> (gdb) p *p
>> $1 = {cbs = 2048, ebs = 2048, cir_period = 44230, cir_bytes_per_period 
>> = 1, eir_period = 44230, eir_bytes_per_period = 1}
>> (gdb) p time_diff_tc
>> $2 = 29137292739849292
>> (gdb) p n_periods_tc
>> $3 = 13937399
>> (gdb) p time
>> $4 = 29137292739849292
>> (gdb) p m->time_tc
>> $5 = 29137292739858117
>> (gdb) p *m
>> $6 = {time_tc = 29137292739858117, time_te = 29137292739858117, tc = 
>> 2048, te = 2048}
>> (gdb) p time_diff_te
>> $7 = 29137292739849289
>> (gdb) p p->eir_period
>> $8 = 44230
>> (gdb) p n_periods_te
>> $9 = 140260075883992
>>
>> I don't have another board to test on at the moment but will try.
> 
> Odd, do not see how this would create a SIGFPE…
>

Patch
diff mbox series

diff --git a/Documentation/topics/dpdk/qos.rst b/Documentation/topics/dpdk/qos.rst
index c0aec5d..1034954 100644
--- a/Documentation/topics/dpdk/qos.rst
+++ b/Documentation/topics/dpdk/qos.rst
@@ -33,6 +33,9 @@  datapath. These are referred to as *QoS* and *Rate Limiting*, respectively.
 QoS (Egress Policing)
 ---------------------
 
+Single Queue Policer
+~~~~~~~~~~~~~~~~~~~~
+
 Assuming you have a :doc:`vhost-user port <vhost-user>` transmitting traffic
 consisting of packets of size 64 bytes, the following command would limit the
 egress transmission rate of the port to ~1,000,000 packets per second::
@@ -49,6 +52,46 @@  To clear the QoS configuration from the port and ovsdb, run::
 
     $ ovs-vsctl destroy QoS vhost-user0 -- clear Port vhost-user0 qos
 
+
+Multi Queue Policer
+~~~~~~~~~~~~~~~~~~~
+
+In addition to the egress-policer OVS-DPDK also has support for a RFC
+4115's Two-Rate, Three-Color marker meter. It's a two-level hierarchical
+policer which first does a color-blind marking of the traffic at the queue
+level, followed by a color-aware marking at the port level. At the end
+traffic marked as Green or Yellow is forwarded, Red is dropped. For
+details on how traffic is marked, see RFC 4115.
+
+This egress policer can be used to limit traffic at different rated
+based on the queues the traffic is in. In addition, it can also be used
+to prioritize certain traffic over others at a port level.
+
+For example, the following configuration will limit the traffic rate at a
+port level to a maximum of 2000 packets a second (64 bytes IPv4 packets).
+100pps as CIR (Committed Information Rate) and 1000pps as EIR (Excess
+Information Rate). High priority traffic is routed to queue 10, which marks
+all traffic as CIR, i.e. Green. All low priority traffic, queue 20, is
+marked as EIR, i.e. Yellow::
+
+    $ ovs-vsctl --timeout=5 set port dpdk1 qos=@myqos -- \
+        --id=@myqos create qos type=trtcm-policer \
+        other-config:cir=52000 other-config:cbs=2048 \
+        other-config:eir=52000 other-config:ebs=2048  \
+        queues:10=@dpdk1Q10 queues:20=@dpdk1Q20 -- \
+         --id=@dpdk1Q10 create queue \
+          other-config:cir=41600000 other-config:cbs=2048 \
+          other-config:eir=0 other-config:ebs=0 -- \
+         --id=@dpdk1Q20 create queue \
+           other-config:cir=0 other-config:cbs=0 \
+           other-config:eir=41600000 other-config:ebs=2048 \
+
+This configuration accomplishes that the high priority traffic has a
+guaranteed bandwidth egressing the ports at CIR (1000pps), but it can also
+use the EIR, so a total of 2000pps at max. These additional 1000pps is
+shared with the low priority traffic. The low priority traffic can use at
+maximum 1000pps.
+
 Refer to ``vswitch.xml`` for more details on egress policer.
 
 Rate Limiting (Ingress Policing)
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 128963f..3f164c8 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -26,7 +26,10 @@ 
 #include <sys/socket.h>
 #include <linux/if.h>
 
+/* The below is needed as rte_meter.h gets included trough rte_bus_pci.h. */
+#define ALLOW_EXPERIMENTAL_API
 #include <rte_bus_pci.h>
+#undef ALLOW_EXPERIMENTAL_API
 #include <rte_config.h>
 #include <rte_cycles.h>
 #include <rte_errno.h>
@@ -35,7 +38,9 @@ 
 #include <rte_flow.h>
 #include <rte_malloc.h>
 #include <rte_mbuf.h>
+#define ALLOW_EXPERIMENTAL_API
 #include <rte_meter.h>
+#undef ALLOW_EXPERIMENTAL_API
 #include <rte_pci.h>
 #include <rte_version.h>
 #include <rte_vhost.h>
@@ -329,8 +334,9 @@  struct dpdk_qos_ops {
                                      struct netdev_dpdk_queue_state *state);
 };
 
-/* dpdk_qos_ops for each type of user space QoS implementation */
+/* dpdk_qos_ops for each type of user space QoS implementation. */
 static const struct dpdk_qos_ops egress_policer_ops;
+static const struct dpdk_qos_ops trtcm_policer_ops;
 
 /*
  * Array of dpdk_qos_ops, contains pointer to all supported QoS
@@ -338,6 +344,7 @@  static const struct dpdk_qos_ops egress_policer_ops;
  */
 static const struct dpdk_qos_ops *const qos_confs[] = {
     &egress_policer_ops,
+    &trtcm_policer_ops,
     NULL
 };
 
@@ -2162,9 +2169,9 @@  netdev_dpdk_eth_tx_burst(struct netdev_dpdk *dev, int qid,
 }
 
 static inline bool
-netdev_dpdk_policer_pkt_handle(struct rte_meter_srtcm *meter,
-                               struct rte_meter_srtcm_profile *profile,
-                               struct rte_mbuf *pkt, uint64_t time)
+netdev_dpdk_srtcm_policer_pkt_handle(struct rte_meter_srtcm *meter,
+                                     struct rte_meter_srtcm_profile *profile,
+                                     struct rte_mbuf *pkt, uint64_t time)
 {
     uint32_t pkt_len = rte_pktmbuf_pkt_len(pkt) - sizeof(struct rte_ether_hdr);
 
@@ -2173,10 +2180,10 @@  netdev_dpdk_policer_pkt_handle(struct rte_meter_srtcm *meter,
 }
 
 static int
-netdev_dpdk_policer_run(struct rte_meter_srtcm *meter,
-                        struct rte_meter_srtcm_profile *profile,
-                        struct rte_mbuf **pkts, int pkt_cnt,
-                        bool should_steal)
+srtcm_policer_run_single_packet(struct rte_meter_srtcm *meter,
+                                struct rte_meter_srtcm_profile *profile,
+                                struct rte_mbuf **pkts, int pkt_cnt,
+                                bool should_steal)
 {
     int i = 0;
     int cnt = 0;
@@ -2186,8 +2193,8 @@  netdev_dpdk_policer_run(struct rte_meter_srtcm *meter,
     for (i = 0; i < pkt_cnt; i++) {
         pkt = pkts[i];
         /* Handle current packet */
-        if (netdev_dpdk_policer_pkt_handle(meter, profile,
-                                           pkt, current_time)) {
+        if (netdev_dpdk_srtcm_policer_pkt_handle(meter, profile,
+                                                 pkt, current_time)) {
             if (cnt != i) {
                 pkts[cnt] = pkt;
             }
@@ -2209,8 +2216,9 @@  ingress_policer_run(struct ingress_policer *policer, struct rte_mbuf **pkts,
     int cnt = 0;
 
     rte_spinlock_lock(&policer->policer_lock);
-    cnt = netdev_dpdk_policer_run(&policer->in_policer, &policer->in_prof,
-                                  pkts, pkt_cnt, should_steal);
+    cnt = srtcm_policer_run_single_packet(&policer->in_policer,
+                                          &policer->in_prof,
+                                          pkts, pkt_cnt, should_steal);
     rte_spinlock_unlock(&policer->policer_lock);
 
     return cnt;
@@ -4480,9 +4488,9 @@  egress_policer_run(struct qos_conf *conf, struct rte_mbuf **pkts, int pkt_cnt,
     struct egress_policer *policer =
         CONTAINER_OF(conf, struct egress_policer, qos_conf);
 
-    cnt = netdev_dpdk_policer_run(&policer->egress_meter,
-                                  &policer->egress_prof, pkts,
-                                  pkt_cnt, should_steal);
+    cnt = srtcm_policer_run_single_packet(&policer->egress_meter,
+                                          &policer->egress_prof, pkts,
+                                          pkt_cnt, should_steal);
 
     return cnt;
 }
@@ -4496,6 +4504,326 @@  static const struct dpdk_qos_ops egress_policer_ops = {
     .qos_run = egress_policer_run
 };
 
+/* trtcm-policer details */
+
+struct trtcm_policer {
+    struct qos_conf qos_conf;
+    struct rte_meter_trtcm_rfc4115_params meter_params;
+    struct rte_meter_trtcm_rfc4115_profile meter_profile;
+    struct rte_meter_trtcm_rfc4115 meter;
+    struct netdev_queue_stats stats;
+    struct hmap queues;
+};
+
+struct trtcm_policer_queue {
+    struct hmap_node hmap_node;
+    uint32_t queue_id;
+    struct rte_meter_trtcm_rfc4115_params meter_params;
+    struct rte_meter_trtcm_rfc4115_profile meter_profile;
+    struct rte_meter_trtcm_rfc4115 meter;
+    struct netdev_queue_stats stats;
+};
+
+static void
+trtcm_policer_details_to_param(const struct smap *details,
+                               struct rte_meter_trtcm_rfc4115_params *params)
+{
+    memset(params, 0, sizeof *params);
+    params->cir = smap_get_ullong(details, "cir", 0);
+    params->eir = smap_get_ullong(details, "eir", 0);
+    params->cbs = smap_get_ullong(details, "cbs", 0);
+    params->ebs = smap_get_ullong(details, "ebs", 0);
+}
+
+static void
+trtcm_policer_param_to_detail(
+    const struct rte_meter_trtcm_rfc4115_params *params,
+    struct smap *details)
+{
+    smap_add_format(details, "cir", "%"PRIu64, params->cir);
+    smap_add_format(details, "eir", "%"PRIu64, params->eir);
+    smap_add_format(details, "cbs", "%"PRIu64, params->cbs);
+    smap_add_format(details, "ebs", "%"PRIu64, params->ebs);
+}
+
+
+static int
+trtcm_policer_qos_construct(const struct smap *details,
+                            struct qos_conf **conf)
+{
+    struct trtcm_policer *policer;
+    int err = 0;
+
+    policer = xmalloc(sizeof *policer);
+    qos_conf_init(&policer->qos_conf, &trtcm_policer_ops);
+    trtcm_policer_details_to_param(details, &policer->meter_params);
+    err = rte_meter_trtcm_rfc4115_profile_config(&policer->meter_profile,
+                                                 &policer->meter_params);
+    if (!err) {
+        err = rte_meter_trtcm_rfc4115_config(&policer->meter,
+                                             &policer->meter_profile);
+    }
+
+    if (!err) {
+        *conf = &policer->qos_conf;
+        memset(&policer->stats, 0, sizeof policer->stats);
+        hmap_init(&policer->queues);
+    } else {
+        free(policer);
+        *conf = NULL;
+        err = -err;
+    }
+
+    return err;
+}
+
+static void
+trtcm_policer_qos_destruct(struct qos_conf *conf)
+{
+    struct trtcm_policer_queue *queue, *next_queue;
+    struct trtcm_policer *policer = CONTAINER_OF(conf, struct trtcm_policer,
+                                                 qos_conf);
+
+    HMAP_FOR_EACH_SAFE (queue, next_queue, hmap_node, &policer->queues) {
+        hmap_remove(&policer->queues, &queue->hmap_node);
+        free(queue);
+    }
+    hmap_destroy(&policer->queues);
+    free(policer);
+}
+
+static int
+trtcm_policer_qos_get(const struct qos_conf *conf, struct smap *details)
+{
+    struct trtcm_policer *policer = CONTAINER_OF(conf, struct trtcm_policer,
+                                                 qos_conf);
+
+    trtcm_policer_param_to_detail(&policer->meter_params, details);
+    return 0;
+}
+
+static bool
+trtcm_policer_qos_is_equal(const struct qos_conf *conf,
+                           const struct smap *details)
+{
+    struct trtcm_policer *policer = CONTAINER_OF(conf, struct trtcm_policer,
+                                                 qos_conf);
+    struct rte_meter_trtcm_rfc4115_params params;
+
+    trtcm_policer_details_to_param(details, &params);
+
+    return !memcmp(&params, &policer->meter_params, sizeof params);
+}
+
+static struct trtcm_policer_queue *
+trtcm_policer_qos_find_queue(struct trtcm_policer *policer, uint32_t queue_id)
+{
+    struct trtcm_policer_queue *queue;
+    HMAP_FOR_EACH_WITH_HASH (queue, hmap_node, hash_2words(queue_id, 0),
+                             &policer->queues) {
+        if (queue->queue_id == queue_id) {
+            return queue;
+        }
+    }
+    return NULL;
+}
+
+static inline bool
+trtcm_policer_run_single_packet(struct trtcm_policer *policer,
+                                struct rte_mbuf *pkt, uint64_t time)
+{
+    enum rte_color pkt_color;
+    struct trtcm_policer_queue *queue;
+    uint32_t pkt_len = rte_pktmbuf_pkt_len(pkt) - sizeof(struct rte_ether_hdr);
+    struct dp_packet *dpkt = CONTAINER_OF(pkt, struct dp_packet, mbuf);
+
+    queue = trtcm_policer_qos_find_queue(policer, dpkt->md.skb_priority);
+    if (!queue) {
+        /* If no queue is found, use the default queue, which MUST exist. */
+        queue = trtcm_policer_qos_find_queue(policer, 0);
+        if (!queue) {
+            return false;
+        }
+    }
+
+    pkt_color = rte_meter_trtcm_rfc4115_color_blind_check(&queue->meter,
+                                                          &queue->meter_profile,
+                                                          time,
+                                                          pkt_len);
+
+    if (pkt_color == RTE_COLOR_RED) {
+        queue->stats.tx_errors++;
+    } else {
+        queue->stats.tx_bytes += pkt_len;
+        queue->stats.tx_packets++;
+    }
+
+    pkt_color = rte_meter_trtcm_rfc4115_color_aware_check(&policer->meter,
+                                                     &policer->meter_profile,
+                                                     time, pkt_len,
+                                                     pkt_color);
+
+    if (pkt_color == RTE_COLOR_RED) {
+        policer->stats.tx_errors++;
+        return false;
+    }
+
+    policer->stats.tx_bytes += pkt_len;
+    policer->stats.tx_packets++;
+    return true;
+}
+
+static int
+trtcm_policer_run(struct qos_conf *conf, struct rte_mbuf **pkts, int pkt_cnt,
+                  bool should_steal)
+{
+    int i = 0;
+    int cnt = 0;
+    struct rte_mbuf *pkt = NULL;
+    uint64_t current_time = rte_rdtsc();
+
+    struct trtcm_policer *policer = CONTAINER_OF(conf, struct trtcm_policer,
+                                                 qos_conf);
+
+    for (i = 0; i < pkt_cnt; i++) {
+        pkt = pkts[i];
+
+        if (trtcm_policer_run_single_packet(policer, pkt, current_time)) {
+            if (cnt != i) {
+                pkts[cnt] = pkt;
+            }
+            cnt++;
+        } else {
+            if (should_steal) {
+                rte_pktmbuf_free(pkt);
+            }
+        }
+    }
+    return cnt;
+}
+
+static int
+trtcm_policer_qos_queue_construct(const struct smap *details,
+                                  uint32_t queue_id, struct qos_conf *conf)
+{
+    int err = 0;
+    struct trtcm_policer_queue *queue;
+    struct trtcm_policer *policer = CONTAINER_OF(conf, struct trtcm_policer,
+                                                 qos_conf);
+
+    queue = trtcm_policer_qos_find_queue(policer, queue_id);
+    if (!queue) {
+        queue = xmalloc(sizeof *queue);
+        queue->queue_id = queue_id;
+        memset(&queue->stats, 0, sizeof queue->stats);
+        queue->stats.created = time_msec();
+        hmap_insert(&policer->queues, &queue->hmap_node,
+                    hash_2words(queue_id, 0));
+    }
+    if (queue_id == 0 && smap_is_empty(details)) {
+        /* No default queue configured, use port values */
+        memcpy(&queue->meter_params, &policer->meter_params,
+               sizeof queue->meter_params);
+    } else {
+        trtcm_policer_details_to_param(details, &queue->meter_params);
+    }
+
+    err = rte_meter_trtcm_rfc4115_profile_config(&queue->meter_profile,
+                                                 &queue->meter_params);
+
+    if (!err) {
+        err = rte_meter_trtcm_rfc4115_config(&queue->meter,
+                                             &queue->meter_profile);
+    }
+    if (err) {
+        hmap_remove(&policer->queues, &queue->hmap_node);
+        free(queue);
+        err = -err;
+    }
+    return err;
+}
+
+static void
+trtcm_policer_qos_queue_destruct(struct qos_conf *conf, uint32_t queue_id)
+{
+    struct trtcm_policer_queue *queue;
+    struct trtcm_policer *policer = CONTAINER_OF(conf, struct trtcm_policer,
+                                                 qos_conf);
+
+    queue = trtcm_policer_qos_find_queue(policer, queue_id);
+    if (queue) {
+        hmap_remove(&policer->queues, &queue->hmap_node);
+        free(queue);
+    }
+}
+
+static int
+trtcm_policer_qos_queue_get(struct smap *details, uint32_t queue_id,
+                            const struct qos_conf *conf)
+{
+    struct trtcm_policer_queue *queue;
+    struct trtcm_policer *policer = CONTAINER_OF(conf, struct trtcm_policer,
+                                                 qos_conf);
+
+    queue = trtcm_policer_qos_find_queue(policer, queue_id);
+    if (!queue) {
+        return EINVAL;
+    }
+
+    trtcm_policer_param_to_detail(&queue->meter_params, details);
+    return 0;
+}
+
+static int
+trtcm_policer_qos_queue_get_stats(const struct qos_conf *conf,
+                                  uint32_t queue_id,
+                                  struct netdev_queue_stats *stats)
+{
+    struct trtcm_policer_queue *queue;
+    struct trtcm_policer *policer = CONTAINER_OF(conf, struct trtcm_policer,
+                                                 qos_conf);
+
+    queue = trtcm_policer_qos_find_queue(policer, queue_id);
+    if (!queue) {
+        return EINVAL;
+    }
+    memcpy(stats, &queue->stats, sizeof *stats);
+    return 0;
+}
+
+static int
+trtcm_policer_qos_queue_dump_state_init(const struct qos_conf *conf,
+                                        struct netdev_dpdk_queue_state *state)
+{
+    uint32_t i = 0;
+    struct trtcm_policer_queue *queue;
+    struct trtcm_policer *policer = CONTAINER_OF(conf, struct trtcm_policer,
+                                                 qos_conf);
+
+    state->n_queues = hmap_count(&policer->queues);
+    state->cur_queue = 0;
+    state->queues = xmalloc(state->n_queues * sizeof *state->queues);
+
+    HMAP_FOR_EACH (queue, hmap_node, &policer->queues) {
+        state->queues[i++] = queue->queue_id;
+    }
+    return 0;
+}
+
+static const struct dpdk_qos_ops trtcm_policer_ops = {
+    .qos_name = "trtcm-policer",
+    .qos_construct = trtcm_policer_qos_construct,
+    .qos_destruct = trtcm_policer_qos_destruct,
+    .qos_get = trtcm_policer_qos_get,
+    .qos_is_equal = trtcm_policer_qos_is_equal,
+    .qos_run = trtcm_policer_run,
+    .qos_queue_construct = trtcm_policer_qos_queue_construct,
+    .qos_queue_destruct = trtcm_policer_qos_queue_destruct,
+    .qos_queue_get = trtcm_policer_qos_queue_get,
+    .qos_queue_get_stats = trtcm_policer_qos_queue_get_stats,
+    .qos_queue_dump_state_init = trtcm_policer_qos_queue_dump_state_init
+};
+
 static int
 netdev_dpdk_reconfigure(struct netdev *netdev)
 {
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index 0ec726c..c43cb1a 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -4441,6 +4441,19 @@  ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
           in performance will be noticed in terms of overall aggregate traffic
           throughput.
         </dd>
+        <dt><code>trtcm-policer</code></dt>
+        <dd>
+          A DPDK egress policer algorithm using RFC 4115's Two-Rate,
+          Three-Color marker. It's a two-level hierarchical policer
+          which first does a color-blind marking of the traffic at the queue
+          level, followed by a color-aware marking at the port level. At the
+          end traffic marked as Green or Yellow is forwarded, Red is dropped.
+          For details on how traffic is marked, see RFC 4115.
+
+          If the ``default queue'', 0, is not configured it's automatically
+          created with the same <code>other_config</code> values as the
+          physical port.
+        </dd>
       </dl>
     </column>
 
@@ -4508,6 +4521,27 @@  ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
         bytes/tokens of the packet. If there are not enough tokens in the cbs
         bucket the packet will be dropped.
       </column>
+      <column name="other_config" key="eir" type='{"type": "integer"}'>
+        The Excess Information Rate (EIR) is measured in bytes of IP
+        packets per second, i.e. it includes the IP header, but not link
+        specific (e.g. Ethernet) headers. This represents the bytes per second
+        rate at which the token bucket will be updated. The eir value is
+        calculated by (pps x packet data size).  For example assuming a user
+        wishes to limit a stream consisting of 64 byte packets to 1 million
+        packets per second the EIR would be set to to to 46000000. This value
+        can be broken into '1,000,000 x 46'. Where 1,000,000 is the policing
+        rate for the number of packets per second and 46 represents the size
+        of the packet data for a 64 byte ip packet.
+      </column>
+      <column name="other_config" key="ebs" type='{"type": "integer"}'>
+        The Excess Burst Size (EBS) is measured in bytes and represents a
+        token bucket. At a minimum this value should be be set to the expected
+        largest size packet in the traffic stream. In practice larger values
+        may be used to increase the size of the token bucket. If a packet can
+        be transmitted then the ebs will be decremented by the number of
+        bytes/tokens of the packet. If there are not enough tokens in the cbs
+        bucket the packet might be dropped.
+      </column>
     </group>
 
     <group title="Configuration for linux-sfq">