diff mbox series

[ovs-dev,dpdk-latest,v3,2/2] netdev-dpdk: Add new DPDK RFC 4115 egress policer

Message ID 20191001141034.26768.88495.stgit@netdev64
State Changes Requested
Headers show
Series netdev-dpdk: Add new DPDK RFC 4115 egress policer | expand

Commit Message

Eelco Chaudron Oct. 1, 2019, 2:10 p.m. UTC
This patch adds a new policer to the DPDK datapath based on RFC 4115's
Two-Rate, Three-Color marker. It's a two-level hierarchical policer
which first does a color-blind marking of the traffic at the queue
level, followed by a color-aware marking at the port level. At the end
traffic marked as Green or Yellow is forwarded, Red is dropped. For
details on how traffic is marked, see RFC 4115.

This egress policer can be used to limit traffic at different rated
based on the queues the traffic is in. In addition, it can also be used
to prioritize certain traffic over others at a port level.

For example, the following configuration will limit the traffic rate at a
port level to a maximum of 2000 packets a second (64 bytes IPv4 packets).
100pps as CIR (Committed Information Rate) and 1000pps as EIR (Excess
Information Rate). High priority traffic is routed to queue 10, which marks
all traffic as CIR, i.e. Green. All low priority traffic, queue 20, is
marked as EIR, i.e. Yellow.

ovs-vsctl --timeout=5 set port dpdk1 qos=@myqos -- \
  --id=@myqos create qos type=trtcm-policer \
  other-config:cir=52000 other-config:cbs=2048 \
  other-config:eir=52000 other-config:ebs=2048  \
  queues:10=@dpdk1Q10 queues:20=@dpdk1Q20 -- \
  --id=@dpdk1Q10 create queue \
    other-config:cir=41600000 other-config:cbs=2048 \
    other-config:eir=0 other-config:ebs=0 -- \
  --id=@dpdk1Q20 create queue \
    other-config:cir=0 other-config:cbs=0 \
    other-config:eir=41600000 other-config:ebs=2048 \

This configuration accomplishes that the high priority traffic has a
guaranteed bandwidth egressing the ports at CIR (1000pps), but it can also
use the EIR, so a total of 2000pps at max. These additional 1000pps is
shared with the low priority traffic. The low priority traffic can use at
maximum 1000pps.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
---
 Documentation/topics/dpdk/qos.rst |   36 ++++
 lib/netdev-dpdk.c                 |  321 +++++++++++++++++++++++++++++++++++++
 vswitchd/vswitch.xml              |   30 +++
 3 files changed, 386 insertions(+), 1 deletion(-)

Comments

Stokes, Ian Jan. 7, 2020, 6:33 p.m. UTC | #1
On 10/1/2019 3:10 PM, Eelco Chaudron wrote:
> This patch adds a new policer to the DPDK datapath based on RFC 4115's
> Two-Rate, Three-Color marker. It's a two-level hierarchical policer
> which first does a color-blind marking of the traffic at the queue
> level, followed by a color-aware marking at the port level. At the end
> traffic marked as Green or Yellow is forwarded, Red is dropped. For
> details on how traffic is marked, see RFC 4115.
> 
> This egress policer can be used to limit traffic at different rated
> based on the queues the traffic is in. In addition, it can also be used
> to prioritize certain traffic over others at a port level.
> 
> For example, the following configuration will limit the traffic rate at a
> port level to a maximum of 2000 packets a second (64 bytes IPv4 packets).
> 100pps as CIR (Committed Information Rate) and 1000pps as EIR (Excess
> Information Rate). High priority traffic is routed to queue 10, which marks
> all traffic as CIR, i.e. Green. All low priority traffic, queue 20, is
> marked as EIR, i.e. Yellow.
> 
> ovs-vsctl --timeout=5 set port dpdk1 qos=@myqos -- \
>    --id=@myqos create qos type=trtcm-policer \
>    other-config:cir=52000 other-config:cbs=2048 \
>    other-config:eir=52000 other-config:ebs=2048  \
>    queues:10=@dpdk1Q10 queues:20=@dpdk1Q20 -- \
>    --id=@dpdk1Q10 create queue \
>      other-config:cir=41600000 other-config:cbs=2048 \
>      other-config:eir=0 other-config:ebs=0 -- \
>    --id=@dpdk1Q20 create queue \
>      other-config:cir=0 other-config:cbs=0 \
>      other-config:eir=41600000 other-config:ebs=2048 \
> 
> This configuration accomplishes that the high priority traffic has a
> guaranteed bandwidth egressing the ports at CIR (1000pps), but it can also
> use the EIR, so a total of 2000pps at max. These additional 1000pps is
> shared with the low priority traffic. The low priority traffic can use at
> maximum 1000pps.
> 
> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>

Thank for the patch Eelco,

Overall loks good and follows the established QoS methodology.

I'm still testing this code in more detail but wanted to clarify a few 
points below in the meantime.

Comments inline.

We discussed this at the OVS conference but this causes compilation to 
failure in OVS DPDK due to some of the trtcm API being experimental in 
DPDK, so we'd either need to avoid that warning or else remove the 
experimental tag for the API in 19.11.1 and hold off merging this patch 
until then.

> ---
>   Documentation/topics/dpdk/qos.rst |   36 ++++
>   lib/netdev-dpdk.c                 |  321 +++++++++++++++++++++++++++++++++++++
>   vswitchd/vswitch.xml              |   30 +++
>   3 files changed, 386 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/topics/dpdk/qos.rst b/Documentation/topics/dpdk/qos.rst
> index c0aec5d..b06b1c2 100644
> --- a/Documentation/topics/dpdk/qos.rst
> +++ b/Documentation/topics/dpdk/qos.rst
> @@ -49,6 +49,42 @@ To clear the QoS configuration from the port and ovsdb, run::
>   
>       $ ovs-vsctl destroy QoS vhost-user0 -- clear Port vhost-user0 qos
>

Below looks good, robust explanation, possibly worth separating the 
single policer and Two rte policer using the 2nd heading style ~~~~~~~~ 
as we now have 2 types of QoS.

What do you think?

> +In addition to the egress-policer OVS-DPDK also has support for a RFC
> +4115's Two-Rate, Three-Color marker meter. It's a two-level hierarchical
> +policer which first does a color-blind marking of the traffic at the queue
> +level, followed by a color-aware marking at the port level. At the end
> +traffic marked as Green or Yellow is forwarded, Red is dropped. For
> +details on how traffic is marked, see RFC 4115.
> +
> +This egress policer can be used to limit traffic at different rated
> +based on the queues the traffic is in. In addition, it can also be used
> +to prioritize certain traffic over others at a port level.
> +
> +For example, the following configuration will limit the traffic rate at a
> +port level to a maximum of 2000 packets a second (64 bytes IPv4 packets).
> +100pps as CIR (Committed Information Rate) and 1000pps as EIR (Excess
> +Information Rate). High priority traffic is routed to queue 10, which marks
> +all traffic as CIR, i.e. Green. All low priority traffic, queue 20, is
> +marked as EIR, i.e. Yellow::
> +
> +    $ ovs-vsctl --timeout=5 set port dpdk1 qos=@myqos -- \
> +        --id=@myqos create qos type=trtcm-policer \
> +        other-config:cir=52000 other-config:cbs=2048 \
> +        other-config:eir=52000 other-config:ebs=2048  \
> +        queues:10=@dpdk1Q10 queues:20=@dpdk1Q20 -- \
> +         --id=@dpdk1Q10 create queue \
> +          other-config:cir=41600000 other-config:cbs=2048 \
> +          other-config:eir=0 other-config:ebs=0 -- \
> +         --id=@dpdk1Q20 create queue \
> +           other-config:cir=0 other-config:cbs=0 \
> +           other-config:eir=41600000 other-config:ebs=2048 \
> +
> +This configuration accomplishes that the high priority traffic has a
> +guaranteed bandwidth egressing the ports at CIR (1000pps), but it can also
> +use the EIR, so a total of 2000pps at max. These additional 1000pps is
> +shared with the low priority traffic. The low priority traffic can use at
> +maximum 1000pps.
> +
>   Refer to ``vswitch.xml`` for more details on egress policer.
>   
>   Rate Limiting (Ingress Policing)
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 072ce96..87f098f 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -309,13 +309,14 @@ struct dpdk_qos_ops {
>   
>   /* dpdk_qos_ops for each type of user space QoS implementation */
Minor, missing period for comment.

>   static const struct dpdk_qos_ops egress_policer_ops;
> -
> +static const struct dpdk_qos_ops trtcm_policer_ops;

Minor, could use a white space to separate the qos ops from the ops 
array comment below.

>   /*
>    * Array of dpdk_qos_ops, contains pointer to all supported QoS
>    * operations.
>    */
>   static const struct dpdk_qos_ops *const qos_confs[] = {
>       &egress_policer_ops,
> +    &trtcm_policer_ops,
>       NULL
>   };
>   
> @@ -4334,6 +4335,324 @@ static const struct dpdk_qos_ops egress_policer_ops = {
>       .qos_run = egress_policer_run
>   };
>   
> +/* trtcm-policer details */
> +
> +struct trtcm_policer {
> +    struct qos_conf qos_conf;
> +    struct rte_meter_trtcm_rfc4115_params meter_params;
> +    struct rte_meter_trtcm_rfc4115_profile meter_profile;
> +    struct rte_meter_trtcm_rfc4115 meter;
> +    struct netdev_queue_stats stats;
> +    struct hmap queues;
> +};
> +
> +struct trtcm_policer_queue {
> +    struct hmap_node hmap_node;
> +    uint32_t queue_id;
> +    struct rte_meter_trtcm_rfc4115_params meter_params;
> +    struct rte_meter_trtcm_rfc4115_profile meter_profile;
> +    struct rte_meter_trtcm_rfc4115 meter;
> +    struct netdev_queue_stats stats;
> +};
> +
> +static void
> +trtcm_policer_details_to_param(const struct smap *details,
> +                               struct rte_meter_trtcm_rfc4115_params *params)
> +{
> +    memset(params, 0, sizeof *params);
> +    params->cir = smap_get_ullong(details, "cir", 0);
> +    params->eir = smap_get_ullong(details, "eir", 0);
> +    params->cbs = smap_get_ullong(details, "cbs", 0);
> +    params->ebs = smap_get_ullong(details, "ebs", 0);
> +}
> +
> +static void
> +trtcm_policer_param_to_detail(
> +    const struct rte_meter_trtcm_rfc4115_params *params,
> +    struct smap *details)
> +{
> +    smap_add_format(details, "cir", "%"PRIu64, params->cir);
> +    smap_add_format(details, "eir", "%"PRIu64, params->eir);
> +    smap_add_format(details, "cbs", "%"PRIu64, params->cbs);
> +    smap_add_format(details, "ebs", "%"PRIu64, params->ebs);
> +}
> +
> +
> +static int
> +trtcm_policer_qos_construct(const struct smap *details,
> +                            struct qos_conf **conf)
> +{
> +    struct trtcm_policer *policer;
> +    int err = 0;
> +
> +    policer = xmalloc(sizeof *policer);
> +    qos_conf_init(&policer->qos_conf, &trtcm_policer_ops);
> +    trtcm_policer_details_to_param(details, &policer->meter_params);
> +    err = rte_meter_trtcm_rfc4115_profile_config(&policer->meter_profile,
> +                                                 &policer->meter_params);
> +    if (!err) {
> +        err = rte_meter_trtcm_rfc4115_config(&policer->meter,
> +                                             &policer->meter_profile);
> +    }

Minor, white space between the  if block below and above. Possibly add 
one before return below as well (jut in keeping with the existing egress 
policer style.

> +    if (!err) {
> +        *conf = &policer->qos_conf;
> +        memset(&policer->stats, 0, sizeof policer->stats);
> +        hmap_init(&policer->queues);
> +    } else {
> +        free(policer);
> +        *conf = NULL;
> +        err = -err;
> +    }
> +    return err;
> +}
> +
> +static void
> +trtcm_policer_qos_destruct(struct qos_conf *conf)
> +{
> +    struct trtcm_policer_queue *queue, *next_queue;
> +    struct trtcm_policer *policer = CONTAINER_OF(conf, struct trtcm_policer,
> +                                                 qos_conf);
> +
> +    HMAP_FOR_EACH_SAFE (queue, next_queue, hmap_node, &policer->queues) {
> +        hmap_remove(&policer->queues, &queue->hmap_node);
> +        free(queue);
> +    }
> +    hmap_destroy(&policer->queues);
> +    free(policer);
> +}
> +
> +static int
> +trtcm_policer_qos_get(const struct qos_conf *conf, struct smap *details)
> +{
> +    struct trtcm_policer *policer = CONTAINER_OF(conf, struct trtcm_policer,
> +                                                 qos_conf);
> +
> +    trtcm_policer_param_to_detail(&policer->meter_params, details);

Above is nice, especially where there are multiple parameters, with 
egress currently we directly access the parameters we need (only 2 of 
them) but I'm thinking a similar function to this could be implemented 
for it to keep the approach uniform.

Not a blocker, but in the future we could possibly add the parameters 
for a given qos_conf as part of conf, would allow us to refactor the qos 
code to use generic details_to_param/param_to_details operations 
(assuming the string representing param is the same in details of 
course). What are your thoughts?

> +    return 0;
> +}
> +
> +static bool
> +trtcm_policer_qos_is_equal(const struct qos_conf *conf,
> +                           const struct smap *details)
> +{
> +    struct trtcm_policer *policer = CONTAINER_OF(conf, struct trtcm_policer,
> +                                                 qos_conf);
> +    struct rte_meter_trtcm_rfc4115_params params;
> +
> +    trtcm_policer_details_to_param(details, &params);
> +
> +    return !memcmp(&params, &policer->meter_params, sizeof params);
> +}
> +
> +static struct trtcm_policer_queue *
> +trtcm_policer_qos_find_queue(struct trtcm_policer *policer, uint32_t queue_id)
> +{
> +    struct trtcm_policer_queue *queue;
> +    HMAP_FOR_EACH_WITH_HASH (queue, hmap_node, hash_2words(queue_id, 0),
> +                             &policer->queues) {
> +        if (queue->queue_id == queue_id) {
> +            return queue;
> +        }
> +    }
> +    return NULL;
> +}
> +
> +static inline bool
> +trtcm_policer_run_single_packet(struct trtcm_policer *policer,
> +                                struct rte_mbuf *pkt, uint64_t time)
> +{
> +    enum rte_color pkt_color;
> +    struct trtcm_policer_queue *queue;
> +    uint32_t pkt_len = rte_pktmbuf_pkt_len(pkt) - sizeof(struct rte_ether_hdr);
> +    struct dp_packet *dpkt = CONTAINER_OF(pkt, struct dp_packet, mbuf);
> +
> +    queue = trtcm_policer_qos_find_queue(policer, dpkt->md.skb_priority);
So I think we should flag the relationship to the users in documentation 
between the skb priority and the queue ID. Maybe I'm misunderstanding 
but they would have to understand this to know how traffic will map to 
which queue?

Also when testing his how are you setting the skb, via traffic gen or by 
OVS with a flow rule?

> +    if (!queue) {
> +        /* If no queue is found, use the default queue, which MUST exist */
Minor, period above.

> +        queue = trtcm_policer_qos_find_queue(policer, 0);
> +        if (!queue) {
> +            return false;
> +        }
> +    }
> +
> +    pkt_color = rte_meter_trtcm_rfc4115_color_blind_check(&queue->meter,
> +                                                          &queue->meter_profile,
> +                                                          time,
> +                                                          pkt_len);
> +
> +    if (pkt_color == RTE_COLOR_RED) {
> +        queue->stats.tx_errors++;
> +    } else {
> +        queue->stats.tx_bytes += pkt_len;
> +        queue->stats.tx_packets++;
> +    }
> +
> +    pkt_color = rte_meter_trtcm_rfc4115_color_aware_check(&policer->meter,
> +                                                     &policer->meter_profile,
> +                                                     time, pkt_len,
> +                                                     pkt_color);
> +
> +    if (pkt_color == RTE_COLOR_RED) {
> +        policer->stats.tx_errors++;
> +        return false;
> +    }
> +
> +    policer->stats.tx_bytes += pkt_len;
> +    policer->stats.tx_packets++;
> +    return true;
> +}
> +
> +static int
> +trtcm_policer_run(struct qos_conf *conf, struct rte_mbuf **pkts, int pkt_cnt,
> +                  bool should_steal)
> +{
> +    int i = 0;
> +    int cnt = 0;
> +    struct rte_mbuf *pkt = NULL;
> +    uint64_t current_time = rte_rdtsc();
> +
> +    struct trtcm_policer *policer = CONTAINER_OF(conf, struct trtcm_policer,
> +                                                 qos_conf);
> +
> +    for (i = 0; i < pkt_cnt; i++) {
> +        pkt = pkts[i];
> +
> +        if (trtcm_policer_run_single_packet(policer, pkt, current_time)) {
Just thinking, it might be a good idea to change the existing egress 
single packet run name (netdev_dpdk_policer_run) in light of this 
function being introduced as it may cause confusion, not a blocker here 
though.

> +            if (cnt != i) {
> +                pkts[cnt] = pkt;
> +            }
> +            cnt++;
> +        } else {
> +            if (should_steal) {
Hmmm, it's been a while since I last looked at this part of the code for 
the existing policer but I was surprised that we only drop the packet 
when should steal was enabled, I would have thought that for a give RED 
color packet it would be freed regardless. It probably clear later, 
maybe I'm missing something.

> +                rte_pktmbuf_free(pkt);
> +            }
> +        }
> +    }
> +    return cnt;
> +}
> +
> +static int
> +trtcm_policer_qos_queue_construct(const struct smap *details,
> +                                  uint32_t queue_id, struct qos_conf *conf)
> +{
> +    int err = 0;
> +    struct trtcm_policer_queue *queue;
> +    struct trtcm_policer *policer = CONTAINER_OF(conf, struct trtcm_policer,
> +                                                 qos_conf);
> +
> +    queue = trtcm_policer_qos_find_queue(policer, queue_id);
> +    if (!queue) {
> +        queue = xmalloc(sizeof *queue);
> +        queue->queue_id = queue_id;
> +        memset(&queue->stats, 0, sizeof queue->stats);
> +        queue->stats.created = time_msec();
> +        hmap_insert(&policer->queues, &queue->hmap_node,
> +                    hash_2words(queue_id, 0));
> +    }
> +    if (queue_id == 0 && smap_is_empty(details)) {
> +        /* No default queue configured, use port values */
So in this case it it that no parameters for a queue are being provided 
or that no queue is being created and a default queue is created instead?

If no queue/parameters for a queue are requested should the QoS in this 
case just fail to initialize and log to the user that queue arguments 
are required?

Just trying to get my head around the behavior of a packet in this case, 
as the same parameters are in use at both queue (color blind mode) and 
port (color aware mode), essentially is this queue redundant as the port 
will be color aware any how so would drop the packet?


Might be good to flag what the default behavior will be if queue params 
are not specified by a user in either vswitch.xml or qos docs above..
> +        memcpy(&queue->meter_params, &policer->meter_params,
> +               sizeof queue->meter_params);
> +    } else {
> +        trtcm_policer_details_to_param(details, &queue->meter_params);
> +    }
> +
> +    err = rte_meter_trtcm_rfc4115_profile_config(&queue->meter_profile,
> +                                                 &queue->meter_params);
> +
> +    if (!err) {
> +        err = rte_meter_trtcm_rfc4115_config(&queue->meter,
> +                                             &queue->meter_profile);
> +    }
> +    if (err) {
> +        hmap_remove(&policer->queues, &queue->hmap_node);
> +        free(queue);
> +        err = -err;
> +    }
> +    return err;
> +}
> +
> +static void
> +trtcm_policer_qos_queue_destruct(struct qos_conf *conf, uint32_t queue_id)
> +{
> +    struct trtcm_policer_queue *queue;
> +    struct trtcm_policer *policer = CONTAINER_OF(conf, struct trtcm_policer,
> +                                                 qos_conf);
> +
> +    queue = trtcm_policer_qos_find_queue(policer, queue_id);
> +    if (queue) {
> +        hmap_remove(&policer->queues, &queue->hmap_node);
> +        free(queue);
> +    }
> +}
> +
> +static int
> +trtcm_policer_qos_queue_get(struct smap *details, uint32_t queue_id,
> +                            const struct qos_conf *conf)
> +{
> +    struct trtcm_policer_queue *queue;
> +    struct trtcm_policer *policer = CONTAINER_OF(conf, struct trtcm_policer,
> +                                                 qos_conf);
> +
> +    queue = trtcm_policer_qos_find_queue(policer, queue_id);
> +    if (!queue) {
> +        return EINVAL;
> +    }
> +
> +    trtcm_policer_param_to_detail(&queue->meter_params, details);
> +    return 0;
> +}
> +
> +static int
> +trtcm_policer_qos_queue_get_stats(const struct qos_conf *conf,
> +                                  uint32_t queue_id,
> +                                  struct netdev_queue_stats *stats)
> +{
> +    struct trtcm_policer_queue *queue;
> +    struct trtcm_policer *policer = CONTAINER_OF(conf, struct trtcm_policer,
> +                                                 qos_conf);
> +
> +    queue = trtcm_policer_qos_find_queue(policer, queue_id);
> +    if (!queue) {
> +        return EINVAL;
> +    }
> +    memcpy(stats, &queue->stats, sizeof *stats);
> +    return 0;
> +}
> +
> +static int
> +trtcm_policer_qos_queue_dump_state_init(const struct qos_conf *conf,
> +                                        struct netdev_dpdk_queue_state *state)
> +{
> +    uint32_t i = 0;
> +    struct trtcm_policer_queue *queue;
> +    struct trtcm_policer *policer = CONTAINER_OF(conf, struct trtcm_policer,
> +                                                 qos_conf);
> +
> +    state->n_queues = hmap_count(&policer->queues);
> +    state->cur_queue = 0;

To confrim, is the assumption here thet you must always have a default 
queue0 and hence why you can set this here once? It will be used 
eslewhere to cycle through queue states I assume.

Thanks
Ian
> +    state->queues = xmalloc(state->n_queues * sizeof *state->queues);
> +
> +    HMAP_FOR_EACH (queue, hmap_node, &policer->queues) {
> +        state->queues[i++] = queue->queue_id;
> +    }
> +    return 0;
> +}
> +
> +static const struct dpdk_qos_ops trtcm_policer_ops = {
> +    .qos_name = "trtcm-policer",
> +    .qos_construct = trtcm_policer_qos_construct,
> +    .qos_destruct = trtcm_policer_qos_destruct,
> +    .qos_get = trtcm_policer_qos_get,
> +    .qos_is_equal = trtcm_policer_qos_is_equal,
> +    .qos_run = trtcm_policer_run,
> +    .qos_queue_construct = trtcm_policer_qos_queue_construct,
> +    .qos_queue_destruct = trtcm_policer_qos_queue_destruct,
> +    .qos_queue_get = trtcm_policer_qos_queue_get,
> +    .qos_queue_get_stats = trtcm_policer_qos_queue_get_stats,
> +    .qos_queue_dump_state_init = trtcm_policer_qos_queue_dump_state_init
> +};
> +
>   static int
>   netdev_dpdk_reconfigure(struct netdev *netdev)
>   {
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index 01304a5..3569e7c 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -4409,6 +4409,15 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
>             in performance will be noticed in terms of overall aggregate traffic
>             throughput.
>           </dd>
> +        <dt><code>trtcm-policer</code></dt>
> +        <dd>
> +          A DPDK egress policer algorithm using RFC 4115's Two-Rate,
> +          Three-Color marker. It's a two-level hierarchical policer
> +          which first does a color-blind marking of the traffic at the queue
> +          level, followed by a color-aware marking at the port level. At the
> +          end traffic marked as Green or Yellow is forwarded, Red is dropped.
> +          For details on how traffic is marked, see RFC 4115.
> +        </dd>
>         </dl>
>       </column>
>   
> @@ -4476,6 +4485,27 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
>           bytes/tokens of the packet. If there are not enough tokens in the cbs
>           bucket the packet will be dropped.
>         </column>
> +      <column name="other_config" key="eir" type='{"type": "integer"}'>
> +        The Excess Information Rate (EIR) is measured in bytes of IP
> +        packets per second, i.e. it includes the IP header, but not link
> +        specific (e.g. Ethernet) headers. This represents the bytes per second
> +        rate at which the token bucket will be updated. The eir value is
> +        calculated by (pps x packet data size).  For example assuming a user
> +        wishes to limit a stream consisting of 64 byte packets to 1 million
> +        packets per second the EIR would be set to to to 46000000. This value
> +        can be broken into '1,000,000 x 46'. Where 1,000,000 is the policing
> +        rate for the number of packets per second and 46 represents the size
> +        of the packet data for a 64 byte ip packet.
> +      </column>
> +      <column name="other_config" key="ebs" type='{"type": "integer"}'>
> +        The Excess Burst Size (EBS) is measured in bytes and represents a
> +        token bucket. At a minimum this value should be be set to the expected
> +        largest size packet in the traffic stream. In practice larger values
> +        may be used to increase the size of the token bucket. If a packet can
> +        be transmitted then the ebs will be decremented by the number of
> +        bytes/tokens of the packet. If there are not enough tokens in the cbs
> +        bucket the packet might be dropped.
> +      </column>
>       </group>
>   
>       <group title="Configuration for linux-sfq">
> 
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>   
>
Eelco Chaudron Jan. 13, 2020, 1:34 p.m. UTC | #2
On 7 Jan 2020, at 19:33, Stokes, Ian wrote:

> On 10/1/2019 3:10 PM, Eelco Chaudron wrote:
>> This patch adds a new policer to the DPDK datapath based on RFC 
>> 4115's
>> Two-Rate, Three-Color marker. It's a two-level hierarchical policer
>> which first does a color-blind marking of the traffic at the queue
>> level, followed by a color-aware marking at the port level. At the 
>> end
>> traffic marked as Green or Yellow is forwarded, Red is dropped. For
>> details on how traffic is marked, see RFC 4115.
>>
>> This egress policer can be used to limit traffic at different rated
>> based on the queues the traffic is in. In addition, it can also be 
>> used
>> to prioritize certain traffic over others at a port level.
>>
>> For example, the following configuration will limit the traffic rate 
>> at a
>> port level to a maximum of 2000 packets a second (64 bytes IPv4 
>> packets).
>> 100pps as CIR (Committed Information Rate) and 1000pps as EIR (Excess
>> Information Rate). High priority traffic is routed to queue 10, which 
>> marks
>> all traffic as CIR, i.e. Green. All low priority traffic, queue 20, 
>> is
>> marked as EIR, i.e. Yellow.
>>
>> ovs-vsctl --timeout=5 set port dpdk1 qos=@myqos -- \
>>    --id=@myqos create qos type=trtcm-policer \
>>    other-config:cir=52000 other-config:cbs=2048 \
>>    other-config:eir=52000 other-config:ebs=2048  \
>>    queues:10=@dpdk1Q10 queues:20=@dpdk1Q20 -- \
>>    --id=@dpdk1Q10 create queue \
>>      other-config:cir=41600000 other-config:cbs=2048 \
>>      other-config:eir=0 other-config:ebs=0 -- \
>>    --id=@dpdk1Q20 create queue \
>>      other-config:cir=0 other-config:cbs=0 \
>>      other-config:eir=41600000 other-config:ebs=2048 \
>>
>> This configuration accomplishes that the high priority traffic has a
>> guaranteed bandwidth egressing the ports at CIR (1000pps), but it can 
>> also
>> use the EIR, so a total of 2000pps at max. These additional 1000pps 
>> is
>> shared with the low priority traffic. The low priority traffic can 
>> use at
>> maximum 1000pps.
>>
>> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
>
> Thank for the patch Eelco,
>
> Overall loks good and follows the established QoS methodology.
>
> I'm still testing this code in more detail but wanted to clarify a few 
> points below in the meantime.
>
> Comments inline.
>
> We discussed this at the OVS conference but this causes compilation to 
> failure in OVS DPDK due to some of the trtcm API being experimental in 
> DPDK, so we'd either need to avoid that warning or else remove the 
> experimental tag for the API in 19.11.1 and hold off merging this 
> patch until then.

Thanks for the review, will prepare a new version of this patch and sent 
it out later after some more testing.

See comments inline below…

>> ---
>>   Documentation/topics/dpdk/qos.rst |   36 ++++
>>   lib/netdev-dpdk.c                 |  321 
>> +++++++++++++++++++++++++++++++++++++
>>   vswitchd/vswitch.xml              |   30 +++
>>   3 files changed, 386 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/topics/dpdk/qos.rst 
>> b/Documentation/topics/dpdk/qos.rst
>> index c0aec5d..b06b1c2 100644
>> --- a/Documentation/topics/dpdk/qos.rst
>> +++ b/Documentation/topics/dpdk/qos.rst
>> @@ -49,6 +49,42 @@ To clear the QoS configuration from the port and 
>> ovsdb, run::
>>        $ ovs-vsctl destroy QoS vhost-user0 -- clear Port vhost-user0 
>> qos
>>
>
> Below looks good, robust explanation, possibly worth separating the 
> single policer and Two rte policer using the 2nd heading style 
> ~~~~~~~~ as we now have 2 types of QoS.
>
> What do you think?

Ack, will add headers in next version

>> +In addition to the egress-policer OVS-DPDK also has support for a 
>> RFC
>> +4115's Two-Rate, Three-Color marker meter. It's a two-level 
>> hierarchical
>> +policer which first does a color-blind marking of the traffic at the 
>> queue
>> +level, followed by a color-aware marking at the port level. At the 
>> end
>> +traffic marked as Green or Yellow is forwarded, Red is dropped. For
>> +details on how traffic is marked, see RFC 4115.
>> +
>> +This egress policer can be used to limit traffic at different rated
>> +based on the queues the traffic is in. In addition, it can also be 
>> used
>> +to prioritize certain traffic over others at a port level.
>> +
>> +For example, the following configuration will limit the traffic rate 
>> at a
>> +port level to a maximum of 2000 packets a second (64 bytes IPv4 
>> packets).
>> +100pps as CIR (Committed Information Rate) and 1000pps as EIR 
>> (Excess
>> +Information Rate). High priority traffic is routed to queue 10, 
>> which marks
>> +all traffic as CIR, i.e. Green. All low priority traffic, queue 20, 
>> is
>> +marked as EIR, i.e. Yellow::
>> +
>> +    $ ovs-vsctl --timeout=5 set port dpdk1 qos=@myqos -- \
>> +        --id=@myqos create qos type=trtcm-policer \
>> +        other-config:cir=52000 other-config:cbs=2048 \
>> +        other-config:eir=52000 other-config:ebs=2048  \
>> +        queues:10=@dpdk1Q10 queues:20=@dpdk1Q20 -- \
>> +         --id=@dpdk1Q10 create queue \
>> +          other-config:cir=41600000 other-config:cbs=2048 \
>> +          other-config:eir=0 other-config:ebs=0 -- \
>> +         --id=@dpdk1Q20 create queue \
>> +           other-config:cir=0 other-config:cbs=0 \
>> +           other-config:eir=41600000 other-config:ebs=2048 \
>> +
>> +This configuration accomplishes that the high priority traffic has a
>> +guaranteed bandwidth egressing the ports at CIR (1000pps), but it 
>> can also
>> +use the EIR, so a total of 2000pps at max. These additional 1000pps 
>> is
>> +shared with the low priority traffic. The low priority traffic can 
>> use at
>> +maximum 1000pps.
>> +
>>   Refer to ``vswitch.xml`` for more details on egress policer.
>>    Rate Limiting (Ingress Policing)
>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>> index 072ce96..87f098f 100644
>> --- a/lib/netdev-dpdk.c
>> +++ b/lib/netdev-dpdk.c
>> @@ -309,13 +309,14 @@ struct dpdk_qos_ops {
>>    /* dpdk_qos_ops for each type of user space QoS implementation */
> Minor, missing period for comment.

Ack, will be in next version

>>   static const struct dpdk_qos_ops egress_policer_ops;
>> -
>> +static const struct dpdk_qos_ops trtcm_policer_ops;
>
> Minor, could use a white space to separate the qos ops from the ops 
> array comment below.

Ack, will be in next version

>>   /*
>>    * Array of dpdk_qos_ops, contains pointer to all supported QoS
>>    * operations.
>>    */
>>   static const struct dpdk_qos_ops *const qos_confs[] = {
>>       &egress_policer_ops,
>> +    &trtcm_policer_ops,
>>       NULL
>>   };
>>  @@ -4334,6 +4335,324 @@ static const struct dpdk_qos_ops 
>> egress_policer_ops = {
>>       .qos_run = egress_policer_run
>>   };
>>  +/* trtcm-policer details */
>> +
>> +struct trtcm_policer {
>> +    struct qos_conf qos_conf;
>> +    struct rte_meter_trtcm_rfc4115_params meter_params;
>> +    struct rte_meter_trtcm_rfc4115_profile meter_profile;
>> +    struct rte_meter_trtcm_rfc4115 meter;
>> +    struct netdev_queue_stats stats;
>> +    struct hmap queues;
>> +};
>> +
>> +struct trtcm_policer_queue {
>> +    struct hmap_node hmap_node;
>> +    uint32_t queue_id;
>> +    struct rte_meter_trtcm_rfc4115_params meter_params;
>> +    struct rte_meter_trtcm_rfc4115_profile meter_profile;
>> +    struct rte_meter_trtcm_rfc4115 meter;
>> +    struct netdev_queue_stats stats;
>> +};
>> +
>> +static void
>> +trtcm_policer_details_to_param(const struct smap *details,
>> +                               struct rte_meter_trtcm_rfc4115_params 
>> *params)
>> +{
>> +    memset(params, 0, sizeof *params);
>> +    params->cir = smap_get_ullong(details, "cir", 0);
>> +    params->eir = smap_get_ullong(details, "eir", 0);
>> +    params->cbs = smap_get_ullong(details, "cbs", 0);
>> +    params->ebs = smap_get_ullong(details, "ebs", 0);
>> +}
>> +
>> +static void
>> +trtcm_policer_param_to_detail(
>> +    const struct rte_meter_trtcm_rfc4115_params *params,
>> +    struct smap *details)
>> +{
>> +    smap_add_format(details, "cir", "%"PRIu64, params->cir);
>> +    smap_add_format(details, "eir", "%"PRIu64, params->eir);
>> +    smap_add_format(details, "cbs", "%"PRIu64, params->cbs);
>> +    smap_add_format(details, "ebs", "%"PRIu64, params->ebs);
>> +}
>> +
>> +
>> +static int
>> +trtcm_policer_qos_construct(const struct smap *details,
>> +                            struct qos_conf **conf)
>> +{
>> +    struct trtcm_policer *policer;
>> +    int err = 0;
>> +
>> +    policer = xmalloc(sizeof *policer);
>> +    qos_conf_init(&policer->qos_conf, &trtcm_policer_ops);
>> +    trtcm_policer_details_to_param(details, &policer->meter_params);
>> +    err = 
>> rte_meter_trtcm_rfc4115_profile_config(&policer->meter_profile,
>> +                                                 
>> &policer->meter_params);
>> +    if (!err) {
>> +        err = rte_meter_trtcm_rfc4115_config(&policer->meter,
>> +                                             
>> &policer->meter_profile);
>> +    }
>
> Minor, white space between the  if block below and above. Possibly add 
> one before return below as well (jut in keeping with the existing 
> egress policer style.

Ack, will be in next version

>> +    if (!err) {
>> +        *conf = &policer->qos_conf;
>> +        memset(&policer->stats, 0, sizeof policer->stats);
>> +        hmap_init(&policer->queues);
>> +    } else {
>> +        free(policer);
>> +        *conf = NULL;
>> +        err = -err;
>> +    }
>> +    return err;
>> +}
>> +
>> +static void
>> +trtcm_policer_qos_destruct(struct qos_conf *conf)
>> +{
>> +    struct trtcm_policer_queue *queue, *next_queue;
>> +    struct trtcm_policer *policer = CONTAINER_OF(conf, struct 
>> trtcm_policer,
>> +                                                 qos_conf);
>> +
>> +    HMAP_FOR_EACH_SAFE (queue, next_queue, hmap_node, 
>> &policer->queues) {
>> +        hmap_remove(&policer->queues, &queue->hmap_node);
>> +        free(queue);
>> +    }
>> +    hmap_destroy(&policer->queues);
>> +    free(policer);
>> +}
>> +
>> +static int
>> +trtcm_policer_qos_get(const struct qos_conf *conf, struct smap 
>> *details)
>> +{
>> +    struct trtcm_policer *policer = CONTAINER_OF(conf, struct 
>> trtcm_policer,
>> +                                                 qos_conf);
>> +
>> +    trtcm_policer_param_to_detail(&policer->meter_params, details);
>
> Above is nice, especially where there are multiple parameters, with 
> egress currently we directly access the parameters we need (only 2 of 
> them) but I'm thinking a similar function to this could be implemented 
> for it to keep the approach uniform.
>
> Not a blocker, but in the future we could possibly add the parameters 
> for a given qos_conf as part of conf, would allow us to refactor the 
> qos code to use generic details_to_param/param_to_details operations 
> (assuming the string representing param is the same in details of 
> course). What are your thoughts?

Yes this can be further cleanedup, will do it later, will keep it as is 
for now

>> +    return 0;
>> +}
>> +
>> +static bool
>> +trtcm_policer_qos_is_equal(const struct qos_conf *conf,
>> +                           const struct smap *details)
>> +{
>> +    struct trtcm_policer *policer = CONTAINER_OF(conf, struct 
>> trtcm_policer,
>> +                                                 qos_conf);
>> +    struct rte_meter_trtcm_rfc4115_params params;
>> +
>> +    trtcm_policer_details_to_param(details, &params);
>> +
>> +    return !memcmp(&params, &policer->meter_params, sizeof params);
>> +}
>> +
>> +static struct trtcm_policer_queue *
>> +trtcm_policer_qos_find_queue(struct trtcm_policer *policer, uint32_t 
>> queue_id)
>> +{
>> +    struct trtcm_policer_queue *queue;
>> +    HMAP_FOR_EACH_WITH_HASH (queue, hmap_node, hash_2words(queue_id, 
>> 0),
>> +                             &policer->queues) {
>> +        if (queue->queue_id == queue_id) {
>> +            return queue;
>> +        }
>> +    }
>> +    return NULL;
>> +}
>> +
>> +static inline bool
>> +trtcm_policer_run_single_packet(struct trtcm_policer *policer,
>> +                                struct rte_mbuf *pkt, uint64_t time)
>> +{
>> +    enum rte_color pkt_color;
>> +    struct trtcm_policer_queue *queue;
>> +    uint32_t pkt_len = rte_pktmbuf_pkt_len(pkt) - sizeof(struct 
>> rte_ether_hdr);
>> +    struct dp_packet *dpkt = CONTAINER_OF(pkt, struct dp_packet, 
>> mbuf);
>> +
>> +    queue = trtcm_policer_qos_find_queue(policer, 
>> dpkt->md.skb_priority);
> So I think we should flag the relationship to the users in 
> documentation between the skb priority and the queue ID. Maybe I'm 
> misunderstanding but they would have to understand this to know how 
> traffic will map to which queue?
>
> Also when testing his how are you setting the skb, via traffic gen or 
> by OVS with a flow rule?

This is just the internal field where the queue id is stored. This is 
the value is set with the OVS set_queue action, see general QoS faq 
http://docs.openvswitch.org/en/latest/faq/qos/

>> +    if (!queue) {
>> +        /* If no queue is found, use the default queue, which MUST 
>> exist */
> Minor, period above.

Ack, will be in next version

>> +        queue = trtcm_policer_qos_find_queue(policer, 0);
>> +        if (!queue) {
>> +            return false;
>> +        }
>> +    }
>> +
>> +    pkt_color = 
>> rte_meter_trtcm_rfc4115_color_blind_check(&queue->meter,
>> +                                                          
>> &queue->meter_profile,
>> +                                                          time,
>> +                                                          pkt_len);
>> +
>> +    if (pkt_color == RTE_COLOR_RED) {
>> +        queue->stats.tx_errors++;
>> +    } else {
>> +        queue->stats.tx_bytes += pkt_len;
>> +        queue->stats.tx_packets++;
>> +    }
>> +
>> +    pkt_color = 
>> rte_meter_trtcm_rfc4115_color_aware_check(&policer->meter,
>> +                                                     
>> &policer->meter_profile,
>> +                                                     time, pkt_len,
>> +                                                     pkt_color);
>> +
>> +    if (pkt_color == RTE_COLOR_RED) {
>> +        policer->stats.tx_errors++;
>> +        return false;
>> +    }
>> +
>> +    policer->stats.tx_bytes += pkt_len;
>> +    policer->stats.tx_packets++;
>> +    return true;
>> +}
>> +
>> +static int
>> +trtcm_policer_run(struct qos_conf *conf, struct rte_mbuf **pkts, int 
>> pkt_cnt,
>> +                  bool should_steal)
>> +{
>> +    int i = 0;
>> +    int cnt = 0;
>> +    struct rte_mbuf *pkt = NULL;
>> +    uint64_t current_time = rte_rdtsc();
>> +
>> +    struct trtcm_policer *policer = CONTAINER_OF(conf, struct 
>> trtcm_policer,
>> +                                                 qos_conf);
>> +
>> +    for (i = 0; i < pkt_cnt; i++) {
>> +        pkt = pkts[i];
>> +
>> +        if (trtcm_policer_run_single_packet(policer, pkt, 
>> current_time)) {
> Just thinking, it might be a good idea to change the existing egress 
> single packet run name (netdev_dpdk_policer_run) in light of this 
> function being introduced as it may cause confusion, not a blocker 
> here though.

Ack, will rename it to srtcm_policer_run_single_packet()

>> +            if (cnt != i) {
>> +                pkts[cnt] = pkt;
>> +            }
>> +            cnt++;
>> +        } else {
>> +            if (should_steal) {
> Hmmm, it's been a while since I last looked at this part of the code 
> for the existing policer but I was surprised that we only drop the 
> packet when should steal was enabled, I would have thought that for a 
> give RED color packet it would be freed regardless. It probably clear 
> later, maybe I'm missing something.

To be honest, I copied this code from your implementation, but looking 
at the reason it seems related to how the driver handles the packets not 
being sent related to the place where netdev_dpdk_qos_run() is called.

>> +                rte_pktmbuf_free(pkt);
>> +            }
>> +        }
>> +    }
>> +    return cnt;
>> +}
>> +
>> +static int
>> +trtcm_policer_qos_queue_construct(const struct smap *details,
>> +                                  uint32_t queue_id, struct qos_conf 
>> *conf)
>> +{
>> +    int err = 0;
>> +    struct trtcm_policer_queue *queue;
>> +    struct trtcm_policer *policer = CONTAINER_OF(conf, struct 
>> trtcm_policer,
>> +                                                 qos_conf);
>> +
>> +    queue = trtcm_policer_qos_find_queue(policer, queue_id);
>> +    if (!queue) {
>> +        queue = xmalloc(sizeof *queue);
>> +        queue->queue_id = queue_id;
>> +        memset(&queue->stats, 0, sizeof queue->stats);
>> +        queue->stats.created = time_msec();
>> +        hmap_insert(&policer->queues, &queue->hmap_node,
>> +                    hash_2words(queue_id, 0));
>> +    }
>> +    if (queue_id == 0 && smap_is_empty(details)) {
>> +        /* No default queue configured, use port values */
> So in this case it it that no parameters for a queue are being 
> provided or that no queue is being created and a default queue is 
> created instead?
>
> If no queue/parameters for a queue are requested should the QoS in 
> this case just fail to initialize and log to the user that queue 
> arguments are required?
>
> Just trying to get my head around the behavior of a packet in this 
> case, as the same parameters are in use at both queue (color blind 
> mode) and port (color aware mode), essentially is this queue redundant 
> as the port will be color aware any how so would drop the packet?
>
>
> Might be good to flag what the default behavior will be if queue 
> params are not specified by a user in either vswitch.xml or qos docs 
> above..

This is the same behavior as for the Linux data path, i.e. the default 
queue is always created. I will update the XML documentation to state 
that the default queue will be created if not configured explicitly.

>> +        memcpy(&queue->meter_params, &policer->meter_params,
>> +               sizeof queue->meter_params);
>> +    } else {
>> +        trtcm_policer_details_to_param(details, 
>> &queue->meter_params);
>> +    }
>> +
>> +    err = 
>> rte_meter_trtcm_rfc4115_profile_config(&queue->meter_profile,
>> +                                                 
>> &queue->meter_params);
>> +
>> +    if (!err) {
>> +        err = rte_meter_trtcm_rfc4115_config(&queue->meter,
>> +                                             &queue->meter_profile);
>> +    }
>> +    if (err) {
>> +        hmap_remove(&policer->queues, &queue->hmap_node);
>> +        free(queue);
>> +        err = -err;
>> +    }
>> +    return err;
>> +}
>> +
>> +static void
>> +trtcm_policer_qos_queue_destruct(struct qos_conf *conf, uint32_t 
>> queue_id)
>> +{
>> +    struct trtcm_policer_queue *queue;
>> +    struct trtcm_policer *policer = CONTAINER_OF(conf, struct 
>> trtcm_policer,
>> +                                                 qos_conf);
>> +
>> +    queue = trtcm_policer_qos_find_queue(policer, queue_id);
>> +    if (queue) {
>> +        hmap_remove(&policer->queues, &queue->hmap_node);
>> +        free(queue);
>> +    }
>> +}
>> +
>> +static int
>> +trtcm_policer_qos_queue_get(struct smap *details, uint32_t queue_id,
>> +                            const struct qos_conf *conf)
>> +{
>> +    struct trtcm_policer_queue *queue;
>> +    struct trtcm_policer *policer = CONTAINER_OF(conf, struct 
>> trtcm_policer,
>> +                                                 qos_conf);
>> +
>> +    queue = trtcm_policer_qos_find_queue(policer, queue_id);
>> +    if (!queue) {
>> +        return EINVAL;
>> +    }
>> +
>> +    trtcm_policer_param_to_detail(&queue->meter_params, details);
>> +    return 0;
>> +}
>> +
>> +static int
>> +trtcm_policer_qos_queue_get_stats(const struct qos_conf *conf,
>> +                                  uint32_t queue_id,
>> +                                  struct netdev_queue_stats *stats)
>> +{
>> +    struct trtcm_policer_queue *queue;
>> +    struct trtcm_policer *policer = CONTAINER_OF(conf, struct 
>> trtcm_policer,
>> +                                                 qos_conf);
>> +
>> +    queue = trtcm_policer_qos_find_queue(policer, queue_id);
>> +    if (!queue) {
>> +        return EINVAL;
>> +    }
>> +    memcpy(stats, &queue->stats, sizeof *stats);
>> +    return 0;
>> +}
>> +
>> +static int
>> +trtcm_policer_qos_queue_dump_state_init(const struct qos_conf *conf,
>> +                                        struct 
>> netdev_dpdk_queue_state *state)
>> +{
>> +    uint32_t i = 0;
>> +    struct trtcm_policer_queue *queue;
>> +    struct trtcm_policer *policer = CONTAINER_OF(conf, struct 
>> trtcm_policer,
>> +                                                 qos_conf);
>> +
>> +    state->n_queues = hmap_count(&policer->queues);
>> +    state->cur_queue = 0;
>
> To confrim, is the assumption here thet you must always have a default 
> queue0 and hence why you can set this here once? It will be used 
> eslewhere to cycle through queue states I assume.

Yes this is correct there MUST be a default queue, which is 
automatically created.

> Thanks
> Ian
>> +    state->queues = xmalloc(state->n_queues * sizeof 
>> *state->queues);
>> +
>> +    HMAP_FOR_EACH (queue, hmap_node, &policer->queues) {
>> +        state->queues[i++] = queue->queue_id;
>> +    }
>> +    return 0;
>> +}
>> +
>> +static const struct dpdk_qos_ops trtcm_policer_ops = {
>> +    .qos_name = "trtcm-policer",
>> +    .qos_construct = trtcm_policer_qos_construct,
>> +    .qos_destruct = trtcm_policer_qos_destruct,
>> +    .qos_get = trtcm_policer_qos_get,
>> +    .qos_is_equal = trtcm_policer_qos_is_equal,
>> +    .qos_run = trtcm_policer_run,
>> +    .qos_queue_construct = trtcm_policer_qos_queue_construct,
>> +    .qos_queue_destruct = trtcm_policer_qos_queue_destruct,
>> +    .qos_queue_get = trtcm_policer_qos_queue_get,
>> +    .qos_queue_get_stats = trtcm_policer_qos_queue_get_stats,
>> +    .qos_queue_dump_state_init = 
>> trtcm_policer_qos_queue_dump_state_init
>> +};
>> +
>>   static int
>>   netdev_dpdk_reconfigure(struct netdev *netdev)
>>   {
>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
>> index 01304a5..3569e7c 100644
>> --- a/vswitchd/vswitch.xml
>> +++ b/vswitchd/vswitch.xml
>> @@ -4409,6 +4409,15 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 
>> type=patch options:peer=p1 \
>>             in performance will be noticed in terms of overall 
>> aggregate traffic
>>             throughput.
>>           </dd>
>> +        <dt><code>trtcm-policer</code></dt>
>> +        <dd>
>> +          A DPDK egress policer algorithm using RFC 4115's Two-Rate,
>> +          Three-Color marker. It's a two-level hierarchical policer
>> +          which first does a color-blind marking of the traffic at 
>> the queue
>> +          level, followed by a color-aware marking at the port 
>> level. At the
>> +          end traffic marked as Green or Yellow is forwarded, Red is 
>> dropped.
>> +          For details on how traffic is marked, see RFC 4115.
>> +        </dd>
>>         </dl>
>>       </column>
>>  @@ -4476,6 +4485,27 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 
>> type=patch options:peer=p1 \
>>           bytes/tokens of the packet. If there are not enough tokens 
>> in the cbs
>>           bucket the packet will be dropped.
>>         </column>
>> +      <column name="other_config" key="eir" type='{"type": 
>> "integer"}'>
>> +        The Excess Information Rate (EIR) is measured in bytes of IP
>> +        packets per second, i.e. it includes the IP header, but not 
>> link
>> +        specific (e.g. Ethernet) headers. This represents the bytes 
>> per second
>> +        rate at which the token bucket will be updated. The eir 
>> value is
>> +        calculated by (pps x packet data size).  For example 
>> assuming a user
>> +        wishes to limit a stream consisting of 64 byte packets to 1 
>> million
>> +        packets per second the EIR would be set to to to 46000000. 
>> This value
>> +        can be broken into '1,000,000 x 46'. Where 1,000,000 is the 
>> policing
>> +        rate for the number of packets per second and 46 represents 
>> the size
>> +        of the packet data for a 64 byte ip packet.
>> +      </column>
>> +      <column name="other_config" key="ebs" type='{"type": 
>> "integer"}'>
>> +        The Excess Burst Size (EBS) is measured in bytes and 
>> represents a
>> +        token bucket. At a minimum this value should be be set to 
>> the expected
>> +        largest size packet in the traffic stream. In practice 
>> larger values
>> +        may be used to increase the size of the token bucket. If a 
>> packet can
>> +        be transmitted then the ebs will be decremented by the 
>> number of
>> +        bytes/tokens of the packet. If there are not enough tokens 
>> in the cbs
>> +        bucket the packet might be dropped.
>> +      </column>
>>       </group>
>>        <group title="Configuration for linux-sfq">
>>
>> _______________________________________________
>> dev mailing list
>> dev@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
diff mbox series

Patch

diff --git a/Documentation/topics/dpdk/qos.rst b/Documentation/topics/dpdk/qos.rst
index c0aec5d..b06b1c2 100644
--- a/Documentation/topics/dpdk/qos.rst
+++ b/Documentation/topics/dpdk/qos.rst
@@ -49,6 +49,42 @@  To clear the QoS configuration from the port and ovsdb, run::
 
     $ ovs-vsctl destroy QoS vhost-user0 -- clear Port vhost-user0 qos
 
+In addition to the egress-policer OVS-DPDK also has support for a RFC
+4115's Two-Rate, Three-Color marker meter. It's a two-level hierarchical
+policer which first does a color-blind marking of the traffic at the queue
+level, followed by a color-aware marking at the port level. At the end
+traffic marked as Green or Yellow is forwarded, Red is dropped. For
+details on how traffic is marked, see RFC 4115.
+
+This egress policer can be used to limit traffic at different rated
+based on the queues the traffic is in. In addition, it can also be used
+to prioritize certain traffic over others at a port level.
+
+For example, the following configuration will limit the traffic rate at a
+port level to a maximum of 2000 packets a second (64 bytes IPv4 packets).
+100pps as CIR (Committed Information Rate) and 1000pps as EIR (Excess
+Information Rate). High priority traffic is routed to queue 10, which marks
+all traffic as CIR, i.e. Green. All low priority traffic, queue 20, is
+marked as EIR, i.e. Yellow::
+
+    $ ovs-vsctl --timeout=5 set port dpdk1 qos=@myqos -- \
+        --id=@myqos create qos type=trtcm-policer \
+        other-config:cir=52000 other-config:cbs=2048 \
+        other-config:eir=52000 other-config:ebs=2048  \
+        queues:10=@dpdk1Q10 queues:20=@dpdk1Q20 -- \
+         --id=@dpdk1Q10 create queue \
+          other-config:cir=41600000 other-config:cbs=2048 \
+          other-config:eir=0 other-config:ebs=0 -- \
+         --id=@dpdk1Q20 create queue \
+           other-config:cir=0 other-config:cbs=0 \
+           other-config:eir=41600000 other-config:ebs=2048 \
+
+This configuration accomplishes that the high priority traffic has a
+guaranteed bandwidth egressing the ports at CIR (1000pps), but it can also
+use the EIR, so a total of 2000pps at max. These additional 1000pps is
+shared with the low priority traffic. The low priority traffic can use at
+maximum 1000pps.
+
 Refer to ``vswitch.xml`` for more details on egress policer.
 
 Rate Limiting (Ingress Policing)
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 072ce96..87f098f 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -309,13 +309,14 @@  struct dpdk_qos_ops {
 
 /* dpdk_qos_ops for each type of user space QoS implementation */
 static const struct dpdk_qos_ops egress_policer_ops;
-
+static const struct dpdk_qos_ops trtcm_policer_ops;
 /*
  * Array of dpdk_qos_ops, contains pointer to all supported QoS
  * operations.
  */
 static const struct dpdk_qos_ops *const qos_confs[] = {
     &egress_policer_ops,
+    &trtcm_policer_ops,
     NULL
 };
 
@@ -4334,6 +4335,324 @@  static const struct dpdk_qos_ops egress_policer_ops = {
     .qos_run = egress_policer_run
 };
 
+/* trtcm-policer details */
+
+struct trtcm_policer {
+    struct qos_conf qos_conf;
+    struct rte_meter_trtcm_rfc4115_params meter_params;
+    struct rte_meter_trtcm_rfc4115_profile meter_profile;
+    struct rte_meter_trtcm_rfc4115 meter;
+    struct netdev_queue_stats stats;
+    struct hmap queues;
+};
+
+struct trtcm_policer_queue {
+    struct hmap_node hmap_node;
+    uint32_t queue_id;
+    struct rte_meter_trtcm_rfc4115_params meter_params;
+    struct rte_meter_trtcm_rfc4115_profile meter_profile;
+    struct rte_meter_trtcm_rfc4115 meter;
+    struct netdev_queue_stats stats;
+};
+
+static void
+trtcm_policer_details_to_param(const struct smap *details,
+                               struct rte_meter_trtcm_rfc4115_params *params)
+{
+    memset(params, 0, sizeof *params);
+    params->cir = smap_get_ullong(details, "cir", 0);
+    params->eir = smap_get_ullong(details, "eir", 0);
+    params->cbs = smap_get_ullong(details, "cbs", 0);
+    params->ebs = smap_get_ullong(details, "ebs", 0);
+}
+
+static void
+trtcm_policer_param_to_detail(
+    const struct rte_meter_trtcm_rfc4115_params *params,
+    struct smap *details)
+{
+    smap_add_format(details, "cir", "%"PRIu64, params->cir);
+    smap_add_format(details, "eir", "%"PRIu64, params->eir);
+    smap_add_format(details, "cbs", "%"PRIu64, params->cbs);
+    smap_add_format(details, "ebs", "%"PRIu64, params->ebs);
+}
+
+
+static int
+trtcm_policer_qos_construct(const struct smap *details,
+                            struct qos_conf **conf)
+{
+    struct trtcm_policer *policer;
+    int err = 0;
+
+    policer = xmalloc(sizeof *policer);
+    qos_conf_init(&policer->qos_conf, &trtcm_policer_ops);
+    trtcm_policer_details_to_param(details, &policer->meter_params);
+    err = rte_meter_trtcm_rfc4115_profile_config(&policer->meter_profile,
+                                                 &policer->meter_params);
+    if (!err) {
+        err = rte_meter_trtcm_rfc4115_config(&policer->meter,
+                                             &policer->meter_profile);
+    }
+    if (!err) {
+        *conf = &policer->qos_conf;
+        memset(&policer->stats, 0, sizeof policer->stats);
+        hmap_init(&policer->queues);
+    } else {
+        free(policer);
+        *conf = NULL;
+        err = -err;
+    }
+    return err;
+}
+
+static void
+trtcm_policer_qos_destruct(struct qos_conf *conf)
+{
+    struct trtcm_policer_queue *queue, *next_queue;
+    struct trtcm_policer *policer = CONTAINER_OF(conf, struct trtcm_policer,
+                                                 qos_conf);
+
+    HMAP_FOR_EACH_SAFE (queue, next_queue, hmap_node, &policer->queues) {
+        hmap_remove(&policer->queues, &queue->hmap_node);
+        free(queue);
+    }
+    hmap_destroy(&policer->queues);
+    free(policer);
+}
+
+static int
+trtcm_policer_qos_get(const struct qos_conf *conf, struct smap *details)
+{
+    struct trtcm_policer *policer = CONTAINER_OF(conf, struct trtcm_policer,
+                                                 qos_conf);
+
+    trtcm_policer_param_to_detail(&policer->meter_params, details);
+    return 0;
+}
+
+static bool
+trtcm_policer_qos_is_equal(const struct qos_conf *conf,
+                           const struct smap *details)
+{
+    struct trtcm_policer *policer = CONTAINER_OF(conf, struct trtcm_policer,
+                                                 qos_conf);
+    struct rte_meter_trtcm_rfc4115_params params;
+
+    trtcm_policer_details_to_param(details, &params);
+
+    return !memcmp(&params, &policer->meter_params, sizeof params);
+}
+
+static struct trtcm_policer_queue *
+trtcm_policer_qos_find_queue(struct trtcm_policer *policer, uint32_t queue_id)
+{
+    struct trtcm_policer_queue *queue;
+    HMAP_FOR_EACH_WITH_HASH (queue, hmap_node, hash_2words(queue_id, 0),
+                             &policer->queues) {
+        if (queue->queue_id == queue_id) {
+            return queue;
+        }
+    }
+    return NULL;
+}
+
+static inline bool
+trtcm_policer_run_single_packet(struct trtcm_policer *policer,
+                                struct rte_mbuf *pkt, uint64_t time)
+{
+    enum rte_color pkt_color;
+    struct trtcm_policer_queue *queue;
+    uint32_t pkt_len = rte_pktmbuf_pkt_len(pkt) - sizeof(struct rte_ether_hdr);
+    struct dp_packet *dpkt = CONTAINER_OF(pkt, struct dp_packet, mbuf);
+
+    queue = trtcm_policer_qos_find_queue(policer, dpkt->md.skb_priority);
+    if (!queue) {
+        /* If no queue is found, use the default queue, which MUST exist */
+        queue = trtcm_policer_qos_find_queue(policer, 0);
+        if (!queue) {
+            return false;
+        }
+    }
+
+    pkt_color = rte_meter_trtcm_rfc4115_color_blind_check(&queue->meter,
+                                                          &queue->meter_profile,
+                                                          time,
+                                                          pkt_len);
+
+    if (pkt_color == RTE_COLOR_RED) {
+        queue->stats.tx_errors++;
+    } else {
+        queue->stats.tx_bytes += pkt_len;
+        queue->stats.tx_packets++;
+    }
+
+    pkt_color = rte_meter_trtcm_rfc4115_color_aware_check(&policer->meter,
+                                                     &policer->meter_profile,
+                                                     time, pkt_len,
+                                                     pkt_color);
+
+    if (pkt_color == RTE_COLOR_RED) {
+        policer->stats.tx_errors++;
+        return false;
+    }
+
+    policer->stats.tx_bytes += pkt_len;
+    policer->stats.tx_packets++;
+    return true;
+}
+
+static int
+trtcm_policer_run(struct qos_conf *conf, struct rte_mbuf **pkts, int pkt_cnt,
+                  bool should_steal)
+{
+    int i = 0;
+    int cnt = 0;
+    struct rte_mbuf *pkt = NULL;
+    uint64_t current_time = rte_rdtsc();
+
+    struct trtcm_policer *policer = CONTAINER_OF(conf, struct trtcm_policer,
+                                                 qos_conf);
+
+    for (i = 0; i < pkt_cnt; i++) {
+        pkt = pkts[i];
+
+        if (trtcm_policer_run_single_packet(policer, pkt, current_time)) {
+            if (cnt != i) {
+                pkts[cnt] = pkt;
+            }
+            cnt++;
+        } else {
+            if (should_steal) {
+                rte_pktmbuf_free(pkt);
+            }
+        }
+    }
+    return cnt;
+}
+
+static int
+trtcm_policer_qos_queue_construct(const struct smap *details,
+                                  uint32_t queue_id, struct qos_conf *conf)
+{
+    int err = 0;
+    struct trtcm_policer_queue *queue;
+    struct trtcm_policer *policer = CONTAINER_OF(conf, struct trtcm_policer,
+                                                 qos_conf);
+
+    queue = trtcm_policer_qos_find_queue(policer, queue_id);
+    if (!queue) {
+        queue = xmalloc(sizeof *queue);
+        queue->queue_id = queue_id;
+        memset(&queue->stats, 0, sizeof queue->stats);
+        queue->stats.created = time_msec();
+        hmap_insert(&policer->queues, &queue->hmap_node,
+                    hash_2words(queue_id, 0));
+    }
+    if (queue_id == 0 && smap_is_empty(details)) {
+        /* No default queue configured, use port values */
+        memcpy(&queue->meter_params, &policer->meter_params,
+               sizeof queue->meter_params);
+    } else {
+        trtcm_policer_details_to_param(details, &queue->meter_params);
+    }
+
+    err = rte_meter_trtcm_rfc4115_profile_config(&queue->meter_profile,
+                                                 &queue->meter_params);
+
+    if (!err) {
+        err = rte_meter_trtcm_rfc4115_config(&queue->meter,
+                                             &queue->meter_profile);
+    }
+    if (err) {
+        hmap_remove(&policer->queues, &queue->hmap_node);
+        free(queue);
+        err = -err;
+    }
+    return err;
+}
+
+static void
+trtcm_policer_qos_queue_destruct(struct qos_conf *conf, uint32_t queue_id)
+{
+    struct trtcm_policer_queue *queue;
+    struct trtcm_policer *policer = CONTAINER_OF(conf, struct trtcm_policer,
+                                                 qos_conf);
+
+    queue = trtcm_policer_qos_find_queue(policer, queue_id);
+    if (queue) {
+        hmap_remove(&policer->queues, &queue->hmap_node);
+        free(queue);
+    }
+}
+
+static int
+trtcm_policer_qos_queue_get(struct smap *details, uint32_t queue_id,
+                            const struct qos_conf *conf)
+{
+    struct trtcm_policer_queue *queue;
+    struct trtcm_policer *policer = CONTAINER_OF(conf, struct trtcm_policer,
+                                                 qos_conf);
+
+    queue = trtcm_policer_qos_find_queue(policer, queue_id);
+    if (!queue) {
+        return EINVAL;
+    }
+
+    trtcm_policer_param_to_detail(&queue->meter_params, details);
+    return 0;
+}
+
+static int
+trtcm_policer_qos_queue_get_stats(const struct qos_conf *conf,
+                                  uint32_t queue_id,
+                                  struct netdev_queue_stats *stats)
+{
+    struct trtcm_policer_queue *queue;
+    struct trtcm_policer *policer = CONTAINER_OF(conf, struct trtcm_policer,
+                                                 qos_conf);
+
+    queue = trtcm_policer_qos_find_queue(policer, queue_id);
+    if (!queue) {
+        return EINVAL;
+    }
+    memcpy(stats, &queue->stats, sizeof *stats);
+    return 0;
+}
+
+static int
+trtcm_policer_qos_queue_dump_state_init(const struct qos_conf *conf,
+                                        struct netdev_dpdk_queue_state *state)
+{
+    uint32_t i = 0;
+    struct trtcm_policer_queue *queue;
+    struct trtcm_policer *policer = CONTAINER_OF(conf, struct trtcm_policer,
+                                                 qos_conf);
+
+    state->n_queues = hmap_count(&policer->queues);
+    state->cur_queue = 0;
+    state->queues = xmalloc(state->n_queues * sizeof *state->queues);
+
+    HMAP_FOR_EACH (queue, hmap_node, &policer->queues) {
+        state->queues[i++] = queue->queue_id;
+    }
+    return 0;
+}
+
+static const struct dpdk_qos_ops trtcm_policer_ops = {
+    .qos_name = "trtcm-policer",
+    .qos_construct = trtcm_policer_qos_construct,
+    .qos_destruct = trtcm_policer_qos_destruct,
+    .qos_get = trtcm_policer_qos_get,
+    .qos_is_equal = trtcm_policer_qos_is_equal,
+    .qos_run = trtcm_policer_run,
+    .qos_queue_construct = trtcm_policer_qos_queue_construct,
+    .qos_queue_destruct = trtcm_policer_qos_queue_destruct,
+    .qos_queue_get = trtcm_policer_qos_queue_get,
+    .qos_queue_get_stats = trtcm_policer_qos_queue_get_stats,
+    .qos_queue_dump_state_init = trtcm_policer_qos_queue_dump_state_init
+};
+
 static int
 netdev_dpdk_reconfigure(struct netdev *netdev)
 {
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index 01304a5..3569e7c 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -4409,6 +4409,15 @@  ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
           in performance will be noticed in terms of overall aggregate traffic
           throughput.
         </dd>
+        <dt><code>trtcm-policer</code></dt>
+        <dd>
+          A DPDK egress policer algorithm using RFC 4115's Two-Rate,
+          Three-Color marker. It's a two-level hierarchical policer
+          which first does a color-blind marking of the traffic at the queue
+          level, followed by a color-aware marking at the port level. At the
+          end traffic marked as Green or Yellow is forwarded, Red is dropped.
+          For details on how traffic is marked, see RFC 4115.
+        </dd>
       </dl>
     </column>
 
@@ -4476,6 +4485,27 @@  ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
         bytes/tokens of the packet. If there are not enough tokens in the cbs
         bucket the packet will be dropped.
       </column>
+      <column name="other_config" key="eir" type='{"type": "integer"}'>
+        The Excess Information Rate (EIR) is measured in bytes of IP
+        packets per second, i.e. it includes the IP header, but not link
+        specific (e.g. Ethernet) headers. This represents the bytes per second
+        rate at which the token bucket will be updated. The eir value is
+        calculated by (pps x packet data size).  For example assuming a user
+        wishes to limit a stream consisting of 64 byte packets to 1 million
+        packets per second the EIR would be set to to to 46000000. This value
+        can be broken into '1,000,000 x 46'. Where 1,000,000 is the policing
+        rate for the number of packets per second and 46 represents the size
+        of the packet data for a 64 byte ip packet.
+      </column>
+      <column name="other_config" key="ebs" type='{"type": "integer"}'>
+        The Excess Burst Size (EBS) is measured in bytes and represents a
+        token bucket. At a minimum this value should be be set to the expected
+        largest size packet in the traffic stream. In practice larger values
+        may be used to increase the size of the token bucket. If a packet can
+        be transmitted then the ebs will be decremented by the number of
+        bytes/tokens of the packet. If there are not enough tokens in the cbs
+        bucket the packet might be dropped.
+      </column>
     </group>
 
     <group title="Configuration for linux-sfq">