diff mbox series

[ovs-dev,v1,1/2] netdev-dpdk: add netdev-dpdk egress pkts policer.

Message ID SY4PR01MB84388ADC0207878D8925992CCD8D9@SY4PR01MB8438.ausprd01.prod.outlook.com
State Changes Requested
Headers show
Series Add netdev-dpdk support for ingress and egress pkts policer. | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test fail github build: failed

Commit Message

miter April 2, 2023, 9:30 a.m. UTC
From: Lin Huang <linhuang@ruijie.com.cn>

add netdev-dpdk egress pkts policer.

Signed-off-by: Lin Huang <linhuang@ruijie.com.cn>
---
 lib/netdev-dpdk.c    | 118 +++++++++++++++++++++++++++++++++++++++++++
 vswitchd/vswitch.xml |  32 ++++++++++++
 2 files changed, 150 insertions(+)

Comments

Ilya Maximets April 4, 2023, 2:57 p.m. UTC | #1
On 4/2/23 11:30, miterv@outlook.com wrote:
> From: Lin Huang <linhuang@ruijie.com.cn>
> 
> add netdev-dpdk egress pkts policer.
> 
> Signed-off-by: Lin Huang <linhuang@ruijie.com.cn>
> ---

Hi.  Thanks for the patch!  I didn't test this, but I have a few
general comments.

First, please, add some of the relevant information that you posted
in reply to the cover letter into a commit message.  I.e. why the
feature is needed and how to configure it.

Also, since it is a user-visible change, we need an entry in the
NEWS file for it.

See some more comments inline.

Best regards, Ilya Maximets.

>  lib/netdev-dpdk.c    | 118 +++++++++++++++++++++++++++++++++++++++++++
>  vswitchd/vswitch.xml |  32 ++++++++++++
>  2 files changed, 150 insertions(+)
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index fb0dd43f7..6709c459c 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -19,6 +19,7 @@
>  
>  #include <errno.h>
>  #include <signal.h>
> +#include <stdint.h>
>  #include <stdlib.h>
>  #include <string.h>
>  #include <unistd.h>
> @@ -59,6 +60,7 @@
>  #include "openvswitch/ofp-parse.h"
>  #include "openvswitch/ofp-print.h"
>  #include "openvswitch/shash.h"
> +#include "openvswitch/token-bucket.h"
>  #include "openvswitch/vlog.h"
>  #include "ovs-numa.h"
>  #include "ovs-rcu.h"
> @@ -345,6 +347,7 @@ struct dpdk_qos_ops {
>  
>  /* dpdk_qos_ops for each type of user space QoS implementation. */
>  static const struct dpdk_qos_ops egress_policer_ops;
> +static const struct dpdk_qos_ops egress_pkts_policer_ops;
>  static const struct dpdk_qos_ops trtcm_policer_ops;
>  
>  /*
> @@ -353,6 +356,7 @@ static const struct dpdk_qos_ops trtcm_policer_ops;
>   */
>  static const struct dpdk_qos_ops *const qos_confs[] = {
>      &egress_policer_ops,
> +    &egress_pkts_policer_ops,
>      &trtcm_policer_ops,
>      NULL
>  };
> @@ -2335,6 +2339,29 @@ srtcm_policer_run_single_packet(struct rte_meter_srtcm *meter,
>      return cnt;
>  }
>  
> +static int
> +pkts_policer_run_single_packet(struct token_bucket *tb, struct rte_mbuf **pkts,
> +                               int pkt_cnt, bool should_steal)
> +{
> +    int cnt = 0;
> +    struct rte_mbuf *pkt;

I know that existing code is written this way, but it's better to keep
variable definitions in the reverse x-mass tree order, i.e. from the
longest line to the shortest.

> +
> +    for (int i = 0; i < pkt_cnt; i++) {
> +        pkt = pkts[i];
> +        /* Handle current packet */

Period at the end of a sentence.

> +        if (token_bucket_withdraw(tb, 1000)) {

Why a 1000 and not 1 ?

> +            if (cnt != i) {
> +                pkts[cnt] = pkt;
> +            }
> +            cnt++;
> +        } else if (should_steal) {
> +            rte_pktmbuf_free(pkt);
> +        }
> +    }

I wonder if it makes sense to have a per-packet granularity here.
I mean, what if we just call token_bucket_withdraw(tb, pkt_cnt)
and drop the whole batch if it doesn't fit?
It sounds unlikely that someone will care about extra 31 packets
being dropped in a worst case scenario.  We're already dropping
packets, it might be better to allow 1 batch in 32, instead of
1 packet from each of 32 batches (just an example).
The single token_bucket_withdraw() and the batched mbuf free will
be much more efficient.

What do you think?

> +
> +    return cnt;
> +}
> +
>  static int
>  ingress_policer_run(struct ingress_policer *policer, struct rte_mbuf **pkts,
>                      int pkt_cnt, bool should_steal)
> @@ -4858,6 +4885,97 @@ static const struct dpdk_qos_ops egress_policer_ops = {
>      .qos_run = egress_policer_run
>  };
>  
> +/* egress-pkts-policer details */

Period at the end.

> +

Empty line is unnecessary.

> +struct egress_pkts_policer {
> +    struct qos_conf qos_conf;
> +    struct token_bucket tb;
> +};
> +
> +static int
> +egress_pkts_policer_qos_construct(const struct smap *details,
> +                                  struct qos_conf **conf)
> +{
> +    uint32_t rate, burst;
> +    struct egress_pkts_policer *policer;
> +
> +    policer = xmalloc(sizeof *policer);
> +    rate  = smap_get_uint(details, "pkts_rate", 0);
> +    burst = smap_get_uint(details, "pkts_burst", 0);

We use kpkts as a vlue for ingress policing instead of pkts, should
we use the same here?

> +
> +    /*
> +     * Force to 0 if no rate specified,
> +     * default to rate if burst is 0,
> +     * else stick with user-specified value.
> +     */
> +    burst = (!rate ? 0 : !burst ? rate : burst);
> +
> +    qos_conf_init(&policer->qos_conf, &egress_pkts_policer_ops);
> +    token_bucket_init(&policer->tb, rate, burst * 1000);

Why * 1000 ?

> +
> +    *conf = &policer->qos_conf;
> +
> +    return 0;
> +}
> +
> +static void
> +egress_pkts_policer_qos_destruct(struct qos_conf *conf)
> +{
> +    struct egress_pkts_policer *policer =
> +                CONTAINER_OF(conf, struct egress_pkts_policer, qos_conf);
> +
> +    free(policer);
> +}
> +
> +static int
> +egress_pkts_policer_qos_get(const struct qos_conf *conf, struct smap *details)
> +{
> +    struct egress_pkts_policer *policer =
> +                CONTAINER_OF(conf, struct egress_pkts_policer, qos_conf);
> +
> +    smap_add_format(details, "pkts_rate", "%"PRIu32, policer->tb.rate);
> +    smap_add_format(details, "pkts_burst", "%"PRIu32, policer->tb.burst);
> +
> +    return 0;
> +}
> +
> +static bool
> +egress_pkts_policer_qos_is_equal(const struct qos_conf *conf,
> +                                 const struct smap *details)
> +{
> +    uint32_t rate, burst;
> +    struct egress_pkts_policer *policer =
> +        CONTAINER_OF(conf, struct egress_pkts_policer, qos_conf);

Reverse x-mass tree.

> +
> +    rate  = smap_get_uint(details, "pkts_rate", 0);
> +    burst = smap_get_uint(details, "pkts_burst", 0);
> +
> +    return (policer->tb.rate == rate && policer->tb.burst == burst);
> +}
> +
> +static int
> +egress_pkts_policer_run(struct qos_conf *conf, struct rte_mbuf **pkts,
> +                        int pkt_cnt, bool should_steal)
> +{
> +    int cnt = 0;
> +    struct egress_pkts_policer *policer =
> +                CONTAINER_OF(conf, struct egress_pkts_policer, qos_conf);

Reverse x-mass tree.

> +
> +    cnt = pkts_policer_run_single_packet(&policer->tb, pkts, pkt_cnt,
> +                                         should_steal);
> +
> +    return cnt;
> +}
> +
> +static const struct dpdk_qos_ops egress_pkts_policer_ops = {
> +    .qos_name = "egress-pkts-policer",
> +    .qos_construct = egress_pkts_policer_qos_construct,
> +    .qos_destruct = egress_pkts_policer_qos_destruct,
> +    .qos_get = egress_pkts_policer_qos_get,
> +    .qos_is_equal = egress_pkts_policer_qos_is_equal,
> +    .qos_run = egress_pkts_policer_run
> +};
> +
>  /* trtcm-policer details */
>  
>  struct trtcm_policer {
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index edb5eafa0..94bfe54eb 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -4791,6 +4791,19 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
>            in performance will be noticed in terms of overall aggregate traffic
>            throughput.
>          </dd>
> +        <dt><code>egress-pkts-policer</code></dt>
> +        <dd>
> +          A DPDK egress packet per second policer algorithm using the ovs
> +          token bucket library. The token bucket library provides an
> +          implementation which allows the policing of packets traffic. The
> +          implementation in OVS essentially creates a single token bucket used
> +          to police traffic.

I'd replace the text above with just:

"An egress packet per second policer for DPDK ports using
 a simple tocken bucket algorithm."

>            It should be noted that when the token bucket is
> +          configured as part of QoS there will be a performance overhead as the
> +          token bucket itself will consume CPU cycles in order to police
> +          traffic. These CPU cycles ordinarily are used for packet proccessing.
> +          As such the drop in performance will be noticed in terms of overall
> +          aggregate traffic throughput.
> +        </dd>
>          <dt><code>trtcm-policer</code></dt>
>          <dd>
>            A DPDK egress policer algorithm using RFC 4115's Two-Rate,
> @@ -4896,6 +4909,25 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
>        </column>
>      </group>
>  
> +    <group title="Configuration for egress-pkts-policer QoS">
> +      <p>
> +        <ref table="QoS"/> <ref table="QoS" column="type"/>
> +        <code>egress-pkts-policer</code> provides egress pkts policing for
> +        userspace port types with DPDK.
> +
> +        It has the following key-value pairs defined.
> +      </p>
> +
> +      <column name="other_config" key="pkts_rate" type='{"type": "integer"}'>
> +        The Packets Per Second (pps) represents the packet per second rate at
> +        which the token bucket will be updated.
> +      </column>
> +      <column name="other_config" key="pkts_burst" type='{"type": "integer"}'>
> +        The Packets Per Second Burst Size is measured in count and represents a
> +        token bucket.
> +      </column>

The available value range should be specified for these.

> +    </group>
> +
>      <group title="Configuration for linux-sfq">
>        <p>
>          The <code>linux-sfq</code> QoS supports the following key-value pairs:
miter April 4, 2023, 4:58 p.m. UTC | #2
Hi Ilya,

Thanks for looking at this patch and for your feedback.
There are some explanations about your comments.

> -----Original Message-----
> From: Ilya Maximets <i.maximets@ovn.org>
> Sent: Tuesday, April 4, 2023 10:58 PM
> To: miterv@outlook.com; dev@openvswitch.org
> Cc: i.maximets@ovn.org; Lin Huang <linhuang@ruijie.com.cn>
> Subject: Re: [PATCH v1 1/2] netdev-dpdk: add netdev-dpdk egress pkts policer.
> 
> On 4/2/23 11:30, miterv@outlook.com wrote:
> > From: Lin Huang <linhuang@ruijie.com.cn>
> >
> > add netdev-dpdk egress pkts policer.
> >
> > Signed-off-by: Lin Huang <linhuang@ruijie.com.cn>
> > ---
> 
> Hi.  Thanks for the patch!  I didn't test this, but I have a few
> general comments.
> 
> First, please, add some of the relevant information that you posted
> in reply to the cover letter into a commit message.  I.e. why the
> feature is needed and how to configure it.
> 
> Also, since it is a user-visible change, we need an entry in the
> NEWS file for it.

Yes, I will add more information about my work later.

> 
> See some more comments inline.
> 
> Best regards, Ilya Maximets.
> 
> >  lib/netdev-dpdk.c    | 118
> +++++++++++++++++++++++++++++++++++++++++++
> >  vswitchd/vswitch.xml |  32 ++++++++++++
> >  2 files changed, 150 insertions(+)
> >
> > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> > index fb0dd43f7..6709c459c 100644
> > --- a/lib/netdev-dpdk.c
> > +++ b/lib/netdev-dpdk.c
> > @@ -19,6 +19,7 @@
> >
> >  #include <errno.h>
> >  #include <signal.h>
> > +#include <stdint.h>
> >  #include <stdlib.h>
> >  #include <string.h>
> >  #include <unistd.h>
> > @@ -59,6 +60,7 @@
> >  #include "openvswitch/ofp-parse.h"
> >  #include "openvswitch/ofp-print.h"
> >  #include "openvswitch/shash.h"
> > +#include "openvswitch/token-bucket.h"
> >  #include "openvswitch/vlog.h"
> >  #include "ovs-numa.h"
> >  #include "ovs-rcu.h"
> > @@ -345,6 +347,7 @@ struct dpdk_qos_ops {
> >
> >  /* dpdk_qos_ops for each type of user space QoS implementation. */
> >  static const struct dpdk_qos_ops egress_policer_ops;
> > +static const struct dpdk_qos_ops egress_pkts_policer_ops;
> >  static const struct dpdk_qos_ops trtcm_policer_ops;
> >
> >  /*
> > @@ -353,6 +356,7 @@ static const struct dpdk_qos_ops trtcm_policer_ops;
> >   */
> >  static const struct dpdk_qos_ops *const qos_confs[] = {
> >      &egress_policer_ops,
> > +    &egress_pkts_policer_ops,
> >      &trtcm_policer_ops,
> >      NULL
> >  };
> > @@ -2335,6 +2339,29 @@ srtcm_policer_run_single_packet(struct
> rte_meter_srtcm *meter,
> >      return cnt;
> >  }
> >
> > +static int
> > +pkts_policer_run_single_packet(struct token_bucket *tb, struct rte_mbuf
> **pkts,
> > +                               int pkt_cnt, bool should_steal)
> > +{
> > +    int cnt = 0;
> > +    struct rte_mbuf *pkt;
> 
> I know that existing code is written this way, but it's better to keep
> variable definitions in the reverse x-mass tree order, i.e. from the
> longest line to the shortest.

Yes, I will modify it.

> 
> > +
> > +    for (int i = 0; i < pkt_cnt; i++) {
> > +        pkt = pkts[i];
> > +        /* Handle current packet */
> 
> Period at the end of a sentence.
> 
> > +        if (token_bucket_withdraw(tb, 1000)) {
> 
> Why a 1000 and not 1 ?

By default tokens is added per millisecond.
Rate in packets/second, bucket 1/1000 packets.
That's to say.
1 token means 1/1000 packet.
1 packet means 1000 tokens.

> 
> > +            if (cnt != i) {
> > +                pkts[cnt] = pkt;
> > +            }
> > +            cnt++;
> > +        } else if (should_steal) {
> > +            rte_pktmbuf_free(pkt);
> > +        }
> > +    }
> 
> I wonder if it makes sense to have a per-packet granularity here. 
> I mean, what if we just call token_bucket_withdraw(tb, pkt_cnt)
> and drop the whole batch if it doesn't fit?
> It sounds unlikely that someone will care about extra 31 packets
> being dropped in a worst case scenario.  We're already dropping
> packets, it might be better to allow 1 batch in 32, instead of
> 1 packet from each of 32 batches (just an example).
> The single token_bucket_withdraw() and the batched mbuf free will
> be much more efficient.
> 
> What do you think?
> 

This code is reference to srtcm_policer_run_single_packet which have a per-packet check.
I think batched work will be more efficient, but the granularity here will be less inaccurate.
from my point of view, using per-packet check is more reasonable.

> > +
> > +    return cnt;
> > +}
> > +
> >  static int
> >  ingress_policer_run(struct ingress_policer *policer, struct rte_mbuf **pkts,
> >                      int pkt_cnt, bool should_steal)
> > @@ -4858,6 +4885,97 @@ static const struct dpdk_qos_ops
> egress_policer_ops = {
> >      .qos_run = egress_policer_run
> >  };
> >
> > +/* egress-pkts-policer details */
> 
> Period at the end.
> 
> > +
> 
> Empty line is unnecessary.
> 
> > +struct egress_pkts_policer {
> > +    struct qos_conf qos_conf;
> > +    struct token_bucket tb;
> > +};
> > +
> > +static int
> > +egress_pkts_policer_qos_construct(const struct smap *details,
> > +                                  struct qos_conf **conf)
> > +{
> > +    uint32_t rate, burst;
> > +    struct egress_pkts_policer *policer;
> > +
> > +    policer = xmalloc(sizeof *policer);
> > +    rate  = smap_get_uint(details, "pkts_rate", 0);
> > +    burst = smap_get_uint(details, "pkts_burst", 0);
> 
> We use kpkts as a vlue for ingress policing instead of pkts, should
> we use the same here?

OK, I will convert the value of unit to kilo to keep both side the same.
That's to say.
Rate in kilo packets/second, bucket 1 packets.
1 token means 1 packet.
1 packet means 1 tokens.

> 
> > +
> > +    /*
> > +     * Force to 0 if no rate specified,
> > +     * default to rate if burst is 0,
> > +     * else stick with user-specified value.
> > +     */
> > +    burst = (!rate ? 0 : !burst ? rate : burst);
> > +
> > +    qos_conf_init(&policer->qos_conf, &egress_pkts_policer_ops);
> > +    token_bucket_init(&policer->tb, rate, burst * 1000);
> 
> Why * 1000 ?

By default tokens is added per millisecond.
Rate in packets/second, bucket 1/1000 packets.
So we need to scale the burst value by a factor of 1000.

> 
> > +
> > +    *conf = &policer->qos_conf;
> > +
> > +    return 0;
> > +}
> > +
> > +static void
> > +egress_pkts_policer_qos_destruct(struct qos_conf *conf)
> > +{
> > +    struct egress_pkts_policer *policer =
> > +                CONTAINER_OF(conf, struct egress_pkts_policer,
> qos_conf);
> > +
> > +    free(policer);
> > +}
> > +
> > +static int
> > +egress_pkts_policer_qos_get(const struct qos_conf *conf, struct smap
> *details)
> > +{
> > +    struct egress_pkts_policer *policer =
> > +                CONTAINER_OF(conf, struct egress_pkts_policer,
> qos_conf);
> > +
> > +    smap_add_format(details, "pkts_rate", "%"PRIu32, policer->tb.rate);
> > +    smap_add_format(details, "pkts_burst", "%"PRIu32, policer->tb.burst);
> > +
> > +    return 0;
> > +}
> > +
> > +static bool
> > +egress_pkts_policer_qos_is_equal(const struct qos_conf *conf,
> > +                                 const struct smap *details)
> > +{
> > +    uint32_t rate, burst;
> > +    struct egress_pkts_policer *policer =
> > +        CONTAINER_OF(conf, struct egress_pkts_policer, qos_conf);
> 
> Reverse x-mass tree.

Yes, I will modify it.

> 
> > +
> > +    rate  = smap_get_uint(details, "pkts_rate", 0);
> > +    burst = smap_get_uint(details, "pkts_burst", 0);
> > +
> > +    return (policer->tb.rate == rate && policer->tb.burst == burst);
> > +}
> > +
> > +static int
> > +egress_pkts_policer_run(struct qos_conf *conf, struct rte_mbuf **pkts,
> > +                        int pkt_cnt, bool should_steal)
> > +{
> > +    int cnt = 0;
> > +    struct egress_pkts_policer *policer =
> > +                CONTAINER_OF(conf, struct egress_pkts_policer,
> qos_conf);
> 
> Reverse x-mass tree.

Yes, I will modify it.

> 
> > +
> > +    cnt = pkts_policer_run_single_packet(&policer->tb, pkts, pkt_cnt,
> > +                                         should_steal);
> > +
> > +    return cnt;
> > +}
> > +
> > +static const struct dpdk_qos_ops egress_pkts_policer_ops = {
> > +    .qos_name = "egress-pkts-policer",
> > +    .qos_construct = egress_pkts_policer_qos_construct,
> > +    .qos_destruct = egress_pkts_policer_qos_destruct,
> > +    .qos_get = egress_pkts_policer_qos_get,
> > +    .qos_is_equal = egress_pkts_policer_qos_is_equal,
> > +    .qos_run = egress_pkts_policer_run
> > +};
> > +
> >  /* trtcm-policer details */
> >
> >  struct trtcm_policer {
> > diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> > index edb5eafa0..94bfe54eb 100644
> > --- a/vswitchd/vswitch.xml
> > +++ b/vswitchd/vswitch.xml
> > @@ -4791,6 +4791,19 @@ ovs-vsctl add-port br0 p0 -- set Interface p0
> type=patch options:peer=p1 \
> >            in performance will be noticed in terms of overall aggregate
> traffic
> >            throughput.
> >          </dd>
> > +        <dt><code>egress-pkts-policer</code></dt>
> > +        <dd>
> > +          A DPDK egress packet per second policer algorithm using the
> ovs
> > +          token bucket library. The token bucket library provides an
> > +          implementation which allows the policing of packets traffic. The
> > +          implementation in OVS essentially creates a single token bucket
> used
> > +          to police traffic.
> 
> I'd replace the text above with just:
> 
> "An egress packet per second policer for DPDK ports using
>  a simple tocken bucket algorithm."

Yes, I will modify it.

> >            It should be noted that when the token bucket is
> > +          configured as part of QoS there will be a performance overhead
> as the
> > +          token bucket itself will consume CPU cycles in order to police
> > +          traffic. These CPU cycles ordinarily are used for packet
> proccessing.
> > +          As such the drop in performance will be noticed in terms of
> overall
> > +          aggregate traffic throughput.
> > +        </dd>
> >          <dt><code>trtcm-policer</code></dt>
> >          <dd>
> >            A DPDK egress policer algorithm using RFC 4115's Two-Rate,
> > @@ -4896,6 +4909,25 @@ ovs-vsctl add-port br0 p0 -- set Interface p0
> type=patch options:peer=p1 \
> >        </column>
> >      </group>
> >
> > +    <group title="Configuration for egress-pkts-policer QoS">
> > +      <p>
> > +        <ref table="QoS"/> <ref table="QoS" column="type"/>
> > +        <code>egress-pkts-policer</code> provides egress pkts policing for
> > +        userspace port types with DPDK.
> > +
> > +        It has the following key-value pairs defined.
> > +      </p>
> > +
> > +      <column name="other_config" key="pkts_rate" type='{"type":
> "integer"}'>
> > +        The Packets Per Second (pps) represents the packet per second
> rate at
> > +        which the token bucket will be updated.
> > +      </column>
> > +      <column name="other_config" key="pkts_burst" type='{"type":
> "integer"}'>
> > +        The Packets Per Second Burst Size is measured in count and
> represents a
> > +        token bucket.
> > +      </column>
> 
> The available value range should be specified for these.
> 
> > +    </group>
> > +
> >      <group title="Configuration for linux-sfq">
> >        <p>
> >          The <code>linux-sfq</code> QoS supports the following key-value
> pairs:

Best regards, Huang Lin.
Ilya Maximets April 4, 2023, 5:22 p.m. UTC | #3
On 4/4/23 18:58, lin huang wrote:
> Hi Ilya,
> 
> Thanks for looking at this patch and for your feedback.
> There are some explanations about your comments.
> 
>> -----Original Message-----
>> From: Ilya Maximets <i.maximets@ovn.org>
>> Sent: Tuesday, April 4, 2023 10:58 PM
>> To: miterv@outlook.com; dev@openvswitch.org
>> Cc: i.maximets@ovn.org; Lin Huang <linhuang@ruijie.com.cn>
>> Subject: Re: [PATCH v1 1/2] netdev-dpdk: add netdev-dpdk egress pkts policer.
>>
>> On 4/2/23 11:30, miterv@outlook.com wrote:
>>> From: Lin Huang <linhuang@ruijie.com.cn>
>>>
>>> add netdev-dpdk egress pkts policer.
>>>
>>> Signed-off-by: Lin Huang <linhuang@ruijie.com.cn>
>>> ---
>>
>> Hi.  Thanks for the patch!  I didn't test this, but I have a few
>> general comments.
>>
>> First, please, add some of the relevant information that you posted
>> in reply to the cover letter into a commit message.  I.e. why the
>> feature is needed and how to configure it.
>>
>> Also, since it is a user-visible change, we need an entry in the
>> NEWS file for it.
> 
> Yes, I will add more information about my work later.
> 
>>
>> See some more comments inline.
>>
>> Best regards, Ilya Maximets.
>>
>>>  lib/netdev-dpdk.c    | 118
>> +++++++++++++++++++++++++++++++++++++++++++
>>>  vswitchd/vswitch.xml |  32 ++++++++++++
>>>  2 files changed, 150 insertions(+)
>>>
>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>>> index fb0dd43f7..6709c459c 100644
>>> --- a/lib/netdev-dpdk.c
>>> +++ b/lib/netdev-dpdk.c
>>> @@ -19,6 +19,7 @@
>>>
>>>  #include <errno.h>
>>>  #include <signal.h>
>>> +#include <stdint.h>
>>>  #include <stdlib.h>
>>>  #include <string.h>
>>>  #include <unistd.h>
>>> @@ -59,6 +60,7 @@
>>>  #include "openvswitch/ofp-parse.h"
>>>  #include "openvswitch/ofp-print.h"
>>>  #include "openvswitch/shash.h"
>>> +#include "openvswitch/token-bucket.h"
>>>  #include "openvswitch/vlog.h"
>>>  #include "ovs-numa.h"
>>>  #include "ovs-rcu.h"
>>> @@ -345,6 +347,7 @@ struct dpdk_qos_ops {
>>>
>>>  /* dpdk_qos_ops for each type of user space QoS implementation. */
>>>  static const struct dpdk_qos_ops egress_policer_ops;
>>> +static const struct dpdk_qos_ops egress_pkts_policer_ops;
>>>  static const struct dpdk_qos_ops trtcm_policer_ops;
>>>
>>>  /*
>>> @@ -353,6 +356,7 @@ static const struct dpdk_qos_ops trtcm_policer_ops;
>>>   */
>>>  static const struct dpdk_qos_ops *const qos_confs[] = {
>>>      &egress_policer_ops,
>>> +    &egress_pkts_policer_ops,
>>>      &trtcm_policer_ops,
>>>      NULL
>>>  };
>>> @@ -2335,6 +2339,29 @@ srtcm_policer_run_single_packet(struct
>> rte_meter_srtcm *meter,
>>>      return cnt;
>>>  }
>>>
>>> +static int
>>> +pkts_policer_run_single_packet(struct token_bucket *tb, struct rte_mbuf
>> **pkts,
>>> +                               int pkt_cnt, bool should_steal)
>>> +{
>>> +    int cnt = 0;
>>> +    struct rte_mbuf *pkt;
>>
>> I know that existing code is written this way, but it's better to keep
>> variable definitions in the reverse x-mass tree order, i.e. from the
>> longest line to the shortest.
> 
> Yes, I will modify it.
> 
>>
>>> +
>>> +    for (int i = 0; i < pkt_cnt; i++) {
>>> +        pkt = pkts[i];
>>> +        /* Handle current packet */
>>
>> Period at the end of a sentence.
>>
>>> +        if (token_bucket_withdraw(tb, 1000)) {
>>
>> Why a 1000 and not 1 ?
> 
> By default tokens is added per millisecond.
> Rate in packets/second, bucket 1/1000 packets.
> That's to say.
> 1 token means 1/1000 packet.
> 1 packet means 1000 tokens.

Oh, I see.  I didn't read the token-bucket implementation carefully
enough.  It might make sense to have a comment about this at the
point of initialization.  It might be less important if we're going
to use kilo- values, but still might be worth mentioning that
"kilo-packets per second" equals "packets per millisecond" that the
token bucket is using, to avoid potential confusion.

> 
>>
>>> +            if (cnt != i) {
>>> +                pkts[cnt] = pkt;
>>> +            }
>>> +            cnt++;
>>> +        } else if (should_steal) {
>>> +            rte_pktmbuf_free(pkt);
>>> +        }
>>> +    }
>>
>> I wonder if it makes sense to have a per-packet granularity here. 
>> I mean, what if we just call token_bucket_withdraw(tb, pkt_cnt)
>> and drop the whole batch if it doesn't fit?
>> It sounds unlikely that someone will care about extra 31 packets
>> being dropped in a worst case scenario.  We're already dropping
>> packets, it might be better to allow 1 batch in 32, instead of
>> 1 packet from each of 32 batches (just an example).
>> The single token_bucket_withdraw() and the batched mbuf free will
>> be much more efficient.
>>
>> What do you think?
>>
> 
> This code is reference to srtcm_policer_run_single_packet which have a per-packet check.
> I think batched work will be more efficient, but the granularity here will be less inaccurate.
> from my point of view, using per-packet check is more reasonable.

Sure.  I don't have a strong opinion here.  The only thing that
bothers me is that we may end up calling time_msec() per packet,
if the bucket is empty.  That might cost us some performance.

Best regards, Ilya Maximets.
diff mbox series

Patch

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index fb0dd43f7..6709c459c 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -19,6 +19,7 @@ 
 
 #include <errno.h>
 #include <signal.h>
+#include <stdint.h>
 #include <stdlib.h>
 #include <string.h>
 #include <unistd.h>
@@ -59,6 +60,7 @@ 
 #include "openvswitch/ofp-parse.h"
 #include "openvswitch/ofp-print.h"
 #include "openvswitch/shash.h"
+#include "openvswitch/token-bucket.h"
 #include "openvswitch/vlog.h"
 #include "ovs-numa.h"
 #include "ovs-rcu.h"
@@ -345,6 +347,7 @@  struct dpdk_qos_ops {
 
 /* dpdk_qos_ops for each type of user space QoS implementation. */
 static const struct dpdk_qos_ops egress_policer_ops;
+static const struct dpdk_qos_ops egress_pkts_policer_ops;
 static const struct dpdk_qos_ops trtcm_policer_ops;
 
 /*
@@ -353,6 +356,7 @@  static const struct dpdk_qos_ops trtcm_policer_ops;
  */
 static const struct dpdk_qos_ops *const qos_confs[] = {
     &egress_policer_ops,
+    &egress_pkts_policer_ops,
     &trtcm_policer_ops,
     NULL
 };
@@ -2335,6 +2339,29 @@  srtcm_policer_run_single_packet(struct rte_meter_srtcm *meter,
     return cnt;
 }
 
+static int
+pkts_policer_run_single_packet(struct token_bucket *tb, struct rte_mbuf **pkts,
+                               int pkt_cnt, bool should_steal)
+{
+    int cnt = 0;
+    struct rte_mbuf *pkt;
+
+    for (int i = 0; i < pkt_cnt; i++) {
+        pkt = pkts[i];
+        /* Handle current packet */
+        if (token_bucket_withdraw(tb, 1000)) {
+            if (cnt != i) {
+                pkts[cnt] = pkt;
+            }
+            cnt++;
+        } else if (should_steal) {
+            rte_pktmbuf_free(pkt);
+        }
+    }
+
+    return cnt;
+}
+
 static int
 ingress_policer_run(struct ingress_policer *policer, struct rte_mbuf **pkts,
                     int pkt_cnt, bool should_steal)
@@ -4858,6 +4885,97 @@  static const struct dpdk_qos_ops egress_policer_ops = {
     .qos_run = egress_policer_run
 };
 
+/* egress-pkts-policer details */
+
+struct egress_pkts_policer {
+    struct qos_conf qos_conf;
+    struct token_bucket tb;
+};
+
+static int
+egress_pkts_policer_qos_construct(const struct smap *details,
+                                  struct qos_conf **conf)
+{
+    uint32_t rate, burst;
+    struct egress_pkts_policer *policer;
+
+    policer = xmalloc(sizeof *policer);
+    rate  = smap_get_uint(details, "pkts_rate", 0);
+    burst = smap_get_uint(details, "pkts_burst", 0);
+
+    /*
+     * Force to 0 if no rate specified,
+     * default to rate if burst is 0,
+     * else stick with user-specified value.
+     */
+    burst = (!rate ? 0 : !burst ? rate : burst);
+
+    qos_conf_init(&policer->qos_conf, &egress_pkts_policer_ops);
+    token_bucket_init(&policer->tb, rate, burst * 1000);
+
+    *conf = &policer->qos_conf;
+
+    return 0;
+}
+
+static void
+egress_pkts_policer_qos_destruct(struct qos_conf *conf)
+{
+    struct egress_pkts_policer *policer =
+                CONTAINER_OF(conf, struct egress_pkts_policer, qos_conf);
+
+    free(policer);
+}
+
+static int
+egress_pkts_policer_qos_get(const struct qos_conf *conf, struct smap *details)
+{
+    struct egress_pkts_policer *policer =
+                CONTAINER_OF(conf, struct egress_pkts_policer, qos_conf);
+
+    smap_add_format(details, "pkts_rate", "%"PRIu32, policer->tb.rate);
+    smap_add_format(details, "pkts_burst", "%"PRIu32, policer->tb.burst);
+
+    return 0;
+}
+
+static bool
+egress_pkts_policer_qos_is_equal(const struct qos_conf *conf,
+                                 const struct smap *details)
+{
+    uint32_t rate, burst;
+    struct egress_pkts_policer *policer =
+        CONTAINER_OF(conf, struct egress_pkts_policer, qos_conf);
+
+    rate  = smap_get_uint(details, "pkts_rate", 0);
+    burst = smap_get_uint(details, "pkts_burst", 0);
+
+    return (policer->tb.rate == rate && policer->tb.burst == burst);
+}
+
+static int
+egress_pkts_policer_run(struct qos_conf *conf, struct rte_mbuf **pkts,
+                        int pkt_cnt, bool should_steal)
+{
+    int cnt = 0;
+    struct egress_pkts_policer *policer =
+                CONTAINER_OF(conf, struct egress_pkts_policer, qos_conf);
+
+    cnt = pkts_policer_run_single_packet(&policer->tb, pkts, pkt_cnt,
+                                         should_steal);
+
+    return cnt;
+}
+
+static const struct dpdk_qos_ops egress_pkts_policer_ops = {
+    .qos_name = "egress-pkts-policer",
+    .qos_construct = egress_pkts_policer_qos_construct,
+    .qos_destruct = egress_pkts_policer_qos_destruct,
+    .qos_get = egress_pkts_policer_qos_get,
+    .qos_is_equal = egress_pkts_policer_qos_is_equal,
+    .qos_run = egress_pkts_policer_run
+};
+
 /* trtcm-policer details */
 
 struct trtcm_policer {
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index edb5eafa0..94bfe54eb 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -4791,6 +4791,19 @@  ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
           in performance will be noticed in terms of overall aggregate traffic
           throughput.
         </dd>
+        <dt><code>egress-pkts-policer</code></dt>
+        <dd>
+          A DPDK egress packet per second policer algorithm using the ovs
+          token bucket library. The token bucket library provides an
+          implementation which allows the policing of packets traffic. The
+          implementation in OVS essentially creates a single token bucket used
+          to police traffic. It should be noted that when the token bucket is
+          configured as part of QoS there will be a performance overhead as the
+          token bucket itself will consume CPU cycles in order to police
+          traffic. These CPU cycles ordinarily are used for packet proccessing.
+          As such the drop in performance will be noticed in terms of overall
+          aggregate traffic throughput.
+        </dd>
         <dt><code>trtcm-policer</code></dt>
         <dd>
           A DPDK egress policer algorithm using RFC 4115's Two-Rate,
@@ -4896,6 +4909,25 @@  ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
       </column>
     </group>
 
+    <group title="Configuration for egress-pkts-policer QoS">
+      <p>
+        <ref table="QoS"/> <ref table="QoS" column="type"/>
+        <code>egress-pkts-policer</code> provides egress pkts policing for
+        userspace port types with DPDK.
+
+        It has the following key-value pairs defined.
+      </p>
+
+      <column name="other_config" key="pkts_rate" type='{"type": "integer"}'>
+        The Packets Per Second (pps) represents the packet per second rate at
+        which the token bucket will be updated.
+      </column>
+      <column name="other_config" key="pkts_burst" type='{"type": "integer"}'>
+        The Packets Per Second Burst Size is measured in count and represents a
+        token bucket.
+      </column>
+    </group>
+
     <group title="Configuration for linux-sfq">
       <p>
         The <code>linux-sfq</code> QoS supports the following key-value pairs: