[ovs-dev] Upcall/Slowpath rate limiter for OVS

Message ID D8F4A9E9-50FF-4DF7-81C2-2D6C7F62F6E7@ericsson.com
State New
Series
  • [ovs-dev] Upcall/Slowpath rate limiter for OVS

Commit Message

Manohar Krishnappa Chidambaraswamy Nov. 10, 2017, 1:34 p.m.
In OVS-DPDK, both the fast-path and the slow-path execute in the context of a
common thread (i.e., the PMD thread), without any partitioning of CPU cycles
between the two. When there is a burst of new flows coming into the data-path, packets
are punted to slow-path in the order they are received and the PMD is busy
for the duration of the upcall. Slow-path processing of a packet consumes
100-200 times the cycles of fast-path handling. As a result, the forwarding
performance of a PMD degrades significantly during an upcall burst. If the
PMD was highly loaded already, it becomes temporarily overloaded and its rx
queues start filling up. If the upcall burst is long enough, packets will be
dropped when rx queues are full. This happens even if the new flows are
unexpected and the slow-path decides to drop the packets.

It is likely that most of the packets dropped due to rx queue overflow belong
to established flows that should have been processed by the fast-path. Hence,
the current OVS-DPDK architecture favors the handling of new flows over the
forwarding of established flows. This is generally a sub-optimal approach.

Without a limit on the rate of upcalls, OVS-DPDK is vulnerable to DoS attacks.
Even sporadic bursts of, for example, unexpected multicast packets have been
shown to cause such packet drops.

Proposed solution:
------------------
This patch implements a mechanism to limit the rate of packets going into
the slow-path from the fast-path in OVS-DPDK mode. A simple token bucket
policer per packet processing thread (PMD or non-PMD) restricts the flow of
packets funneling from the fast-path into the slow-path, as shown below.
Each PMD (or non-PMD) thread thus has a limited number of cycles carved out
for its slow-path. The upcall policer admits only a configured number of
packets per second (pps) into handle_packet_upcall(), which marks the start
of the slow-path in OVS-DPDK. A packet entering the slow-path must withdraw
a token; if none is available, the packet is dropped and accounted per PMD
(or thread) under "ovs-appctl dpif-netdev/pmd-stats-show".

pmd thread numa_id 0 core_id 2:
        emc hits:0
        megaflow hits:0
        avg. subtable lookups per hit:0.00
        miss:287572
        rate limit drops:xxxxx <<<<<<<<<<<<<
        lost:0
        idle cycles:14925072116 (43.81%)
        processing cycles:19140112904 (56.19%)
        avg cycles per packet: 118457.93 (34065185020/287572)
        avg processing cycles per packet: 66557.64 (19140112904/287572)
NOTE: This is a sample output and may need adaptation based on new
changes/proposals (if any) in progress.

The upcall policer can be enabled and configured with the following
parameters in the "Open_vSwitch" table as new items under other_config.
These values are common for all packet processing threads (PMD or non-PMD),
with each thread using the same configured value independently.

1. ovs-vsctl set Open_vSwitch . other_config:upcall-rl=true
    - Global knob to enable/disable upcall ratelimiting for all non-PMD/PMD
      threads.
2. ovs-vsctl set Open_vSwitch . other_config:upcall-rate=xxx
    - xxx is in packets per second (pps). This determines the token bucket's
      fill rate.
3. ovs-vsctl set Open_vSwitch . other_config:upcall-burst=xxx
    - xxx is the maximum burst of packets allowed at any time. This determines
      the token bucket's burst size.

By default, this feature is disabled for backward compatibility and must be
explicitly enabled via the global knob upcall-rl shown above. When enabled,
default values (derived from typical slow-path and fast-path performance
measurements) are used during init and can be overridden with the knobs
above. Configured values for rate and burst take effect only when the
feature is enabled.

The patch is based on an existing token bucket implementation in OVS.
Signed-off-by: Manohar K C <manohar.krishnappa.chidambaraswamy@ericsson.com>
CC: Jan Scheurich jan.scheurich@ericsson.com
--
1.9.1

Comments

Manohar Krishnappa Chidambaraswamy Nov. 17, 2017, 5:47 a.m. | #1
Hi,

Does anyone have any comments on this add-on to upcalls?

Thanx
Manu

On 10/11/17, 7:06 PM, "ovs-dev-bounces@openvswitch.org on behalf of Manohar Krishnappa Chidambaraswamy" <ovs-dev-bounces@openvswitch.org on behalf of manohar.krishnappa.chidambaraswamy@ericsson.com> wrote:

    [...]

Patch

Proposed solution:
------------------
This patch implements a mechanism to limit the rate of packets going into
the slow-path from the fast-path in OVS-DPDK mode. A simple token bucket
policer per packet processing thread (PMD or non-PMD) restricts the flow of
packets funneling from the fast-path into the slow-path, as shown below.
Each PMD (or non-PMD) thread thus has a limited number of cycles carved out
for its slow-path. The upcall policer admits only a configured number of
packets per second (pps) into handle_packet_upcall(), which marks the start
of the slow-path in OVS-DPDK. A packet entering the slow-path must withdraw
a token; if none is available, the packet is dropped and accounted per PMD
(or thread) under "ovs-appctl dpif-netdev/pmd-stats-show".

pmd thread numa_id 0 core_id 2:
        emc hits:0
        megaflow hits:0
        avg. subtable lookups per hit:0.00
        miss:287572
        rate limit drops:xxxxx <<<<<<<<<<<<<
        lost:0
        idle cycles:14925072116 (43.81%)
        processing cycles:19140112904 (56.19%)
        avg cycles per packet: 118457.93 (34065185020/287572)
        avg processing cycles per packet: 66557.64 (19140112904/287572)
NOTE: This is a sample output and may need adaptation based on new
changes/proposals (if any) in progress.

The upcall policer can be enabled and configured with the following
parameters in the "Open_vSwitch" table as new items under other_config.
These values are common for all packet processing threads (PMD or non-PMD),
with each thread using the same configured value independently.

1. ovs-vsctl set Open_vSwitch . other_config:upcall-rl=true
    - Global knob to enable/disable upcall ratelimiting for all non-PMD/PMD
      threads.
2. ovs-vsctl set Open_vSwitch . other_config:upcall-rate=xxx
    - xxx is in packets per second (pps). This determines the token bucket's
      fill-rate.
3. ovs-vsctl set Open_vSwitch . other_config:upcall-burst=xxx
    - xxx is the maximum burst of packets allowed at any time. This determines
      the token bucket's burst size.

By default, this feature is disabled for backward compatibility and must be
explicitly enabled via the global knob upcall-rl shown above. When enabled,
default values (derived from typical slow-path and fast-path performance
measurements) are used during init and can be overridden with the knobs
above. Configured values for rate and burst take effect only when the
feature is enabled.

The patch is based on an existing token bucket implementation in OVS.
Signed-off-by: Manohar K C <manohar.krishnappa.chidambaraswamy@ericsson.com>
CC: Jan Scheurich jan.scheurich@ericsson.com
---
 Documentation/howto/dpdk.rst | 21 +++++++++++
 lib/dpif-netdev.c            | 85 +++++++++++++++++++++++++++++++++++++++++---
 vswitchd/vswitch.xml         | 47 ++++++++++++++++++++++++
 3 files changed, 148 insertions(+), 5 deletions(-)

diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
index d123819..2cd0209 100644
--- a/Documentation/howto/dpdk.rst
+++ b/Documentation/howto/dpdk.rst
@@ -709,3 +709,24 @@  devices to bridge ``br0``. Once complete, follow the below steps:
    Check traffic on multiple queues::

        $ cat /proc/interrupts | grep virtio
+
+Upcall rate limiting
+--------------------
+ovs-vsctl can be used to enable and configure upcall rate limit parameters.
+There are 2 configurable values ``upcall-rate`` and ``upcall-burst`` which
+take effect when global enable knob ``upcall-rl`` is set to true.
+
+Upcall rate should be set using ``upcall-rate`` in packets-per-sec. For
+example::
+
+    $ ovs-vsctl set Open_vSwitch . other_config:upcall-rate=2000
+
+Upcall burst should be set using ``upcall-burst`` in packets-per-sec. For
+example::
+
+    $ ovs-vsctl set Open_vSwitch . other_config:upcall-burst=2000
+
+Upcall ratelimit feature should be globally enabled using ``upcall-rl``. For
+example::
+
+    $ ovs-vsctl set Open_vSwitch . other_config:upcall-rl=true
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 599308d..b26fbbb 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -100,6 +100,16 @@  static struct shash dp_netdevs OVS_GUARDED_BY(dp_netdev_mutex)

 static struct vlog_rate_limit upcall_rl = VLOG_RATE_LIMIT_INIT(600, 600);

+/* Upcall rate-limit parameters */
+static bool upcall_ratelimit;
+static unsigned int upcall_rate;
+static unsigned int upcall_burst;
+
+/* TODO: Tune these default values based on upcall perf test */
+#define UPCALL_RATELIMIT_DEFAULT false /* Disabled by default */
+#define UPCALL_RATE_DEFAULT      1000  /* pps */
+#define UPCALL_BURST_DEFAULT     1000  /* pps */
+
 #define DP_NETDEV_CS_SUPPORTED_MASK (CS_NEW | CS_ESTABLISHED | CS_RELATED \
                                      | CS_INVALID | CS_REPLY_DIR | CS_TRACKED \
                                      | CS_SRC_NAT | CS_DST_NAT)
@@ -337,6 +347,7 @@  enum dp_stat_type {
     DP_STAT_LOST,               /* Packets not passed up to the client. */
     DP_STAT_LOOKUP_HIT,         /* Number of subtable lookups for flow table
                                    hits */
+    DP_STAT_RATELIMIT_DROP,     /* Packets dropped due to upcall policer */
     DP_N_STATS
 };

@@ -651,6 +662,11 @@  struct dp_netdev_pmd_thread {
         uint64_t cycles_zero[PMD_N_CYCLES];
         /* 8 pad bytes. */
     );
+
+    PADDED_MEMBERS(CACHE_LINE_SIZE,
+        /* Policer to rate limit slow-path */
+        struct token_bucket upcall_tb;
+    );
 };

 /* Interface to netdev-based datapath. */
@@ -864,12 +880,13 @@  pmd_info_show_stats(struct ds *reply,
     ds_put_format(reply,
                   "\temc hits:%llu\n\tmegaflow hits:%llu\n"
                   "\tavg. subtable lookups per hit:%.2f\n"
-                  "\tmiss:%llu\n\tlost:%llu\n",
+                  "\tmiss:%llu\n\tlost:%llu\n\tratelimit drops:%llu\n",
                   stats[DP_STAT_EXACT_HIT], stats[DP_STAT_MASKED_HIT],
                   stats[DP_STAT_MASKED_HIT] > 0
                   ? (1.0*stats[DP_STAT_LOOKUP_HIT])/stats[DP_STAT_MASKED_HIT]
                   : 0,
-                  stats[DP_STAT_MISS], stats[DP_STAT_LOST]);
+                  stats[DP_STAT_MISS], stats[DP_STAT_LOST],
+                  stats[DP_STAT_RATELIMIT_DROP]);

     if (total_cycles == 0) {
         return;
@@ -2996,6 +3013,8 @@  dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
         smap_get_ullong(other_config, "emc-insert-inv-prob",
                         DEFAULT_EM_FLOW_INSERT_INV_PROB);
     uint32_t insert_min, cur_min;
+    unsigned int rate, burst;
+    bool ratelimit;

     if (!nullable_string_is_equal(dp->pmd_cmask, cmask)) {
         free(dp->pmd_cmask);
@@ -3021,6 +3040,36 @@  dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
         }
     }

+    /* Handle upcall policer params */
+    ratelimit = smap_get_bool(other_config, "upcall-rl",
+                              UPCALL_RATELIMIT_DEFAULT);
+    rate = smap_get_int(other_config, "upcall-rate",
+                        UPCALL_RATE_DEFAULT);
+    burst = smap_get_int(other_config, "upcall-burst",
+                         UPCALL_BURST_DEFAULT);
+
+    if ((rate != upcall_rate) || (burst != upcall_burst)) {
+        VLOG_INFO("Upcall ratelimit params changed : Old - rate=%d burst=%d "
+                  ": New - rate=%d burst=%d\n", upcall_rate, upcall_burst,
+                  rate, burst);
+
+        upcall_rate = rate;
+        upcall_burst = burst;
+
+        /*
+         * TODO: See if there is a way to reconfig only the policer
+         * in each PMD.
+         */
+        dp_netdev_request_reconfigure(dp);
+    }
+
+    if (ratelimit != upcall_ratelimit) {
+        upcall_ratelimit = ratelimit;
+
+        VLOG_INFO("Upcall ratelimit changed to %s\n",
+                  (upcall_ratelimit ? "Enabled" : "Disabled"));
+    }
+
     return 0;
 }
@@ -3879,6 +3928,12 @@  dpif_netdev_run(struct dpif *dpif)
     ovs_mutex_lock(&dp->port_mutex);
     non_pmd = dp_netdev_get_pmd(dp, NON_PMD_CORE_ID);
     if (non_pmd) {
+        /* Reconfig the upcall policer if params have changed */
+        if ((upcall_rate != non_pmd->upcall_tb.rate) ||
+            (upcall_burst != non_pmd->upcall_tb.burst)) {
+            token_bucket_init(&non_pmd->upcall_tb, upcall_rate, upcall_burst);
+        }
+
         ovs_mutex_lock(&dp->non_pmd_mutex);
         cycles_count_start(non_pmd);
         HMAP_FOR_EACH (port, node, &dp->ports) {
@@ -4074,6 +4129,9 @@  reload:
         lc = UINT_MAX;
     }

+    /* Initialize upcall policer token bucket with configured params */
+    token_bucket_init(&pmd->upcall_tb, upcall_rate, upcall_burst);
+
     cycles_count_start(pmd);
     for (;;) {
         for (i = 0; i < poll_cnt; i++) {
@@ -4554,6 +4612,10 @@  dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp,
         emc_cache_init(&pmd->flow_cache);
         pmd_alloc_static_tx_qid(pmd);
     }
+
+    /* Initialize upcall policer token bucket with configured params */
+    token_bucket_init(&pmd->upcall_tb, upcall_rate, upcall_burst);
+
     cmap_insert(&dp->poll_threads, CONST_CAST(struct cmap_node *, &pmd->node),
                 hash_int(core_id, 0));
 }
@@ -4992,7 +5054,7 @@  handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
                      struct dp_packet *packet,
                      const struct netdev_flow_key *key,
                      struct ofpbuf *actions, struct ofpbuf *put_actions,
-                     int *lost_cnt, long long now)
+                     int *lost_cnt, int *rl_drop_cnt, long long now)
 {
     struct ofpbuf *add_actions;
     struct dp_packet_batch b;
@@ -5000,6 +5062,18 @@  handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
     ovs_u128 ufid;
     int error;

+    /*
+     * Grab a token from the upcall policer to enter slowpath. If token
+     * is not available, drop and account the packet. This is to
+     * rate-limit packets getting into slowpath.
+     */
+    if (upcall_ratelimit && !token_bucket_withdraw(&pmd->upcall_tb, 1)) {
+        dp_packet_delete(packet);
+        (*rl_drop_cnt)++;
+
+        return;
+    }
+
     match.tun_md.valid = false;
     miniflow_expand(&key->mf, &match.flow);

@@ -5074,7 +5148,7 @@  fast_path_processing(struct dp_netdev_pmd_thread *pmd,
     struct dpcls *cls;
     struct dpcls_rule *rules[PKT_ARRAY_SIZE];
     struct dp_netdev *dp = pmd->dp;
-    int miss_cnt = 0, lost_cnt = 0;
+    int miss_cnt = 0, lost_cnt = 0, rl_drop_cnt = 0;
     int lookup_cnt = 0, add_lookup_cnt;
     bool any_miss;
     size_t i;
@@ -5118,7 +5192,7 @@  fast_path_processing(struct dp_netdev_pmd_thread *pmd,

             miss_cnt++;
             handle_packet_upcall(pmd, packet, &keys[i], &actions,
-                                 &put_actions, &lost_cnt, now);
+                                 &put_actions, &lost_cnt, &rl_drop_cnt, now);
         }

         ofpbuf_uninit(&actions);
@@ -5151,6 +5225,7 @@  fast_path_processing(struct dp_netdev_pmd_thread *pmd,
     dp_netdev_count_packet(pmd, DP_STAT_LOOKUP_HIT, lookup_cnt);
     dp_netdev_count_packet(pmd, DP_STAT_MISS, miss_cnt);
     dp_netdev_count_packet(pmd, DP_STAT_LOST, lost_cnt);
+    dp_netdev_count_packet(pmd, DP_STAT_RATELIMIT_DROP, rl_drop_cnt);
 }

 /* Packets enter the datapath from a port (or from recirculation) here.
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index c145e1a..3d86367 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -397,6 +397,53 @@ 
         </p>
       </column>

+      <column name="other_config" key="upcall-rl"
+              type='{"type": "boolean"}'>
+        <p>
+          Set this value to <code>true</code> to enable upcall rate-limiting.
+          The upcall parameters like rate and burst will be ignored, if this is
+          not set.
+        </p>
+        <p>
+          The default value is <code>false</code> and upcall rate-limiting will
+          be disabled.
+        </p>
+      </column>
+
+      <column name="other_config" key="upcall-rate"
+        type='{"type": "integer", "minInteger": 0, "maxInteger": 4294967295}'>
+        <p>
+          Specifies the rate of upcalls in packets-per-second that is to be
+          allowed. For example, if the value is 10000, then those many upcalls
+          (for packets) are allowed per second in each of the packet polling
+          thread (PMD or non-PMD).
+        </p>
+        <p>
+          A value of <code>0</code> means, no upcalls would be allowed i.e,
+          upcall will be disabled. This is mainly for debugging.
+        </p>
+        <p>
+          The default value is 1000.
+        </p>
+      </column>
+
+      <column name="other_config" key="upcall-burst"
+        type='{"type": "integer", "minInteger": 0, "maxInteger": 4294967295}'>
+        <p>
+          Specifies the maximum burst of upcalls in packets-per-second that is
+          to be allowed. For example, if the value is 15000, then a maximum
+          burst of 15000 upcalls (for packets) are allowed per second in each
+          of the packet polling thread (PMD or non-PMD).
+        </p>
+        <p>
+          A value of <code>0</code> means, no upcalls would be allowed i.e,
+          upcall will be disabled. This is mainly for debugging.
+        </p>
+        <p>
+          The default value is 1000.
+        </p>
+      </column>
+
       <column name="other_config" key="vlan-limit"
               type='{"type": "integer", "minInteger": 0}'>
         <p>