[ovs-dev,v2,3/5] dpif-netdev: Skip EMC lookup/insert for recirc packets.

Message ID 1500480297-7530-3-git-send-email-antonio.fischetti@intel.com
State Superseded
Delegated to: Darrell Ball

Commit Message

Fischetti, Antonio July 19, 2017, 4:04 p.m. UTC
When OVS is configured as a firewall, with thousands of active
concurrent connections, the EMC gets quickly saturated and may come under
heavy thrashing, because original and recirculated packets keep
overwriting existing active EMC entries due to its limited size (8k entries).

This thrashing causes the EMC to be less efficient than the dpcls in
terms of lookups and insertions.

This patch uses the EMC more efficiently by allowing only 'original'
packets to hit the EMC once it is getting full; recirculated packets are then
sent directly to the classifier. An empirical threshold on EMC occupancy
(EMC_RECIRCT_NO_INSERT_THRESHOLD, i.e. 50%) triggers this logic. When EMC
occupancy exceeds EMC_RECIRCT_NO_INSERT_THRESHOLD:
 - EMC insertions are allowed only for original packets; both EMC insertion
   and lookup are skipped for recirculated packets.
 - Recirculated packets are sent to the classifier.

This patch is based on the patch
"dpif-netdev: add EMC entry count and %full figure to pmd-stats-show" at:
https://mail.openvswitch.org/pipermail/ovs-dev/2017-January/327570.html
Also, this patch depends on the previous one in this series.

Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com>
Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Co-authored-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
---
In our Connection Tracker testbench set up with

 table=0, priority=1 actions=drop
 table=0, priority=10,arp actions=NORMAL
 table=0, priority=100,ct_state=-trk,ip actions=ct(table=1)
 table=1, ct_state=+new+trk,ip,in_port=1 actions=ct(commit),output:2
 table=1, ct_state=+est+trk,ip,in_port=1 actions=output:2
 table=1, ct_state=+new+trk,ip,in_port=2 actions=drop
 table=1, ct_state=+est+trk,ip,in_port=2 actions=output:1

we saw the following performance improvement.

We measured the packet Rx rate (regardless of packet loss) in a
bidirectional test with 64B UDP packets.
Each row is a test with a different number of traffic streams. The traffic
generator is set up so that each stream establishes one UDP connection.
The Rx [Mpps] columns report the Rx rates on the two sides.

          +----------------------+-----------------------+
          |  Original OvS-DPDK   |    Previous case      |
          |  + patches #1,2      |    + this patch       |
 ---------+------------+---------+------------+----------+
  Traffic |     Rx     |   EMC   |     Rx     |   EMC    |
  Streams |   [Mpps]   | entries |   [Mpps]   | entries  |
 ---------+------------+---------+------------+----------+
      10  | 2.60, 2.67 |    20   | 2.60, 2.64 |    20    |
     100  | 2.53, 2.58 |   200   | 2.59, 2.61 |   201    |
   1,000  | 2.02, 2.03 |  1929   | 2.15, 2.15 |  1997    |
   2,000  | 1.94, 1.96 |  3661   | 1.97, 1.98 |  3668    |
   3,000  | 1.87, 1.90 |  5086   | 1.96, 1.98 |  4736    |
   4,000  | 1.82, 1.82 |  6173   | 1.95, 1.94 |  5280    |
  10,000  | 1.68, 1.69 |  7826   | 1.84, 1.84 |  7102    |
  30,000  | 1.57, 1.58 |  8192   | 1.68, 1.70 |  8192    |
 ---------+------------+---------+------------+----------+

This test setup implies one recirculation for each received packet.
We did not test this patch in a scenario with more than one
recirculation per packet.

 lib/dpif-netdev.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 58 insertions(+), 5 deletions(-)

Comments

Billy O'Mahony Aug. 1, 2017, 10:50 a.m. UTC | #1
Hi Antonio,

Unfortunately, I think the performance deltas here probably need to be re-evaluated, given the bug discovered and fixed in the EMC insertion algorithm (link below), which, according to the patch notes, will significantly reduce EMC contention for a given number of flows.

https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/336452.html

However, before you commit more effort, I would like to post to the list a proposal for a more generalized EMC load-shedding mechanism, which I think could be more effective because it would be more granular than shedding just recirculated traffic. I hope to post it today.

Regards,
/Billy

Fischetti, Antonio Aug. 2, 2017, 3:59 p.m. UTC | #2
> -----Original Message-----
> From: O Mahony, Billy
> Sent: Tuesday, August 1, 2017 11:51 AM
> To: Fischetti, Antonio <antonio.fischetti@intel.com>; dev@openvswitch.org
> Subject: RE: [ovs-dev] [PATCH v2 3/5] dpif-netdev: Skip EMC lookup/insert for
> recirc packets.
> 
> Hi Antonio,
> 
> Unfortunately I think the performance deltas of this here probably need to be
> re-worked given the bug discovered & fixed in EMC Insertion algorithm here
> which according to the patch notes will significantly reduce EMC contention for
> a given number of flows.
> 
> https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/336452.html

[Antonio] I think this patch and the one you mentioned are two different
approaches with two different goals, and they can work fine together.

  "Fix emc replacement policy" patch
  ----------------------------------
It improves the choice of which location to overwrite, so that the EMC
is used in a smarter way. The use case here is general EMC replacement
management, even with very few flows, i.e. 50 to 100 active flows.
When it has to choose between two active flows, it decides based on a
good random value.

  This patch
  ----------
This patch instead targets a 'congestion' use case, where the EMC is
already quite full and there are also recirculation(s). A typical example
is a firewall keeping track of tens of thousands of connections. An even
better example would be a scenario with 'more than 1' recirculation, as
Jan S. mentioned in one of the recent community calls.
It also defines a criterion to avoid lookups.

I think both patches can work together.


> 
> However, before you commit more effort I would like to post a proposal to the
> list on a more generalized EMC load-shedding mechanism which I think could be
> more effective as it would be more granular than shedding just re-circulated
> traffic. I hope to post that today.

[Antonio] I'll have a look.


> 
> Regards,
> /Billy
> 

Patch

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 9562827..79efce6 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -4573,6 +4573,9 @@  dp_netdev_queue_batches(struct dp_packet *pkt,
     packet_batch_per_flow_update(batch, pkt, mf);
 }
 
+/* Threshold to skip EMC for recirculated packets. */
+#define EMC_RECIRCT_NO_INSERT_THRESHOLD 0xFFFFF000
+
 /* Try to process all ('cnt') the 'packets' using only the exact match cache
  * 'pmd->flow_cache'. If a flow is not found for a packet 'packets[i]', the
  * miniflow is copied into 'keys' and the packet pointer is moved at the
@@ -4620,15 +4623,39 @@  emc_processing(struct dp_netdev_pmd_thread *pmd,
         miniflow_extract(packet, &key->mf);
         key->len = 0; /* Not computed yet. */
 
-        /* If EMC is disabled skip hash computation and emc_lookup */
+        /*
+         * EMC lookup is skipped when one or both of the following
+         * two cases occurs:
+         *
+         *   - EMC is disabled.  This is detected from cur_min.
+         *
+         *   - The EMC occupancy exceeds EMC_RECIRCT_NO_INSERT_THRESHOLD and
+         *     the packet to be classified is being recirculated.  When this
+         *     happens also EMC insertions are skipped for recirculated
+         *     packets.  So that EMC is used just to store entries which
+         *     are hit from the 'original' packets.  This way the EMC
+         *     thrashing is mitigated with a benefit on performance.
+         */
         if (OVS_LIKELY(cur_min)) {
             if (!md_is_valid) {
+                /* This is an original packet.  As it is not recirculated
+                 * we can retrieve the 5-tuple hash value without considering
+                 * the recirc id. */
                 key->hash = dpif_netdev_packet_get_rss_hash_orig_pkt(packet,
                         &key->mf);
+                flow = emc_lookup(flow_cache, key);
             } else {
-                key->hash = dpif_netdev_packet_get_rss_hash(packet, &key->mf);
+                /* Recirculated packet. */
+                if (flow_cache->n_entries & EMC_RECIRCT_NO_INSERT_THRESHOLD) {
+                    /* EMC occupancy is over the threshold.  We skip EMC
+                     * lookup for recirculated packets. */
+                    flow = NULL;
+                } else {
+                    key->hash = dpif_netdev_packet_get_rss_hash(packet,
+                            &key->mf);
+                    flow = emc_lookup(flow_cache, key);
+                }
             }
-            flow = emc_lookup(flow_cache, key);
         } else {
             flow = NULL;
         }
@@ -4716,7 +4743,20 @@  handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
                                              add_actions->size);
         }
         ovs_mutex_unlock(&pmd->flow_mutex);
-        emc_probabilistic_insert(pmd, key, netdev_flow);
+        /* EMC insertion can be skipped by a probabilistic criteria or
+         * - in case of recirculated packets - depending on the number of
+         * EMC entries. */
+        if (!packet->md.recirc_id) {
+            emc_probabilistic_insert(pmd, key, netdev_flow);
+        } else {
+            /* Recirculated packets.  When EMC occupancy goes over
+             * a threshold we avoid inserting new entries. */
+            if (!(pmd->flow_cache.n_entries &
+                    EMC_RECIRCT_NO_INSERT_THRESHOLD)) {
+                /* Still under the threshold. */
+                emc_probabilistic_insert(pmd, key, netdev_flow);
+            }
+        }
     }
 }
 
@@ -4809,7 +4849,20 @@  fast_path_processing(struct dp_netdev_pmd_thread *pmd,
 
         flow = dp_netdev_flow_cast(rules[i]);
 
-        emc_probabilistic_insert(pmd, &keys[i], flow);
+        /* EMC insertion can be skipped by a probabilistic criteria or
+         * - in case of recirculated packets - depending on the number of
+         * EMC entries. */
+        if (!packet->md.recirc_id) {
+            emc_probabilistic_insert(pmd, &keys[i], flow);
+        } else {
+            /* Recirculated packets.  When EMC occupancy goes over
+             * a threshold we avoid inserting new entries. */
+            if (!(pmd->flow_cache.n_entries &
+                    EMC_RECIRCT_NO_INSERT_THRESHOLD)) {
+                /* Still under the threshold. */
+                emc_probabilistic_insert(pmd, &keys[i], flow);
+            }
+        }
         dp_netdev_queue_batches(packet, flow, &keys[i].mf, batches, n_batches);
     }