Message ID | 1500480297-7530-3-git-send-email-antonio.fischetti@intel.com |
---|---|
State | Superseded |
Delegated to: | Darrell Ball |
Hi Antonio,

Unfortunately, I think the performance deltas for this patch probably need to be re-worked, given the bug discovered and fixed in the EMC insertion algorithm here, which according to the patch notes will significantly reduce EMC contention for a given number of flows:

https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/336452.html

However, before you commit more effort, I would like to post a proposal to the list on a more generalized EMC load-shedding mechanism, which I think could be more effective as it would be more granular than shedding just re-circulated traffic. I hope to post that today.

Regards,
/Billy

> -----Original Message-----
> From: ovs-dev-bounces@openvswitch.org [mailto:ovs-dev-bounces@openvswitch.org] On Behalf Of antonio.fischetti@intel.com
> Sent: Wednesday, July 19, 2017 5:05 PM
> To: dev@openvswitch.org
> Subject: [ovs-dev] [PATCH v2 3/5] dpif-netdev: Skip EMC lookup/insert for recirc packets.
>
> When OVS is configured as a firewall, with thousands of active concurrent
> connections, the EMC gets quickly saturated and may come under heavy
> thrashing because original and recirculated packets keep overwriting
> existing active EMC entries due to its limited size (8k).
>
> This thrashing causes the EMC to be less efficient than the dpcls in terms of
> lookups and insertions.
>
> This patch allows the EMC to be used efficiently by allowing only the
> 'original' packets to hit the EMC. All recirculated packets are sent to the
> classifier directly.
> An empirical threshold (EMC_RECIRCT_NO_INSERT_THRESHOLD - of 50%) for
> EMC occupancy is set to trigger this logic. By doing so, when EMC utilization
> exceeds EMC_RECIRCT_NO_INSERT_THRESHOLD:
>   - EMC insertions are allowed just for original packets. EMC insertion
>     and lookup are skipped for recirculated packets.
>   - Recirculated packets are sent to the classifier.
>
> This patch is based on patch
> "dpif-netdev: add EMC entry count and %full figure to pmd-stats-show" at:
> https://mail.openvswitch.org/pipermail/ovs-dev/2017-January/327570.html
> Also, this patch depends on the previous one in this series.
>
> Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com>
> Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
> Co-authored-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
> ---
> In our Connection Tracker testbench set up with
>
> table=0, priority=1 actions=drop
> table=0, priority=10,arp actions=NORMAL
> table=0, priority=100,ct_state=-trk,ip actions=ct(table=1)
> table=1, ct_state=+new+trk,ip,in_port=1 actions=ct(commit),output:2
> table=1, ct_state=+est+trk,ip,in_port=1 actions=output:2
> table=1, ct_state=+new+trk,ip,in_port=2 actions=drop
> table=1, ct_state=+est+trk,ip,in_port=2 actions=output:1
>
> we saw the following performance improvement.
>
> We measured packet Rx rate (regardless of packet loss). Bidirectional test
> with 64B UDP packets.
> Each row is a test with a different number of traffic streams. The traffic
> generator is set so that each stream establishes one UDP connection.
> Mpps columns report the Rx rates on the 2 sides.
>
>          +----------------------+-----------------------+
>          | Original OvS-DPDK    | Previous case         |
>          |    + patches #1,2    |    + this patch       |
> ---------+------------+---------+------------+----------+
>  Traffic |     Rx     |   EMC   |     Rx     |   EMC    |
>  Streams |   [Mpps]   | entries |   [Mpps]   | entries  |
> ---------+------------+---------+------------+----------+
>       10 | 2.60, 2.67 |      20 | 2.60, 2.64 |       20 |
>      100 | 2.53, 2.58 |     200 | 2.59, 2.61 |      201 |
>    1,000 | 2.02, 2.03 |    1929 | 2.15, 2.15 |     1997 |
>    2,000 | 1.94, 1.96 |    3661 | 1.97, 1.98 |     3668 |
>    3,000 | 1.87, 1.90 |    5086 | 1.96, 1.98 |     4736 |
>    4,000 | 1.82, 1.82 |    6173 | 1.95, 1.94 |     5280 |
>   10,000 | 1.68, 1.69 |    7826 | 1.84, 1.84 |     7102 |
>   30,000 | 1.57, 1.58 |    8192 | 1.68, 1.70 |     8192 |
> ---------+------------+---------+------------+----------+
>
> This test setup implies 1 recirculation on each received packet.
> We didn't check this patch in a test scenario where more than 1 recirculation
> is occurring per packet.
>
>  lib/dpif-netdev.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 58 insertions(+), 5 deletions(-)
>
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 9562827..79efce6 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -4573,6 +4573,9 @@ dp_netdev_queue_batches(struct dp_packet *pkt,
>      packet_batch_per_flow_update(batch, pkt, mf);
>  }
>
> +/* Threshold to skip EMC for recirculated packets. */
> +#define EMC_RECIRCT_NO_INSERT_THRESHOLD 0xFFFFF000
> +
>  /* Try to process all ('cnt') the 'packets' using only the exact match cache
>   * 'pmd->flow_cache'. If a flow is not found for a packet 'packets[i]', the
>   * miniflow is copied into 'keys' and the packet pointer is moved at the
> @@ -4620,15 +4623,39 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
>          miniflow_extract(packet, &key->mf);
>          key->len = 0; /* Not computed yet. */
>
> -        /* If EMC is disabled skip hash computation and emc_lookup */
> +        /*
> +         * EMC lookup is skipped when one or both of the following
> +         * two cases occurs:
> +         *
> +         *   - EMC is disabled. This is detected from cur_min.
> +         *
> +         *   - The EMC occupancy exceeds EMC_RECIRCT_NO_INSERT_THRESHOLD and
> +         *     the packet to be classified is being recirculated. When this
> +         *     happens also EMC insertions are skipped for recirculated
> +         *     packets. So that EMC is used just to store entries which
> +         *     are hit from the 'original' packets. This way the EMC
> +         *     thrashing is mitigated with a benefit on performance.
> +         */
>          if (OVS_LIKELY(cur_min)) {
>              if (!md_is_valid) {
> +                /* This is an original packet. As it is not recirculated
> +                 * we can retrieve the 5-tuple hash value without considering
> +                 * the recirc id. */
>                  key->hash = dpif_netdev_packet_get_rss_hash_orig_pkt(packet,
>                          &key->mf);
> +                flow = emc_lookup(flow_cache, key);
>              } else {
> -                key->hash = dpif_netdev_packet_get_rss_hash(packet, &key->mf);
> +                /* Recirculated packet. */
> +                if (flow_cache->n_entries & EMC_RECIRCT_NO_INSERT_THRESHOLD) {
> +                    /* EMC occupancy is over the threshold. We skip EMC
> +                     * lookup for recirculated packets. */
> +                    flow = NULL;
> +                } else {
> +                    key->hash = dpif_netdev_packet_get_rss_hash(packet,
> +                            &key->mf);
> +                    flow = emc_lookup(flow_cache, key);
> +                }
>              }
> -            flow = emc_lookup(flow_cache, key);
>          } else {
>              flow = NULL;
>          }
> @@ -4716,7 +4743,20 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
>                                               add_actions->size);
>          }
>          ovs_mutex_unlock(&pmd->flow_mutex);
> -        emc_probabilistic_insert(pmd, key, netdev_flow);
> +        /* EMC insertion can be skipped by a probabilistic criteria or
> +         * - in case of recirculated packets - depending on the number of
> +         * EMC entries. */
> +        if (!packet->md.recirc_id) {
> +            emc_probabilistic_insert(pmd, key, netdev_flow);
> +        } else {
> +            /* Recirculated packets. When EMC occupancy goes over
> +             * a threshold we avoid inserting new entries. */
> +            if (!(pmd->flow_cache.n_entries &
> +                  EMC_RECIRCT_NO_INSERT_THRESHOLD)) {
> +                /* Still under the threshold. */
> +                emc_probabilistic_insert(pmd, key, netdev_flow);
> +            }
> +        }
>      }
>  }
>
> @@ -4809,7 +4849,20 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
>
>          flow = dp_netdev_flow_cast(rules[i]);
>
> -        emc_probabilistic_insert(pmd, &keys[i], flow);
> +        /* EMC insertion can be skipped by a probabilistic criteria or
> +         * - in case of recirculated packets - depending on the number of
> +         * EMC entries. */
> +        if (!packet->md.recirc_id) {
> +            emc_probabilistic_insert(pmd, &keys[i], flow);
> +        } else {
> +            /* Recirculated packets. When EMC occupancy goes over
> +             * a threshold we avoid inserting new entries. */
> +            if (!(pmd->flow_cache.n_entries &
> +                  EMC_RECIRCT_NO_INSERT_THRESHOLD)) {
> +                /* Still under the threshold. */
> +                emc_probabilistic_insert(pmd, &keys[i], flow);
> +            }
> +        }
>          dp_netdev_queue_batches(packet, flow, &keys[i].mf, batches, n_batches);
>      }
>
> --
> 2.4.11
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
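
Since the discussion is about the performance deltas in the quoted table, it may help to see how a per-row relative change is worked out. The following is only a sketch and not part of the patch: the helper name `relative_gain_percent` is hypothetical, and it is meant to be fed the Rx figures from the table (for example 1.68 and 1.84 Mpps from the 10,000-stream row).

```c
/* Hypothetical helper (not from the patch): percentage change of the
 * Rx rate when going from 'before' to 'after', both in Mpps. */
double
relative_gain_percent(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

Calling it with the first Rx value of the 10,000-stream row, `relative_gain_percent(1.68, 1.84)`, gives a gain of roughly 9.5%. Any such figure only describes this single-recirculation test setup.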
> -----Original Message-----
> From: O Mahony, Billy
> Sent: Tuesday, August 1, 2017 11:51 AM
> To: Fischetti, Antonio <antonio.fischetti@intel.com>; dev@openvswitch.org
> Subject: RE: [ovs-dev] [PATCH v2 3/5] dpif-netdev: Skip EMC lookup/insert for recirc packets.
>
> Hi Antonio,
>
> Unfortunately, I think the performance deltas for this patch probably need to be
> re-worked, given the bug discovered and fixed in the EMC insertion algorithm here,
> which according to the patch notes will significantly reduce EMC contention for
> a given number of flows:
>
> https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/336452.html

[Antonio]
I think this patch and the one you mentioned are 2 different approaches with 2 different goals that can work fine together.

"Fix emc replacement policy" patch
----------------------------------
It allows selecting - better than now - which location to overwrite, so that the EMC is used in a smarter way. The use case here is general EMC replacement management, even with very few flows, i.e. 50 - 100 active flows. When it has to choose between 2 active flows, it decides with a criterion based on a good random value.

This patch
----------
This patch instead targets a 'congestion' use case where the EMC is already quite full and there are also recirculation(s). A typical example is a firewall keeping track of tens of thousands of connections. A better example would be a scenario - as Jan S. mentioned in one of the last Community calls - with 'more than 1' recirculation. It also defines a criterion to avoid lookups.

I think both patches can work together.

> However, before you commit more effort, I would like to post a proposal to the
> list on a more generalized EMC load-shedding mechanism, which I think could be
> more effective as it would be more granular than shedding just re-circulated
> traffic. I hope to post that today.

[Antonio]
I'll have a look.
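
The criterion for avoiding lookups and insertions described in the reply above can be condensed into a single predicate. This is a minimal sketch under stated assumptions, not the patch itself: `emc_allowed` is a hypothetical name, `emc_enabled` stands in for the `cur_min` check, `recirculated` stands in for `md_is_valid` or a non-zero `recirc_id`, and the `uint32_t` type of `n_entries` is assumed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same value as in the patch. */
#define EMC_RECIRCT_NO_INSERT_THRESHOLD 0xFFFFF000

/* Hypothetical helper: returns true when the EMC may be used (lookup or
 * insertion) for a packet.  The parameter types are assumptions. */
bool
emc_allowed(bool emc_enabled, bool recirculated, uint32_t n_entries)
{
    if (!emc_enabled) {
        return false;           /* EMC disabled: always skip it. */
    }
    if (!recirculated) {
        return true;            /* Original packets always use the EMC. */
    }
    /* Recirculated packets use the EMC only while occupancy is below
     * the threshold. */
    return !(n_entries & EMC_RECIRCT_NO_INSERT_THRESHOLD);
}
```

In the patch the same test is spelled out separately at the lookup site and at both insertion sites; folding it into one helper is just one possible way to keep the three call sites consistent.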
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 9562827..79efce6 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -4573,6 +4573,9 @@ dp_netdev_queue_batches(struct dp_packet *pkt,
     packet_batch_per_flow_update(batch, pkt, mf);
 }

+/* Threshold to skip EMC for recirculated packets. */
+#define EMC_RECIRCT_NO_INSERT_THRESHOLD 0xFFFFF000
+
 /* Try to process all ('cnt') the 'packets' using only the exact match cache
  * 'pmd->flow_cache'. If a flow is not found for a packet 'packets[i]', the
  * miniflow is copied into 'keys' and the packet pointer is moved at the
@@ -4620,15 +4623,39 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
         miniflow_extract(packet, &key->mf);
         key->len = 0; /* Not computed yet. */

-        /* If EMC is disabled skip hash computation and emc_lookup */
+        /*
+         * EMC lookup is skipped when one or both of the following
+         * two cases occurs:
+         *
+         *   - EMC is disabled. This is detected from cur_min.
+         *
+         *   - The EMC occupancy exceeds EMC_RECIRCT_NO_INSERT_THRESHOLD and
+         *     the packet to be classified is being recirculated. When this
+         *     happens also EMC insertions are skipped for recirculated
+         *     packets. So that EMC is used just to store entries which
+         *     are hit from the 'original' packets. This way the EMC
+         *     thrashing is mitigated with a benefit on performance.
+         */
         if (OVS_LIKELY(cur_min)) {
             if (!md_is_valid) {
+                /* This is an original packet. As it is not recirculated
+                 * we can retrieve the 5-tuple hash value without considering
+                 * the recirc id. */
                 key->hash = dpif_netdev_packet_get_rss_hash_orig_pkt(packet,
                         &key->mf);
+                flow = emc_lookup(flow_cache, key);
             } else {
-                key->hash = dpif_netdev_packet_get_rss_hash(packet, &key->mf);
+                /* Recirculated packet. */
+                if (flow_cache->n_entries & EMC_RECIRCT_NO_INSERT_THRESHOLD) {
+                    /* EMC occupancy is over the threshold. We skip EMC
+                     * lookup for recirculated packets. */
+                    flow = NULL;
+                } else {
+                    key->hash = dpif_netdev_packet_get_rss_hash(packet,
+                            &key->mf);
+                    flow = emc_lookup(flow_cache, key);
+                }
             }
-            flow = emc_lookup(flow_cache, key);
         } else {
             flow = NULL;
         }
@@ -4716,7 +4743,20 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
                                              add_actions->size);
         }
         ovs_mutex_unlock(&pmd->flow_mutex);
-        emc_probabilistic_insert(pmd, key, netdev_flow);
+        /* EMC insertion can be skipped by a probabilistic criteria or
+         * - in case of recirculated packets - depending on the number of
+         * EMC entries. */
+        if (!packet->md.recirc_id) {
+            emc_probabilistic_insert(pmd, key, netdev_flow);
+        } else {
+            /* Recirculated packets. When EMC occupancy goes over
+             * a threshold we avoid inserting new entries. */
+            if (!(pmd->flow_cache.n_entries &
+                  EMC_RECIRCT_NO_INSERT_THRESHOLD)) {
+                /* Still under the threshold. */
+                emc_probabilistic_insert(pmd, key, netdev_flow);
+            }
+        }
     }
 }

@@ -4809,7 +4849,20 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,

         flow = dp_netdev_flow_cast(rules[i]);

-        emc_probabilistic_insert(pmd, &keys[i], flow);
+        /* EMC insertion can be skipped by a probabilistic criteria or
+         * - in case of recirculated packets - depending on the number of
+         * EMC entries. */
+        if (!packet->md.recirc_id) {
+            emc_probabilistic_insert(pmd, &keys[i], flow);
+        } else {
+            /* Recirculated packets. When EMC occupancy goes over
+             * a threshold we avoid inserting new entries. */
+            if (!(pmd->flow_cache.n_entries &
+                  EMC_RECIRCT_NO_INSERT_THRESHOLD)) {
+                /* Still under the threshold. */
+                emc_probabilistic_insert(pmd, &keys[i], flow);
+            }
+        }
         dp_netdev_queue_batches(packet, flow, &keys[i].mf, batches, n_batches);
     }
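
The diff tests occupancy with a bitwise AND against `0xFFFFF000` rather than with a comparison, which may not be obvious at first read. The sketch below, which is not part of the patch, checks that the mask test behaves like `n_entries >= 4096`, i.e. half of the 8k size given in the commit message. `EMC_CAPACITY` and the function name are hypothetical, and the 8192 value is an assumption taken from that "8k" figure.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same value as in the patch. */
#define EMC_RECIRCT_NO_INSERT_THRESHOLD 0xFFFFF000

/* Assumption: the "8k" EMC size mentioned in the commit message. */
#define EMC_CAPACITY 8192u

/* Returns true if, for every entry count from 0 to EMC_CAPACITY, the
 * mask test used in the patch agrees with "n >= EMC_CAPACITY / 2". */
bool
mask_test_matches_half_capacity(void)
{
    for (uint32_t n = 0; n <= EMC_CAPACITY; n++) {
        bool by_mask = (n & EMC_RECIRCT_NO_INSERT_THRESHOLD) != 0;
        bool by_compare = n >= EMC_CAPACITY / 2;

        if (by_mask != by_compare) {
            return false;
        }
    }
    return true;
}
```

The mask clears the low 12 bits, so the result is non-zero exactly when the count is 4096 or more. Strictly speaking the logic therefore triggers at 50% occupancy and above, not only when occupancy exceeds 50%.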