Message ID: 1520364208-54362-1-git-send-email-yipeng1.wang@intel.com
Series: dpif-netdev: Combine CD and DFC patch for datapath refactor
Hi All,

I have run some tests using a more realistic distribution of flows - see
below - than what we normally test with.

This is a phy to phy test with just port forwarding, but I think that is the
best way to test the EMC as it avoids noise from other mechanisms, e.g. vhost
and dpcls lookup. It uses 64B packets and 1 PMD.

I also ran the tests with the EMC disabled.

                     Baseline HoM 951cbaf 6/3/18
                 || emc-insert-prob 1/100      | emc disabled
offered   flows  || rxd    emc   cycles/       | rxd    emc   cycles/
kpps             || kpps   hits  pkt           | kpps   hits  pkt
-----------------++----------------------------+-----------------------
14,000        8  || 10830  100%  212           | 7360   0%    311
14,000      100  ||  9730  100%  236           | 7370   0%    311
14,000    10000  ||  6545   69%  345           | 7370   0%    311
14,000  1000000  ||  6202    3%  370           | 7370   0%    311

                     Combine CD and DFC patch
                 || emc-insert-prob 1/100           | emc disabled
offered   flows  || rxd    emc   dfc   cycles/      | rxd    emc   dfc   cycles/
kpps             || kpps   hits  hits  pkt          | kpps   hits  hits  pkt
-----------------++---------------------------------+---------------------------
14,000        8  || 10930  100%    0%  210          | 8570   0%    100%  268
14,000      100  || 10220  100%    0%  224          | 7800   0%    100%  294
14,000    10000  ||  8000   84%   16%  287          | 6770   0%    100%  339
14,000  1000000  ||  5921    7%   65%  387          | 6060   0%     72%  378

In these scenarios the patch gives an advantage at lower numbers of flows,
but this advantage is reversed at very high flow numbers - presumably as the
DFC itself approaches capacity.

Another interesting scenario to test would be the case of many short-lived
flows - 1M flows and 200k new flows/sec, for example. This real use case was
presented at the last OvS Con:
https://www.slideshare.net/LF_OpenvSwitch/lfovs17ovsdpdk-for-nfv-go-live-feedback.
I hope to implement that in due course.

Below are some details on how and why the flow distribution was made for
these tests.

Regards,
Billy


All caches are designed on the assumption that in the real world access
requests are not uniformly distributed.
By design they are used to improve performance only in situations where some
items are accessed more frequently than others. If this assumption does not
hold then the use of a cache actually degrades performance.

Currently we test the EMC with one of two scenarios, both of which break the
above assumption:

1) Round Robin 'Distribution':

   The TG sends a packet from flow#1, then flow#2, up to flow#N, then back to
   flow#1 again.

   Testing with this case gives results that under-state the benefit of the
   EMC to the maximum extent possible. By sending a packet from every other
   flow in between every two packets from any given flow, the chances of the
   flow having been evicted in the interim between the two packets are
   maximized. If a tester were to intentionally design a test to understate
   the benefits of the EMC it would be a round-robin flow distribution.

2) Uniform Distribution:

   The TG randomly selects the next packet to send from all the configured
   flows.

   Testing with this case also gives results that under-state the benefit of
   the EMC. As each flow is equally likely to occur, unless the number of
   flows is less than the number of EMC entries there is likely no benefit
   from using the EMC.

By testing only with these flow distributions that break the fundamental
assumptions indicating the use of an EMC, we are consistently under-stating
the benefits of flow-caching.

A More Realistic Distribution:

A more realistic distribution is almost certainly some kind of power-law or
Pareto distribution. In this kind of distribution the majority of packets
belong to a small number of flows. Think of the classic '80/20' rule. Given a
power-law distribution, if we rank all the flows by their packets-per-second
we see that the flow with the Nth most packets-per-second has a rate that is
some fraction of the (N-1)th flow's rate, for all N.
This kind of distribution is what is seen in the natural world for the
ranking of word occurrence in a language, the population ranks of cities in
various countries, corporation sizes, income rankings, ranks of the number of
people watching the same TV channel, and so on
[https://en.wikipedia.org/wiki/Zipf%27s_law].

For example, using a Zipfian distribution with k=1 as the model for per-flow
pps, the flow distribution would look like this (y-axis is not linear):

    1000| *
     500| * *
     333| * * *
PPS  250| * * * *
      ..
      10| * * * * *
        +------------...-------------
         1 2 3 4     ...         100
             Flow (PPS Ranking)

The Trex software-based traffic generator, with a small amount of additional
scripting, does allow an approximation of such a distribution. (Specifically,
the limitation is that instead of a pps rate for each individual flow, a
bucket of flows must share the same pps value - for example, below all the
flows are distributed across 8 streams to approximate this kind of power-law
distribution.)

For example:

# start 14000kpps 10000flows

Each stream has 14000/8 kpps but has a different number of flows. Each stream
has about half the number of flows as the previous one.

# flows per stream: [5019, 2509, 1254, 627, 313, 156, 78, 39] total: 9995

The actual number of flows per stream is a close approximation of the precise
distribution - usually it's out by just one or two. This is done to allow the
use of prime numbers for the ranges of the elements of the 5-tuple. For
example, incrementing over 29 IP addresses and 173 port numbers gives
29*173=5017 unique flows, as 29 and 173 are prime. Use of non-primes would
result in aliasing of the combinations, resulting in far fewer unique flows.

# stream#: 0 req. #flows: 5019 tuple_spans: [29, 173] diff: 2
# stream#: 1 req. #flows: 2509 tuple_spans: [13, 193] diff: 0
# stream#: 2 req. #flows: 1254 tuple_spans: [5, 251] diff: 1
# stream#: 3 req. #flows: 627 tuple_spans: [313, 2] diff: 1
# stream#: 4 req. #flows: 313 tuple_spans: [1, 313] diff: 0
# stream#: 5 req. #flows: 156 tuple_spans: [1, 157] diff: 1
# stream#: 6 req. #flows: 78 tuple_spans: [1, 79] diff: 1
# stream#: 7 req. #flows: 39 tuple_spans: [3, 13] diff: 0

# stream#: 0 src: 16.0.0.0 - 16.0.0.28    : 0 - 172     dst 48.0.0.0 - 48.0.0.1 : 0 - 1
# stream#: 1 src: 16.0.0.29 - 16.0.0.41   : 173 - 365   dst 48.0.0.0 - 48.0.0.1 : 0 - 1
# stream#: 2 src: 16.0.0.42 - 16.0.0.46   : 366 - 616   dst 48.0.0.0 - 48.0.0.1 : 0 - 1
# stream#: 3 src: 16.0.0.47 - 16.0.1.103  : 617 - 618   dst 48.0.0.0 - 48.0.0.1 : 0 - 1
# stream#: 4 src: 16.0.1.104 - 16.0.1.104 : 619 - 931   dst 48.0.0.0 - 48.0.0.1 : 0 - 1
# stream#: 5 src: 16.0.1.105 - 16.0.1.105 : 932 - 1088  dst 48.0.0.0 - 48.0.0.1 : 0 - 1
# stream#: 6 src: 16.0.1.106 - 16.0.1.106 : 1089 - 1167 dst 48.0.0.0 - 48.0.0.1 : 0 - 1
# stream#: 7 src: 16.0.1.107 - 16.0.1.109 : 1168 - 1180 dst 48.0.0.0 - 48.0.0.1 : 0 - 1

> -----Original Message-----
> From: ovs-dev-bounces@openvswitch.org [mailto:ovs-dev-
> bounces@openvswitch.org] On Behalf Of yipeng wang
> Sent: Tuesday, March 6, 2018 7:23 PM
> To: dev@openvswitch.org; Wang, Yipeng1 <yipeng1.wang@intel.com>;
> jan.scheurich@ericsson.com; Bodireddy, Bhanuprakash
> <bhanuprakash.bodireddy@intel.com>; u9012063@gmail.com
> Cc: Tai, Charlie <charlie.tai@intel.com>
> Subject: [ovs-dev] [PATCH 0/3] dpif-netdev: Combine CD and DFC patch for
> datapath refactor
>
> This patch set is the V1 implementation to combine the CD and DFC design.
> Both patches intend to refactor datapath to avoid costly sequential subtable
> search.
>
> CD and DFC patch sets:
> CD: [PATCH v2 0/5] dpif-netdev: Cuckoo-Distributor implementation
> https://mail.openvswitch.org/pipermail/ovs-dev/2017-October/340305.html
>
> DFC: [PATCH] dpif-netdev: Refactor datapath flow cache
> https://mail.openvswitch.org/pipermail/ovs-dev/2017-November/341066.html
>
> The first commit is a rebase of Jan Scheurich's patch of [PATCH] dpif-netdev:
> Refactor datapath flow cache
>
> The second commit is to incorporate CD's way-associative design into DFC to
> improve the hit rate.
>
> The third commit is to change the distributor to cache an index of flow_table
> entry to improve memory efficiency.
>
> RFC of this patch set:
> https://mail.openvswitch.org/pipermail/ovs-dev/2018-January/343411.html
>
> RFC->V1:
> 1. Rebase to master head.
> 2. The last commit is totally rewritten to use the flow_table as indirect
>    table. The CD/DFC distributor will cache the index of flow_table entry.
> 3. Incorporate commit 2 into commit 1. (Bhanu's comment)
> 4. Change DFC to be always on in commit 1. (Bhanu's comment)
>
> Yipeng Wang (2):
>   dpif-netdev: Use way-associative cache
>   use flow_table as indirect table
>
> Jan Scheurich (1):
>   dpif-netdev: Refactor datapath flow cache
>
>  lib/cmap.c             |  62 +++++++++
>  lib/cmap.h             |   5 +
>  lib/dpif-netdev-perf.h |   1 +
>  lib/dpif-netdev.c      | 359 +++++++++++++++++++++++++++++++++----------------
>  4 files changed, 310 insertions(+), 117 deletions(-)
>
> --
> 2.7.4
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
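[Editorial note: the halving flows-per-stream split Billy describes above can
be sketched in a few lines of Python. `flows_per_stream` is an illustrative
name, not part of the actual Trex script; the integer truncation is an
assumption that happens to reproduce the listed counts.]

```python
# Sketch of the flows-per-stream split: 8 streams share the offered rate
# equally while each stream carries half as many flows as the previous one,
# approximating a Zipf-like per-flow pps distribution.
def flows_per_stream(total_flows, n_streams=8):
    weights = [2.0 ** -i for i in range(n_streams)]   # 1, 1/2, 1/4, ...
    scale = total_flows / sum(weights)
    return [int(w * scale) for w in weights]          # truncate to whole flows

counts = flows_per_stream(10000)
# -> [5019, 2509, 1254, 627, 313, 156, 78, 39], total 9995 (as in the listing)

# Each stream gets 14000/8 kpps, so per-flow pps doubles from one stream to
# the next: flows in stream 0 get ~0.35 kpps each, flows in stream 7 ~45 kpps.
per_flow_kpps = [14000 / 8 / n for n in counts]
```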
Hi, all

Thanks Billy for the test. The skewed traffic generator is very helpful.

After investigation with Billy, we found the flow count is not correct in the
results table. The number of flows in the table is half of the real number,
so the case in which CD/DFC performs much worse than the master head is the
2 million (not 1 million) flow case. Since CD/DFC has 1 million entries, in
that case the DFC/CD suffers a lot of misses (just as EMC misses harm total
throughput).

We are working on a better test script (including skewed traffic to various
numbers of subtables), and also a new patch set. We will post version 2 and
new results soon.

Thanks
Yipeng

>-----Original Message-----
>From: O Mahony, Billy
>Sent: Thursday, March 8, 2018 2:31 AM
>To: Wang, Yipeng1 <yipeng1.wang@intel.com>; dev@openvswitch.org; Wang,
>Yipeng1 <yipeng1.wang@intel.com>; jan.scheurich@ericsson.com; Bodireddy,
>Bhanuprakash <bhanuprakash.bodireddy@intel.com>; u9012063@gmail.com
>Cc: Tai, Charlie <charlie.tai@intel.com>; Franck Baudin <fbaudin@redhat.com>
>Subject: RE: [ovs-dev] [PATCH 0/3] dpif-netdev: Combine CD and DFC patch for
>datapath refactor
>
>Hi All,
>
>I have run some tests using a more realistic distribution of flows - see below
>- to what we normally test with.
>
>This is a phy to phy test with just port forwarding but I think that is the
>best way to test the EMC as it avoids noise from other mechanisms e.g. vhost
>and dpcls lookup. It uses 64B packets and 1 PMD.
>
>I also ran the tests with the EMC disabled.
>
>[rest of quoted message trimmed]
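[Editorial note: on the prime tuple-span point from Billy's mail - if the
generator steps the IP range and the port range with a single counter that
wraps at each span (an assumption about how the streams are driven), the
number of distinct address/port pairs is lcm(a, b), which equals a*b only
when the spans are coprime. A minimal sketch of that aliasing effect:]

```python
from math import gcd

# Count distinct (ip_offset, port_offset) pairs when both fields are stepped
# in lockstep by one counter, each wrapping at its own span.
def unique_flows(span_ip, span_port, steps=200000):
    return len({(i % span_ip, i % span_port) for i in range(steps)})

# Coprime spans visit every combination: 29 * 173 = 5017 unique flows.
assert unique_flows(29, 173) == 29 * 173
# Non-coprime spans alias: only lcm(a, b) combinations ever occur.
assert unique_flows(30, 172) == (30 * 172) // gcd(30, 172)   # 2580, not 5160
```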