
[ovs-dev,0/3] dpif-netdev: Combine CD and DFC patch for datapath refactor

Message ID 1520364208-54362-1-git-send-email-yipeng1.wang@intel.com

Message

Wang, Yipeng1 March 6, 2018, 7:23 p.m. UTC
This patch set is the V1 implementation combining the CD and DFC designs.
Both patch sets intend to refactor the datapath to avoid a costly sequential
subtable search.

CD and DFC patch sets:
CD: [PATCH v2 0/5] dpif-netdev: Cuckoo-Distributor implementation
https://mail.openvswitch.org/pipermail/ovs-dev/2017-October/340305.html

DFC: [PATCH] dpif-netdev: Refactor datapath flow cache
https://mail.openvswitch.org/pipermail/ovs-dev/2017-November/341066.html

The first commit is a rebase of Jan Scheurich's patch:
[PATCH] dpif-netdev: Refactor datapath flow cache

The second commit incorporates CD's way-associative design into DFC to
improve the hit rate.

The third commit changes the distributor to cache an index of the flow_table
entry to improve memory efficiency.
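
To make the second and third commits more concrete, below is a minimal C
sketch of a way-associative distributor that caches flow_table indices instead
of flow pointers. All names and sizes here (dfc_cache, DFC_WAYS, the 16-bit
signature, the eviction policy) are illustrative assumptions, not the actual
patch code; on a signature hit the real datapath would still fetch the flow by
index and verify the full key, since a short signature can false-positive.

#include <stdint.h>

/* Illustrative sizing: 1M entries, 8 ways per bucket (powers of two). */
#define DFC_ENTRIES      (1u << 20)
#define DFC_WAYS         8
#define DFC_BUCKETS      (DFC_ENTRIES / DFC_WAYS)
#define DFC_EMPTY_INDEX  UINT32_MAX

/* One way: a short hash signature plus an index into the datapath
 * flow_table, rather than a pointer to the flow, to keep the cache compact. */
struct dfc_way {
    uint16_t sig;          /* Partial hash used to filter ways cheaply. */
    uint32_t flow_index;   /* Index of the entry in the flow_table. */
};

struct dfc_bucket {
    struct dfc_way ways[DFC_WAYS];
};

struct dfc_cache {
    struct dfc_bucket buckets[DFC_BUCKETS];
};

static void
dfc_init(struct dfc_cache *cache)
{
    for (uint32_t b = 0; b < DFC_BUCKETS; b++) {
        for (int w = 0; w < DFC_WAYS; w++) {
            cache->buckets[b].ways[w].flow_index = DFC_EMPTY_INDEX;
        }
    }
}

/* Return the cached flow_table index for a packet hash, or DFC_EMPTY_INDEX
 * on a miss.  The caller still has to look the flow up by index and verify
 * its key: a 16-bit signature match is not proof of a flow match. */
static inline uint32_t
dfc_lookup(const struct dfc_cache *cache, uint32_t hash)
{
    const struct dfc_bucket *b = &cache->buckets[hash & (DFC_BUCKETS - 1)];
    uint16_t sig = hash >> 16;

    for (int w = 0; w < DFC_WAYS; w++) {
        if (b->ways[w].flow_index != DFC_EMPTY_INDEX
            && b->ways[w].sig == sig) {
            return b->ways[w].flow_index;
        }
    }
    return DFC_EMPTY_INDEX;
}

/* Insert after a dpcls hit: reuse a matching or empty way, otherwise evict
 * way 0 (a real implementation might randomize the victim). */
static inline void
dfc_insert(struct dfc_cache *cache, uint32_t hash, uint32_t flow_index)
{
    struct dfc_bucket *b = &cache->buckets[hash & (DFC_BUCKETS - 1)];
    uint16_t sig = hash >> 16;
    int victim = 0;

    for (int w = 0; w < DFC_WAYS; w++) {
        if (b->ways[w].sig == sig
            || b->ways[w].flow_index == DFC_EMPTY_INDEX) {
            victim = w;
            break;
        }
    }
    b->ways[victim].sig = sig;
    b->ways[victim].flow_index = flow_index;
}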


RFC of this patch set:
https://mail.openvswitch.org/pipermail/ovs-dev/2018-January/343411.html



RFC->V1:
1. Rebase to master head.
2. The last commit is totally rewritten to use the flow_table as an indirect
   table. The CD/DFC distributor now caches the index of the flow_table entry.
3. Incorporate commit 2 into commit 1. (Bhanu's comment)
4. Change DFC to be always on in commit 1. (Bhanu's comment)


Yipeng Wang (2):
  dpif-netdev: Use way-associative cache
  use flow_table as indirect table

Jan Scheurich (1):
  dpif-netdev: Refactor datapath flow cache

 lib/cmap.c             |  62 +++++++++
 lib/cmap.h             |   5 +
 lib/dpif-netdev-perf.h |   1 +
 lib/dpif-netdev.c      | 359 +++++++++++++++++++++++++++++++++----------------
 4 files changed, 310 insertions(+), 117 deletions(-)

Comments

Billy O'Mahony March 8, 2018, 10:31 a.m. UTC | #1
Hi All,

I have run some tests using a more realistic distribution of flows than what
we normally test with - see below.

This is a phy-to-phy test with just port forwarding, but I think that is the
best way to test the EMC as it avoids noise from other mechanisms, e.g. vhost
and dpcls lookup. It uses 64B packets and 1 PMD.

I also ran the tests with the EMC disabled.

                             Baseline HoM 951cbaf 6/3/18
                ||   emc-insert-prob 1/100  |     emc disabled
offered   flows ||   rxd    emc    cycles/  |  rxd  emc       cycles
kpps            ||  kpps   hits       pkt   | kpps hits          pkt
----------------++--------------------------+-----------------------
14,000        8 || 10830   100%       212   | 7360   0%          311
14,000      100 ||  9730   100%       236   | 7370   0%          311
14,000    10000 ||  6545    69%       345   | 7370   0%          311
14,000  1000000 ||  6202     3%       370   | 7370   0%          311

                                 Combine CD and DFC patch
                ||  emc-insert-prob 1/100   |     emc disabled
offered   flows ||  rxd    emc  dfc  cycles | rxd   emc   dfc cycles
kpps            || kpps   hits hits    /pkt | kpps  hits  hits   /pkt
----------------++--------------------------+-----------------------
14,000        8 || 10930  100%   0%     210 | 8570   0%  100%    268
14,000      100 || 10220  100%   0%     224 | 7800   0%  100%    294
14,000    10000 || 8000    84%  16%     287 | 6770   0%  100%    339
14,000  1000000 || 5921     7%  65%     387 | 6060   0%   72%    378

In these scenarios the patch gives an advantage at lower numbers of flows, but
this advantage is reversed at very high flow numbers - presumably as the DFC
itself approaches capacity.

Another interesting scenario to test would be the case of many short-lived
flows - 1M flows and 200k new flows/sec, for example. This real use case was
presented at the last OvS Con:
https://www.slideshare.net/LF_OpenvSwitch/lfovs17ovsdpdk-for-nfv-go-live-feedback.
I hope to implement this in due course.

Below are some details on how and why this flow distribution was chosen for
these tests.

Regards,
Billy


All caches are designed on the assumption that, in the real world, access
requests are not uniformly distributed.  By design they improve performance
only in situations where some items are accessed more frequently than others.
If this assumption does not hold then the use of a cache actually degrades
performance.

Currently we test the EMC with one of two scenarios, both of which break the
above assumption:

1) Round Robin 'Distribution':

    The TG sends a packet from flow#1 then flow#2 up to flow#N then back to
    flow#1 again.

    Testing with this case gives results that under-state the benefit of the
    EMC to the maximum extent possible.  By sending a packet from every other
    flow in between every two packets from any given flow, the chances of the
    flow having been evicted in the interim are maximized.  If a tester were
    to intentionally design a test to understate the benefits of the EMC, it
    would use a round-robin flow distribution.

2) Uniform Distribution:

    The TG randomly selects the next packet to send from all the configured flows.

    Testing with this case also gives results that under-state the benefit of
    the EMC.  As each flow is equally likely to occur, unless the number of
    flows is less than the number of EMC entries there is likely no benefit
    from using the EMC.

By testing only with flow distributions that break the fundamental assumption
motivating the use of an EMC, we are consistently under-stating the benefits
of flow-caching.

A More Realistic Distribution:

A more realistic distribution is almost certainly some kind of power-law or
Pareto distribution.  In this kind of distribution the majority of packets
belong to a small number of flows.  Think of the classic '80/20' rule.  Given
a power-law distribution, if we rank all the flows by their packets-per-second
we see that the flow with the Nth highest packets-per-second has a rate that
is some fraction of the (N-1)th flow's rate, for all N.  This kind of
distribution is what is seen in the natural world regarding the ranking of
word occurrence in a language, the population ranks of cities in various
countries, corporation sizes, income rankings, ranks of number of people
watching the same TV channel, and so on.
[https://en.wikipedia.org/wiki/Zipf%27s_law].

For example, using a Zipfian distribution with k=1 as the model for per-flow
pps, the flow distribution would look like this (y-axis is not linear; a small
sketch of the arithmetic follows the chart):

        1000| *
         500| *  *
         333| *  *  *
PPS      250| *  *  *  *
           ..
          10| *  *  *  *       *
            +------------...-------------
              1  2  3  4 ...  100
              Flow (PPS Ranking)
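
For reference, here is a minimal C sketch of that arithmetic (illustrative
only, not part of any test harness): with exponent k=1 the rank-r flow gets a
1/r share of the offered rate, normalized by the harmonic number H(n).

#include <stdio.h>

/* Split a total offered rate across 'n_flows' flows following a Zipfian
 * distribution with exponent k=1: the flow of rank r gets a share
 * proportional to 1/r, normalized by the harmonic number H(n_flows). */
static void
zipf_pps(double total_kpps, int n_flows)
{
    double harmonic = 0.0;

    for (int r = 1; r <= n_flows; r++) {
        harmonic += 1.0 / r;
    }
    for (int r = 1; r <= n_flows; r++) {
        printf("rank %4d: %10.1f kpps\n", r, total_kpps / (r * harmonic));
    }
}

int
main(void)
{
    /* e.g. 14,000 kpps offered across 100 flows. */
    zipf_pps(14000.0, 100);
    return 0;
}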


The TRex software-based traffic generator, with a small amount of additional
scripting, does allow an approximation of such a distribution (the limitation
is that instead of a pps rate for each individual flow, a bucket of flows must
share the same pps value - for example, below all the flows are distributed
across 8 streams to approximate this kind of power-law distribution).

For example: 

# start 14000kpps 10000flows
Each stream has 14000/8 kpps but a different number of flows. Each stream has
about half as many flows as the previous one.

# flows per stream: [5019, 2509, 1254, 627, 313, 156, 78, 39] total: 9995

The actual number of flows per stream is a close approximation of the precise
distribution - usually it's out by just one or two. This is done to allow use
of prime numbers for the ranges of the elements of the 5-tuple members. For
example, incrementing over 29 IP addresses and 173 port numbers gives
29*173=5017 unique flows, as 29 and 173 are prime. Using non-primes would
result in aliasing of the combinations and far fewer unique flows. (A small
sketch of this span selection follows the listing below.)

# stream#: 0 req. #flows: 5019 tuple_spans: [29, 173] diff:  2
# stream#: 1 req. #flows: 2509 tuple_spans: [13, 193] diff:  0
# stream#: 2 req. #flows: 1254 tuple_spans: [5, 251] diff:  1
# stream#: 3 req. #flows: 627 tuple_spans: [313, 2] diff:  1
# stream#: 4 req. #flows: 313 tuple_spans: [1, 313] diff:  0
# stream#: 5 req. #flows: 156 tuple_spans: [1, 157] diff:  1
# stream#: 6 req. #flows: 78 tuple_spans: [1, 79] diff:  1
# stream#: 7 req. #flows: 39 tuple_spans: [3, 13] diff:  0
# stream#: 0 src: 16.0.0.0 - 16.0.0.28 : 0 - 172 dst 48.0.0.0 - 48.0.0.1 : 0 - 1
# stream#: 1 src: 16.0.0.29 - 16.0.0.41 : 173 - 365 dst 48.0.0.0 - 48.0.0.1 : 0 - 1
# stream#: 2 src: 16.0.0.42 - 16.0.0.46 : 366 - 616 dst 48.0.0.0 - 48.0.0.1 : 0 - 1
# stream#: 3 src: 16.0.0.47 - 16.0.1.103 : 617 - 618 dst 48.0.0.0 - 48.0.0.1 : 0 - 1
# stream#: 4 src: 16.0.1.104 - 16.0.1.104 : 619 - 931 dst 48.0.0.0 - 48.0.0.1 : 0 - 1
# stream#: 5 src: 16.0.1.105 - 16.0.1.105 : 932 - 1088 dst 48.0.0.0 - 48.0.0.1 : 0 - 1
# stream#: 6 src: 16.0.1.106 - 16.0.1.106 : 1089 - 1167 dst 48.0.0.0 - 48.0.0.1 : 0 - 1
# stream#: 7 src: 16.0.1.107 - 16.0.1.109 : 1168 - 1180 dst 48.0.0.0 - 48.0.0.1 : 0 - 1
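
As a rough illustration of the span selection above (the real scripting sits
on top of TRex; this standalone C sketch and its helpers is_prime() and
next_prime() are hypothetical, and the numbers it prints will not exactly
match the listing): each stream gets roughly half the flows of the previous
one, and its flow count is approximated by the product of two prime spans.

#include <stdio.h>
#include <stdbool.h>

static bool
is_prime(int n)
{
    if (n < 2) {
        return false;
    }
    for (int d = 2; d * d <= n; d++) {
        if (n % d == 0) {
            return false;
        }
    }
    return true;
}

static int
next_prime(int n)
{
    while (!is_prime(n)) {
        n++;
    }
    return n;
}

int
main(void)
{
    int total_flows = 10000;
    int n_streams = 8;
    int want = total_flows / 2;   /* First stream gets roughly half the flows. */

    for (int s = 0; s < n_streams; s++) {
        /* Pick a small prime span for one tuple field, then a prime span for
         * the second field so that the product approximates 'want'.  Two
         * distinct primes are coprime, so incrementing both fields in
         * lockstep cycles through all span_a * span_b combinations without
         * aliasing. */
        int span_a = next_prime(2 + s);
        int span_b = next_prime(want / span_a);

        printf("stream %d: req. #flows %5d  tuple_spans [%d, %d]  actual %d\n",
               s, want, span_a, span_b, span_a * span_b);
        want /= 2;
    }
    return 0;
}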



Wang, Yipeng1 March 21, 2018, 4:33 p.m. UTC | #2
Hi, all

Thanks Billy for the test. The skewed traffic generator is very helpful.

After investigating with Billy, we found that the flow count in the results table is not correct. The number of flows in the table is half of the real number, so the case where CD/DFC performs much worse than master head is the 2 million (not 1 million) flow case. Since CD/DFC has 1 million entries, in that case the DFC/CD suffers a lot of misses (and, like EMC misses, these harm total throughput).

We are working on a better test script (including skewed traffic across various numbers of subtables) as well as a new patch set. We will post version 2 and the new results soon.

Thanks
Yipeng
