diff mbox

[ovs-dev,v2,2/2] dpif-netdev: Fix emc replacement policy.

Message ID 1501512087-29023-3-git-send-email-i.maximets@samsung.com
State Superseded
Headers show

Commit Message

Ilya Maximets July 31, 2017, 2:41 p.m. UTC
Current EMC replacement policy allows to replace active EMC entry
even if there are dead (empty) entries available. This leads to
EMC trashing even on few hundreds of flows. In some cases PMD
threads starts to execute classifier lookups even in tests with
50 - 100 active flows.

Looks like the hash comparison rule was introduced to randomly
choose one of alive entries to replace. But it doesn't work as
needed and also hashes has nothing common with randomness.

Lets fix the replacement policy by removing hash checking and
using the random value passed from 'emc_probabilistic_insert()'
only while considering replace of the alive entry.
This should give us nearly fair way to choose the entry to replace.

We are avoiding calculation of the new random value by reusing
bits of already generated random for probabilistic EMC insertion.
Bits higher than 'EM_FLOW_INSERT_INV_PROB_SHIFT' are used because
lower bits are less than 'min' and not fully random.

Not replacing of alive entries while dead ones exists allows to
significantly decrease EMC trashing.

Testing shows stable work of exact match cache without misses
with up to 3072 - 6144 active flows (depends on traffic pattern).

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
---
 lib/dpif-netdev.c | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

Comments

Fischetti, Antonio Aug. 3, 2017, 3:10 p.m. UTC | #1
LGTM.

I think this patch could work fine in conjunction with the one I posted
https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335940.html
where I'm targeting a congestion usecase with recirculated packets.
The goal is still to limit thrashing and the criteria is to avoid 
EMC lookup and insertions for recirculated packets.

I gave a try to your 2 patches, I'm using 
Commit 325b2b1a493a2230072de726bbb53a8337759f39

On top of that I applied Ciara's patch "dpif-netdev: add EMC entry count 
and %full figure to pmd-stats-show" at:
https://mail.openvswitch.org/pipermail/ovs-dev/2017-January/327570.html
because I wanted to see the count of EMC entries.

I generated the different traffic streams by looping on IP addresses.
Of course results are strictly dependent on the particular data traffic.

I sent traffic 64B UDP packets at the line rate and measured the Rx packet 
rate, regardless of pkt loss.

Flow setup:
table=0, in_port=dpdk0 actions=output:dpdk1
table=0, in_port=dpdk1 actions=output:dpdk0

2 PMDs, 3 Tx queues.

To read entries counts:
 sudo ./utilities/ovs-appctl dpif-netdev/pmd-stats-show | grep entries

Results
=======

        +=========================+============================
        |  Orig + Ciara's patch   |  + these 2 patches     
========+=========================+============================
Streams | Entries      Rx [Mpps]  |   Entries      Rx [Mpps]
--------+-------------------------+----------------------------
100     | 100, 100    9.76, 10.05 |   100, 100    10.55, 10.72 
1000    | 1000, 999   7.85, 8.09  |   1000, 1000  8.77, 8.91 
3000    | 2993, 2987  7.61, 7.87  |   3000, 3000  8.58, 7.89
5000    | 4336, 4872  7.49, 7.26  |   4496, 5000  8.57, 7.28
7000    | 5870, 7000  8.57, 7.20  |   6039, 7000  8.56, 7.26 
9000    | 6550, 7572  7.19, 6.91  |   6643, 7836  7.06, 6.90 
11000   | 7152, 8192  6.81, 6.83  |   7158, 8192  6.81, 6.79
--------+-------------------------+----------------------------

It's interesting to see how these patches allow to use more EMC locations -
see cases from 3000 to 9000 streams - so reducing thrashing.


-Antonio


> -----Original Message-----
> From: ovs-dev-bounces@openvswitch.org [mailto:ovs-dev-bounces@openvswitch.org]
> On Behalf Of Ilya Maximets
> Sent: Monday, July 31, 2017 3:41 PM
> To: ovs-dev@openvswitch.org
> Cc: Heetae Ahn <heetae82.ahn@samsung.com>; Ilya Maximets
> <i.maximets@samsung.com>
> Subject: [ovs-dev] [PATCH v2 2/2] dpif-netdev: Fix emc replacement policy.
> 
> Current EMC replacement policy allows to replace active EMC entry
> even if there are dead (empty) entries available. This leads to
> EMC trashing even on few hundreds of flows. In some cases PMD
> threads starts to execute classifier lookups even in tests with
> 50 - 100 active flows.
> 
> Looks like the hash comparison rule was introduced to randomly
> choose one of alive entries to replace. But it doesn't work as
> needed and also hashes has nothing common with randomness.
> 
> Lets fix the replacement policy by removing hash checking and
> using the random value passed from 'emc_probabilistic_insert()'
> only while considering replace of the alive entry.

[Antonio]
I like the approach to re-use the info we already have, in this case
the random value.

> This should give us nearly fair way to choose the entry to replace.
> 
> We are avoiding calculation of the new random value by reusing
> bits of already generated random for probabilistic EMC insertion.
> Bits higher than 'EM_FLOW_INSERT_INV_PROB_SHIFT' are used because
> lower bits are less than 'min' and not fully random.
> 
> Not replacing of alive entries while dead ones exists allows to
> significantly decrease EMC trashing.
> 
> Testing shows stable work of exact match cache without misses
> with up to 3072 - 6144 active flows (depends on traffic pattern).
> 
> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
> ---
>  lib/dpif-netdev.c | 23 ++++++++++++++++-------
>  1 file changed, 16 insertions(+), 7 deletions(-)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 123a7c9..a714329 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -155,6 +155,9 @@ struct netdev_flow_key {
>  #define EM_FLOW_INSERT_INV_PROB_SHIFT 20
>  #define EM_FLOW_INSERT_INV_PROB_MAX  (1 << EM_FLOW_INSERT_INV_PROB_SHIFT)
>  #define EM_FLOW_INSERT_INV_PROB_MASK (EM_FLOW_INSERT_INV_PROB_MAX - 1)
> +/* We will use bits higher than EM_FLOW_INSERT_INV_PROB_SHIFT of the random
> + * value for EMC replacement policy. */
> +BUILD_ASSERT_DECL(32 - EM_FLOW_INSERT_INV_PROB_SHIFT >= EM_FLOW_HASH_SEGS);
>  /* Default EMC insert probability is 1 / DEFAULT_EM_FLOW_INSERT_INV_PROB */
>  #define DEFAULT_EM_FLOW_INSERT_INV_PROB 100
>  #define DEFAULT_EM_FLOW_INSERT_MIN (EM_FLOW_INSERT_INV_PROB_MAX /        \
> @@ -2041,7 +2044,7 @@ emc_change_entry(struct emc_entry *ce, struct
> dp_netdev_flow *flow,
> 
>  static inline void
>  emc_insert(struct emc_cache *cache, const struct netdev_flow_key *key,
> -           struct dp_netdev_flow *flow)
> +           struct dp_netdev_flow *flow, uint32_t random_value)
>  {
>      struct emc_entry *to_be_replaced = NULL;
>      struct emc_entry *current_entry;
> @@ -2054,12 +2057,12 @@ emc_insert(struct emc_cache *cache, const struct
> netdev_flow_key *key,
>          }
> 
>          /* Replacement policy: put the flow in an empty (not alive) entry, or
> -         * in the first entry where it can be */
> +         * in the random entry where it can be */
>          if (!to_be_replaced
>              || (emc_entry_alive(to_be_replaced)
> -                && !emc_entry_alive(current_entry))
> -            || current_entry->key.hash < to_be_replaced->key.hash) {
> +                && (!emc_entry_alive(current_entry) || random_value & 1))) {
>              to_be_replaced = current_entry;
> +            random_value >>= 1;
>          }
>      }
>      /* We didn't find the miniflow in the cache.
> @@ -2077,11 +2080,17 @@ emc_probabilistic_insert(struct dp_netdev_pmd_thread
> *pmd,
>       * default the value is UINT32_MAX / 100 which yields an insertion
>       * probability of 1/100 ie. 1% */
> 
> -    uint32_t min;
> +    uint32_t min, random_value;
>      atomic_read_relaxed(&pmd->dp->emc_insert_min, &min);
> 
> -    if (min && (random_uint32() & EM_FLOW_INSERT_INV_PROB_MASK) < min) {
> -        emc_insert(&pmd->flow_cache, key, flow);
> +    if (!min) {
> +        return;
> +    }
> +
> +    random_value = random_uint32();
> +    if ((random_value & EM_FLOW_INSERT_INV_PROB_MASK) < min) {
> +        emc_insert(&pmd->flow_cache, key, flow,
> +                   random_value >> EM_FLOW_INSERT_INV_PROB_SHIFT);
>      }
>  }
> 
> --
> 2.7.4
> 
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Darrell Ball Aug. 8, 2017, 7:16 a.m. UTC | #2
Hi Antonio

Would you mind sharing your distribution algorithm ?
I would like to understand how you saw some benefit in the 1000-5000 range.

    1000    | 1000, 999   7.85, 8.09  |   1000, 1000  8.77, 8.91 
    3000    | 2993, 2987  7.61, 7.87  |   3000, 3000  8.58, 7.89
    5000    | 4336, 4872  7.49, 7.26  |   4496, 5000  8.57, 7.28

Thanks Darrell


-----Original Message-----
From: <ovs-dev-bounces@openvswitch.org> on behalf of "Fischetti, Antonio" <antonio.fischetti@intel.com>
Date: Thursday, August 3, 2017 at 8:10 AM
To: Ilya Maximets <i.maximets@samsung.com>, "ovs-dev@openvswitch.org" <ovs-dev@openvswitch.org>
Cc: Heetae Ahn <heetae82.ahn@samsung.com>
Subject: Re: [ovs-dev] [PATCH v2 2/2] dpif-netdev: Fix emc replacement policy.

    LGTM.
    
    I think this patch could work fine in conjunction with the one I posted
    https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_pipermail_ovs-2Ddev_2017-2DJuly_335940.html&d=DwICAg&c=uilaK90D4TOVoH58JNXRgQ&r=BVhFA09CGX7JQ5Ih-uZnsw&m=gPngVJNy2q5ZxGlA8YyqJvmxn6ITNf2p_ExFf4bI1qA&s=Ro4bZVnRtHb2-gSch_oxiU5ypxPWW8ONyoMeuXml4Dk&e= 
    where I'm targeting a congestion usecase with recirculated packets.
    The goal is still to limit thrashing and the criteria is to avoid 
    EMC lookup and insertions for recirculated packets.
    
    I gave a try to your 2 patches, I'm using 
    Commit 325b2b1a493a2230072de726bbb53a8337759f39
    
    On top of that I applied Ciara's patch "dpif-netdev: add EMC entry count 
    and %full figure to pmd-stats-show" at:
    https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_pipermail_ovs-2Ddev_2017-2DJanuary_327570.html&d=DwICAg&c=uilaK90D4TOVoH58JNXRgQ&r=BVhFA09CGX7JQ5Ih-uZnsw&m=gPngVJNy2q5ZxGlA8YyqJvmxn6ITNf2p_ExFf4bI1qA&s=Np71SDnUWCgZNENFZ0A95iSWDqHN9SBtRso9UU6Wzlk&e= 
    because I wanted to see the count of EMC entries.
    
    I generated the different traffic streams by looping on IP addresses.
    Of course results are strictly dependent on the particular data traffic.
    
    I sent traffic 64B UDP packets at the line rate and measured the Rx packet 
    rate, regardless of pkt loss.
    
    Flow setup:
    table=0, in_port=dpdk0 actions=output:dpdk1
    table=0, in_port=dpdk1 actions=output:dpdk0
    
    2 PMDs, 3 Tx queues.
    
    To read entries counts:
     sudo ./utilities/ovs-appctl dpif-netdev/pmd-stats-show | grep entries
    
    Results
    =======
    
            +=========================+============================
            |  Orig + Ciara's patch   |  + these 2 patches     
    ========+=========================+============================
    Streams | Entries      Rx [Mpps]  |   Entries      Rx [Mpps]
    --------+-------------------------+----------------------------
    100     | 100, 100    9.76, 10.05 |   100, 100    10.55, 10.72 
    1000    | 1000, 999   7.85, 8.09  |   1000, 1000  8.77, 8.91 
    3000    | 2993, 2987  7.61, 7.87  |   3000, 3000  8.58, 7.89
    5000    | 4336, 4872  7.49, 7.26  |   4496, 5000  8.57, 7.28
    7000    | 5870, 7000  8.57, 7.20  |   6039, 7000  8.56, 7.26 
    9000    | 6550, 7572  7.19, 6.91  |   6643, 7836  7.06, 6.90 
    11000   | 7152, 8192  6.81, 6.83  |   7158, 8192  6.81, 6.79
    --------+-------------------------+----------------------------
    
    It's interesting to see how these patches allow to use more EMC locations -
    see cases from 3000 to 9000 streams - so reducing thrashing.
    
    
    -Antonio
    
    
    > -----Original Message-----
    > From: ovs-dev-bounces@openvswitch.org [mailto:ovs-dev-bounces@openvswitch.org]
    > On Behalf Of Ilya Maximets
    > Sent: Monday, July 31, 2017 3:41 PM
    > To: ovs-dev@openvswitch.org
    > Cc: Heetae Ahn <heetae82.ahn@samsung.com>; Ilya Maximets
    > <i.maximets@samsung.com>
    > Subject: [ovs-dev] [PATCH v2 2/2] dpif-netdev: Fix emc replacement policy.
    > 
    > Current EMC replacement policy allows to replace active EMC entry
    > even if there are dead (empty) entries available. This leads to
    > EMC trashing even on few hundreds of flows. In some cases PMD
    > threads starts to execute classifier lookups even in tests with
    > 50 - 100 active flows.
    > 
    > Looks like the hash comparison rule was introduced to randomly
    > choose one of alive entries to replace. But it doesn't work as
    > needed and also hashes has nothing common with randomness.
    > 
    > Lets fix the replacement policy by removing hash checking and
    > using the random value passed from 'emc_probabilistic_insert()'
    > only while considering replace of the alive entry.
    
    [Antonio]
    I like the approach to re-use the info we already have, in this case
    the random value.
    
    > This should give us nearly fair way to choose the entry to replace.
    > 
    > We are avoiding calculation of the new random value by reusing
    > bits of already generated random for probabilistic EMC insertion.
    > Bits higher than 'EM_FLOW_INSERT_INV_PROB_SHIFT' are used because
    > lower bits are less than 'min' and not fully random.
    > 
    > Not replacing of alive entries while dead ones exists allows to
    > significantly decrease EMC trashing.
    > 
    > Testing shows stable work of exact match cache without misses
    > with up to 3072 - 6144 active flows (depends on traffic pattern).
    > 
    > Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
    > ---
    >  lib/dpif-netdev.c | 23 ++++++++++++++++-------
    >  1 file changed, 16 insertions(+), 7 deletions(-)
    > 
    > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
    > index 123a7c9..a714329 100644
    > --- a/lib/dpif-netdev.c
    > +++ b/lib/dpif-netdev.c
    > @@ -155,6 +155,9 @@ struct netdev_flow_key {
    >  #define EM_FLOW_INSERT_INV_PROB_SHIFT 20
    >  #define EM_FLOW_INSERT_INV_PROB_MAX  (1 << EM_FLOW_INSERT_INV_PROB_SHIFT)
    >  #define EM_FLOW_INSERT_INV_PROB_MASK (EM_FLOW_INSERT_INV_PROB_MAX - 1)
    > +/* We will use bits higher than EM_FLOW_INSERT_INV_PROB_SHIFT of the random
    > + * value for EMC replacement policy. */
    > +BUILD_ASSERT_DECL(32 - EM_FLOW_INSERT_INV_PROB_SHIFT >= EM_FLOW_HASH_SEGS);
    >  /* Default EMC insert probability is 1 / DEFAULT_EM_FLOW_INSERT_INV_PROB */
    >  #define DEFAULT_EM_FLOW_INSERT_INV_PROB 100
    >  #define DEFAULT_EM_FLOW_INSERT_MIN (EM_FLOW_INSERT_INV_PROB_MAX /        \
    > @@ -2041,7 +2044,7 @@ emc_change_entry(struct emc_entry *ce, struct
    > dp_netdev_flow *flow,
    > 
    >  static inline void
    >  emc_insert(struct emc_cache *cache, const struct netdev_flow_key *key,
    > -           struct dp_netdev_flow *flow)
    > +           struct dp_netdev_flow *flow, uint32_t random_value)
    >  {
    >      struct emc_entry *to_be_replaced = NULL;
    >      struct emc_entry *current_entry;
    > @@ -2054,12 +2057,12 @@ emc_insert(struct emc_cache *cache, const struct
    > netdev_flow_key *key,
    >          }
    > 
    >          /* Replacement policy: put the flow in an empty (not alive) entry, or
    > -         * in the first entry where it can be */
    > +         * in the random entry where it can be */
    >          if (!to_be_replaced
    >              || (emc_entry_alive(to_be_replaced)
    > -                && !emc_entry_alive(current_entry))
    > -            || current_entry->key.hash < to_be_replaced->key.hash) {
    > +                && (!emc_entry_alive(current_entry) || random_value & 1))) {
    >              to_be_replaced = current_entry;
    > +            random_value >>= 1;
    >          }
    >      }
    >      /* We didn't find the miniflow in the cache.
    > @@ -2077,11 +2080,17 @@ emc_probabilistic_insert(struct dp_netdev_pmd_thread
    > *pmd,
    >       * default the value is UINT32_MAX / 100 which yields an insertion
    >       * probability of 1/100 ie. 1% */
    > 
    > -    uint32_t min;
    > +    uint32_t min, random_value;
    >      atomic_read_relaxed(&pmd->dp->emc_insert_min, &min);
    > 
    > -    if (min && (random_uint32() & EM_FLOW_INSERT_INV_PROB_MASK) < min) {
    > -        emc_insert(&pmd->flow_cache, key, flow);
    > +    if (!min) {
    > +        return;
    > +    }
    > +
    > +    random_value = random_uint32();
    > +    if ((random_value & EM_FLOW_INSERT_INV_PROB_MASK) < min) {
    > +        emc_insert(&pmd->flow_cache, key, flow,
    > +                   random_value >> EM_FLOW_INSERT_INV_PROB_SHIFT);
    >      }
    >  }
    > 
    > --
    > 2.7.4
    > 
    > _______________________________________________
    > dev mailing list
    > dev@openvswitch.org
    > https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_mailman_listinfo_ovs-2Ddev&d=DwICAg&c=uilaK90D4TOVoH58JNXRgQ&r=BVhFA09CGX7JQ5Ih-uZnsw&m=gPngVJNy2q5ZxGlA8YyqJvmxn6ITNf2p_ExFf4bI1qA&s=catT2G8FNT5ScEbzPLhiCPWCH1Lk3A9BN4tt0egFtTo&e= 
    _______________________________________________
    dev mailing list
    dev@openvswitch.org
    https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_mailman_listinfo_ovs-2Ddev&d=DwICAg&c=uilaK90D4TOVoH58JNXRgQ&r=BVhFA09CGX7JQ5Ih-uZnsw&m=gPngVJNy2q5ZxGlA8YyqJvmxn6ITNf2p_ExFf4bI1qA&s=catT2G8FNT5ScEbzPLhiCPWCH1Lk3A9BN4tt0egFtTo&e=
Fischetti, Antonio Aug. 8, 2017, 10:45 a.m. UTC | #3
> -----Original Message-----
> From: Darrell Ball [mailto:dball@vmware.com]
> Sent: Tuesday, August 8, 2017 8:16 AM
> To: Fischetti, Antonio <antonio.fischetti@intel.com>; Ilya Maximets
> <i.maximets@samsung.com>; ovs-dev@openvswitch.org
> Cc: Heetae Ahn <heetae82.ahn@samsung.com>
> Subject: Re: [ovs-dev] [PATCH v2 2/2] dpif-netdev: Fix emc replacement policy.
> 
> Hi Antonio
> 
> Would you mind sharing your distribution algorithm ?
> I would like to understand how you saw some benefit in the 1000-5000 range.
> 
>     1000    | 1000, 999   7.85, 8.09  |   1000, 1000  8.77, 8.91
>     3000    | 2993, 2987  7.61, 7.87  |   3000, 3000  8.58, 7.89
>     5000    | 4336, 4872  7.49, 7.26  |   4496, 5000  8.57, 7.28
> 
> Thanks Darrell

[Antonio]
I set up the generator to loop on the dest IP addr on one side,
and loop instead on the source IP addr on the other side.

For example to generate 10 different flows, I was sending to phy port #1
UDP, IPsrc:10.10.10.10, IPdest: 20.20.20.[20-29], PortSrc: 63, PortDest: 63

Instead to phy port #2 (source and dest IPs are now swapped):
UDP, IPsrc: 20.20.20.[20-29], IPdest: 10.10.10.10, PortSrc: 63, PortDest: 63

I use this setup to quickly simulate some UDP connections, maybe the
flows created in this way don't seem to be a real packet dataset.

> 
> -----Original Message-----
> From: <ovs-dev-bounces@openvswitch.org> on behalf of "Fischetti, Antonio"
> <antonio.fischetti@intel.com>
> Date: Thursday, August 3, 2017 at 8:10 AM
> To: Ilya Maximets <i.maximets@samsung.com>, "ovs-dev@openvswitch.org" <ovs-
> dev@openvswitch.org>
> Cc: Heetae Ahn <heetae82.ahn@samsung.com>
> Subject: Re: [ovs-dev] [PATCH v2 2/2] dpif-netdev: Fix emc replacement policy.
> 
>     LGTM.
> 
>     I think this patch could work fine in conjunction with the one I posted
>     https://urldefense.proofpoint.com/v2/url?u=https-
> 3A__mail.openvswitch.org_pipermail_ovs-2Ddev_2017-
> 2DJuly_335940.html&d=DwICAg&c=uilaK90D4TOVoH58JNXRgQ&r=BVhFA09CGX7JQ5Ih-
> uZnsw&m=gPngVJNy2q5ZxGlA8YyqJvmxn6ITNf2p_ExFf4bI1qA&s=Ro4bZVnRtHb2-
> gSch_oxiU5ypxPWW8ONyoMeuXml4Dk&e=
>     where I'm targeting a congestion usecase with recirculated packets.
>     The goal is still to limit thrashing and the criteria is to avoid
>     EMC lookup and insertions for recirculated packets.
> 
>     I gave a try to your 2 patches, I'm using
>     Commit 325b2b1a493a2230072de726bbb53a8337759f39
> 
>     On top of that I applied Ciara's patch "dpif-netdev: add EMC entry count
>     and %full figure to pmd-stats-show" at:
>     https://urldefense.proofpoint.com/v2/url?u=https-
> 3A__mail.openvswitch.org_pipermail_ovs-2Ddev_2017-
> 2DJanuary_327570.html&d=DwICAg&c=uilaK90D4TOVoH58JNXRgQ&r=BVhFA09CGX7JQ5Ih-
> uZnsw&m=gPngVJNy2q5ZxGlA8YyqJvmxn6ITNf2p_ExFf4bI1qA&s=Np71SDnUWCgZNENFZ0A95iSWD
> qHN9SBtRso9UU6Wzlk&e=
>     because I wanted to see the count of EMC entries.
> 
>     I generated the different traffic streams by looping on IP addresses.
>     Of course results are strictly dependent on the particular data traffic.
> 
>     I sent traffic 64B UDP packets at the line rate and measured the Rx packet
>     rate, regardless of pkt loss.
> 
>     Flow setup:
>     table=0, in_port=dpdk0 actions=output:dpdk1
>     table=0, in_port=dpdk1 actions=output:dpdk0
> 
>     2 PMDs, 3 Tx queues.
> 
>     To read entries counts:
>      sudo ./utilities/ovs-appctl dpif-netdev/pmd-stats-show | grep entries
> 
>     Results
>     =======
> 
>             +=========================+============================
>             |  Orig + Ciara's patch   |  + these 2 patches
>     ========+=========================+============================
>     Streams | Entries      Rx [Mpps]  |   Entries      Rx [Mpps]
>     --------+-------------------------+----------------------------
>     100     | 100, 100    9.76, 10.05 |   100, 100    10.55, 10.72
>     1000    | 1000, 999   7.85, 8.09  |   1000, 1000  8.77, 8.91
>     3000    | 2993, 2987  7.61, 7.87  |   3000, 3000  8.58, 7.89
>     5000    | 4336, 4872  7.49, 7.26  |   4496, 5000  8.57, 7.28
>     7000    | 5870, 7000  8.57, 7.20  |   6039, 7000  8.56, 7.26
>     9000    | 6550, 7572  7.19, 6.91  |   6643, 7836  7.06, 6.90
>     11000   | 7152, 8192  6.81, 6.83  |   7158, 8192  6.81, 6.79
>     --------+-------------------------+----------------------------
> 
>     It's interesting to see how these patches allow to use more EMC locations -
>     see cases from 3000 to 9000 streams - so reducing thrashing.
> 
> 
>     -Antonio
> 
> 
>     > -----Original Message-----
>     > From: ovs-dev-bounces@openvswitch.org [mailto:ovs-dev-
> bounces@openvswitch.org]
>     > On Behalf Of Ilya Maximets
>     > Sent: Monday, July 31, 2017 3:41 PM
>     > To: ovs-dev@openvswitch.org
>     > Cc: Heetae Ahn <heetae82.ahn@samsung.com>; Ilya Maximets
>     > <i.maximets@samsung.com>
>     > Subject: [ovs-dev] [PATCH v2 2/2] dpif-netdev: Fix emc replacement
> policy.
>     >
>     > Current EMC replacement policy allows to replace active EMC entry
>     > even if there are dead (empty) entries available. This leads to
>     > EMC trashing even on few hundreds of flows. In some cases PMD
>     > threads starts to execute classifier lookups even in tests with
>     > 50 - 100 active flows.
>     >
>     > Looks like the hash comparison rule was introduced to randomly
>     > choose one of alive entries to replace. But it doesn't work as
>     > needed and also hashes has nothing common with randomness.
>     >
>     > Lets fix the replacement policy by removing hash checking and
>     > using the random value passed from 'emc_probabilistic_insert()'
>     > only while considering replace of the alive entry.
> 
>     [Antonio]
>     I like the approach to re-use the info we already have, in this case
>     the random value.
> 
>     > This should give us nearly fair way to choose the entry to replace.
>     >
>     > We are avoiding calculation of the new random value by reusing
>     > bits of already generated random for probabilistic EMC insertion.
>     > Bits higher than 'EM_FLOW_INSERT_INV_PROB_SHIFT' are used because
>     > lower bits are less than 'min' and not fully random.
>     >
>     > Not replacing of alive entries while dead ones exists allows to
>     > significantly decrease EMC trashing.
>     >
>     > Testing shows stable work of exact match cache without misses
>     > with up to 3072 - 6144 active flows (depends on traffic pattern).
>     >
>     > Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
>     > ---
>     >  lib/dpif-netdev.c | 23 ++++++++++++++++-------
>     >  1 file changed, 16 insertions(+), 7 deletions(-)
>     >
>     > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>     > index 123a7c9..a714329 100644
>     > --- a/lib/dpif-netdev.c
>     > +++ b/lib/dpif-netdev.c
>     > @@ -155,6 +155,9 @@ struct netdev_flow_key {
>     >  #define EM_FLOW_INSERT_INV_PROB_SHIFT 20
>     >  #define EM_FLOW_INSERT_INV_PROB_MAX  (1 <<
> EM_FLOW_INSERT_INV_PROB_SHIFT)
>     >  #define EM_FLOW_INSERT_INV_PROB_MASK (EM_FLOW_INSERT_INV_PROB_MAX - 1)
>     > +/* We will use bits higher than EM_FLOW_INSERT_INV_PROB_SHIFT of the
> random
>     > + * value for EMC replacement policy. */
>     > +BUILD_ASSERT_DECL(32 - EM_FLOW_INSERT_INV_PROB_SHIFT >=
> EM_FLOW_HASH_SEGS);
>     >  /* Default EMC insert probability is 1 / DEFAULT_EM_FLOW_INSERT_INV_PROB
> */
>     >  #define DEFAULT_EM_FLOW_INSERT_INV_PROB 100
>     >  #define DEFAULT_EM_FLOW_INSERT_MIN (EM_FLOW_INSERT_INV_PROB_MAX /
> \
>     > @@ -2041,7 +2044,7 @@ emc_change_entry(struct emc_entry *ce, struct
>     > dp_netdev_flow *flow,
>     >
>     >  static inline void
>     >  emc_insert(struct emc_cache *cache, const struct netdev_flow_key *key,
>     > -           struct dp_netdev_flow *flow)
>     > +           struct dp_netdev_flow *flow, uint32_t random_value)
>     >  {
>     >      struct emc_entry *to_be_replaced = NULL;
>     >      struct emc_entry *current_entry;
>     > @@ -2054,12 +2057,12 @@ emc_insert(struct emc_cache *cache, const struct
>     > netdev_flow_key *key,
>     >          }
>     >
>     >          /* Replacement policy: put the flow in an empty (not alive)
> entry, or
>     > -         * in the first entry where it can be */
>     > +         * in the random entry where it can be */
>     >          if (!to_be_replaced
>     >              || (emc_entry_alive(to_be_replaced)
>     > -                && !emc_entry_alive(current_entry))
>     > -            || current_entry->key.hash < to_be_replaced->key.hash) {
>     > +                && (!emc_entry_alive(current_entry) || random_value &
> 1))) {
>     >              to_be_replaced = current_entry;
>     > +            random_value >>= 1;
>     >          }
>     >      }
>     >      /* We didn't find the miniflow in the cache.
>     > @@ -2077,11 +2080,17 @@ emc_probabilistic_insert(struct
> dp_netdev_pmd_thread
>     > *pmd,
>     >       * default the value is UINT32_MAX / 100 which yields an insertion
>     >       * probability of 1/100 ie. 1% */
>     >
>     > -    uint32_t min;
>     > +    uint32_t min, random_value;
>     >      atomic_read_relaxed(&pmd->dp->emc_insert_min, &min);
>     >
>     > -    if (min && (random_uint32() & EM_FLOW_INSERT_INV_PROB_MASK) < min) {
>     > -        emc_insert(&pmd->flow_cache, key, flow);
>     > +    if (!min) {
>     > +        return;
>     > +    }
>     > +
>     > +    random_value = random_uint32();
>     > +    if ((random_value & EM_FLOW_INSERT_INV_PROB_MASK) < min) {
>     > +        emc_insert(&pmd->flow_cache, key, flow,
>     > +                   random_value >> EM_FLOW_INSERT_INV_PROB_SHIFT);
>     >      }
>     >  }
>     >
>     > --
>     > 2.7.4
>     >
>     > _______________________________________________
>     > dev mailing list
>     > dev@openvswitch.org
>     > https://urldefense.proofpoint.com/v2/url?u=https-
> 3A__mail.openvswitch.org_mailman_listinfo_ovs-
> 2Ddev&d=DwICAg&c=uilaK90D4TOVoH58JNXRgQ&r=BVhFA09CGX7JQ5Ih-
> uZnsw&m=gPngVJNy2q5ZxGlA8YyqJvmxn6ITNf2p_ExFf4bI1qA&s=catT2G8FNT5ScEbzPLhiCPWCH
> 1Lk3A9BN4tt0egFtTo&e=
>     _______________________________________________
>     dev mailing list
>     dev@openvswitch.org
>     https://urldefense.proofpoint.com/v2/url?u=https-
> 3A__mail.openvswitch.org_mailman_listinfo_ovs-
> 2Ddev&d=DwICAg&c=uilaK90D4TOVoH58JNXRgQ&r=BVhFA09CGX7JQ5Ih-
> uZnsw&m=gPngVJNy2q5ZxGlA8YyqJvmxn6ITNf2p_ExFf4bI1qA&s=catT2G8FNT5ScEbzPLhiCPWCH
> 1Lk3A9BN4tt0egFtTo&e=
>
diff mbox

Patch

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 123a7c9..a714329 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -155,6 +155,9 @@  struct netdev_flow_key {
 #define EM_FLOW_INSERT_INV_PROB_SHIFT 20
 #define EM_FLOW_INSERT_INV_PROB_MAX  (1 << EM_FLOW_INSERT_INV_PROB_SHIFT)
 #define EM_FLOW_INSERT_INV_PROB_MASK (EM_FLOW_INSERT_INV_PROB_MAX - 1)
+/* We will use bits higher than EM_FLOW_INSERT_INV_PROB_SHIFT of the random
+ * value for EMC replacement policy. */
+BUILD_ASSERT_DECL(32 - EM_FLOW_INSERT_INV_PROB_SHIFT >= EM_FLOW_HASH_SEGS);
 /* Default EMC insert probability is 1 / DEFAULT_EM_FLOW_INSERT_INV_PROB */
 #define DEFAULT_EM_FLOW_INSERT_INV_PROB 100
 #define DEFAULT_EM_FLOW_INSERT_MIN (EM_FLOW_INSERT_INV_PROB_MAX /        \
@@ -2041,7 +2044,7 @@  emc_change_entry(struct emc_entry *ce, struct dp_netdev_flow *flow,
 
 static inline void
 emc_insert(struct emc_cache *cache, const struct netdev_flow_key *key,
-           struct dp_netdev_flow *flow)
+           struct dp_netdev_flow *flow, uint32_t random_value)
 {
     struct emc_entry *to_be_replaced = NULL;
     struct emc_entry *current_entry;
@@ -2054,12 +2057,12 @@  emc_insert(struct emc_cache *cache, const struct netdev_flow_key *key,
         }
 
         /* Replacement policy: put the flow in an empty (not alive) entry, or
-         * in the first entry where it can be */
+         * in the random entry where it can be */
         if (!to_be_replaced
             || (emc_entry_alive(to_be_replaced)
-                && !emc_entry_alive(current_entry))
-            || current_entry->key.hash < to_be_replaced->key.hash) {
+                && (!emc_entry_alive(current_entry) || random_value & 1))) {
             to_be_replaced = current_entry;
+            random_value >>= 1;
         }
     }
     /* We didn't find the miniflow in the cache.
@@ -2077,11 +2080,17 @@  emc_probabilistic_insert(struct dp_netdev_pmd_thread *pmd,
      * default the value is UINT32_MAX / 100 which yields an insertion
      * probability of 1/100 ie. 1% */
 
-    uint32_t min;
+    uint32_t min, random_value;
     atomic_read_relaxed(&pmd->dp->emc_insert_min, &min);
 
-    if (min && (random_uint32() & EM_FLOW_INSERT_INV_PROB_MASK) < min) {
-        emc_insert(&pmd->flow_cache, key, flow);
+    if (!min) {
+        return;
+    }
+
+    random_value = random_uint32();
+    if ((random_value & EM_FLOW_INSERT_INV_PROB_MASK) < min) {
+        emc_insert(&pmd->flow_cache, key, flow,
+                   random_value >> EM_FLOW_INSERT_INV_PROB_SHIFT);
     }
 }