diff mbox

[ovs-dev,v3,1/2] dpif-netdev: Decrease range of values for EMC probability.

Message ID 1501856248-28555-2-git-send-email-i.maximets@samsung.com
State Deferred
Delegated to: Ian Stokes
Headers show

Commit Message

Ilya Maximets Aug. 4, 2017, 2:17 p.m. UTC
Currently, real insertion probability is higher than configured
for the maximum case because of wrong usage of the random value.

i.e. if 'emc-invert-inv-prob' == UINT32_MAX, then 'emc_insert_min'
equals to 1. In this case we're allowing insert if random vailue
is less or equal to 1. So, two of 2**32 values (0 and 1) are
allowed and real probability is 2 times higher than configured.

This happens because 'random_uint32()' returns value in range
[0; UINT32_MAX], but for the checking to be correct we should
generate random value in range [0; UINT32_MAX - 1].

To fix this we have 4 possible solutions:

 1. need to use uint64_t for 'emc-insert-min' and calculate it
    as '(UINT32_MAX + 1) / inverse_prob' to fairly check the full
    range [0; UINT32_MAX].

    This may decrease performance becaue of 64 bit atomic ops.

 2. Forbid the '2**32 - 1' as the value for 'emc-insert-min'
    because it's the only value we have issues with.

    This will require additional explanations and not very friendly
    for users.

 3. Generate random value in range [0; UINT32_MAX - 1].

    This will require heavy division operation.

 4. Decrease the range of available values for 'emc-insert-inv-prob'.

    Actually, we don't need to have so much different values for
    that option. I beleve that values higher than 1M are completely
    useless. Choosing the upper limit as a power of 2 like 2**20 we
    will be able to mask the generated random value in a fast way
    and also avoid range issue, because same uint32_t can be used to
    store 2**20.

This patch implements solution #4.

CC: Ciara Loftus <ciara.loftus@intel.com>
Fixes: 4c30b24602c3 ("dpif-netdev: Conditional EMC insert")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Antonio Fischetti <antonio.fischetti@intel.com>
---

Infrastructure and logic introduced here will be used for fixing
emc replacement policy.

 lib/dpif-netdev.c    | 14 ++++++++++----
 vswitchd/vswitch.xml |  3 ++-
 2 files changed, 12 insertions(+), 5 deletions(-)

Comments

Darrell Ball Aug. 4, 2017, 6:34 p.m. UTC | #1
-----Original Message-----
From: Ilya Maximets <i.maximets@samsung.com>

Date: Friday, August 4, 2017 at 7:17 AM
To: "ovs-dev@openvswitch.org" <ovs-dev@openvswitch.org>
Cc: Heetae Ahn <heetae82.ahn@samsung.com>, Darrell Ball <dball@vmware.com>, Yipeng Wang <yipeng1.wang@intel.com>, Kevin Traynor <ktraynor@redhat.com>, Ciara Loftus <ciara.loftus@intel.com>, Antonio Fischetti <antonio.fischetti@intel.com>, Ilya Maximets <i.maximets@samsung.com>
Subject: [PATCH v3 1/2] dpif-netdev: Decrease range of values for EMC probability.

    Currently, real insertion probability is higher than configured
    for the maximum case because of wrong usage of the random value.
    
    i.e. if 'emc-invert-inv-prob' == UINT32_MAX, then 'emc_insert_min'
    equals to 1. In this case we're allowing insert if random vailue
    is less or equal to 1. So, two of 2**32 values (0 and 1) are
    allowed and real probability is 2 times higher than configured.
    
    This happens because 'random_uint32()' returns value in range
    [0; UINT32_MAX], but for the checking to be correct we should
    generate random value in range [0; UINT32_MAX - 1].


I understand the calculation is slightly off.
If the user enters 4,294,967,295 then the probability to insert into emc will be 
2 out of 4,294,967,295 rather than 1 out of 4,294,967,295,

However, is there a general concern about such a low probability anyways ?
This max inverse value would be rarely, if ever used and if used, it would be impossible
to see the difference in any real use case.

The user might as well just disable emc rather than use such tiny probabilities.

This existing api was discussed extensively and was very contentious.

However, if patch 2 really has an absolute dependency on this patch 1, we can include it.
I have done various testing and don’t see that, but I have some comments on the
other threads.

    
    To fix this we have 4 possible solutions:
    
     1. need to use uint64_t for 'emc-insert-min' and calculate it
        as '(UINT32_MAX + 1) / inverse_prob' to fairly check the full
        range [0; UINT32_MAX].
    
        This may decrease performance becaue of 64 bit atomic ops.
    
     2. Forbid the '2**32 - 1' as the value for 'emc-insert-min'
        because it's the only value we have issues with.
    
        This will require additional explanations and not very friendly
        for users.
    
     3. Generate random value in range [0; UINT32_MAX - 1].
    
        This will require heavy division operation.
    
     4. Decrease the range of available values for 'emc-insert-inv-prob'.
    
        Actually, we don't need to have so much different values for
        that option. I beleve that values higher than 1M are completely
        useless. Choosing the upper limit as a power of 2 like 2**20 we
        will be able to mask the generated random value in a fast way
        and also avoid range issue, because same uint32_t can be used to
        store 2**20.
    
    This patch implements solution #4.
    
    CC: Ciara Loftus <ciara.loftus@intel.com>
    Fixes: 4c30b24602c3 ("dpif-netdev: Conditional EMC insert")
    Signed-off-by: Ilya Maximets <i.maximets@samsung.com>

    Acked-by: Antonio Fischetti <antonio.fischetti@intel.com>

    ---
    
    Infrastructure and logic introduced here will be used for fixing
    emc replacement policy.
    
     lib/dpif-netdev.c    | 14 ++++++++++----
     vswitchd/vswitch.xml |  3 ++-
     2 files changed, 12 insertions(+), 5 deletions(-)
    
    diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
    index 8ecc9d1..e23c674 100644
    --- a/lib/dpif-netdev.c
    +++ b/lib/dpif-netdev.c
    @@ -153,9 +153,14 @@ struct netdev_flow_key {
     #define EM_FLOW_HASH_MASK (EM_FLOW_HASH_ENTRIES - 1)
     #define EM_FLOW_HASH_SEGS 2
     
    +/* Set up maximum inverse EMC insertion probability to 2^20 - 1.
    + * Higher values considered useless in practice. */
    +#define EM_FLOW_INSERT_INV_PROB_SHIFT 20
    +#define EM_FLOW_INSERT_INV_PROB_MAX  (1 << EM_FLOW_INSERT_INV_PROB_SHIFT)
    +#define EM_FLOW_INSERT_INV_PROB_MASK (EM_FLOW_INSERT_INV_PROB_MAX - 1)
     /* Default EMC insert probability is 1 / DEFAULT_EM_FLOW_INSERT_INV_PROB */
     #define DEFAULT_EM_FLOW_INSERT_INV_PROB 100
    -#define DEFAULT_EM_FLOW_INSERT_MIN (UINT32_MAX /                     \
    +#define DEFAULT_EM_FLOW_INSERT_MIN (EM_FLOW_INSERT_INV_PROB_MAX /        \
                                         DEFAULT_EM_FLOW_INSERT_INV_PROB)
     
     struct emc_entry {
    @@ -2092,7 +2097,7 @@ emc_probabilistic_insert(struct dp_netdev_pmd_thread *pmd,
         uint32_t min;
         atomic_read_relaxed(&pmd->dp->emc_insert_min, &min);
     
    -    if (min && random_uint32() <= min) {
    +    if (min && (random_uint32() & EM_FLOW_INSERT_INV_PROB_MASK) < min) {
             emc_insert(&pmd->flow_cache, key, flow);
         }
     }
    @@ -2909,8 +2914,9 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
         }
     
         atomic_read_relaxed(&dp->emc_insert_min, &cur_min);
    -    if (insert_prob <= UINT32_MAX) {
    -        insert_min = insert_prob == 0 ? 0 : UINT32_MAX / insert_prob;
    +    if (insert_prob < EM_FLOW_INSERT_INV_PROB_MAX) {
    +        insert_min = insert_prob == 0
    +                     ? 0 : EM_FLOW_INSERT_INV_PROB_MAX / insert_prob;
         } else {
             insert_min = DEFAULT_EM_FLOW_INSERT_MIN;
             insert_prob = DEFAULT_EM_FLOW_INSERT_INV_PROB;
    diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
    index 074535b..61f252e 100644
    --- a/vswitchd/vswitch.xml
    +++ b/vswitchd/vswitch.xml
    @@ -381,7 +381,8 @@
           </column>
     
           <column name="other_config" key="emc-insert-inv-prob"
    -              type='{"type": "integer", "minInteger": 0, "maxInteger": 4294967295}'>
    +              type='{"type": "integer",
    +                     "minInteger": 0, "maxInteger": 1048575}'>
             <p>
               Specifies the inverse probability (1/emc-insert-inv-prob) of a flow
               being inserted into the Exact Match Cache (EMC). On average one in
    -- 
    2.7.4
Wang, Yipeng1 Aug. 4, 2017, 8:39 p.m. UTC | #2
> -----Original Message-----

> From: Darrell Ball [mailto:dball@vmware.com]

> Sent: Friday, August 4, 2017 11:35 AM

> To: Ilya Maximets <i.maximets@samsung.com>; ovs-dev@openvswitch.org

> Cc: Heetae Ahn <heetae82.ahn@samsung.com>; Wang, Yipeng1

> <yipeng1.wang@intel.com>; Kevin Traynor <ktraynor@redhat.com>; Loftus,

> Ciara <ciara.loftus@intel.com>; Fischetti, Antonio

> <antonio.fischetti@intel.com>

> Subject: Re: [PATCH v3 1/2] dpif-netdev: Decrease range of values for EMC

> probability.

> 

> 

> 

> -----Original Message-----

> From: Ilya Maximets <i.maximets@samsung.com>

> Date: Friday, August 4, 2017 at 7:17 AM

> To: "ovs-dev@openvswitch.org" <ovs-dev@openvswitch.org>

> Cc: Heetae Ahn <heetae82.ahn@samsung.com>, Darrell Ball

> <dball@vmware.com>, Yipeng Wang <yipeng1.wang@intel.com>, Kevin

> Traynor <ktraynor@redhat.com>, Ciara Loftus <ciara.loftus@intel.com>,

> Antonio Fischetti <antonio.fischetti@intel.com>, Ilya Maximets

> <i.maximets@samsung.com>

> Subject: [PATCH v3 1/2] dpif-netdev: Decrease range of values for EMC

> probability.

> 

>     Currently, real insertion probability is higher than configured

>     for the maximum case because of wrong usage of the random value.

> 

>     i.e. if 'emc-invert-inv-prob' == UINT32_MAX, then 'emc_insert_min'

>     equals to 1. In this case we're allowing insert if random vailue

>     is less or equal to 1. So, two of 2**32 values (0 and 1) are

>     allowed and real probability is 2 times higher than configured.

> 

>     This happens because 'random_uint32()' returns value in range

>     [0; UINT32_MAX], but for the checking to be correct we should

>     generate random value in range [0; UINT32_MAX - 1].

> 

> 

> I understand the calculation is slightly off.

> If the user enters 4,294,967,295 then the probability to insert into emc will be

> 2 out of 4,294,967,295 rather than 1 out of 4,294,967,295,

> 

> However, is there a general concern about such a low probability anyways ?

> This max inverse value would be rarely, if ever used and if used, it would be

> impossible

> to see the difference in any real use case.

> 

> The user might as well just disable emc rather than use such tiny probabilities.

> 

> This existing api was discussed extensively and was very contentious.

> 

> However, if patch 2 really has an absolute dependency on this patch 1, we

> can include it.

> I have done various testing and don’t see that, but I have some comments

> on the

> other threads.

> 

> 

[Wang, Yipeng] The dependency I can tell is that when do EMC_insert, the random bit is chosen by (random_value >> EM_FLOW_INSERT_INV_PROB_SHIFT & 1) in next patch, which means that the random bit will be "fully independent". 
If it is the only dependency, I guess it is fine to even not include this patch.

>     To fix this we have 4 possible solutions:

> 

>      1. need to use uint64_t for 'emc-insert-min' and calculate it

>         as '(UINT32_MAX + 1) / inverse_prob' to fairly check the full

>         range [0; UINT32_MAX].

> 

>         This may decrease performance becaue of 64 bit atomic ops.

> 

>      2. Forbid the '2**32 - 1' as the value for 'emc-insert-min'

>         because it's the only value we have issues with.

> 

>         This will require additional explanations and not very friendly

>         for users.

> 

>      3. Generate random value in range [0; UINT32_MAX - 1].

> 

>         This will require heavy division operation.

> 

>      4. Decrease the range of available values for 'emc-insert-inv-prob'.

> 

>         Actually, we don't need to have so much different values for

>         that option. I beleve that values higher than 1M are completely

>         useless. Choosing the upper limit as a power of 2 like 2**20 we

>         will be able to mask the generated random value in a fast way

>         and also avoid range issue, because same uint32_t can be used to

>         store 2**20.

> 

>     This patch implements solution #4.

> 

>     CC: Ciara Loftus <ciara.loftus@intel.com>

>     Fixes: 4c30b24602c3 ("dpif-netdev: Conditional EMC insert")

>     Signed-off-by: Ilya Maximets <i.maximets@samsung.com>

>     Acked-by: Antonio Fischetti <antonio.fischetti@intel.com>

>     ---

> 

>     Infrastructure and logic introduced here will be used for fixing

>     emc replacement policy.

> 

>      lib/dpif-netdev.c    | 14 ++++++++++----

>      vswitchd/vswitch.xml |  3 ++-

>      2 files changed, 12 insertions(+), 5 deletions(-)

> 

>     diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c

>     index 8ecc9d1..e23c674 100644

>     --- a/lib/dpif-netdev.c

>     +++ b/lib/dpif-netdev.c

>     @@ -153,9 +153,14 @@ struct netdev_flow_key {

>      #define EM_FLOW_HASH_MASK (EM_FLOW_HASH_ENTRIES - 1)

>      #define EM_FLOW_HASH_SEGS 2

> 

>     +/* Set up maximum inverse EMC insertion probability to 2^20 - 1.

>     + * Higher values considered useless in practice. */

>     +#define EM_FLOW_INSERT_INV_PROB_SHIFT 20

>     +#define EM_FLOW_INSERT_INV_PROB_MAX  (1 <<

> EM_FLOW_INSERT_INV_PROB_SHIFT)

>     +#define EM_FLOW_INSERT_INV_PROB_MASK

> (EM_FLOW_INSERT_INV_PROB_MAX - 1)

>      /* Default EMC insert probability is 1 /

> DEFAULT_EM_FLOW_INSERT_INV_PROB */

>      #define DEFAULT_EM_FLOW_INSERT_INV_PROB 100

>     -#define DEFAULT_EM_FLOW_INSERT_MIN (UINT32_MAX /                     \

>     +#define DEFAULT_EM_FLOW_INSERT_MIN

> (EM_FLOW_INSERT_INV_PROB_MAX /        \

>                                          DEFAULT_EM_FLOW_INSERT_INV_PROB)

> 

>      struct emc_entry {

>     @@ -2092,7 +2097,7 @@ emc_probabilistic_insert(struct

> dp_netdev_pmd_thread *pmd,

>          uint32_t min;

>          atomic_read_relaxed(&pmd->dp->emc_insert_min, &min);

> 

>     -    if (min && random_uint32() <= min) {

>     +    if (min && (random_uint32() & EM_FLOW_INSERT_INV_PROB_MASK)

> < min) {

>              emc_insert(&pmd->flow_cache, key, flow);

>          }

>      }

>     @@ -2909,8 +2914,9 @@ dpif_netdev_set_config(struct dpif *dpif, const

> struct smap *other_config)

>          }

> 

>          atomic_read_relaxed(&dp->emc_insert_min, &cur_min);

>     -    if (insert_prob <= UINT32_MAX) {

>     -        insert_min = insert_prob == 0 ? 0 : UINT32_MAX / insert_prob;

>     +    if (insert_prob < EM_FLOW_INSERT_INV_PROB_MAX) {

>     +        insert_min = insert_prob == 0

>     +                     ? 0 : EM_FLOW_INSERT_INV_PROB_MAX / insert_prob;

>          } else {

>              insert_min = DEFAULT_EM_FLOW_INSERT_MIN;

>              insert_prob = DEFAULT_EM_FLOW_INSERT_INV_PROB;

>     diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml

>     index 074535b..61f252e 100644

>     --- a/vswitchd/vswitch.xml

>     +++ b/vswitchd/vswitch.xml

>     @@ -381,7 +381,8 @@

>            </column>

> 

>            <column name="other_config" key="emc-insert-inv-prob"

>     -              type='{"type": "integer", "minInteger": 0, "maxInteger":

> 4294967295}'>

>     +              type='{"type": "integer",

>     +                     "minInteger": 0, "maxInteger": 1048575}'>

>              <p>

>                Specifies the inverse probability (1/emc-insert-inv-prob) of a flow

>                being inserted into the Exact Match Cache (EMC). On average one in

>     --

>     2.7.4

> 

> 

> 

> 

> 

> 

> 

>
Ilya Maximets Aug. 7, 2017, 11:54 a.m. UTC | #3
On 04.08.2017 23:39, Wang, Yipeng1 wrote:
> 
> 
>> -----Original Message-----
>> From: Darrell Ball [mailto:dball@vmware.com]
>> Sent: Friday, August 4, 2017 11:35 AM
>> To: Ilya Maximets <i.maximets@samsung.com>; ovs-dev@openvswitch.org
>> Cc: Heetae Ahn <heetae82.ahn@samsung.com>; Wang, Yipeng1
>> <yipeng1.wang@intel.com>; Kevin Traynor <ktraynor@redhat.com>; Loftus,
>> Ciara <ciara.loftus@intel.com>; Fischetti, Antonio
>> <antonio.fischetti@intel.com>
>> Subject: Re: [PATCH v3 1/2] dpif-netdev: Decrease range of values for EMC
>> probability.
>>
>>
>>
>> -----Original Message-----
>> From: Ilya Maximets <i.maximets@samsung.com>
>> Date: Friday, August 4, 2017 at 7:17 AM
>> To: "ovs-dev@openvswitch.org" <ovs-dev@openvswitch.org>
>> Cc: Heetae Ahn <heetae82.ahn@samsung.com>, Darrell Ball
>> <dball@vmware.com>, Yipeng Wang <yipeng1.wang@intel.com>, Kevin
>> Traynor <ktraynor@redhat.com>, Ciara Loftus <ciara.loftus@intel.com>,
>> Antonio Fischetti <antonio.fischetti@intel.com>, Ilya Maximets
>> <i.maximets@samsung.com>
>> Subject: [PATCH v3 1/2] dpif-netdev: Decrease range of values for EMC
>> probability.
>>
>>     Currently, real insertion probability is higher than configured
>>     for the maximum case because of wrong usage of the random value.
>>
>>     i.e. if 'emc-invert-inv-prob' == UINT32_MAX, then 'emc_insert_min'
>>     equals to 1. In this case we're allowing insert if random vailue
>>     is less or equal to 1. So, two of 2**32 values (0 and 1) are
>>     allowed and real probability is 2 times higher than configured.
>>
>>     This happens because 'random_uint32()' returns value in range
>>     [0; UINT32_MAX], but for the checking to be correct we should
>>     generate random value in range [0; UINT32_MAX - 1].
>>
>>
>> I understand the calculation is slightly off.
>> If the user enters 4,294,967,295 then the probability to insert into emc will be
>> 2 out of 4,294,967,295 rather than 1 out of 4,294,967,295,
>>
>> However, is there a general concern about such a low probability anyways ?
>> This max inverse value would be rarely, if ever used and if used, it would be
>> impossible
>> to see the difference in any real use case.
>>
>> The user might as well just disable emc rather than use such tiny probabilities.
>>
>> This existing api was discussed extensively and was very contentious.
>>
>> However, if patch 2 really has an absolute dependency on this patch 1, we
>> can include it.
>> I have done various testing and don’t see that, but I have some comments
>> on the
>> other threads.
>>
>>
> [Wang, Yipeng] The dependency I can tell is that when do EMC_insert, the random bit is chosen by (random_value >> EM_FLOW_INSERT_INV_PROB_SHIFT & 1) in next patch, which means that the random bit will be "fully independent".

Yes. The dependency is that bits of random value that passes (random_value < min) are
not fully random. Using them we may have additional issues with random slot choosing.
Even if we'll have large enough 'min' value we may have worse distribution of lower
bits than with original PRNG implemented in lib/random.

> If it is the only dependency, I guess it is fine to even not include this patch.
> 
>>     To fix this we have 4 possible solutions:
>>
>>      1. need to use uint64_t for 'emc-insert-min' and calculate it
>>         as '(UINT32_MAX + 1) / inverse_prob' to fairly check the full
>>         range [0; UINT32_MAX].
>>
>>         This may decrease performance becaue of 64 bit atomic ops.
>>
>>      2. Forbid the '2**32 - 1' as the value for 'emc-insert-min'
>>         because it's the only value we have issues with.
>>
>>         This will require additional explanations and not very friendly
>>         for users.
>>
>>      3. Generate random value in range [0; UINT32_MAX - 1].
>>
>>         This will require heavy division operation.
>>
>>      4. Decrease the range of available values for 'emc-insert-inv-prob'.
>>
>>         Actually, we don't need to have so much different values for
>>         that option. I beleve that values higher than 1M are completely
>>         useless. Choosing the upper limit as a power of 2 like 2**20 we
>>         will be able to mask the generated random value in a fast way
>>         and also avoid range issue, because same uint32_t can be used to
>>         store 2**20.
>>
>>     This patch implements solution #4.
>>
>>     CC: Ciara Loftus <ciara.loftus@intel.com>
>>     Fixes: 4c30b24602c3 ("dpif-netdev: Conditional EMC insert")
>>     Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
>>     Acked-by: Antonio Fischetti <antonio.fischetti@intel.com>
>>     ---
>>
>>     Infrastructure and logic introduced here will be used for fixing
>>     emc replacement policy.
>>
>>      lib/dpif-netdev.c    | 14 ++++++++++----
>>      vswitchd/vswitch.xml |  3 ++-
>>      2 files changed, 12 insertions(+), 5 deletions(-)
>>
>>     diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>     index 8ecc9d1..e23c674 100644
>>     --- a/lib/dpif-netdev.c
>>     +++ b/lib/dpif-netdev.c
>>     @@ -153,9 +153,14 @@ struct netdev_flow_key {
>>      #define EM_FLOW_HASH_MASK (EM_FLOW_HASH_ENTRIES - 1)
>>      #define EM_FLOW_HASH_SEGS 2
>>
>>     +/* Set up maximum inverse EMC insertion probability to 2^20 - 1.
>>     + * Higher values considered useless in practice. */
>>     +#define EM_FLOW_INSERT_INV_PROB_SHIFT 20
>>     +#define EM_FLOW_INSERT_INV_PROB_MAX  (1 <<
>> EM_FLOW_INSERT_INV_PROB_SHIFT)
>>     +#define EM_FLOW_INSERT_INV_PROB_MASK
>> (EM_FLOW_INSERT_INV_PROB_MAX - 1)
>>      /* Default EMC insert probability is 1 /
>> DEFAULT_EM_FLOW_INSERT_INV_PROB */
>>      #define DEFAULT_EM_FLOW_INSERT_INV_PROB 100
>>     -#define DEFAULT_EM_FLOW_INSERT_MIN (UINT32_MAX /                     \
>>     +#define DEFAULT_EM_FLOW_INSERT_MIN
>> (EM_FLOW_INSERT_INV_PROB_MAX /        \
>>                                          DEFAULT_EM_FLOW_INSERT_INV_PROB)
>>
>>      struct emc_entry {
>>     @@ -2092,7 +2097,7 @@ emc_probabilistic_insert(struct
>> dp_netdev_pmd_thread *pmd,
>>          uint32_t min;
>>          atomic_read_relaxed(&pmd->dp->emc_insert_min, &min);
>>
>>     -    if (min && random_uint32() <= min) {
>>     +    if (min && (random_uint32() & EM_FLOW_INSERT_INV_PROB_MASK)
>> < min) {
>>              emc_insert(&pmd->flow_cache, key, flow);
>>          }
>>      }
>>     @@ -2909,8 +2914,9 @@ dpif_netdev_set_config(struct dpif *dpif, const
>> struct smap *other_config)
>>          }
>>
>>          atomic_read_relaxed(&dp->emc_insert_min, &cur_min);
>>     -    if (insert_prob <= UINT32_MAX) {
>>     -        insert_min = insert_prob == 0 ? 0 : UINT32_MAX / insert_prob;
>>     +    if (insert_prob < EM_FLOW_INSERT_INV_PROB_MAX) {
>>     +        insert_min = insert_prob == 0
>>     +                     ? 0 : EM_FLOW_INSERT_INV_PROB_MAX / insert_prob;
>>          } else {
>>              insert_min = DEFAULT_EM_FLOW_INSERT_MIN;
>>              insert_prob = DEFAULT_EM_FLOW_INSERT_INV_PROB;
>>     diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
>>     index 074535b..61f252e 100644
>>     --- a/vswitchd/vswitch.xml
>>     +++ b/vswitchd/vswitch.xml
>>     @@ -381,7 +381,8 @@
>>            </column>
>>
>>            <column name="other_config" key="emc-insert-inv-prob"
>>     -              type='{"type": "integer", "minInteger": 0, "maxInteger":
>> 4294967295}'>
>>     +              type='{"type": "integer",
>>     +                     "minInteger": 0, "maxInteger": 1048575}'>
>>              <p>
>>                Specifies the inverse probability (1/emc-insert-inv-prob) of a flow
>>                being inserted into the Exact Match Cache (EMC). On average one in
>>     --
>>     2.7.4
>>
>>
>>
>>
>>
>>
>>
>>
>
Darrell Ball Aug. 8, 2017, 7:11 a.m. UTC | #4
-----Original Message-----
From: Ilya Maximets <i.maximets@samsung.com>

Date: Monday, August 7, 2017 at 4:54 AM
To: "Wang, Yipeng1" <yipeng1.wang@intel.com>, Darrell Ball <dball@vmware.com>, "ovs-dev@openvswitch.org" <ovs-dev@openvswitch.org>
Cc: Heetae Ahn <heetae82.ahn@samsung.com>, Kevin Traynor <ktraynor@redhat.com>, "Loftus, Ciara" <ciara.loftus@intel.com>, "Fischetti, Antonio" <antonio.fischetti@intel.com>
Subject: Re: [PATCH v3 1/2] dpif-netdev: Decrease range of values for EMC probability.

    On 04.08.2017 23:39, Wang, Yipeng1 wrote:
    > 

    > 

    >> -----Original Message-----

    >> From: Darrell Ball [mailto:dball@vmware.com]

    >> Sent: Friday, August 4, 2017 11:35 AM

    >> To: Ilya Maximets <i.maximets@samsung.com>; ovs-dev@openvswitch.org

    >> Cc: Heetae Ahn <heetae82.ahn@samsung.com>; Wang, Yipeng1

    >> <yipeng1.wang@intel.com>; Kevin Traynor <ktraynor@redhat.com>; Loftus,

    >> Ciara <ciara.loftus@intel.com>; Fischetti, Antonio

    >> <antonio.fischetti@intel.com>

    >> Subject: Re: [PATCH v3 1/2] dpif-netdev: Decrease range of values for EMC

    >> probability.

    >>

    >>

    >>

    >> -----Original Message-----

    >> From: Ilya Maximets <i.maximets@samsung.com>

    >> Date: Friday, August 4, 2017 at 7:17 AM

    >> To: "ovs-dev@openvswitch.org" <ovs-dev@openvswitch.org>

    >> Cc: Heetae Ahn <heetae82.ahn@samsung.com>, Darrell Ball

    >> <dball@vmware.com>, Yipeng Wang <yipeng1.wang@intel.com>, Kevin

    >> Traynor <ktraynor@redhat.com>, Ciara Loftus <ciara.loftus@intel.com>,

    >> Antonio Fischetti <antonio.fischetti@intel.com>, Ilya Maximets

    >> <i.maximets@samsung.com>

    >> Subject: [PATCH v3 1/2] dpif-netdev: Decrease range of values for EMC

    >> probability.

    >>

    >>     Currently, real insertion probability is higher than configured

    >>     for the maximum case because of wrong usage of the random value.

    >>

    >>     i.e. if 'emc-invert-inv-prob' == UINT32_MAX, then 'emc_insert_min'

    >>     equals to 1. In this case we're allowing insert if random vailue

    >>     is less or equal to 1. So, two of 2**32 values (0 and 1) are

    >>     allowed and real probability is 2 times higher than configured.

    >>

    >>     This happens because 'random_uint32()' returns value in range

    >>     [0; UINT32_MAX], but for the checking to be correct we should

    >>     generate random value in range [0; UINT32_MAX - 1].

    >>

    >>

    >> I understand the calculation is slightly off.

    >> If the user enters 4,294,967,295 then the probability to insert into emc will be

    >> 2 out of 4,294,967,295 rather than 1 out of 4,294,967,295,

    >>

    >> However, is there a general concern about such a low probability anyways ?

    >> This max inverse value would be rarely, if ever used and if used, it would be

    >> impossible

    >> to see the difference in any real use case.

    >>

    >> The user might as well just disable emc rather than use such tiny probabilities.

    >>

    >> This existing api was discussed extensively and was very contentious.

    >>

    >> However, if patch 2 really has an absolute dependency on this patch 1, we

    >> can include it.

    >> I have done various testing and don’t see that, but I have some comments

    >> on the

    >> other threads.

    >>

    >>

    > [Wang, Yipeng] The dependency I can tell is that when do EMC_insert, the random bit is chosen by (random_value >> EM_FLOW_INSERT_INV_PROB_SHIFT & 1) in next patch, which means that the random bit will be "fully independent".

    
    Yes. The dependency is that bits of random value that passes (random_value < min) are
    not fully random. Using them we may have additional issues with random slot choosing.
    Even if we'll have large enough 'min' value we may have worse distribution of lower
    bits than with original PRNG implemented in lib/random.

Well, that is the only dependency of Patch 2 on this patch, right.
This is of course the real reason behind patch 1.
What I was getting at is – based on my testing, even with a pseudorandom packet distribution, the extra “pseudorandomness”
did not do much, meaning even in this case the packets themselves dominate  - one distribution, mine, sees no benefit and another
Antonio’s sees some benefit. 
In a real packet dataset and temporal variances of packets received, I don’t really know if we would gain anything because the packet
distribution “pseudorandomness” dominates the “randomness” of the selection.

So, we might end up even having changing the user level api to get some extra randomness that may not have any benefit in real
scenarios. 

I will ask Antonio for his distribution algorithm.

    
    > If it is the only dependency, I guess it is fine to even not include this patch.

    > 

    >>     To fix this we have 4 possible solutions:

    >>

    >>      1. need to use uint64_t for 'emc-insert-min' and calculate it

    >>         as '(UINT32_MAX + 1) / inverse_prob' to fairly check the full

    >>         range [0; UINT32_MAX].

    >>

    >>         This may decrease performance becaue of 64 bit atomic ops.

    >>

    >>      2. Forbid the '2**32 - 1' as the value for 'emc-insert-min'

    >>         because it's the only value we have issues with.

    >>

    >>         This will require additional explanations and not very friendly

    >>         for users.

    >>

    >>      3. Generate random value in range [0; UINT32_MAX - 1].

    >>

    >>         This will require heavy division operation.

    >>

    >>      4. Decrease the range of available values for 'emc-insert-inv-prob'.

    >>

    >>         Actually, we don't need to have so much different values for

    >>         that option. I beleve that values higher than 1M are completely

    >>         useless. Choosing the upper limit as a power of 2 like 2**20 we

    >>         will be able to mask the generated random value in a fast way

    >>         and also avoid range issue, because same uint32_t can be used to

    >>         store 2**20.

    >>

    >>     This patch implements solution #4.

    >>

    >>     CC: Ciara Loftus <ciara.loftus@intel.com>

    >>     Fixes: 4c30b24602c3 ("dpif-netdev: Conditional EMC insert")

    >>     Signed-off-by: Ilya Maximets <i.maximets@samsung.com>

    >>     Acked-by: Antonio Fischetti <antonio.fischetti@intel.com>

    >>     ---

    >>

    >>     Infrastructure and logic introduced here will be used for fixing

    >>     emc replacement policy.

    >>

    >>      lib/dpif-netdev.c    | 14 ++++++++++----

    >>      vswitchd/vswitch.xml |  3 ++-

    >>      2 files changed, 12 insertions(+), 5 deletions(-)

    >>

    >>     diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c

    >>     index 8ecc9d1..e23c674 100644

    >>     --- a/lib/dpif-netdev.c

    >>     +++ b/lib/dpif-netdev.c

    >>     @@ -153,9 +153,14 @@ struct netdev_flow_key {

    >>      #define EM_FLOW_HASH_MASK (EM_FLOW_HASH_ENTRIES - 1)

    >>      #define EM_FLOW_HASH_SEGS 2

    >>

    >>     +/* Set up maximum inverse EMC insertion probability to 2^20 - 1.

    >>     + * Higher values considered useless in practice. */

    >>     +#define EM_FLOW_INSERT_INV_PROB_SHIFT 20

    >>     +#define EM_FLOW_INSERT_INV_PROB_MAX  (1 <<

    >> EM_FLOW_INSERT_INV_PROB_SHIFT)

    >>     +#define EM_FLOW_INSERT_INV_PROB_MASK

    >> (EM_FLOW_INSERT_INV_PROB_MAX - 1)

    >>      /* Default EMC insert probability is 1 /

    >> DEFAULT_EM_FLOW_INSERT_INV_PROB */

    >>      #define DEFAULT_EM_FLOW_INSERT_INV_PROB 100

    >>     -#define DEFAULT_EM_FLOW_INSERT_MIN (UINT32_MAX /                     \

    >>     +#define DEFAULT_EM_FLOW_INSERT_MIN

    >> (EM_FLOW_INSERT_INV_PROB_MAX /        \

    >>                                          DEFAULT_EM_FLOW_INSERT_INV_PROB)

    >>

    >>      struct emc_entry {

    >>     @@ -2092,7 +2097,7 @@ emc_probabilistic_insert(struct

    >> dp_netdev_pmd_thread *pmd,

    >>          uint32_t min;

    >>          atomic_read_relaxed(&pmd->dp->emc_insert_min, &min);

    >>

    >>     -    if (min && random_uint32() <= min) {

    >>     +    if (min && (random_uint32() & EM_FLOW_INSERT_INV_PROB_MASK)

    >> < min) {

    >>              emc_insert(&pmd->flow_cache, key, flow);

    >>          }

    >>      }

    >>     @@ -2909,8 +2914,9 @@ dpif_netdev_set_config(struct dpif *dpif, const

    >> struct smap *other_config)

    >>          }

    >>

    >>          atomic_read_relaxed(&dp->emc_insert_min, &cur_min);

    >>     -    if (insert_prob <= UINT32_MAX) {

    >>     -        insert_min = insert_prob == 0 ? 0 : UINT32_MAX / insert_prob;

    >>     +    if (insert_prob < EM_FLOW_INSERT_INV_PROB_MAX) {

    >>     +        insert_min = insert_prob == 0

    >>     +                     ? 0 : EM_FLOW_INSERT_INV_PROB_MAX / insert_prob;

    >>          } else {

    >>              insert_min = DEFAULT_EM_FLOW_INSERT_MIN;

    >>              insert_prob = DEFAULT_EM_FLOW_INSERT_INV_PROB;

    >>     diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml

    >>     index 074535b..61f252e 100644

    >>     --- a/vswitchd/vswitch.xml

    >>     +++ b/vswitchd/vswitch.xml

    >>     @@ -381,7 +381,8 @@

    >>            </column>

    >>

    >>            <column name="other_config" key="emc-insert-inv-prob"

    >>     -              type='{"type": "integer", "minInteger": 0, "maxInteger":

    >> 4294967295}'>

    >>     +              type='{"type": "integer",

    >>     +                     "minInteger": 0, "maxInteger": 1048575}'>

    >>              <p>

    >>                Specifies the inverse probability (1/emc-insert-inv-prob) of a flow

    >>                being inserted into the Exact Match Cache (EMC). On average one in

    >>     --

    >>     2.7.4

    >>

    >>

    >>

    >>

    >>

    >>

    >>

    >>

    >
Fischetti, Antonio Aug. 8, 2017, 8:24 a.m. UTC | #5
> -----Original Message-----

> From: Darrell Ball [mailto:dball@vmware.com]

> Sent: Tuesday, August 8, 2017 8:12 AM

> To: Ilya Maximets <i.maximets@samsung.com>; Wang, Yipeng1

> <yipeng1.wang@intel.com>; ovs-dev@openvswitch.org

> Cc: Heetae Ahn <heetae82.ahn@samsung.com>; Kevin Traynor <ktraynor@redhat.com>;

> Loftus, Ciara <ciara.loftus@intel.com>; Fischetti, Antonio

> <antonio.fischetti@intel.com>

> Subject: Re: [PATCH v3 1/2] dpif-netdev: Decrease range of values for EMC

> probability.

> 

> 

> 

> -----Original Message-----

> From: Ilya Maximets <i.maximets@samsung.com>

> Date: Monday, August 7, 2017 at 4:54 AM

> To: "Wang, Yipeng1" <yipeng1.wang@intel.com>, Darrell Ball <dball@vmware.com>,

> "ovs-dev@openvswitch.org" <ovs-dev@openvswitch.org>

> Cc: Heetae Ahn <heetae82.ahn@samsung.com>, Kevin Traynor <ktraynor@redhat.com>,

> "Loftus, Ciara" <ciara.loftus@intel.com>, "Fischetti, Antonio"

> <antonio.fischetti@intel.com>

> Subject: Re: [PATCH v3 1/2] dpif-netdev: Decrease range of values for EMC

> probability.

> 

>     On 04.08.2017 23:39, Wang, Yipeng1 wrote:

>     >

>     >

>     >> -----Original Message-----

>     >> From: Darrell Ball [mailto:dball@vmware.com]

>     >> Sent: Friday, August 4, 2017 11:35 AM

>     >> To: Ilya Maximets <i.maximets@samsung.com>; ovs-dev@openvswitch.org

>     >> Cc: Heetae Ahn <heetae82.ahn@samsung.com>; Wang, Yipeng1

>     >> <yipeng1.wang@intel.com>; Kevin Traynor <ktraynor@redhat.com>; Loftus,

>     >> Ciara <ciara.loftus@intel.com>; Fischetti, Antonio

>     >> <antonio.fischetti@intel.com>

>     >> Subject: Re: [PATCH v3 1/2] dpif-netdev: Decrease range of values for

> EMC

>     >> probability.

>     >>

>     >>

>     >>

>     >> -----Original Message-----

>     >> From: Ilya Maximets <i.maximets@samsung.com>

>     >> Date: Friday, August 4, 2017 at 7:17 AM

>     >> To: "ovs-dev@openvswitch.org" <ovs-dev@openvswitch.org>

>     >> Cc: Heetae Ahn <heetae82.ahn@samsung.com>, Darrell Ball

>     >> <dball@vmware.com>, Yipeng Wang <yipeng1.wang@intel.com>, Kevin

>     >> Traynor <ktraynor@redhat.com>, Ciara Loftus <ciara.loftus@intel.com>,

>     >> Antonio Fischetti <antonio.fischetti@intel.com>, Ilya Maximets

>     >> <i.maximets@samsung.com>

>     >> Subject: [PATCH v3 1/2] dpif-netdev: Decrease range of values for EMC

>     >> probability.

>     >>

>     >>     Currently, real insertion probability is higher than configured

>     >>     for the maximum case because of wrong usage of the random value.

>     >>

>     >>     i.e. if 'emc-invert-inv-prob' == UINT32_MAX, then 'emc_insert_min'

>     >>     equals to 1. In this case we're allowing insert if random vailue

>     >>     is less or equal to 1. So, two of 2**32 values (0 and 1) are

>     >>     allowed and real probability is 2 times higher than configured.

>     >>

>     >>     This happens because 'random_uint32()' returns value in range

>     >>     [0; UINT32_MAX], but for the checking to be correct we should

>     >>     generate random value in range [0; UINT32_MAX - 1].

>     >>

>     >>

>     >> I understand the calculation is slightly off.

>     >> If the user enters 4,294,967,295 then the probability to insert into emc

> will be

>     >> 2 out of 4,294,967,295 rather than 1 out of 4,294,967,295,

>     >>

>     >> However, is there a general concern about such a low probability anyways

> ?

>     >> This max inverse value would be rarely, if ever used and if used, it

> would be

>     >> impossible

>     >> to see the difference in any real use case.

>     >>

>     >> The user might as well just disable emc rather than use such tiny

> probabilities.

>     >>

>     >> This existing api was discussed extensively and was very contentious.

>     >>

>     >> However, if patch 2 really has an absolute dependency on this patch 1,

> we

>     >> can include it.

>     >> I have done various testing and don’t see that, but I have some comments

>     >> on the

>     >> other threads.

>     >>

>     >>

>     > [Wang, Yipeng] The dependency I can tell is that when do EMC_insert, the

> random bit is chosen by (random_value >> EM_FLOW_INSERT_INV_PROB_SHIFT & 1) in

> next patch, which means that the random bit will be "fully independent".

> 

>     Yes. The dependency is that bits of random value that passes (random_value

> < min) are

>     not fully random. Using them we may have additional issues with random slot

> choosing.

>     Even if we'll have large enough 'min' value we may have worse distribution

> of lower

>     bits than with original PRNG implemented in lib/random.

> 

> Well, that is the only dependency of Patch 2 on this patch, right.

> This is of course the real reason behind patch 1.

> What I was getting at is – based on my testing, even with a pseudorandom packet

> distribution, the extra “pseudorandomness”

> did not do much, meaning even in this case the packets themselves dominate  -

> one distribution, mine, sees no benefit and another

> Antonio’s sees some benefit.

> In a real packet dataset and temporal variances of packets received, I don’t

> really know if we would gain anything because the packet

> distribution “pseudorandomness” dominates the “randomness” of the selection.

> 

> So, we might end up even having changing the user level api to get some extra

> randomness that may not have any benefit in real

> scenarios.

> 

> I will ask Antonio for his distribution algorithm.


[Antonio]
I set up the generator to loop on the dest IP addr, eg to generate 10 different
packets:
IPsrc:10.10.10.10, IPdest: 20.20.20.[20-29]
for sure this is not a real packet dataset.


> 

> 

>     > If it is the only dependency, I guess it is fine to even not include this

> patch.

>     >

>     >>     To fix this we have 4 possible solutions:

>     >>

>     >>      1. need to use uint64_t for 'emc-insert-min' and calculate it

>     >>         as '(UINT32_MAX + 1) / inverse_prob' to fairly check the full

>     >>         range [0; UINT32_MAX].

>     >>

>     >>         This may decrease performance becaue of 64 bit atomic ops.

>     >>

>     >>      2. Forbid the '2**32 - 1' as the value for 'emc-insert-min'

>     >>         because it's the only value we have issues with.

>     >>

>     >>         This will require additional explanations and not very friendly

>     >>         for users.

>     >>

>     >>      3. Generate random value in range [0; UINT32_MAX - 1].

>     >>

>     >>         This will require heavy division operation.

>     >>

>     >>      4. Decrease the range of available values for 'emc-insert-inv-

> prob'.

>     >>

>     >>         Actually, we don't need to have so much different values for

>     >>         that option. I beleve that values higher than 1M are completely

>     >>         useless. Choosing the upper limit as a power of 2 like 2**20 we

>     >>         will be able to mask the generated random value in a fast way

>     >>         and also avoid range issue, because same uint32_t can be used to

>     >>         store 2**20.

>     >>

>     >>     This patch implements solution #4.

>     >>

>     >>     CC: Ciara Loftus <ciara.loftus@intel.com>

>     >>     Fixes: 4c30b24602c3 ("dpif-netdev: Conditional EMC insert")

>     >>     Signed-off-by: Ilya Maximets <i.maximets@samsung.com>

>     >>     Acked-by: Antonio Fischetti <antonio.fischetti@intel.com>

>     >>     ---

>     >>

>     >>     Infrastructure and logic introduced here will be used for fixing

>     >>     emc replacement policy.

>     >>

>     >>      lib/dpif-netdev.c    | 14 ++++++++++----

>     >>      vswitchd/vswitch.xml |  3 ++-

>     >>      2 files changed, 12 insertions(+), 5 deletions(-)

>     >>

>     >>     diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c

>     >>     index 8ecc9d1..e23c674 100644

>     >>     --- a/lib/dpif-netdev.c

>     >>     +++ b/lib/dpif-netdev.c

>     >>     @@ -153,9 +153,14 @@ struct netdev_flow_key {

>     >>      #define EM_FLOW_HASH_MASK (EM_FLOW_HASH_ENTRIES - 1)

>     >>      #define EM_FLOW_HASH_SEGS 2

>     >>

>     >>     +/* Set up maximum inverse EMC insertion probability to 2^20 - 1.

>     >>     + * Higher values considered useless in practice. */

>     >>     +#define EM_FLOW_INSERT_INV_PROB_SHIFT 20

>     >>     +#define EM_FLOW_INSERT_INV_PROB_MAX  (1 <<

>     >> EM_FLOW_INSERT_INV_PROB_SHIFT)

>     >>     +#define EM_FLOW_INSERT_INV_PROB_MASK

>     >> (EM_FLOW_INSERT_INV_PROB_MAX - 1)

>     >>      /* Default EMC insert probability is 1 /

>     >> DEFAULT_EM_FLOW_INSERT_INV_PROB */

>     >>      #define DEFAULT_EM_FLOW_INSERT_INV_PROB 100

>     >>     -#define DEFAULT_EM_FLOW_INSERT_MIN (UINT32_MAX /

> \

>     >>     +#define DEFAULT_EM_FLOW_INSERT_MIN

>     >> (EM_FLOW_INSERT_INV_PROB_MAX /        \

>     >>

> DEFAULT_EM_FLOW_INSERT_INV_PROB)

>     >>

>     >>      struct emc_entry {

>     >>     @@ -2092,7 +2097,7 @@ emc_probabilistic_insert(struct

>     >> dp_netdev_pmd_thread *pmd,

>     >>          uint32_t min;

>     >>          atomic_read_relaxed(&pmd->dp->emc_insert_min, &min);

>     >>

>     >>     -    if (min && random_uint32() <= min) {

>     >>     +    if (min && (random_uint32() & EM_FLOW_INSERT_INV_PROB_MASK)

>     >> < min) {

>     >>              emc_insert(&pmd->flow_cache, key, flow);

>     >>          }

>     >>      }

>     >>     @@ -2909,8 +2914,9 @@ dpif_netdev_set_config(struct dpif *dpif,

> const

>     >> struct smap *other_config)

>     >>          }

>     >>

>     >>          atomic_read_relaxed(&dp->emc_insert_min, &cur_min);

>     >>     -    if (insert_prob <= UINT32_MAX) {

>     >>     -        insert_min = insert_prob == 0 ? 0 : UINT32_MAX /

> insert_prob;

>     >>     +    if (insert_prob < EM_FLOW_INSERT_INV_PROB_MAX) {

>     >>     +        insert_min = insert_prob == 0

>     >>     +                     ? 0 : EM_FLOW_INSERT_INV_PROB_MAX /

> insert_prob;

>     >>          } else {

>     >>              insert_min = DEFAULT_EM_FLOW_INSERT_MIN;

>     >>              insert_prob = DEFAULT_EM_FLOW_INSERT_INV_PROB;

>     >>     diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml

>     >>     index 074535b..61f252e 100644

>     >>     --- a/vswitchd/vswitch.xml

>     >>     +++ b/vswitchd/vswitch.xml

>     >>     @@ -381,7 +381,8 @@

>     >>            </column>

>     >>

>     >>            <column name="other_config" key="emc-insert-inv-prob"

>     >>     -              type='{"type": "integer", "minInteger": 0,

> "maxInteger":

>     >> 4294967295}'>

>     >>     +              type='{"type": "integer",

>     >>     +                     "minInteger": 0, "maxInteger": 1048575}'>

>     >>              <p>

>     >>                Specifies the inverse probability (1/emc-insert-inv-prob)

> of a flow

>     >>                being inserted into the Exact Match Cache (EMC). On

> average one in

>     >>     --

>     >>     2.7.4

>     >>

>     >>

>     >>

>     >>

>     >>

>     >>

>     >>

>     >>

>     >

>
diff mbox

Patch

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 8ecc9d1..e23c674 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -153,9 +153,14 @@  struct netdev_flow_key {
 #define EM_FLOW_HASH_MASK (EM_FLOW_HASH_ENTRIES - 1)
 #define EM_FLOW_HASH_SEGS 2
 
+/* Set up maximum inverse EMC insertion probability to 2^20 - 1.
+ * Higher values considered useless in practice. */
+#define EM_FLOW_INSERT_INV_PROB_SHIFT 20
+#define EM_FLOW_INSERT_INV_PROB_MAX  (1 << EM_FLOW_INSERT_INV_PROB_SHIFT)
+#define EM_FLOW_INSERT_INV_PROB_MASK (EM_FLOW_INSERT_INV_PROB_MAX - 1)
 /* Default EMC insert probability is 1 / DEFAULT_EM_FLOW_INSERT_INV_PROB */
 #define DEFAULT_EM_FLOW_INSERT_INV_PROB 100
-#define DEFAULT_EM_FLOW_INSERT_MIN (UINT32_MAX /                     \
+#define DEFAULT_EM_FLOW_INSERT_MIN (EM_FLOW_INSERT_INV_PROB_MAX /        \
                                     DEFAULT_EM_FLOW_INSERT_INV_PROB)
 
 struct emc_entry {
@@ -2092,7 +2097,7 @@  emc_probabilistic_insert(struct dp_netdev_pmd_thread *pmd,
     uint32_t min;
     atomic_read_relaxed(&pmd->dp->emc_insert_min, &min);
 
-    if (min && random_uint32() <= min) {
+    if (min && (random_uint32() & EM_FLOW_INSERT_INV_PROB_MASK) < min) {
         emc_insert(&pmd->flow_cache, key, flow);
     }
 }
@@ -2909,8 +2914,9 @@  dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
     }
 
     atomic_read_relaxed(&dp->emc_insert_min, &cur_min);
-    if (insert_prob <= UINT32_MAX) {
-        insert_min = insert_prob == 0 ? 0 : UINT32_MAX / insert_prob;
+    if (insert_prob < EM_FLOW_INSERT_INV_PROB_MAX) {
+        insert_min = insert_prob == 0
+                     ? 0 : EM_FLOW_INSERT_INV_PROB_MAX / insert_prob;
     } else {
         insert_min = DEFAULT_EM_FLOW_INSERT_MIN;
         insert_prob = DEFAULT_EM_FLOW_INSERT_INV_PROB;
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index 074535b..61f252e 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -381,7 +381,8 @@ 
       </column>
 
       <column name="other_config" key="emc-insert-inv-prob"
-              type='{"type": "integer", "minInteger": 0, "maxInteger": 4294967295}'>
+              type='{"type": "integer",
+                     "minInteger": 0, "maxInteger": 1048575}'>
         <p>
           Specifies the inverse probability (1/emc-insert-inv-prob) of a flow
           being inserted into the Exact Match Cache (EMC). On average one in