diff mbox series

[ovs-dev,net] openvswitch: meter: fix race when getting now_ms.

Message ID 20210513100300.22735-1-thomas.liu@ucloud.cn
State Awaiting Upstream
Headers show
Series [ovs-dev,net] openvswitch: meter: fix race when getting now_ms. | expand

Commit Message

Tao Liu May 13, 2021, 10:03 a.m. UTC
We have observed meters working unexpected if traffic is 3+Gbit/s
with multiple connections.

now_ms is not pretected by meter->lock, we may get a negative
long_delta_ms when another cpu updated meter->used, then:
    delta_ms = (u32)long_delta_ms;
which will be a large value.

    band->bucket += delta_ms * band->rate;
then we get a wrong band->bucket.

Fixes: 96fbc13d7e77 ("openvswitch: Add meter infrastructure")
Signed-off-by: Tao Liu <thomas.liu@ucloud.cn>
---
 net/openvswitch/meter.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Ilya Maximets May 13, 2021, 10:21 a.m. UTC | #1
On 5/13/21 12:03 PM, Tao Liu wrote:
> We have observed meters working unexpected if traffic is 3+Gbit/s
> with multiple connections.
> 
> now_ms is not pretected by meter->lock, we may get a negative
> long_delta_ms when another cpu updated meter->used, then:
>     delta_ms = (u32)long_delta_ms;
> which will be a large value.
> 
>     band->bucket += delta_ms * band->rate;
> then we get a wrong band->bucket.
> 
> Fixes: 96fbc13d7e77 ("openvswitch: Add meter infrastructure")
> Signed-off-by: Tao Liu <thomas.liu@ucloud.cn>
> ---

Hi.  Thanks for the patch!
We fixed the same issue in userspace datapath some time ago and
we did that a bit differently by just setting negative long_delta_ms
to zero in assumption that all threads received their packets at
the same millisecond (which is most likely true if we have this
kind of race).  This should be also cheaper from form the performance
point of view to not have an extra call and a division under the
spinlock.   What do you think?

It's also a good thing to have more or less similar implementation
for all datapaths.

Here is a userspace patch:

commit acc5df0e3cb036524d49891fdb9ba89b609dd26a
Author: Ilya Maximets <i.maximets@ovn.org>
Date:   Thu Oct 24 15:15:07 2019 +0200

    dpif-netdev: Fix time delta overflow in case of race for meter lock.
    
    There is a race window between getting the time and getting the meter
    lock.  This could lead to situation where the thread with larger
    current time (this thread called time_{um}sec() later than others)
    will acquire meter lock first and update meter->used to the large
    value.  Next threads will try to calculate time delta by subtracting
    the large meter->used from their lower time getting the negative value
    which will be converted to a big unsigned delta.
    
    Fix that by assuming that all these threads received packets in the
    same time in this case, i.e. dropping negative delta to 0.
    
    CC: Jarno Rajahalme <jarno@ovn.org>
    Fixes: 4b27db644a8c ("dpif-netdev: Simple DROP meter implementation.")
    Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-September/363126.html
    Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
    Acked-by: William Tu <u9012063@gmail.com>

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index c09b8fd95..4720ba1ab 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -5646,6 +5646,14 @@ dp_netdev_run_meter(struct dp_netdev *dp, struct dp_packet_batch *packets_,
     /* All packets will hit the meter at the same time. */
     long_delta_t = now / 1000 - meter->used / 1000; /* msec */
 
+    if (long_delta_t < 0) {
+        /* This condition means that we have several threads fighting for a
+           meter lock, and the one who received the packets a bit later wins.
+           Assuming that all racing threads received packets at the same time
+           to avoid overflow. */
+        long_delta_t = 0;
+    }
+
     /* Make sure delta_t will not be too large, so that bucket will not
      * wrap around below. */
     delta_t = (long_delta_t > (long long int)meter->max_delta_t)
---
Tao Liu May 13, 2021, 12:31 p.m. UTC | #2
From: Ilya Maximets <i.maximets@ovn.org>
Date: 2021-05-13 18:21:31
To:  Tao Liu <thomas.liu@ucloud.cn>,pshelar@ovn.org
Cc:  dev@openvswitch.org,netdev@vger.kernel.org,linux-kernel@vger.kernel.org,i.maximets@ovn.org,jean.tourrilhes@hpe.com,kuba@kernel.org,davem@davemloft.net,Eelco Chaudron <echaudro@redhat.com>
Subject: Re: [ovs-dev] [PATCH net] openvswitch: meter: fix race when getting now_ms.>On 5/13/21 12:03 PM, Tao Liu wrote:
>> We have observed meters working unexpected if traffic is 3+Gbit/s
>> with multiple connections.
>> 
>> now_ms is not pretected by meter->lock, we may get a negative
>> long_delta_ms when another cpu updated meter->used, then:
>>     delta_ms = (u32)long_delta_ms;
>> which will be a large value.
>> 
>>     band->bucket += delta_ms * band->rate;
>> then we get a wrong band->bucket.
>> 
>> Fixes: 96fbc13d7e77 ("openvswitch: Add meter infrastructure")
>> Signed-off-by: Tao Liu <thomas.liu@ucloud.cn>
>> ---
>
>Hi.  Thanks for the patch!
>We fixed the same issue in userspace datapath some time ago and
>we did that a bit differently by just setting negative long_delta_ms
>to zero in assumption that all threads received their packets at
>the same millisecond (which is most likely true if we have this
>kind of race).  This should be also cheaper from form the performance
>point of view to not have an extra call and a division under the
>spinlock.   What do you think?


Yes, I agree with you. The userspace implementation has same effection,
and looks a bit more efficient. I will send a v2.


>It's also a good thing to have more or less similar implementation
>for all datapaths.
>
>Here is a userspace patch:
>
>commit acc5df0e3cb036524d49891fdb9ba89b609dd26a
>Author: Ilya Maximets <i.maximets@ovn.org>
>Date:   Thu Oct 24 15:15:07 2019 +0200
>
>    dpif-netdev: Fix time delta overflow in case of race for meter lock.
>    
>    There is a race window between getting the time and getting the meter
>    lock.  This could lead to situation where the thread with larger
>    current time (this thread called time_{um}sec() later than others)
>    will acquire meter lock first and update meter->used to the large
>    value.  Next threads will try to calculate time delta by subtracting
>    the large meter->used from their lower time getting the negative value
>    which will be converted to a big unsigned delta.
>    
>    Fix that by assuming that all these threads received packets in the
>    same time in this case, i.e. dropping negative delta to 0.
>    
>    CC: Jarno Rajahalme <jarno@ovn.org>
>    Fixes: 4b27db644a8c ("dpif-netdev: Simple DROP meter implementation.")
>    Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-September/363126.html
>    Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
>    Acked-by: William Tu <u9012063@gmail.com>
>
>diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>index c09b8fd95..4720ba1ab 100644
>--- a/lib/dpif-netdev.c
>+++ b/lib/dpif-netdev.c
>@@ -5646,6 +5646,14 @@ dp_netdev_run_meter(struct dp_netdev *dp, struct dp_packet_batch *packets_,
>     /* All packets will hit the meter at the same time. */
>     long_delta_t = now / 1000 - meter->used / 1000; /* msec */
> 
>+    if (long_delta_t < 0) {
>+        /* This condition means that we have several threads fighting for a
>+           meter lock, and the one who received the packets a bit later wins.
>+           Assuming that all racing threads received packets at the same time
>+           to avoid overflow. */
>+        long_delta_t = 0;
>+    }
>+
>     /* Make sure delta_t will not be too large, so that bucket will not
>      * wrap around below. */
>     delta_t = (long_delta_t > (long long int)meter->max_delta_t)
>---
diff mbox series

Patch

diff --git a/net/openvswitch/meter.c b/net/openvswitch/meter.c
index 96b524c..c50ab7f 100644
--- a/net/openvswitch/meter.c
+++ b/net/openvswitch/meter.c
@@ -593,7 +593,7 @@  static int ovs_meter_cmd_del(struct sk_buff *skb, struct genl_info *info)
 bool ovs_meter_execute(struct datapath *dp, struct sk_buff *skb,
 		       struct sw_flow_key *key, u32 meter_id)
 {
-	long long int now_ms = div_u64(ktime_get_ns(), 1000 * 1000);
+	long long int now_ms;
 	long long int long_delta_ms;
 	struct dp_meter_band *band;
 	struct dp_meter *meter;
@@ -610,6 +610,7 @@  bool ovs_meter_execute(struct datapath *dp, struct sk_buff *skb,
 	/* Lock the meter while using it. */
 	spin_lock(&meter->lock);
 
+	now_ms = div_u64(ktime_get_ns(), 1000 * 1000);
 	long_delta_ms = (now_ms - meter->used); /* ms */
 
 	/* Make sure delta_ms will not be too large, so that bucket will not