diff mbox series

[ovs-dev,v2] bond: Reset stats when deleting post recirc rule.

Message ID 20240220065008.92838-1-amorenoz@redhat.com
State Changes Requested
Delegated to: Ilya Maximets
Headers show
Series [ovs-dev,v2] bond: Reset stats when deleting post recirc rule. | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/intel-ovs-compilation success test: success

Commit Message

Adrian Moreno Feb. 20, 2024, 6:50 a.m. UTC
In order to properly balance bond traffic, ofproto/bond periodically
reads usage statistics of the post-recirculation rules (which are added
to a hidden internal table).

To do that, each "struct bond_entry" (which represents a hash within a
bond) stores the last seen statistics for its rule. When a hash is moved
to another member (due to a bond rebalance or the previous member going
down), the rule is typically just modified, i.e: same match different
actions. In this case, statistics are preserved and accounting continues
to work.

However, if the rule gets completely deleted (e.g: when all bond members
go down) and then re-created, the new rule will have 0 tx_bytes but its
associated entry will still store a non-zero last-seen value.
This situation leads to an overflow of the delta calculation (computed
as [current_stats_value - last_seen_value]), which can affect traffic
as the hash will be considered to carry a lot of traffic and rebalancing
will kick in.

In order to fix this situation, reset the value of last seen statistics
on rule deletion.

Implementation notes:
Modifying pr_tx_bytes requires write-locking the global rwlock but a
lockless version of update_recirc_rules was being maintained to avoid locking
on bon_unref().
Considering the small impact of locking during bond removal, removing the
lockless version and relying on clang's thread safety analysis is preferred.

Also, folding Ilya's [1], i.e: fixing thread safety annotation in
update_recirc_rules() to require holding write-lock.

[1]
https://patchwork.ozlabs.org/project/openvswitch/patch/20240209161718.1149494-1-i.maximets@ovn.org/

Reported-at: https://github.com/openvswitch/ovs-issues/issues/319
Co-authored-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
---
 ofproto/bond.c | 33 +++++++++++++++------------------
 1 file changed, 15 insertions(+), 18 deletions(-)

Comments

Mike Pattrick Feb. 20, 2024, 3:34 p.m. UTC | #1
On Tue, Feb 20, 2024 at 1:50 AM Adrian Moreno <amorenoz@redhat.com> wrote:
>
> In order to properly balance bond traffic, ofproto/bond periodically
> reads usage statistics of the post-recirculation rules (which are added
> to a hidden internal table).
>
> To do that, each "struct bond_entry" (which represents a hash within a
> bond) stores the last seen statistics for its rule. When a hash is moved
> to another member (due to a bond rebalance or the previous member going
> down), the rule is typically just modified, i.e: same match different
> actions. In this case, statistics are preserved and accounting continues
> to work.
>
> However, if the rule gets completely deleted (e.g: when all bond members
> go down) and then re-created, the new rule will have 0 tx_bytes but its
> associated entry will still store a non-zero last-seen value.
> This situation leads to an overflow of the delta calculation (computed
> as [current_stats_value - last_seen_value]), which can affect traffic
> as the hash will be considered to carry a lot of traffic and rebalancing
> will kick in.
>
> In order to fix this situation, reset the value of last seen statistics
> on rule deletion.
>
> Implementation notes:
> Modifying pr_tx_bytes requires write-locking the global rwlock but a
> lockless version of update_recirc_rules was being maintained to avoid locking
> on bon_unref().
> Considering the small impact of locking during bond removal, removing the
> lockless version and relying on clang's thread safety analysis is preferred.
>
> Also, folding Ilya's [1], i.e: fixing thread safety annotation in
> update_recirc_rules() to require holding write-lock.
>
> [1]
> https://patchwork.ozlabs.org/project/openvswitch/patch/20240209161718.1149494-1-i.maximets@ovn.org/
>
> Reported-at: https://github.com/openvswitch/ovs-issues/issues/319
> Co-authored-by: Ilya Maximets <i.maximets@ovn.org>
> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
> Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
> ---

Looks reasonable to me.

Acked-by: Mike Pattrick <mkp@redhat.com>
Ilya Maximets Feb. 21, 2024, 8:09 p.m. UTC | #2
On 2/20/24 07:50, Adrian Moreno wrote:
> In order to properly balance bond traffic, ofproto/bond periodically
> reads usage statistics of the post-recirculation rules (which are added
> to a hidden internal table).
> 
> To do that, each "struct bond_entry" (which represents a hash within a
> bond) stores the last seen statistics for its rule. When a hash is moved
> to another member (due to a bond rebalance or the previous member going
> down), the rule is typically just modified, i.e: same match different
> actions. In this case, statistics are preserved and accounting continues
> to work.
> 
> However, if the rule gets completely deleted (e.g: when all bond members
> go down) and then re-created, the new rule will have 0 tx_bytes but its
> associated entry will still store a non-zero last-seen value.
> This situation leads to an overflow of the delta calculation (computed
> as [current_stats_value - last_seen_value]), which can affect traffic
> as the hash will be considered to carry a lot of traffic and rebalancing
> will kick in.

Hi, Adrian.  Thanks for the v2!

The code seems fine, but I can still reproduce the issue described here
in the commit message.  I just added the following lines to an existing
rebalancing test and see overflowed load counters on hashes:

diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
index daeea7775..ca478bb29 100644
--- a/tests/ofproto-dpif.at
+++ b/tests/ofproto-dpif.at
@@ -547,6 +547,22 @@ ovs-appctl time/warp 1000 100
 ovs-appctl bond/show > bond3.txt
 AT_CHECK([sed -n '/member p2/,/^$/p' bond3.txt | grep 'hash'], [0], [ignore])
 
+AT_CHECK([ovs-appctl netdev-dummy/set-admin-state p1 down], 0, [OK
+])
+AT_CHECK([ovs-appctl netdev-dummy/set-admin-state p2 down], 0, [OK
+])
+ovs-appctl time/warp 1000 100
+AT_CHECK([ovs-appctl netdev-dummy/set-admin-state p1 up], 0, [OK
+])
+AT_CHECK([ovs-appctl netdev-dummy/set-admin-state p2 up], 0, [OK
+])
+ovs-appctl time/warp 1000 100
+
+AT_CHECK([SEND_TCP_BOND_PKTS([p5], [5], [65500])])
+# We sent 49125 KB of data total in 3 batches.  No hash should have more
+# than that amount of load.  Just checking that it is within 5 digits.
+AT_CHECK([ovs-appctl bond/show | grep -E '[[0-9]]{6}'], [1])
+
 OVS_VSWITCHD_STOP()
 AT_CLEANUP
 
---

./ofproto-dpif.at:564: ovs-appctl bond/show | grep -E '[0-9]{6}'
--- /dev/null   2024-01-29 23:31:19.129371175 +0100
+++ /ovs/tests/testsuite.dir/at-groups/1132/stdout
@@ -0,0 +1,62 @@
+  hash 0: 9007199254740940 kB load
+  hash 2: 9007199254740888 kB load
+  hash 25: 9007199254740940 kB load
+  hash 30: 9007199254740940 kB load
+  hash 33: 9007199254740940 kB load
+  hash 35: 9007199254740940 kB load
+  hash 41: 9007199254740940 kB load
 ...

Could you, please, take a look?

Also, we should probably add a test like that to the patch.

Best regards, Ilya Maximets.
Adrian Moreno Feb. 22, 2024, 2:58 p.m. UTC | #3
On 2/21/24 21:09, Ilya Maximets wrote:
> On 2/20/24 07:50, Adrian Moreno wrote:
>> In order to properly balance bond traffic, ofproto/bond periodically
>> reads usage statistics of the post-recirculation rules (which are added
>> to a hidden internal table).
>>
>> To do that, each "struct bond_entry" (which represents a hash within a
>> bond) stores the last seen statistics for its rule. When a hash is moved
>> to another member (due to a bond rebalance or the previous member going
>> down), the rule is typically just modified, i.e: same match different
>> actions. In this case, statistics are preserved and accounting continues
>> to work.
>>
>> However, if the rule gets completely deleted (e.g: when all bond members
>> go down) and then re-created, the new rule will have 0 tx_bytes but its
>> associated entry will still store a non-zero last-seen value.
>> This situation leads to an overflow of the delta calculation (computed
>> as [current_stats_value - last_seen_value]), which can affect traffic
>> as the hash will be considered to carry a lot of traffic and rebalancing
>> will kick in.
> 
> Hi, Adrian.  Thanks for the v2!
> 
> The code seems fine, but I can still reproduce the issue described here
> in the commit message.  I just added the following lines to an existing
> rebalancing test and see overflowed load counters on hashes:
> 
> diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
> index daeea7775..ca478bb29 100644
> --- a/tests/ofproto-dpif.at
> +++ b/tests/ofproto-dpif.at
> @@ -547,6 +547,22 @@ ovs-appctl time/warp 1000 100
>   ovs-appctl bond/show > bond3.txt
>   AT_CHECK([sed -n '/member p2/,/^$/p' bond3.txt | grep 'hash'], [0], [ignore])
>   
> +AT_CHECK([ovs-appctl netdev-dummy/set-admin-state p1 down], 0, [OK
> +])
> +AT_CHECK([ovs-appctl netdev-dummy/set-admin-state p2 down], 0, [OK
> +])
> +ovs-appctl time/warp 1000 100
> +AT_CHECK([ovs-appctl netdev-dummy/set-admin-state p1 up], 0, [OK
> +])
> +AT_CHECK([ovs-appctl netdev-dummy/set-admin-state p2 up], 0, [OK
> +])
> +ovs-appctl time/warp 1000 100
> +
> +AT_CHECK([SEND_TCP_BOND_PKTS([p5], [5], [65500])])
> +# We sent 49125 KB of data total in 3 batches.  No hash should have more
> +# than that amount of load.  Just checking that it is within 5 digits.
> +AT_CHECK([ovs-appctl bond/show | grep -E '[[0-9]]{6}'], [1])
> +
>   OVS_VSWITCHD_STOP()
>   AT_CLEANUP
>   
> ---
> 
> ./ofproto-dpif.at:564: ovs-appctl bond/show | grep -E '[0-9]{6}'
> --- /dev/null   2024-01-29 23:31:19.129371175 +0100
> +++ /ovs/tests/testsuite.dir/at-groups/1132/stdout
> @@ -0,0 +1,62 @@
> +  hash 0: 9007199254740940 kB load
> +  hash 2: 9007199254740888 kB load
> +  hash 25: 9007199254740940 kB load
> +  hash 30: 9007199254740940 kB load
> +  hash 33: 9007199254740940 kB load
> +  hash 35: 9007199254740940 kB load
> +  hash 41: 9007199254740940 kB load
>   ...
> 
> Could you, please, take a look?
> 
> Also, we should probably add a test like that to the patch.
> 

Thanks for the test, I'll add it to v3. I am surprised I didn't get this during 
my testing. Indeed, I did not test with that many different traffic hashes and 
got lucky!

Now I see my mistake. I had read "hash" so many times that when I read:
     hmap_insert(&bond->pr_rule_ops, &pr_op->hmap_node, hash);

I immediately thought: oh that's smart, using the bond entry hash, which is 
known to be unique, as index in the pr_rules_ops hash table. However, this hash 
is not the bond entry hash, it's "match_hash(match, 0)" which is not 
match->flow->dp_hash as my tired eyes told me, but the hash of the entire match. 
The fact that there were some hashes left uncleared means that this hash collides.

Anyway, I'll send v3 using the actual flow dp_hash (match->flow.dp_hash) and 
adding your test.
diff mbox series

Patch

diff --git a/ofproto/bond.c b/ofproto/bond.c
index cfdf44f85..e5b3f1a1b 100644
--- a/ofproto/bond.c
+++ b/ofproto/bond.c
@@ -186,7 +186,7 @@  static struct bond_member *choose_output_member(const struct bond *,
                                                 struct flow_wildcards *,
                                                 uint16_t vlan)
     OVS_REQ_RDLOCK(rwlock);
-static void update_recirc_rules__(struct bond *);
+static void update_recirc_rules(struct bond *) OVS_REQ_WRLOCK(rwlock);
 static bool bond_may_recirc(const struct bond *);
 static void bond_update_post_recirc_rules__(struct bond *, bool force)
     OVS_REQ_WRLOCK(rwlock);
@@ -299,7 +299,10 @@  bond_unref(struct bond *bond)
     }
     free(bond->hash);
     bond->hash = NULL;
-    update_recirc_rules__(bond);
+
+    ovs_rwlock_wrlock(&rwlock);
+    update_recirc_rules(bond);
+    ovs_rwlock_unlock(&rwlock);
 
     hmap_destroy(&bond->pr_rule_ops);
     free(bond->primary);
@@ -331,17 +334,8 @@  add_pr_rule(struct bond *bond, const struct match *match,
     hmap_insert(&bond->pr_rule_ops, &pr_op->hmap_node, hash);
 }
 
-/* This function should almost never be called directly.
- * 'update_recirc_rules()' should be called instead.  Since
- * this function modifies 'bond->pr_rule_ops', it is only
- * safe when 'rwlock' is held.
- *
- * However, when the 'bond' is the only reference in the system,
- * calling this function avoid acquiring lock only to satisfy
- * lock annotation. Currently, only 'bond_unref()' calls
- * this function directly.  */
 static void
-update_recirc_rules__(struct bond *bond)
+update_recirc_rules(struct bond *bond) OVS_REQ_WRLOCK(rwlock)
 {
     struct match match;
     struct bond_pr_rule_op *pr_op;
@@ -407,6 +401,15 @@  update_recirc_rules__(struct bond *bond)
 
                 VLOG_ERR("failed to remove post recirculation flow %s", err_s);
                 free(err_s);
+            } else if (bond->hash) {
+                /* If the flow deletion failed, a subsecuent call to
+                 * ofproto_dpif_add_internal_flow() would just modify the flow
+                 * preserving its statistics. Therefore, only reset the entry's
+                 * byte counter if it succeeds. */
+                uint32_t hash = pr_op->hmap_node.hash & BOND_MASK;
+                struct bond_entry *entry = &bond->hash[hash];
+
+                entry->pr_tx_bytes = 0;
             }
 
             hmap_remove(&bond->pr_rule_ops, &pr_op->hmap_node);
@@ -421,12 +424,6 @@  update_recirc_rules__(struct bond *bond)
     ofpbuf_uninit(&ofpacts);
 }
 
-static void
-update_recirc_rules(struct bond *bond)
-    OVS_REQ_RDLOCK(rwlock)
-{
-    update_recirc_rules__(bond);
-}
 
 /* Updates 'bond''s overall configuration to 's'.
  *