
[ovs-dev] ovs: fix the bug of bucket counter is not updated

Message ID 3ea42071-7a00-0840-6017-01c21839e8fb@gmail.com
State Accepted
Series [ovs-dev] ovs: fix the bug of bucket counter is not updated

Commit Message

solomon March 20, 2019, 12:16 p.m. UTC
After inserting or removing a bucket, we don't update the bucket counter.
When we then call ovs-ofctl dump-group-stats br-int, a panic occurs.

Steps to reproduce:
1. ovs-ofctl -O OpenFlow15 add-group br-int "group_id=1, type=select, selection_method=hash bucket=bucket_id=1,weight:100,actions=output:1"
2. ovs-ofctl insert-buckets br-int "group_id=1, command_bucket_id=last, bucket=bucket_id=7,weight:800,actions=output:1"
3. ovs-ofctl dump-group-stats br-int

(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007f028675b42a in __GI_abort () at abort.c:89
#2 0x00007f0286797c00 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7f028688cd98 "*** Error in `%s': %s: 0x%s ***\n")
at ../sysdeps/posix/libc_fatal.c:175
#3 0x00007f028679dfc6 in malloc_printerr (action=3, str=0x7f028688cea8 "free(): invalid next size (fast)", ptr=<optimized out>,
ar_ptr=<optimized out>) at malloc.c:5049
#4 0x00007f028679e80e in _int_free (av=0x7f0286ac0b00 <main_arena>, p=0x55cac2920040, have_lock=0) at malloc.c:3905
#5 0x000055cab8fd6d8e in append_group_stats (group=0x55cac250d860, replies=0x7fffe6a11b90) at ofproto/ofproto.c:6770
#6 0x000055cab8fd8d04 in handle_group_request (ofconn=ofconn@entry=0x55cac28ec5a0, request=request@entry=0x55cac29317f0,
group_id=<optimized out>, cb=cb@entry=0x55cab8fd6cd0 <append_group_stats>) at ofproto/ofproto.c:6790
#7 0x000055cab8fe2a96 in handle_group_stats_request (request=0x55cac29317f0, ofconn=0x55cac28ec5a0) at ofproto/ofproto.c:6815
#8 handle_openflow__ (msg=0x55cac29365c0, ofconn=0x55cac28ec5a0) at ofproto/ofproto.c:8282
#9 handle_openflow (ofconn=0x55cac28ec5a0, ofp_msg=0x55cac29365c0) at ofproto/ofproto.c:8362
#10 0x000055cab9013ddb in ofconn_run (handle_openflow=0x55cab8fe23c0 <handle_openflow>, ofconn=0x55cac28ec5a0) at ofproto/connmgr.c:1446
#11 connmgr_run (mgr=0x55cabb2ce360, handle_openflow=handle_openflow@entry=0x55cab8fe23c0 <handle_openflow>) at ofproto/connmgr.c:365
#12 0x000055cab8fdc516 in ofproto_run (p=0x55cabb2cddd0) at ofproto/ofproto.c:1825
#13 0x000055cab8fcabfc in bridge_run__ () at vswitchd/bridge.c:2944
#14 0x000055cab8fcfe64 in bridge_run () at vswitchd/bridge.c:3002
#15 0x000055cab8c693ed in main (argc=<optimized out>, argv=<optimized out>) at vswitchd/ovs-vswitchd.c:125


Signed-off-by: solomon <liwei.solomon@gmail.com>
---
 ofproto/ofproto.c |  2 ++
 tests/ofproto.at  | 16 ++++++++++++++++
 2 files changed, 18 insertions(+)
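
To make the crash mechanism concrete, here is a minimal sketch of the
failure mode (illustrative C with made-up names, not OVS code): the stats
buffer is sized from the cached counter, while one entry is written per
bucket actually on the list, so a stale counter leads to a heap overrun
that glibc reports on free().

#include <stdio.h>

struct bucket { struct bucket *next; };

struct group {
    struct bucket *buckets;  /* linked list of buckets */
    size_t n_buckets;        /* cached count, set only at group creation */
};

size_t list_size(const struct bucket *b)
{
    size_t n = 0;
    for (; b; b = b->next) {
        n++;
    }
    return n;
}

int main(void)
{
    struct bucket b2 = { NULL };
    struct bucket b1 = { &b2 };   /* insert-buckets added b2 ...           */
    struct group g = { &b1, 1 };  /* ... but the cached count still says 1 */

    size_t real = list_size(g.buckets);
    printf("cached=%zu real=%zu\n", g.n_buckets, real);

    /* An array sized from the stale cache is one element short, so
     * writing one stats entry per listed bucket overruns the heap block;
     * glibc detects the corruption on free() and aborts, which is the
     * "free(): invalid next size (fast)" in the backtrace above. */
    if (real > g.n_buckets) {
        printf("stats array short by %zu entries -> heap overrun\n",
               real - g.n_buckets);
    }
    return 0;
}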

Comments

0-day Robot March 20, 2019, 12:59 p.m. UTC | #1
Bleep bloop.  Greetings Li Wei, I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
ERROR: Author Li Wei <liwei.solomon@gmail.com> needs to sign off.
WARNING: Unexpected sign-offs from developers who are not authors or co-authors or committers: solomon <liwei.solomon@gmail.com>
Lines checked: 84, Warnings: 1, Errors: 1


Please check this out.  If you feel there has been an error, please email aconole@bytheb.org

Thanks,
0-day Robot
Ben Pfaff March 20, 2019, 6 p.m. UTC | #2
On Wed, Mar 20, 2019 at 08:16:18PM +0800, Li Wei wrote:
> 
> After inserting or removing a bucket, we don't update the bucket counter.
> When we then call ovs-ofctl dump-group-stats br-int, a panic occurs.

Thanks for the patch!  It looks correct to me.  Thank you for adding a
test, too.

I took a closer look and I saw that 'n_buckets' is not very useful,
because it is only used in cases where the code is already
O(n_buckets).  I think that we can just remove it.  Then it cannot get
out-of-sync.  What do you think of this variation of your patch?
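
(In other words, any caller that needs the count is already walking the
list, so it can count in the same pass. A minimal sketch of that pattern,
in illustrative C with made-up names rather than the actual OVS code:)

#include <stddef.h>
#include <stdint.h>

struct bucket {
    struct bucket *next;
    uint16_t weight;
};

/* A caller that already iterates the buckets can derive the count in the
 * same O(n) pass, so dropping the cached counter does not change the
 * asymptotic cost. */
size_t sum_weights(const struct bucket *list, uint64_t *total_weight)
{
    size_t n_buckets = 0;
    *total_weight = 0;
    for (const struct bucket *b = list; b; b = b->next) {
        n_buckets++;                /* counting during the existing walk */
        *total_weight += b->weight; /* the work the loop did anyway */
    }
    return n_buckets;
}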

--8<--------------------------cut here-------------------------->8--

From: Li Wei <liwei.solomon@gmail.com>
Date: Wed, 20 Mar 2019 20:16:18 +0800
Subject: [PATCH] ofproto: fix the bug of bucket counter is not updated

After inserting or removing a bucket, we don't update the bucket counter.
When we then call ovs-ofctl dump-group-stats br-int, a panic occurs.

Steps to reproduce:
1. ovs-ofctl -O OpenFlow15 add-group br-int "group_id=1, type=select, selection_method=hash bucket=bucket_id=1,weight:100,actions=output:1"
2. ovs-ofctl insert-buckets br-int "group_id=1, command_bucket_id=last, bucket=bucket_id=7,weight:800,actions=output:1"
3. ovs-ofctl dump-group-stats br-int

(gdb) bt
at ../sysdeps/posix/libc_fatal.c:175
ar_ptr=<optimized out>) at malloc.c:5049
group_id=<optimized out>, cb=cb@entry=0x55cab8fd6cd0 <append_group_stats>) at ofproto/ofproto.c:6790

Signed-off-by: solomon <liwei.solomon@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
---
 ofproto/ofproto-dpif.c     |  2 +-
 ofproto/ofproto-provider.h |  1 -
 ofproto/ofproto.c          | 11 ++++-------
 tests/ofproto.at           | 16 ++++++++++++++++
 4 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index 21dd54bab09b..cc6dbc70fe41 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -4754,7 +4754,7 @@ static bool
 group_setup_dp_hash_table(struct group_dpif *group, size_t max_hash)
 {
     struct ofputil_bucket *bucket;
-    uint32_t n_buckets = group->up.n_buckets;
+    uint32_t n_buckets = ovs_list_size(&group->up.buckets);
     uint64_t total_weight = 0;
     uint16_t min_weight = UINT16_MAX;
     struct webster {
diff --git a/ofproto/ofproto-provider.h b/ofproto/ofproto-provider.h
index d1a87a59e47e..a71d911199f4 100644
--- a/ofproto/ofproto-provider.h
+++ b/ofproto/ofproto-provider.h
@@ -573,7 +573,6 @@ struct ofgroup {
     const long long int modified;     /* Time of last modification. */
 
     const struct ovs_list buckets;    /* Contains "struct ofputil_bucket"s. */
-    const uint32_t n_buckets;
 
     struct ofputil_group_props props;
 
diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c
index 40780e2766ab..ff3b8410bf49 100644
--- a/ofproto/ofproto.c
+++ b/ofproto/ofproto.c
@@ -6979,11 +6979,12 @@ append_group_stats(struct ofgroup *group, struct ovs_list *replies)
     long long int now = time_msec();
     int error;
 
-    ogs.bucket_stats = xmalloc(group->n_buckets * sizeof *ogs.bucket_stats);
+    size_t n_buckets = ovs_list_size(&group->buckets);
+    ogs.bucket_stats = xmalloc(n_buckets * sizeof *ogs.bucket_stats);
 
     /* Provider sets the packet and byte counts, we do the rest. */
     ogs.ref_count = rule_collection_n(&group->rules);
-    ogs.n_buckets = group->n_buckets;
+    ogs.n_buckets = n_buckets;
 
     error = (ofproto->ofproto_class->group_get_stats
              ? ofproto->ofproto_class->group_get_stats(group, &ogs)
@@ -6991,8 +6992,7 @@ append_group_stats(struct ofgroup *group, struct ovs_list *replies)
     if (error) {
         ogs.packet_count = UINT64_MAX;
         ogs.byte_count = UINT64_MAX;
-        memset(ogs.bucket_stats, 0xff,
-               ogs.n_buckets * sizeof *ogs.bucket_stats);
+        memset(ogs.bucket_stats, 0xff, n_buckets * sizeof *ogs.bucket_stats);
     }
 
     ogs.group_id = group->group_id;
@@ -7212,9 +7212,6 @@ init_group(struct ofproto *ofproto, const struct ofputil_group_mod *gm,
                                          &(*ofgroup)->buckets),
                               &gm->buckets, NULL);
 
-    *CONST_CAST(uint32_t *, &(*ofgroup)->n_buckets) =
-        ovs_list_size(&(*ofgroup)->buckets);
-
     ofputil_group_properties_copy(CONST_CAST(struct ofputil_group_props *,
                                              &(*ofgroup)->props),
                                   &gm->props);
diff --git a/tests/ofproto.at b/tests/ofproto.at
index 213fb262360b..a810dd6042fd 100644
--- a/tests/ofproto.at
+++ b/tests/ofproto.at
@@ -1223,6 +1223,22 @@ OFPST_GROUP reply (OF1.5):
 OVS_VSWITCHD_STOP
 AT_CLEANUP
 
+dnl This is used to find that the bucket counter is not updated.
+AT_SETUP([ofproto - group stats after insert a new bucket (OpenFlow 1.5)])
+OVS_VSWITCHD_START
+AT_DATA([groups.txt], [dnl
+group_id=1234,type=select,selection_method=hash bucket=bucket_id=1,weight:100,actions=output:10
+])
+AT_CHECK([ovs-ofctl -O OpenFlow15 -vwarn add-groups br0 groups.txt])
+AT_CHECK([ovs-ofctl -O OpenFlow15 -vwarn insert-buckets br0 'group_id=1234, command_bucket_id=last, bucket=bucket_id=2,weight:100,actions=output:10'])
+AT_CHECK([ovs-ofctl -O OpenFlow15 -vwarn dump-group-stats br0 group_id=1234], [0], [stdout])
+AT_CHECK([strip_xids < stdout | sed 's/duration=[[0-9.]]*s/duration=?s/' | sort], [0], [dnl
+ group_id=1234,duration=?s,ref_count=0,packet_count=0,byte_count=0,bucket0:packet_count=0,byte_count=0,bucket1:packet_count=0,byte_count=0
+OFPST_GROUP reply (OF1.5):
+])
+OVS_VSWITCHD_STOP
+AT_CLEANUP
+
 dnl This found a use-after-free error in bridge destruction in the
 dnl presence of groups.
 AT_SETUP([ofproto - group add then bridge delete (OpenFlow 1.3)])
solomon March 21, 2019, 2:41 a.m. UTC | #3
Ben Pfaff wrote:
> On Wed, Mar 20, 2019 at 08:16:18PM +0800, Li Wei wrote:
>>
>> After inserting/removing a bucket, we don't update the bucket counter.
>> When we call ovs-ofctl dump-group-stats br-int, a panic happened.
> 
> Thanks for the patch!  It looks correct to me.  Thank you for adding a
> test, too.
> 
> I took a closer look and I saw that 'n_buckets' is not very useful,
> because it is only used in cases where the code is already
> O(n_buckets).  I think that we can just remove it.  Then it cannot get
> out-of-sync.  What do you think of this variation of your patch?


ovs_list_size() traverses the list to compute its length.

In our custom scheduling algorithms (e.g. wrr, least-connection), we need
to know the total number of buckets before traversing the bucket list to
hit the target bucket, so the list ends up being traversed twice.

If each group has 100+ buckets and there are tens of thousands of groups,
wouldn't this modification affect performance?

I'd like to keep n_buckets in struct ofgroup.
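
(A sketch of the pattern described above, in illustrative C with made-up
names rather than the actual OVS code: a weighted selection needs the
total weight before it can pick, so without a cached count/total the list
is walked twice per lookup.)

#include <stddef.h>
#include <stdint.h>

struct bucket {
    struct bucket *next;
    uint16_t weight;
};

/* Pass 1: without a cached total, compute the total weight up front. */
uint64_t total_weight(const struct bucket *list)
{
    uint64_t total = 0;
    for (const struct bucket *b = list; b; b = b->next) {
        total += b->weight;
    }
    return total;
}

/* Pass 2: the weighted selection walks the list again to find the
 * target; 'point' is assumed to lie in [0, total_weight(list)). */
const struct bucket *select_bucket(const struct bucket *list, uint64_t point)
{
    for (const struct bucket *b = list; b; b = b->next) {
        if (point < b->weight) {
            return b;
        }
        point -= b->weight;
    }
    return NULL;
}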


Ben Pfaff March 25, 2019, 8:36 p.m. UTC | #4
On Thu, Mar 21, 2019 at 10:41:05AM +0800, solomon wrote:
> [...]
>
> I'd like to keep n_buckets in struct ofgroup.

OK.

I applied the original to master and backported as far as branch-2.6.
solomon March 26, 2019, 2:16 a.m. UTC | #5
Ben Pfaff wrote:
> [...]
>
> I applied the original to master and backported as far as branch-2.6.

Hi Ben, on branch-2.10 this patch depends on commit
0b4caa2eba22a516a312e7b404107eaebe618ee1
(ofp-group: support to insert bucket with weight value for select type).

Could you pick commit 0b4caa2eba22 into branch-2.10 as well?

Ilya Maximets March 26, 2019, 8:16 a.m. UTC | #6
> [...]
>
> I applied the original to master and backported as far as branch-2.6.

This patch broke the testsuite on branches 2.6 to 2.10.

Best regards, Ilya Maximets.
solomon March 26, 2019, 8:50 a.m. UTC | #7
Ilya Maximets wrote:
>> [...]
>>
>> I applied the original to master and backported as far as branch-2.6.
> 
> This patch broke the testsuite on branches 2.6 to 2.10.

The new test case in this patch inserts a new bucket using the
insert-buckets command. But inserting a bucket with a weight had a bug,
fixed in commit 0b4caa2eba22a516a312e7b404107eaebe618ee1
(ofp-group: support to insert bucket with weight value for select type).

Commit 0b4caa2eba also needs to be backported to branches 2.6 through 2.10.


Simon Horman March 26, 2019, 1:17 p.m. UTC | #8
On Tue, Mar 26, 2019 at 04:50:03PM +0800, solomon wrote:
> [...]
> 
> Commit 0b4caa2eba also needs to be backported to branches 2.6 through 2.10.

Thanks, I have made preliminary backports to the relevant branches
and am running Travis CI to see if the tests pass.

https://travis-ci.org/horms2/ovs/builds/511479533
https://travis-ci.org/horms2/ovs/builds/511482871
https://travis-ci.org/horms2/ovs/builds/511482977
https://travis-ci.org/horms2/ovs/builds/511482911
https://travis-ci.org/horms2/ovs/builds/511482945

From my point of view, a passing Travis CI run is a base requirement
for applying patches, in particular backports to released versions.
Ben Pfaff March 26, 2019, 4:04 p.m. UTC | #9
On Tue, Mar 26, 2019 at 02:17:07PM +0100, Simon Horman wrote:
> From my point of view, a passing Travis CI run is a base requirement
> for applying patches, in particular backports to released versions.

I agree in principle, but I don't know how to work that in with the
number of patches I apply.  It would blow up my workflow and I would be
ineffective.

If we designated a new LTS, then it would work out better because there
wouldn't be 7 release branches to deal with.
Simon Horman March 27, 2019, 8:10 a.m. UTC | #10
On Tue, Mar 26, 2019 at 09:04:05AM -0700, Ben Pfaff wrote:
> On Tue, Mar 26, 2019 at 02:17:07PM +0100, Simon Horman wrote:
> > From my point of view, a passing Travis CI run is a base requirement
> > for applying patches, in particular backports to released versions.
> 
> I agree in principle, but I don't know how to work that in with the
> number of patches I apply.  It would blow up my workflow and I would be
> ineffective.
> 
> If we designated a new LTS, then it would work out better because there
> wouldn't be 7 release branches to deal with.

Thanks Ben,

I appreciate it's easier for me to make a sweeping statement than to
implement a working system. But I think it's a discussion worth having.
Simon Horman March 27, 2019, 9:37 a.m. UTC | #11
On Tue, Mar 26, 2019 at 02:17:07PM +0100, Simon Horman wrote:
> > [...]
> >
> > Commit 0b4caa2eba also needs to be backported to branches 2.6 through 2.10.
> 
> Thanks, I have made preliminary backports to the relevant branches
> and am running Travis CI to see if the tests pass.
> 
> https://travis-ci.org/horms2/ovs/builds/511479533

The check above, of the backport to branch-2.10, succeeded, so I have
pushed the backport to the ovs tree.

> https://travis-ci.org/horms2/ovs/builds/511482871
> https://travis-ci.org/horms2/ovs/builds/511482977
> https://travis-ci.org/horms2/ovs/builds/511482911
> https://travis-ci.org/horms2/ovs/builds/511482945

The above backports to branches 2.6 to 2.9 were incomplete.
I have made a second attempt, which Travis is now checking:

https://travis-ci.org/horms2/ovs/builds/511479533
https://travis-ci.org/horms2/ovs/builds/511897524
https://travis-ci.org/horms2/ovs/builds/511897703
https://travis-ci.org/horms2/ovs/builds/511897927
Simon Horman March 27, 2019, 10:52 a.m. UTC | #12
On Wed, Mar 27, 2019 at 10:37:03AM +0100, Simon Horman wrote:
> [...]
> 
> The above backports to branches 2.6 to 2.9 were incomplete.
> I have made a second attempt, which Travis is now checking:
> 
> https://travis-ci.org/horms2/ovs/builds/511479533
> https://travis-ci.org/horms2/ovs/builds/511897524
> https://travis-ci.org/horms2/ovs/builds/511897703
> https://travis-ci.org/horms2/ovs/builds/511897927

The above passed and I have now pushed the (complete) backport of
0b4caa2eba22 ("ofp-group: support to insert bucket with weight value for
select type") to branches 2.6 through 2.9.
solomon March 27, 2019, 11:16 a.m. UTC | #13
Simon Horman wrote:
> [...]
> 
> The above passed and I have now pushed the (complete) backport of
> 0b4caa2eba22 ("ofp-group: support to insert bucket with weight value for
> select type") to branches 2.6 through 2.9.

Thanks for your work backporting them.


Ben Pfaff March 27, 2019, 4:05 p.m. UTC | #14
On Wed, Mar 27, 2019 at 09:10:12AM +0100, Simon Horman wrote:
> [...]
> 
> I appreciate it's easier for me to make a sweeping statement than to
> implement a working system. But I think it's a discussion worth having.

It might be possible to set up a system where patches can be pushed to
testing queues and then automatically pushed to the release branches if
the tests succeed.  I've seen that kind of system work elsewhere.  I
don't know how much work it would be to set it up.

Patch

diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c
index 0a8d141a4..96cdf0b5b 100644
--- a/ofproto/ofproto.c
+++ b/ofproto/ofproto.c
@@ -7192,6 +7192,8 @@  modify_group_start(struct ofproto *ofproto, struct ofproto_group_mod *ogm)
     *CONST_CAST(long long int *, &(new_group->created)) = old_group->created;
     *CONST_CAST(long long int *, &(new_group->modified)) = time_msec();
 
+    *CONST_CAST(uint32_t *, &(new_group->n_buckets)) =
+        ovs_list_size(&(new_group->buckets));
     group_collection_add(&ogm->old_groups, old_group);
 
     /* Mark the old group for deletion. */
diff --git a/tests/ofproto.at b/tests/ofproto.at
index f32d987fa..c5cebd9fe 100644
--- a/tests/ofproto.at
+++ b/tests/ofproto.at
@@ -1257,6 +1257,22 @@  OFPST_GROUP reply (OF1.5):
 OVS_VSWITCHD_STOP
 AT_CLEANUP
 
+dnl This is used to find that the bucket counter is not updated.
+AT_SETUP([ofproto - group stats after insert a new bucket (OpenFlow 1.5)])
+OVS_VSWITCHD_START
+AT_DATA([groups.txt], [dnl
+group_id=1234,type=select,selection_method=hash bucket=bucket_id=1,weight:100,actions=output:10
+])
+AT_CHECK([ovs-ofctl -O OpenFlow15 -vwarn add-groups br0 groups.txt])
+AT_CHECK([ovs-ofctl -O OpenFlow15 -vwarn insert-buckets br0 'group_id=1234, command_bucket_id=last, bucket=bucket_id=2,weight:100,actions=output:10'])
+AT_CHECK([ovs-ofctl -O OpenFlow15 -vwarn dump-group-stats br0 group_id=1234], [0], [stdout])
+AT_CHECK([strip_xids < stdout | sed 's/duration=[[0-9.]]*s/duration=?s/' | sort], [0], [dnl
+ group_id=1234,duration=?s,ref_count=0,packet_count=0,byte_count=0,bucket0:packet_count=0,byte_count=0,bucket1:packet_count=0,byte_count=0
+OFPST_GROUP reply (OF1.5):
+])
+OVS_VSWITCHD_STOP
+AT_CLEANUP
+
 dnl This found a use-after-free error in bridge destruction in the
 dnl presence of groups.
 AT_SETUP([ofproto - group add then bridge delete (OpenFlow 1.3)])