
[ovs-dev] Fix ovsdb-server memory growth when an ovs-vsctl command is stuck

Message ID 1503747893-16556-1-git-send-email-zengganghui@huawei.com
State Rejected

Commit Message

zengganghui Aug. 26, 2017, 11:44 a.m. UTC
When sending an RPC message fails with EAGAIN, jsonrpc retries the send without limit.
As a result, ovsdb-server's memory grows without bound.

Signed-off-by: ZengGanghui <zengganghui@huawei.com>
---
 lib/jsonrpc.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

Comments

Ben Pfaff Aug. 26, 2017, 10:32 p.m. UTC | #1
On Sat, Aug 26, 2017 at 07:44:53PM +0800, ZengGanghui wrote:
> When sending an RPC message fails with EAGAIN, jsonrpc retries the send without limit.
> As a result, ovsdb-server's memory grows without bound.
> 
> Signed-off-by: ZengGanghui <zengganghui@huawei.com>

This patch doesn't make sense to me.  A failed stream send should not
use memory.  Can you explain the problem and why this solves it?
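To make this point concrete, here is a minimal C model of an output queue shaped like the one in lib/jsonrpc.c (a list of pending buffers plus a byte backlog). The names out_queue, queue_send(), and queue_run() are hypothetical, and a peer_room of 0 stands in for stream_send() returning -EAGAIN. In this model, a retry after EAGAIN reuses the buffer already at the head of the queue; memory is allocated only when a new message is queued.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of a JSON-RPC output queue; not the OVS lib/jsonrpc.c API. */
struct out_buf {
    struct out_buf *next;
    size_t len;             /* Bytes of this buffer still to be sent. */
};

struct out_queue {
    struct out_buf *head, *tail;
    size_t backlog;         /* Total unsent bytes across all queued buffers. */
    size_t n_allocs;        /* Buffers allocated so far (for illustration). */
};

/* Queues a message of 'len' bytes.  This is the only place that allocates. */
static void
queue_send(struct out_queue *q, size_t len)
{
    struct out_buf *b = malloc(sizeof *b);
    if (!b) {
        abort();
    }
    b->next = NULL;
    b->len = len;
    if (q->tail) {
        q->tail->next = b;
    } else {
        q->head = b;
    }
    q->tail = b;
    q->backlog += len;
    q->n_allocs++;
}

/* Tries to drain the queue.  'peer_room' models how many bytes the socket
 * accepts right now; 0 models EAGAIN.  On EAGAIN the head buffer simply
 * stays queued for the next attempt: nothing new is allocated. */
static int
queue_run(struct out_queue *q, size_t peer_room)
{
    while (q->head) {
        struct out_buf *b = q->head;
        if (peer_room == 0) {
            return -1;                  /* Models -EAGAIN: retry later. */
        }
        size_t n = b->len < peer_room ? b->len : peer_room;
        b->len -= n;
        q->backlog -= n;
        peer_room -= n;
        if (b->len == 0) {              /* Fully sent: release the buffer. */
            q->head = b->next;
            if (!q->head) {
                q->tail = NULL;
            }
            free(b);
        }
    }
    return 0;
}
```

In this model, a thousand EAGAIN retries leave the allocation count unchanged, so sustained growth would have to come from new messages being queued faster than the peer reads them rather than from the retries themselves.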
zengganghui Aug. 28, 2017, 3:15 a.m. UTC | #2
In our environment, applying this patch solves the problem. We are still analyzing the root cause; can you help us?
From our analysis, we suspect the following: an ovs-vsctl command (e.g. ovs-vsctl list interface) gets stuck while its Unix socket connection stays open. If the OVSDB data is then updated, ovsdb-server keeps sending the updated data to ovs-vsctl, continuously allocating RPC messages that can never be released.

BR.
Zeng Ganghui
Huawei Technologies Co., Ltd.
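The hypothesis above amounts to a producer that keeps queuing updates for a consumer that has stopped reading. The sketch below is a hypothetical model (session, on_change_naive(), on_change_deferred(), and flush_deferred() are illustrative names, not ovsdb-server code) that contrasts queuing every update immediately with one possible mitigation: noting that the data changed and queuing a single update only after the previous output has drained. Whether the patch referenced later in this thread works this way is not stated here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one monitoring client session.  It tracks only
 * byte counts; it is not the ovsdb-server implementation. */
struct session {
    size_t backlog;          /* Bytes queued but not yet read by the client. */
    bool pending_changes;    /* Data changed since the last update was queued. */
};

/* Naive policy: on every database change, serialize and queue an update
 * immediately, whether or not the client is keeping up. */
static void
on_change_naive(struct session *s, size_t update_len)
{
    s->backlog += update_len;
}

/* Alternative policy (an illustrative assumption): only record that
 * something changed... */
static void
on_change_deferred(struct session *s)
{
    s->pending_changes = true;
}

/* ...and queue one update covering those changes once earlier output
 * has fully drained. */
static void
flush_deferred(struct session *s, size_t update_len)
{
    if (s->pending_changes && s->backlog == 0) {
        s->backlog += update_len;
        s->pending_changes = false;
    }
}
```

With a client that never reads, the naive policy's backlog grows with every change, while the deferred policy holds at most one queued update. A real coalesced update can still grow with the number of changed rows, so this bounds the number of queued messages rather than guaranteeing constant memory.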

zengganghui Aug. 30, 2017, 3:49 a.m. UTC | #3
We have found a patch (http://patchwork.ozlabs.org/patch/593753/) that resolves this problem.

BR.
Zeng Ganghui
Huawei Technologies Co., Ltd.

Ben Pfaff Aug. 31, 2017, 4:42 p.m. UTC | #4
Great.  That patch is already on all relevant branches, so I guess we
are done here.  Thanks for figuring out the issue!


Patch

diff --git a/lib/jsonrpc.c b/lib/jsonrpc.c
index 2fae057..6506dc2 100644
--- a/lib/jsonrpc.c
+++ b/lib/jsonrpc.c
@@ -49,6 +49,9 @@  struct jsonrpc {
     struct ovs_list output;     /* Contains "struct ofpbuf"s. */
     size_t output_count;        /* Number of elements in "output". */
     size_t backlog;
+
+    /* send retry times */
+    int retries;
 };
 
 /* Rate limit for error messages. */
@@ -127,10 +130,17 @@  jsonrpc_run(struct jsonrpc *rpc)
                 ofpbuf_delete(buf);
             }
         } else {
-            if (retval != -EAGAIN) {
-                VLOG_WARN_RL(&rl, "%s: send error: %s",
-                             rpc->name, ovs_strerror(-retval));
+            if (retval != -EAGAIN || rpc->retries++ > 256) {
+                VLOG_WARN_RL(&rl, "%s: send error - %s: %s",
+                             rpc->name,
+                             retval != -EAGAIN ? "not again" : "over retries",
+                             ovs_strerror(-retval));
                 jsonrpc_error(rpc, -retval);
+            } else {
+                VLOG_DBG_RL(&rl, "%s: send again: %s, retries %d",
+                             rpc->name,
+                             ovs_strerror(-retval),
+                             rpc->retries);
             }
             break;
         }
@@ -503,6 +513,7 @@  jsonrpc_cleanup(struct jsonrpc *rpc)
     ofpbuf_list_delete(&rpc->output);
     rpc->backlog = 0;
     rpc->output_count = 0;
+    rpc->retries = 0;
 }
 
 static struct jsonrpc_msg *
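For reference, the patch's bounded-retry idea can be sketched as below. In the hunks shown, rpc->retries is reset only in jsonrpc_cleanup(), so EAGAIN results accumulate over a connection's lifetime even when sends make progress in between. This hypothetical sketch (struct conn, handle_send_result(), and MAX_EAGAIN_RETRIES are illustrative names, not OVS code) resets the count on every successful send so that only consecutive stalls count toward the limit.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Same bound as the patch; the value itself is arbitrary. */
#define MAX_EAGAIN_RETRIES 256

struct conn {
    int retries;        /* Consecutive EAGAIN results without progress. */
    bool failed;        /* Set when the connection would be torn down. */
};

/* Handles one send result.  'retval' follows the stream_send() convention:
 * bytes sent (>= 0) on success, or a negative errno value on failure. */
static void
handle_send_result(struct conn *c, int retval)
{
    if (retval >= 0) {
        c->retries = 0;     /* Progress made: start counting afresh. */
    } else if (retval != -EAGAIN || ++c->retries > MAX_EAGAIN_RETRIES) {
        c->failed = true;   /* Hard error, or peer stalled too many times. */
    }
}
```

Either way, the limit is on the number of failed attempts, not on elapsed time or queued bytes, and it does not address why data keeps being queued for a client that has stopped reading.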