Message ID | 1495428900-11436-1-git-send-email-wangyunjian@huawei.com |
---|---|
State | Accepted |
Headers | show |
On Mon, 2017-05-22 at 12:55 +0800, Yunjian Wang wrote: > The memory leak was triggered each time on calling netflow_unref() with > containing netflow_flows. And flows need to be removed and destroyed. > > Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> > --- > ofproto/netflow.c | 7 +++++++ > 1 file changed, 7 insertions(+) > > diff --git a/ofproto/netflow.c b/ofproto/netflow.c > index 55f7814..29c5f3e 100644 > --- a/ofproto/netflow.c > +++ b/ofproto/netflow.c > @@ -409,10 +409,17 @@ netflow_ref(const struct netflow *nf_) > void > netflow_unref(struct netflow *nf) > { > + struct netflow_flow *nf_flow, *next; > + > if (nf && ovs_refcount_unref_relaxed(&nf->ref_cnt) == 1) { > atomic_count_dec(&netflow_count); > collectors_destroy(nf->collectors); > ofpbuf_uninit(&nf->packet); > + HMAP_FOR_EACH_SAFE (nf_flow, next, hmap_node, &nf->flows) { > + hmap_remove(&nf->flows, &nf_flow->hmap_node); > + free(nf_flow); > + } > + hmap_destroy(&nf->flows); > free(nf); > } > } This looks right to me. The only other place I see the flows freed is when they're detected as idle. If the flow is never detected as idle then I don't see anywhere else that they are freed up after the xzalloc in netflow_flow_update(). Reviewed-by: Greg Rose <gvrose8192@gmail.com>
On Mon, 2017-05-22 at 07:40 -0700, Greg Rose wrote: > On Mon, 2017-05-22 at 12:55 +0800, Yunjian Wang wrote: > > The memory leak was triggered each time on calling netflow_unref() with > > containing netflow_flows. And flows need to be removed and destroyed. > > > > Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> > > --- > > ofproto/netflow.c | 7 +++++++ > > 1 file changed, 7 insertions(+) > > > > diff --git a/ofproto/netflow.c b/ofproto/netflow.c > > index 55f7814..29c5f3e 100644 > > --- a/ofproto/netflow.c > > +++ b/ofproto/netflow.c > > @@ -409,10 +409,17 @@ netflow_ref(const struct netflow *nf_) > > void > > netflow_unref(struct netflow *nf) > > { > > + struct netflow_flow *nf_flow, *next; > > + > > if (nf && ovs_refcount_unref_relaxed(&nf->ref_cnt) == 1) { > > atomic_count_dec(&netflow_count); > > collectors_destroy(nf->collectors); > > ofpbuf_uninit(&nf->packet); > > + HMAP_FOR_EACH_SAFE (nf_flow, next, hmap_node, &nf->flows) { > > + hmap_remove(&nf->flows, &nf_flow->hmap_node); > > + free(nf_flow); > > + } > > + hmap_destroy(&nf->flows); > > free(nf); > > } > > } > > This looks right to me. The only other place I see the flows freed is > when they're detected as idle. If the flow is never detected as idle > then I don't see anywhere else that they are freed up after the xzalloc > in netflow_flow_update(). > > Reviewed-by: Greg Rose <gvrose8192@gmail.com> > I'm trying to test this but the condition never seems to be met and thus the 4 lines of additional code free flows never executes. Is there some particular flow or type of network traffic that will execute this code? I'd like to add a Tested-by for this but unless I can get the code to execute I can't. - Greg
On 21 May 2017 at 21:55, Yunjian Wang <wangyunjian@huawei.com> wrote: > The memory leak was triggered each time on calling netflow_unref() with > containing netflow_flows. And flows need to be removed and destroyed. > > Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> > --- Does this also need to netflow_expire__() the flows? Would it make sense for this flow clear path to share code with netflow_flow_clear()?
On Thu, 2017-05-25 at 09:53 -0700, Greg Rose wrote: > On Mon, 2017-05-22 at 07:40 -0700, Greg Rose wrote: > > On Mon, 2017-05-22 at 12:55 +0800, Yunjian Wang wrote: > > > The memory leak was triggered each time on calling netflow_unref() with > > > containing netflow_flows. And flows need to be removed and destroyed. > > > > > > Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> > > > --- > > > ofproto/netflow.c | 7 +++++++ > > > 1 file changed, 7 insertions(+) > > > > > > diff --git a/ofproto/netflow.c b/ofproto/netflow.c > > > index 55f7814..29c5f3e 100644 > > > --- a/ofproto/netflow.c > > > +++ b/ofproto/netflow.c > > > @@ -409,10 +409,17 @@ netflow_ref(const struct netflow *nf_) > > > void > > > netflow_unref(struct netflow *nf) > > > { > > > + struct netflow_flow *nf_flow, *next; > > > + > > > if (nf && ovs_refcount_unref_relaxed(&nf->ref_cnt) == 1) { > > > atomic_count_dec(&netflow_count); > > > collectors_destroy(nf->collectors); > > > ofpbuf_uninit(&nf->packet); > > > + HMAP_FOR_EACH_SAFE (nf_flow, next, hmap_node, &nf->flows) { > > > + hmap_remove(&nf->flows, &nf_flow->hmap_node); > > > + free(nf_flow); > > > + } > > > + hmap_destroy(&nf->flows); > > > free(nf); > > > } > > > } > > > > This looks right to me. The only other place I see the flows freed is > > when they're detected as idle. If the flow is never detected as idle > > then I don't see anywhere else that they are freed up after the xzalloc > > in netflow_flow_update(). > > > > Reviewed-by: Greg Rose <gvrose8192@gmail.com> > > > > I'm trying to test this but the condition never seems to be met and thus > the 4 lines of additional code free flows never executes. Is there some > particular flow or type of network traffic that will execute this code? > > I'd like to add a Tested-by for this but unless I can get the code to > execute I can't. > > - Greg OK, I've finally got my netflow configuration working right with ntopng collecting the flows. I can test the patch tomorrow, it's getting late for me now. Thanks, - Greg
On Thu, 2017-05-25 at 21:05 -0700, Greg Rose wrote: > On Thu, 2017-05-25 at 09:53 -0700, Greg Rose wrote: > > On Mon, 2017-05-22 at 07:40 -0700, Greg Rose wrote: > > > On Mon, 2017-05-22 at 12:55 +0800, Yunjian Wang wrote: > > > > The memory leak was triggered each time on calling netflow_unref() with > > > > containing netflow_flows. And flows need to be removed and destroyed. > > > > > > > > Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> > > > > --- > > > > ofproto/netflow.c | 7 +++++++ > > > > 1 file changed, 7 insertions(+) > > > > > > > > diff --git a/ofproto/netflow.c b/ofproto/netflow.c > > > > index 55f7814..29c5f3e 100644 > > > > --- a/ofproto/netflow.c > > > > +++ b/ofproto/netflow.c > > > > @@ -409,10 +409,17 @@ netflow_ref(const struct netflow *nf_) > > > > void > > > > netflow_unref(struct netflow *nf) > > > > { > > > > + struct netflow_flow *nf_flow, *next; > > > > + > > > > if (nf && ovs_refcount_unref_relaxed(&nf->ref_cnt) == 1) { > > > > atomic_count_dec(&netflow_count); > > > > collectors_destroy(nf->collectors); > > > > ofpbuf_uninit(&nf->packet); > > > > + HMAP_FOR_EACH_SAFE (nf_flow, next, hmap_node, &nf->flows) { > > > > + hmap_remove(&nf->flows, &nf_flow->hmap_node); > > > > + free(nf_flow); > > > > + } > > > > + hmap_destroy(&nf->flows); > > > > free(nf); > > > > } > > > > } > > > > > > This looks right to me. The only other place I see the flows freed is > > > when they're detected as idle. If the flow is never detected as idle > > > then I don't see anywhere else that they are freed up after the xzalloc > > > in netflow_flow_update(). > > > > > > Reviewed-by: Greg Rose <gvrose8192@gmail.com> > > > > > > > I'm trying to test this but the condition never seems to be met and thus > > the 4 lines of additional code free flows never executes. Is there some > > particular flow or type of network traffic that will execute this code? > > > > I'd like to add a Tested-by for this but unless I can get the code to > > execute I can't. > > > > - Greg > > OK, I've finally got my netflow configuration working right with ntopng > collecting the flows. I can test the patch tomorrow, it's getting late > for me now. > > Thanks, > > - Greg Actually, ntopng doesn't work as the flow collector itself (I spoke too soon) but I am successfully creating netflows which I can see with tcpdump. Then we are hitting the condition net netflow_unref() when I execute this command: ovs-vsctl clear Bridge int-br1 netflow However, the '*' marked lines of code are never executed when I place a gdb breakpoint on it: HMAP_FOR_EACH_SAFE (nf_flow, next, hmap_node, &nf->flows) { * hmap_remove(&nf->flows, &nf_flow->hmap_node); * free(nf_flow); } It seems the code is never reached. Unless we can see some example of the code being reached and executed I worry about this patch. Thanks, - Greg
> -----Original Message----- > From: ovs-dev-bounces@openvswitch.org [mailto:ovs-dev- > bounces@openvswitch.org] On Behalf Of Greg Rose > Sent: Friday, May 26, 2017 11:51 PM > To: ovs-dev@openvswitch.org > Subject: Re: [ovs-dev] [PATCH v2] netflow: Fix memory leak in > netflow_unref. > > On Thu, 2017-05-25 at 21:05 -0700, Greg Rose wrote: > > On Thu, 2017-05-25 at 09:53 -0700, Greg Rose wrote: > > > On Mon, 2017-05-22 at 07:40 -0700, Greg Rose wrote: > > > > On Mon, 2017-05-22 at 12:55 +0800, Yunjian Wang wrote: > > > > > The memory leak was triggered each time on calling > > > > > netflow_unref() with containing netflow_flows. And flows need to be > removed and destroyed. > > > > > > > > > > Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> > > > > > --- > > > > > ofproto/netflow.c | 7 +++++++ > > > > > 1 file changed, 7 insertions(+) > > > > > > > > > > diff --git a/ofproto/netflow.c b/ofproto/netflow.c index > > > > > 55f7814..29c5f3e 100644 > > > > > --- a/ofproto/netflow.c > > > > > +++ b/ofproto/netflow.c > > > > > @@ -409,10 +409,17 @@ netflow_ref(const struct netflow *nf_) > > > > > void netflow_unref(struct netflow *nf) { > > > > > + struct netflow_flow *nf_flow, *next; > > > > > + > > > > > if (nf && ovs_refcount_unref_relaxed(&nf->ref_cnt) == 1) { > > > > > atomic_count_dec(&netflow_count); > > > > > collectors_destroy(nf->collectors); > > > > > ofpbuf_uninit(&nf->packet); > > > > > + HMAP_FOR_EACH_SAFE (nf_flow, next, hmap_node, &nf- > >flows) { > > > > > + hmap_remove(&nf->flows, &nf_flow->hmap_node); > > > > > + free(nf_flow); > > > > > + } > > > > > + hmap_destroy(&nf->flows); > > > > > free(nf); > > > > > } > > > > > } > > > > > > > > This looks right to me. The only other place I see the flows > > > > freed is when they're detected as idle. If the flow is never > > > > detected as idle then I don't see anywhere else that they are > > > > freed up after the xzalloc in netflow_flow_update(). > > > > > > > > Reviewed-by: Greg Rose <gvrose8192@gmail.com> > > > > > > > > > > I'm trying to test this but the condition never seems to be met and > > > thus the 4 lines of additional code free flows never executes. Is > > > there some particular flow or type of network traffic that will execute this > code? > > > > > > I'd like to add a Tested-by for this but unless I can get the code > > > to execute I can't. > > > > > > - Greg > > > > OK, I've finally got my netflow configuration working right with > > ntopng collecting the flows. I can test the patch tomorrow, it's > > getting late for me now. > > > > Thanks, > > > > - Greg > > Actually, ntopng doesn't work as the flow collector itself (I spoke too > soon) but I am successfully creating netflows which I can see with tcpdump. > Then we are hitting the condition net netflow_unref() when I execute this > command: > > ovs-vsctl clear Bridge int-br1 netflow > > However, the '*' marked lines of code are never executed when I place a gdb > breakpoint on it: > > HMAP_FOR_EACH_SAFE (nf_flow, next, hmap_node, &nf->flows) { > * hmap_remove(&nf->flows, &nf_flow->hmap_node); > * free(nf_flow); > } > > It seems the code is never reached. Unless we can see some example of the > code being reached and executed I worry about this patch. > > Thanks, > > - Greg It is probable that the need to test many times. Thanks, Yunjian my ovs version: openvswitch-2.7.0 Test Script: ovs-vsctl add-br ovs ovs-vsctl add-port ovs eth1 for (( i=0; i<5000000; i=i+1 )); do ovs-vsctl -- --id=@netflow create netflow targets=\"3.3.2.26:9996\" active_timeout=30 -- set bridge ovs netflow=@netflow sleep 3 ovs-vsctl clear bridge ovs netflow sleep 1 done Test results: (gdb) b ofproto/netflow.c:419 Breakpoint 1 at 0x4aeb74: file ofproto/netflow.c, line 419. (gdb) c Continuing. [Switching to Thread 0x7f09ed273700 (LWP 19046)] Breakpoint 1, netflow_unref (nf=0x408dbd0) at ofproto/netflow.c:419 419 hmap_remove(&nf->flows, &nf_flow->hmap_node); (gdb) bt #0 netflow_unref (nf=0x408dbd0) at ofproto/netflow.c:419 #1 0x00000000004a4347 in xlate_cache_clear_netflow (netflow=0x408dbd0, flow=0x7f09dc011a90) at ofproto/ofproto-dpif-xlate-cache.c:200 #2 0x00000000004a43e2 in xlate_cache_clear_entry (entry=0x7f09dc010d08) at ofproto/ofproto-dpif-xlate-cache.c:221 #3 0x00000000004a44dc in xlate_cache_clear (xcache=0x7f09dc01ca90) at ofproto/ofproto-dpif-xlate-cache.c:262 #4 0x00000000004a451e in xlate_cache_uninit (xcache=0x7f09dc01ca90) at ofproto/ofproto-dpif-xlate-cache.c:271 #5 0x00000000004a4544 in xlate_cache_delete (xcache=0x7f09dc01ca90) at ofproto/ofproto-dpif-xlate-cache.c:278 #6 0x00000000004911d9 in ukey_delete__ (ukey=0x40987d0) at ofproto/ofproto-dpif-upcall.c:1801 #7 0x000000000056c9f5 in ovsrcu_call_postponed () at lib/ovs-rcu.c:323 #8 0x000000000056ca9b in ovsrcu_postpone_thread (arg=0x0) at lib/ovs-rcu.c:338 #9 0x000000000056efa0 in ovsthread_wrapper (aux_=0x3fa3c30) at lib/ovs-thread.c:348 #10 0x00007f09f1e44dc5 in start_thread () from /usr/lib64/libpthread.so.0 #11 0x00007f09f0c0a71d in clone () from /usr/lib64/libc.so.6 (gdb) l 414 if (nf && ovs_refcount_unref_relaxed(&nf->ref_cnt) == 1) { 415 atomic_count_dec(&netflow_count); 416 collectors_destroy(nf->collectors); 417 ofpbuf_uninit(&nf->packet); 418 HMAP_FOR_EACH_SAFE (nf_flow, next, hmap_node, &nf->flows) { 419 hmap_remove(&nf->flows, &nf_flow->hmap_node); 420 free(nf_flow); 421 } 422 free(nf); 423 } (gdb) info b Num Type Disp Enb Address What 1 breakpoint keep y 0x00000000004aeb74 in netflow_unref at ofproto/netflow.c:419 breakpoint already hit 1 time (gdb) p nf->flows $1 = {buckets = 0x408dc50, one = 0x3fd1fd0, mask = 0, n = 1} (gdb) > > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
On Sat, 2017-05-27 at 01:43 +0000, wangyunjian wrote: > > > -----Original Message----- > > From: ovs-dev-bounces@openvswitch.org [mailto:ovs-dev- > > bounces@openvswitch.org] On Behalf Of Greg Rose > > Sent: Friday, May 26, 2017 11:51 PM > > To: ovs-dev@openvswitch.org > > Subject: Re: [ovs-dev] [PATCH v2] netflow: Fix memory leak in > > netflow_unref. > > > > On Thu, 2017-05-25 at 21:05 -0700, Greg Rose wrote: > > > On Thu, 2017-05-25 at 09:53 -0700, Greg Rose wrote: > > > > On Mon, 2017-05-22 at 07:40 -0700, Greg Rose wrote: > > > > > On Mon, 2017-05-22 at 12:55 +0800, Yunjian Wang wrote: > > > > > > The memory leak was triggered each time on calling > > > > > > netflow_unref() with containing netflow_flows. And flows need to be > > removed and destroyed. > > > > > > > > > > > > Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> > > > > > > --- > > > > > > ofproto/netflow.c | 7 +++++++ > > > > > > 1 file changed, 7 insertions(+) > > > > > > > > > > > > diff --git a/ofproto/netflow.c b/ofproto/netflow.c index > > > > > > 55f7814..29c5f3e 100644 > > > > > > --- a/ofproto/netflow.c > > > > > > +++ b/ofproto/netflow.c > > > > > > @@ -409,10 +409,17 @@ netflow_ref(const struct netflow *nf_) > > > > > > void netflow_unref(struct netflow *nf) { > > > > > > + struct netflow_flow *nf_flow, *next; > > > > > > + > > > > > > if (nf && ovs_refcount_unref_relaxed(&nf->ref_cnt) == 1) { > > > > > > atomic_count_dec(&netflow_count); > > > > > > collectors_destroy(nf->collectors); > > > > > > ofpbuf_uninit(&nf->packet); > > > > > > + HMAP_FOR_EACH_SAFE (nf_flow, next, hmap_node, &nf- > > >flows) { > > > > > > + hmap_remove(&nf->flows, &nf_flow->hmap_node); > > > > > > + free(nf_flow); > > > > > > + } > > > > > > + hmap_destroy(&nf->flows); > > > > > > free(nf); > > > > > > } > > > > > > } > > > > > > > > > > This looks right to me. The only other place I see the flows > > > > > freed is when they're detected as idle. If the flow is never > > > > > detected as idle then I don't see anywhere else that they are > > > > > freed up after the xzalloc in netflow_flow_update(). > > > > > > > > > > Reviewed-by: Greg Rose <gvrose8192@gmail.com> > > > > > > > > > > > > > I'm trying to test this but the condition never seems to be met and > > > > thus the 4 lines of additional code free flows never executes. Is > > > > there some particular flow or type of network traffic that will execute this > > code? > > > > > > > > I'd like to add a Tested-by for this but unless I can get the code > > > > to execute I can't. > > > > > > > > - Greg > > > > > > OK, I've finally got my netflow configuration working right with > > > ntopng collecting the flows. I can test the patch tomorrow, it's > > > getting late for me now. > > > > > > Thanks, > > > > > > - Greg > > > > Actually, ntopng doesn't work as the flow collector itself (I spoke too > > soon) but I am successfully creating netflows which I can see with tcpdump. > > Then we are hitting the condition net netflow_unref() when I execute this > > command: > > > > ovs-vsctl clear Bridge int-br1 netflow > > > > However, the '*' marked lines of code are never executed when I place a gdb > > breakpoint on it: > > > > HMAP_FOR_EACH_SAFE (nf_flow, next, hmap_node, &nf->flows) { > > * hmap_remove(&nf->flows, &nf_flow->hmap_node); > > * free(nf_flow); > > } > > > > It seems the code is never reached. Unless we can see some example of the > > code being reached and executed I worry about this patch. > > > > Thanks, > > > > - Greg > > It is probable that the need to test many times. > > Thanks, > > Yunjian > > my ovs version: openvswitch-2.7.0 > > Test Script: > ovs-vsctl add-br ovs > ovs-vsctl add-port ovs eth1 > > for (( i=0; i<5000000; i=i+1 )); do > ovs-vsctl -- --id=@netflow create netflow targets=\"3.3.2.26:9996\" active_timeout=30 -- set bridge ovs netflow=@netflow > sleep 3 > ovs-vsctl clear bridge ovs netflow > sleep 1 > done So we have to beat it up a little. OK, I'll give it a try. Thanks, - Greg > > Test results: > (gdb) b ofproto/netflow.c:419 > Breakpoint 1 at 0x4aeb74: file ofproto/netflow.c, line 419. > (gdb) c > Continuing. > [Switching to Thread 0x7f09ed273700 (LWP 19046)] > > Breakpoint 1, netflow_unref (nf=0x408dbd0) at ofproto/netflow.c:419 > 419 hmap_remove(&nf->flows, &nf_flow->hmap_node); > (gdb) bt > #0 netflow_unref (nf=0x408dbd0) at ofproto/netflow.c:419 > #1 0x00000000004a4347 in xlate_cache_clear_netflow (netflow=0x408dbd0, flow=0x7f09dc011a90) at ofproto/ofproto-dpif-xlate-cache.c:200 > #2 0x00000000004a43e2 in xlate_cache_clear_entry (entry=0x7f09dc010d08) at ofproto/ofproto-dpif-xlate-cache.c:221 > #3 0x00000000004a44dc in xlate_cache_clear (xcache=0x7f09dc01ca90) at ofproto/ofproto-dpif-xlate-cache.c:262 > #4 0x00000000004a451e in xlate_cache_uninit (xcache=0x7f09dc01ca90) at ofproto/ofproto-dpif-xlate-cache.c:271 > #5 0x00000000004a4544 in xlate_cache_delete (xcache=0x7f09dc01ca90) at ofproto/ofproto-dpif-xlate-cache.c:278 > #6 0x00000000004911d9 in ukey_delete__ (ukey=0x40987d0) at ofproto/ofproto-dpif-upcall.c:1801 > #7 0x000000000056c9f5 in ovsrcu_call_postponed () at lib/ovs-rcu.c:323 > #8 0x000000000056ca9b in ovsrcu_postpone_thread (arg=0x0) at lib/ovs-rcu.c:338 > #9 0x000000000056efa0 in ovsthread_wrapper (aux_=0x3fa3c30) at lib/ovs-thread.c:348 > #10 0x00007f09f1e44dc5 in start_thread () from /usr/lib64/libpthread.so.0 > #11 0x00007f09f0c0a71d in clone () from /usr/lib64/libc.so.6 > (gdb) l > 414 if (nf && ovs_refcount_unref_relaxed(&nf->ref_cnt) == 1) { > 415 atomic_count_dec(&netflow_count); > 416 collectors_destroy(nf->collectors); > 417 ofpbuf_uninit(&nf->packet); > 418 HMAP_FOR_EACH_SAFE (nf_flow, next, hmap_node, &nf->flows) { > 419 hmap_remove(&nf->flows, &nf_flow->hmap_node); > 420 free(nf_flow); > 421 } > 422 free(nf); > 423 } > (gdb) info b > Num Type Disp Enb Address What > 1 breakpoint keep y 0x00000000004aeb74 in netflow_unref at ofproto/netflow.c:419 > breakpoint already hit 1 time > (gdb) p nf->flows > $1 = {buckets = 0x408dc50, one = 0x3fd1fd0, mask = 0, n = 1} > (gdb) > > > > > _______________________________________________ > > dev mailing list > > dev@openvswitch.org > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
On Thu, May 25, 2017 at 01:47:32PM -0700, Joe Stringer wrote: > On 21 May 2017 at 21:55, Yunjian Wang <wangyunjian@huawei.com> wrote: > > The memory leak was triggered each time on calling netflow_unref() with > > containing netflow_flows. And flows need to be removed and destroyed. > > > > Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> > > --- > > Does this also need to netflow_expire__() the flows? Would it make > sense for this flow clear path to share code with > netflow_flow_clear()? That would certainly make sense. However, it adds a lot of extra code to run at the time we're freeing something, which is always a little risky. Since Greg is having trouble testing this at all, I lean toward just freeing without trying to send out NetFlow expirations.
On Tue, 2017-05-30 at 09:47 -0700, Ben Pfaff wrote: > On Thu, May 25, 2017 at 01:47:32PM -0700, Joe Stringer wrote: > > On 21 May 2017 at 21:55, Yunjian Wang <wangyunjian@huawei.com> wrote: > > > The memory leak was triggered each time on calling netflow_unref() with > > > containing netflow_flows. And flows need to be removed and destroyed. > > > > > > Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> > > > --- > > > > Does this also need to netflow_expire__() the flows? Would it make > > sense for this flow clear path to share code with > > netflow_flow_clear()? > > That would certainly make sense. However, it adds a lot of extra code > to run at the time we're freeing something, which is always a little > risky. Since Greg is having trouble testing this at all, I lean toward > just freeing without trying to send out NetFlow expirations. I've still had no luck on testing. Over the weekend I ran a continuous loop as suggested by Yunjian that constantly enables and disables netflow and didn't get a break on that line of code. However, I'm looking at adjusting the background network traffic to something that should create more flows and will try again today. - Greg > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
On Sat, 2017-05-27 at 01:43 +0000, wangyunjian wrote: > > > -----Original Message----- > > From: ovs-dev-bounces@openvswitch.org [mailto:ovs-dev- > > bounces@openvswitch.org] On Behalf Of Greg Rose > > Sent: Friday, May 26, 2017 11:51 PM > > To: ovs-dev@openvswitch.org > > Subject: Re: [ovs-dev] [PATCH v2] netflow: Fix memory leak in > > netflow_unref. > > > > On Thu, 2017-05-25 at 21:05 -0700, Greg Rose wrote: > > > On Thu, 2017-05-25 at 09:53 -0700, Greg Rose wrote: > > > > On Mon, 2017-05-22 at 07:40 -0700, Greg Rose wrote: > > > > > On Mon, 2017-05-22 at 12:55 +0800, Yunjian Wang wrote: > > > > > > The memory leak was triggered each time on calling > > > > > > netflow_unref() with containing netflow_flows. And flows need to be > > removed and destroyed. > > > > > > > > > > > > Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> > > > > > > --- > > > > > > ofproto/netflow.c | 7 +++++++ > > > > > > 1 file changed, 7 insertions(+) > > > > > > > > > > > > diff --git a/ofproto/netflow.c b/ofproto/netflow.c index > > > > > > 55f7814..29c5f3e 100644 > > > > > > --- a/ofproto/netflow.c > > > > > > +++ b/ofproto/netflow.c > > > > > > @@ -409,10 +409,17 @@ netflow_ref(const struct netflow *nf_) > > > > > > void netflow_unref(struct netflow *nf) { > > > > > > + struct netflow_flow *nf_flow, *next; > > > > > > + > > > > > > if (nf && ovs_refcount_unref_relaxed(&nf->ref_cnt) == 1) { > > > > > > atomic_count_dec(&netflow_count); > > > > > > collectors_destroy(nf->collectors); > > > > > > ofpbuf_uninit(&nf->packet); > > > > > > + HMAP_FOR_EACH_SAFE (nf_flow, next, hmap_node, &nf- > > >flows) { > > > > > > + hmap_remove(&nf->flows, &nf_flow->hmap_node); > > > > > > + free(nf_flow); > > > > > > + } > > > > > > + hmap_destroy(&nf->flows); > > > > > > free(nf); > > > > > > } > > > > > > } > > > > > > > > > > This looks right to me. The only other place I see the flows > > > > > freed is when they're detected as idle. If the flow is never > > > > > detected as idle then I don't see anywhere else that they are > > > > > freed up after the xzalloc in netflow_flow_update(). > > > > > > > > > > Reviewed-by: Greg Rose <gvrose8192@gmail.com> > > > > > > > > > > > > > I'm trying to test this but the condition never seems to be met and > > > > thus the 4 lines of additional code free flows never executes. Is > > > > there some particular flow or type of network traffic that will execute this > > code? > > > > > > > > I'd like to add a Tested-by for this but unless I can get the code > > > > to execute I can't. > > > > > > > > - Greg > > > > > > OK, I've finally got my netflow configuration working right with > > > ntopng collecting the flows. I can test the patch tomorrow, it's > > > getting late for me now. > > > > > > Thanks, > > > > > > - Greg > > > > Actually, ntopng doesn't work as the flow collector itself (I spoke too > > soon) but I am successfully creating netflows which I can see with tcpdump. > > Then we are hitting the condition net netflow_unref() when I execute this > > command: > > > > ovs-vsctl clear Bridge int-br1 netflow > > > > However, the '*' marked lines of code are never executed when I place a gdb > > breakpoint on it: > > > > HMAP_FOR_EACH_SAFE (nf_flow, next, hmap_node, &nf->flows) { > > * hmap_remove(&nf->flows, &nf_flow->hmap_node); > > * free(nf_flow); > > } > > > > It seems the code is never reached. Unless we can see some example of the > > code being reached and executed I worry about this patch. > > > > Thanks, > > > > - Greg > > It is probable that the need to test many times. > > Thanks, > > Yunjian > > my ovs version: openvswitch-2.7.0 > > Test Script: > ovs-vsctl add-br ovs > ovs-vsctl add-port ovs eth1 > > for (( i=0; i<5000000; i=i+1 )); do > ovs-vsctl -- --id=@netflow create netflow targets=\"3.3.2.26:9996\" active_timeout=30 -- set bridge ovs netflow=@netflow > sleep 3 > ovs-vsctl clear bridge ovs netflow > sleep 1 > done > > Test results: > (gdb) b ofproto/netflow.c:419 > Breakpoint 1 at 0x4aeb74: file ofproto/netflow.c, line 419. > (gdb) c > Continuing. > [Switching to Thread 0x7f09ed273700 (LWP 19046)] > > Breakpoint 1, netflow_unref (nf=0x408dbd0) at ofproto/netflow.c:419 > 419 hmap_remove(&nf->flows, &nf_flow->hmap_node); > (gdb) bt > #0 netflow_unref (nf=0x408dbd0) at ofproto/netflow.c:419 > #1 0x00000000004a4347 in xlate_cache_clear_netflow (netflow=0x408dbd0, flow=0x7f09dc011a90) at ofproto/ofproto-dpif-xlate-cache.c:200 > #2 0x00000000004a43e2 in xlate_cache_clear_entry (entry=0x7f09dc010d08) at ofproto/ofproto-dpif-xlate-cache.c:221 > #3 0x00000000004a44dc in xlate_cache_clear (xcache=0x7f09dc01ca90) at ofproto/ofproto-dpif-xlate-cache.c:262 > #4 0x00000000004a451e in xlate_cache_uninit (xcache=0x7f09dc01ca90) at ofproto/ofproto-dpif-xlate-cache.c:271 > #5 0x00000000004a4544 in xlate_cache_delete (xcache=0x7f09dc01ca90) at ofproto/ofproto-dpif-xlate-cache.c:278 > #6 0x00000000004911d9 in ukey_delete__ (ukey=0x40987d0) at ofproto/ofproto-dpif-upcall.c:1801 > #7 0x000000000056c9f5 in ovsrcu_call_postponed () at lib/ovs-rcu.c:323 > #8 0x000000000056ca9b in ovsrcu_postpone_thread (arg=0x0) at lib/ovs-rcu.c:338 > #9 0x000000000056efa0 in ovsthread_wrapper (aux_=0x3fa3c30) at lib/ovs-thread.c:348 > #10 0x00007f09f1e44dc5 in start_thread () from /usr/lib64/libpthread.so.0 > #11 0x00007f09f0c0a71d in clone () from /usr/lib64/libc.so.6 > (gdb) l > 414 if (nf && ovs_refcount_unref_relaxed(&nf->ref_cnt) == 1) { > 415 atomic_count_dec(&netflow_count); > 416 collectors_destroy(nf->collectors); > 417 ofpbuf_uninit(&nf->packet); > 418 HMAP_FOR_EACH_SAFE (nf_flow, next, hmap_node, &nf->flows) { > 419 hmap_remove(&nf->flows, &nf_flow->hmap_node); > 420 free(nf_flow); > 421 } > 422 free(nf); > 423 } > (gdb) info b > Num Type Disp Enb Address What > 1 breakpoint keep y 0x00000000004aeb74 in netflow_unref at ofproto/netflow.c:419 > breakpoint already hit 1 time > (gdb) p nf->flows > $1 = {buckets = 0x408dc50, one = 0x3fd1fd0, mask = 0, n = 1} > (gdb) I haven't been able to hit this breakpoint in my own setup after running this test for several days. It may be due to different types of traffic or some other subtle difference. I'm running iperf sessions with multiple ports/ip address combinations to create a lot of netflows and netflow updates but perhaps that is not enough to trigger the right condition. In any case, the code looks correct as per my review and Yunjian has a reproducer so I'll leave it to the maintainers as to what action they will take. I know that there were some further comments from Joe and Ben. Thanks, - Greg > > > > > _______________________________________________ > > dev mailing list > > dev@openvswitch.org > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
On 30 May 2017 at 09:47, Ben Pfaff <blp@ovn.org> wrote: > On Thu, May 25, 2017 at 01:47:32PM -0700, Joe Stringer wrote: >> On 21 May 2017 at 21:55, Yunjian Wang <wangyunjian@huawei.com> wrote: >> > The memory leak was triggered each time on calling netflow_unref() with >> > containing netflow_flows. And flows need to be removed and destroyed. >> > >> > Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> >> > --- >> >> Does this also need to netflow_expire__() the flows? Would it make >> sense for this flow clear path to share code with >> netflow_flow_clear()? > > That would certainly make sense. However, it adds a lot of extra code > to run at the time we're freeing something, which is always a little > risky. Since Greg is having trouble testing this at all, I lean toward > just freeing without trying to send out NetFlow expirations. This seems fine to me. It looks like the biggest issue is the bug that's fixed by the patch, which seems correct to me so I'd be happy if this was applied as-is.
On Wed, May 31, 2017 at 11:16:10AM -0700, Joe Stringer wrote: > On 30 May 2017 at 09:47, Ben Pfaff <blp@ovn.org> wrote: > > On Thu, May 25, 2017 at 01:47:32PM -0700, Joe Stringer wrote: > >> On 21 May 2017 at 21:55, Yunjian Wang <wangyunjian@huawei.com> wrote: > >> > The memory leak was triggered each time on calling netflow_unref() with > >> > containing netflow_flows. And flows need to be removed and destroyed. > >> > > >> > Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> > >> > --- > >> > >> Does this also need to netflow_expire__() the flows? Would it make > >> sense for this flow clear path to share code with > >> netflow_flow_clear()? > > > > That would certainly make sense. However, it adds a lot of extra code > > to run at the time we're freeing something, which is always a little > > risky. Since Greg is having trouble testing this at all, I lean toward > > just freeing without trying to send out NetFlow expirations. > > This seems fine to me. It looks like the biggest issue is the bug > that's fixed by the patch, which seems correct to me so I'd be happy > if this was applied as-is. I think that's what I'm going to do.
On Mon, May 22, 2017 at 12:55:00PM +0800, Yunjian Wang wrote: > The memory leak was triggered each time on calling netflow_unref() with > containing netflow_flows. And flows need to be removed and destroyed. > > Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Thanks, applied to master, branch-2.7, and branch-2.6.
diff --git a/ofproto/netflow.c b/ofproto/netflow.c index 55f7814..29c5f3e 100644 --- a/ofproto/netflow.c +++ b/ofproto/netflow.c @@ -409,10 +409,17 @@ netflow_ref(const struct netflow *nf_) void netflow_unref(struct netflow *nf) { + struct netflow_flow *nf_flow, *next; + if (nf && ovs_refcount_unref_relaxed(&nf->ref_cnt) == 1) { atomic_count_dec(&netflow_count); collectors_destroy(nf->collectors); ofpbuf_uninit(&nf->packet); + HMAP_FOR_EACH_SAFE (nf_flow, next, hmap_node, &nf->flows) { + hmap_remove(&nf->flows, &nf_flow->hmap_node); + free(nf_flow); + } + hmap_destroy(&nf->flows); free(nf); } }
The memory leak was triggered each time on calling netflow_unref() with containing netflow_flows. And flows need to be removed and destroyed. Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> --- ofproto/netflow.c | 7 +++++++ 1 file changed, 7 insertions(+)