diff mbox

[ovs-dev,v2] netflow: Fix memory leak in netflow_unref.

Message ID 1495428900-11436-1-git-send-email-wangyunjian@huawei.com
State Accepted
Headers show

Commit Message

wangyunjian May 22, 2017, 4:55 a.m. UTC
The memory leak was triggered each time on calling netflow_unref() with
containing netflow_flows. And flows need to be removed and destroyed.

Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
---
 ofproto/netflow.c | 7 +++++++
 1 file changed, 7 insertions(+)

Comments

Gregory Rose May 22, 2017, 2:40 p.m. UTC | #1
On Mon, 2017-05-22 at 12:55 +0800, Yunjian Wang wrote:
> The memory leak was triggered each time on calling netflow_unref() with
> containing netflow_flows. And flows need to be removed and destroyed.
> 
> Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
> ---
>  ofproto/netflow.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/ofproto/netflow.c b/ofproto/netflow.c
> index 55f7814..29c5f3e 100644
> --- a/ofproto/netflow.c
> +++ b/ofproto/netflow.c
> @@ -409,10 +409,17 @@ netflow_ref(const struct netflow *nf_)
>  void
>  netflow_unref(struct netflow *nf)
>  {
> +    struct netflow_flow *nf_flow, *next;
> +
>      if (nf && ovs_refcount_unref_relaxed(&nf->ref_cnt) == 1) {
>          atomic_count_dec(&netflow_count);
>          collectors_destroy(nf->collectors);
>          ofpbuf_uninit(&nf->packet);
> +        HMAP_FOR_EACH_SAFE (nf_flow, next, hmap_node, &nf->flows) {
> +            hmap_remove(&nf->flows, &nf_flow->hmap_node);
> +            free(nf_flow);
> +        }
> +        hmap_destroy(&nf->flows);
>          free(nf);
>      }
>  }

This looks right to me.  The only other place I see the flows freed is
when they're detected as idle.  If the flow is never detected as idle
then I don't see anywhere else that they are freed up after the xzalloc
in netflow_flow_update().

Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Gregory Rose May 25, 2017, 4:53 p.m. UTC | #2
On Mon, 2017-05-22 at 07:40 -0700, Greg Rose wrote:
> On Mon, 2017-05-22 at 12:55 +0800, Yunjian Wang wrote:
> > The memory leak was triggered each time on calling netflow_unref() with
> > containing netflow_flows. And flows need to be removed and destroyed.
> > 
> > Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
> > ---
> >  ofproto/netflow.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/ofproto/netflow.c b/ofproto/netflow.c
> > index 55f7814..29c5f3e 100644
> > --- a/ofproto/netflow.c
> > +++ b/ofproto/netflow.c
> > @@ -409,10 +409,17 @@ netflow_ref(const struct netflow *nf_)
> >  void
> >  netflow_unref(struct netflow *nf)
> >  {
> > +    struct netflow_flow *nf_flow, *next;
> > +
> >      if (nf && ovs_refcount_unref_relaxed(&nf->ref_cnt) == 1) {
> >          atomic_count_dec(&netflow_count);
> >          collectors_destroy(nf->collectors);
> >          ofpbuf_uninit(&nf->packet);
> > +        HMAP_FOR_EACH_SAFE (nf_flow, next, hmap_node, &nf->flows) {
> > +            hmap_remove(&nf->flows, &nf_flow->hmap_node);
> > +            free(nf_flow);
> > +        }
> > +        hmap_destroy(&nf->flows);
> >          free(nf);
> >      }
> >  }
> 
> This looks right to me.  The only other place I see the flows freed is
> when they're detected as idle.  If the flow is never detected as idle
> then I don't see anywhere else that they are freed up after the xzalloc
> in netflow_flow_update().
> 
> Reviewed-by: Greg Rose <gvrose8192@gmail.com>
> 

I'm trying to test this but the condition never seems to be met and thus
the 4 lines of additional code free flows never executes.  Is there some
particular flow or type of network traffic that will execute this code?

I'd like to add a Tested-by for this but unless I can get the code to
execute I can't.

- Greg
Joe Stringer May 25, 2017, 8:47 p.m. UTC | #3
On 21 May 2017 at 21:55, Yunjian Wang <wangyunjian@huawei.com> wrote:
> The memory leak was triggered each time on calling netflow_unref() with
> containing netflow_flows. And flows need to be removed and destroyed.
>
> Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
> ---

Does this also need to netflow_expire__() the flows? Would it make
sense for this flow clear path to share code with
netflow_flow_clear()?
Gregory Rose May 26, 2017, 4:05 a.m. UTC | #4
On Thu, 2017-05-25 at 09:53 -0700, Greg Rose wrote:
> On Mon, 2017-05-22 at 07:40 -0700, Greg Rose wrote:
> > On Mon, 2017-05-22 at 12:55 +0800, Yunjian Wang wrote:
> > > The memory leak was triggered each time on calling netflow_unref() with
> > > containing netflow_flows. And flows need to be removed and destroyed.
> > > 
> > > Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
> > > ---
> > >  ofproto/netflow.c | 7 +++++++
> > >  1 file changed, 7 insertions(+)
> > > 
> > > diff --git a/ofproto/netflow.c b/ofproto/netflow.c
> > > index 55f7814..29c5f3e 100644
> > > --- a/ofproto/netflow.c
> > > +++ b/ofproto/netflow.c
> > > @@ -409,10 +409,17 @@ netflow_ref(const struct netflow *nf_)
> > >  void
> > >  netflow_unref(struct netflow *nf)
> > >  {
> > > +    struct netflow_flow *nf_flow, *next;
> > > +
> > >      if (nf && ovs_refcount_unref_relaxed(&nf->ref_cnt) == 1) {
> > >          atomic_count_dec(&netflow_count);
> > >          collectors_destroy(nf->collectors);
> > >          ofpbuf_uninit(&nf->packet);
> > > +        HMAP_FOR_EACH_SAFE (nf_flow, next, hmap_node, &nf->flows) {
> > > +            hmap_remove(&nf->flows, &nf_flow->hmap_node);
> > > +            free(nf_flow);
> > > +        }
> > > +        hmap_destroy(&nf->flows);
> > >          free(nf);
> > >      }
> > >  }
> > 
> > This looks right to me.  The only other place I see the flows freed is
> > when they're detected as idle.  If the flow is never detected as idle
> > then I don't see anywhere else that they are freed up after the xzalloc
> > in netflow_flow_update().
> > 
> > Reviewed-by: Greg Rose <gvrose8192@gmail.com>
> > 
> 
> I'm trying to test this but the condition never seems to be met and thus
> the 4 lines of additional code free flows never executes.  Is there some
> particular flow or type of network traffic that will execute this code?
> 
> I'd like to add a Tested-by for this but unless I can get the code to
> execute I can't.
> 
> - Greg

OK, I've finally got my netflow configuration working right with ntopng
collecting the flows.  I can test the patch tomorrow, it's getting late
for me now.

Thanks,

- Greg
Gregory Rose May 26, 2017, 3:50 p.m. UTC | #5
On Thu, 2017-05-25 at 21:05 -0700, Greg Rose wrote:
> On Thu, 2017-05-25 at 09:53 -0700, Greg Rose wrote:
> > On Mon, 2017-05-22 at 07:40 -0700, Greg Rose wrote:
> > > On Mon, 2017-05-22 at 12:55 +0800, Yunjian Wang wrote:
> > > > The memory leak was triggered each time on calling netflow_unref() with
> > > > containing netflow_flows. And flows need to be removed and destroyed.
> > > > 
> > > > Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
> > > > ---
> > > >  ofproto/netflow.c | 7 +++++++
> > > >  1 file changed, 7 insertions(+)
> > > > 
> > > > diff --git a/ofproto/netflow.c b/ofproto/netflow.c
> > > > index 55f7814..29c5f3e 100644
> > > > --- a/ofproto/netflow.c
> > > > +++ b/ofproto/netflow.c
> > > > @@ -409,10 +409,17 @@ netflow_ref(const struct netflow *nf_)
> > > >  void
> > > >  netflow_unref(struct netflow *nf)
> > > >  {
> > > > +    struct netflow_flow *nf_flow, *next;
> > > > +
> > > >      if (nf && ovs_refcount_unref_relaxed(&nf->ref_cnt) == 1) {
> > > >          atomic_count_dec(&netflow_count);
> > > >          collectors_destroy(nf->collectors);
> > > >          ofpbuf_uninit(&nf->packet);
> > > > +        HMAP_FOR_EACH_SAFE (nf_flow, next, hmap_node, &nf->flows) {
> > > > +            hmap_remove(&nf->flows, &nf_flow->hmap_node);
> > > > +            free(nf_flow);
> > > > +        }
> > > > +        hmap_destroy(&nf->flows);
> > > >          free(nf);
> > > >      }
> > > >  }
> > > 
> > > This looks right to me.  The only other place I see the flows freed is
> > > when they're detected as idle.  If the flow is never detected as idle
> > > then I don't see anywhere else that they are freed up after the xzalloc
> > > in netflow_flow_update().
> > > 
> > > Reviewed-by: Greg Rose <gvrose8192@gmail.com>
> > > 
> > 
> > I'm trying to test this but the condition never seems to be met and thus
> > the 4 lines of additional code free flows never executes.  Is there some
> > particular flow or type of network traffic that will execute this code?
> > 
> > I'd like to add a Tested-by for this but unless I can get the code to
> > execute I can't.
> > 
> > - Greg
> 
> OK, I've finally got my netflow configuration working right with ntopng
> collecting the flows.  I can test the patch tomorrow, it's getting late
> for me now.
> 
> Thanks,
> 
> - Greg

Actually, ntopng doesn't work as the flow collector itself (I spoke too
soon) but I am successfully creating netflows which I can see with
tcpdump. Then we are hitting the condition net netflow_unref() when I
execute this command:

ovs-vsctl clear Bridge int-br1 netflow

However, the '*' marked lines of code are never executed when I place a
gdb breakpoint on it:

        HMAP_FOR_EACH_SAFE (nf_flow, next, hmap_node, &nf->flows) {
*            hmap_remove(&nf->flows, &nf_flow->hmap_node);
*            free(nf_flow);
        }

It seems the code is never reached.  Unless we can see some example of
the code being reached and executed I worry about this patch.

Thanks,

- Greg
wangyunjian May 27, 2017, 1:43 a.m. UTC | #6
> -----Original Message-----
> From: ovs-dev-bounces@openvswitch.org [mailto:ovs-dev-
> bounces@openvswitch.org] On Behalf Of Greg Rose
> Sent: Friday, May 26, 2017 11:51 PM
> To: ovs-dev@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH v2] netflow: Fix memory leak in
> netflow_unref.
> 
> On Thu, 2017-05-25 at 21:05 -0700, Greg Rose wrote:
> > On Thu, 2017-05-25 at 09:53 -0700, Greg Rose wrote:
> > > On Mon, 2017-05-22 at 07:40 -0700, Greg Rose wrote:
> > > > On Mon, 2017-05-22 at 12:55 +0800, Yunjian Wang wrote:
> > > > > The memory leak was triggered each time on calling
> > > > > netflow_unref() with containing netflow_flows. And flows need to be
> removed and destroyed.
> > > > >
> > > > > Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
> > > > > ---
> > > > >  ofproto/netflow.c | 7 +++++++
> > > > >  1 file changed, 7 insertions(+)
> > > > >
> > > > > diff --git a/ofproto/netflow.c b/ofproto/netflow.c index
> > > > > 55f7814..29c5f3e 100644
> > > > > --- a/ofproto/netflow.c
> > > > > +++ b/ofproto/netflow.c
> > > > > @@ -409,10 +409,17 @@ netflow_ref(const struct netflow *nf_)
> > > > > void  netflow_unref(struct netflow *nf)  {
> > > > > +    struct netflow_flow *nf_flow, *next;
> > > > > +
> > > > >      if (nf && ovs_refcount_unref_relaxed(&nf->ref_cnt) == 1) {
> > > > >          atomic_count_dec(&netflow_count);
> > > > >          collectors_destroy(nf->collectors);
> > > > >          ofpbuf_uninit(&nf->packet);
> > > > > +        HMAP_FOR_EACH_SAFE (nf_flow, next, hmap_node, &nf-
> >flows) {
> > > > > +            hmap_remove(&nf->flows, &nf_flow->hmap_node);
> > > > > +            free(nf_flow);
> > > > > +        }
> > > > > +        hmap_destroy(&nf->flows);
> > > > >          free(nf);
> > > > >      }
> > > > >  }
> > > >
> > > > This looks right to me.  The only other place I see the flows
> > > > freed is when they're detected as idle.  If the flow is never
> > > > detected as idle then I don't see anywhere else that they are
> > > > freed up after the xzalloc in netflow_flow_update().
> > > >
> > > > Reviewed-by: Greg Rose <gvrose8192@gmail.com>
> > > >
> > >
> > > I'm trying to test this but the condition never seems to be met and
> > > thus the 4 lines of additional code free flows never executes.  Is
> > > there some particular flow or type of network traffic that will execute this
> code?
> > >
> > > I'd like to add a Tested-by for this but unless I can get the code
> > > to execute I can't.
> > >
> > > - Greg
> >
> > OK, I've finally got my netflow configuration working right with
> > ntopng collecting the flows.  I can test the patch tomorrow, it's
> > getting late for me now.
> >
> > Thanks,
> >
> > - Greg
> 
> Actually, ntopng doesn't work as the flow collector itself (I spoke too
> soon) but I am successfully creating netflows which I can see with tcpdump.
> Then we are hitting the condition net netflow_unref() when I execute this
> command:
> 
> ovs-vsctl clear Bridge int-br1 netflow
> 
> However, the '*' marked lines of code are never executed when I place a gdb
> breakpoint on it:
> 
>         HMAP_FOR_EACH_SAFE (nf_flow, next, hmap_node, &nf->flows) {
> *            hmap_remove(&nf->flows, &nf_flow->hmap_node);
> *            free(nf_flow);
>         }
> 
> It seems the code is never reached.  Unless we can see some example of the
> code being reached and executed I worry about this patch.
> 
> Thanks,
> 
> - Greg

It is probable that the need to test many times.

Thanks, 

Yunjian

my ovs version: openvswitch-2.7.0

Test Script:
ovs-vsctl add-br ovs
ovs-vsctl add-port ovs eth1

for (( i=0; i<5000000; i=i+1 )); do
	ovs-vsctl -- --id=@netflow create netflow targets=\"3.3.2.26:9996\" active_timeout=30 -- set bridge ovs netflow=@netflow
	sleep 3
	ovs-vsctl clear bridge ovs netflow
	sleep 1
done

Test results:
(gdb) b ofproto/netflow.c:419
Breakpoint 1 at 0x4aeb74: file ofproto/netflow.c, line 419.
(gdb) c
Continuing.
[Switching to Thread 0x7f09ed273700 (LWP 19046)]

Breakpoint 1, netflow_unref (nf=0x408dbd0) at ofproto/netflow.c:419
419	            hmap_remove(&nf->flows, &nf_flow->hmap_node);
(gdb) bt
#0  netflow_unref (nf=0x408dbd0) at ofproto/netflow.c:419
#1  0x00000000004a4347 in xlate_cache_clear_netflow (netflow=0x408dbd0, flow=0x7f09dc011a90) at ofproto/ofproto-dpif-xlate-cache.c:200
#2  0x00000000004a43e2 in xlate_cache_clear_entry (entry=0x7f09dc010d08) at ofproto/ofproto-dpif-xlate-cache.c:221
#3  0x00000000004a44dc in xlate_cache_clear (xcache=0x7f09dc01ca90) at ofproto/ofproto-dpif-xlate-cache.c:262
#4  0x00000000004a451e in xlate_cache_uninit (xcache=0x7f09dc01ca90) at ofproto/ofproto-dpif-xlate-cache.c:271
#5  0x00000000004a4544 in xlate_cache_delete (xcache=0x7f09dc01ca90) at ofproto/ofproto-dpif-xlate-cache.c:278
#6  0x00000000004911d9 in ukey_delete__ (ukey=0x40987d0) at ofproto/ofproto-dpif-upcall.c:1801
#7  0x000000000056c9f5 in ovsrcu_call_postponed () at lib/ovs-rcu.c:323
#8  0x000000000056ca9b in ovsrcu_postpone_thread (arg=0x0) at lib/ovs-rcu.c:338
#9  0x000000000056efa0 in ovsthread_wrapper (aux_=0x3fa3c30) at lib/ovs-thread.c:348
#10 0x00007f09f1e44dc5 in start_thread () from /usr/lib64/libpthread.so.0
#11 0x00007f09f0c0a71d in clone () from /usr/lib64/libc.so.6
(gdb) l
414	    if (nf && ovs_refcount_unref_relaxed(&nf->ref_cnt) == 1) {
415	        atomic_count_dec(&netflow_count);
416	        collectors_destroy(nf->collectors);
417	        ofpbuf_uninit(&nf->packet);
418                     HMAP_FOR_EACH_SAFE (nf_flow, next, hmap_node, &nf->flows) {
419	            hmap_remove(&nf->flows, &nf_flow->hmap_node);
420	            free(nf_flow);
421	        }
422	        free(nf);
423	    }
(gdb) info b
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x00000000004aeb74 in netflow_unref at ofproto/netflow.c:419
	breakpoint already hit 1 time
(gdb) p nf->flows 
$1 = {buckets = 0x408dc50, one = 0x3fd1fd0, mask = 0, n = 1}
(gdb)

> 
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Gregory Rose May 27, 2017, 6:26 p.m. UTC | #7
On Sat, 2017-05-27 at 01:43 +0000, wangyunjian wrote:
> 
> > -----Original Message-----
> > From: ovs-dev-bounces@openvswitch.org [mailto:ovs-dev-
> > bounces@openvswitch.org] On Behalf Of Greg Rose
> > Sent: Friday, May 26, 2017 11:51 PM
> > To: ovs-dev@openvswitch.org
> > Subject: Re: [ovs-dev] [PATCH v2] netflow: Fix memory leak in
> > netflow_unref.
> > 
> > On Thu, 2017-05-25 at 21:05 -0700, Greg Rose wrote:
> > > On Thu, 2017-05-25 at 09:53 -0700, Greg Rose wrote:
> > > > On Mon, 2017-05-22 at 07:40 -0700, Greg Rose wrote:
> > > > > On Mon, 2017-05-22 at 12:55 +0800, Yunjian Wang wrote:
> > > > > > The memory leak was triggered each time on calling
> > > > > > netflow_unref() with containing netflow_flows. And flows need to be
> > removed and destroyed.
> > > > > >
> > > > > > Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
> > > > > > ---
> > > > > >  ofproto/netflow.c | 7 +++++++
> > > > > >  1 file changed, 7 insertions(+)
> > > > > >
> > > > > > diff --git a/ofproto/netflow.c b/ofproto/netflow.c index
> > > > > > 55f7814..29c5f3e 100644
> > > > > > --- a/ofproto/netflow.c
> > > > > > +++ b/ofproto/netflow.c
> > > > > > @@ -409,10 +409,17 @@ netflow_ref(const struct netflow *nf_)
> > > > > > void  netflow_unref(struct netflow *nf)  {
> > > > > > +    struct netflow_flow *nf_flow, *next;
> > > > > > +
> > > > > >      if (nf && ovs_refcount_unref_relaxed(&nf->ref_cnt) == 1) {
> > > > > >          atomic_count_dec(&netflow_count);
> > > > > >          collectors_destroy(nf->collectors);
> > > > > >          ofpbuf_uninit(&nf->packet);
> > > > > > +        HMAP_FOR_EACH_SAFE (nf_flow, next, hmap_node, &nf-
> > >flows) {
> > > > > > +            hmap_remove(&nf->flows, &nf_flow->hmap_node);
> > > > > > +            free(nf_flow);
> > > > > > +        }
> > > > > > +        hmap_destroy(&nf->flows);
> > > > > >          free(nf);
> > > > > >      }
> > > > > >  }
> > > > >
> > > > > This looks right to me.  The only other place I see the flows
> > > > > freed is when they're detected as idle.  If the flow is never
> > > > > detected as idle then I don't see anywhere else that they are
> > > > > freed up after the xzalloc in netflow_flow_update().
> > > > >
> > > > > Reviewed-by: Greg Rose <gvrose8192@gmail.com>
> > > > >
> > > >
> > > > I'm trying to test this but the condition never seems to be met and
> > > > thus the 4 lines of additional code free flows never executes.  Is
> > > > there some particular flow or type of network traffic that will execute this
> > code?
> > > >
> > > > I'd like to add a Tested-by for this but unless I can get the code
> > > > to execute I can't.
> > > >
> > > > - Greg
> > >
> > > OK, I've finally got my netflow configuration working right with
> > > ntopng collecting the flows.  I can test the patch tomorrow, it's
> > > getting late for me now.
> > >
> > > Thanks,
> > >
> > > - Greg
> > 
> > Actually, ntopng doesn't work as the flow collector itself (I spoke too
> > soon) but I am successfully creating netflows which I can see with tcpdump.
> > Then we are hitting the condition net netflow_unref() when I execute this
> > command:
> > 
> > ovs-vsctl clear Bridge int-br1 netflow
> > 
> > However, the '*' marked lines of code are never executed when I place a gdb
> > breakpoint on it:
> > 
> >         HMAP_FOR_EACH_SAFE (nf_flow, next, hmap_node, &nf->flows) {
> > *            hmap_remove(&nf->flows, &nf_flow->hmap_node);
> > *            free(nf_flow);
> >         }
> > 
> > It seems the code is never reached.  Unless we can see some example of the
> > code being reached and executed I worry about this patch.
> > 
> > Thanks,
> > 
> > - Greg
> 
> It is probable that the need to test many times.
> 
> Thanks, 
> 
> Yunjian
> 
> my ovs version: openvswitch-2.7.0
> 
> Test Script:
> ovs-vsctl add-br ovs
> ovs-vsctl add-port ovs eth1
> 
> for (( i=0; i<5000000; i=i+1 )); do
> 	ovs-vsctl -- --id=@netflow create netflow targets=\"3.3.2.26:9996\" active_timeout=30 -- set bridge ovs netflow=@netflow
> 	sleep 3
> 	ovs-vsctl clear bridge ovs netflow
> 	sleep 1
> done

So we have to beat it up a little.  OK, I'll give it a try.

Thanks,

- Greg

> 
> Test results:
> (gdb) b ofproto/netflow.c:419
> Breakpoint 1 at 0x4aeb74: file ofproto/netflow.c, line 419.
> (gdb) c
> Continuing.
> [Switching to Thread 0x7f09ed273700 (LWP 19046)]
> 
> Breakpoint 1, netflow_unref (nf=0x408dbd0) at ofproto/netflow.c:419
> 419	            hmap_remove(&nf->flows, &nf_flow->hmap_node);
> (gdb) bt
> #0  netflow_unref (nf=0x408dbd0) at ofproto/netflow.c:419
> #1  0x00000000004a4347 in xlate_cache_clear_netflow (netflow=0x408dbd0, flow=0x7f09dc011a90) at ofproto/ofproto-dpif-xlate-cache.c:200
> #2  0x00000000004a43e2 in xlate_cache_clear_entry (entry=0x7f09dc010d08) at ofproto/ofproto-dpif-xlate-cache.c:221
> #3  0x00000000004a44dc in xlate_cache_clear (xcache=0x7f09dc01ca90) at ofproto/ofproto-dpif-xlate-cache.c:262
> #4  0x00000000004a451e in xlate_cache_uninit (xcache=0x7f09dc01ca90) at ofproto/ofproto-dpif-xlate-cache.c:271
> #5  0x00000000004a4544 in xlate_cache_delete (xcache=0x7f09dc01ca90) at ofproto/ofproto-dpif-xlate-cache.c:278
> #6  0x00000000004911d9 in ukey_delete__ (ukey=0x40987d0) at ofproto/ofproto-dpif-upcall.c:1801
> #7  0x000000000056c9f5 in ovsrcu_call_postponed () at lib/ovs-rcu.c:323
> #8  0x000000000056ca9b in ovsrcu_postpone_thread (arg=0x0) at lib/ovs-rcu.c:338
> #9  0x000000000056efa0 in ovsthread_wrapper (aux_=0x3fa3c30) at lib/ovs-thread.c:348
> #10 0x00007f09f1e44dc5 in start_thread () from /usr/lib64/libpthread.so.0
> #11 0x00007f09f0c0a71d in clone () from /usr/lib64/libc.so.6
> (gdb) l
> 414	    if (nf && ovs_refcount_unref_relaxed(&nf->ref_cnt) == 1) {
> 415	        atomic_count_dec(&netflow_count);
> 416	        collectors_destroy(nf->collectors);
> 417	        ofpbuf_uninit(&nf->packet);
> 418                     HMAP_FOR_EACH_SAFE (nf_flow, next, hmap_node, &nf->flows) {
> 419	            hmap_remove(&nf->flows, &nf_flow->hmap_node);
> 420	            free(nf_flow);
> 421	        }
> 422	        free(nf);
> 423	    }
> (gdb) info b
> Num     Type           Disp Enb Address            What
> 1       breakpoint     keep y   0x00000000004aeb74 in netflow_unref at ofproto/netflow.c:419
> 	breakpoint already hit 1 time
> (gdb) p nf->flows 
> $1 = {buckets = 0x408dc50, one = 0x3fd1fd0, mask = 0, n = 1}
> (gdb)
> 
> > 
> > _______________________________________________
> > dev mailing list
> > dev@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Ben Pfaff May 30, 2017, 4:47 p.m. UTC | #8
On Thu, May 25, 2017 at 01:47:32PM -0700, Joe Stringer wrote:
> On 21 May 2017 at 21:55, Yunjian Wang <wangyunjian@huawei.com> wrote:
> > The memory leak was triggered each time on calling netflow_unref() with
> > containing netflow_flows. And flows need to be removed and destroyed.
> >
> > Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
> > ---
> 
> Does this also need to netflow_expire__() the flows? Would it make
> sense for this flow clear path to share code with
> netflow_flow_clear()?

That would certainly make sense.  However, it adds a lot of extra code
to run at the time we're freeing something, which is always a little
risky.  Since Greg is having trouble testing this at all, I lean toward
just freeing without trying to send out NetFlow expirations.
Gregory Rose May 30, 2017, 5:25 p.m. UTC | #9
On Tue, 2017-05-30 at 09:47 -0700, Ben Pfaff wrote:
> On Thu, May 25, 2017 at 01:47:32PM -0700, Joe Stringer wrote:
> > On 21 May 2017 at 21:55, Yunjian Wang <wangyunjian@huawei.com> wrote:
> > > The memory leak was triggered each time on calling netflow_unref() with
> > > containing netflow_flows. And flows need to be removed and destroyed.
> > >
> > > Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
> > > ---
> > 
> > Does this also need to netflow_expire__() the flows? Would it make
> > sense for this flow clear path to share code with
> > netflow_flow_clear()?
> 
> That would certainly make sense.  However, it adds a lot of extra code
> to run at the time we're freeing something, which is always a little
> risky.  Since Greg is having trouble testing this at all, I lean toward
> just freeing without trying to send out NetFlow expirations.

I've still had no luck on testing.  Over the weekend I ran a continuous
loop as suggested by Yunjian that constantly enables and disables
netflow and didn't get a break on that line of code.  However, I'm
looking at adjusting the background network traffic to something that
should create more flows and will try again today.

- Greg

> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Gregory Rose May 31, 2017, 3:12 p.m. UTC | #10
On Sat, 2017-05-27 at 01:43 +0000, wangyunjian wrote:
> 
> > -----Original Message-----
> > From: ovs-dev-bounces@openvswitch.org [mailto:ovs-dev-
> > bounces@openvswitch.org] On Behalf Of Greg Rose
> > Sent: Friday, May 26, 2017 11:51 PM
> > To: ovs-dev@openvswitch.org
> > Subject: Re: [ovs-dev] [PATCH v2] netflow: Fix memory leak in
> > netflow_unref.
> > 
> > On Thu, 2017-05-25 at 21:05 -0700, Greg Rose wrote:
> > > On Thu, 2017-05-25 at 09:53 -0700, Greg Rose wrote:
> > > > On Mon, 2017-05-22 at 07:40 -0700, Greg Rose wrote:
> > > > > On Mon, 2017-05-22 at 12:55 +0800, Yunjian Wang wrote:
> > > > > > The memory leak was triggered each time on calling
> > > > > > netflow_unref() with containing netflow_flows. And flows need to be
> > removed and destroyed.
> > > > > >
> > > > > > Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
> > > > > > ---
> > > > > >  ofproto/netflow.c | 7 +++++++
> > > > > >  1 file changed, 7 insertions(+)
> > > > > >
> > > > > > diff --git a/ofproto/netflow.c b/ofproto/netflow.c index
> > > > > > 55f7814..29c5f3e 100644
> > > > > > --- a/ofproto/netflow.c
> > > > > > +++ b/ofproto/netflow.c
> > > > > > @@ -409,10 +409,17 @@ netflow_ref(const struct netflow *nf_)
> > > > > > void  netflow_unref(struct netflow *nf)  {
> > > > > > +    struct netflow_flow *nf_flow, *next;
> > > > > > +
> > > > > >      if (nf && ovs_refcount_unref_relaxed(&nf->ref_cnt) == 1) {
> > > > > >          atomic_count_dec(&netflow_count);
> > > > > >          collectors_destroy(nf->collectors);
> > > > > >          ofpbuf_uninit(&nf->packet);
> > > > > > +        HMAP_FOR_EACH_SAFE (nf_flow, next, hmap_node, &nf-
> > >flows) {
> > > > > > +            hmap_remove(&nf->flows, &nf_flow->hmap_node);
> > > > > > +            free(nf_flow);
> > > > > > +        }
> > > > > > +        hmap_destroy(&nf->flows);
> > > > > >          free(nf);
> > > > > >      }
> > > > > >  }
> > > > >
> > > > > This looks right to me.  The only other place I see the flows
> > > > > freed is when they're detected as idle.  If the flow is never
> > > > > detected as idle then I don't see anywhere else that they are
> > > > > freed up after the xzalloc in netflow_flow_update().
> > > > >
> > > > > Reviewed-by: Greg Rose <gvrose8192@gmail.com>
> > > > >
> > > >
> > > > I'm trying to test this but the condition never seems to be met and
> > > > thus the 4 lines of additional code free flows never executes.  Is
> > > > there some particular flow or type of network traffic that will execute this
> > code?
> > > >
> > > > I'd like to add a Tested-by for this but unless I can get the code
> > > > to execute I can't.
> > > >
> > > > - Greg
> > >
> > > OK, I've finally got my netflow configuration working right with
> > > ntopng collecting the flows.  I can test the patch tomorrow, it's
> > > getting late for me now.
> > >
> > > Thanks,
> > >
> > > - Greg
> > 
> > Actually, ntopng doesn't work as the flow collector itself (I spoke too
> > soon) but I am successfully creating netflows which I can see with tcpdump.
> > Then we are hitting the condition net netflow_unref() when I execute this
> > command:
> > 
> > ovs-vsctl clear Bridge int-br1 netflow
> > 
> > However, the '*' marked lines of code are never executed when I place a gdb
> > breakpoint on it:
> > 
> >         HMAP_FOR_EACH_SAFE (nf_flow, next, hmap_node, &nf->flows) {
> > *            hmap_remove(&nf->flows, &nf_flow->hmap_node);
> > *            free(nf_flow);
> >         }
> > 
> > It seems the code is never reached.  Unless we can see some example of the
> > code being reached and executed I worry about this patch.
> > 
> > Thanks,
> > 
> > - Greg
> 
> It is probable that the need to test many times.
> 
> Thanks, 
> 
> Yunjian
> 
> my ovs version: openvswitch-2.7.0
> 
> Test Script:
> ovs-vsctl add-br ovs
> ovs-vsctl add-port ovs eth1
> 
> for (( i=0; i<5000000; i=i+1 )); do
> 	ovs-vsctl -- --id=@netflow create netflow targets=\"3.3.2.26:9996\" active_timeout=30 -- set bridge ovs netflow=@netflow
> 	sleep 3
> 	ovs-vsctl clear bridge ovs netflow
> 	sleep 1
> done
> 
> Test results:
> (gdb) b ofproto/netflow.c:419
> Breakpoint 1 at 0x4aeb74: file ofproto/netflow.c, line 419.
> (gdb) c
> Continuing.
> [Switching to Thread 0x7f09ed273700 (LWP 19046)]
> 
> Breakpoint 1, netflow_unref (nf=0x408dbd0) at ofproto/netflow.c:419
> 419	            hmap_remove(&nf->flows, &nf_flow->hmap_node);
> (gdb) bt
> #0  netflow_unref (nf=0x408dbd0) at ofproto/netflow.c:419
> #1  0x00000000004a4347 in xlate_cache_clear_netflow (netflow=0x408dbd0, flow=0x7f09dc011a90) at ofproto/ofproto-dpif-xlate-cache.c:200
> #2  0x00000000004a43e2 in xlate_cache_clear_entry (entry=0x7f09dc010d08) at ofproto/ofproto-dpif-xlate-cache.c:221
> #3  0x00000000004a44dc in xlate_cache_clear (xcache=0x7f09dc01ca90) at ofproto/ofproto-dpif-xlate-cache.c:262
> #4  0x00000000004a451e in xlate_cache_uninit (xcache=0x7f09dc01ca90) at ofproto/ofproto-dpif-xlate-cache.c:271
> #5  0x00000000004a4544 in xlate_cache_delete (xcache=0x7f09dc01ca90) at ofproto/ofproto-dpif-xlate-cache.c:278
> #6  0x00000000004911d9 in ukey_delete__ (ukey=0x40987d0) at ofproto/ofproto-dpif-upcall.c:1801
> #7  0x000000000056c9f5 in ovsrcu_call_postponed () at lib/ovs-rcu.c:323
> #8  0x000000000056ca9b in ovsrcu_postpone_thread (arg=0x0) at lib/ovs-rcu.c:338
> #9  0x000000000056efa0 in ovsthread_wrapper (aux_=0x3fa3c30) at lib/ovs-thread.c:348
> #10 0x00007f09f1e44dc5 in start_thread () from /usr/lib64/libpthread.so.0
> #11 0x00007f09f0c0a71d in clone () from /usr/lib64/libc.so.6
> (gdb) l
> 414	    if (nf && ovs_refcount_unref_relaxed(&nf->ref_cnt) == 1) {
> 415	        atomic_count_dec(&netflow_count);
> 416	        collectors_destroy(nf->collectors);
> 417	        ofpbuf_uninit(&nf->packet);
> 418                     HMAP_FOR_EACH_SAFE (nf_flow, next, hmap_node, &nf->flows) {
> 419	            hmap_remove(&nf->flows, &nf_flow->hmap_node);
> 420	            free(nf_flow);
> 421	        }
> 422	        free(nf);
> 423	    }
> (gdb) info b
> Num     Type           Disp Enb Address            What
> 1       breakpoint     keep y   0x00000000004aeb74 in netflow_unref at ofproto/netflow.c:419
> 	breakpoint already hit 1 time
> (gdb) p nf->flows 
> $1 = {buckets = 0x408dc50, one = 0x3fd1fd0, mask = 0, n = 1}
> (gdb)

I haven't been able to hit this breakpoint in my own setup after running
this test for several days.  It may be due to different types of traffic
or some other subtle difference.  I'm running iperf sessions with
multiple ports/ip address combinations to create a lot of netflows and
netflow updates but perhaps that is not enough to trigger the right
condition.

In any case, the code looks correct as per my review and Yunjian has a
reproducer so I'll leave it to the maintainers as to what action they
will take.  I know that there were some further comments from Joe and
Ben.

Thanks,

- Greg

> 
> > 
> > _______________________________________________
> > dev mailing list
> > dev@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Joe Stringer May 31, 2017, 6:16 p.m. UTC | #11
On 30 May 2017 at 09:47, Ben Pfaff <blp@ovn.org> wrote:
> On Thu, May 25, 2017 at 01:47:32PM -0700, Joe Stringer wrote:
>> On 21 May 2017 at 21:55, Yunjian Wang <wangyunjian@huawei.com> wrote:
>> > The memory leak was triggered each time on calling netflow_unref() with
>> > containing netflow_flows. And flows need to be removed and destroyed.
>> >
>> > Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
>> > ---
>>
>> Does this also need to netflow_expire__() the flows? Would it make
>> sense for this flow clear path to share code with
>> netflow_flow_clear()?
>
> That would certainly make sense.  However, it adds a lot of extra code
> to run at the time we're freeing something, which is always a little
> risky.  Since Greg is having trouble testing this at all, I lean toward
> just freeing without trying to send out NetFlow expirations.

This seems fine to me. It looks like the biggest issue is the bug
that's fixed by the patch, which seems correct to me so I'd be happy
if this was applied as-is.
Ben Pfaff May 31, 2017, 6:55 p.m. UTC | #12
On Wed, May 31, 2017 at 11:16:10AM -0700, Joe Stringer wrote:
> On 30 May 2017 at 09:47, Ben Pfaff <blp@ovn.org> wrote:
> > On Thu, May 25, 2017 at 01:47:32PM -0700, Joe Stringer wrote:
> >> On 21 May 2017 at 21:55, Yunjian Wang <wangyunjian@huawei.com> wrote:
> >> > The memory leak was triggered each time on calling netflow_unref() with
> >> > containing netflow_flows. And flows need to be removed and destroyed.
> >> >
> >> > Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
> >> > ---
> >>
> >> Does this also need to netflow_expire__() the flows? Would it make
> >> sense for this flow clear path to share code with
> >> netflow_flow_clear()?
> >
> > That would certainly make sense.  However, it adds a lot of extra code
> > to run at the time we're freeing something, which is always a little
> > risky.  Since Greg is having trouble testing this at all, I lean toward
> > just freeing without trying to send out NetFlow expirations.
> 
> This seems fine to me. It looks like the biggest issue is the bug
> that's fixed by the patch, which seems correct to me so I'd be happy
> if this was applied as-is.

I think that's what I'm going to do.
Ben Pfaff May 31, 2017, 7 p.m. UTC | #13
On Mon, May 22, 2017 at 12:55:00PM +0800, Yunjian Wang wrote:
> The memory leak was triggered each time on calling netflow_unref() with
> containing netflow_flows. And flows need to be removed and destroyed.
> 
> Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>

Thanks, applied to master, branch-2.7, and branch-2.6.
diff mbox

Patch

diff --git a/ofproto/netflow.c b/ofproto/netflow.c
index 55f7814..29c5f3e 100644
--- a/ofproto/netflow.c
+++ b/ofproto/netflow.c
@@ -409,10 +409,17 @@  netflow_ref(const struct netflow *nf_)
 void
 netflow_unref(struct netflow *nf)
 {
+    struct netflow_flow *nf_flow, *next;
+
     if (nf && ovs_refcount_unref_relaxed(&nf->ref_cnt) == 1) {
         atomic_count_dec(&netflow_count);
         collectors_destroy(nf->collectors);
         ofpbuf_uninit(&nf->packet);
+        HMAP_FOR_EACH_SAFE (nf_flow, next, hmap_node, &nf->flows) {
+            hmap_remove(&nf->flows, &nf_flow->hmap_node);
+            free(nf_flow);
+        }
+        hmap_destroy(&nf->flows);
         free(nf);
     }
 }