diff mbox

Kernel memory leak in bnx2x driver with vxlan tunnel

Message ID 1453334825.1223.332.camel@edumazet-glaptop2.roam.corp.google.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Eric Dumazet Jan. 21, 2016, 12:07 a.m. UTC
On Wed, 2016-01-20 at 16:43 -0700, John wrote:
> 
> On 01/19/2016 06:31 PM, Thomas Graf wrote:
> > On 01/19/16 at 04:51pm, Jesse Gross wrote:
> >> On Tue, Jan 19, 2016 at 4:17 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >>> So what is the purpose of having a dst if we need to drop it ?
> >>>
> >>> Adding code in GRO would be fine if someone explains me the purpose of
> >>> doing apparently useless work.
> >>>
> >>> (refcounting on dst is not exactly free)
> >> In the GRO case, the dst is only dropped on the packets which have
> >> been merged and therefore need to be freed (the GRO_MERGED_FREE case).
> >> It's not being thrown away for the overall frame, just metadata that
> >> has been duplicated on each individual frame, similar to the metadata
> >> in struct sk_buff itself. And while it is not used by the IP stack
> >> there are other consumers (eBPF/OVS/etc.). This entire process is
> >> controlled by the COLLECT_METADATA flag on tunnels, so there is no
> >> cost in situations where it is not actually used.
> > Right. There were thoughts around leveraging a per CPU scratch
> > buffer without a refcount and turn it into a full reference when
> > the packet gets enqueued somewhere but the need hasn't really come
> > up yet.
> >
> > Jesse, is this what you have in mind:
> >
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index cc9e365..3a5e96d 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -4548,9 +4548,10 @@ static gro_result_t napi_skb_finish(gro_result_t ret, struct sk_buff *skb)
> >                  break;
> >   
> >          case GRO_MERGED_FREE:
> > -               if (NAPI_GRO_CB(skb)->free == NAPI_GRO_FREE_STOLEN_HEAD)
> > +               if (NAPI_GRO_CB(skb)->free == NAPI_GRO_FREE_STOLEN_HEAD) {
> > +                       skb_release_head_state(skb);
> >                          kmem_cache_free(skbuff_head_cache, skb);
> > -               else
> > +               } else
> >                          __kfree_skb(skb);
> >                  break;
> So I've tested the below patch (same as one above with minor 
> modifications made to make it compile) and it worked - no memory leak. 
> Should I submit this or...?

Unfortunately fix is not complete.

As someone mentioned, GRO should not aggregate packets having different
dst.

This part is hard to achieve, as a pointer comparison wont be enough :
Each skb has its own meta dst allocation.

Quite frankly, I would rather disable GRO for packets with a dst,
instead of making GRO dog slow.

Comments

Jesse Gross Jan. 21, 2016, 12:19 a.m. UTC | #1
On Wed, Jan 20, 2016 at 4:07 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2016-01-20 at 16:43 -0700, John wrote:
>>
>> On 01/19/2016 06:31 PM, Thomas Graf wrote:
>> > On 01/19/16 at 04:51pm, Jesse Gross wrote:
>> >> On Tue, Jan 19, 2016 at 4:17 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> >>> So what is the purpose of having a dst if we need to drop it ?
>> >>>
>> >>> Adding code in GRO would be fine if someone explains me the purpose of
>> >>> doing apparently useless work.
>> >>>
>> >>> (refcounting on dst is not exactly free)
>> >> In the GRO case, the dst is only dropped on the packets which have
>> >> been merged and therefore need to be freed (the GRO_MERGED_FREE case).
>> >> It's not being thrown away for the overall frame, just metadata that
>> >> has been duplicated on each individual frame, similar to the metadata
>> >> in struct sk_buff itself. And while it is not used by the IP stack
>> >> there are other consumers (eBPF/OVS/etc.). This entire process is
>> >> controlled by the COLLECT_METADATA flag on tunnels, so there is no
>> >> cost in situations where it is not actually used.
>> > Right. There were thoughts around leveraging a per CPU scratch
>> > buffer without a refcount and turn it into a full reference when
>> > the packet gets enqueued somewhere but the need hasn't really come
>> > up yet.
>> >
>> > Jesse, is this what you have in mind:
>> >
>> > diff --git a/net/core/dev.c b/net/core/dev.c
>> > index cc9e365..3a5e96d 100644
>> > --- a/net/core/dev.c
>> > +++ b/net/core/dev.c
>> > @@ -4548,9 +4548,10 @@ static gro_result_t napi_skb_finish(gro_result_t ret, struct sk_buff *skb)
>> >                  break;
>> >
>> >          case GRO_MERGED_FREE:
>> > -               if (NAPI_GRO_CB(skb)->free == NAPI_GRO_FREE_STOLEN_HEAD)
>> > +               if (NAPI_GRO_CB(skb)->free == NAPI_GRO_FREE_STOLEN_HEAD) {
>> > +                       skb_release_head_state(skb);
>> >                          kmem_cache_free(skbuff_head_cache, skb);
>> > -               else
>> > +               } else
>> >                          __kfree_skb(skb);
>> >                  break;
>> So I've tested the below patch (same as one above with minor
>> modifications made to make it compile) and it worked - no memory leak.
>> Should I submit this or...?
>
> Unfortunately fix is not complete.
>
> As someone mentioned, GRO should not aggregate packets having different
> dst.
>
> This part is hard to achieve, as a pointer comparison wont be enough :
> Each skb has its own meta dst allocation.
>
> Quite frankly, I would rather disable GRO for packets with a dst,
> instead of making GRO dog slow.

I have a patch that implements the comparison between dsts. For
packets without a dst, there isn't really a cost and if we do have a
dst then GRO is still a benefit. So it seems like it is worth doing,
even if it is more expensive than is ideal.
Eric Dumazet Jan. 21, 2016, 12:34 a.m. UTC | #2
On Wed, 2016-01-20 at 16:19 -0800, Jesse Gross wrote:

> I have a patch that implements the comparison between dsts. For
> packets without a dst, there isn't really a cost and if we do have a
> dst then GRO is still a benefit. So it seems like it is worth doing,
> even if it is more expensive than is ideal.

You guys really want to kill GRO performance.

Really the aggregation should happen at the first layer (ethernet
device), instead of doing it after tunnel decap.

This is already done for GRE, IPIP, SIT, ...

GRO having to access metadata looks wrong, if you think about trying to
do the same function in hardware (offload)
Thomas Graf Jan. 21, 2016, 2:27 a.m. UTC | #3
On 01/20/16 at 04:34pm, Eric Dumazet wrote:
> On Wed, 2016-01-20 at 16:19 -0800, Jesse Gross wrote:
> 
> > I have a patch that implements the comparison between dsts. For
> > packets without a dst, there isn't really a cost and if we do have a
> > dst then GRO is still a benefit. So it seems like it is worth doing,
> > even if it is more expensive than is ideal.
> 
> You guys really want to kill GRO performance.
> 
> Really the aggregation should happen at the first layer (ethernet
> device), instead of doing it after tunnel decap.
> 
> This is already done for GRE, IPIP, SIT, ...
> 
> GRO having to access metadata looks wrong, if you think about trying to
> do the same function in hardware (offload)

If I understand Jesse correctly then the added cost is only for
metadata enabled packets. Though I agree, what's the benefit of
doing GRO after decap?

It seems like it's way too late and we've already paid the cost
by going through the stack for each outer header packet.
diff mbox

Patch

diff --git a/include/net/gro_cells.h b/include/net/gro_cells.h
index cf6c74550baa..124b8a5537e3 100644
--- a/include/net/gro_cells.h
+++ b/include/net/gro_cells.h
@@ -19,7 +19,10 @@  static inline void gro_cells_receive(struct gro_cells *gcells, struct sk_buff *s
 	struct gro_cell *cell;
 	struct net_device *dev = skb->dev;
 
-	if (!gcells->cells || skb_cloned(skb) || !(dev->features & NETIF_F_GRO)) {
+	if (!gcells->cells ||
+	    skb->_skb_refdst ||
+	    skb_cloned(skb) ||
+	    !(dev->features & NETIF_F_GRO)) {
 		netif_rx(skb);
 		return;
 	}