diff mbox

Fix guest memory leak and panic

Message ID 20111018080523.16861.55402.sendpatchset@krkumar2.in.ibm.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Krishna Kumar Oct. 18, 2011, 8:05 a.m. UTC
Commit 86ee8130 ("virtionet: convert to SKB paged frag API")
introduced a bug in guest. During RX testing, guest runs out
of memory within seconds, causing oom-killer; which then
panics the system: "Kernel panic - not syncing: Out of memory
and no killable processes...". /proc/meminfo just before the
panic shows MemFree is a few MB's:

	MemFree:         1928544 kB	(starts here)
	...
	...
	MemFree:           27488 kB
	MemFree:           26248 kB
	MemFree:           24636 kB
	MemFree:           22632 kB
	MemFree:           19580 kB
	MemFree:           17928 kB
	MemFree:           15548 kB
		(Panic)

The extra reference to the fragment pages causes those pages to
not get freed in skb_release_data(). The following patch fixes
the bug. I have not checked if any other converted driver has
the same issue.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
---
 drivers/net/virtio_net.c |   13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Ian Campbell Oct. 18, 2011, 8:36 a.m. UTC | #1
On Tue, 2011-10-18 at 09:05 +0100, Krishna Kumar wrote:
> The extra reference to the fragment pages causes those pages to
> not get freed in skb_release_data(). The following patch fixes
> the bug. I have not checked if any other converted driver has
> the same issue.

Damn. You are completely correct and I appear to have made this same
mistake several times. A quick look suggests that at least cxbg,
myriage, vmxnet, cassini and bnx2 may potentially have a similar
issue :-( (I stopped looking at that point, I'll obviously do a full
audit).

I considered quite carefully whether (__)skb_frag_set_page should take a
reference or not and decided yes but I'm starting to reconsider whether
I made the right choice. It seems that is just confusing and violates
the principal of least surprise to have a function called "set" take a
new reference. In reality all existing drivers expect that adding a page
to an SKB frag will just take over the existing reference.

I think the best thing might be to remove the additional ref taking from
the setter function and audit the previous changes to ensure they
conform. I'll do that right away and post a fixup patch ASAP.

> Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>

This change is correct as things stand today, so:

Acked-by: Ian Campbell <ian.campbell@citrix.com>

but perhaps it would be better to hold off and let me fix all of these
all at once.

Thanks for bringing this to my attention Krishna.

Ian.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Krishna Kumar Oct. 18, 2011, 10:28 a.m. UTC | #2
Ian Campbell <Ian.Campbell@citrix.com> wrote on 10/18/2011 02:06:56 PM:

Hi Ian,

> On Tue, 2011-10-18 at 09:05 +0100, Krishna Kumar wrote:
> > The extra reference to the fragment pages causes those pages to
> > not get freed in skb_release_data(). The following patch fixes
> > the bug. I have not checked if any other converted driver has
> > the same issue.
>
> Damn. You are completely correct and I appear to have made this same
> mistake several times. A quick look suggests that at least cxbg,
> myriage, vmxnet, cassini and bnx2 may potentially have a similar
> issue :-( (I stopped looking at that point, I'll obviously do a full
> audit).
>
> I considered quite carefully whether (__)skb_frag_set_page should take a
> reference or not and decided yes but I'm starting to reconsider whether
> I made the right choice. It seems that is just confusing and violates
> the principal of least surprise to have a function called "set" take a
> new reference. In reality all existing drivers expect that adding a page
> to an SKB frag will just take over the existing reference.
>
> I think the best thing might be to remove the additional ref taking from
> the setter function and audit the previous changes to ensure they
> conform. I'll do that right away and post a fixup patch ASAP.
>
> > Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
>
> This change is correct as things stand today, so:
>
> Acked-by: Ian Campbell <ian.campbell@citrix.com>
>
> but perhaps it would be better to hold off and let me fix all of these
> all at once.

Either way is fine with me. However, besides having a fix,
I would like to remove the manual initializations in
set_skb_frag(), and substitute with __skb_fill_page_desc,
like other drivers do.

thanks,

- KK

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Oct. 18, 2011, 7:12 p.m. UTC | #3
From: Krishna Kumar <krkumar2@in.ibm.com>
Date: Tue, 18 Oct 2011 13:35:23 +0530

> Commit 86ee8130 ("virtionet: convert to SKB paged frag API")
> introduced a bug in guest. During RX testing, guest runs out
> of memory within seconds, causing oom-killer; which then
> panics the system: "Kernel panic - not syncing: Out of memory
> and no killable processes...". /proc/meminfo just before the
> panic shows MemFree is a few MB's:
> 
> 	MemFree:         1928544 kB	(starts here)
> 	...
> 	...
> 	MemFree:           27488 kB
> 	MemFree:           26248 kB
> 	MemFree:           24636 kB
> 	MemFree:           22632 kB
> 	MemFree:           19580 kB
> 	MemFree:           17928 kB
> 	MemFree:           15548 kB
> 		(Panic)
> 
> The extra reference to the fragment pages causes those pages to
> not get freed in skb_release_data(). The following patch fixes
> the bug. I have not checked if any other converted driver has
> the same issue.
> 
> Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>

I'll wait for Ian's full audit, but Krishna please use more appropriate
subject lines in future patch submissions.

This patch is fixing a problem in the virtio_net driver, so please
mention that: "Subject: [PATCH] virtio_net: Fix guest memory leak and panic"
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ian Campbell Oct. 19, 2011, 6:12 a.m. UTC | #4
On Tue, 2011-10-18 at 15:12 -0400, David Miller wrote:
> I'll wait for Ian's full audit,

The result of that is in
<1318931232.16132.65.camel@zakaz.uk.xensource.com> in this thread.

Thanks,
Ian.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff -ruNp org/drivers/net/virtio_net.c new/drivers/net/virtio_net.c
--- org/drivers/net/virtio_net.c	2011-10-18 08:49:46.000000000 +0530
+++ new/drivers/net/virtio_net.c	2011-10-18 12:55:32.000000000 +0530
@@ -143,18 +143,15 @@  static void skb_xmit_done(struct virtque
 static void set_skb_frag(struct sk_buff *skb, struct page *page,
 			 unsigned int offset, unsigned int *len)
 {
+	int size = min((unsigned)PAGE_SIZE - offset, *len);
 	int i = skb_shinfo(skb)->nr_frags;
-	skb_frag_t *f;
 
-	f = &skb_shinfo(skb)->frags[i];
-	f->size = min((unsigned)PAGE_SIZE - offset, *len);
-	f->page_offset = offset;
-	__skb_frag_set_page(f, page);
+	__skb_fill_page_desc(skb, i, page, offset, size);
 
-	skb->data_len += f->size;
-	skb->len += f->size;
+	skb->data_len += size;
+	skb->len += size;
 	skb_shinfo(skb)->nr_frags++;
-	*len -= f->size;
+	*len -= size;
 }
 
 static struct sk_buff *page_to_skb(struct virtnet_info *vi,