Patchwork [2.6.35-rc1] page alloc failure order:1, mode:0x4020

Submitter Eric Dumazet
Date June 4, 2010, 12:53 p.m.
Message ID <1275656014.2482.169.camel@edumazet-laptop>
Permalink /patch/54575/
State Not Applicable
Delegated to: David Miller

Comments

Eric Dumazet - June 4, 2010, 12:53 p.m.
On Friday, 4 June 2010 at 11:20 +0200, Michael Guntsche wrote:
> Hi list,
> 
> Testing 2.6.35-rc1 on my powerpc based routerboard I saw the following page allocation
> error happening during an apt-get update with a semi loaded wlan
> interface
> 
> [309611.189267] __alloc_pages_slowpath: 52 callbacks suppressed
> [309611.194959] gzip: page allocation failure. order:1, mode:0x4020
> [309611.200981] Call Trace:
> [309611.203547] [c399bc50] [c0008144] show_stack+0x48/0x15c (unreliable)
> [309611.210041] [c399bc80] [c006268c] __alloc_pages_nodemask+0x3d4/0x52c
> [309611.216512] [c399bd20] [c008619c] __slab_alloc+0x560/0x570
> [309611.222111] [c399bd60] [c0086a98] __kmalloc_track_caller+0xd4/0x104
> [309611.228505] [c399bd80] [c01dd220] __alloc_skb+0x64/0x124
> [309611.233944] [c399bda0] [c994e034] ath_rxbuf_alloc+0x34/0xbc [ath]
> [309611.240178] [c399bdc0] [c9a1ec9c] ath_rx_tasklet+0x480/0x7c4 [ath9k]
> [309611.246658] [c399be80] [c9a1dae0] ath9k_tasklet+0x114/0x13c [ath9k]
> [309611.253055] [c399bea0] [c002532c] tasklet_action+0x88/0x104
> [309611.258746] [c399bec0] [c0025e30] __do_softirq+0xb4/0x134
> [309611.264261] [c399bf00] [c0005ec4] do_softirq+0x58/0x5c
> [309611.269514] [c399bf10] [c0025c20] irq_exit+0x7c/0x9c
> [309611.274591] [c399bf20] [c0005f64] do_IRQ+0x9c/0xb4
> [309611.279509] [c399bf40] [c00117d8] ret_from_except+0x0/0x14
> [309611.285112] --- Exception: 501 at 0xff31f0c
> [309611.285121]     LR = 0xff32548
> [309611.292536] Mem-Info:
> [309611.294899] DMA per-cpu:
> [309611.297528] CPU    0: hi:   42, btch:   7 usd:  18
> [309611.302444] active_anon:1040 inactive_anon:1160 isolated_anon:0
> [309611.302455]  active_file:14871 inactive_file:9440 isolated_file:0
> [309611.302467]  unevictable:491 dirty:1258 writeback:0 unstable:0
> [309611.302478]  free:628 slab_reclaimable:832 slab_unreclaimable:2312
> [309611.302490]  mapped:2254 shmem:36 pagetables:202 bounce:0
> [309611.332409] DMA free:2512kB min:1440kB low:1800kB high:2160kB active_anon:4160kB inactive_anon:4640kB active_file:59484kB inactive_file:37760kB unevictable:1964kB isolated(anon):0kB isolated(file):0kB present:130048kB mlocked:1964kB dirty:5032kB writeback:0kB mapped:9016kB shmem:144kB slab_reclaimable:3328kB slab_unreclaimable:9248kB kernel_stack:528kB pagetables:808kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
> [309611.372230] lowmem_reserve[]: 0 0 0
> [309611.375835] DMA: 596*4kB 14*8kB 1*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2512kB
> [309611.386215] 24770 total pagecache pages
> [309611.390147] 0 pages in swap cache
> [309611.393559] Swap cache stats: add 0, delete 0, find 0/0
> [309611.398884] Free swap  = 0kB
> [309611.401857] Total swap = 0kB
> [309611.411877] 32768 pages RAM
> [309611.414765] 1228 pages reserved
> [309611.418000] 27690 pages shared
> [309611.421147] 8802 pages non-shared
> [309611.424560] SLUB: Unable to allocate memory on node -1 (gfp=0x20)
> [309611.430764]   cache: kmalloc-8192, object size: 8192, buffer size: 8192, default order: 3, min order: 1
> [309611.440276]   node 0: slabs: 155, objs: 620, free: 0
> [309611.445439] skbuff alloc of size 3872 failed

order-1 allocations are unfortunate: this hardware should use order-0
allocations where possible, and that seems to have been the driver's goal.

3872 (0xF20) comes from 

#define IEEE80211_MAX_MPDU_LEN     (3840 + FCS_LEN + \
	(IEEE80211_WEP_IVLEN +  \
	IEEE80211_WEP_KIDLEN + \
	IEEE80211_WEP_CRCLEN))

common->rx_bufsize = roundup(IEEE80211_MAX_MPDU_LEN +
	ah->caps.rx_status_len,
	min(common->cachelsz, (u16)64));

Then __dev_alloc_skb() adds two more blocks:

NET_SKB_PAD  (64 bytes on your platform ?)

sizeof(struct skb_shared_info)
(on 32-bit: 0x104 ... oh well, that might be the problem: it is rounded
up to 0x140)


And the ath driver adds common->cachelsz (I don't know its value)

-> more than 4096 bytes
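
The sum above can be sketched as a small userspace C fragment. The concrete
values are assumptions: cachelsz = 32 is inferred from 3872 being a multiple
of 32 but not of 64 (with rx_status_len taken as 0), NET_SKB_PAD = 64 and the
0x140 shared-info size are the guesses from this message, and the page size
is taken as 4096. All helper names are hypothetical.

```c
#include <stdio.h>

/* Userspace sketch only; helper names are hypothetical. */
enum {
	FCS_LEN              = 4,
	IEEE80211_WEP_IVLEN  = 3,
	IEEE80211_WEP_KIDLEN = 1,
	IEEE80211_WEP_CRCLEN = 4,
	ASSUMED_PAGE_SIZE    = 4096	/* assumption */
};

/* Round x up to the next multiple of align. */
unsigned int roundup_to(unsigned int x, unsigned int align)
{
	return ((x + align - 1) / align) * align;
}

/* Mirrors the rx_bufsize expression quoted above. */
unsigned int rx_bufsize(unsigned int rx_status_len, unsigned int cachelsz)
{
	unsigned int max_mpdu = 3840 + FCS_LEN +
		(IEEE80211_WEP_IVLEN + IEEE80211_WEP_KIDLEN +
		 IEEE80211_WEP_CRCLEN);
	unsigned int align = cachelsz < 64 ? cachelsz : 64;

	return roundup_to(max_mpdu + rx_status_len, align);
}

/* Buffer plus the three overheads listed above. */
unsigned int skb_alloc_total(unsigned int bufsize, unsigned int cachelsz,
			     unsigned int skb_pad, unsigned int shinfo)
{
	return bufsize + cachelsz + skb_pad + shinfo;
}

/* Print the breakdown for the assumed values. */
void print_breakdown(void)
{
	unsigned int cachelsz = 32;	/* assumption, inferred from 3872 */
	unsigned int bufsize = rx_bufsize(0, cachelsz);
	unsigned int total = skb_alloc_total(bufsize, cachelsz, 64, 0x140);

	printf("rx_bufsize=%u total=%u page=%u\n",
	       bufsize, total, (unsigned int)ASSUMED_PAGE_SIZE);
}
```

With these assumed inputs the buffer is 3872 bytes and the total is 4288
bytes, so the request spills past one 4096-byte page and lands in the
kmalloc-8192 cache seen in the log.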

1) Maybe rx_bufsize should not include the roundup(), since
ath_rxbuf_alloc() also does an alignment adjustment?

2) We should try to reduce skb_shared_info by four bytes.

Could you try this patch ?


We make sure rx_bufsize + various overhead <= PAGE_SIZE
But I am not sure it's legal for the hardware...
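
As a rough illustration of that bound, the following userspace sketch mirrors
how SKB_MAX_ORDER() in include/linux/skbuff.h subtracts the padding and the
aligned skb_shared_info from a page. The numeric inputs (page size 4096,
NET_SKB_PAD 64, cachelsz 32, shared info 0x140) are assumptions carried over
from the estimates above, and the function names are hypothetical.

```c
/* Userspace sketch only; names and numbers are assumptions. */

/* Mirrors SKB_MAX_ORDER(X, ORDER): a (page_size << order) block minus
 * X and minus the cache-aligned skb_shared_info. */
unsigned int skb_max_order(unsigned int page_size, unsigned int order,
			   unsigned int x, unsigned int shinfo_aligned)
{
	return (page_size << order) - x - shinfo_aligned;
}

/* Cap a requested buffer size at the given limit. */
unsigned int cap_bufsize(unsigned int size, unsigned int limit)
{
	return size < limit ? size : limit;
}

/* Total allocation footprint once the buffer has been capped. */
unsigned int capped_total(unsigned int size, unsigned int page_size,
			  unsigned int skb_pad, unsigned int cachelsz,
			  unsigned int shinfo_aligned)
{
	unsigned int limit = skb_max_order(page_size, 0,
					   skb_pad + cachelsz,
					   shinfo_aligned);

	return cap_bufsize(size, limit) + skb_pad + cachelsz +
	       shinfo_aligned;
}
```

Under these assumed numbers the limit is 3680 bytes, and a 3872-byte request
capped to it gives a total of exactly 4096 bytes. Only the smaller of the two
values keeps the total within one page.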



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michael Guntsche - June 4, 2010, 4:16 p.m.
On 2010.06.04 14:53:34 , Eric Dumazet wrote:
> order-1 allocations are unfortunate, since this hardware should use
> order-0 ones if possible, and it seems it was its goal.
> 
> 3872 (0xF20) comes from 
<snip>
> 
> 1) Maybe rx_bufsize should not include the roundup() since 
> ath_rxbuf_alloc() also do an alignment adjustment ?
> 
> 2) We should try to reduce skb_shared_info by four bytes.
> 
> Could you try this patch ?
> 
> 
> We make sure rx_bufsize + various overhead <= PAGE_SIZE
> But I am not sure its legal for the hardware...

I applied the patch, recompiled, and am running it on the routerboard,
trying to trigger the bug again.

Kind regards,
Michael
Michael Guntsche - June 6, 2010, 9:56 a.m.
On 2010.06.04 18:16:44 , Michael Guntsche wrote:
> I applied the patch recompiled and run it on the routerboard, trying
> to trigger the bug again.

Hi Eric,

So far I have not been able to reproduce the bug. Do you think this patch
can be pushed to mainline, or is there a "better"/other fix for it?

Kind regards,
Michael


Eric Dumazet - June 6, 2010, 10:42 a.m.
On Sunday, 6 June 2010 at 11:56 +0200, Michael Guntsche wrote:
> On 2010.06.04 18:16:44 , Michael Guntsche wrote:
> > I applied the patch recompiled and run it on the routerboard, trying
> > to trigger the bug again.
> 
> Hi Eric,
> 
> Up to now I was not able to reproduce the bug, do you think this patch
> can be pushed to mainline or is there a "better"/other  fix for it?
> 
> Kind regards,
> Michael
> 
> 

Thanks Michael for testing.

I'll submit an official patch ASAP, sent to everyone involved in this
driver to get their Ack (or Nack).

IEEE80211_MAX_MPDU_LEN being 3840 + some bits is suspect, since it doesn't
match the 802.11 specs.

Shouldn't it be closer to 2304 + MAC header (32 bytes) + FCS (4 bytes)?
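
The arithmetic behind this suggestion, as a sketch: with the figures quoted
above (2304-byte frame body, 32-byte MAC header, 4-byte FCS) the frame is
2340 bytes, which would leave ample room in a 4096-byte page even after the
overheads estimated earlier in the thread. The 32-byte alignment and the
overhead values (cachelsz 32, NET_SKB_PAD 64, shared info 0x140) are
assumptions, and the helper names are hypothetical.

```c
/* Userspace sketch only; names and overhead values are assumptions. */

/* Frame length from the figures quoted above: body + MAC header + FCS. */
unsigned int spec_frame_len(void)
{
	return 2304 + 32 + 4;
}

/* Round x up to the next multiple of align. */
unsigned int roundup_to(unsigned int x, unsigned int align)
{
	return ((x + align - 1) / align) * align;
}

/* Aligned buffer plus the overheads estimated earlier in the thread. */
unsigned int spec_alloc_total(unsigned int align, unsigned int cachelsz,
			      unsigned int skb_pad, unsigned int shinfo)
{
	return roundup_to(spec_frame_len(), align) + cachelsz + skb_pad +
	       shinfo;
}
```

With the assumed values, spec_alloc_total(32, 32, 64, 0x140) comes to 2784
bytes, comfortably below one page.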



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hello.

I am running 2.6.35.3 with the above patch, and I still get these
failures, though they are much less frequent than without the patch. A
snippet from dmesg is below.

Please let me know what other details I should provide. Thanks

skbuff alloc of size 3872 failed
java: page allocation failure. order:1, mode:0x4020
Pid: 11464, comm: java Not tainted 2.6.35.3 #3
Call Trace:
 [<c0243d26>] ? __alloc_pages_nodemask+0x3e6/0x513
 [<c025c293>] ? __slab_alloc+0x2d7/0x2eb
 [<c025c946>] ? __kmalloc_track_caller+0x74/0x95
 [<d09e801a>] ? ath_rxbuf_alloc+0x1a/0x78 [ath]
 [<d09e801a>] ? ath_rxbuf_alloc+0x1a/0x78 [ath]
 [<c0334097>] ? __alloc_skb+0x57/0x100
 [<d09e801a>] ? ath_rxbuf_alloc+0x1a/0x78 [ath]
 [<d0af4100>] ? ath_rx_tasklet+0x2fb/0x808 [ath9k]
 [<d0cbc89f>] ? br_handle_frame+0x1b3/0x1c3 [bridge]
 [<c033d18a>] ? __netif_receive_skb+0x141/0x25f
 [<d0af23c1>] ? ath9k_tasklet+0xcc/0x107 [ath9k]
 [<c02195bf>] ? tasklet_action+0x5f/0x65
 [<c0219873>] ? __do_softirq+0x60/0xc6
 [<c0219907>] ? do_softirq+0x2e/0x30
 [<c02199f9>] ? irq_exit+0x53/0x55
 [<c020392c>] ? do_IRQ+0x3a/0x72
 [<c0202be9>] ? common_interrupt+0x29/0x30
Mem-Info:
DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
Normal per-cpu:
CPU    0: hi:   90, btch:  15 usd:  32
active_anon:6158 inactive_anon:15748 isolated_anon:0
 active_file:11477 inactive_file:22926 isolated_file:0
 unevictable:468 dirty:5788 writeback:0 unstable:0
 free:990 slab_reclaimable:3091 slab_unreclaimable:2349
 mapped:1410 shmem:5 pagetables:227 bounce:0
DMA free:1000kB min:124kB low:152kB high:184kB active_anon:1304kB
inactive_anon:2304kB active_file:2584kB inactive_file:4900kB
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15864kB
mlocked:0kB dirty:0kB writeback:0kB mapped:284kB shmem:0kB
slab_reclaimable:648kB slab_unreclaimable:752kB kernel_stack:152kB
pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB
pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 238 238
Normal free:2960kB min:1908kB low:2384kB high:2860kB
active_anon:23328kB inactive_anon:60688kB active_file:43324kB
inactive_file:86804kB unevictable:1872kB isolated(anon):0kB
isolated(file):0kB present:243840kB mlocked:1872kB dirty:23152kB
writeback:0kB mapped:5356kB shmem:20kB slab_reclaimable:11716kB
slab_unreclaimable:8644kB kernel_stack:888kB pagetables:908kB
unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 76*4kB 29*8kB 5*16kB 0*32kB 0*64kB 1*128kB 1*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 1000kB
Normal: 650*4kB 9*8kB 2*16kB 8*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 2960kB
37964 total pagecache pages
3160 pages in swap cache
Swap cache stats: add 164954, delete 161794, find 97931/115609
Free swap  = 937284kB
Total swap = 963896kB
65535 pages RAM
1237 pages reserved
29042 pages shared
38802 pages non-shared
SLUB: Unable to allocate memory on node -1 (gfp=0x20)
  cache: kmalloc-8192, object size: 8192, buffer size: 8192, default
order: 3, min order: 1
  node 0: slabs: 0, objs: 0, free: 0
skbuff alloc of size 3872 failed
java: page allocation failure. order:1, mode:0x4020
Pid: 11464, comm: java Not tainted 2.6.35.3 #3
Call Trace:
 [<c0243d26>] ? __alloc_pages_nodemask+0x3e6/0x513
 [<c025c293>] ? __slab_alloc+0x2d7/0x2eb
 [<c025c946>] ? __kmalloc_track_caller+0x74/0x95
 [<d09e801a>] ? ath_rxbuf_alloc+0x1a/0x78 [ath]
 [<d09e801a>] ? ath_rxbuf_alloc+0x1a/0x78 [ath]
 [<c0334097>] ? __alloc_skb+0x57/0x100
 [<d09e801a>] ? ath_rxbuf_alloc+0x1a/0x78 [ath]
 [<d0af4100>] ? ath_rx_tasklet+0x2fb/0x808 [ath9k]
 [<d0cbc89f>] ? br_handle_frame+0x1b3/0x1c3 [bridge]
 [<c033d18a>] ? __netif_receive_skb+0x141/0x25f
 [<d0af23c1>] ? ath9k_tasklet+0xcc/0x107 [ath9k]
 [<c02195bf>] ? tasklet_action+0x5f/0x65
 [<c0219873>] ? __do_softirq+0x60/0xc6
 [<c0219907>] ? do_softirq+0x2e/0x30
 [<c02199f9>] ? irq_exit+0x53/0x55
 [<c020392c>] ? do_IRQ+0x3a/0x72
 [<c0202be9>] ? common_interrupt+0x29/0x30
Mem-Info:
DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
Normal per-cpu:
CPU    0: hi:   90, btch:  15 usd:  32
active_anon:6158 inactive_anon:15748 isolated_anon:0
 active_file:11477 inactive_file:22926 isolated_file:0
 unevictable:468 dirty:5788 writeback:0 unstable:0
 free:990 slab_reclaimable:3091 slab_unreclaimable:2349
 mapped:1410 shmem:5 pagetables:227 bounce:0
DMA free:1000kB min:124kB low:152kB high:184kB active_anon:1304kB
inactive_anon:2304kB active_file:2584kB inactive_file:4900kB
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15864kB
mlocked:0kB dirty:0kB writeback:0kB mapped:284kB shmem:0kB
slab_reclaimable:648kB slab_unreclaimable:752kB kernel_stack:152kB
pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB
pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 238 238
Normal free:2960kB min:1908kB low:2384kB high:2860kB
active_anon:23328kB inactive_anon:60688kB active_file:43324kB
inactive_file:86804kB unevictable:1872kB isolated(anon):0kB
isolated(file):0kB present:243840kB mlocked:1872kB dirty:23152kB
writeback:0kB mapped:5356kB shmem:20kB slab_reclaimable:11716kB
slab_unreclaimable:8644kB kernel_stack:888kB pagetables:908kB
unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 76*4kB 29*8kB 5*16kB 0*32kB 0*64kB 1*128kB 1*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 1000kB
Normal: 650*4kB 9*8kB 2*16kB 8*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 2960kB
37964 total pagecache pages
3160 pages in swap cache

On Sun, Jun 6, 2010 at 3:42 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> On Sunday, 6 June 2010 at 11:56 +0200, Michael Guntsche wrote:
> > On 2010.06.04 18:16:44 , Michael Guntsche wrote:
> > > I applied the patch recompiled and run it on the routerboard, trying
> > > to trigger the bug again.
> >
> > Hi Eric,
> >
> > Up to now I was not able to reproduce the bug, do you think this patch
> > can be pushed to mainline or is there a "better"/other  fix for it?
> >
> > Kind regards,
> > Michael
> >
> >
>
> Thanks Michael for testing.
>
> I'll submit ASAP an official patch, sent to all people involved in this
> driver to get their Ack (or Nack).
>
> IEEE80211_MAX_MPDU_LEN being 3840 + somebits is suspect, since it doesnt
> match 802.11 specs.
>
> It should be more close of 2304 + MAC header (32bytes) + FCS (4 bytes) ?
>
>
>

Patch

diff --git a/drivers/net/wireless/ath/ath9k/recv.c b/drivers/net/wireless/ath/ath9k/recv.c
index ca6065b..0a0dc3a 100644
--- a/drivers/net/wireless/ath/ath9k/recv.c
+++ b/drivers/net/wireless/ath/ath9k/recv.c
@@ -226,10 +226,10 @@  static int ath_rx_edma_init(struct ath_softc *sc, int nbufs)
 	u32 size;
 
 
-	common->rx_bufsize = roundup(IEEE80211_MAX_MPDU_LEN +
-				     ah->caps.rx_status_len,
-				     min(common->cachelsz, (u16)64));
-
+	size = roundup(IEEE80211_MAX_MPDU_LEN + ah->caps.rx_status_len,
+		       min(common->cachelsz, (u16)64));
+	common->rx_bufsize = max_t(u32, size,
+				   SKB_MAX_ORDER(NET_SKB_PAD + common->cachelsz, 0));
 	ath9k_hw_set_rx_bufsize(ah, common->rx_bufsize -
 				    ah->caps.rx_status_len);