Message ID | 20090609141903.GE15219@wotan.suse.de (mailing list archive) |
---|---|
State | Not Applicable, archived |
Nick Piggin wrote:
> I can't really work it out. It seems to be the kmem_cache_cache which has
> a problem, but there have already been lots of caches created and even
> this same cache_node already used right beforehand with no problem.
>
> Unless a CPU or node comes up or something right at this point or the
> caller is scheduled onto a different CPU... oopses seem to all
> have CPU#1, whereas boot CPU is probably #0 (these CPUs are node 0
> and memory is only on node 1 and 2 where there are no CPUs if I read
> correctly).
>
> I still can't see the reason for the failure, but can you try this
> patch please and show dmesg?

I was able to boot yesterday's next (20090611) on this machine. Not sure
what changed (maybe because of the merge with Linus's tree), but I can no longer
recreate this issue with next 20090611. I was consistently able to
recreate the problem up to the June 10th next tree.

Thanks
-Sachin
On Fri, Jun 12, 2009 at 11:14:10AM +0530, Sachin Sant wrote:
> Nick Piggin wrote:
> >I can't really work it out. It seems to be the kmem_cache_cache which has
> >a problem, but there have already been lots of caches created and even
> >this same cache_node already used right beforehand with no problem.
> >
> >Unless a CPU or node comes up or something right at this point or the
> >caller is scheduled onto a different CPU... oopses seem to all
> >have CPU#1, whereas boot CPU is probably #0 (these CPUs are node 0
> >and memory is only on node 1 and 2 where there are no CPUs if I read
> >correctly).
> >
> >I still can't see the reason for the failure, but can you try this
> >patch please and show dmesg?
> I was able to boot yesterday's next (20090611) on this machine. Not sure

Still with SLQB? With debug options turned on?

> what changed (maybe because of the merge with Linus's tree), but I can no longer
> recreate this issue with next 20090611. I was consistently able to
> recreate the problem up to the June 10th next tree.

I would guess some kind of memory corruption that by chance did not
break the other allocators. Please let us know if you see any more
crashes.

Thanks for all your help.
Nick Piggin wrote:
>> I was able to boot yesterday's next (20090611) on this machine. Not sure
>>
>
> Still with SLQB? With debug options turned on?
>
Ah .. spoke too soon. The kernel was not compiled with SLQB. Sorry
about the confusion. I can't seem to select SLQB as the slab
allocator.

Thanks
-Sachin
On Fri, Jun 12, 2009 at 01:38:50PM +0530, Sachin Sant wrote:
> Nick Piggin wrote:
> >>I was able to boot yesterday's next (20090611) on this machine. Not sure
> >>
> >
> >Still with SLQB? With debug options turned on?
> >
> Ah .. spoke too soon. The kernel was not compiled with SLQB. Sorry
> about the confusion. I can't seem to select SLQB as the slab
> allocator.

It must have been dropped out of -next. You could just try
a known-bad kernel with my patch applied?

Thanks,
Nick
On Fri, 2009-06-12 at 10:21 +0200, Nick Piggin wrote:
> On Fri, Jun 12, 2009 at 01:38:50PM +0530, Sachin Sant wrote:
> > Nick Piggin wrote:
> > >>I was able to boot yesterday's next (20090611) on this machine. Not sure
> > >>
> > >
> > >Still with SLQB? With debug options turned on?
> > >
> > Ah .. spoke too soon. The kernel was not compiled with SLQB. Sorry
> > about the confusion. I can't seem to select SLQB as the slab
> > allocator.
>
> It must have been dropped out of -next. You could just try
> a known-bad kernel with my patch applied?

Hmm, SLQB is in my for-next branch. Stephen, has slab.git been dropped from
linux-next or something?

Pekka
Hi Pekka,

On Fri, 12 Jun 2009 11:25:39 +0300 Pekka Enberg <penberg@cs.helsinki.fi> wrote:
>
> Hmm, SLQB in my for-next branch. Stephen, is slab.git dropped from
> linux-next or something?

Yesterday (next-20090611) the slab tree for linux-next had only one
commit in it ("SLUB: Out-of-memory diagnostics"). Today (next-20090612)
it has quite a lot in it again - including SLQB.
On Fri, 2009-06-12 at 18:35 +1000, Stephen Rothwell wrote:
> Hi Pekka,
>
> On Fri, 12 Jun 2009 11:25:39 +0300 Pekka Enberg <penberg@cs.helsinki.fi> wrote:
> >
> > Hmm, SLQB in my for-next branch. Stephen, is slab.git dropped from
> > linux-next or something?
>
> Yesterday (next-20090611) the slab tree for linux-next had only one
> commit in it ("SLUB: Out-of-memory diagnostics"). Today (next-20090612)
> it has quite a lot in it again - including SLQB.

Ah, ok. I did mess it up for a few hours and I guess you picked it up then.
Thanks, Stephen!
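The topology mentioned earlier in the thread (CPUs on node 0, memory only on nodes 1 and 2) can be illustrated with a small userspace sketch. It is not taken from SLQB and does not claim to be the cause of the reported oops; it only shows, under assumed names (`topology`, `init_nodes`, `node_ready`, `MAX_NODES` are all hypothetical), how a per-node table that is populated only for nodes with memory leaves a memoryless node without an entry, whereas iterating over every possible node does not.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 3	/* hypothetical: nodes 0, 1 and 2 as described in the thread */

struct node_info { int has_memory; };	/* hypothetical stand-in for node state */
struct node_cache { int nid; };		/* hypothetical per-node bookkeeping */

static struct node_cache storage[MAX_NODES];
static struct node_cache *node_slab[MAX_NODES];

/* Node 0 has CPUs but no memory; nodes 1 and 2 have memory. */
static const struct node_info topology[MAX_NODES] = { {0}, {1}, {1} };

/*
 * Populate the per-node table.
 * memory_only != 0: visit only nodes that have memory.
 * memory_only == 0: visit every possible node.
 */
static void init_nodes(int memory_only)
{
	for (int i = 0; i < MAX_NODES; i++) {
		node_slab[i] = NULL;
		if (memory_only && !topology[i].has_memory)
			continue;
		storage[i].nid = i;
		node_slab[i] = &storage[i];
	}
}

/* Returns nonzero when the table has an entry for node nid. */
static int node_ready(int nid)
{
	assert(nid >= 0 && nid < MAX_NODES);
	return node_slab[nid] != NULL;
}
```

After `init_nodes(1)` the entry for node 0 stays NULL, so any code running on a node-0 CPU must cope with a missing per-node structure; after `init_nodes(0)` every node has one.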
Index: linux-2.6/mm/slqb.c
===================================================================
--- linux-2.6.orig/mm/slqb.c
+++ linux-2.6/mm/slqb.c
@@ -963,6 +963,7 @@ static struct slqb_page *allocate_slab(s
 
 	flags |= s->allocflags;
 
+	flags &= ~0x2000;
 	page = (struct slqb_page *)alloc_pages_node(node, flags, s->order);
 	if (!page)
 		return NULL;
@@ -1357,6 +1358,8 @@ static noinline void *__slab_alloc_page(
 	unsigned int colour;
 	void *object;
 
+	if (gfpflags & 0x2000)
+		printk("SLQB: __slab_alloc_page cpu=%d request node=%d\n", smp_processor_id(), node);
 	c = get_cpu_slab(s, smp_processor_id());
 	colour = c->colour_next;
 	c->colour_next += s->colour_off;
@@ -1374,6 +1377,8 @@ static noinline void *__slab_alloc_page(
 	if (unlikely(!page))
 		return page;
 
+	if (gfpflags & 0x2000)
+		printk("SLQB: __slab_alloc_page cpu=%d,nid=%d request node=%d page node=%d\n", smp_processor_id(), numa_node_id(), node, slqb_page_to_nid(page));
 	if (!NUMA_BUILD || likely(slqb_page_to_nid(page) == numa_node_id())) {
 		struct kmem_cache_cpu *c;
 		int cpu = smp_processor_id();
@@ -1382,6 +1387,7 @@ static noinline void *__slab_alloc_page(
 		l = &c->list;
 		page->list = l;
 
+		printk("SLQB: __slab_alloc_page spin_lock(%p)\n", &l->page_lock);
 		spin_lock(&l->page_lock);
 		l->nr_slabs++;
 		l->nr_partial++;
@@ -1398,6 +1404,8 @@ static noinline void *__slab_alloc_page(
 		l = &n->list;
 		page->list = l;
 
+		printk("SLQB: __slab_alloc_page spin_lock(%p)\n", &n->list_lock);
+		printk("SLQB: __slab_alloc_page spin_lock(%p)\n", &l->page_lock);
 		spin_lock(&n->list_lock);
 		spin_lock(&l->page_lock);
 		l->nr_slabs++;
@@ -1411,6 +1419,7 @@ static noinline void *__slab_alloc_page(
 #endif
 	}
 	VM_BUG_ON(!object);
+	printk("SLQB: __slab_alloc_page OK\n");
 	return object;
 }
 
@@ -1440,6 +1449,8 @@ static void *__remote_slab_alloc_node(st
 	struct kmem_cache_list *l;
 	void *object;
 
+	if (gfpflags & 0x2000)
+		printk("SLQB: __remote_slab_alloc_node cpu=%d request node=%d\n", smp_processor_id(), node);
 	n = s->node_slab[node];
 	if (unlikely(!n)) /* node has no memory */
 		return NULL;
@@ -1541,7 +1552,11 @@ static __always_inline void *slab_alloc(
 
 again:
 	local_irq_save(flags);
+	if (gfpflags & 0x2000)
+		printk("SLQB: slab_alloc cpu=%d,nid=%d request node=%d\n", smp_processor_id(), numa_node_id(), node);
 	object = __slab_alloc(s, gfpflags, node);
+	if (gfpflags & 0x2000)
+		printk("SLQB: slab_alloc cpu=%d return=%p\n", smp_processor_id(), object);
 	local_irq_restore(flags);
 
 	if (unlikely(slab_debug(s)) && likely(object)) {
@@ -2869,9 +2884,12 @@ void __init kmem_cache_init(void)
 #endif
 
 #ifdef CONFIG_SMP
+	printk("SLQB: kmem_cache_init possible CPUs: ");
 	for_each_possible_cpu(i) {
 		struct kmem_cache_cpu *c;
 
+		printk("%d ", i);
+
 		c = &per_cpu(kmem_cache_cpus, i);
 		init_kmem_cache_cpu(&kmem_cache_cache, c);
 		kmem_cache_cache.cpu_slab[i] = c;
@@ -2886,14 +2904,18 @@ void __init kmem_cache_init(void)
 		kmem_node_cache.cpu_slab[i] = c;
 #endif
 	}
+	printk("\n");
 #else
 	init_kmem_cache_cpu(&kmem_cache_cache, &kmem_cache_cache.cpu_slab);
 #endif
 
 #ifdef CONFIG_NUMA
-	for_each_node_state(i, N_NORMAL_MEMORY) {
+	printk("SLQB: kmem_cache_init possible nodes: ");
+	for_each_node_state(i, N_POSSIBLE) {
 		struct kmem_cache_node *n;
 
+		printk("%d ", i);
+
 		n = &per_cpu(kmem_cache_nodes, i);
 		init_kmem_cache_node(&kmem_cache_cache, n);
 		kmem_cache_cache.node_slab[i] = n;
@@ -2906,6 +2928,7 @@ void __init kmem_cache_init(void)
 		init_kmem_cache_node(&kmem_node_cache, n);
 		kmem_node_cache.node_slab[i] = n;
 	}
+	printk("\n");
 #endif
 
 	/* Caches that are not of the two-to-the-power-of size */
@@ -3040,12 +3063,17 @@ struct kmem_cache *kmem_cache_create(con
 	if (!kmem_cache_create_ok(name, size, align, flags))
 		goto err;
 
-	s = kmem_cache_alloc(&kmem_cache_cache, GFP_KERNEL);
+	printk("SLQB: kmem_cache_create %s size=%d align=%d flags=%lx\n", name, (int)size, (int)align, flags);
+
+	s = kmem_cache_alloc(&kmem_cache_cache, GFP_KERNEL|0x2000);
 	if (!s)
 		goto err;
 
-	if (kmem_cache_open(s, name, size, align, flags, ctor, 1))
+	printk("SLQB: kmem_cache_create %s kmem_cache allocated\n", name);
+	if (kmem_cache_open(s, name, size, align, flags, ctor, 1)) {
+		printk("SLQB: kmem_cache_create %s kmem_cache opened\n", name);
 		return s;
+	}
 
 	kmem_cache_free(&kmem_cache_cache, s);
 
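The debug patch above appears to tag one specific call site with the raw bit 0x2000 in the allocation flags, print trace output only for tagged requests, and clear the bit again before the flags reach alloc_pages_node(). A minimal userspace sketch of that tagging idea follows. All names in it (`TRACE_BIT`, `traced_alloc`, `backend_alloc`, `traced_calls`) are hypothetical and chosen for illustration; it is not kernel code, and whether 0x2000 is otherwise unused in the gfp flags of that kernel version is an assumption this sketch does not verify.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical name; the patch uses the raw value 0x2000 directly. */
#define TRACE_BIT 0x2000u

static unsigned int traced_calls;	/* number of tagged requests logged so far */

/* Lowest layer: stands in for the page allocator, which must not see the tag. */
static unsigned int backend_alloc(unsigned int flags)
{
	assert((flags & TRACE_BIT) == 0);
	return flags;
}

/* Middle layer: log only tagged requests, then strip the tag before descending. */
static unsigned int traced_alloc(unsigned int flags, int cpu, int node)
{
	if (flags & TRACE_BIT) {
		traced_calls++;
		printf("alloc cpu=%d request node=%d\n", cpu, node);
	}
	flags &= ~TRACE_BIT;
	return backend_alloc(flags);
}
```

A caller that wants its request traced passes `flags | TRACE_BIT`; every other caller is unaffected, which keeps the log limited to the one allocation under investigation.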