Patchwork 2.6.31-git5 kernel boot hangs on powerpc

Submitter Tejun Heo
Date Sept. 25, 2009, 7:43 a.m.
Message ID <4ABC7486.8040500@kernel.org>
Permalink /patch/34262/
State Superseded

Comments

Tejun Heo - Sept. 25, 2009, 7:43 a.m.
Tejun Heo wrote:
> Hello,
> 
> Sachin Sant wrote:
>> <4>PERCPU: chunk 1 relocating -1 -> 18 c0000000db70fb00
>> <c0000000db70fb00:c0000000db70fb00>
>> <4>PERCPU: relocated <c000000001120320:c000000001120320>
>> <4>PERCPU: chunk 1 relocating 18 -> 16 c0000000db70fb00
>> <c000000001120320:c000000001120320>
>> <4>PERCPU: relocated <c000000001120300:c000000001120300>
>> <4>PERCPU: chunk 1, alloc pages [0,1)
>> <4>PERCPU: chunk 1, map pages [0,1)
>> <4>PERCPU: map 0xd00007fffff00000, 1 pages 53544
>> <4>PERCPU: map 0xd00007fffff80000, 1 pages 53545
>> <4>PERCPU: chunk 1, will clear 4096b/unit d00007fffff00000 d00007fffff80000
>> <3>INFO: RCU detected CPU 0 stall (t=1000 jiffies)
> 
> This supports my hypothesis.  This is the first area being allocated
> from a dynamic chunk and cleared.  PFNs 53544 and 53545 have been
> allocated and successfully mapped to 0xd00007fffff00000 and
> 0xd00007fffff80000 using map_kernel_range_noflush(), but when those
> addresses are actually accessed, we end up with infinite faults.  The
> fault handler probably thinks that the fault has been handled
> correctly, but when control is returned, the processor faults
> again.  Benjamin, I'm way out of my depth here, can you please help?
> 
> Oh, one more simple experiment.  Sachin, does the following patch make
> any difference?

Oops, the patch should look like the following.
Sachin P. Sant - Sept. 25, 2009, 8:03 a.m.
Tejun Heo wrote:
> Tejun Heo wrote:
>   
>> Hello,
>>
>> Sachin Sant wrote:
>>     
>>> <4>PERCPU: chunk 1 relocating -1 -> 18 c0000000db70fb00
>>> <c0000000db70fb00:c0000000db70fb00>
>>> <4>PERCPU: relocated <c000000001120320:c000000001120320>
>>> <4>PERCPU: chunk 1 relocating 18 -> 16 c0000000db70fb00
>>> <c000000001120320:c000000001120320>
>>> <4>PERCPU: relocated <c000000001120300:c000000001120300>
>>> <4>PERCPU: chunk 1, alloc pages [0,1)
>>> <4>PERCPU: chunk 1, map pages [0,1)
>>> <4>PERCPU: map 0xd00007fffff00000, 1 pages 53544
>>> <4>PERCPU: map 0xd00007fffff80000, 1 pages 53545
>>> <4>PERCPU: chunk 1, will clear 4096b/unit d00007fffff00000 d00007fffff80000
>>> <3>INFO: RCU detected CPU 0 stall (t=1000 jiffies)
>>>       
>> This supports my hypothesis.  This is the first area being allocated
>> from a dynamic chunk and cleared.  PFNs 53544 and 53545 have been
>> allocated and successfully mapped to 0xd00007fffff00000 and
>> 0xd00007fffff80000 using map_kernel_range_noflush(), but when those
>> addresses are actually accessed, we end up with infinite faults.  The
>> fault handler probably thinks that the fault has been handled
>> correctly, but when control is returned, the processor faults
>> again.  Benjamin, I'm way out of my depth here, can you please help?
>>
>> Oh, one more simple experiment.  Sachin, does the following patch make
>> any difference?
>>     
With this patch applied the machine boots OK :-)

Have attached the boot log. Note that this boot log is
from a different machine, but the reported problem can be
recreated on this machine as well.

Thanks
-Sachin

>
> Oops, the patch should look like the following.
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 69511e6..37ab9e2 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2056,7 +2056,8 @@ static unsigned long pvm_determine_end(struct vmap_area **pnext,
>  				       struct vmap_area **pprev,
>  				       unsigned long align)
>  {
> -	const unsigned long vmalloc_end = VMALLOC_END & ~(align - 1);
> +	const unsigned long vmalloc_start = ALIGN(VMALLOC_START, align);
> +	const unsigned long vmalloc_end = vmalloc_start + (512 << 20);
>  	unsigned long addr;
>
>  	if (*pnext)
> @@ -2102,7 +2103,7 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets,
>  				     size_t align, gfp_t gfp_mask)
>  {
>  	const unsigned long vmalloc_start = ALIGN(VMALLOC_START, align);
> -	const unsigned long vmalloc_end = VMALLOC_END & ~(align - 1);
> +	const unsigned long vmalloc_end = vmalloc_start + (512 << 20);
>  	struct vmap_area **vas, *prev, *next;
>  	struct vm_struct **vms;
>  	int area, area2, last_area, term_area;
>
>
Tejun Heo - Sept. 25, 2009, 9:01 a.m.
Sachin Sant wrote:
> With this patch applied the machine boots OK :-)

Ah... so the problem really is the address being too high.  If you've
got some time, it might be interesting to find out how high is safe.

Thanks.
Benjamin Herrenschmidt - Sept. 25, 2009, 9:48 a.m.
On Fri, 2009-09-25 at 18:01 +0900, Tejun Heo wrote:
> > With this patch applied the machine boots OK :-)
> 
> Ah... so the problem really is the address being too high.  If you've
> got some time, it might be interesting to find out how high is safe.
> 
Might give me a clue about what the problem is, but I think I'll just
cook up a test case that forcibly vmaps something high up and see how it
goes from there. It could be a very old bug that nobody ever noticed
because our vmalloc space on 64-bit is so huge :-)

Cheers,
Ben.
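(The test Ben describes might look roughly like the following: an uncompiled sketch against the 2.6.31-era vmalloc API.  The module name, the 4 MiB offset, and the messages are made up; error unwinding is abbreviated.)

```c
/* Uncompiled sketch: map one page near the top of the vmalloc
 * region and touch it, mimicking what percpu chunk init does. */
#include <linux/module.h>
#include <linux/mm.h>
#include <linux/vmalloc.h>

static int __init vmap_high_init(void)
{
	struct page *page = alloc_page(GFP_KERNEL);
	struct page **pages = &page;
	struct vm_struct *area;

	if (!page)
		return -ENOMEM;

	/* ask for an area just below VMALLOC_END */
	area = __get_vm_area(PAGE_SIZE, VM_MAP,
			     VMALLOC_END - (4UL << 20), VMALLOC_END);
	if (!area || map_vm_area(area, PAGE_KERNEL, &pages)) {
		__free_page(page);
		return -ENOMEM;
	}

	/* the equivalent access is what hung inside pcpu_alloc() */
	memset(area->addr, 0, PAGE_SIZE);
	printk(KERN_INFO "vmap-high: write to %p survived\n", area->addr);

	vunmap(area->addr);
	__free_page(page);
	return 0;
}
module_init(vmap_high_init);
MODULE_LICENSE("GPL");
```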
Sachin P. Sant - Oct. 5, 2009, 6:54 a.m.
Benjamin Herrenschmidt wrote:
> On Fri, 2009-09-25 at 18:01 +0900, Tejun Heo wrote:
>   
>>> With this patch applied the machine boots OK :-)
>>>       
>> Ah... so the problem really is the address being too high.  If you've
>> got some time, it might be interesting to find out how high is safe.
>>
>>     
> Might give me a clue about what the problem is, but I think I'll just
> cook up a test case that forcibly vmaps something high up and see how it
> goes from there. It could be a very old bug that nobody ever noticed
> because our vmalloc space on 64-bit is so huge :-)
>   
I still have this problem with 2.6.32-rc3.
Here is the relevant information:

0:mon> t
[link register   ] c0000000001a7f78 .pcpu_alloc+0x798/0xa04
[c0000000033e37f0] c0000000001a7f08 .pcpu_alloc+0x728/0xa04 (unreliable)
[c0000000033e3920] c0000000001a8278 .__alloc_percpu+0x3c/0x58
[c0000000033e39b0] c0000000005d1ad0 .snmp_mib_init+0x64/0xb0
[c0000000033e3a40] c0000000005d1c00 .ipv4_mib_init_net+0xe4/0x1f8
[c0000000033e3b00] c00000000055b608 .setup_net+0x78/0x138
[c0000000033e3ba0] c00000000055be38 .copy_net_ns+0x9c/0x148
[c0000000033e3c30] c0000000000d06d8 .create_new_namespaces+0x120/0x1e4
[c0000000033e3ce0] c0000000000d09e0 .unshare_nsproxy_namespaces+0x7c/0xfc
[c0000000033e3d80] c00000000009dd74 .SyS_unshare+0x148/0x33c
[c0000000033e3e30] c0000000000085b4 syscall_exit+0x0/0x40
--- Exception: c01 (System Call) at 00000fff8b0ab978
SP (fffe633fe30) is in userspace
0:mon> e
cpu 0x0: Vector: 501 (Hardware Interrupt) at [c0000000033e3570]
    pc: c00000000004bdc0: .memset+0x60/0xfc
    lr: c0000000001a7f78: .pcpu_alloc+0x798/0xa04
    sp: c0000000033e37f0
   msr: 8000000000009032
  current = 0xc000000003270860
  paca    = 0xc0000000010c2600
    pid   = 3442, comm = two_children_ns
0:mon> r
R00 = 0000000000000040   R07 = d00007fffff00000
R01 = c0000000033e37f0   R08 = 0000000000000000
R02 = c000000000fe7c78   R09 = c000000001700180
R03 = d00007fffff00000   R10 = c000000001095aa0
R04 = 0000000000000000   R11 = 00000000000003c0
R05 = 0000000000000000   R12 = 0000000048004428
R06 = d00007fffff00000   R13 = c0000000010c2600
pc  = c00000000004bdc0 .memset+0x60/0xfc
lr  = c0000000001a7f78 .pcpu_alloc+0x798/0xa04
msr = 8000000000009032   cr  = 44004420
ctr = 0000000000000040   xer = 0000000020000020   trap =  501
0:mon> di $.memset
c00000000004bd60  7c0300d0      neg     r0,r3
c00000000004bd64  5084442e      rlwimi  r4,r4,8,16,23
c00000000004bd68  70000007      andi.   r0,r0,7
c00000000004bd6c  5084801e      rlwimi  r4,r4,16,0,15
c00000000004bd70  7c850040      cmplw   cr1,r5,r0
c00000000004bd74  7884000e      rldimi  r4,r4,32,0
c00000000004bd78  7c101120      mtocrf  1,r0
c00000000004bd7c  7c661b78      mr      r6,r3
c00000000004bd80  418400ac      blt     cr1,c00000000004be2c    # .memset+0xcc/0xfc
c00000000004bd84  41e2002c      beq+    c00000000004bdb0        # .memset+0x50/0xfc
c00000000004bd88  7ca02850      subf    r5,r0,r5
c00000000004bd8c  409f000c      bns     cr7,c00000000004bd98    # .memset+0x38/0xfc
c00000000004bd90  98860000      stb     r4,0(r6)
c00000000004bd94  38c60001      addi    r6,r6,1
c00000000004bd98  409e000c      bne     cr7,c00000000004bda4    # .memset+0x44/0xfc
c00000000004bd9c  b0860000      sth     r4,0(r6)
0:mon>
c00000000004bda0  38c60002      addi    r6,r6,2
c00000000004bda4  409d000c      ble     cr7,c00000000004bdb0    # .memset+0x50/0xfc
c00000000004bda8  90860000      stw     r4,0(r6)
c00000000004bdac  38c60004      addi    r6,r6,4
c00000000004bdb0  78a0d183      rldicl. r0,r5,58,6
c00000000004bdb4  78a506a0      clrldi  r5,r5,58
c00000000004bdb8  7c0903a6      mtctr   r0
c00000000004bdbc  4182002c      beq     c00000000004bde8        # .memset+0x88/0xfc
c00000000004bdc0  f8860000      std     r4,0(r6)

At this point R06 contains d00007fffff00000.

Have attached the xmon log.

Thanks
-Sachin

Patch

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 69511e6..37ab9e2 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2056,7 +2056,8 @@ static unsigned long pvm_determine_end(struct vmap_area **pnext,
 				       struct vmap_area **pprev,
 				       unsigned long align)
 {
-	const unsigned long vmalloc_end = VMALLOC_END & ~(align - 1);
+	const unsigned long vmalloc_start = ALIGN(VMALLOC_START, align);
+	const unsigned long vmalloc_end = vmalloc_start + (512 << 20);
 	unsigned long addr;

 	if (*pnext)
@@ -2102,7 +2103,7 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets,
 				     size_t align, gfp_t gfp_mask)
 {
 	const unsigned long vmalloc_start = ALIGN(VMALLOC_START, align);
-	const unsigned long vmalloc_end = VMALLOC_END & ~(align - 1);
+	const unsigned long vmalloc_end = vmalloc_start + (512 << 20);
 	struct vmap_area **vas, *prev, *next;
 	struct vm_struct **vms;
 	int area, area2, last_area, term_area;