
[v2,0/8] sparc64: MM/IRQ patch queue.

Message ID 20140928.003525.1355110317298235630.davem@davemloft.net
State RFC
Delegated to: David Miller

Commit Message

David Miller Sept. 28, 2014, 4:35 a.m. UTC
From: Bob Picco <bpicco@meloft.net>
Date: Sat, 27 Sep 2014 16:46:51 -0400

>> Anyways, let me know how these work for you.  I'll be running tests
>> on my T4-2 all weekend.
> I'll commence with T5-2 which is local to me.
> 
> M7-4 came up with an older kernel of mine but with hardware faults.

Thanks in advance.

I added the following to my tree to deal with the MAX_BANKS issue,
I hope it helps the case you talked about, and it's essentially
free :-)

Comments

David Miller Sept. 29, 2014, 8:15 p.m. UTC | #1
From: David Miller <davem@davemloft.net>
Date: Sun, 28 Sep 2014 00:35:25 -0400 (EDT)

> From: Bob Picco <bpicco@meloft.net>
> Date: Sat, 27 Sep 2014 16:46:51 -0400
> 
>>> Anyways, let me know how these work for you.  I'll be running tests
>>> on my T4-2 all weekend.
>> I'll commence with T5-2 which is local to me.
>> 
>> M7-4 came up with an older kernel of mine but with hardware faults.
> 
> Thanks in advance.
> 
> I added the following to my tree to deal with the MAX_BANKS issue,
> I hope it helps the case you talked about, and it's essentially
> free :-)

Bob, I know you're probably busy, but any updates wrt. testing
this series?

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bob Picco Sept. 29, 2014, 9:03 p.m. UTC | #2
Hi,
David Miller wrote:	[Mon Sep 29 2014, 04:15:43PM EDT]
> From: David Miller <davem@davemloft.net>
> Date: Sun, 28 Sep 2014 00:35:25 -0400 (EDT)
> 
> > From: Bob Picco <bpicco@meloft.net>
> > Date: Sat, 27 Sep 2014 16:46:51 -0400
> > 
> >>> Anyways, let me know how these work for you.  I'll be running tests
> >>> on my T4-2 all weekend.
> >> I'll commence with T5-2 which is local to me.
> >> 
> >> M7-4 came up with an older kernel of mine but with hardware faults.
> > 
> > Thanks in advance.
> > 
> > I added the following to my tree to deal with the MAX_BANKS issue,
> > I hope it helps the case you talked about, and it's essentially
> > free :-)
> 
> Bob, I know you're probably busy, but any updates wrt. testing
> this series?
Apologies, I've been context switching heavily.

My local T4-2 and T5-2 seem solid for THP "always" and large kernel parallel
makes. I've seen no issues.

There is an issue with spec-jbb. I'm hoping to have dedicated time for
this issue during the week.

I've not had time to examine the DEBUG_PAGEALLOC issue.

I'll also test M7, faults permitting.

I'll commit to resolving these issues before the merge window closes.

Obviously it is your final decision that matters.

> 
you're welcome and thanx
> Thanks.
David Miller Sept. 29, 2014, 9:33 p.m. UTC | #3
From: Bob Picco <bob.picco@oracle.com>
Date: Mon, 29 Sep 2014 17:03:22 -0400

> My local T4-2 and T5-2 seem solid for THP "always" and large kernel parallel
> makes. I've seen no issues.

Ok, how about the T5-8?  Does it at least boot properly with
DEBUG_PAGEALLOC disabled?

Bob Picco Sept. 29, 2014, 10:35 p.m. UTC | #4
David Miller wrote:	[Mon Sep 29 2014, 05:33:17PM EDT]
> From: Bob Picco <bob.picco@oracle.com>
> Date: Mon, 29 Sep 2014 17:03:22 -0400
> 
> > My local T4-2 and T5-2 seem solid for THP "always" and large kernel parallel
> > makes. I've seen no issues.
> 
> Ok, how about the T5-8?  Does it at least boot properly with
> DEBUG_PAGEALLOC disabled?
> 
Sorry family issue on top of work!

It would require time to get on this machine but not required.

This is debug output from the T5-8 from a while ago. The lines from
"PERCPU: static=" through "PERCPU: base=" come from percpu-debug.patch:
PERCPU: static=0x5d80 reserved=0x2000 dyn=0x6280 unit=0x10000
PERCPU: atom=0x400000 alloc=0x400000 __ai=0x2000 nr=8
PERCPU: group=0 ptr=0xffff803f5d000000 nid=0 nr_units=128 cpu=0
PERCPU: group=1 ptr=0xffff883f7f000000 nid=1 nr_units=128 cpu=128
PERCPU: group=2 ptr=0xffff903f7f000000 nid=2 nr_units=128 cpu=256
PERCPU: group=3 ptr=0xffff983f7f000000 nid=3 nr_units=128 cpu=384
PERCPU: group=4 ptr=0xffffa03f7f000000 nid=4 nr_units=128 cpu=512
PERCPU: group=5 ptr=0xffffa83f7f000000 nid=5 nr_units=128 cpu=640
PERCPU: group=6 ptr=0xffffb03f7f000000 nid=6 nr_units=128 cpu=768
PERCPU: group=7 ptr=0xffffb83f5ec00000 nid=7 nr_units=128 cpu=896
PERCPU: base=0xffff803f5d000000
PERCPU: max_distance=0x380001c10000 too large for vmalloc space 0xff00000000
To get the physical address, just subtract 0xffff800000000000UL from "ptr".

This is from an SMI/Oracle hardware engineer and is also contained within
the T5 hardware specification:
"But on T5 this has changed. Memory on T5 is spread out based on PA[45:43]. So,
that means same situation as above will see physical addresses as :"
.

This is current output from my T5-2:
[    0.000000] MM: PAGE_OFFSET is 0xffff800000000000 (max_phys_bits == 47)
[    0.000000] MM: VMALLOC [0x0000000100000000 --> 0x0000400000000000]
[    0.000000] MM: VMEMMAP [0x0000400000000000 --> 0x0000800000000000]
[    0.000000] Kernel: Using 2 locked TLB entries for main kernel image.
...
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x30400000-0x1fef781fff]
[    0.000000]   node   0: [mem 0x1fef788000-0x1fef789fff]
[    0.000000]   node   1: [mem 0x80000000000-0x81ffff0ffff]
[    0.000000]   node   1: [mem 0x81ffff20000-0x81ffff43fff]

You can see the MCU increases linearly, by node, within bit positions
PA[45:43]. The VMALLOC range is based on
VMALLOC_END = (~PAGE_OFFSET + 1) >> 1, so it would seem to fail for T5-8.
It does indeed fail for M7-4, but other issues manifest there too.

This is M7-4:
MM: PAGE_OFFSET is 0xfffe000000000000 (max_phys_bits == 49)
MM: VMALLOC [0x0000000100000000 --> 0x0001000000000000]
MM: VMEMMAP [0x0001000000000000 --> 0x0002000000000000]
Kernel: Using 3 locked TLB entries for main kernel image.
Remapping the kernel... done.
..
Early memory node ranges
  node   0: [mem 0x50400000-0x3fbf6bdfff]
  node   0: [mem 0x3fbf6c4000-0x3fbf6c5fff]
  node   1: [mem 0x400000000000-0x403bffffffff]
  node   2: [mem 0x800000000000-0x803fffffffff]
  node   3: [mem 0xc00000000000-0xc03fffd5dfff]
  node   3: [mem 0xc03fffd90000-0xc03fffdaffff]
  node   3: [mem 0xc03fffdc0000-0xc03fffdc3fff]
Booting Linux...
CPU CAPS: [flush,stbar,swap,muldiv,v9,mul32,div32,v8plus]
CPU CAPS: [popc,vis,vis2,ASIBlkInit,fmaf,vis3,hpc,ima]
CPU CAPS: [pause,cbcond,aes,des,camellia,md5,sha1,sha256]
CPU CAPS: [sha512,mpmul,montmul,montsqr,crc32c]
PERCPU: max_distance=0xc000c2410000 too large for vmalloc space 0xffff00000000
PERCPU: auto allocator failed (-22), falling back to page size
PERCPU: 6 8K pages/cpu @0000000100000000 s23552 r8192 d17408

Let me look at this in the morning when NOT distracted by family issue.

thanx,

bob

David Miller Sept. 30, 2014, 1:52 a.m. UTC | #5
From: Bob Picco <bpicco@meloft.net>
Date: Mon, 29 Sep 2014 18:35:22 -0400

> PERCPU: max_distance=0x380001c10000 too large for vmalloc space 0xff00000000
> . So just subtract 0xffff800000000000UL from "ptr".

Bob you're not running my current patches.

The vmalloc space is enlarged by the current patch set and should
no longer run into this issue, and it's specifically what I was
interested in seeing work or not.

Thanks.
David Miller Sept. 30, 2014, 1:56 a.m. UTC | #6
From: David Miller <davem@davemloft.net>
Date: Mon, 29 Sep 2014 21:52:54 -0400 (EDT)

> From: Bob Picco <bpicco@meloft.net>
> Date: Mon, 29 Sep 2014 18:35:22 -0400
> 
>> PERCPU: max_distance=0x380001c10000 too large for vmalloc space 0xff00000000
>> . So just subtract 0xffff800000000000UL from "ptr".
> 
> Bob you're not running my current patches.
> 
> The vmalloc space is enlarged by the current patch set and should
> no longer run into this issue, and it's specifically what I was
> interested in seeing work or not.

Ignore me, I see you were just trying to explain how memory is laid
out on these T5 machines and why (which I understood already :-)
David Miller Sept. 30, 2014, 1:57 a.m. UTC | #7
From: Bob Picco <bpicco@meloft.net>
Date: Mon, 29 Sep 2014 18:35:22 -0400

> This is M7-4:
> MM: PAGE_OFFSET is 0xfffe000000000000 (max_phys_bits == 49)
> MM: VMALLOC [0x0000000100000000 --> 0x0001000000000000]
> MM: VMEMMAP [0x0001000000000000 --> 0x0002000000000000]
> Kernel: Using 3 locked TLB entries for main kernel image.
> Remapping the kernel... done.
> ..
> Early memory node ranges
>   node   0: [mem 0x50400000-0x3fbf6bdfff]
>   node   0: [mem 0x3fbf6c4000-0x3fbf6c5fff]
>   node   1: [mem 0x400000000000-0x403bffffffff]
>   node   2: [mem 0x800000000000-0x803fffffffff]
>   node   3: [mem 0xc00000000000-0xc03fffd5dfff]
>   node   3: [mem 0xc03fffd90000-0xc03fffdaffff]
>   node   3: [mem 0xc03fffdc0000-0xc03fffdc3fff]
> Booting Linux...
> CPU CAPS: [flush,stbar,swap,muldiv,v9,mul32,div32,v8plus]
> CPU CAPS: [popc,vis,vis2,ASIBlkInit,fmaf,vis3,hpc,ima]
> CPU CAPS: [pause,cbcond,aes,des,camellia,md5,sha1,sha256]
> CPU CAPS: [sha512,mpmul,montmul,montsqr,crc32c]
> PERCPU: max_distance=0xc000c2410000 too large for vmalloc space 0xffff00000000
> PERCPU: auto allocator failed (-22), falling back to page size
> PERCPU: 6 8K pages/cpu @0000000100000000 s23552 r8192 d17408

Ok, it looks like we need to expand the vmalloc area a bit or two
more, which is easy to do.  I'll cook up a patch for that.

> Let me look at this in the morning when NOT distracted by family issue.

Take your time, your help is appreciated.
Bob Picco Sept. 30, 2014, 10:36 a.m. UTC | #8
David Miller wrote:	[Mon Sep 29 2014, 09:56:10PM EDT]
> From: David Miller <davem@davemloft.net>
> Date: Mon, 29 Sep 2014 21:52:54 -0400 (EDT)
> 
> > From: Bob Picco <bpicco@meloft.net>
> > Date: Mon, 29 Sep 2014 18:35:22 -0400
> > 
> >> PERCPU: max_distance=0x380001c10000 too large for vmalloc space 0xff00000000
> >> . So just subtract 0xffff800000000000UL from "ptr".
> > 
> > Bob you're not running my current patches.
> > 
> > The vmalloc space is enlarged by the current patch set and should
> > no longer run into this issue, and it's specifically what I was
> > interested in seeing work or not.
> 
> Ignore me, I see you were just trying to explain how memory is laid
> out on these T5 machines and why (which I understood already :-)
Ah no problem, I feared being called away and the potential remains today.

Patch

====================
From ed49895e9d85d33cc5d8e774a58b24b416fbd846 Mon Sep 17 00:00:00 2001
From: "David S. Miller" <davem@davemloft.net>
Date: Sat, 27 Sep 2014 21:30:57 -0700
Subject: [PATCH] sparc64: Kill unnecessary tables and increase MAX_BANKS.

swapper_low_pmd_dir and swapper_pud_dir are actually completely
useless and unnecessary.

We just need swapper_pg_dir[].  Naturally the other page table chunks
will be allocated on an as-needed basis.  Since the kernel actually
accesses these tables in the PAGE_OFFSET view, there is not even a TLB
locality advantage of placing them in the kernel image.

Use the hard coded vmlinux.lds.S slot for swapper_pg_dir which is
naturally page aligned.

Increase MAX_BANKS to 1024 in order to handle heavily fragmented
virtual guests.

Even with this MAX_BANKS increase, the kernel is 20K+ smaller.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 arch/sparc/include/asm/pgtable_64.h |  1 -
 arch/sparc/kernel/vmlinux.lds.S     |  5 +++--
 arch/sparc/mm/init_64.c             | 25 ++-----------------------
 3 files changed, 5 insertions(+), 26 deletions(-)

diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
index c093922..bfeb626 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -927,7 +927,6 @@  static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
 #endif
 
 extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
-extern pmd_t swapper_low_pmd_dir[PTRS_PER_PMD];
 
 void paging_init(void);
 unsigned long find_ecache_flush_span(unsigned long size);
diff --git a/arch/sparc/kernel/vmlinux.lds.S b/arch/sparc/kernel/vmlinux.lds.S
index 0bacceb..0924305 100644
--- a/arch/sparc/kernel/vmlinux.lds.S
+++ b/arch/sparc/kernel/vmlinux.lds.S
@@ -35,8 +35,9 @@  jiffies = jiffies_64;
 
 SECTIONS
 {
-	/* swapper_low_pmd_dir is sparc64 only */
-	swapper_low_pmd_dir = 0x0000000000402000;
+#ifdef CONFIG_SPARC64
+	swapper_pg_dir = 0x0000000000402000;
+#endif
 	. = INITIAL_ADDRESS;
 	.text TEXTSTART :
 	{
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index ede1840..b9a14a0 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -87,7 +87,7 @@  extern struct tsb swapper_tsb[KERNEL_TSB_NENTRIES];
 
 static unsigned long cpu_pgsz_mask;
 
-#define MAX_BANKS	32
+#define MAX_BANKS	1024
 
 static struct linux_prom64_registers pavail[MAX_BANKS];
 static int pavail_ents;
@@ -1947,12 +1947,6 @@  static void __init sun4v_linear_pte_xor_finalize(void)
 
 static unsigned long last_valid_pfn;
 
-/* These must be page aligned in order to not trigger the
- * alignment tests of pgd_bad() and pud_bad().
- */
-pgd_t swapper_pg_dir[PTRS_PER_PGD] __attribute__ ((aligned (PAGE_SIZE)));
-static pud_t swapper_pud_dir[PTRS_PER_PUD] __attribute__ ((aligned (PAGE_SIZE)));
-
 static void sun4u_pgprot_init(void);
 static void sun4v_pgprot_init(void);
 
@@ -2006,8 +2000,6 @@  void __init paging_init(void)
 {
 	unsigned long end_pfn, shift, phys_base;
 	unsigned long real_end, i;
-	pud_t *pud;
-	pmd_t *pmd;
 	int node;
 
 	setup_page_offset();
@@ -2103,20 +2095,7 @@  void __init paging_init(void)
 	 */
 	init_mm.pgd += ((shift) / (sizeof(pgd_t)));
 	
-	memset(swapper_low_pmd_dir, 0, sizeof(swapper_low_pmd_dir));
-
-	/* The kernel page tables we publish into what the rest of the
-	 * world sees must be adjusted so that they see the PAGE_OFFSET
-	 * address of these in-kerenel data structures.  However right
-	 * here we must access them from the kernel image side, because
-	 * the trap tables haven't been taken over and therefore we cannot
-	 * take TLB misses in the PAGE_OFFSET linear mappings yet.
-	 */
-	pud = swapper_pud_dir + (shift / sizeof(pud_t));
-	pgd_set(&swapper_pg_dir[0], pud);
-
-	pmd = swapper_low_pmd_dir + (shift / sizeof(pmd_t));
-	pud_set(&swapper_pud_dir[0], pmd);
+	memset(swapper_pg_dir, 0, sizeof(swapper_pg_dir));
 
 	inherit_prom_mappings();