Patchwork [2/2] powerpc/fsl_booke: enable the relocatable for the kdump kernel

login
register
mail settings
Submitter Kevin Hao
Date June 27, 2013, 2 a.m.
Message ID <1372298434-20220-3-git-send-email-haokexin@gmail.com>
Download mbox | patch
Permalink /patch/254927/
State Superseded
Headers show

Comments

Kevin Hao - June 27, 2013, 2 a.m.
For a relocatable kdump kernel, we may create a tlb map which is
beyond the real memory allocated to the kdump kernel. For example,
when the boot kernel reserve 32M memory for the kdump kernel by
using 'crashkernel=32M@64M', we will have to create a 256M tlb
entry in the kdump kernel. So define PPC_PIN_SIZE for fsl ppc32 bit,
this will make sure that we still get the right VMALLOC_START in
this case.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
---
 arch/powerpc/Kconfig                  |  3 +--
 arch/powerpc/include/asm/mmu-book3e.h |  5 +++++
 arch/powerpc/mm/fsl_booke_mmu.c       | 30 +++++++++++++++++++++++++++---
 3 files changed, 33 insertions(+), 5 deletions(-)
Kevin Hao - June 30, 2013, 7:35 a.m.
On Thu, Jun 27, 2013 at 09:19:06PM -0500, Scott Wood wrote:
> On 06/26/2013 09:00:34 PM, Kevin Hao wrote:
> >diff --git a/arch/powerpc/include/asm/mmu-book3e.h
> >b/arch/powerpc/include/asm/mmu-book3e.h
> >index 936db36..bf422db 100644
> >--- a/arch/powerpc/include/asm/mmu-book3e.h
> >+++ b/arch/powerpc/include/asm/mmu-book3e.h
> >@@ -214,6 +214,11 @@
> > #define TLBILX_T_CLASS2			6
> > #define TLBILX_T_CLASS3			7
> >
> >+#ifdef CONFIG_PPC32
> >+/* The max size that one tlb can map in a 32bit kernel. */
> >+#define PPC_PIN_SIZE	(1 << 28)	/* 256M */
> >+#endif
> 
> That comment is not true for all chips.

This is not for the hardware limitation.

> 
> >@@ -177,11 +178,34 @@ unsigned long map_mem_in_cams(unsigned long
> >ram, int max_cam_idx)
> > 	unsigned long virt = PAGE_OFFSET;
> > 	phys_addr_t phys = memstart_addr;
> > 	unsigned long amount_mapped = 0;
> >-
> >+	unsigned long cam_sz;
> >+
> >+#if defined(CONFIG_RELOCATABLE) && defined(CONFIG_PPC32)
> >+	/*
> >+	 * For a relocatable kernel, we would not map from
> >memstart_addr.
> >+	 * We first align to PPC_PIN_SIZE (256M), then map the
> >PAGE_OFFSET
> >+	 * from there.
> >+	 */
> >+	phys &= ~(PPC_PIN_SIZE - 1);
> >+	ram += memstart_addr & (PPC_PIN_SIZE - 1);
> 
> You should not map anything before memstart_addr.  If memstart_addr
> isn't 256M-aligned, you'll have to either use smaller pages or
> consider that region to be "high"mem (assuming Linux supports
> highmem existing below lowmem -- I'm skeptical).

OK, I will try to find a another way to resolve this issue.

> 
> >+	/*
> >+	 * For a kdump kernel, we may use a memory area reserved by the
> >boot
> >+	 * kernel by using a kernel option like this
> >'crashkernel=32M@64M'.
> >+	 * In this case, the ram is 96M. The kernel will try to map the
> >first
> >+	 * 64M in the first tlb entry. The kernel will definitely get
> >stuck,
> >+	 * since the kernel is running above the 64M. So we have to make
> >sure
> >+	 * that the first tlb cover the current kernel running address
> >at least.
> >+	 */
> 
> Maybe we should be running from AS1 when we set this up, to avoid
> problems replacing an entry while it's in use?

I thought about this. The reason that I don't use this method is
we also have to do another relocation if we just want to map
the reserved memory for the kernel.

> 
> Pardon my ignorance about how kdump/kexec works, but I'm a bit
> confused by exactly what the situation is with crashkernel.  How do
> we know that we are the crash kernel, and that we should limit our
> RAM usage to that area?

The kexec tool will parse the command line of the boot kernel and get
the reserved memory info (such as start address, size) and then pass
these informations to the kdump kernel via device tree.

>  I'm wondering if this code is assuming that
> the crashkernel area is from where the kernel starts to the end of
> RAM.

No. We get these information from device tree.

> 
> >+	while (1) {
> >+		cam_sz = calc_cam_sz(ram, virt, phys);
> >+		if (cam_sz + phys > PHYSICAL_START + _end - _stext)
> >+			break;
> >+		ram = 1 << (ilog2(ram) + 1);
> >+	}
> 
> The ram that was passed in is as much as you have.  Don't map more.
> 
> What happens if (e.g.) memstart_addr is 512M, with a size of 512M,
> and the kernel starts at 768M?  Increasing the size will never get
> you a mapping that covers kernstart, because calc_cam_sz will never
> return more than 256M.

Yes, the current code still can't handle this case. We always assume
that the kernel is in the memory region which can be covered by the
first tlb entry.

> 
> When does memory below the rounded-down kernel start get mapped?

This only get mapped via __ioremap when we need to read from it such as
creating the vmcore image.

Thanks,
Kevin

> 
> -Scott
Kevin Hao - July 2, 2013, 3:45 a.m.
On Mon, Jul 01, 2013 at 08:00:06PM -0500, Scott Wood wrote:
> On 06/30/2013 02:35:21 AM, Kevin Hao wrote:
> >On Thu, Jun 27, 2013 at 09:19:06PM -0500, Scott Wood wrote:
> >> On 06/26/2013 09:00:34 PM, Kevin Hao wrote:
> >> >diff --git a/arch/powerpc/include/asm/mmu-book3e.h
> >> >b/arch/powerpc/include/asm/mmu-book3e.h
> >> >index 936db36..bf422db 100644
> >> >--- a/arch/powerpc/include/asm/mmu-book3e.h
> >> >+++ b/arch/powerpc/include/asm/mmu-book3e.h
> >> >@@ -214,6 +214,11 @@
> >> > #define TLBILX_T_CLASS2			6
> >> > #define TLBILX_T_CLASS3			7
> >> >
> >> >+#ifdef CONFIG_PPC32
> >> >+/* The max size that one tlb can map in a 32bit kernel. */
> >> >+#define PPC_PIN_SIZE	(1 << 28)	/* 256M */
> >> >+#endif
> >>
> >> That comment is not true for all chips.
> >
> >This is not for the hardware limitation.
> 
> It's not for a general software limitation, either.  We can use 1G
> mappings for user hugetlb (if the default mmap address is moved out
> of the way) or for the kernel lowmem mapping (if the address is
> moved to 0x80000000 instead of 0xc0000000).
> 
> It's also possible (although unlikely at this point) that someone
> could make a 32-bit booke chip that cannot handle 256M mappings.

Then I have to agree with you. :-)
Wish I can find a way to drop it.

> 
> >> Pardon my ignorance about how kdump/kexec works, but I'm a bit
> >> confused by exactly what the situation is with crashkernel.  How do
> >> we know that we are the crash kernel, and that we should limit our
> >> RAM usage to that area?
> >
> >The kexec tool will parse the command line of the boot kernel and get
> >the reserved memory info (such as start address, size) and then pass
> >these informations to the kdump kernel via device tree.
> 
> Does this alter memstart_addr or just mark the region as reserved?

This will alter memstart_addr since we set memstart_addr based on the
memory node in device tree.

> 
> >> >+	while (1) {
> >> >+		cam_sz = calc_cam_sz(ram, virt, phys);
> >> >+		if (cam_sz + phys > PHYSICAL_START + _end - _stext)
> >> >+			break;
> >> >+		ram = 1 << (ilog2(ram) + 1);
> >> >+	}
> >>
> >> The ram that was passed in is as much as you have.  Don't map more.
> >>
> >> What happens if (e.g.) memstart_addr is 512M, with a size of 512M,
> >> and the kernel starts at 768M?  Increasing the size will never get
> >> you a mapping that covers kernstart, because calc_cam_sz will never
> >> return more than 256M.
> >
> >Yes, the current code still can't handle this case. We always assume
> >that the kernel is in the memory region which can be covered by the
> >first tlb entry.
> 
> Assuming memstart_addr isn't affected by the crashkernel reservation
> (if it is, could you point out where?),

No. The memstart_addr does be affected. For example, for a boot kernel
with the command line with "crashkernel=32M@64M", the device tree passed
the kdump kernel will have a memory node like this:
    memory {
            reg = <0x0 0x4000000 0x0 0x2000000>;
            device_type = "memory";
    }

Then the memstart_addr will be set to 0x4000000 in the kdump kernel.

> you'd then have a problem
> with the crashkernel area going anywhere above 256M.

The kdump kernel can be loaded above 256M memory. It only can not be loaded
above memstart_addr + 256M memory.

Thanks,
Kevin

> 
> -Scott
Kevin Hao - July 3, 2013, 3:29 a.m.
On Tue, Jul 02, 2013 at 05:41:05PM -0500, Scott Wood wrote:
> On 07/01/2013 10:45:14 PM, Kevin Hao wrote:
> >On Mon, Jul 01, 2013 at 08:00:06PM -0500, Scott Wood wrote:
> >> On 06/30/2013 02:35:21 AM, Kevin Hao wrote:
> >> >On Thu, Jun 27, 2013 at 09:19:06PM -0500, Scott Wood wrote:
> >> >> On 06/26/2013 09:00:34 PM, Kevin Hao wrote:
> >> >> >diff --git a/arch/powerpc/include/asm/mmu-book3e.h
> >> >> >b/arch/powerpc/include/asm/mmu-book3e.h
> >> >> >index 936db36..bf422db 100644
> >> >> >--- a/arch/powerpc/include/asm/mmu-book3e.h
> >> >> >+++ b/arch/powerpc/include/asm/mmu-book3e.h
> >> >> >@@ -214,6 +214,11 @@
> >> >> > #define TLBILX_T_CLASS2			6
> >> >> > #define TLBILX_T_CLASS3			7
> >> >> >
> >> >> >+#ifdef CONFIG_PPC32
> >> >> >+/* The max size that one tlb can map in a 32bit kernel. */
> >> >> >+#define PPC_PIN_SIZE	(1 << 28)	/* 256M */
> >> >> >+#endif
> >> >>
> >> >> That comment is not true for all chips.
> >> >
> >> >This is not for the hardware limitation.
> >>
> >> It's not for a general software limitation, either.  We can use 1G
> >> mappings for user hugetlb (if the default mmap address is moved out
> >> of the way) or for the kernel lowmem mapping (if the address is
> >> moved to 0x80000000 instead of 0xc0000000).
> >>
> >> It's also possible (although unlikely at this point) that someone
> >> could make a 32-bit booke chip that cannot handle 256M mappings.
> >
> >Then I have to agree with you. :-)
> >Wish I can find a way to drop it.
> 
> At least give it a comment that describes how it's actually used.
> 
> >> Assuming memstart_addr isn't affected by the crashkernel reservation
> >> (if it is, could you point out where?),
> >
> >No. The memstart_addr does be affected. For example, for a boot kernel
> >with the command line with "crashkernel=32M@64M", the device tree
> >passed
> >the kdump kernel will have a memory node like this:
> >    memory {
> >            reg = <0x0 0x4000000 0x0 0x2000000>;
> >            device_type = "memory";
> >    }
> >
> >Then the memstart_addr will be set to 0x4000000 in the kdump kernel.
> 
> OK, so then how does the crash kernel know which regions to dump?
> Is that indicated somewhere else in the device tree?

No. We get the dump of the memory via /proc/vmcore in the kdump kernel.

Thanks,
Kevin
> 
> -Scott

Patch

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 9eb97ac..ca237f5 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -382,8 +382,7 @@  config KEXEC
 config CRASH_DUMP
 	bool "Build a kdump crash kernel"
 	depends on PPC64 || 6xx || FSL_BOOKE || (44x && !SMP)
-	select RELOCATABLE if PPC64 || 44x
-	select DYNAMIC_MEMSTART if FSL_BOOKE
+	select RELOCATABLE if PPC64 || 44x || FSL_BOOKE
 	help
 	  Build a kernel suitable for use as a kdump capture kernel.
 	  The same kernel binary can be used as production kernel and dump
diff --git a/arch/powerpc/include/asm/mmu-book3e.h b/arch/powerpc/include/asm/mmu-book3e.h
index 936db36..bf422db 100644
--- a/arch/powerpc/include/asm/mmu-book3e.h
+++ b/arch/powerpc/include/asm/mmu-book3e.h
@@ -214,6 +214,11 @@ 
 #define TLBILX_T_CLASS2			6
 #define TLBILX_T_CLASS3			7
 
+#ifdef CONFIG_PPC32
+/* The max size that one tlb can map in a 32bit kernel. */
+#define PPC_PIN_SIZE	(1 << 28)	/* 256M */
+#endif
+
 #ifndef __ASSEMBLY__
 #include <asm/bug.h>
 
diff --git a/arch/powerpc/mm/fsl_booke_mmu.c b/arch/powerpc/mm/fsl_booke_mmu.c
index 07ba45b..59549b3 100644
--- a/arch/powerpc/mm/fsl_booke_mmu.c
+++ b/arch/powerpc/mm/fsl_booke_mmu.c
@@ -52,6 +52,7 @@ 
 #include <asm/smp.h>
 #include <asm/machdep.h>
 #include <asm/setup.h>
+#include <asm/sections.h>
 
 #include "mmu_decl.h"
 
@@ -177,11 +178,34 @@  unsigned long map_mem_in_cams(unsigned long ram, int max_cam_idx)
 	unsigned long virt = PAGE_OFFSET;
 	phys_addr_t phys = memstart_addr;
 	unsigned long amount_mapped = 0;
-
+	unsigned long cam_sz;
+
+#if defined(CONFIG_RELOCATABLE) && defined(CONFIG_PPC32)
+	/*
+	 * For a relocatable kernel, we would not map from memstart_addr.
+	 * We first align to PPC_PIN_SIZE (256M), then map the PAGE_OFFSET
+	 * from there.
+	 */
+	phys &= ~(PPC_PIN_SIZE - 1);
+	ram += memstart_addr & (PPC_PIN_SIZE - 1);
+
+	/*
+	 * For a kdump kernel, we may use a memory area reserved by the boot
+	 * kernel by using a kernel option like this 'crashkernel=32M@64M'.
+	 * In this case, the ram is 96M. The kernel will try to map the first
+	 * 64M in the first tlb entry. The kernel will definitely get stuck,
+	 * since the kernel is running above the 64M. So we have to make sure
+	 * that the first tlb cover the current kernel running address at least.
+	 */
+	while (1) {
+		cam_sz = calc_cam_sz(ram, virt, phys);
+		if (cam_sz + phys > PHYSICAL_START + _end - _stext)
+			break;
+		ram = 1 << (ilog2(ram) + 1);
+	}
+#endif
 	/* Calculate CAM values */
 	for (i = 0; ram && i < max_cam_idx; i++) {
-		unsigned long cam_sz;
-
 		cam_sz = calc_cam_sz(ram, virt, phys);
 		settlbcam(i, virt, phys, cam_sz, PAGE_KERNEL_X, 0);