
[RFC] PPC: /dev/mem: All RAM is cacheable, not just the kernel's.

Message ID 20101018223256.GA30946@udp111988uds.am.freescale.net (mailing list archive)
State RFC

Commit Message

Scott Wood Oct. 18, 2010, 10:32 p.m. UTC
If mem= is used on the kernel command line to create reserved regions
for userspace to map using /dev/mem, let it be mapped cacheable as long
as it is within the memory region described in the device tree.

Signed-off-by: Scott Wood <scottwood@freescale.com>
---
This isn't just a performance issue, but it could also be a correctness
issue, if the reserved portion of RAM is mapped cacheable by e.g. a KVM
guest.  This patch does not address cases where such regions could show up
as something other than a standard memory node -- such as shared regions
in an AMP configuration.  Ideally there would be some means for a platform
to register cacheable regions, without having to completely replace the
entire phys_mem_access_prot function.

This patch assumes that there is no region between memstart and memend that
must be non-cacheable.  This is already the case with the "for now"
implementation of page_is_ram on 32-bit, but will this be a problem on
64-bit?

 arch/powerpc/kernel/pci-common.c |    5 ++++-
 arch/powerpc/kernel/prom.c       |    3 +++
 arch/powerpc/mm/mem.c            |    3 ++-
 arch/powerpc/mm/mmu_decl.h       |    1 +
 4 files changed, 10 insertions(+), 2 deletions(-)

Comments

Kumar Gala Oct. 18, 2010, 10:41 p.m. UTC | #1
On Oct 18, 2010, at 5:32 PM, Scott Wood wrote:

> If mem= is used on the kernel command line to create reserved regions
> for userspace to map using /dev/mem, let it be mapped cacheable as long
> as it is within the memory region described in the device tree.
> 
> Signed-off-by: Scott Wood <scottwood@freescale.com>
> ---

Just a nit, but the subject should be powerpc:, not PPC:

> This isn't just a performance issue, but it could also be a correctness
> issue, if the reserved portion of RAM is mapped cacheable by e.g. a KVM
> guest.  This patch does not address cases where such regions could show up
> as something other than a standard memory node -- such as shared regions
> in an AMP configuration.  Ideally there would be some means for a platform
> to register cacheable regions, without having to completely replace the
> entire phys_mem_access_prot function.
> 
> This patch assumes that there is no region between memstart and memend that
> must be non-cacheable.  This is already the case with the "for now"
> implementation of page_is_ram on 32-bit, but will this be a problem on
> 64-bit?
> 
> arch/powerpc/kernel/pci-common.c |    5 ++++-
> arch/powerpc/kernel/prom.c       |    3 +++
> arch/powerpc/mm/mem.c            |    3 ++-
> arch/powerpc/mm/mmu_decl.h       |    1 +
> 4 files changed, 10 insertions(+), 2 deletions(-)

For some time I've been meaning for us to look at the address range tracking that x86 does, so we can make sure we aren't mapping regions with different WIMG settings.  However, there's never enough time for this, so I think this is reasonable for now.  Hopefully Ben will comment on the 64-bit side of the world.

- k
Yang Li Oct. 19, 2010, 9:37 a.m. UTC | #2
On Tue, Oct 19, 2010 at 6:32 AM, Scott Wood <scottwood@freescale.com> wrote:
> If mem= is used on the kernel command line to create reserved regions
> for userspace to map using /dev/mem, let it be mapped cacheable as long
> as it is within the memory region described in the device tree.
>
> Signed-off-by: Scott Wood <scottwood@freescale.com>
> ---
> This isn't just a performance issue, but it could also be a correctness
> issue, if the reserved portion of RAM is mapped cacheable by e.g. a KVM
> guest.  This patch does not address cases where such regions could show up
> as something other than a standard memory node -- such as shared regions
> in an AMP configuration.  Ideally there would be some means for a platform
> to register cacheable regions, without having to completely replace the
> entire phys_mem_access_prot function.

Agreed.  Such a requirement might be a special case in the
server/desktop world, but it is much more common in the embedded
market when migrating to multi-core, where shared memory is commonly
used for inter-core communication.  It would be best to have a common
way to achieve this.

Not only cacheability but also the coherency property might need to be tunable.

- Leo

Patch

diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 10a44e6..4298a56 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -39,6 +39,8 @@ 
 #include <asm/firmware.h>
 #include <asm/eeh.h>
 
+#include <mm/mmu_decl.h>
+
 static DEFINE_SPINLOCK(hose_spinlock);
 LIST_HEAD(hose_list);
 
@@ -398,7 +400,8 @@  pgprot_t pci_phys_mem_access_prot(struct file *file,
 	resource_size_t offset = ((resource_size_t)pfn) << PAGE_SHIFT;
 	int i;
 
-	if (page_is_ram(pfn))
+	if (pfn >= (memstart_addr >> PAGE_SHIFT) &&
+	    pfn <= (memend_addr >> PAGE_SHIFT))
 		return prot;
 
 	prot = pgprot_noncached(prot);
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index fed9bf6..f8f5966 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -62,6 +62,8 @@ 
 #define DBG(fmt...)
 #endif
 
+phys_addr_t memend_addr;
+
 #ifdef CONFIG_PPC64
 int __initdata iommu_is_off;
 int __initdata iommu_force_on;
@@ -504,6 +506,7 @@  void __init early_init_dt_add_memory_arch(u64 base, u64 size)
 	memblock_add(base, size);
 
 	memstart_addr = min((u64)memstart_addr, base);
+	memend_addr = max((u64)memend_addr, base + size - 1);
 }
 
 u64 __init early_init_dt_alloc_memory_arch(u64 size, u64 align)
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 1a84a8d..f947a39 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -104,7 +104,8 @@  pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
 	if (ppc_md.phys_mem_access_prot)
 		return ppc_md.phys_mem_access_prot(file, pfn, size, vma_prot);
 
-	if (!page_is_ram(pfn))
+	if (pfn < (memstart_addr >> PAGE_SHIFT) ||
+	    pfn > (memend_addr >> PAGE_SHIFT))
 		vma_prot = pgprot_noncached(vma_prot);
 
 	return vma_prot;
diff --git a/arch/powerpc/mm/mmu_decl.h b/arch/powerpc/mm/mmu_decl.h
index dd0a258..05f1ac6 100644
--- a/arch/powerpc/mm/mmu_decl.h
+++ b/arch/powerpc/mm/mmu_decl.h
@@ -119,6 +119,7 @@  extern phys_addr_t __initial_memory_limit_addr;
 extern phys_addr_t total_memory;
 extern phys_addr_t total_lowmem;
 extern phys_addr_t memstart_addr;
+extern phys_addr_t memend_addr;
 extern phys_addr_t lowmem_end_addr;
 
 #ifdef CONFIG_WII