[1/2] arm: Fix cache inconsistency when using fixmap

Submitted by Jon Medhurst (Tixy) on March 17, 2017, 4:10 p.m.

Details

Message ID 1489767002.3063.10.camel@linaro.org
State New
Headers show

Commit Message

Jon Medhurst (Tixy) March 17, 2017, 4:10 p.m.
On Fri, 2017-03-17 at 11:53 +0000, Russell King - ARM Linux wrote:
> On Thu, Mar 16, 2017 at 01:29:57PM +0000, Jon Medhurst wrote:
> > To cope with the variety in ARM architectures and configurations, the
> > pagetable attributes for kernel memory are generated at runtime to match
> > the system the kernel finds itself on. This calculated value is stored
> > in pgprot_kernel.
> > 
> > However, when early fixmap support was added for arm (commit
> > a5f4c561b3b1) the attributes used for mappings were hard coded because
> > pgprot_kernel is not set up early enough. Unfortunately, the values used
> > didn't include the 'shareable' attribute which means that for later
> > non-early fixmap use, when multiple CPUs are running, any cache entries
> > allocated for fixmap memory aren't kept consistent between CPUs. This
> > can result in different CPUs seeing different memory contents.
> 
> This also likely causes unpredictable behaviour (aliased attributes).
> 
> > This issue was discovered on a dual cluster system by failures with
> > kprobes, which uses fixmap to modify the kernel image if
> > CONFIG_DEBUG_RODATA is enabled. It will also affect kgdb and jump_labels
> > which also make use of the same code to modify the kernel, and any other
> > uses of fixmap after secondary CPUs are brought online.
> > 
> > To fix this issue, and to help avoid other potential problems where
> > pagetable attributes are incorrect, we change the fixmap code to use the
> > same generated value in pgprot_kernel that the rest of the kernel uses,
> > and only fall back to a hard coded value if this isn't set - which will
> > be early on in boot before other CPUs are brought online.
> 
> I'm not happy with this - if we need to create early fixmaps, then
> we need to know the correct attributes to use, so let's move the
> attribute initialisation earlier.  This solution feels too much like
> hacking around the problem.

I must admit to having similar thoughts and plead guilty to ignoring
them. Not knowing how early fixmaps are being used I let myself think
'it must have been originally done this way for a reason'.

It looks to me that build_mem_type_table doesn't have much in the way of
dependencies so can probably be just called earlier. So, is the correct
solution to

a) call build_mem_type_table from setup_arch before early_fixmap_init
b) move call to build_mem_type_table into early_fixmap_init
c) call build_mem_type_table from early_fixmap_init as well as
paging_init and have a test in build_mem_type_table so it only exectutes
once
d) something else

a) seems simplest, b) seems wrong, c) seems over the top

Anyway, below is an alternative to $subject patch that works for my
kprobes test cases. Note, the removal of FIXMAP_PAGE_{NORMAL,RO} means
the generic fixmap code will define these from PAGE_KERNEL{,_RO}.

Not knowing how early fixmap is used, I hope Stefan and/or Ard have
testcases they could run. 

I'm also wondering if the existing definition of FIXMAP_PAGE_IO is
correc
t and should not also be based on some platform specific value
calculated
 in build_mem_type_table?


 	unsigned int cr = get_cr();
@@ -1616,7 +1616,6 @@ void __init paging_init(const struct machine_desc
*mdesc)
 {
 	void *zero_page;
 
-	build_mem_type_table();
 	prepare_page_table();
 	map_lowmem();
 	memblock_set_current_limit(arm_lowmem_limit);
-- 
2.11.0

Comments

Ard Biesheuvel March 17, 2017, 4:18 p.m.
On 17 March 2017 at 16:10, Jon Medhurst (Tixy) <tixy@linaro.org> wrote:
> On Fri, 2017-03-17 at 11:53 +0000, Russell King - ARM Linux wrote:
>> On Thu, Mar 16, 2017 at 01:29:57PM +0000, Jon Medhurst wrote:
>> > To cope with the variety in ARM architectures and configurations, the
>> > pagetable attributes for kernel memory are generated at runtime to match
>> > the system the kernel finds itself on. This calculated value is stored
>> > in pgprot_kernel.
>> >
>> > However, when early fixmap support was added for arm (commit
>> > a5f4c561b3b1) the attributes used for mappings were hard coded because
>> > pgprot_kernel is not set up early enough. Unfortunately, the values used
>> > didn't include the 'shareable' attribute which means that for later
>> > non-early fixmap use, when multiple CPUs are running, any cache entries
>> > allocated for fixmap memory aren't kept consistent between CPUs. This
>> > can result in different CPUs seeing different memory contents.
>>
>> This also likely causes unpredictable behaviour (aliased attributes).
>>
>> > This issue was discovered on a dual cluster system by failures with
>> > kprobes, which uses fixmap to modify the kernel image if
>> > CONFIG_DEBUG_RODATA is enabled. It will also affect kgdb and jump_labels
>> > which also make use of the same code to modify the kernel, and any other
>> > uses of fixmap after secondary CPUs are brought online.
>> >
>> > To fix this issue, and to help avoid other potential problems where
>> > pagetable attributes are incorrect, we change the fixmap code to use the
>> > same generated value in pgprot_kernel that the rest of the kernel uses,
>> > and only fall back to a hard coded value if this isn't set - which will
>> > be early on in boot before other CPUs are brought online.
>>
>> I'm not happy with this - if we need to create early fixmaps, then
>> we need to know the correct attributes to use, so let's move the
>> attribute initialisation earlier.  This solution feels too much like
>> hacking around the problem.
>
> I must admit to having similar thoughts and plead guilty to ignoring
> them. Not knowing how early fixmaps are being used I let myself think
> 'it must have been originally done this way for a reason'.
>
> It looks to me that build_mem_type_table doesn't have much in the way of
> dependencies so can probably be just called earlier. So, is the correct
> solution to
>
> a) call build_mem_type_table from setup_arch before early_fixmap_init
> b) move call to build_mem_type_table into early_fixmap_init
> c) call build_mem_type_table from early_fixmap_init as well as
> paging_init and have a test in build_mem_type_table so it only exectutes
> once
> d) something else
>
> a) seems simplest, b) seems wrong, c) seems over the top
>
> Anyway, below is an alternative to $subject patch that works for my
> kprobes test cases. Note, the removal of FIXMAP_PAGE_{NORMAL,RO} means
> the generic fixmap code will define these from PAGE_KERNEL{,_RO}.
>
> Not knowing how early fixmap is used, I hope Stefan and/or Ard have
> testcases they could run.
>

The early UEFI boot code maps firmware tables using early_ioremap(),
which is layered on top of early_fixmap(). This code executes after
early_ioremap_init(), so moving build_mem_type_table() before that
sounds like the obvious fix to me. The other use case is earlycon,
which uses early_fixmap() directly, but afaict, the same applies there
as well.

In terms of code changes, there is a d) option where the call sequence
build_mem_type/early_fixmap_init/early_ioremap_init is grouped into a
new function in mm/mmu.c, which you can call from setup_arch(). That
would be the cleanest approach imo.

> I'm also wondering if the existing definition of FIXMAP_PAGE_IO is
> correc
> t and should not also be based on some platform specific value
> calculated
>  in build_mem_type_table?
>
>
> diff --git a/arch/arm/include/asm/fixmap.h
> b/arch/arm/include/asm/fixmap.h
> index 5c17d2dec777..30871fb269f0 100644
> --- a/arch/arm/include/asm/fixmap.h
> +++ b/arch/arm/include/asm/fixmap.h
> @@ -41,9 +41,6 @@ static const enum fixed_addresses
> __end_of_fixed_addresses =
>
>  #define FIXMAP_PAGE_COMMON     (L_PTE_YOUNG | L_PTE_PRESENT |
> L_PTE_XN | L_PTE_DIRTY)
>
> -#define FIXMAP_PAGE_NORMAL     (FIXMAP_PAGE_COMMON |
> L_PTE_MT_WRITEBACK)
> -#define FIXMAP_PAGE_RO         (FIXMAP_PAGE_NORMAL |
> L_PTE_RDONLY)
> -
>  /* Used by set_fixmap_(io|nocache), both meant for mapping a device */
>  #define FIXMAP_PAGE_IO         (FIXMAP_PAGE_COMMON |
> L_PTE_MT_DEV_SHARED | L_PTE_SHARED)
>  #define FIXMAP_PAGE_NOCACHE    FIXMAP_PAGE_IO
> diff --git a/arch/arm/include/asm/pgtable.h
> b/arch/arm/include/asm/pgtable.h
> index 1c462381c225..6a98856a8fa9 100644
> --- a/arch/arm/include/asm/pgtable.h
> +++ b/arch/arm/include/asm/pgtable.h
> @@ -98,6 +98,7 @@ extern pgprot_t               pgprot_s2_device;
>  #define PAGE_READONLY_EXEC     _MOD_PROT(pgprot_user, L_PTE_USER |
> L_PTE_RDONLY)
>  #define PAGE_KERNEL            _MOD_PROT(pgprot_kernel, L_PTE_XN)
>  #define PAGE_KERNEL_EXEC       pgprot_kernel
> +#define PAGE_KERNEL_RO         _MOD_PROT(pgprot_kernel, L_PTE_XN
> | L_PTE_RDONLY)
>  #define PAGE_HYP               _MOD_PROT(pgprot_kernel, L_PTE_HYP |
> L_PTE_XN)
>  #define PAGE_HYP_EXEC          _MOD_PROT(pgprot_kernel, L_PTE_HYP
> | L_PTE_RDONLY)
>  #define PAGE_HYP_RO            _MOD_PROT(pgprot_kernel, L_PTE_HYP |
> L_PTE_RDONLY | L_PTE_XN)
> diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
> index f4e54503afa9..fc4782fa5071 100644
> --- a/arch/arm/kernel/setup.c
> +++ b/arch/arm/kernel/setup.c
> @@ -79,6 +79,7 @@ __setup("fpe=", fpe_setup);
>  #endif
>
>  extern void init_default_cache_policy(unsigned long);
> +extern void build_mem_type_table(void);
>  extern void paging_init(const struct machine_desc *desc);
>  extern void early_paging_init(const struct machine_desc *);
>  extern void adjust_lowmem_bounds(void);
> @@ -1082,6 +1083,8 @@ void __init setup_arch(char **cmdline_p)
>         strlcpy(cmd_line, boot_command_line, COMMAND_LINE_SIZE);
>         *cmdline_p = cmd_line;
>
> +       build_mem_type_table();
> +
>         early_fixmap_init();
>         early_ioremap_init();
>
> diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
> index 4e016d7f37b3..c8e1de3ceb02 100644
> --- a/arch/arm/mm/mmu.c
> +++ b/arch/arm/mm/mmu.c
> @@ -425,7 +425,7 @@ void __set_fixmap(enum fixed_addresses idx,
> phys_addr_t phys, pgprot_t prot)
>  /*
>   * Adjust the PMD section entries according to the CPU in use.
>   */
> -static void __init build_mem_type_table(void)
> +void __init build_mem_type_table(void)
>  {
>         struct cachepolicy *cp;
>         unsigned int cr = get_cr();
> @@ -1616,7 +1616,6 @@ void __init paging_init(const struct machine_desc
> *mdesc)
>  {
>         void *zero_page;
>
> -       build_mem_type_table();
>         prepare_page_table();
>         map_lowmem();
>         memblock_set_current_limit(arm_lowmem_limit);
> --
> 2.11.0
>
>
Jon Medhurst (Tixy) March 20, 2017, 7:09 p.m.
On Fri, 2017-03-17 at 16:18 +0000, Ard Biesheuvel wrote:
> On 17 March 2017 at 16:10, Jon Medhurst (Tixy) <tixy@linaro.org> wrote:
> > 
[...]
> > It looks to me that build_mem_type_table doesn't have much in the way of
> > dependencies so can probably be just called earlier. So, is the correct
> > solution to
> > 
> > a) call build_mem_type_table from setup_arch before early_fixmap_init
> > b) move call to build_mem_type_table into early_fixmap_init
> > c) call build_mem_type_table from early_fixmap_init as well as
> > paging_init and have a test in build_mem_type_table so it only exectutes
> > once
> > d) something else
> > 
> > a) seems simplest, b) seems wrong, c) seems over the top
> > 
> > Anyway, below is an alternative to $subject patch that works for my
> > kprobes test cases. Note, the removal of FIXMAP_PAGE_{NORMAL,RO} means
> > the generic fixmap code will define these from PAGE_KERNEL{,_RO}.
> > 
> > Not knowing how early fixmap is used, I hope Stefan and/or Ard have
> > testcases they could run.
> > 
> 
> The early UEFI boot code maps firmware tables using early_ioremap(),
> which is layered on top of early_fixmap(). This code executes after
> early_ioremap_init(), so moving build_mem_type_table() before that
> sounds like the obvious fix to me. The other use case is earlycon,
> which uses early_fixmap() directly, but afaict, the same applies there
> as well.

So is that 'code should work, no need to test'? Guess it should be safe
to skip if those use cases use FIXMAP_PAGE_NOCACHE and FIXMAP_PAGE_IO
and we don't change those defines.

> In terms of code changes, there is a d) option where the call sequence
> build_mem_type/early_fixmap_init/early_ioremap_init is grouped into a
> new function in mm/mmu.c, which you can call from setup_arch(). That
> would be the cleanest approach imo.

Any suggestion on a name for that function?

> 
> > I'm also wondering if the existing definition of FIXMAP_PAGE_IO is
> > correc
> > t and should not also be based on some platform specific value
> > calculated
> >  in build_mem_type_table?

Answering myself. FIXMAP_PAGE_IO is defined with the same values as 
mem_types[MT_DEVICE].prot_pte which doesn't get modified at runtime, so
should be correct... Unless the device being mapped needs Non-shareable
Device memory (MT_DEVICE_NONSHARED) in which case we're onto dodgy
ground. I can't see how we can detect that in code or help someone using
the API to avoid that. I certainly don't intend trying to redesign the
API and implementation of early_ioremap to fix these limitations.
Ard Biesheuvel March 20, 2017, 7:26 p.m.
On 20 March 2017 at 19:09, Jon Medhurst (Tixy) <tixy@linaro.org> wrote:
> On Fri, 2017-03-17 at 16:18 +0000, Ard Biesheuvel wrote:
>> On 17 March 2017 at 16:10, Jon Medhurst (Tixy) <tixy@linaro.org> wrote:
>> >
> [...]
>> > It looks to me that build_mem_type_table doesn't have much in the way of
>> > dependencies so can probably be just called earlier. So, is the correct
>> > solution to
>> >
>> > a) call build_mem_type_table from setup_arch before early_fixmap_init
>> > b) move call to build_mem_type_table into early_fixmap_init
>> > c) call build_mem_type_table from early_fixmap_init as well as
>> > paging_init and have a test in build_mem_type_table so it only exectutes
>> > once
>> > d) something else
>> >
>> > a) seems simplest, b) seems wrong, c) seems over the top
>> >
>> > Anyway, below is an alternative to $subject patch that works for my
>> > kprobes test cases. Note, the removal of FIXMAP_PAGE_{NORMAL,RO} means
>> > the generic fixmap code will define these from PAGE_KERNEL{,_RO}.
>> >
>> > Not knowing how early fixmap is used, I hope Stefan and/or Ard have
>> > testcases they could run.
>> >
>>
>> The early UEFI boot code maps firmware tables using early_ioremap(),
>> which is layered on top of early_fixmap(). This code executes after
>> early_ioremap_init(), so moving build_mem_type_table() before that
>> sounds like the obvious fix to me. The other use case is earlycon,
>> which uses early_fixmap() directly, but afaict, the same applies there
>> as well.
>
> So is that 'code should work, no need to test'? Guess it should be safe
> to skip if those use cases use FIXMAP_PAGE_NOCACHE and FIXMAP_PAGE_IO
> and we don't change those defines.
>

In fact, the UEFI code uses early_memremap() not early_ioremap(), and
it does use the memory defines, not the device ones.

So yes, I should test it, but I don't see any reason for huge concern,
given that the UEFI code maps and unmaps those tables when we're still
running UP

>> In terms of code changes, there is a d) option where the call sequence
>> build_mem_type/early_fixmap_init/early_ioremap_init is grouped into a
>> new function in mm/mmu.c, which you can call from setup_arch(). That
>> would be the cleanest approach imo.
>
> Any suggestion on a name for that function?
>

early_mm_init?

>>
>> > I'm also wondering if the existing definition of FIXMAP_PAGE_IO is
>> > correc
>> > t and should not also be based on some platform specific value
>> > calculated
>> >  in build_mem_type_table?
>
> Answering myself. FIXMAP_PAGE_IO is defined with the same values as
> mem_types[MT_DEVICE].prot_pte which doesn't get modified at runtime, so
> should be correct... Unless the device being mapped needs Non-shareable
> Device memory (MT_DEVICE_NONSHARED) in which case we're onto dodgy
> ground. I can't see how we can detect that in code or help someone using
> the API to avoid that. I certainly don't intend trying to redesign the
> API and implementation of early_ioremap to fix these limitations.
>
> --
> Tixy
>

Patch hide | download patch | download mbox

diff --git a/arch/arm/include/asm/fixmap.h
b/arch/arm/include/asm/fixmap.h
index 5c17d2dec777..30871fb269f0 100644
--- a/arch/arm/include/asm/fixmap.h
+++ b/arch/arm/include/asm/fixmap.h
@@ -41,9 +41,6 @@  static const enum fixed_addresses
__end_of_fixed_addresses =
 
 #define FIXMAP_PAGE_COMMON	(L_PTE_YOUNG | L_PTE_PRESENT |
L_PTE_XN | L_PTE_DIRTY)
 
-#define FIXMAP_PAGE_NORMAL	(FIXMAP_PAGE_COMMON |
L_PTE_MT_WRITEBACK)
-#define FIXMAP_PAGE_RO		(FIXMAP_PAGE_NORMAL |
L_PTE_RDONLY)
-
 /* Used by set_fixmap_(io|nocache), both meant for mapping a device */
 #define FIXMAP_PAGE_IO		(FIXMAP_PAGE_COMMON |
L_PTE_MT_DEV_SHARED | L_PTE_SHARED)
 #define FIXMAP_PAGE_NOCACHE	FIXMAP_PAGE_IO
diff --git a/arch/arm/include/asm/pgtable.h
b/arch/arm/include/asm/pgtable.h
index 1c462381c225..6a98856a8fa9 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -98,6 +98,7 @@  extern pgprot_t		pgprot_s2_device;
 #define PAGE_READONLY_EXEC	_MOD_PROT(pgprot_user, L_PTE_USER |
L_PTE_RDONLY)
 #define PAGE_KERNEL		_MOD_PROT(pgprot_kernel, L_PTE_XN)
 #define PAGE_KERNEL_EXEC	pgprot_kernel
+#define PAGE_KERNEL_RO		_MOD_PROT(pgprot_kernel, L_PTE_XN
| L_PTE_RDONLY)
 #define PAGE_HYP		_MOD_PROT(pgprot_kernel, L_PTE_HYP |
L_PTE_XN)
 #define PAGE_HYP_EXEC		_MOD_PROT(pgprot_kernel, L_PTE_HYP
| L_PTE_RDONLY)
 #define PAGE_HYP_RO		_MOD_PROT(pgprot_kernel, L_PTE_HYP |
L_PTE_RDONLY | L_PTE_XN)
diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index f4e54503afa9..fc4782fa5071 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -79,6 +79,7 @@  __setup("fpe=", fpe_setup);
 #endif
 
 extern void init_default_cache_policy(unsigned long);
+extern void build_mem_type_table(void);
 extern void paging_init(const struct machine_desc *desc);
 extern void early_paging_init(const struct machine_desc *);
 extern void adjust_lowmem_bounds(void);
@@ -1082,6 +1083,8 @@  void __init setup_arch(char **cmdline_p)
 	strlcpy(cmd_line, boot_command_line, COMMAND_LINE_SIZE);
 	*cmdline_p = cmd_line;
 
+	build_mem_type_table();
+
 	early_fixmap_init();
 	early_ioremap_init();
 
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 4e016d7f37b3..c8e1de3ceb02 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -425,7 +425,7 @@  void __set_fixmap(enum fixed_addresses idx,
phys_addr_t phys, pgprot_t prot)
 /*
  * Adjust the PMD section entries according to the CPU in use.
  */
-static void __init build_mem_type_table(void)
+void __init build_mem_type_table(void)
 {
 	struct cachepolicy *cp;