[v10,4/5] x86/kasan: support KASAN_VMALLOC

Message ID 20191029042059.28541-5-dja@axtens.net (mailing list archive)
State Superseded
Headers show
Series kasan: support backing vmalloc space with real shadow memory

Checks

Context Check Description
snowpatch_ozlabs/apply_patch warning Failed to apply on branch powerpc/merge (a437a1c6e54c045f19f7d894b2a749bc5155f19a)
snowpatch_ozlabs/apply_patch warning Failed to apply on branch powerpc/next (612ee81b9461475b5a5612c2e8d71559dd3c7920)
snowpatch_ozlabs/apply_patch warning Failed to apply on branch linus/master (8005803a2ca0af49f36f6e9329b5ecda3df27347)
snowpatch_ozlabs/apply_patch warning Failed to apply on branch powerpc/fixes (a8a30219ba78b1abb92091102b632f8e9bbdbf03)
snowpatch_ozlabs/apply_patch success Successfully applied on branch linux-next (60c1769a45f4b6beddcc48843739d7d41b88dc1c)
snowpatch_ozlabs/checkpatch warning total: 0 errors, 0 warnings, 2 checks, 82 lines checked

Commit Message

Daniel Axtens Oct. 29, 2019, 4:20 a.m. UTC
In the case where KASAN directly allocates memory to back vmalloc
space, don't map the early shadow page over it.

We prepopulate pgds/p4ds for the range that would otherwise be empty.
This is required to get it synced to hardware on boot, allowing the
lower levels of the page tables to be filled dynamically.

Acked-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>

---
v5: fix some checkpatch CHECK warnings. Some remain around lines
    ending with '(': I have not changed these because they are
    consistent with the rest of the file and it's not easy to see
    how to fix them without creating an overlong line or lots of
    temporary variables.

v2: move from faulting in shadow pgds to prepopulating
---
 arch/x86/Kconfig            |  1 +
 arch/x86/mm/kasan_init_64.c | 60 +++++++++++++++++++++++++++++++++++++
 2 files changed, 61 insertions(+)

Comments

Andrey Ryabinin Oct. 29, 2019, 5:21 p.m. UTC | #1
On 10/29/19 7:20 AM, Daniel Axtens wrote:
> In the case where KASAN directly allocates memory to back vmalloc
> space, don't map the early shadow page over it.
> 
> We prepopulate pgds/p4ds for the range that would otherwise be empty.
> This is required to get it synced to hardware on boot, allowing the
> lower levels of the page tables to be filled dynamically.
> 
> Acked-by: Dmitry Vyukov <dvyukov@google.com>
> Signed-off-by: Daniel Axtens <dja@axtens.net>
> 
> ---

> +static void __init kasan_shallow_populate_pgds(void *start, void *end)
> +{
> +	unsigned long addr, next;
> +	pgd_t *pgd;
> +	void *p;
> +	int nid = early_pfn_to_nid((unsigned long)start);

This doesn't make sense: start is not even a pfn. For the linear mapping
we try to identify the nid so that the shadow lands on the same node as
the memory it covers. But in this case we don't have the memory or the
corresponding shadow (yet), we only install pgds/p4ds.
I guess we could just use NUMA_NO_NODE.
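
Something like this (untested), i.e. the function from the patch below with
the bogus nid lookup dropped:

static void __init kasan_shallow_populate_pgds(void *start, void *end)
{
	unsigned long addr, next;
	pgd_t *pgd;
	void *p;
	/*
	 * No shadow memory is allocated here yet, so there is no node to
	 * match; NUMA_NO_NODE lets the early allocator put the page-table
	 * pages on any node.
	 */
	int nid = NUMA_NO_NODE;

	addr = (unsigned long)start;
	pgd = pgd_offset_k(addr);
	do {
		next = pgd_addr_end(addr, (unsigned long)end);

		if (pgd_none(*pgd)) {
			p = early_alloc(PAGE_SIZE, nid, true);
			pgd_populate(&init_mm, pgd, p);
		}

		/*
		 * we need to populate p4ds to be synced when running in
		 * four level mode - see sync_global_pgds_l4()
		 */
		kasan_shallow_populate_p4ds(pgd, addr, next, nid);
	} while (pgd++, addr = next, addr != (unsigned long)end);
}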

The rest looks ok, so with that fixed:

Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Daniel Axtens Oct. 30, 2019, 1:50 p.m. UTC | #2
Andrey Ryabinin <aryabinin@virtuozzo.com> writes:

> On 10/29/19 7:20 AM, Daniel Axtens wrote:
>> In the case where KASAN directly allocates memory to back vmalloc
>> space, don't map the early shadow page over it.
>> 
>> We prepopulate pgds/p4ds for the range that would otherwise be empty.
>> This is required to get it synced to hardware on boot, allowing the
>> lower levels of the page tables to be filled dynamically.
>> 
>> Acked-by: Dmitry Vyukov <dvyukov@google.com>
>> Signed-off-by: Daniel Axtens <dja@axtens.net>
>> 
>> ---
>
>> +static void __init kasan_shallow_populate_pgds(void *start, void *end)
>> +{
>> +	unsigned long addr, next;
>> +	pgd_t *pgd;
>> +	void *p;
>> +	int nid = early_pfn_to_nid((unsigned long)start);
>
> This doesn't make sense: start is not even a pfn. For the linear mapping
> we try to identify the nid so that the shadow lands on the same node as
> the memory it covers. But in this case we don't have the memory or the
> corresponding shadow (yet), we only install pgds/p4ds.
> I guess we could just use NUMA_NO_NODE.

Ah wow, that's quite the clanger on my part.

There are a couple of other invocations of early_pfn_to_nid in that file
that use an address directly, but at least they reference actual memory.
I'll send a separate patch to fix those up.

> The rest looks ok, so with that fixed:
>
> Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>

Thanks heaps! I've fixed up the nit you identified in the first patch,
and I agree that the last patch probably isn't needed. I'll respin the
series shortly.

Regards,
Daniel
Andrey Ryabinin Oct. 30, 2019, 2:12 p.m. UTC | #3
On 10/30/19 4:50 PM, Daniel Axtens wrote:
> Andrey Ryabinin <aryabinin@virtuozzo.com> writes:
> 
>> On 10/29/19 7:20 AM, Daniel Axtens wrote:
>>> In the case where KASAN directly allocates memory to back vmalloc
>>> space, don't map the early shadow page over it.
>>>
>>> We prepopulate pgds/p4ds for the range that would otherwise be empty.
>>> This is required to get it synced to hardware on boot, allowing the
>>> lower levels of the page tables to be filled dynamically.
>>>
>>> Acked-by: Dmitry Vyukov <dvyukov@google.com>
>>> Signed-off-by: Daniel Axtens <dja@axtens.net>
>>>
>>> ---
>>
>>> +static void __init kasan_shallow_populate_pgds(void *start, void *end)
>>> +{
>>> +	unsigned long addr, next;
>>> +	pgd_t *pgd;
>>> +	void *p;
>>> +	int nid = early_pfn_to_nid((unsigned long)start);
>>
>> This doesn't make sense: start is not even a pfn. For the linear mapping
>> we try to identify the nid so that the shadow lands on the same node as
>> the memory it covers. But in this case we don't have the memory or the
>> corresponding shadow (yet), we only install pgds/p4ds.
>> I guess we could just use NUMA_NO_NODE.
> 
> Ah wow, that's quite the clanger on my part.
> 
> There are a couple of other invocations of early_pfn_to_nid in that file
> that use an address directly, but at least they reference actual memory.
> I'll send a separate patch to fix those up.

I see only one incorrect one, in kasan_init(): early_pfn_to_nid(__pa(_stext)).
It should be wrapped with PFN_DOWN().
The other usages, in map_range(), seem to be correct: range->start and range->end are pfns.
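
That is, assuming the surrounding call in kasan_init() stays as it is, just:

	early_pfn_to_nid(PFN_DOWN(__pa(_stext)))

instead of

	early_pfn_to_nid(__pa(_stext))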


> 
>> The rest looks ok, so with that fixed:
>>
>> Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
> 
> Thanks heaps! I've fixed up the nit you identified in the first patch,
> and I agree that the last patch probably isn't needed. I'll respin the
> series shortly.
> 

Hold on a sec, just spotted another thing to fix.

> @@ -352,9 +397,24 @@ void __init kasan_init(void)
>  	shadow_cpu_entry_end = (void *)round_up(
>  			(unsigned long)shadow_cpu_entry_end, PAGE_SIZE);
>  
> +	/*
> +	 * If we're in full vmalloc mode, don't back vmalloc space with early
> +	 * shadow pages. Instead, prepopulate pgds/p4ds so they are synced to
> +	 * the global table and we can populate the lower levels on demand.
> +	 */
> +#ifdef CONFIG_KASAN_VMALLOC
> +	kasan_shallow_populate_pgds(
> +		kasan_mem_to_shadow((void *)PAGE_OFFSET + MAXMEM),

This should be VMALLOC_START; there is no point in allocating pgds for the hole between the linear
mapping and vmalloc, it is just a waste of memory. It makes sense to map the early shadow for that hole,
because if code dereferences an address in that hole we will see a page fault on that address instead of
a fault on the shadow.

So something like this might work:

	kasan_populate_early_shadow(
		kasan_mem_to_shadow((void *)PAGE_OFFSET + MAXMEM),
		kasan_mem_to_shadow((void *)VMALLOC_START));

	if (IS_ENABLED(CONFIG_KASAN_VMALLOC))
		kasan_shallow_populate_pgds(
			kasan_mem_to_shadow((void *)VMALLOC_START),
			kasan_mem_to_shadow((void *)VMALLOC_END));
	else
		kasan_populate_early_shadow(
			kasan_mem_to_shadow((void *)VMALLOC_START),
			kasan_mem_to_shadow((void *)VMALLOC_END));

	kasan_populate_early_shadow(
		kasan_mem_to_shadow((void *)VMALLOC_END + 1),
		shadow_cpu_entry_begin);
Daniel Axtens Oct. 30, 2019, 2:21 p.m. UTC | #4
Andrey Ryabinin <aryabinin@virtuozzo.com> writes:

> On 10/30/19 4:50 PM, Daniel Axtens wrote:
>> Andrey Ryabinin <aryabinin@virtuozzo.com> writes:
>> 
>>> On 10/29/19 7:20 AM, Daniel Axtens wrote:
>>>> In the case where KASAN directly allocates memory to back vmalloc
>>>> space, don't map the early shadow page over it.
>>>>
>>>> We prepopulate pgds/p4ds for the range that would otherwise be empty.
>>>> This is required to get it synced to hardware on boot, allowing the
>>>> lower levels of the page tables to be filled dynamically.
>>>>
>>>> Acked-by: Dmitry Vyukov <dvyukov@google.com>
>>>> Signed-off-by: Daniel Axtens <dja@axtens.net>
>>>>
>>>> ---
>>>
>>>> +static void __init kasan_shallow_populate_pgds(void *start, void *end)
>>>> +{
>>>> +	unsigned long addr, next;
>>>> +	pgd_t *pgd;
>>>> +	void *p;
>>>> +	int nid = early_pfn_to_nid((unsigned long)start);
>>>
>>> This doesn't make sense: start is not even a pfn. For the linear mapping
>>> we try to identify the nid so that the shadow lands on the same node as
>>> the memory it covers. But in this case we don't have the memory or the
>>> corresponding shadow (yet), we only install pgds/p4ds.
>>> I guess we could just use NUMA_NO_NODE.
>> 
>> Ah wow, that's quite the clanger on my part.
>> 
>> There are a couple of other invocations of early_pfn_to_nid in that file
>> that use an address directly, but at least they reference actual memory.
>> I'll send a separate patch to fix those up.
>
> I see only one incorrect one, in kasan_init(): early_pfn_to_nid(__pa(_stext)).
> It should be wrapped with PFN_DOWN().
> The other usages, in map_range(), seem to be correct: range->start and range->end are pfns.
>

Oh, right, I didn't realise map_range was already using pfns.

>
>> 
>>> The rest looks ok, so with that fixed:
>>>
>>> Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
>> 
>> Thanks heaps! I've fixed up the nit you identified in the first patch,
>> and I agree that the last patch probably isn't needed. I'll respin the
>> series shortly.
>> 
>
> Hold on a sec, just spotted another thing to fix.
>
>> @@ -352,9 +397,24 @@ void __init kasan_init(void)
>>  	shadow_cpu_entry_end = (void *)round_up(
>>  			(unsigned long)shadow_cpu_entry_end, PAGE_SIZE);
>>  
>> +	/*
>> +	 * If we're in full vmalloc mode, don't back vmalloc space with early
>> +	 * shadow pages. Instead, prepopulate pgds/p4ds so they are synced to
>> +	 * the global table and we can populate the lower levels on demand.
>> +	 */
>> +#ifdef CONFIG_KASAN_VMALLOC
>> +	kasan_shallow_populate_pgds(
>> +		kasan_mem_to_shadow((void *)PAGE_OFFSET + MAXMEM),
>
> This should be VMALLOC_START; there is no point in allocating pgds for the hole between the linear
> mapping and vmalloc, it is just a waste of memory. It makes sense to map the early shadow for that hole,
> because if code dereferences an address in that hole we will see a page fault on that address instead of
> a fault on the shadow.
>
> So something like this might work:
>
> 	kasan_populate_early_shadow(
> 		kasan_mem_to_shadow((void *)PAGE_OFFSET + MAXMEM),
> 		kasan_mem_to_shadow((void *)VMALLOC_START));
>
> 	if (IS_ENABLED(CONFIG_KASAN_VMALLOC))
> 		kasan_shallow_populate_pgds(
> 			kasan_mem_to_shadow((void *)VMALLOC_START),
> 			kasan_mem_to_shadow((void *)VMALLOC_END));
> 	else
> 		kasan_populate_early_shadow(
> 			kasan_mem_to_shadow((void *)VMALLOC_START),
> 			kasan_mem_to_shadow((void *)VMALLOC_END));
>
> 	kasan_populate_early_shadow(
> 		kasan_mem_to_shadow((void *)VMALLOC_END + 1),
> 		shadow_cpu_entry_begin);

Sounds good. It's getting late for me so I'll change and test that and
send a respin tomorrow my time.

Regards,
Daniel

Patch

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 45699e458057..d65b0fcc9bc0 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -135,6 +135,7 @@  config X86
 	select HAVE_ARCH_JUMP_LABEL
 	select HAVE_ARCH_JUMP_LABEL_RELATIVE
 	select HAVE_ARCH_KASAN			if X86_64
+	select HAVE_ARCH_KASAN_VMALLOC		if X86_64
 	select HAVE_ARCH_KGDB
 	select HAVE_ARCH_MMAP_RND_BITS		if MMU
 	select HAVE_ARCH_MMAP_RND_COMPAT_BITS	if MMU && COMPAT
diff --git a/arch/x86/mm/kasan_init_64.c b/arch/x86/mm/kasan_init_64.c
index 296da58f3013..8f00f462709e 100644
--- a/arch/x86/mm/kasan_init_64.c
+++ b/arch/x86/mm/kasan_init_64.c
@@ -245,6 +245,51 @@  static void __init kasan_map_early_shadow(pgd_t *pgd)
 	} while (pgd++, addr = next, addr != end);
 }
 
+static void __init kasan_shallow_populate_p4ds(pgd_t *pgd,
+					       unsigned long addr,
+					       unsigned long end,
+					       int nid)
+{
+	p4d_t *p4d;
+	unsigned long next;
+	void *p;
+
+	p4d = p4d_offset(pgd, addr);
+	do {
+		next = p4d_addr_end(addr, end);
+
+		if (p4d_none(*p4d)) {
+			p = early_alloc(PAGE_SIZE, nid, true);
+			p4d_populate(&init_mm, p4d, p);
+		}
+	} while (p4d++, addr = next, addr != end);
+}
+
+static void __init kasan_shallow_populate_pgds(void *start, void *end)
+{
+	unsigned long addr, next;
+	pgd_t *pgd;
+	void *p;
+	int nid = early_pfn_to_nid((unsigned long)start);
+
+	addr = (unsigned long)start;
+	pgd = pgd_offset_k(addr);
+	do {
+		next = pgd_addr_end(addr, (unsigned long)end);
+
+		if (pgd_none(*pgd)) {
+			p = early_alloc(PAGE_SIZE, nid, true);
+			pgd_populate(&init_mm, pgd, p);
+		}
+
+		/*
+		 * we need to populate p4ds to be synced when running in
+		 * four level mode - see sync_global_pgds_l4()
+		 */
+		kasan_shallow_populate_p4ds(pgd, addr, next, nid);
+	} while (pgd++, addr = next, addr != (unsigned long)end);
+}
+
 #ifdef CONFIG_KASAN_INLINE
 static int kasan_die_handler(struct notifier_block *self,
 			     unsigned long val,
@@ -352,9 +397,24 @@  void __init kasan_init(void)
 	shadow_cpu_entry_end = (void *)round_up(
 			(unsigned long)shadow_cpu_entry_end, PAGE_SIZE);
 
+	/*
+	 * If we're in full vmalloc mode, don't back vmalloc space with early
+	 * shadow pages. Instead, prepopulate pgds/p4ds so they are synced to
+	 * the global table and we can populate the lower levels on demand.
+	 */
+#ifdef CONFIG_KASAN_VMALLOC
+	kasan_shallow_populate_pgds(
+		kasan_mem_to_shadow((void *)PAGE_OFFSET + MAXMEM),
+		kasan_mem_to_shadow((void *)VMALLOC_END));
+
+	kasan_populate_early_shadow(
+		kasan_mem_to_shadow((void *)VMALLOC_END + 1),
+		shadow_cpu_entry_begin);
+#else
 	kasan_populate_early_shadow(
 		kasan_mem_to_shadow((void *)PAGE_OFFSET + MAXMEM),
 		shadow_cpu_entry_begin);
+#endif
 
 	kasan_populate_shadow((unsigned long)shadow_cpu_entry_begin,
 			      (unsigned long)shadow_cpu_entry_end, 0);