[v2] kmemleak: skip scanning holes in the .bss section
diff mbox series

Message ID 20190313145717.46369-1-cai@lca.pw
State Superseded
Headers show
Series
  • [v2] kmemleak: skip scanning holes in the .bss section
Related show

Checks

Context Check Description
snowpatch_ozlabs/checkpatch warning total: 0 errors, 0 warnings, 1 checks, 74 lines checked
snowpatch_ozlabs/build-pmac32 success build succeeded & removed 0 sparse warning(s)
snowpatch_ozlabs/build-ppc64e success build succeeded & removed 0 sparse warning(s)
snowpatch_ozlabs/build-ppc64be success build succeeded & removed 0 sparse warning(s)
snowpatch_ozlabs/build-ppc64le success build succeeded & removed 0 sparse warning(s)
snowpatch_ozlabs/apply_patch success Successfully applied on branch next (9580b71b5a7863c24a9bd18bcd2ad759b86b1eff)

Commit Message

Qian Cai March 13, 2019, 2:57 p.m. UTC
Commit 2d4f567103ff ("KVM: PPC: Introduce kvm_tmp framework") adds
kvm_tmp[] to the .bss section and then frees the unused part of it
back to the page allocator.

kernel_init
  kvm_guest_init
    kvm_free_tmp
      free_reserved_area
        free_unref_page
          free_unref_page_prepare

With DEBUG_PAGEALLOC=y, those pages are unmapped from the kernel. As a
result, the kmemleak scan triggers the panic below when it scans the
.bss section and touches the unmapped pages.

Since this happens well before the first kmemleak_scan(), simply record
the hole locklessly to keep the implementation simple, and skip those
pages when scanning the .bss section. Later, once those pages are handed
out by the page allocator, kmemleak can track them again as regular
allocations. Overall, this is such a special case that there is no need
to teach scan_large_block() to skip arbitrary holes generically for now.

BUG: Unable to handle kernel data access at 0xc000000001610000
Faulting instruction address: 0xc0000000003cc178
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=256 DEBUG_PAGEALLOC NUMA pSeries
CPU: 3 PID: 130 Comm: kmemleak Kdump: loaded Not tainted 5.0.0+ #9
REGS: c0000004b05bf940 TRAP: 0300   Not tainted  (5.0.0+)
NIP [c0000000003cc178] scan_block+0xa8/0x190
LR [c0000000003cc170] scan_block+0xa0/0x190
Call Trace:
[c0000004b05bfbd0] [c0000000003cc170] scan_block+0xa0/0x190 (unreliable)
[c0000004b05bfc30] [c0000000003cc2c0] scan_large_block+0x60/0xa0
[c0000004b05bfc70] [c0000000003ccc64] kmemleak_scan+0x254/0x960
[c0000004b05bfd40] [c0000000003cdd50] kmemleak_scan_thread+0xec/0x12c
[c0000004b05bfdb0] [c000000000104388] kthread+0x1b8/0x1c0
[c0000004b05bfe20] [c00000000000b364] ret_from_kernel_thread+0x5c/0x78
Instruction dump:
7fa3eb78 4844667d 60000000 60000000 60000000 60000000 3bff0008 7fbcf840
409d00b8 4bfffeed 2fa30000 409e00ac <e87f0000> e93e0128 7fa91840
419dffdc

Signed-off-by: Qian Cai <cai@lca.pw>
---

v2: make the function __init per Andrew.

 arch/powerpc/kernel/kvm.c |  3 +++
 include/linux/kmemleak.h  |  4 ++++
 mm/kmemleak.c             | 25 ++++++++++++++++++++++++-
 3 files changed, 31 insertions(+), 1 deletion(-)

Comments

Catalin Marinas March 19, 2019, 11:57 a.m. UTC | #1
Hi Qian,

On Wed, Mar 13, 2019 at 10:57:17AM -0400, Qian Cai wrote:
> @@ -1531,7 +1547,14 @@ static void kmemleak_scan(void)
>  
>  	/* data/bss scanning */
>  	scan_large_block(_sdata, _edata);
> -	scan_large_block(__bss_start, __bss_stop);
> +
> +	if (bss_hole_start) {
> +		scan_large_block(__bss_start, bss_hole_start);
> +		scan_large_block(bss_hole_stop, __bss_stop);
> +	} else {
> +		scan_large_block(__bss_start, __bss_stop);
> +	}
> +
>  	scan_large_block(__start_ro_after_init, __end_ro_after_init);

I'm not a fan of this approach but I couldn't come up with anything
better. I was hoping we could check for PageReserved() in scan_block()
but on arm64 it ends up not scanning the .bss at all.

Until another user appears, I'm ok with this patch.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Michael Ellerman March 20, 2019, 1:15 p.m. UTC | #2
Catalin Marinas <catalin.marinas@arm.com> writes:
> Hi Qian,
>
> On Wed, Mar 13, 2019 at 10:57:17AM -0400, Qian Cai wrote:
>> @@ -1531,7 +1547,14 @@ static void kmemleak_scan(void)
>>  
>>  	/* data/bss scanning */
>>  	scan_large_block(_sdata, _edata);
>> -	scan_large_block(__bss_start, __bss_stop);
>> +
>> +	if (bss_hole_start) {
>> +		scan_large_block(__bss_start, bss_hole_start);
>> +		scan_large_block(bss_hole_stop, __bss_stop);
>> +	} else {
>> +		scan_large_block(__bss_start, __bss_stop);
>> +	}
>> +
>>  	scan_large_block(__start_ro_after_init, __end_ro_after_init);
>
> I'm not a fan of this approach but I couldn't come up with anything
> better. I was hoping we could check for PageReserved() in scan_block()
> but on arm64 it ends up not scanning the .bss at all.
>
> Until another user appears, I'm ok with this patch.
>
> Acked-by: Catalin Marinas <catalin.marinas@arm.com>

I actually would like to rework this kvm_tmp thing to not be in bss at
all. It's a bit of a hack and is incompatible with strict RWX.

If we size it a bit more conservatively we can hopefully just reserve
some space in the text section for it.

I'm not going to have time to work on that immediately though, so if
people want this fixed now then this patch could go in as a temporary
solution.

cheers
Catalin Marinas March 20, 2019, 6:16 p.m. UTC | #3
On Thu, Mar 21, 2019 at 12:15:46AM +1100, Michael Ellerman wrote:
> Catalin Marinas <catalin.marinas@arm.com> writes:
> > On Wed, Mar 13, 2019 at 10:57:17AM -0400, Qian Cai wrote:
> >> @@ -1531,7 +1547,14 @@ static void kmemleak_scan(void)
> >>  
> >>  	/* data/bss scanning */
> >>  	scan_large_block(_sdata, _edata);
> >> -	scan_large_block(__bss_start, __bss_stop);
> >> +
> >> +	if (bss_hole_start) {
> >> +		scan_large_block(__bss_start, bss_hole_start);
> >> +		scan_large_block(bss_hole_stop, __bss_stop);
> >> +	} else {
> >> +		scan_large_block(__bss_start, __bss_stop);
> >> +	}
> >> +
> >>  	scan_large_block(__start_ro_after_init, __end_ro_after_init);
> >
> > I'm not a fan of this approach but I couldn't come up with anything
> > better. I was hoping we could check for PageReserved() in scan_block()
> > but on arm64 it ends up not scanning the .bss at all.
> >
> > Until another user appears, I'm ok with this patch.
> >
> > Acked-by: Catalin Marinas <catalin.marinas@arm.com>
> 
> I actually would like to rework this kvm_tmp thing to not be in bss at
> all. It's a bit of a hack and is incompatible with strict RWX.
> 
> If we size it a bit more conservatively we can hopefully just reserve
> some space in the text section for it.
> 
> I'm not going to have time to work on that immediately though, so if
> people want this fixed now then this patch could go in as a temporary
> solution.

I think I have a simpler idea. Kmemleak allows punching holes in
allocated objects, so just turn the data/bss sections into dedicated
kmemleak objects. This happens when kmemleak is initialised, before the
initcalls are invoked. The kvm_free_tmp() would just free the
corresponding part of the bss.

Patch below, only tested briefly on arm64. Qian, could you give it a try
on powerpc? Thanks.

--------8<------------------------------
diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
index 683b5b3805bd..c4b8cb3c298d 100644
--- a/arch/powerpc/kernel/kvm.c
+++ b/arch/powerpc/kernel/kvm.c
@@ -712,6 +712,8 @@ static void kvm_use_magic_page(void)
 
 static __init void kvm_free_tmp(void)
 {
+	kmemleak_free_part(&kvm_tmp[kvm_tmp_index],
+			   ARRAY_SIZE(kvm_tmp) - kvm_tmp_index);
 	free_reserved_area(&kvm_tmp[kvm_tmp_index],
 			   &kvm_tmp[ARRAY_SIZE(kvm_tmp)], -1, NULL);
 }
diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 707fa5579f66..0f6adcbfc2c7 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -1529,11 +1529,6 @@ static void kmemleak_scan(void)
 	}
 	rcu_read_unlock();
 
-	/* data/bss scanning */
-	scan_large_block(_sdata, _edata);
-	scan_large_block(__bss_start, __bss_stop);
-	scan_large_block(__start_ro_after_init, __end_ro_after_init);
-
 #ifdef CONFIG_SMP
 	/* per-cpu sections scanning */
 	for_each_possible_cpu(i)
@@ -2071,6 +2066,15 @@ void __init kmemleak_init(void)
 	}
 	local_irq_restore(flags);
 
+	/* register the data/bss sections */
+	create_object((unsigned long)_sdata, _edata - _sdata,
+		      KMEMLEAK_GREY, GFP_ATOMIC);
+	create_object((unsigned long)__bss_start, __bss_stop - __bss_start,
+		      KMEMLEAK_GREY, GFP_ATOMIC);
+	create_object((unsigned long)__start_ro_after_init,
+		      __end_ro_after_init - __start_ro_after_init,
+		      KMEMLEAK_GREY, GFP_ATOMIC);
+
 	/*
 	 * This is the point where tracking allocations is safe. Automatic
 	 * scanning is started during the late initcall. Add the early logged
Qian Cai March 20, 2019, 8:04 p.m. UTC | #4
On Wed, 2019-03-20 at 18:16 +0000, Catalin Marinas wrote:
> I think I have a simpler idea. Kmemleak allows punching holes in
> allocated objects, so just turn the data/bss sections into dedicated
> kmemleak objects. This happens when kmemleak is initialised, before the
> initcalls are invoked. The kvm_free_tmp() would just free the
> corresponding part of the bss.
> 
> Patch below, only tested briefly on arm64. Qian, could you give it a try
> on powerpc? Thanks.

It works great so far!
Michael Ellerman March 21, 2019, 5:15 a.m. UTC | #5
Catalin Marinas <catalin.marinas@arm.com> writes:
> On Thu, Mar 21, 2019 at 12:15:46AM +1100, Michael Ellerman wrote:
>> Catalin Marinas <catalin.marinas@arm.com> writes:
>> > On Wed, Mar 13, 2019 at 10:57:17AM -0400, Qian Cai wrote:
>> >> @@ -1531,7 +1547,14 @@ static void kmemleak_scan(void)
>> >>  
>> >>  	/* data/bss scanning */
>> >>  	scan_large_block(_sdata, _edata);
>> >> -	scan_large_block(__bss_start, __bss_stop);
>> >> +
>> >> +	if (bss_hole_start) {
>> >> +		scan_large_block(__bss_start, bss_hole_start);
>> >> +		scan_large_block(bss_hole_stop, __bss_stop);
>> >> +	} else {
>> >> +		scan_large_block(__bss_start, __bss_stop);
>> >> +	}
>> >> +
>> >>  	scan_large_block(__start_ro_after_init, __end_ro_after_init);
>> >
>> > I'm not a fan of this approach but I couldn't come up with anything
>> > better. I was hoping we could check for PageReserved() in scan_block()
>> > but on arm64 it ends up not scanning the .bss at all.
>> >
>> > Until another user appears, I'm ok with this patch.
>> >
>> > Acked-by: Catalin Marinas <catalin.marinas@arm.com>
>> 
>> I actually would like to rework this kvm_tmp thing to not be in bss at
>> all. It's a bit of a hack and is incompatible with strict RWX.
>> 
>> If we size it a bit more conservatively we can hopefully just reserve
>> some space in the text section for it.
>> 
>> I'm not going to have time to work on that immediately though, so if
>> people want this fixed now then this patch could go in as a temporary
>> solution.
>
> I think I have a simpler idea. Kmemleak allows punching holes in
> allocated objects, so just turn the data/bss sections into dedicated
> kmemleak objects. This happens when kmemleak is initialised, before the
> initcalls are invoked. The kvm_free_tmp() would just free the
> corresponding part of the bss.
>
> Patch below, only tested briefly on arm64. Qian, could you give it a try
> on powerpc? Thanks.
>
> --------8<------------------------------
> diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
> index 683b5b3805bd..c4b8cb3c298d 100644
> --- a/arch/powerpc/kernel/kvm.c
> +++ b/arch/powerpc/kernel/kvm.c
> @@ -712,6 +712,8 @@ static void kvm_use_magic_page(void)
>  
>  static __init void kvm_free_tmp(void)
>  {
> +	kmemleak_free_part(&kvm_tmp[kvm_tmp_index],
> +			   ARRAY_SIZE(kvm_tmp) - kvm_tmp_index);
>  	free_reserved_area(&kvm_tmp[kvm_tmp_index],
>  			   &kvm_tmp[ARRAY_SIZE(kvm_tmp)], -1, NULL);
>  }

Fine by me as long as it works (sounds like it does).

Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)

cheers

Patch
diff mbox series

diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
index 683b5b3805bd..5cddc8fc56bb 100644
--- a/arch/powerpc/kernel/kvm.c
+++ b/arch/powerpc/kernel/kvm.c
@@ -26,6 +26,7 @@ 
 #include <linux/slab.h>
 #include <linux/of.h>
 #include <linux/pagemap.h>
+#include <linux/kmemleak.h>
 
 #include <asm/reg.h>
 #include <asm/sections.h>
@@ -712,6 +713,8 @@  static void kvm_use_magic_page(void)
 
 static __init void kvm_free_tmp(void)
 {
+	kmemleak_bss_hole(&kvm_tmp[kvm_tmp_index],
+			  &kvm_tmp[ARRAY_SIZE(kvm_tmp)]);
 	free_reserved_area(&kvm_tmp[kvm_tmp_index],
 			   &kvm_tmp[ARRAY_SIZE(kvm_tmp)], -1, NULL);
 }
diff --git a/include/linux/kmemleak.h b/include/linux/kmemleak.h
index 5ac416e2d339..17d3684e81ab 100644
--- a/include/linux/kmemleak.h
+++ b/include/linux/kmemleak.h
@@ -46,6 +46,7 @@  extern void kmemleak_alloc_phys(phys_addr_t phys, size_t size, int min_count,
 extern void kmemleak_free_part_phys(phys_addr_t phys, size_t size) __ref;
 extern void kmemleak_not_leak_phys(phys_addr_t phys) __ref;
 extern void kmemleak_ignore_phys(phys_addr_t phys) __ref;
+extern void kmemleak_bss_hole(void *start, void *stop) __init;
 
 static inline void kmemleak_alloc_recursive(const void *ptr, size_t size,
 					    int min_count, slab_flags_t flags,
@@ -131,6 +132,9 @@  static inline void kmemleak_not_leak_phys(phys_addr_t phys)
 static inline void kmemleak_ignore_phys(phys_addr_t phys)
 {
 }
+static inline void kmemleak_bss_hole(void *start, void *stop)
+{
+}
 
 #endif	/* CONFIG_DEBUG_KMEMLEAK */
 
diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 707fa5579f66..a2d894d3de07 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -237,6 +237,10 @@  static int kmemleak_skip_disable;
 /* If there are leaks that can be reported */
 static bool kmemleak_found_leaks;
 
+/* Skip scanning of a range in the .bss section. */
+static void *bss_hole_start;
+static void *bss_hole_stop;
+
 static bool kmemleak_verbose;
 module_param_named(verbose, kmemleak_verbose, bool, 0600);
 
@@ -1265,6 +1269,18 @@  void __ref kmemleak_ignore_phys(phys_addr_t phys)
 }
 EXPORT_SYMBOL(kmemleak_ignore_phys);
 
+/**
+ * kmemleak_bss_hole - skip scanning a range in the .bss section
+ *
+ * @start:	start of the range
+ * @stop:	end of the range
+ */
+void __init kmemleak_bss_hole(void *start, void *stop)
+{
+	bss_hole_start = start;
+	bss_hole_stop = stop;
+}
+
 /*
  * Update an object's checksum and return true if it was modified.
  */
@@ -1531,7 +1547,14 @@  static void kmemleak_scan(void)
 
 	/* data/bss scanning */
 	scan_large_block(_sdata, _edata);
-	scan_large_block(__bss_start, __bss_stop);
+
+	if (bss_hole_start) {
+		scan_large_block(__bss_start, bss_hole_start);
+		scan_large_block(bss_hole_stop, __bss_stop);
+	} else {
+		scan_large_block(__bss_start, __bss_stop);
+	}
+
 	scan_large_block(__start_ro_after_init, __end_ro_after_init);
 
 #ifdef CONFIG_SMP