Message ID | alpine.LRH.2.02.1804232003100.2299@file01.intranet.prod.int.rdu2.redhat.com |
---|---|
State | Not Applicable, archived |
Delegated to: | David Miller |
Headers | show |
Series | [v3] kvmalloc: always use vmalloc if CONFIG_DEBUG_SG | expand |
On Mon, 23 Apr 2018, Mikulas Patocka wrote: > The kvmalloc function tries to use kmalloc and falls back to vmalloc if > kmalloc fails. > > Unfortunatelly, some kernel code has bugs - it uses kvmalloc and then > uses DMA-API on the returned memory or frees it with kfree. Such bugs were > found in the virtio-net driver, dm-integrity or RHEL7 powerpc-specific > code. > > These bugs are hard to reproduce because kvmalloc falls back to vmalloc > only if memory is fragmented. > > In order to detect these bugs reliably I submit this patch that changes > kvmalloc to fall back to vmalloc with 1/2 probability if CONFIG_DEBUG_SG > is turned on. CONFIG_DEBUG_SG is used, because it makes the DMA API layer > verify the addresses passed to it, and so the user will get a reliable > stacktrace. > Why not just do it unconditionally? Sounds better than "50% of the time this will catch bugs". > Some bugs (such as buffer overflows) are better detected > with kmalloc code, so we must test the kmalloc path too. > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> > > --- > mm/util.c | 10 ++++++++++ > 1 file changed, 10 insertions(+) > > Index: linux-2.6/mm/util.c > =================================================================== > --- linux-2.6.orig/mm/util.c 2018-04-23 00:12:05.000000000 +0200 > +++ linux-2.6/mm/util.c 2018-04-23 17:57:02.000000000 +0200 > @@ -14,6 +14,7 @@ > #include <linux/hugetlb.h> > #include <linux/vmalloc.h> > #include <linux/userfaultfd_k.h> > +#include <linux/random.h> > > #include <asm/sections.h> > #include <linux/uaccess.h> > @@ -404,6 +405,12 @@ void *kvmalloc_node(size_t size, gfp_t f > */ > WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL); > > +#ifdef CONFIG_DEBUG_SG > + /* Catch bugs when the caller uses DMA API on the result of kvmalloc. */ > + if (!(prandom_u32_max(2) & 1)) > + goto do_vmalloc; > +#endif > + > /* > * We want to attempt a large physically contiguous block first because > * it is less likely to fragment multiple larger blocks and therefore > @@ -427,6 +434,9 @@ void *kvmalloc_node(size_t size, gfp_t f > if (ret || size <= PAGE_SIZE) > return ret; > > +#ifdef CONFIG_DEBUG_SG > +do_vmalloc: > +#endif You can just do do_vmalloc: __maybe_unused > return __vmalloc_node_flags_caller(size, node, flags, > __builtin_return_address(0)); > } > >
On Mon, Apr 23, 2018 at 08:06:16PM -0400, Mikulas Patocka wrote: > Some bugs (such as buffer overflows) are better detected > with kmalloc code, so we must test the kmalloc path too. Well now, this brings up another item for the collective TODO list -- implement redzone checks for vmalloc. Unless this is something already taken care of by kasan or similar.
On Mon, 23 Apr 2018, David Rientjes wrote: > On Mon, 23 Apr 2018, Mikulas Patocka wrote: > > > The kvmalloc function tries to use kmalloc and falls back to vmalloc if > > kmalloc fails. > > > > Unfortunatelly, some kernel code has bugs - it uses kvmalloc and then > > uses DMA-API on the returned memory or frees it with kfree. Such bugs were > > found in the virtio-net driver, dm-integrity or RHEL7 powerpc-specific > > code. > > > > These bugs are hard to reproduce because kvmalloc falls back to vmalloc > > only if memory is fragmented. > > > > In order to detect these bugs reliably I submit this patch that changes > > kvmalloc to fall back to vmalloc with 1/2 probability if CONFIG_DEBUG_SG > > is turned on. CONFIG_DEBUG_SG is used, because it makes the DMA API layer > > verify the addresses passed to it, and so the user will get a reliable > > stacktrace. > > Why not just do it unconditionally? Sounds better than "50% of the time > this will catch bugs". Because kmalloc (with slub_debug) detects buffer overflows better than vmalloc. vmalloc detects buffer overflows only at a page boundary. This is intended for debugging kernels and debugging kernels should detect as many bugs as possible. Mikulas
On Mon, 23 Apr 2018, Matthew Wilcox wrote: > On Mon, Apr 23, 2018 at 08:06:16PM -0400, Mikulas Patocka wrote: > > Some bugs (such as buffer overflows) are better detected > > with kmalloc code, so we must test the kmalloc path too. > > Well now, this brings up another item for the collective TODO list -- > implement redzone checks for vmalloc. Unless this is something already > taken care of by kasan or similar. The kmalloc overflow testing is also not ideal - it rounds the size up to the next slab size and detects buffer overflows only at this boundary. Some times ago, I made a "kmalloc guard" patch that places a magic number immediatelly after the requested size - so that it can detect overflows at byte boundary ( https://www.redhat.com/archives/dm-devel/2014-September/msg00018.html ) That patch found a bug in crypto code: ( http://lkml.iu.edu/hypermail/linux/kernel/1409.1/02325.html ) Mikulas
On Mon 23-04-18 20:06:16, Mikulas Patocka wrote: [...] > @@ -404,6 +405,12 @@ void *kvmalloc_node(size_t size, gfp_t f > */ > WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL); > > +#ifdef CONFIG_DEBUG_SG > + /* Catch bugs when the caller uses DMA API on the result of kvmalloc. */ > + if (!(prandom_u32_max(2) & 1)) > + goto do_vmalloc; > +#endif I really do not think there is anything DEBUG_SG specific here. Why you simply do not follow should_failslab path or even reuse the function? > + > /* > * We want to attempt a large physically contiguous block first because > * it is less likely to fragment multiple larger blocks and therefore > @@ -427,6 +434,9 @@ void *kvmalloc_node(size_t size, gfp_t f > if (ret || size <= PAGE_SIZE) > return ret; > > +#ifdef CONFIG_DEBUG_SG > +do_vmalloc: > +#endif > return __vmalloc_node_flags_caller(size, node, flags, > __builtin_return_address(0)); > }
On Tue, 24 Apr 2018, Michal Hocko wrote: > On Mon 23-04-18 20:06:16, Mikulas Patocka wrote: > [...] > > @@ -404,6 +405,12 @@ void *kvmalloc_node(size_t size, gfp_t f > > */ > > WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL); > > > > +#ifdef CONFIG_DEBUG_SG > > + /* Catch bugs when the caller uses DMA API on the result of kvmalloc. */ > > + if (!(prandom_u32_max(2) & 1)) > > + goto do_vmalloc; > > +#endif > > I really do not think there is anything DEBUG_SG specific here. Why you > simply do not follow should_failslab path or even reuse the function? CONFIG_DEBUG_SG is enabled by default in RHEL and Fedora debug kernel (if you don't like CONFIG_DEBUG_SG, pick any other option that is enabled there). Fail-injection framework is if off by default and it must be explicitly enabled and configured by the user - and most users won't enable it. Mikulas > > + > > /* > > * We want to attempt a large physically contiguous block first because > > * it is less likely to fragment multiple larger blocks and therefore > > @@ -427,6 +434,9 @@ void *kvmalloc_node(size_t size, gfp_t f > > if (ret || size <= PAGE_SIZE) > > return ret; > > > > +#ifdef CONFIG_DEBUG_SG > > +do_vmalloc: > > +#endif > > return __vmalloc_node_flags_caller(size, node, flags, > > __builtin_return_address(0)); > > } > > -- > Michal Hocko > SUSE Labs >
On Tue 24-04-18 11:50:30, Mikulas Patocka wrote: > > > On Tue, 24 Apr 2018, Michal Hocko wrote: > > > On Mon 23-04-18 20:06:16, Mikulas Patocka wrote: > > [...] > > > @@ -404,6 +405,12 @@ void *kvmalloc_node(size_t size, gfp_t f > > > */ > > > WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL); > > > > > > +#ifdef CONFIG_DEBUG_SG > > > + /* Catch bugs when the caller uses DMA API on the result of kvmalloc. */ > > > + if (!(prandom_u32_max(2) & 1)) > > > + goto do_vmalloc; > > > +#endif > > > > I really do not think there is anything DEBUG_SG specific here. Why you > > simply do not follow should_failslab path or even reuse the function? > > CONFIG_DEBUG_SG is enabled by default in RHEL and Fedora debug kernel (if > you don't like CONFIG_DEBUG_SG, pick any other option that is enabled > there). Are you telling me that you are shaping a debugging functionality basing on what RHEL has enabled? And you call me evil. This is just rediculous. > Fail-injection framework is if off by default and it must be explicitly > enabled and configured by the user - and most users won't enable it. It can be enabled easily. And if you care enough for your debugging kernel then just make it enabled unconditionally.
On Tue, 24 Apr 2018, Michal Hocko wrote: > On Tue 24-04-18 11:50:30, Mikulas Patocka wrote: > > > > > > On Tue, 24 Apr 2018, Michal Hocko wrote: > > > > > On Mon 23-04-18 20:06:16, Mikulas Patocka wrote: > > > [...] > > > > @@ -404,6 +405,12 @@ void *kvmalloc_node(size_t size, gfp_t f > > > > */ > > > > WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL); > > > > > > > > +#ifdef CONFIG_DEBUG_SG > > > > + /* Catch bugs when the caller uses DMA API on the result of kvmalloc. */ > > > > + if (!(prandom_u32_max(2) & 1)) > > > > + goto do_vmalloc; > > > > +#endif > > > > > > I really do not think there is anything DEBUG_SG specific here. Why you > > > simply do not follow should_failslab path or even reuse the function? > > > > CONFIG_DEBUG_SG is enabled by default in RHEL and Fedora debug kernel (if > > you don't like CONFIG_DEBUG_SG, pick any other option that is enabled > > there). > > Are you telling me that you are shaping a debugging functionality basing > on what RHEL has enabled? And you call me evil. This is just rediculous. > > > Fail-injection framework is if off by default and it must be explicitly > > enabled and configured by the user - and most users won't enable it. > > It can be enabled easily. And if you care enough for your debugging > kernel then just make it enabled unconditionally. So, should we add a new option CONFIG_KVMALLOC_FALLBACK_DEFAULT? I'm not quite sure if 3 lines of debugging code need an extra option, but if you don't want to reuse any existing debug option, it may be possible. Adding it to the RHEL debug kernel would be trivial. Mikulas
On Tue 24-04-18 13:00:11, Mikulas Patocka wrote: > > > On Tue, 24 Apr 2018, Michal Hocko wrote: > > > On Tue 24-04-18 11:50:30, Mikulas Patocka wrote: > > > > > > > > > On Tue, 24 Apr 2018, Michal Hocko wrote: > > > > > > > On Mon 23-04-18 20:06:16, Mikulas Patocka wrote: > > > > [...] > > > > > @@ -404,6 +405,12 @@ void *kvmalloc_node(size_t size, gfp_t f > > > > > */ > > > > > WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL); > > > > > > > > > > +#ifdef CONFIG_DEBUG_SG > > > > > + /* Catch bugs when the caller uses DMA API on the result of kvmalloc. */ > > > > > + if (!(prandom_u32_max(2) & 1)) > > > > > + goto do_vmalloc; > > > > > +#endif > > > > > > > > I really do not think there is anything DEBUG_SG specific here. Why you > > > > simply do not follow should_failslab path or even reuse the function? > > > > > > CONFIG_DEBUG_SG is enabled by default in RHEL and Fedora debug kernel (if > > > you don't like CONFIG_DEBUG_SG, pick any other option that is enabled > > > there). > > > > Are you telling me that you are shaping a debugging functionality basing > > on what RHEL has enabled? And you call me evil. This is just rediculous. > > > > > Fail-injection framework is if off by default and it must be explicitly > > > enabled and configured by the user - and most users won't enable it. > > > > It can be enabled easily. And if you care enough for your debugging > > kernel then just make it enabled unconditionally. > > So, should we add a new option CONFIG_KVMALLOC_FALLBACK_DEFAULT? I'm not > quite sure if 3 lines of debugging code need an extra option, but if you > don't want to reuse any existing debug option, it may be possible. Adding > it to the RHEL debug kernel would be trivial. Wouldn't it be equally trivial to simply enable the fault injection? You would get additional failure paths testing as a bonus.
On Tue, Apr 24, 2018 at 08:29:14AM -0400, Mikulas Patocka wrote: > > > On Mon, 23 Apr 2018, Matthew Wilcox wrote: > > > On Mon, Apr 23, 2018 at 08:06:16PM -0400, Mikulas Patocka wrote: > > > Some bugs (such as buffer overflows) are better detected > > > with kmalloc code, so we must test the kmalloc path too. > > > > Well now, this brings up another item for the collective TODO list -- > > implement redzone checks for vmalloc. Unless this is something already > > taken care of by kasan or similar. > > The kmalloc overflow testing is also not ideal - it rounds the size up to > the next slab size and detects buffer overflows only at this boundary. > > Some times ago, I made a "kmalloc guard" patch that places a magic number > immediatelly after the requested size - so that it can detect overflows at > byte boundary > ( https://www.redhat.com/archives/dm-devel/2014-September/msg00018.html ) > > That patch found a bug in crypto code: > ( http://lkml.iu.edu/hypermail/linux/kernel/1409.1/02325.html ) Is it still worth doing this, now we have kasan?
On Tue, 24 Apr 2018, Michal Hocko wrote: > On Tue 24-04-18 13:00:11, Mikulas Patocka wrote: > > > > > > On Tue, 24 Apr 2018, Michal Hocko wrote: > > > > > On Tue 24-04-18 11:50:30, Mikulas Patocka wrote: > > > > > > > > > > > > On Tue, 24 Apr 2018, Michal Hocko wrote: > > > > > > > > > On Mon 23-04-18 20:06:16, Mikulas Patocka wrote: > > > > > [...] > > > > > > @@ -404,6 +405,12 @@ void *kvmalloc_node(size_t size, gfp_t f > > > > > > */ > > > > > > WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL); > > > > > > > > > > > > +#ifdef CONFIG_DEBUG_SG > > > > > > + /* Catch bugs when the caller uses DMA API on the result of kvmalloc. */ > > > > > > + if (!(prandom_u32_max(2) & 1)) > > > > > > + goto do_vmalloc; > > > > > > +#endif > > > > > > > > > > I really do not think there is anything DEBUG_SG specific here. Why you > > > > > simply do not follow should_failslab path or even reuse the function? > > > > > > > > CONFIG_DEBUG_SG is enabled by default in RHEL and Fedora debug kernel (if > > > > you don't like CONFIG_DEBUG_SG, pick any other option that is enabled > > > > there). > > > > > > Are you telling me that you are shaping a debugging functionality basing > > > on what RHEL has enabled? And you call me evil. This is just rediculous. > > > > > > > Fail-injection framework is if off by default and it must be explicitly > > > > enabled and configured by the user - and most users won't enable it. > > > > > > It can be enabled easily. And if you care enough for your debugging > > > kernel then just make it enabled unconditionally. > > > > So, should we add a new option CONFIG_KVMALLOC_FALLBACK_DEFAULT? I'm not > > quite sure if 3 lines of debugging code need an extra option, but if you > > don't want to reuse any existing debug option, it may be possible. Adding > > it to the RHEL debug kernel would be trivial. > > Wouldn't it be equally trivial to simply enable the fault injection? You > would get additional failure paths testing as a bonus. The RHEL and Fedora debugging kernels are compiled with fault injection. But the fault-injection framework will do nothing unless it is enabled by a kernel parameter or debugfs write. Most users don't know about the fault injection kernel parameters or debugfs files and won't enabled it. We need a CONFIG_ option to enable it by default in the debugging kernels (and we could add a kernel parameter to override the default, fine-tune the fallback probability etc.) Mikulas
On Tue 24-04-18 13:28:49, Mikulas Patocka wrote: > > > On Tue, 24 Apr 2018, Michal Hocko wrote: > > > On Tue 24-04-18 13:00:11, Mikulas Patocka wrote: > > > > > > > > > On Tue, 24 Apr 2018, Michal Hocko wrote: > > > > > > > On Tue 24-04-18 11:50:30, Mikulas Patocka wrote: > > > > > > > > > > > > > > > On Tue, 24 Apr 2018, Michal Hocko wrote: > > > > > > > > > > > On Mon 23-04-18 20:06:16, Mikulas Patocka wrote: > > > > > > [...] > > > > > > > @@ -404,6 +405,12 @@ void *kvmalloc_node(size_t size, gfp_t f > > > > > > > */ > > > > > > > WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL); > > > > > > > > > > > > > > +#ifdef CONFIG_DEBUG_SG > > > > > > > + /* Catch bugs when the caller uses DMA API on the result of kvmalloc. */ > > > > > > > + if (!(prandom_u32_max(2) & 1)) > > > > > > > + goto do_vmalloc; > > > > > > > +#endif > > > > > > > > > > > > I really do not think there is anything DEBUG_SG specific here. Why you > > > > > > simply do not follow should_failslab path or even reuse the function? > > > > > > > > > > CONFIG_DEBUG_SG is enabled by default in RHEL and Fedora debug kernel (if > > > > > you don't like CONFIG_DEBUG_SG, pick any other option that is enabled > > > > > there). > > > > > > > > Are you telling me that you are shaping a debugging functionality basing > > > > on what RHEL has enabled? And you call me evil. This is just rediculous. > > > > > > > > > Fail-injection framework is if off by default and it must be explicitly > > > > > enabled and configured by the user - and most users won't enable it. > > > > > > > > It can be enabled easily. And if you care enough for your debugging > > > > kernel then just make it enabled unconditionally. > > > > > > So, should we add a new option CONFIG_KVMALLOC_FALLBACK_DEFAULT? I'm not > > > quite sure if 3 lines of debugging code need an extra option, but if you > > > don't want to reuse any existing debug option, it may be possible. Adding > > > it to the RHEL debug kernel would be trivial. > > > > Wouldn't it be equally trivial to simply enable the fault injection? You > > would get additional failure paths testing as a bonus. > > The RHEL and Fedora debugging kernels are compiled with fault injection. > But the fault-injection framework will do nothing unless it is enabled by > a kernel parameter or debugfs write. > > Most users don't know about the fault injection kernel parameters or > debugfs files and won't enabled it. We need a CONFIG_ option to enable it > by default in the debugging kernels (and we could add a kernel parameter > to override the default, fine-tune the fallback probability etc.) If it is a real issue to install the debugging kernel with the required kernel parameter then I a config option for the default on makes sense to me.
On Tue, 24 Apr 2018, Matthew Wilcox wrote: > On Tue, Apr 24, 2018 at 08:29:14AM -0400, Mikulas Patocka wrote: > > > > > > On Mon, 23 Apr 2018, Matthew Wilcox wrote: > > > > > On Mon, Apr 23, 2018 at 08:06:16PM -0400, Mikulas Patocka wrote: > > > > Some bugs (such as buffer overflows) are better detected > > > > with kmalloc code, so we must test the kmalloc path too. > > > > > > Well now, this brings up another item for the collective TODO list -- > > > implement redzone checks for vmalloc. Unless this is something already > > > taken care of by kasan or similar. > > > > The kmalloc overflow testing is also not ideal - it rounds the size up to > > the next slab size and detects buffer overflows only at this boundary. > > > > Some times ago, I made a "kmalloc guard" patch that places a magic number > > immediatelly after the requested size - so that it can detect overflows at > > byte boundary > > ( https://www.redhat.com/archives/dm-devel/2014-September/msg00018.html ) > > > > That patch found a bug in crypto code: > > ( http://lkml.iu.edu/hypermail/linux/kernel/1409.1/02325.html ) > > Is it still worth doing this, now we have kasan? The kmalloc guard has much lower overhead than kasan. (BTW. when I tried kasan, it oopsed with persistent memory) Mikulas
Hello, Mikulas. On Tue, Apr 24, 2018 at 02:41:47PM -0400, Mikulas Patocka wrote: > > > On Tue, 24 Apr 2018, Matthew Wilcox wrote: > > > On Tue, Apr 24, 2018 at 08:29:14AM -0400, Mikulas Patocka wrote: > > > > > > > > > On Mon, 23 Apr 2018, Matthew Wilcox wrote: > > > > > > > On Mon, Apr 23, 2018 at 08:06:16PM -0400, Mikulas Patocka wrote: > > > > > Some bugs (such as buffer overflows) are better detected > > > > > with kmalloc code, so we must test the kmalloc path too. > > > > > > > > Well now, this brings up another item for the collective TODO list -- > > > > implement redzone checks for vmalloc. Unless this is something already > > > > taken care of by kasan or similar. > > > > > > The kmalloc overflow testing is also not ideal - it rounds the size up to > > > the next slab size and detects buffer overflows only at this boundary. > > > > > > Some times ago, I made a "kmalloc guard" patch that places a magic number > > > immediatelly after the requested size - so that it can detect overflows at > > > byte boundary > > > ( https://www.redhat.com/archives/dm-devel/2014-September/msg00018.html ) > > > > > > That patch found a bug in crypto code: > > > ( http://lkml.iu.edu/hypermail/linux/kernel/1409.1/02325.html ) > > > > Is it still worth doing this, now we have kasan? > > The kmalloc guard has much lower overhead than kasan. I skimm at your code and it requires rebuilding the kernel. I think that if rebuilding is required as the same with the KASAN, using the KASAN is better since it has far better coverage for detection the bug. However, I think that if the redzone can be setup tightly without rebuild, it would be worth implementing. I have an idea to implement it only for the SLUB. Could I try it? (I'm asking this because I'm inspired from the above patch.) :) Or do you wanna try it? Thanks.
Index: linux-2.6/mm/util.c =================================================================== --- linux-2.6.orig/mm/util.c 2018-04-23 00:12:05.000000000 +0200 +++ linux-2.6/mm/util.c 2018-04-23 17:57:02.000000000 +0200 @@ -14,6 +14,7 @@ #include <linux/hugetlb.h> #include <linux/vmalloc.h> #include <linux/userfaultfd_k.h> +#include <linux/random.h> #include <asm/sections.h> #include <linux/uaccess.h> @@ -404,6 +405,12 @@ void *kvmalloc_node(size_t size, gfp_t f */ WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL); +#ifdef CONFIG_DEBUG_SG + /* Catch bugs when the caller uses DMA API on the result of kvmalloc. */ + if (!(prandom_u32_max(2) & 1)) + goto do_vmalloc; +#endif + /* * We want to attempt a large physically contiguous block first because * it is less likely to fragment multiple larger blocks and therefore @@ -427,6 +434,9 @@ void *kvmalloc_node(size_t size, gfp_t f if (ret || size <= PAGE_SIZE) return ret; +#ifdef CONFIG_DEBUG_SG +do_vmalloc: +#endif return __vmalloc_node_flags_caller(size, node, flags, __builtin_return_address(0)); }
The kvmalloc function tries to use kmalloc and falls back to vmalloc if kmalloc fails. Unfortunatelly, some kernel code has bugs - it uses kvmalloc and then uses DMA-API on the returned memory or frees it with kfree. Such bugs were found in the virtio-net driver, dm-integrity or RHEL7 powerpc-specific code. These bugs are hard to reproduce because kvmalloc falls back to vmalloc only if memory is fragmented. In order to detect these bugs reliably I submit this patch that changes kvmalloc to fall back to vmalloc with 1/2 probability if CONFIG_DEBUG_SG is turned on. CONFIG_DEBUG_SG is used, because it makes the DMA API layer verify the addresses passed to it, and so the user will get a reliable stacktrace. Some bugs (such as buffer overflows) are better detected with kmalloc code, so we must test the kmalloc path too. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> --- mm/util.c | 10 ++++++++++ 1 file changed, 10 insertions(+)