Message ID | CAHZ_AjsbkkXNMG8oZfOhr6WbUmV=4eK0j23ijbP=cNkfVdoAjQ@mail.gmail.com |
---|---|
State | Not Applicable |
Delegated to: | Pablo Neira |
eric gisse <jowr.pi@gmail.com> wrote: > Background: > > This was discovered on a server running a tor exit node (crazy high > packet flow) with a firewall that uses a few connection tracking rules > in the INPUT chain: > > # iptables-save | grep conn > -A INPUT -m comment --comment "001-v4 drop invalid traffic" -m > conntrack --ctstate INVALID -j DROP > -A INPUT -m comment --comment "990-v4 accept existing connections" -m > conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT > > The kernel was not stock, but rather was modified with grsecurity. I > worked with the grsecurity folks first on this issue ( > https://forums.grsecurity.net/viewtopic.php?f=1&t=4071 ) to isolate > and explain what's going on. They were very helpful. Thanks for reporting. > because netconsole is ... inconsistent with when choosing to work. As > an aside, what is the ideal way to get kernel oops output anyway? booting into a crash-kernel has worked for me in the past to salvage original trace from memory. > Note: please Ignore the xt_* modules as they were not in use at the > time, and were not present for either the 3.16.5 panics or the 3.17.1 > + sanitize test case patch. Just to be clear, the 3.16.5 panic is also with pax memory sanitizing...? > The spot of code that's causing grief: > > # addr2line -e vmlinux -fip ffffffff814b58ce > nf_ct_tuplehash_to_ctrack at > /usr/src/linux/include/net/netfilter/nf_conntrack.h:122 > (inlined by) nf_ct_key_equal at > /usr/src/linux/net/netfilter/nf_conntrack_core.c:393 > (inlined by) ____nf_conntrack_find at > /usr/src/linux/net/netfilter/nf_conntrack_core.c:422 > (inlined by) __nf_conntrack_find_get at > /usr/src/linux/net/netfilter/nf_conntrack_core.c:453 Thanks. So this happens when we walk the conntrack hash lists to find a matching entry. 
> diff --git a/mm/slub.c b/mm/slub.c
> index 3e8afcc07a76..08a7cbcf2274 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2643,6 +2643,12 @@ static __always_inline void slab_free(struct kmem_cache *s,
>
>  	slab_free_hook(s, x);
>
> +	if (pax_sanitize_slab && !(s->flags & SLAB_NO_SANITIZE)) {
> +		memset(x, PAX_MEMORY_SANITIZE_VALUE, s->object_size);
> +		if (s->ctor)
> +			s->ctor(x);
> +	}
> +

I am no SLUB expert, but this looks wrong.
slab_free() is called directly via kmem_cache_free().

conntrack objects are alloc'd/free'd from a SLAB_DESTROY_BY_RCU cache.

It is therefore legal to access a conntrack object from another
CPU even after kmem_cache_free() was invoked on another CPU, provided all
readers that do so hold rcu_read_lock and verify that the object has not
been freed yet by issuing appropriate atomic_inc_not_zero calls.

Therefore, object poisoning will only be safe from an RCU callback, after
accesses are known to be illegal/invalid.

(Not saying that conntrack is bug free; we had races there in the
past.)

From a short glance at SLUB it seems poisoning objects for SLAB_DESTROY_BY_RCU
caches is safe in __free_slab(), but not earlier.

If you use a different allocator, please tell us which one (check the kernel
config, SLUB is the default).

If it's reproducible with poisoning done after the RCU grace periods
have elapsed (i.e., where it's no longer legal to access the memory),
please let us know and we can have another look at it.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Oct 31, 2014 at 4:50 PM, Florian Westphal <fw@strlen.de> wrote: > eric gisse <jowr.pi@gmail.com> wrote: >> Background: >> >> This was discovered on a server running a tor exit node (crazy high >> packet flow) with a firewall that uses a few connection tracking rules >> in the INPUT chain: >> >> # iptables-save | grep conn >> -A INPUT -m comment --comment "001-v4 drop invalid traffic" -m >> conntrack --ctstate INVALID -j DROP >> -A INPUT -m comment --comment "990-v4 accept existing connections" -m >> conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT >> >> The kernel was not stock, but rather was modified with grsecurity. I >> worked with the grsecurity folks first on this issue ( >> https://forums.grsecurity.net/viewtopic.php?f=1&t=4071 ) to isolate >> and explain what's going on. They were very helpful. > > Thanks for reporting. > >> because netconsole is ... inconsistent with when choosing to work. As >> an aside, what is the ideal way to get kernel oops output anyway? > > booting into a crash-kernel has worked for me in the past to salvage > original trace from memory. I'm using Gentoo which doesn't have the super nice crash kernel / abrtd stuff setup. That's the one thing I really like about RHEL, though I wouldn't be able to use grsecurity (or anything else custom) in kernel space with those tools for that matter... > >> Note: please Ignore the xt_* modules as they were not in use at the >> time, and were not present for either the 3.16.5 panics or the 3.17.1 >> + sanitize test case patch. > > Just to be clear, the 3.16.5 panic is also with pax memory > sanitizing...? Correct. Since it ran along the same syscall path as the 3.17.1 panics, I am making the assumption it is the same bug. I don't have the 3.16.5 kernel built with the debugging flags needed though, so I can't verify it 100% after the fact but I'm reasonably confident at this point with the amount of "reproducability" this issue has had. 
> >> The spot of code that's causing grief: >> >> # addr2line -e vmlinux -fip ffffffff814b58ce >> nf_ct_tuplehash_to_ctrack at >> /usr/src/linux/include/net/netfilter/nf_conntrack.h:122 >> (inlined by) nf_ct_key_equal at >> /usr/src/linux/net/netfilter/nf_conntrack_core.c:393 >> (inlined by) ____nf_conntrack_find at >> /usr/src/linux/net/netfilter/nf_conntrack_core.c:422 >> (inlined by) __nf_conntrack_find_get at >> /usr/src/linux/net/netfilter/nf_conntrack_core.c:453 > > Thanks. > So this happens when we walk the conntrack hash lists to find > a matching entry. That is as far as I was able to understand. My connection tracking table gets *big*. This is what it looks like at this instant in time on the machine in question: # sysctl -a | grep conntrack_count net.ipv4.netfilter.ip_conntrack_count = 46205 net.netfilter.nf_conntrack_count = 46203 > >> diff --git a/mm/slub.c b/mm/slub.c >> index 3e8afcc07a76..08a7cbcf2274 100644 >> --- a/mm/slub.c >> +++ b/mm/slub.c >> @@ -2643,6 +2643,12 @@ static __always_inline void slab_free(struct kmem_cache *s, >> >> slab_free_hook(s, x); >> >> + if (pax_sanitize_slab && !(s->flags & SLAB_NO_SANITIZE)) { >> + memset(x, PAX_MEMORY_SANITIZE_VALUE, s->object_size); >> + if (s->ctor) >> + s->ctor(x); >> + } >> + > > I am no SLUB expert, but this looks wrong. > slab_free() is called directly via kmem_cache_free(). I can't help with that one. My competence does not extend to kernel memory managment / allocation issues :) > > conntrack objects are alloc'd/free'd from a SLAB_DESTROY_BY_RCU cache. > > It is therefore legal to access a conntrack object from another > CPU even after kmem_cache_free() was invoked on another cpu, provided all > readers that do so hold rcu_read_lock, and verify that object has not been > freed yet by issuing appropriate atomic_inc_not_zero calls. > > Therefore, object poisoning will only be safe from rcu callback, after > accesses are known to be illegal/invalid. Can you expand on that? 
The term "object poisoning" to me means an object (you are talking about
the conntrack tuple, right?) with problematic values is put into memory, but
the way you phrase it seems more like the hash table itself is being
manipulated improperly.

I'm still trying to work out what the actual ISSUE is. My understanding is
this, thus far:

It seems like an object in the connection tracking hash table is being
improperly marked as free, which then is sanitized, and is then later
being accessed by the netfilter codepath that loops through the table.

> (not saying that conntrack is bug free..., we had races there in the
> past).
>
> From a short glance at SLUB it seems poisoning objects for SLAB_DESTROY_BY_RCU
> caches is safe in __free_slab(), but not earlier.
>
> If you use different allocator, please tell us which one (check kernel
> config, slub is default).

SLAB allocator, though I do not remember making the choice.

From the kernel config that's causing issues:

# egrep 'SLAB|SLUB' .config
CONFIG_SLAB=y
# CONFIG_SLUB is not set
CONFIG_SLABINFO=y
# CONFIG_DEBUG_SLAB is not set
CONFIG_PAX_USERCOPY_SLABS=y

For reference, the current kernel, with the PaX sanitization feature
disabled, doesn't exhibit the issue. Not that I am surprised.

I don't, as a rule, mess with kernel memory/process management internals
without a good reason because I don't have enough information to make a
proper choice. Usually the defaults are "good enough". I can only think of
a handful of instances where I have had reason to do so, and even then the
results were inconsistent at best.

> If its reproduceable with poisoning done after the RCU grace periods
> have elapsed (i.e., where its not legal anymore to access the memory),
> please let us know and we can have another look at it.
>
> Thanks.

Reproducibility is an issue since I don't know what's triggering it in
the first place.
Just that it happens after a variable length of time along the same code
path, subject to differences between the two kernel versions I've seen this
issue with.

The machine itself is pushing 20-25 megabytes (~50k packets) per second at
any given time and has smacked the default conntrack hash table maximums.
So the netfilter system is under nontrivial stresses.

I'll happily work with you guys to isolate this as this is an interesting
problem and I'm bored, but I need a bit of help and prompting to get this
done properly.

I am a sysadmin of reasonable (in my own estimate) skill and a developer in
Puppet / Perl, but kernel stuff beyond surface-level debugging of panics is
way beyond my expertise. Even after your explanation I am not yet sure I
understand the issue, and am definitely sure I don't understand how to
debug this further.
On Fri, Oct 31, 2014 at 05:50:42PM +0100, Florian Westphal wrote:
> eric gisse <jowr.pi@gmail.com> wrote:
> > diff --git a/mm/slub.c b/mm/slub.c
> > index 3e8afcc07a76..08a7cbcf2274 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -2643,6 +2643,12 @@ static __always_inline void slab_free(struct kmem_cache *s,
> >
> >  	slab_free_hook(s, x);
> >
> > +	if (pax_sanitize_slab && !(s->flags & SLAB_NO_SANITIZE)) {
> > +		memset(x, PAX_MEMORY_SANITIZE_VALUE, s->object_size);
> > +		if (s->ctor)
> > +			s->ctor(x);
> > +	}
> > +
>
> I am no SLUB expert, but this looks wrong.
> slab_free() is called directly via kmem_cache_free().
>
> conntrack objects are alloc'd/free'd from a SLAB_DESTROY_BY_RCU cache.
>
> It is therefore legal to access a conntrack object from another
> CPU even after kmem_cache_free() was invoked on another cpu, provided all
> readers that do so hold rcu_read_lock, and verify that object has not been
> freed yet by issuing appropriate atomic_inc_not_zero calls.
>
> Therefore, object poisoning will only be safe from rcu callback, after
> accesses are known to be illegal/invalid.

Snap, you're right! I was misreading the following comment in
include/linux/slab.h to allow "free reuse" by the slab allocator as well,
e.g. for sanitizing/poisoning the object:

 * SLAB_DESTROY_BY_RCU - **WARNING** READ THIS!
 *
 * This delays freeing the SLAB page by a grace period, it does _NOT_
 * delay object freeing. This means that if you do kmem_cache_free()
 * that memory location is free to be reused at any time. Thus it may
 * be possible to see another object there in the same RCU grace period.

But, in fact, that assumption is not true. I now see how the conntrack
code exploits this feature by testing &ct->ct_general.use. So if we
scratch that by writing '\xfe' everywhere over the object, that test
will no longer work. I guess we need to change the slab sanitization
feature in PaX to handle SLAB_DESTROY_BY_RCU marked slabs the way they
need to be handled.
>
> (not saying that conntrack is bug free..., we had races there in the
> past).
>
> From a short glance at SLUB it seems poisoning objects for SLAB_DESTROY_BY_RCU
> caches is safe in __free_slab(), but not earlier.
>
> If you use different allocator, please tell us which one (check kernel
> config, slub is default).
>
> If its reproduceable with poisoning done after the RCU grace periods
> have elapsed (i.e., where its not legal anymore to access the memory),
> please let us know and we can have another look at it.
>

Thanks,
Mathias
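To make the failure mode above concrete, here is a minimal user-space sketch. It is not kernel code: `struct fake_ct`, `get_ref_unless_zero()`, `free_plain()` and `free_sanitized()` are hypothetical names, and the refcount is a plain integer standing in for the atomic `ct_general.use`. It only shows that filling a freed object with the 0xfe byte leaves a non-zero counter, so an "increment unless zero" test can no longer tell that the object is gone.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a conntrack entry; only the refcount matters. */
struct fake_ct {
    uint32_t use;    /* plays the role of ct->ct_general.use */
    uint32_t tuple;  /* plays the role of the connection tuple */
};

#define SANITIZE_VALUE 0xfe  /* the byte the patch uses on x86-64 */

/* Non-atomic model of atomic_inc_not_zero(): take a ref unless use == 0. */
int get_ref_unless_zero(struct fake_ct *ct)
{
    if (ct->use == 0)
        return 0;
    ct->use++;
    return 1;
}

/* Free without sanitization: the refcount dropped to zero and stays zero. */
void free_plain(struct fake_ct *ct)
{
    ct->use = 0;
}

/* Free with per-object sanitization: every byte becomes 0xfe. */
void free_sanitized(struct fake_ct *ct)
{
    ct->use = 0;
    memset(ct, SANITIZE_VALUE, sizeof(*ct));
}
```

With `free_plain()` a later `get_ref_unless_zero()` fails as intended; after `free_sanitized()` the counter reads 0xfefefefe and the same call hands out a reference to a dead object.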
eric gisse <jowr.pi@gmail.com> wrote: > On Fri, Oct 31, 2014 at 4:50 PM, Florian Westphal <fw@strlen.de> wrote: > >> + if (pax_sanitize_slab && !(s->flags & SLAB_NO_SANITIZE)) { > >> + memset(x, PAX_MEMORY_SANITIZE_VALUE, s->object_size); > >> + if (s->ctor) > >> + s->ctor(x); > >> + } > >> + > > > > I am no SLUB expert, but this looks wrong. > > slab_free() is called directly via kmem_cache_free(). > > I can't help with that one. My competence does not extend to kernel > memory managment / allocation issues :) Seems Mathias Krause will work on improving Pax poisoning to treat SLAB_DESTROY_BY_RCU specially. > > conntrack objects are alloc'd/free'd from a SLAB_DESTROY_BY_RCU cache. > > > > It is therefore legal to access a conntrack object from another > > CPU even after kmem_cache_free() was invoked on another cpu, provided all > > readers that do so hold rcu_read_lock, and verify that object has not been > > freed yet by issuing appropriate atomic_inc_not_zero calls. > > > > Therefore, object poisoning will only be safe from rcu callback, after > > accesses are known to be illegal/invalid. > > Can you expand on that? The term "object poisoning" to me means an > object (you are talking about the conntract tuple, right?) with Yes. > problematic values is put into memory, but the way you phrase it seems > more like the hash table itself is being manipulated improperly. No, afaics the conntrack object accesses are correct. > I'm still trying to work out what the actual ISSUE is. My > understanding is this, thus far: > > It seems like an object in the connection track hash table is being > improperly marked as free, which then is sanitized, and is then later > being accessed by the netfilter codepath that loops through the table. No. Conntrack objects are free'd when the last reference counter goes away. However, because lookup of the conntrack hash table is lockless, another CPU might be accessing the conntrack object that is being free'd right now. 
Usually this means that the access is invalid. However, in the
conntrack case, the conntrack objects are allocated from a special
cache that delays freeing of the underlying pages until we know that no
other CPU is currently accessing them.

So there are 2 possible cases:
1 - the conntrack object that is being looked at is alive (refcnt > 1).
2 - the conntrack object that is being looked at is being free'd RIGHT NOW
    on another CPU. RCU protects us from a page fault, since the underlying
    memory page cannot be free'd.

So, we're safe to look at the memory contents of the tuple and decide
whether it's the object (conntrack tuple) we're trying to find or not.

If it is, we try to obtain a reference. This will only succeed if the
reference count is not 0 already, so we can detect the "it's free'd"
case.

If we obtained a reference, we still need to re-validate the tuple
address since it's possible that the object was free'd on CPU x and
almost-instantly reallocated for use by a different tuple.

If you are interested in this you can have a look at the bug fixes made
in that area, there are some more explanations there.

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/net/netfilter/nf_conntrack_core.c?id=e53376bef2cd97d3e3f61fdc677fb8da7d03d0da
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/net/netfilter/nf_conntrack_core.c?id=c6825c0976fa7893692e0e43b09740b419b23c09

> > If you use different allocator, please tell us which one (check kernel
> > config, slub is default).
>
> SLAB allocator, though I do not remember making the choice.
>
> From the kernel config that's causing issues:
>
> # egrep 'SLAB|SLUB' .config
> CONFIG_SLAB=y
> # CONFIG_SLUB is not set
> CONFIG_SLABINFO=y
> # CONFIG_DEBUG_SLAB is not set
> CONFIG_PAX_USERCOPY_SLABS=y

Ok, from a quick glance the PaX SLAB kfree path is also zapping
objects before the grace period has elapsed.
> > If its reproduceable with poisoning done after the RCU grace periods
> > have elapsed (i.e., where its not legal anymore to access the memory),
> > please let us know and we can have another look at it.
> >
> > Thanks.
>
> Reproducability is an issue since I don't know what's triggering it in
> the first place. Just that it happens after a variable length of time
> along the same code path, subject to differences between the two
> kernel versions I've seen this issue with.
>
> The machine itself is pushing 20-25 megabytes (~50k packets) per
> second at any given time and has smacked the default conntrack hash
> table maximums. So the netfilter system is under nontrivial stresses.

It should be able to handle a lot more.

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/net/netfilter/nf_conntrack_core.c?id=93bb0ceb75be2fdfa9fc0dd1fb522d9ada515d9c

> I'll happily work with you guys to isolate this as this is an
> interesting problem and I'm bored, but I need a bit of help and
> prompting to get this done properly.

Sure, my understanding is that someone from the PaX team is working on
the object poisoning to handle SLAB_DESTROY_BY_RCU properly.

Please don't hesitate to report back with newer PaX versions if you
still see invalid accesses.
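The lookup rules described in this message (match the key, take a reference only if the count is non-zero, then check the key again) can be sketched in user space with C11 atomics. This is a single-threaded model under stated assumptions, not the netfilter implementation: `struct entry`, `inc_not_zero()` and `find_get()` are hypothetical names, a flat array stands in for the hash chains, and the RCU read-side lock that keeps the memory mapped is only mentioned in a comment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical entry: a refcount plus the key ("tuple") it is hashed on. */
struct entry {
    atomic_uint use;
    unsigned int key;
};

/* Take a reference only if the count has not already dropped to zero. */
bool inc_not_zero(atomic_uint *v)
{
    unsigned int old = atomic_load(v);

    while (old != 0) {
        /* On failure, old is reloaded with the current value. */
        if (atomic_compare_exchange_weak(v, &old, old + 1))
            return true;
    }
    return false;
}

/* In the kernel this walk would run under rcu_read_lock(). */
struct entry *find_get(struct entry *tbl, size_t n, unsigned int key)
{
    for (size_t i = 0; i < n; i++) {
        struct entry *e = &tbl[i];

        if (e->key != key)
            continue;                  /* not the entry we want */
        if (!inc_not_zero(&e->use))
            continue;                  /* being freed right now */
        if (e->key == key)
            return e;                  /* still ours after taking the ref */
        atomic_fetch_sub(&e->use, 1);  /* recycled for another key: drop it */
    }
    return NULL;
}
```

The second key comparison is what covers the "freed and almost instantly reallocated" case: holding a reference proves the slot is live, not that it still describes the same connection.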
On 31 October 2014 23:00, Florian Westphal <fw@strlen.de> wrote: > eric gisse <jowr.pi@gmail.com> wrote: >> On Fri, Oct 31, 2014 at 4:50 PM, Florian Westphal <fw@strlen.de> wrote: >> >> + if (pax_sanitize_slab && !(s->flags & SLAB_NO_SANITIZE)) { >> >> + memset(x, PAX_MEMORY_SANITIZE_VALUE, s->object_size); >> >> + if (s->ctor) >> >> + s->ctor(x); >> >> + } >> >> + >> > >> > I am no SLUB expert, but this looks wrong. >> > slab_free() is called directly via kmem_cache_free(). >> >> I can't help with that one. My competence does not extend to kernel >> memory managment / allocation issues :) > > Seems Mathias Krause will work on improving Pax poisoning to treat > SLAB_DESTROY_BY_RCU specially. Well, the fix is as easy as destructive. PaX sanitize has to exclude SLAB_DESTROY_BY_RCU marked caches from the per-object sanitization. It only can do page based sanitization for such slabs. >> > conntrack objects are alloc'd/free'd from a SLAB_DESTROY_BY_RCU cache. >> > >> > It is therefore legal to access a conntrack object from another >> > CPU even after kmem_cache_free() was invoked on another cpu, provided all >> > readers that do so hold rcu_read_lock, and verify that object has not been >> > freed yet by issuing appropriate atomic_inc_not_zero calls. >> > >> > Therefore, object poisoning will only be safe from rcu callback, after >> > accesses are known to be illegal/invalid. >> >> Can you expand on that? The term "object poisoning" to me means an >> object (you are talking about the conntract tuple, right?) with > > Yes. > >> problematic values is put into memory, but the way you phrase it seems >> more like the hash table itself is being manipulated improperly. > > No, afaics the conntrack object accesses are correct. > >> I'm still trying to work out what the actual ISSUE is. 
My >> understanding is this, thus far: >> >> It seems like an object in the connection track hash table is being >> improperly marked as free, which then is sanitized, and is then later >> being accessed by the netfilter codepath that loops through the table. > > No. Conntrack objects are free'd when the last reference counter goes > away. However, because lookup of the conntrack hash table is lockless, > another CPU might be accessing the conntrack object that is being free'd > right now. > > Usually this means that the access is invalid. However, in the > conntrack case, the conntrack objects are allocated from a special > cache that delays freeing of underlying pages until we know that no > other cpu is currently accessing it. > > So there are 2 possible cases: > 1 - the conntrack object that is being looked at is alive (refcnt > 1). > 2 - the conntrack object that is being looked is being free'd RIGHT NOW > on another cpu. RCU protects us from page fault, since the underlying > memory page cannot be free'd. > > So, we're safe to look at the memory contents of the tuple and decide > wheter its the object (conntrack tuple) we're trying to find or not. > > If it is, we try to obtain a reference, this will only succeed if the > reference count is not 0 already, so we can detect the "its free'd" > case. > > If we obtained a reference, we still need to re-validate the tuple > address since its possible that the object was free'd on cpu x and > almost-instantly reallocated for use by a different tuple. > > If you are interested in this you can have a look at the bug fixes made > in that area, there are some more explanations there. 
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/net/netfilter/nf_conntrack_core.c?id=e53376bef2cd97d3e3f61fdc677fb8da7d03d0da > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/net/netfilter/nf_conntrack_core.c?id=c6825c0976fa7893692e0e43b09740b419b23c09 > PaX per-object sanitization just nullifies this assumption as its poisoning would set the reference count to 0xfefefefe which is clearly not zero. So, it looks like, the only place to safely sanitize the object is on the page level -- when it gets released after the RCU period has passed. And that's what the latest version of PaX sanitize is doing now. >> > If you use different allocator, please tell us which one (check kernel >> > config, slub is default). >> >> SLAB allocator, though I do not remember making the choice. >> >> From the kernel config that's causing issues: >> >> # egrep 'SLAB|SLUB' .config >> CONFIG_SLAB=y >> # CONFIG_SLUB is not set >> CONFIG_SLABINFO=y >> # CONFIG_DEBUG_SLAB is not set >> CONFIG_PAX_USERCOPY_SLABS=y > > Ok, from a quick glance PaX slab kfree is also zapping > objects before grace period elapsed. Yep. But that is now fixed in the latest version -- for all of them: SLAB/SLOB/SLUB. >> > If its reproduceable with poisoning done after the RCU grace periods >> > have elapsed (i.e., where its not legal anymore to access the memory), >> > please let us know and we can have another look at it. >> > >> > Thanks. >> >> Reproducability is an issue since I don't know what's triggering it in >> the first place. Just that it happens after a variable length of time >> along the same code path, subject to differences between the two >> kernel versions I've seen this issue with. >> >> The machine itself is pushing 20-25 megabytes (~50k packets) per >> second at any given time and has smacked the default conntrack hash >> table maximums. So the netfilter system is under nontrivial stresses. > > It should be able to handle a lot more. 
>
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/net/netfilter/nf_conntrack_core.c?id=93bb0ceb75be2fdfa9fc0dd1fb522d9ada515d9c
>
>> I'll happily work with you guys to isolate this as this is an
>> interesting problem and I'm bored, but I need a bit of help and
>> prompting to get this done properly.
>
> Sure, my understanding is that someone from pax team is working on
> the object poisoning to handle SLAB_DESTROY_BY_RCU properly.
>
> Please don't hesitate to report back with newer pax versions if you
> still see invalid accesses.

Thanks, Florian, for investigating this!

Regards,
Mathias
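The rule stated in this reply (no per-object sanitization for SLAB_DESTROY_BY_RCU caches, page-level sanitization only) could be expressed as a predicate along the following lines. This is a hedged sketch, not the actual PaX change: `may_sanitize_object()` is a hypothetical helper, and the `DEMO_` flag constants are illustrative stand-ins whose values should not be relied upon.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag bits (assumed values, prefixed DEMO_ on purpose). */
#define DEMO_SLAB_POISON         0x00000800UL
#define DEMO_SLAB_NO_SANITIZE    0x00001000UL
#define DEMO_SLAB_DESTROY_BY_RCU 0x00080000UL

/* Decide whether an object may be overwritten at kmem_cache_free() time. */
bool may_sanitize_object(unsigned long cache_flags, bool sanitize_enabled)
{
    if (!sanitize_enabled)
        return false;
    if (cache_flags & (DEMO_SLAB_POISON | DEMO_SLAB_NO_SANITIZE))
        return false;  /* already poisoned for debugging, or opted out */
    if (cache_flags & DEMO_SLAB_DESTROY_BY_RCU)
        return false;  /* readers may still look at it: wait for the page */
    return true;
}
```

For caches that fail this test because of the RCU flag, the wipe would have to happen where the backing page is released after the grace period, as discussed above.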
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 1edd5fdc629d..14eda90aa38e 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2467,6 +2467,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			the specified number of seconds.  This is to be used if
 			your oopses keep scrolling off the screen.
 
+	pax_sanitize_slab=
+			0/1 to disable/enable slab object sanitization (enabled
+			by default).
+
 	pcbit=		[HW,ISDN]
 
 	pcd.		[PARIDE]
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 1d9abb7d22a0..067bd01fed92 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -23,6 +23,7 @@
 #define SLAB_DEBUG_FREE		0x00000100UL	/* DEBUG: Perform (expensive) checks on free */
 #define SLAB_RED_ZONE		0x00000400UL	/* DEBUG: Red zone objs in a cache */
 #define SLAB_POISON		0x00000800UL	/* DEBUG: Poison objects */
+#define SLAB_NO_SANITIZE	0x00001000UL	/* PaX: Do not sanitize objs on free */
 #define SLAB_HWCACHE_ALIGN	0x00002000UL	/* Align objs on cache lines */
 #define SLAB_CACHE_DMA		0x00004000UL	/* Use GFP_DMA memory */
 #define SLAB_STORE_USER		0x00010000UL	/* DEBUG: Store the last owner for bug hunting */
diff --git a/mm/slab.c b/mm/slab.c
index 7c52b3890d25..3f111541d1ce 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3384,6 +3384,16 @@ static inline void __cache_free(struct kmem_cache *cachep, void *objp,
 	struct array_cache *ac = cpu_cache_get(cachep);
 
 	check_irq_off();
+
+	if (pax_sanitize_slab) {
+		if (!(cachep->flags & (SLAB_POISON | SLAB_NO_SANITIZE))) {
+			memset(objp, PAX_MEMORY_SANITIZE_VALUE, cachep->object_size);
+
+			if (cachep->ctor)
+				cachep->ctor(objp);
+		}
+	}
+
 	kmemleak_free_recursive(objp, cachep->flags);
 	objp = cache_free_debugcheck(cachep, objp, caller);
 
diff --git a/mm/slab.h b/mm/slab.h
index 0e0fdd365840..3a2d6cbae601 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -32,6 +32,13 @@ extern struct list_head slab_caches;
 /* The slab cache that manages slab cache information */
 extern struct kmem_cache *kmem_cache;
 
+#ifdef CONFIG_X86_64
+#define PAX_MEMORY_SANITIZE_VALUE	'\xfe'
+#else
+#define PAX_MEMORY_SANITIZE_VALUE	'\xff'
+#endif
+extern bool pax_sanitize_slab;
+
 unsigned long calculate_alignment(unsigned long flags,
 		unsigned long align, unsigned long size);
 
@@ -67,7 +74,7 @@ __kmem_cache_alias(const char *name, size_t size, size_t align,
 
 /* Legal flag mask for kmem_cache_create(), for various configurations */
 #define SLAB_CORE_FLAGS (SLAB_HWCACHE_ALIGN | SLAB_CACHE_DMA | SLAB_PANIC | \
-			 SLAB_DESTROY_BY_RCU | SLAB_DEBUG_OBJECTS )
+			 SLAB_DESTROY_BY_RCU | SLAB_DEBUG_OBJECTS | SLAB_NO_SANITIZE)
 
 #if defined(CONFIG_DEBUG_SLAB)
 #define SLAB_DEBUG_FLAGS (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER)
diff --git a/mm/slab_common.c b/mm/slab_common.c
index d319502b2403..f88dbc3fa1e7 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -30,6 +30,15 @@ LIST_HEAD(slab_caches);
 DEFINE_MUTEX(slab_mutex);
 struct kmem_cache *kmem_cache;
 
+bool pax_sanitize_slab __read_mostly = true;
+static int __init pax_sanitize_slab_setup(char *str)
+{
+	pax_sanitize_slab = !!simple_strtol(str, NULL, 0);
+	printk("%sabled PaX slab sanitization\n", pax_sanitize_slab ? "En" : "Dis");
+	return 1;
+}
+__setup("pax_sanitize_slab=", pax_sanitize_slab_setup);
+
 #ifdef CONFIG_DEBUG_VM
 static int kmem_cache_sanity_check(const char *name, size_t size)
 {
diff --git a/mm/slob.c b/mm/slob.c
index 21980e0f39a8..c4907d766048 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -365,6 +365,9 @@ static void slob_free(void *block, int size)
 		return;
 	}
 
+	if (pax_sanitize_slab)
+		memset(block, PAX_MEMORY_SANITIZE_VALUE, size);
+
 	if (!slob_page_free(sp)) {
 		/* This slob page is about to become partially free. Easy! */
 		sp->units = units;
diff --git a/mm/slub.c b/mm/slub.c
index 3e8afcc07a76..08a7cbcf2274 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2643,6 +2643,12 @@ static __always_inline void slab_free(struct kmem_cache *s,
 
 	slab_free_hook(s, x);
 
+	if (pax_sanitize_slab && !(s->flags & SLAB_NO_SANITIZE)) {
+		memset(x, PAX_MEMORY_SANITIZE_VALUE, s->object_size);
+		if (s->ctor)
+			s->ctor(x);
+	}
+
 redo:
 	/*
 	 * Determine the currently cpus per cpu slab.
@@ -2986,6 +2992,7 @@ static int calculate_sizes(struct kmem_cache *s, int forced_order)
 	s->inuse = size;
 
 	if (((flags & (SLAB_DESTROY_BY_RCU | SLAB_POISON)) ||
+		(pax_sanitize_slab && !(flags & SLAB_NO_SANITIZE)) ||
 		s->ctor)) {
 		/*
 		 * Relocate free pointer after the object if it is not
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 8d289697cc7a..7a4e52d90eed 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3237,13 +3237,15 @@ void __init skb_init(void)
 	skbuff_head_cache = kmem_cache_create("skbuff_head_cache",
 					      sizeof(struct sk_buff),
 					      0,
-					      SLAB_HWCACHE_ALIGN|SLAB_PANIC,
+					      SLAB_HWCACHE_ALIGN|SLAB_PANIC|
+					      SLAB_NO_SANITIZE,
 					      NULL);
 	skbuff_fclone_cache = kmem_cache_create("skbuff_fclone_cache",
 						(2*sizeof(struct sk_buff)) +
 						sizeof(atomic_t),
 						0,
-						SLAB_HWCACHE_ALIGN|SLAB_PANIC,
+						SLAB_HWCACHE_ALIGN|SLAB_PANIC|
+						SLAB_NO_SANITIZE,
 						NULL);
 }
 
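For readers unfamiliar with how the `pax_sanitize_slab=` boot parameter in the patch is interpreted: the setup handler collapses whatever number follows the `=` to a boolean via `!!simple_strtol(str, NULL, 0)`. A rough user-space analogue, assuming standard `strtol()` behaves closely enough for illustration (`parse_sanitize_flag()` is a hypothetical name, not a kernel function):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Any non-zero number enables the feature; base 0 accepts 0x.. and 0.. forms. */
bool parse_sanitize_flag(const char *str)
{
    return strtol(str, NULL, 0) != 0;
}
```

Under this reading, `pax_sanitize_slab=0` disables sanitization and any other numeric value enables it; text that does not start with a number parses as 0.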