diff mbox

slub: fix slab_pad_check() and SLAB_DESTROY_BY_RCU

Message ID 4A9F1620.2080105@gmail.com
State Not Applicable, archived
Delegated to: David Miller
Headers show

Commit Message

Eric Dumazet Sept. 3, 2009, 1:04 a.m. UTC
Zdenek Kabelac a écrit :
> 
> Well I'm not noticing any ill behavior - also note - rcu_barrier() is
> there before the cache is destroyed.
> But as I said - it's just my shot into the dark - which seems to work for me...
> 

Reading again your traces, I do believe there are two bugs in slub

Maybe not explaining your problem, but worth to fix !

Thank you

[PATCH] slub: fix slab_pad_check() and SLAB_DESTROY_BY_RCU

When SLAB_POISON is used and slab_pad_check() finds an overwrite of the
slab padding, we call restore_bytes() on the whole slab, not only
on the padding.

kmem_cache_destroy() should call rcu_barrier() *after* kmem_cache_close()
and *before* sysfs_slab_remove() or risk rcu_free_slab()
being called after kmem_cache is deleted (kfreed).

rmmod nf_conntrack can crash the machine because it has to
kmem_cache_destroy() a SLAB_DESTROY_BY_RCU enabled cache.

Reported-by: Zdenek Kabelac <zdenek.kabelac@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Pekka Enberg Sept. 3, 2009, 6:31 a.m. UTC | #1
On Thu, Sep 3, 2009 at 4:04 AM, Eric Dumazet<eric.dumazet@gmail.com> wrote:
> Zdenek Kabelac a écrit :
>>
>> Well I'm not noticing any ill behavior - also note - rcu_barrier() is
>> there before the cache is destroyed.
>> But as I said - it's just my shot into the dark - which seems to work for me...
>>
>
> Reading again your traces, I do believe there are two bugs in slub
>
> Maybe not explaining your problem, but worth to fix !
>
> Thank you
>
> [PATCH] slub: fix slab_pad_check() and SLAB_DESTROY_BY_RCU
>
> When SLAB_POISON is used and slab_pad_check() finds an overwrite of the
> slab padding, we call restore_bytes() on the whole slab, not only
> on the padding.
>
> kmem_cache_destroy() should call rcu_barrier() *after* kmem_cache_close()
> and *before* sysfs_slab_remove() or risk rcu_free_slab()
> being called after kmem_cache is deleted (kfreed).
>
> rmmod nf_conntrack can crash the machine because it has to
> kmem_cache_destroy() a SLAB_DESTROY_BY_RCU enabled cache.
>
> Reported-by: Zdenek Kabelac <zdenek.kabelac@gmail.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
> diff --git a/mm/slub.c b/mm/slub.c
> index b9f1491..0ac839f 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -646,7 +646,7 @@ static int slab_pad_check(struct kmem_cache *s, struct page *page)
>        slab_err(s, page, "Padding overwritten. 0x%p-0x%p", fault, end - 1);
>        print_section("Padding", end - remainder, remainder);
>
> -       restore_bytes(s, "slab padding", POISON_INUSE, start, end);
> +       restore_bytes(s, "slab padding", POISON_INUSE, end - remainder, end);

OK, makes sense.

>        return 0;
>  }
>
> @@ -2594,8 +2594,6 @@ static inline int kmem_cache_close(struct kmem_cache *s)
>  */
>  void kmem_cache_destroy(struct kmem_cache *s)
>  {
> -       if (s->flags & SLAB_DESTROY_BY_RCU)
> -               rcu_barrier();
>        down_write(&slub_lock);
>        s->refcount--;
>        if (!s->refcount) {
> @@ -2606,6 +2604,8 @@ void kmem_cache_destroy(struct kmem_cache *s)
>                                "still has objects.\n", s->name, __func__);
>                        dump_stack();
>                }
> +               if (s->flags & SLAB_DESTROY_BY_RCU)
> +                       rcu_barrier();
>                sysfs_slab_remove(s);
>        } else
>                up_write(&slub_lock);

The rcu_barrier() call was added by this commit:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=7ed9f7e5db58c6e8c2b4b738a75d5dcd8e17aad5

I guess we should CC Paul as well.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Sept. 3, 2009, 7:38 a.m. UTC | #2
Pekka Enberg a écrit :
> On Thu, Sep 3, 2009 at 4:04 AM, Eric Dumazet<eric.dumazet@gmail.com> wrote:
>> Zdenek Kabelac a écrit :
>>> Well I'm not noticing any ill behavior - also note - rcu_barrier() is
>>> there before the cache is destroyed.
>>> But as I said - it's just my shot into the dark - which seems to work for me...
>>>
>> Reading again your traces, I do believe there are two bugs in slub
>>
>> Maybe not explaining your problem, but worth to fix !
>>
>> Thank you
>>
>> [PATCH] slub: fix slab_pad_check() and SLAB_DESTROY_BY_RCU
>>
>> When SLAB_POISON is used and slab_pad_check() finds an overwrite of the
>> slab padding, we call restore_bytes() on the whole slab, not only
>> on the padding.
>>
>> kmem_cache_destroy() should call rcu_barrier() *after* kmem_cache_close()
>> and *before* sysfs_slab_remove() or risk rcu_free_slab()
>> being called after kmem_cache is deleted (kfreed).
>>
>> rmmod nf_conntrack can crash the machine because it has to
>> kmem_cache_destroy() a SLAB_DESTROY_BY_RCU enabled cache.
>>
>> Reported-by: Zdenek Kabelac <zdenek.kabelac@gmail.com>
>> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
>> ---
>> diff --git a/mm/slub.c b/mm/slub.c
>> index b9f1491..0ac839f 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -646,7 +646,7 @@ static int slab_pad_check(struct kmem_cache *s, struct page *page)
>>        slab_err(s, page, "Padding overwritten. 0x%p-0x%p", fault, end - 1);
>>        print_section("Padding", end - remainder, remainder);
>>
>> -       restore_bytes(s, "slab padding", POISON_INUSE, start, end);
>> +       restore_bytes(s, "slab padding", POISON_INUSE, end - remainder, end);
> 
> OK, makes sense.
> 
>>        return 0;
>>  }
>>
>> @@ -2594,8 +2594,6 @@ static inline int kmem_cache_close(struct kmem_cache *s)
>>  */
>>  void kmem_cache_destroy(struct kmem_cache *s)
>>  {
>> -       if (s->flags & SLAB_DESTROY_BY_RCU)
>> -               rcu_barrier();
>>        down_write(&slub_lock);
>>        s->refcount--;
>>        if (!s->refcount) {
>> @@ -2606,6 +2604,8 @@ void kmem_cache_destroy(struct kmem_cache *s)
>>                                "still has objects.\n", s->name, __func__);
>>                        dump_stack();
>>                }
>> +               if (s->flags & SLAB_DESTROY_BY_RCU)
>> +                       rcu_barrier();
>>                sysfs_slab_remove(s);
>>        } else
>>                up_write(&slub_lock);
> 
> The rcu_barrier() call was added by this commit:
> 
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=7ed9f7e5db58c6e8c2b4b738a75d5dcd8e17aad5
> 
> I guess we should CC Paul as well.

Sure !

rcu_barrier() is definitly better than synchronize_rcu() in 
kmem_cache_destroy()

But its location was not really right (for SLUB at least)

SLAB_DESTROY_BY_RCU means subsystem will call kfree(elems) without waiting RCU
grace period.

By the time subsystem calls kmem_cache_destroy(), all previously allocated
elems must have already be kfreed() by this subsystem.

We must however wait that all slabs, queued for freeing by rcu_free_slab(),
are indeed freed, since this freeing needs access to kmem_cache pointer.

As kmem_cache_close() might clean/purge the cache and call rcu_free_slab(),
we must call rcu_barrier() *after* kmem_cache_close(), and before kfree(kmem_cache *s)

Alternatively we could delay this final kfree(s) (with call_rcu()) but would
have to copy s->name in kmem_cache_create() instead of keeping a pointer to
 a string that might be in a module, and freed at rmmod time.

Given that there is few uses in current tree that call kmem_cache_destroy()
on a SLAB_DESTROY_BY_RCU cache, there is no need to try to optimize this
rcu_barrier() call, unless we want superfast reboot/halt sequences...

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Pekka Enberg Sept. 3, 2009, 7:51 a.m. UTC | #3
Hi Eric,

On Thu, Sep 3, 2009 at 10:38 AM, Eric Dumazet<eric.dumazet@gmail.com> wrote:
>> The rcu_barrier() call was added by this commit:
>>
>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=7ed9f7e5db58c6e8c2b4b738a75d5dcd8e17aad5
>>
>> I guess we should CC Paul as well.
>
> Sure !
>
> rcu_barrier() is definitly better than synchronize_rcu() in
> kmem_cache_destroy()
>
> But its location was not really right (for SLUB at least)
>
> SLAB_DESTROY_BY_RCU means subsystem will call kfree(elems) without waiting RCU
> grace period.
>
> By the time subsystem calls kmem_cache_destroy(), all previously allocated
> elems must have already be kfreed() by this subsystem.
>
> We must however wait that all slabs, queued for freeing by rcu_free_slab(),
> are indeed freed, since this freeing needs access to kmem_cache pointer.
>
> As kmem_cache_close() might clean/purge the cache and call rcu_free_slab(),
> we must call rcu_barrier() *after* kmem_cache_close(), and before kfree(kmem_cache *s)
>
> Alternatively we could delay this final kfree(s) (with call_rcu()) but would
> have to copy s->name in kmem_cache_create() instead of keeping a pointer to
>  a string that might be in a module, and freed at rmmod time.
>
> Given that there is few uses in current tree that call kmem_cache_destroy()
> on a SLAB_DESTROY_BY_RCU cache, there is no need to try to optimize this
> rcu_barrier() call, unless we want superfast reboot/halt sequences...

Oh, sure, the fix looks sane to me. It's just that I am a complete
coward when it comes to merging RCU related patches so I always try to
fish an Acked-by from Paul or Christoph ;).

                        Pekka
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zdenek Kabelac Sept. 3, 2009, 1:28 p.m. UTC | #4
2009/9/3 Eric Dumazet <eric.dumazet@gmail.com>:
> Zdenek Kabelac a écrit :
>>
>> Well I'm not noticing any ill behavior - also note - rcu_barrier() is
>> there before the cache is destroyed.
>> But as I said - it's just my shot into the dark - which seems to work for me...
>>
>
> Reading again your traces, I do believe there are two bugs in slub
>
> Maybe not explaining your problem, but worth to fix !
>
> Thank you
>
> [PATCH] slub: fix slab_pad_check() and SLAB_DESTROY_BY_RCU
>


Ok - I can confirm that this patch fixes my reboot problem.

Thanks

Zdenek
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paul E. McKenney Sept. 3, 2009, 1:42 p.m. UTC | #5
On Thu, Sep 03, 2009 at 09:31:01AM +0300, Pekka Enberg wrote:
> On Thu, Sep 3, 2009 at 4:04 AM, Eric Dumazet<eric.dumazet@gmail.com> wrote:
> > Zdenek Kabelac a écrit :
> >>
> >> Well I'm not noticing any ill behavior - also note - rcu_barrier() is
> >> there before the cache is destroyed.
> >> But as I said - it's just my shot into the dark - which seems to work for me...
> >>
> >
> > Reading again your traces, I do believe there are two bugs in slub
> >
> > Maybe not explaining your problem, but worth to fix !
> >
> > Thank you
> >
> > [PATCH] slub: fix slab_pad_check() and SLAB_DESTROY_BY_RCU
> >
> > When SLAB_POISON is used and slab_pad_check() finds an overwrite of the
> > slab padding, we call restore_bytes() on the whole slab, not only
> > on the padding.
> >
> > kmem_cache_destroy() should call rcu_barrier() *after* kmem_cache_close()
> > and *before* sysfs_slab_remove() or risk rcu_free_slab()
> > being called after kmem_cache is deleted (kfreed).
> >
> > rmmod nf_conntrack can crash the machine because it has to
> > kmem_cache_destroy() a SLAB_DESTROY_BY_RCU enabled cache.
> >
> > Reported-by: Zdenek Kabelac <zdenek.kabelac@gmail.com>
> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> > ---
> > diff --git a/mm/slub.c b/mm/slub.c
> > index b9f1491..0ac839f 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -646,7 +646,7 @@ static int slab_pad_check(struct kmem_cache *s, struct page *page)
> >        slab_err(s, page, "Padding overwritten. 0x%p-0x%p", fault, end - 1);
> >        print_section("Padding", end - remainder, remainder);
> >
> > -       restore_bytes(s, "slab padding", POISON_INUSE, start, end);
> > +       restore_bytes(s, "slab padding", POISON_INUSE, end - remainder, end);
> 
> OK, makes sense.
> 
> >        return 0;
> >  }
> >
> > @@ -2594,8 +2594,6 @@ static inline int kmem_cache_close(struct kmem_cache *s)
> >  */
> >  void kmem_cache_destroy(struct kmem_cache *s)
> >  {
> > -       if (s->flags & SLAB_DESTROY_BY_RCU)
> > -               rcu_barrier();
> >        down_write(&slub_lock);
> >        s->refcount--;
> >        if (!s->refcount) {
> > @@ -2606,6 +2604,8 @@ void kmem_cache_destroy(struct kmem_cache *s)
> >                                "still has objects.\n", s->name, __func__);
> >                        dump_stack();
> >                }
> > +               if (s->flags & SLAB_DESTROY_BY_RCU)
> > +                       rcu_barrier();
> >                sysfs_slab_remove(s);
> >        } else
> >                up_write(&slub_lock);
> 
> The rcu_barrier() call was added by this commit:
> 
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=7ed9f7e5db58c6e8c2b4b738a75d5dcd8e17aad5
> 
> I guess we should CC Paul as well.

I agree that moving the rcu_barrier() as you have done is the right
thing to do -- no point in doing the rcu_barrier() unless you actually
are destroying the kmem_cache!

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Sept. 3, 2009, 1:46 p.m. UTC | #6
Zdenek Kabelac a écrit :
> 2009/9/3 Eric Dumazet <eric.dumazet@gmail.com>:
>> Zdenek Kabelac a écrit :
>>> Well I'm not noticing any ill behavior - also note - rcu_barrier() is
>>> there before the cache is destroyed.
>>> But as I said - it's just my shot into the dark - which seems to work for me...
>>>
>> Reading again your traces, I do believe there are two bugs in slub
>>
>> Maybe not explaining your problem, but worth to fix !
>>
>> Thank you
>>
>> [PATCH] slub: fix slab_pad_check() and SLAB_DESTROY_BY_RCU
>>
> 
> 
> Ok - I can confirm that this patch fixes my reboot problem.
> 
> Thanks
> 
> Zdenek

Good news !

Hmm... but you had a problem with a 2.6.29.something kernel if I remember well ?

At that time, conntrack did not use SLAB_DESTROY_BY_RCU yet.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Pekka Enberg Sept. 3, 2009, 2:05 p.m. UTC | #7
Hi Christoph,

On Thu, 3 Sep 2009, Pekka Enberg wrote:
>> Oh, sure, the fix looks sane to me. It's just that I am a complete
>> coward when it comes to merging RCU related patches so I always try to
>> fish an Acked-by from Paul or Christoph ;).

On Thu, Sep 3, 2009 at 8:50 PM, Christoph
Lameter<cl@linux-foundation.org> wrote:
> I am fine with acking the poison piece.

Ok.

On Thu, Sep 3, 2009 at 8:50 PM, Christoph
Lameter<cl@linux-foundation.org> wrote:
> I did not ack the patch that added rcu to kmem_cache_destroy() and I
> likely wont ack that piece either.

Right. I didn't remember that I merged that over your NAK but I don't
share your view that kmem_cache_destroy() callers should do
rcu_barrier() or whatever is necessary if we can do it from
kmem_cache_destroy(). So I am happy to take both changes but I would
appreciate them being split into two separate patches.

                        Pekka
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zdenek Kabelac Sept. 3, 2009, 2:35 p.m. UTC | #8
2009/9/3 Eric Dumazet <eric.dumazet@gmail.com>:
> Zdenek Kabelac a écrit :
>> 2009/9/3 Eric Dumazet <eric.dumazet@gmail.com>:
>>> Zdenek Kabelac a écrit :
>>>> Well I'm not noticing any ill behavior - also note - rcu_barrier() is
>>>> there before the cache is destroyed.
>>>> But as I said - it's just my shot into the dark - which seems to work for me...
>>>>
>>> Reading again your traces, I do believe there are two bugs in slub
>>>
>>> Maybe not explaining your problem, but worth to fix !
>>>
>>> Thank you
>>>
>>> [PATCH] slub: fix slab_pad_check() and SLAB_DESTROY_BY_RCU
>>>
>>
>>
>> Ok - I can confirm that this patch fixes my reboot problem.
>>
>> Thanks
>>
>> Zdenek
>
> Good news !
>
> Hmm... but you had a problem with a 2.6.29.something kernel if I remember well ?
>
> At that time, conntrack did not use SLAB_DESTROY_BY_RCU yet.

Yes - but it was also giving different error message - which even not
loaded ftp conntrack module. Basically it has been fine until
conntrack started to use rcu. From that moment various errors started
to appear with slub and some kernel debugging options - slab was fine.
With your recent patch it's the first time I do not see oops with nf
after 2.6.29 kernel.

Zdenek
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paul E. McKenney Sept. 3, 2009, 3 p.m. UTC | #9
On Thu, Sep 03, 2009 at 12:45:43PM -0500, Christoph Lameter wrote:
> On Thu, 3 Sep 2009, Eric Dumazet wrote:
> 
> > on a SLAB_DESTROY_BY_RCU cache, there is no need to try to optimize this
> > rcu_barrier() call, unless we want superfast reboot/halt sequences...
> 
> I stilll think that the action to quiesce rcu is something that the caller
> of kmem_cache_destroy must take care of.

Why?

							Thanx, Paul

> Could you split this into two patches: One that addresses the poison and
> another that deals with rcu?
> 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Lameter Sept. 3, 2009, 5:45 p.m. UTC | #10
On Thu, 3 Sep 2009, Eric Dumazet wrote:

> on a SLAB_DESTROY_BY_RCU cache, there is no need to try to optimize this
> rcu_barrier() call, unless we want superfast reboot/halt sequences...

I stilll think that the action to quiesce rcu is something that the caller
of kmem_cache_destroy must take care of.

Could you split this into two patches: One that addresses the poison and
another that deals with rcu?

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Lameter Sept. 3, 2009, 5:50 p.m. UTC | #11
On Thu, 3 Sep 2009, Pekka Enberg wrote:

> Oh, sure, the fix looks sane to me. It's just that I am a complete
> coward when it comes to merging RCU related patches so I always try to
> fish an Acked-by from Paul or Christoph ;).

I am fine with acking the poison piece.

I did not ack the patch that added rcu to kmem_cache_destroy() and I
likely wont ack that piece either.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/mm/slub.c b/mm/slub.c
index b9f1491..0ac839f 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -646,7 +646,7 @@  static int slab_pad_check(struct kmem_cache *s, struct page *page)
 	slab_err(s, page, "Padding overwritten. 0x%p-0x%p", fault, end - 1);
 	print_section("Padding", end - remainder, remainder);
 
-	restore_bytes(s, "slab padding", POISON_INUSE, start, end);
+	restore_bytes(s, "slab padding", POISON_INUSE, end - remainder, end);
 	return 0;
 }
 
@@ -2594,8 +2594,6 @@  static inline int kmem_cache_close(struct kmem_cache *s)
  */
 void kmem_cache_destroy(struct kmem_cache *s)
 {
-	if (s->flags & SLAB_DESTROY_BY_RCU)
-		rcu_barrier();
 	down_write(&slub_lock);
 	s->refcount--;
 	if (!s->refcount) {
@@ -2606,6 +2604,8 @@  void kmem_cache_destroy(struct kmem_cache *s)
 				"still has objects.\n", s->name, __func__);
 			dump_stack();
 		}
+		if (s->flags & SLAB_DESTROY_BY_RCU)
+			rcu_barrier();
 		sysfs_slab_remove(s);
 	} else
 		up_write(&slub_lock);